
Computational Audition at AFRL/HE: Past, Present, and Future

Dr. Timothy R. Anderson


Human Effectiveness Directorate
Air Force Research Laboratory
Biologically Based Signal Processing

• Research, development, and application of:
– Biologically based algorithms
– Perceptually relevant features
– Human-centered metrics and models
• Goal: to improve the robustness of speech processing systems
[Graphic: Speech Command & Control Technologies supporting the sensor-decision maker-shooter chain, with environments including AWACS, the future JAOC, JAOC Combat Plans, and the chem-bio defense environment]

2
Why Is This Area Important?

• Present signal processing systems (e.g., speech and speaker recognition, speech coding) are not robust in adverse military environments.
• Biological principles offer the potential to improve performance in those environments.

3
Biologically Based Signal Processing

[Chart: technology position (Dominant, Strong, Favorable, Tenable, Weak) versus technology maturity (Embryonic, Growth, Mature, Aging)]

Technical Challenges
• Identification and modeling of features and processes used by biological systems
• Incorporation of those key features and processes into computationally efficient algorithms and structures

Approach
• Develop psychoacoustic testing procedures
• Characterize key features and processes
• Develop human-centered models and metrics
• Implement computationally efficient algorithms
• Provide support to operational tests and warfighting exercises to evaluate system utility

4
Research Areas

• Cockpit Speech Recognition


• Robust Speech Recognition
– Monaural Speech Recognition
– Binaural Speech Recognition
– Auditory Model Front-ends
• Speaker Recognition/Verification
– Biologically Based Speaker ID
– Channel Robustness
– Speaker Recognizability Test
5
Phoneme Classification

• Kohonen Self-Organizing Feature Map (see the sketch below)
– 16 × 16 map
• 10-speaker database (TIMIT)
• 10 sentences per speaker
• Leave-one-out method (per speaker)
• Features calculated with
– 16 ms window
– 5 ms frame step

6
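
A minimal sketch of the classification setup described on this slide, assuming frame-level feature vectors and phoneme labels have already been computed with the 16 ms window and 5 ms step: a 16 × 16 Kohonen self-organizing map is trained on the frames, each map node is labeled with the majority phoneme of the training frames it wins, and a test frame is classified by its best-matching node. The training schedule and neighborhood parameters are illustrative choices, not values from the original work.

# Sketch of a 16 x 16 Kohonen self-organizing feature map for phoneme
# classification (frames: (N, D) feature array, labels: N phoneme labels).
import numpy as np

MAP_ROWS, MAP_COLS = 16, 16

def train_som(frames, epochs=20, lr0=0.5, sigma0=8.0, seed=0):
    """Train a 16x16 SOM on an (N, D) array of feature frames."""
    rng = np.random.default_rng(seed)
    n, d = frames.shape
    weights = rng.normal(size=(MAP_ROWS, MAP_COLS, d))
    # Grid coordinates used by the Gaussian neighborhood function.
    rows, cols = np.meshgrid(np.arange(MAP_ROWS), np.arange(MAP_COLS), indexing="ij")
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)            # decaying learning rate
        sigma = sigma0 * (1.0 - epoch / epochs) + 1.0  # shrinking neighborhood
        for x in frames[rng.permutation(n)]:
            # Best-matching unit: node whose weight vector is closest to x.
            dist = np.linalg.norm(weights - x, axis=2)
            br, bc = np.unravel_index(np.argmin(dist), dist.shape)
            # Pull the winner and its neighbors toward the input frame.
            h = np.exp(-((rows - br) ** 2 + (cols - bc) ** 2) / (2 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
    return weights

def label_nodes(weights, frames, labels):
    """Assign each node the majority phoneme label of the frames it wins."""
    votes = {}
    for x, lab in zip(frames, labels):
        dist = np.linalg.norm(weights - x, axis=2)
        node = np.unravel_index(np.argmin(dist), dist.shape)
        votes.setdefault(node, []).append(lab)
    return {node: max(set(v), key=v.count) for node, v in votes.items()}

def classify(weights, node_labels, frame):
    """Classify one feature frame by its best-matching labeled node."""
    dist = np.linalg.norm(weights - frame, axis=2)
    node = np.unravel_index(np.argmin(dist), dist.shape)
    return node_labels.get(node)

The leave-one-out evaluation on the slide would wrap this training and labeling in a loop, holding out one portion of each speaker's data at a time.
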
TRADITIONAL VS. AUDITORY
MONAURAL

[Chart: phoneme recognition rate (%) versus signal-to-noise ratio (1–32 dB and clean) for the AIM and MFCC front ends]

7
Binaural Speech Recognition

• Past
• Present
• Future

9
Binaural Speech Recognition

• Stereausis
• Cocktail Party Processor (see the interaural cross-correlation sketch below)
• BAIM
• BINAP

10
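
The binaural processors listed above are not specified in detail on these slides, but cocktail-party and coincidence-style processing generally relies on interaural comparisons. The sketch below shows generic per-channel interaural cross-correlation (finding the interaural lag that best aligns the left- and right-ear signals); it illustrates the underlying idea only, not the CPP or BINAP implementation.

# Generic interaural cross-correlation sketch. The left/right inputs are
# assumed to be outputs of an auditory filterbank, shaped (channels, samples).
import numpy as np

def interaural_lags(left, right, fs, max_itd_s=1e-3):
    """Return the best interaural lag (in samples) for each frequency channel."""
    max_lag = int(max_itd_s * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    best = np.zeros(left.shape[0], dtype=int)
    for ch in range(left.shape[0]):
        l, r = left[ch], right[ch]
        # Correlation between l[n] and r[n + k] for each candidate lag k.
        scores = [np.dot(l[max(0, -k):len(l) - max(0, k)],
                         r[max(0, k):len(r) - max(0, -k)]) for k in lags]
        best[ch] = lags[int(np.argmax(scores))]
    return best
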
EXPERIMENT SETUP

[Diagram: a speech source and a noise source placed at separate positions around the listener; the SNR-mixing sketch below shows one way such noisy test material can be prepared]
11
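
The recognition curves on the following slides sweep the signal-to-noise ratio from 1 dB to 32 dB plus a clean condition. The sketch below shows one conventional way to mix a noise recording into a speech signal at a target SNR; the original test-material preparation is not described on these slides, so this is only an assumed illustration.

# Mix noise into speech at a target SNR (dB). Both signals are 1-D numpy
# arrays at the same sample rate; the noise is tiled/truncated to match.
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    noise = np.resize(noise, speech.shape)   # repeat or truncate to speech length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(p_speech / p_noise_scaled) equals snr_db.
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + scale * noise
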
MONAURAL VS. BINAURAL
COCKTAIL PARTY PROCESSOR

[Chart: phoneme recognition rate (%) versus SNR (1–32 dB and clean) for the Cocktail Party Processor (CPP) and the monaural baseline (MONO)]

12
MONAURAL VS. BINAURAL
AUDITORY IMAGE MODEL

[Chart: phoneme recognition rate (%) versus SNR (1–32 dB and clean) for the binaural (BAIM) and monaural (AIM) auditory image models]

13
BINAURAL

[Chart: phoneme recognition rate (%) versus SNR (1–32 dB and clean) for the two binaural systems, CPP and BAIM]

14
MONAURAL

[Chart: phoneme recognition rate (%) versus SNR (1–32 dB and clean) for AIM and the MONO baseline]

15
BAIM VS. CPP-AIM

[Chart: phoneme recognition rate (%) versus SNR (1–32 dB and clean) for BAIM, AIM, and CPP-AIM]

16
COINCIDENCE

[Chart: phoneme recognition rate (%) versus SNR (1–32 dB and clean) for BAIM and BINAP]

17
MONAURAL, BINAURAL AND
TRADITIONAL

[Chart: phoneme recognition rate (%) versus SNR (1–32 dB and clean) for CPP, BAIM, AIM, MONO, MFCC, BINAP, and CPP-AIM]

18
Binaural Speech Recognition

RESULTS: The binaural auditory model provides a better representation than traditional techniques.

TASK                  SPEECH            RESULTS
Phoneme recognition   Low to high SNR   7–12 dB binaural advantage

19
Binaural Speech Recognition

• Past
• Present
– No Current Work
• Future

20
Binaural Speech Recognition

• Past
• Present
• Future
– Implement binaural ASR system
– Investigate further binaural fusion mechanisms
– Meeting room data
– Implement binaural system using AIM chips

21
Auditory Model Front Ends

• Past
• Present
• Future

22
Auditory Model Front Ends

• Tanner Research “Analog Speech Recognition”
– Implementation of AIM
– 56-channel analog filter bank
– Single SBus board
– 1.5× real-time

23
Auditory Model Front Ends

• AFIT
– Designed a digital implementation (see the software sketch below)
• Middle ear, BMM, adaptive thresholding
– 32 channels per chip
– 300 Hz – 7 kHz
– 44.1 kHz sampling rate
– 2 chips provide 64 channels in real time

24
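
A rough software analogue of such a multichannel auditory front end: a 64-channel filterbank spanning 300 Hz to 7 kHz at a 44.1 kHz sampling rate, followed by half-wave rectification as a crude stand-in for the adaptive thresholding stage. The gammatone filters and ERB spacing used here are common auditory-modeling choices, not necessarily what the AFIT chip or AIM hardware implements.

# Software sketch of a 64-channel auditory filterbank front end.
import numpy as np

FS = 44100            # sampling rate from the slide
N_CHANNELS = 64       # two 32-channel chips
F_LO, F_HI = 300.0, 7000.0

def erb(f):
    """Equivalent rectangular bandwidth (Glasberg & Moore approximation)."""
    return 24.7 + 0.108 * f

def erb_spaced_cfs(n=N_CHANNELS, lo=F_LO, hi=F_HI):
    """Center frequencies spaced evenly on the ERB-rate scale."""
    def erb_rate(f):
        return 21.4 * np.log10(4.37e-3 * f + 1.0)
    e = np.linspace(erb_rate(lo), erb_rate(hi), n)
    return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3

def gammatone_ir(cf, fs=FS, dur=0.032, order=4):
    """Finite impulse response of a gammatone filter centered at cf."""
    t = np.arange(int(dur * fs)) / fs
    ir = (t ** (order - 1) * np.exp(-2 * np.pi * 1.019 * erb(cf) * t)
          * np.cos(2 * np.pi * cf * t))
    return ir / np.max(np.abs(ir))

def auditory_front_end(signal, fs=FS):
    """Return a (channels, samples) array of half-wave-rectified channel outputs."""
    channels = [np.maximum(np.convolve(signal, gammatone_ir(cf, fs), mode="same"), 0.0)
                for cf in erb_spaced_cfs()]
    return np.array(channels)
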
Auditory Model Front Ends

• Past
• Present
– Single-board (USB) system designed and prototyped
– Current chip design undergoing debug
– Second fabrication run this fall
• Future

27
Auditory Model Front Ends

• Past
• Present
• Future
– Debug and verify chip fabrication
– Debug PC-based real-time auditory model front end
– Implement complete end-to-end auditory ASR
– Investigate feedback mechanisms in auditory model
for ASR

28
Biologically Based SID

• Past
• Present
• Future

29
Biologically Based SID

• Auditory models investigated
– Payton’s Auditory Model (PAM)
– Auditory Image Model (AIM)
• VQ codebook used to model each speaker (see the sketch below)
• 37 speakers from TIMIT (dialect regions 1–2; 12 female, 25 male)
• Identification results:
– MFCC 94%
– PAM 67%
– AIM 91%

30
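
A minimal sketch of VQ-codebook speaker identification as described above: one k-means codebook is trained per enrolled speaker, and a test utterance is assigned to the speaker whose codebook yields the lowest average quantization distortion on its frames. The codebook size and the choice of feature frames (MFCC, PAM, or AIM) are left as assumptions.

# VQ-codebook speaker ID sketch; feature extraction is assumed to happen upstream.
import numpy as np
from scipy.cluster.vq import kmeans, vq

CODEBOOK_SIZE = 64   # assumed; the slide does not give the codebook size

def train_codebooks(train_frames):
    """train_frames: dict speaker_id -> (N, D) array of feature frames."""
    return {spk: kmeans(frames.astype(float), CODEBOOK_SIZE)[0]
            for spk, frames in train_frames.items()}

def identify(codebooks, test_frames):
    """Return the speaker whose codebook quantizes the test frames best."""
    def avg_distortion(codebook):
        _, dists = vq(test_frames.astype(float), codebook)
        return float(np.mean(dists))
    return min(codebooks, key=lambda spk: avg_distortion(codebooks[spk]))
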
Biologically Based SID

• Past
• Present
• Future

31
Biologically Based SID

• Using perceptual features
– Formants, formant bandwidths, and pitch
• Voiced frames
• Using a GMM classifier (see the sketch below)
• Conducting experiments on larger databases
– Switchboard

32
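
A sketch of the GMM classification step: one Gaussian mixture model per speaker is trained on voiced-frame feature vectors (here assumed to be the nine formant, bandwidth, and pitch features mentioned later in the deck), and a test utterance is assigned to the speaker whose model gives the highest average log-likelihood. The mixture size and the use of scikit-learn are illustrative assumptions.

# GMM speaker-ID sketch over voiced-frame perceptual features.
from sklearn.mixture import GaussianMixture

N_COMPONENTS = 16   # assumed mixture size; not given on the slide

def train_models(train_features):
    """train_features: dict speaker_id -> (N, 9) array of voiced-frame features."""
    models = {}
    for spk, feats in train_features.items():
        gmm = GaussianMixture(n_components=N_COMPONENTS, covariance_type="diag")
        models[spk] = gmm.fit(feats)
    return models

def identify(models, test_features):
    """Pick the speaker whose model gives the highest average log-likelihood."""
    return max(models, key=lambda spk: models[spk].score(test_features))
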
Biologically Based SID

[Slides 33–36: speaker ID results figures comparing three feature sets: MFCCs (no deltas, no CMS), MFCCs (no CMS), and the F0 baseline]
Biologically Based SID

• Performance isn’t the best, but this feature set…


– Uses only 9 features versus 19–38 for MFCCs
– Hasn’t been as heavily researched as MFCCs

37
Biologically Based SID

• Determine reasons for performance differences between various databases
• Channel & score normalizations
• Pitch-synchronous features
• Closed-phase analysis
• Glottal model features

38
Biologically Based SID

39
Biologically Based SID

• Past
• Present
• Future

40
Biologically Based SID

• Investigate other auditory-based features


– Vocal agitation
– Formants, formant bandwidths, and pitch calculated
from the auditory model
– Auditory model features
• Conduct experiments on other databases
– Broadcast news
– Military training exercises

41
Speaker Recognizability Test

• Past
• Present
• Future

42
Speaker Recognizability Test

• Dynastat, “The Development of a Method for Evaluating and Predicting Speaker Recognizability in Voice Communication Systems”
– Determined perceptually relevant features
• Perceptual voice traits (PVTs)
• 21 traits currently identified
– Developed a methodology to measure these traits
• Human listeners
– Developed a measure of the recognizability loss due to the channel
• Diagnostic Speaker Recognizability Test (DSRT)

43
Speaker Recognizability Test

• Past
• Present
• Future

44
Speaker Recognizability Test

• Use perceptual voice traits to identify groups of similar and distinctive speakers
• Determine whether current SID systems have difficulty with these similar speakers
• Implementing an in-house web-based listening test for
– PVT rating
– DSRT

45
Speaker Recognizability Test

• Past
• Present
• Future

46
Speaker Recognizability Test

• Obtain PVT ratings for a larger database
– Switchboard
• Determine acoustic correlates of perceptually relevant features
• Use them as features for speaker recognition
• Utilize the DSRT for communication system testing

47
Summary

• Computational audition offers the potential for improved performance in adverse military environments
• Much research still needs to be accomplished
– Fidelity of the models
– Model feedback pathways
• Computational cost is no longer a limiting factor in performing meaningful experiments

48
Questions?

49
