Agenda
Introduction
Audio Visual Modeling
Spectrogram Reading
Spectrogram Filtering
Introduction
What is Speech Recognition?
Speech Recognition Problems
noise
inter- and intra-speaker variation
continuous speech: no boundaries between words
Demo
Ready for the challenge?
Listen to this audio and try to understand the speech content: vox_mix[1].mov
Listen to the speech with the video image: dig_tranexp[1].mov
Did you understand the content? Get a prize from IBM
Play the answer: vox_clean[1].mov (5893642)
How?
Visual Features
Geometric lip dimensions
Lip shape: height/width of the inner/outer lip
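The geometric features above can be sketched in a few lines. This is only an illustration: it assumes the inner and outer lip contours have already been located and are given as lists of (x, y) pixel coordinates. The names lip_dimensions and lip_features and the sample contours are hypothetical and do not come from the slides.

```python
import numpy as np

def lip_dimensions(contour):
    """Return (width, height) of a lip contour given as (x, y) points."""
    pts = np.asarray(contour, dtype=float)
    width = pts[:, 0].max() - pts[:, 0].min()
    height = pts[:, 1].max() - pts[:, 1].min()
    return width, height

def lip_features(outer, inner):
    """Stack outer and inner lip width/height into one feature vector."""
    outer_w, outer_h = lip_dimensions(outer)
    inner_w, inner_h = lip_dimensions(inner)
    return np.array([outer_w, outer_h, inner_w, inner_h])

# Hypothetical contours (pixel coordinates) for a slightly open mouth.
outer = [(10, 20), (30, 12), (50, 20), (30, 30)]
inner = [(18, 20), (30, 17), (42, 20), (30, 24)]
features = lip_features(outer, inner)
print(features)
```

One such vector per video frame gives a visual feature sequence that can accompany the audio features.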
Audio-Visual Integration
Feature Fusion
Synchronization Problem
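One common way to handle both points is to resample the visual features to the audio frame rate and then concatenate the two vectors frame by frame. The sketch below assumes this approach, which the slides do not confirm. The frame rates (100 audio frames per second, 25 video frames per second), the feature sizes and the function name fuse_features are illustrative assumptions.

```python
import numpy as np

def fuse_features(audio, video, audio_rate=100.0, video_rate=25.0):
    """Concatenate audio and visual features on a common time axis.

    audio: (n_audio_frames, n_audio_features)
    video: (n_video_frames, n_video_features)
    The rates are frames per second and are assumed values.
    """
    audio = np.asarray(audio, dtype=float)
    video = np.asarray(video, dtype=float)
    t_audio = np.arange(audio.shape[0]) / audio_rate
    t_video = np.arange(video.shape[0]) / video_rate
    # Linearly interpolate each visual feature at the audio frame times.
    # np.interp holds the last value for times past the final video frame.
    video_up = np.column_stack(
        [np.interp(t_audio, t_video, video[:, k]) for k in range(video.shape[1])]
    )
    return np.hstack([audio, video_up])

# One second of hypothetical data: 13 audio features, 4 visual features.
audio = np.zeros((100, 13))
video = np.tile(np.arange(25.0)[:, None], (1, 4))
fused = fuse_features(audio, video)
print(fused.shape)
```

Interpolation is only one option for the synchronization step; repeating each video frame for the audio frames it spans would also bring the two streams to the same rate.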
Spectrograms
A Spectrogram:
Translation of speech into the visual domain
[Figure: spectrogram; axes: frequency, time]
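A spectrogram can be computed with a short-time Fourier transform. The sketch below is a minimal version under stated assumptions: a Hann window, 256-sample frames, a 128-sample hop and an 8 kHz test tone. None of these values come from the slides.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Return a power spectrogram: rows are frequency bins, columns are time frames."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return power.T

# Hypothetical test signal: one second of a 1000 Hz tone sampled at 8000 Hz.
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
S = spectrogram(tone)
peak_bin = int(S[:, 0].argmax())
peak_hz = peak_bin * fs / 256  # bin spacing is fs / frame_len
print(S.shape, peak_hz)
```

Displayed as an image, each column of S is the spectrum of one short slice of the signal, which is what makes speech readable in the visual domain.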
Spectrogram Reading
Spectrogram Filtering
Required:
How?
Morphological Processing
Based on the theory of Mathematical Morphology
Stresses the role of shape in image preprocessing; used for region identification
Important morphological operations:
Erosion
Dilation
Opening
Closing
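The four operations can be illustrated on binary images with NumPy alone. This sketch assumes a square structuring element of odd size and treats pixels outside the image as background; the function names and test images are illustrative.

```python
import numpy as np

def _windows(img, size):
    """Return every shifted view of img under a size x size square element."""
    pad = size // 2
    padded = np.pad(img, pad, constant_values=False)
    h, w = img.shape
    return [padded[dy:dy + h, dx:dx + w] for dy in range(size) for dx in range(size)]

def erode(img, size=3):
    """A pixel survives only if its whole neighbourhood is foreground."""
    return np.logical_and.reduce(_windows(np.asarray(img, dtype=bool), size))

def dilate(img, size=3):
    """A pixel is set if any pixel in its neighbourhood is foreground."""
    return np.logical_or.reduce(_windows(np.asarray(img, dtype=bool), size))

def opening(img, size=3):
    """Erosion followed by dilation: removes small isolated specks."""
    return dilate(erode(img, size), size)

def closing(img, size=3):
    """Dilation followed by erosion: fills small holes."""
    return erode(dilate(img, size), size)

# A 3x3 block plus one isolated noise pixel; opening removes the pixel.
noisy = np.zeros((7, 7), dtype=bool)
noisy[2:5, 2:5] = True
noisy[0, 6] = True
opened = opening(noisy)

# A 3x3 block with a one-pixel hole; closing fills the hole.
holed = np.zeros((7, 7), dtype=bool)
holed[2:5, 2:5] = True
holed[3, 3] = False
closed = closing(holed)
```

Opening and closing are built from the two basic operations, so only erosion and dilation need their own implementation.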
Spectrogram Filtering
Steps: convert, thresholding, erosion, dilation, ANDing, convert
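The listed steps can be read as an image-processing pipeline. The sketch below is one possible interpretation, not the method from the slides. It assumes the first "convert" scales the spectrogram to a grayscale image, "ANDing" combines the dilated mask with the thresholded mask, and the last "convert" maps the mask back onto the original spectrogram values. The threshold, the 3x3 element and the function name filter_spectrogram are illustrative.

```python
import numpy as np

def _windows(mask, size):
    """Return every shifted view of mask under a size x size square element."""
    pad = size // 2
    padded = np.pad(mask, pad, constant_values=False)
    h, w = mask.shape
    return [padded[dy:dy + h, dx:dx + w] for dy in range(size) for dx in range(size)]

def filter_spectrogram(S, rel_threshold=0.5, size=3):
    """Suppress small isolated regions of a spectrogram (assumed pipeline)."""
    S = np.asarray(S, dtype=float)
    floor = S.min()
    span = S.max() - floor
    # convert: scale the spectrogram to a 0..1 grayscale image (assumption)
    image = (S - floor) / span if span > 0 else np.zeros_like(S)
    # thresholding: keep only the strong regions
    mask = image > rel_threshold
    # erosion: remove regions smaller than the structuring element
    eroded = np.logical_and.reduce(_windows(mask, size))
    # dilation: grow the surviving regions back
    dilated = np.logical_or.reduce(_windows(eroded, size))
    # ANDing: never keep a pixel the threshold rejected (assumption)
    kept = dilated & mask
    # convert: restore original values inside the mask, floor elsewhere (assumption)
    return np.where(kept, S, floor)

# Hypothetical spectrogram: a 4x4 strong region plus one isolated noise peak.
S = np.full((8, 8), 0.1)
S[2:6, 2:6] = 1.0
S[0, 7] = 0.9
filtered = filter_spectrogram(S)
```

In this example the isolated peak at row 0, column 7 is set to the floor value while the 4x4 region keeps its original values.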