
Acoustic Theory of Speech Production

Overview
Sound sources
Vocal tract transfer function
Wave equations
Sound propagation in a uniform acoustic tube
Representing the vocal tract with simple acoustic tubes
Estimating natural frequencies from area functions
Representing the vocal tract with multiple uniform tubes
6.345 Automatic Speech Recognition: Acoustic Theory of Speech Production
Lecture # 2
Session 2003
Anatomical Structures for Speech Production
Phonemes in American English
PHONEME  EXAMPLE    PHONEME  EXAMPLE    PHONEME  EXAMPLE
/i/      beat       /s/      see        /w/      wet
/ɪ/      bit        /š/      she        /r/      red
/e/      bait       /f/      fee        /l/      let
/ɛ/      bet        /θ/      thief      /y/      yet
/æ/      bat        /z/      z          /m/      meet
/ɑ/      Bob        /ž/      Gigi       /n/      neat
/ɔ/      bought     /v/      v          /ŋ/      sing
/ʌ/      but        /ð/      thee       /č/      church
/o/      boat       /p/      pea        /ǰ/      judge
/ʊ/      book       /t/      tea        /h/      heat
/u/      boot       /k/      key
/ɝ/      Burt       /b/      bee
/ɑʸ/     bite       /d/      Dee
/ɔʸ/     Boyd       /g/      geese
/ɑʷ/     bout
/ə/      about
Places of Articulation for Speech Sounds
Palato-Alveolar
Velar
Alveolar
Labial
Uvular
Dental
Palatal
Speech Waveform: An Example
[Figure: waveform of the utterance "Two plus seven is less than ten"]
A Wideband Spectrogram
[Figure: wideband spectrogram of the utterance "Two plus seven is less than ten"]
Acoustic Theory of Speech Production
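The acoustic characteristics of speech are usually modelled as a sequence of source, vocal tract filter, and radiation characteristics
[Figure: block diagram of the chain $U_G$, $U_L$, $P_r$]
$$P_r(j\Omega) = S(j\Omega)\,T(j\Omega)\,R(j\Omega)$$
For vowel production:
$$S(j\Omega) = U_G(j\Omega), \qquad T(j\Omega) = \frac{U_L(j\Omega)}{U_G(j\Omega)}, \qquad R(j\Omega) = \frac{P_r(j\Omega)}{U_L(j\Omega)}$$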
Sound Source: Vocal Fold Vibration
Modelled as a volume velocity source at glottis, $U_G(j\Omega)$
[Figure: waveforms $P_r(t)$ and $U_G(t)$ versus $t$, with period $T_o = 1/F_o$; spectrum $U_G(f)$ versus $f$, falling off as $1/f^2$]

            F0 ave (Hz)   F0 min (Hz)   F0 max (Hz)
Men             125            80           200
Women           225           150           350
Children        300           200           500
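
As a small numeric illustration of the figure and table above, the sketch below converts the average F0 values into fundamental periods using $T_o = 1/F_o$ and expresses a $1/f^2$ magnitude falloff in dB per octave. The dictionary and function names are illustrative choices, not part of the lecture.

```python
import math

# Average F0 values (Hz) copied from the table above.
F0_AVE_HZ = {"Men": 125.0, "Women": 225.0, "Children": 300.0}

def period_ms(f0_hz):
    """Fundamental period T0 = 1/F0, returned in milliseconds."""
    return 1000.0 / f0_hz

def falloff_db_per_octave(exponent=2.0):
    """Level change per doubling of frequency for a 1/f**exponent magnitude spectrum."""
    return 20.0 * math.log10(2.0 ** -exponent)

if __name__ == "__main__":
    for group, f0 in F0_AVE_HZ.items():
        print(f"{group}: F0 = {f0:.0f} Hz, T0 = {period_ms(f0):.2f} ms")
    print(f"1/f^2 falloff: {falloff_db_per_octave():.1f} dB per octave")
```

A $1/f^2$ magnitude spectrum loses a factor of four per octave, which is roughly 12 dB.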

Sound Source: Turbulence Noise


Turbulence noise is produced at a constriction in the vocal tract
Aspiration noise is produced at glottis
Frication noise is produced above the glottis
Modelled as series pressure source at constriction, $P_S(j\Omega)$
[Figure: spectrum $P_s(f)$ versus $f$, with the frequency $0.2\,V/D$ marked]
V: Velocity at constriction    D: Critical dimension $= \sqrt{4A/\pi}$
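
The two quantities above can be evaluated with a few lines of code. This is a minimal sketch, assuming that the value $0.2\,V/D$ is the frequency marked on the noise spectrum and that $D$ is the diameter of a circle with the same area as the constriction; the function names and the example numbers are hypothetical.

```python
import math

def critical_dimension_cm(area_cm2):
    """D = sqrt(4A/pi): diameter of a circle whose area equals the constriction area A."""
    return math.sqrt(4.0 * area_cm2 / math.pi)

def marked_noise_frequency_hz(velocity_cm_s, area_cm2):
    """Frequency 0.2 * V / D for flow velocity V (cm/s) through a constriction of area A (cm^2)."""
    return 0.2 * velocity_cm_s / critical_dimension_cm(area_cm2)

if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    area = 0.1         # cm^2
    velocity = 3000.0  # cm/s
    print(f"D = {critical_dimension_cm(area):.3f} cm")
    print(f"0.2 V / D = {marked_noise_frequency_hz(velocity, area):.0f} Hz")
```

For a fixed flow velocity, a narrower constriction gives a smaller $D$ and therefore moves this frequency upward.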



Vocal Tract Wave Equations
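Define:
$u(x,t)$ = particle velocity
$U(x,t)$ = volume velocity ($U = uA$)
$p(x,t)$ = sound pressure variation ($P = P_O + p$)
$\rho$ = density of air
$c$ = velocity of sound
Assuming plane wave propagation (for a cross dimension $\ll \lambda$), and a one-dimensional wave motion, it can be shown that
$$-\frac{\partial p}{\partial x} = \rho\,\frac{\partial u}{\partial t}, \qquad -\frac{\partial u}{\partial x} = \frac{1}{\rho c^2}\,\frac{\partial p}{\partial t}, \qquad \frac{\partial^2 u}{\partial x^2} = \frac{1}{c^2}\,\frac{\partial^2 u}{\partial t^2}$$
Time and frequency domain solutions are of the form
$$u(x,t) = u^+\!\left(t - \frac{x}{c}\right) - u^-\!\left(t + \frac{x}{c}\right), \qquad u(x,s) = \frac{1}{\rho c}\left[P^+ e^{-sx/c} - P^- e^{sx/c}\right]$$
$$p(x,t) = \rho c\left[u^+\!\left(t - \frac{x}{c}\right) + u^-\!\left(t + \frac{x}{c}\right)\right], \qquad p(x,s) = P^+ e^{-sx/c} + P^- e^{sx/c}$$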
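Propagation of Sound in a Uniform Tube
[Figure: uniform tube of area $A$ extending from $x = -\ell$ to $x = 0$, with source $U_G$]
The vocal tract transfer function of volume velocities is
$$T(j\Omega) = \frac{U_L(j\Omega)}{U_G(j\Omega)} = \frac{U(-\ell, j\Omega)}{U(0, j\Omega)}$$
Using the boundary conditions $U(0,s) = U_G(s)$ and $P(-\ell,s) = 0$
$$T(s) = \frac{2}{e^{s\ell/c} + e^{-s\ell/c}}, \qquad T(j\Omega) = \frac{1}{\cos(\Omega\ell/c)}$$
The poles of the transfer function $T(j\Omega)$ are where $\cos(\Omega\ell/c) = 0$
$$\frac{(2\pi f_n)\,\ell}{c} = \frac{(2n-1)\,\pi}{2}, \qquad f_n = \frac{c}{4\ell}\,(2n-1), \qquad \lambda_n = \frac{4\ell}{2n-1}, \qquad n = 1, 2, \ldots$$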
Propagation of Sound in a Uniform Tube (cont)
For $c = 34{,}000$ cm/sec, $\ell = 17$ cm, the natural frequencies (also called the formants) are at 500 Hz, 1500 Hz, 2500 Hz, ...
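
The quoted values follow directly from $f_n = \frac{c}{4\ell}(2n-1)$. A minimal sketch of that arithmetic, with a function name chosen here for illustration:

```python
def uniform_tube_formants(length_cm, c_cm_s=34000.0, count=3):
    """Natural frequencies f_n = (2n - 1) * c / (4 * l) in Hz, for n = 1..count."""
    return [(2 * n - 1) * c_cm_s / (4.0 * length_cm) for n in range(1, count + 1)]

if __name__ == "__main__":
    # Values from the text: c = 34,000 cm/sec, l = 17 cm.
    print(uniform_tube_formants(17.0))  # [500.0, 1500.0, 2500.0]
```

Because the frequencies scale as $1/\ell$, a shorter tube shifts every formant upward by the same factor.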
[Figure: $20\log_{10}|T(j\Omega)|$ in dB versus frequency (0 to 5 kHz), together with the pole locations (x) along the $j\Omega$ axis]
The transfer function of a tube with no side branches, excited at one end and response measured at another, only has poles
The formant frequencies will have finite bandwidth when vocal tract losses are considered (e.g., radiation, walls, viscosity, heat)
The length of the vocal tract, $\ell$, corresponds to $\tfrac{1}{4}\lambda_1, \tfrac{3}{4}\lambda_2, \tfrac{5}{4}\lambda_3, \ldots$, where $\lambda_i$ is the wavelength of the $i$th natural frequency
Standing Wave Patterns in a Uniform Tube
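A uniform tube closed at one end and open at the other is often referred to as a quarter wavelength resonator
[Figure: standing wave patterns (SWP) of $|U(x)|$ for F1, F2, and F3 along $x$ from glottis to lips; positions are marked at $\tfrac{2}{3}\ell$ for F2, and at $\tfrac{2}{5}\ell$ and $\tfrac{4}{5}\ell$ for F3]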
Natural Frequencies of Simple Acoustic Tubes
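[Figure: two tubes of area $A$, each extending from $x = -\ell$ to $x = 0$, with the admittance $Y_{-\ell}$ seen at $x = -\ell$]

Quarter-wavelength resonator
$$P(x,j\Omega) = 2P^+\cos\frac{\Omega x}{c}, \qquad U(x,j\Omega) = -j\,\frac{A}{\rho c}\,2P^+\sin\frac{\Omega x}{c}$$
$$Y_{-\ell} = j\,\frac{A}{\rho c}\tan\frac{\Omega\ell}{c} \approx j\Omega\,\frac{A\ell}{\rho c^2} = j\Omega C_A \qquad (\Omega\ell/c \ll 1)$$
$C_A = A\ell/\rho c^2$ = acoustic compliance
$$f_n = \frac{c}{4\ell}\,(2n-1), \qquad n = 1, 2, \ldots$$

Half-wavelength resonator
$$P(x,j\Omega) = -j\,2P^+\sin\frac{\Omega x}{c}, \qquad U(x,j\Omega) = \frac{A}{\rho c}\,2P^+\cos\frac{\Omega x}{c}$$
$$Y_{-\ell} = -j\,\frac{A}{\rho c}\cot\frac{\Omega\ell}{c} \approx -j\,\frac{A}{\Omega\rho\ell} = \frac{1}{j\Omega M_A} \qquad (\Omega\ell/c \ll 1)$$
$M_A = \rho\ell/A$ = acoustic mass
$$f_n = \frac{c}{2\ell}\,n, \qquad n = 0, 1, 2, \ldots$$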
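
To see how good the lumped-element forms are, the sketch below compares $\tan(\Omega\ell/c)$ and $\cot(\Omega\ell/c)$ with their small-argument replacements, and lists the half-wavelength resonances. The common factor $A/\rho c$ cancels in the ratios, so no air density is needed. The value $c = 34{,}000$ cm/sec is taken from the uniform tube example; the function names are assumptions made for this illustration.

```python
import math

C = 34000.0  # cm/s, speed of sound used in the uniform tube example

def half_wave_resonances(length_cm, count=3, c=C):
    """f_n = n * c / (2 * l) for n = 0, 1, ..., count - 1."""
    return [n * c / (2.0 * length_cm) for n in range(count)]

def tan_vs_compliance(freq_hz, length_cm, c=C):
    """Ratio of tan(W l / c) to W l / c, i.e. exact admittance over j W C_A."""
    x = 2.0 * math.pi * freq_hz * length_cm / c
    return math.tan(x) / x

def cot_vs_mass(freq_hz, length_cm, c=C):
    """Ratio of cot(W l / c) to c / (W l), i.e. exact admittance over 1 / (j W M_A)."""
    x = 2.0 * math.pi * freq_hz * length_cm / c
    return x / math.tan(x)

if __name__ == "__main__":
    print(half_wave_resonances(17.0))
    for f in (100.0, 500.0, 2000.0):
        print(f, round(tan_vs_compliance(f, 2.0), 4), round(cot_vs_mass(f, 2.0), 4))
```

Both ratios stay close to 1 while $\Omega\ell/c \ll 1$ and drift away as the section length becomes a noticeable fraction of a wavelength.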
Approximating Vocal Tract Shapes
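[Figure: vocal tract shapes for the vowels [i], [a], and [u], with a two-tube approximation using areas $A_1$, $A_2$ and lengths $\ell_1$, $\ell_2$]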
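Estimating Natural Resonance Frequencies
Resonance frequencies occur where impedance (or admittance) function equals natural (e.g., open circuit) boundary conditions
[Figure: two-tube model driven by $U_G$ with output $U_L$, areas $A_1$, $A_2$, and lengths $\ell_1$, $\ell_2$; at the junction, $Y_1 + Y_2 = 0$]
For a two tube approximation it is easiest to solve for $Y_1 + Y_2 = 0$
$$j\,\frac{A_1}{\rho c}\tan\frac{\Omega\ell_1}{c} - j\,\frac{A_2}{\rho c}\cot\frac{\Omega\ell_2}{c} = 0$$
$$\sin\frac{\Omega\ell_1}{c}\,\sin\frac{\Omega\ell_2}{c} - \frac{A_2}{A_1}\,\cos\frac{\Omega\ell_1}{c}\,\cos\frac{\Omega\ell_2}{c} = 0$$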
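
The two-tube condition has no closed-form solution in general, but its roots are easy to locate numerically. The following is a minimal sketch, not the lecture's method: it scans the frequency axis for sign changes of the left-hand side and refines each bracket by bisection. The function name, the 1 Hz scan step, the 5 kHz upper limit, and the default $c = 34{,}000$ cm/sec are assumptions.

```python
import math

def two_tube_formants(a1, l1, a2, l2, c=34000.0, f_max=5000.0, step_hz=1.0):
    """Roots (Hz) of sin(W l1/c) sin(W l2/c) - (A2/A1) cos(W l1/c) cos(W l2/c) = 0, W = 2 pi f."""
    def g(f):
        w = 2.0 * math.pi * f
        return (math.sin(w * l1 / c) * math.sin(w * l2 / c)
                - (a2 / a1) * math.cos(w * l1 / c) * math.cos(w * l2 / c))

    roots = []
    f_lo, g_lo = 0.0, g(0.0)
    for i in range(1, int(f_max / step_hz) + 1):
        f_hi = i * step_hz
        g_hi = g(f_hi)
        if g_hi == 0.0:
            roots.append(f_hi)              # grid point is exactly a root
        elif g_lo * g_hi < 0.0:             # sign change brackets a root
            lo, hi, g_at_lo = f_lo, f_hi, g_lo
            for _ in range(60):             # bisection refinement
                mid = 0.5 * (lo + hi)
                g_mid = g(mid)
                if g_at_lo * g_mid <= 0.0:
                    hi = mid
                else:
                    lo, g_at_lo = mid, g_mid
            roots.append(0.5 * (lo + hi))
        f_lo, g_lo = f_hi, g_hi
    return roots

if __name__ == "__main__":
    # Equal areas: the condition reduces to cos(W (l1 + l2) / c) = 0, a uniform 17 cm tube.
    print([round(f) for f in two_tube_formants(1.0, 9.0, 1.0, 8.0)])
    # Unequal areas (hypothetical values) shift the roots away from the uniform-tube pattern.
    print([round(f) for f in two_tube_formants(1.0, 9.0, 7.0, 8.0)])
```

With $A_1 = A_2$ the expression collapses to $-\cos\big(\Omega(\ell_1+\ell_2)/c\big)$, so the first call serves as a sanity check against the uniform tube result.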
Decoupling Simple Tube Approximations
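If $A_1 \gg A_2$, or $A_1 \ll A_2$, the tubes can be decoupled and natural frequencies of each tube can be computed independently
For the vowel /i/, the formant frequencies are obtained from:
[Figure: two-tube model with areas $A_1$, $A_2$ and lengths $\ell_1$, $\ell_2$]
$$f_n = \frac{c}{2\ell_1}\,n \quad \text{plus} \quad f_n = \frac{c}{2\ell_2}\,n$$
At low frequencies:
$$f = \frac{c}{2\pi}\left(\frac{A_2}{A_1\ell_1\ell_2}\right)^{1/2} = \frac{1}{2\pi}\left(\frac{1}{C_{A_1}M_{A_2}}\right)^{1/2}$$
This low resonance frequency is called the Helmholtz resonance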
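
The low-frequency expression can be recovered from the two-tube condition $Y_1 + Y_2 = 0$ by replacing each admittance with its small-argument form. A sketch of that step, using the lumped elements $C_{A_1} = A_1\ell_1/\rho c^2$ and $M_{A_2} = \rho\ell_2/A_2$ from the simple tube results:

```latex
\begin{aligned}
Y_1 + Y_2 &\approx j\Omega C_{A_1} + \frac{1}{j\Omega M_{A_2}} = 0
\;\Longrightarrow\;
\Omega^2 = \frac{1}{C_{A_1} M_{A_2}} = \frac{c^2 A_2}{A_1 \ell_1 \ell_2}, \\
f &= \frac{\Omega}{2\pi} = \frac{c}{2\pi}\left(\frac{A_2}{A_1 \ell_1 \ell_2}\right)^{1/2}.
\end{aligned}
```

The density $\rho$ cancels in the product $C_{A_1}M_{A_2}$, which is why only the geometry and $c$ appear in the result.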
Vowel Production Example
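[Figure: two two-tube configurations. First: areas 7 cm^2 and 1 cm^2, lengths 9 cm and 8 cm. Second: areas 8 cm^2 and 1 cm^2, lengths 9 cm and 6 cm. The resonances of the individual sections are combined (+) to give the estimates below.]

First configuration (Hz)           Second configuration (Hz)
Formant   Actual   Estimated       Formant   Actual   Estimated
F1          789       972          F1          256       268
F2         1276      1093          F2         1905      1944
F3         2808      2917          F3         2917      2917
.            .         .           .            .         .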
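
The estimated values can be reproduced from the decoupled formulas. This is a sketch under several assumptions that are my reading of the slide rather than something it states: $c = 35{,}000$ cm/sec (chosen because it matches the estimates), each tube of the 9 cm / 8 cm configuration treated as a quarter-wavelength resonator, and the 9 cm / 6 cm configuration treated as an 8 cm^2 back cavity feeding a 1 cm^2 front tube (Helmholtz resonance plus half-wavelength resonances).

```python
import math

C = 35000.0  # cm/s; assumed, because it reproduces the estimates in the table

def quarter_wave(length_cm, n=1, c=C):
    """f_n = (2n - 1) c / (4 l)."""
    return (2 * n - 1) * c / (4.0 * length_cm)

def half_wave(length_cm, n=1, c=C):
    """f_n = n c / (2 l)."""
    return n * c / (2.0 * length_cm)

def helmholtz(a1, l1, a2, l2, c=C):
    """f = (c / 2 pi) sqrt(A2 / (A1 l1 l2)): cavity (A1, l1) as compliance, tube (A2, l2) as mass."""
    return c / (2.0 * math.pi) * math.sqrt(a2 / (a1 * l1 * l2))

if __name__ == "__main__":
    first = sorted([quarter_wave(9.0, 1), quarter_wave(8.0, 1), quarter_wave(9.0, 2)])
    second = sorted([helmholtz(8.0, 9.0, 1.0, 6.0), half_wave(9.0, 1), half_wave(6.0, 1)])
    print("first configuration :", [f"{f:.1f}" for f in first])
    print("second configuration:", [f"{f:.1f}" for f in second])
```

The printed values agree with the estimated column to within rounding, which supports, but does not prove, the assumed pairing of areas and lengths.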
Example of Vowel Spectrograms
[Figure: displays for the words /bit/ and /bat/, each with panels for wide band spectrogram (0 to 8 kHz), zero crossing rate (0 to 16 kHz), total energy (dB), energy from 125 Hz to 750 Hz (dB), and waveform, over a time axis of 0.0 to 0.7 seconds]
Estimating Anti-Resonance Frequencies (Zeros)
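Zeros occur at frequencies where there is no measurable output
[Figure: nasal model with source $U_G$, output $U_N$, and branches with areas $A_p$, $A_o$, $A_n$, lengths $\ell_p$, $\ell_o$, $\ell_n$, and admittances $Y_p$, $Y_o$, $Y_n$; fricative or stop model with pressure source $P_s$, output $U_L$, areas $A_b$, $A_c$, $A_f$, and lengths $\ell_b$, $\ell_c$, $\ell_f$; annotated $Y_1 = 0$ and $Y_3 + Y_4 = 0$]
For nasal consonants, zeros in $U_N$ occur where $Y_O = \infty$
For fricatives or stop consonants, zeros in $U_L$ occur where the impedance behind source is infinite (i.e., a hard wall at source)
Zeros occur when measurements are made in vocal tract interior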
Consonant Production
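[Figure: three-section model with areas $A_b$, $A_c$, $A_f$, lengths $\ell_b$, $\ell_c$, $\ell_f$, and pressure source $P_s$; the POLES and ZEROS are obtained by combining (+) the contributions of the sections]

        A_b    A_c    A_f    l_b    l_c    l_f
[g]      5     0.2     4      9      3      5
[s]      5     0.5     4     11      3     2.5

         [g]                 [s]
   poles    zeros      poles    zeros
    215       0         306       0
   1750     1944       1590     1590
   1944     2916       3180     2916
   3888     3888       3500     3180
     .        .          .        .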
Example of Consonant Spectrograms
[Figure: displays for the words /kip/ (time axis 0.0 to 0.7 seconds) and /si/ (time axis 0.0 to 0.8 seconds), each with panels for wide band spectrogram (0 to 8 kHz), zero crossing rate (0 to 16 kHz), total energy (dB), energy from 125 Hz to 750 Hz (dB), and waveform]
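Perturbation Theory
Consider a uniform tube, closed at one end and open at the other
[Figure: tube of area $A$ and length $\ell$ along $x$, with a short section of length $\Delta\ell$ near the open end]
$$Y \approx -j\,\frac{A}{\Omega\rho\,\Delta\ell} \quad \text{for small } \Delta\ell$$
Reducing the area of a small piece of the tube near the opening (where U is max) has the same effect as keeping the area fixed and lengthening the tube
Since lengthening the tube lowers the resonant frequencies, narrowing the tube near points where U(x) is maximum in the standing wave pattern for a given formant decreases the value of that formant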
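Perturbation Theory (cont'd)
[Figure: tube of area $A$ and length $\ell$ along $x$, with a short section of length $\Delta\ell$ near the closed end]
$$Y \approx j\Omega\,\frac{A\,\Delta\ell}{\rho c^2} \quad \text{for small } \Delta\ell$$
Reducing the area of a small piece of the tube near the closure (where p is max) has the same effect as keeping the area fixed and shortening the tube
Since shortening the tube will increase the values of the formants, narrowing the tube near points where p(x) is maximum in the standing wave pattern for a given formant will increase the value of that formant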
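
The two rules (narrowing near a volume velocity maximum lowers a formant, narrowing near a pressure maximum raises it) can be turned into a sign map along the tube. The sketch below is an illustration under an assumption the slides do not state explicitly: that between the two extremes the sign follows whichever of $p^2(x)$ and $U^2(x)$ dominates, using the quarter-wavelength standing waves $p \propto \cos(k_n x)$ and $U \propto \sin(k_n x)$ with $k_n\ell = (2n-1)\pi/2$. The function name is hypothetical.

```python
import math

def formant_shift_sign(x_over_l, n):
    """Sign of the change in formant n when the area is slightly reduced at position x.

    x_over_l: position as a fraction of tube length (0 = closed end, 1 = open end).
    Returns +1 (formant rises), -1 (formant falls), or 0 at a crossover point.
    """
    kx = (2 * n - 1) * math.pi / 2.0 * x_over_l
    balance = math.cos(kx) ** 2 - math.sin(kx) ** 2  # > 0 where pressure dominates
    if abs(balance) < 1e-12:
        return 0
    return 1 if balance > 0 else -1

if __name__ == "__main__":
    positions = [i / 10.0 for i in range(11)]
    for n in (1, 2, 3):
        signs = "".join("+" if formant_shift_sign(x, n) > 0 else
                        "-" if formant_shift_sign(x, n) < 0 else "0" for x in positions)
        print(f"F{n}: {signs}   (closed end on the left, open end on the right)")
```

At the closed end every formant rises and at the open end every formant falls, matching the two limiting cases described above.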
Summary of Perturbation Theory Results
[Figure: standing wave patterns (SWP) of $|U(x)|$ for F1, F2, and F3 along $x$ from glottis to lips; below them, for each of F1, F2, and F3, the regions along the tract are marked + or - for the change in that formant (as a consequence of decreasing A)]
Illustration of Perturbation Theory
[Figures for the utterance "The ship was torn apart on the sharp reef", shown in two parts: "The ship was torn apart on the sharp (reef)" and "(The ship was torn apart on the sh)arp reef"]

Multi-Tube Approximation of the Vocal Tract


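We can represent the vocal tract as a concatenation of N lossless tubes with constant area $\{A_k\}$ and equal length $\Delta x = \ell/N$
The wave propagation time through each tube is $\tau = \dfrac{\Delta x}{c} = \dfrac{\ell}{Nc}$
[Figure: vocal tract area function approximated by tube sections, each of length $\Delta x$]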
Wave Equations for Individual Tube
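The wave equations for the $k$th tube have the form
$$p_k(x,t) = \frac{\rho c}{A_k}\left[U_k^+\!\left(t - \frac{x}{c}\right) + U_k^-\!\left(t + \frac{x}{c}\right)\right]$$
$$U_k(x,t) = U_k^+\!\left(t - \frac{x}{c}\right) - U_k^-\!\left(t + \frac{x}{c}\right)$$
where $x$ is measured from the left-hand side ($0 \le x \le \Delta x$)
[Figure: adjacent tubes of areas $A_k$ and $A_{k+1}$, each of length $\Delta x$, with forward waves $U_k^+(t)$, $U_k^+(t-\tau)$, $U_{k+1}^+(t)$, $U_{k+1}^+(t-\tau)$ and backward waves $U_k^-(t)$, $U_k^-(t+\tau)$, $U_{k+1}^-(t)$, $U_{k+1}^-(t+\tau)$]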
Update Expression at Tube Boundaries
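We can solve update expressions using continuity constraints at tube boundaries e.g., $p_k(\Delta x, t) = p_{k+1}(0, t)$, and $U_k(\Delta x, t) = U_{k+1}(0, t)$
[Figure: signal flow graph for the junction between the $k$th tube and the $(k+1)$st tube, with DELAY $\tau$ elements in each tube and junction gains $1 + r_k$, $1 - r_k$, $r_k$, and $-r_k$ connecting $U_k^+(t)$, $U_k^+(t-\tau)$, $U_{k+1}^+(t)$, $U_{k+1}^+(t-\tau)$, $U_k^-(t)$, $U_k^-(t+\tau)$, $U_{k+1}^-(t)$, and $U_{k+1}^-(t+\tau)$]
$$U_{k+1}^+(t) = (1 + r_k)\,U_k^+(t - \tau) + r_k\,U_{k+1}^-(t)$$
$$U_k^-(t + \tau) = -r_k\,U_k^+(t - \tau) + (1 - r_k)\,U_{k+1}^-(t)$$
$$r_k = \frac{A_{k+1} - A_k}{A_{k+1} + A_k} \qquad \text{note } |r_k| \le 1$$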
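
The junction update can be checked numerically against the continuity constraints it was derived from. A minimal sketch, with hypothetical function names and example areas; the factor $\rho c$ is common to both sides of the pressure constraint and is omitted.

```python
def reflection_coefficient(a_k, a_k1):
    """r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k)."""
    return (a_k1 - a_k) / (a_k1 + a_k)

def junction_update(u_plus_k_delayed, u_minus_k1, a_k, a_k1):
    """Apply the two update expressions at the boundary between tube k and tube k+1.

    u_plus_k_delayed: U_k^+(t - tau), forward wave arriving at the junction.
    u_minus_k1:       U_{k+1}^-(t), backward wave arriving at the junction.
    Returns (U_{k+1}^+(t), U_k^-(t + tau)).
    """
    r = reflection_coefficient(a_k, a_k1)
    u_plus_k1 = (1.0 + r) * u_plus_k_delayed + r * u_minus_k1
    u_minus_k_advanced = -r * u_plus_k_delayed + (1.0 - r) * u_minus_k1
    return u_plus_k1, u_minus_k_advanced

if __name__ == "__main__":
    a_k, a_k1 = 2.0, 6.0      # hypothetical areas in cm^2
    fwd_in, bwd_in = 1.0, 0.4  # hypothetical incoming wave amplitudes
    fwd_out, bwd_out = junction_update(fwd_in, bwd_in, a_k, a_k1)
    # Volume velocity continuity: U_k(dx, t) = U_{k+1}(0, t)
    print(fwd_in - bwd_out, fwd_out - bwd_in)
    # Pressure continuity (up to the common factor rho * c)
    print((fwd_in + bwd_out) / a_k, (fwd_out + bwd_in) / a_k1)
```

Each printed pair should agree, confirming that the update expressions satisfy both continuity constraints for these inputs.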
Digital Model of Multi-Tube Vocal Tract
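Updates at tube boundaries occur synchronously every $2\tau$
If excitation is band-limited, inputs can be sampled every $T = 2\tau$
Each tube section has a delay of $z^{-1/2}$
[Figure: signal flow graph of one junction in the z domain, with delays $z^{-1/2}$ in the forward and backward paths, gains $1 + r_k$, $1 - r_k$, $r_k$, and $-r_k$, and signals $U_k^+(z)$, $U_{k+1}^+(z)$, $U_k^-(z)$, $U_{k+1}^-(z)$]
The choice of N depends on the sampling period $T$ (that is, on the sampling rate $1/T$)
$$T = 2\tau = \frac{2\ell}{Nc} \;\Longrightarrow\; N = \frac{2\ell}{cT}$$
Series and shunt losses can also be introduced at tube junctions
Bandwidths are proportional to energy loss to storage ratio
Stored energy is proportional to tube length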
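
The relation $N = 2\ell/(cT)$ ties the number of tube sections to the sampling rate. A minimal sketch of that calculation; the function name is hypothetical, and the defaults reuse $c = 34{,}000$ cm/sec and $\ell = 17$ cm from the uniform tube example.

```python
def tube_sections(length_cm, sample_rate_hz, c_cm_s=34000.0):
    """N = 2 l / (c T) with sampling period T = 1 / sample_rate_hz."""
    return 2.0 * length_cm * sample_rate_hz / c_cm_s

if __name__ == "__main__":
    for fs in (8000.0, 10000.0, 16000.0):
        n = tube_sections(17.0, fs)
        print(f"fs = {fs:.0f} Hz -> N = {n:.1f} sections of {17.0 / n:.2f} cm")
```

In practice N has to be an integer, so either the sampling rate or the assumed tract length is adjusted slightly; that rounding step is left out of this sketch.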
Assignment 1
References
Zue, 6.345 Course Notes.
Stevens, Acoustic Phonetics, MIT Press, 1998.
Rabiner & Schafer, Digital Processing of Speech Signals, Prentice-Hall, 1978.
