A Study On The Speech Acoustic-To-Articulatory Mapping Using Morphological Constraints

A dissertation
submitted to the Graduate School of Engineering
of Nagoya University
in partial fulfillment of the requirements
for the degree of
Doctor (Engineering)
By
Hani Camille Yehia
Abstract
The representation of speech based on articulatory parameters provides a fertile
paradigm for better modeling of the speech process. This modeling is important,
for example, for the development of applications such as speech synthesis, coding,
and even recognition, whose performance is directly related to the method used to
represent speech. However, articulatory representation of speech is a goal that, to be
achieved, still requires the solution of several problems. A key issue in this context is
the inversion of the articulatory-to-acoustic mapping in speech. This study focuses
on this point.
The articulatory-to-acoustic mapping in speech is defined here as the mapping
that, to every possible articulatory configuration of the vocal-tract, associates a
unique acoustic image. This image is defined as the set of acoustic properties inherent
in the vocal apparatus. The spaces formed by all possible articulatory configurations
and by their acoustic images are called articulatory space and acoustic space,
respectively. The articulatory space maps onto the acoustic space, i.e. every point in the
articulatory space has a unique image in the acoustic space, and every point in the
acoustic space has an inverse image in the articulatory space. However, since the
inverse image is not unique, the map is not a bijection (i.e. it is not a one-to-one
mapping).
The objective of this study is to estimate a restricted case of the
acoustic-to-articulatory mapping, using constraints imposed by the human morphology and by
the dynamics of the vocal-tract to determine the point in the articulatory space most
likely to map onto a given point in the acoustic space. In the restricted
case under analysis, only oral vowels are considered. The vocal-tract is represented
articulatorily by the log-area function (i.e. the logarithm of the cross-sectional area
along the vocal-tract) and acoustically by the set formed by the first three formant
frequencies (i.e. resonant frequencies of the vocal-tract). The area function was chosen
to represent the vocal-tract articulatorily because it is the articulatory characteristic
most directly related to the acoustic characteristics of the vocal-tract. In turn,
the set formed by the first three formant frequencies was chosen to represent the
vocal-tract acoustically because it is the acoustic characteristic of the vocal-tract
least influenced by factors other than the area function.
A major difficulty in estimating the area function appropriately from formant
frequencies is the one-to-many characteristic of the acoustic-to-articulatory mapping:
there are many area functions that map onto the same set of formant frequencies.
The main contribution of this study is the formulation of a framework to incorporate
constraints imposed by the human morphology into the mapping of the articulatory
space formed by all possible area functions onto the acoustic space formed by
all possible sets of formant frequencies. (The morphological information is extracted
from a corpus of midsagittal vocal-tract profiles obtained with cineradiography.) By
doing so, the articulatory space is limited to contain only area functions compatible
with the human morphology and, consequently, the ambiguity observed in the
acoustic-to-articulatory mapping is reduced. Subsequently, positional and continuity
constraints are added to complete the model. Tests carried out confirmed that, under
the constraints of the model proposed, plausible sequences of area functions can be
estimated from sequences of formant frequencies.
The procedure followed to formulate and evaluate the proposed method is
described step by step in the chapters of the study. Chapter 1 is the introduction.
Chapter 2 describes the corpus and the physical model used to represent the relation
between the area function and formant frequencies. Chapter 3 explains the parametric
models used to represent the area function. Chapter 4 shows the procedure used to
estimate the area function from formant frequencies under morphological, continuity
and positional constraints. Chapter 5 presents and interprets the results obtained
in the tests carried out with the model. Finally, Chapter 6 concludes the study by
summarizing the main points of the previous chapters, pointing out the strong and weak
points of the method, and indicating directions for future research efforts.
A brief description of each chapter is given in the following
paragraphs.
Chapter 1 gives an overview of the context in which the problem analyzed is
set, presents the problem and its importance, summarizes the history and the
alternative methods available in the literature, outlines the method
proposed to solve the problem, and describes the general organization of the
remaining chapters.
The first part of Chapter 2 describes the procedure used to obtain a corpus of 519
area functions from midsagittal tracings extracted from cineradiographic data and from
labiograms synchronously acquired at a rate of 50 frames per second. The process
to obtain the area functions from the raw data is as follows: first, each midsagittal
profile is plotted on a semi-polar grid which follows approximately the orientation
of the tube defined by the vocal-tract walls. The grid lines divide the vocal-tract
into sections. After that, each section is represented by its mean length and by its
mean midsagittal distance, i.e. the mean distance between anterior and posterior
walls of the tract. In the next step, the natural logarithm of the mean midsagittal
distance of each section is converted into the natural logarithm of the cross-sectional
area by means of a linear transformation. (This transformation is, however, only an
approximation because the cross-sectional area, although strongly associated with the
midsagittal distance, is not completely determined by it.) At this point, each
vocal-tract shape is represented by a set of log-areas (i.e. logarithms of cross-sectional
areas) and corresponding section lengths. The final step is to resample the log-areas
so that they become evenly spaced along the tract. By doing so, each vocal-tract
shape can be represented as a vector of log-areas plus the vocal-tract length. The
corpus is then arranged in a matrix whose rows contain the log-area vectors. This
matrix provides an efficient way to handle the corpus of area functions throughout
the study.
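The conversion and resampling steps described above can be sketched as follows. This is only an illustrative sketch: the function names, the default values of the proportional and exponential coefficients, and the example data are hypothetical placeholders rather than values taken from the corpus, and piecewise-linear interpolation is assumed for the resampling step.

```python
import math


def log_area_from_distance(distance_cm, proportional=1.5, exponent=1.4):
    """Linear transformation in the log domain:
    ln A = ln(proportional) + exponent * ln d.
    The default coefficient values are hypothetical placeholders."""
    return math.log(proportional) + exponent * math.log(distance_cm)


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation, clamped to the end values."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return ys[-1]


def resample_log_areas(section_lengths, log_areas, num_sections=32):
    """Resample log-areas at the midpoints of evenly spaced sections.
    Returns the log-area vector and the total vocal-tract length."""
    total_length = sum(section_lengths)
    midpoints, position = [], 0.0
    for length in section_lengths:
        midpoints.append(position + 0.5 * length)
        position += length
    step = total_length / num_sections
    vector = [interpolate((k + 0.5) * step, midpoints, log_areas)
              for k in range(num_sections)]
    return vector, total_length


# Hypothetical example: four sections of unequal length (cm).
lengths_cm = [0.5, 1.0, 0.5, 1.5]
distances_cm = [1.2, 0.8, 0.6, 1.5]
log_areas = [log_area_from_distance(d) for d in distances_cm]
log_area_vector, tract_length = resample_log_areas(lengths_cm, log_areas, num_sections=8)
```

Each vocal-tract shape is then a fixed-length vector plus one length value, which is what allows the corpus to be stacked as the rows of a matrix.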
The second part of Chapter 2 explains the physical model used to calculate
formant frequencies from the area function. At first, the vocal-tract is modeled as a
rigid, lossless tube. After that, viscous losses, glottal opening, lip radiation load and
yielding walls are incorporated into the sound propagation model, and their effects
on the formant frequencies are analyzed. Finally, a numerical procedure to compute
formant frequencies from the area function, approximated by a concatenation of
uniform tubes, is described. This procedure is the tool used in this study to express
formant frequencies as a function of the area function (or, more specifically, of the
log-area function).
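As an illustration of how formant frequencies can be computed from a concatenation of uniform tubes, the sketch below treats only the simplest case mentioned above: a rigid, lossless tube, closed at the glottis and ideally open at the lips. It is not the numerical procedure of Chapter 2, which also accounts for losses. The speed of sound, the frequency grid and all function names are assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 35000.0  # cm/s; an assumed round value


def chain_d_element(areas, section_length, frequency):
    """D element of the chain matrix of concatenated lossless uniform tubes.

    Each section has the chain matrix [[cos, j*z*sin], [j*sin/z, cos]], with z
    proportional to 1/area. The product keeps the form [[a, j*b], [j*c, d]]
    with real a, b, c, d, so real arithmetic is sufficient.
    """
    phase = 2.0 * math.pi * frequency * section_length / SPEED_OF_SOUND
    cos_p, sin_p = math.cos(phase), math.sin(phase)
    a, b, c, d = 1.0, 0.0, 0.0, 1.0
    for area in areas:  # ordered from glottis to lips
        z = 1.0 / area
        a, b, c, d = (a * cos_p - b * sin_p / z,
                      a * z * sin_p + b * cos_p,
                      c * cos_p + d * sin_p / z,
                      d * cos_p - c * z * sin_p)
    return d


def formants(areas, section_length, count=3,
             f_start=25.0, f_step=10.0, f_max=6000.0):
    """Locate the first resonances as zeros of D(f): scan a frequency grid for
    sign changes, then refine each bracket by bisection."""
    found = []
    f_low = f_start
    d_low = chain_d_element(areas, section_length, f_low)
    while len(found) < count and f_low + f_step <= f_max:
        f_high = f_low + f_step
        d_high = chain_d_element(areas, section_length, f_high)
        if d_low * d_high < 0.0:
            lo, hi, d_lo = f_low, f_high, d_low
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                d_mid = chain_d_element(areas, section_length, mid)
                if d_lo * d_mid <= 0.0:
                    hi = mid
                else:
                    lo, d_lo = mid, d_mid
            found.append(0.5 * (lo + hi))
        f_low, d_low = f_high, d_high
    return found


# Uniform tube, 17.5 cm long, 32 sections: the resonances are the odd
# multiples of c / (4 L), i.e. 500, 1500 and 2500 Hz for these values.
uniform_formants = formants([3.0] * 32, 17.5 / 32)
```

With the lip pressure set to zero, the volume-velocity transfer function from glottis to lips is 1/D, so the zeros of D are the formants of this idealized tube.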
The objective of Chapter 3 is to obtain an efficient parametric representation of
the log-area function. The number of uniform sections necessary to obtain a good
approximation of the vocal-tract log-area function (32 sections in this study) is
considerably larger than the number of dimensions of the articulatory space. More efficient
representations can be achieved if each log-area function is expressed as the sum of a
few basis functions weighted by a set of coefficients. Since the basis functions (basis
vectors, in the discretized case) are fixed, the coefficients alone are enough to represent
a given log-area function (log-area vector, in the discretized case). Two possibilities
of parametric representation are used in this work: Fourier analysis, and principal
component analysis (PCA).
In the first part of Chapter 3, a reasonably good approximation of the log-area
function is obtained when it is represented by the first eight terms of its Fourier
cosine series expansion. In this case, the set of basis functions is formed by cosine
functions. An important property of this representation is that there is a
one-to-one relationship between the first three odd coefficients of the Fourier cosine series
expansion of the log-area function and the first three formant frequencies determined
by it. This property is used in the method proposed here to determine the area
function most likely to have produced a given set of formant frequencies.
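A truncated cosine series expansion of a discretized log-area vector can be sketched as below. The sketch assumes that the log-areas are sampled at the midpoints of K evenly spaced sections and that the eight terms include the 0th order term; the function names and the test vector are hypothetical.

```python
import math


def cosine_coefficients(log_areas, num_terms=8):
    """First num_terms cosine series coefficients (a_0 ... a_{num_terms-1}) of a
    log-area vector sampled at the midpoints of K evenly spaced sections."""
    k_total = len(log_areas)
    coefficients = []
    for n in range(num_terms):
        scale = 1.0 if n == 0 else 2.0
        total = sum(y * math.cos(n * math.pi * (k + 0.5) / k_total)
                    for k, y in enumerate(log_areas))
        coefficients.append(scale * total / k_total)
    return coefficients


def reconstruct(coefficients, k_total=32):
    """Log-area vector obtained from a (possibly truncated) set of coefficients."""
    return [sum(a * math.cos(n * math.pi * (k + 0.5) / k_total)
                for n, a in enumerate(coefficients))
            for k in range(k_total)]


# Hypothetical log-area vector built from the 0th, 1st and 3rd cosine terms.
K = 32
original = [1.0
            + 0.5 * math.cos(1 * math.pi * (k + 0.5) / K)
            - 0.3 * math.cos(3 * math.pi * (k + 0.5) / K)
            for k in range(K)]
coeffs = cosine_coefficients(original, num_terms=8)
approximation = reconstruct(coeffs, k_total=K)
```

In this example the vector is recovered exactly because it contains only low-order terms; for measured log-area vectors the truncation leaves a residual.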
Cosine functions form a set of "general purpose" basis functions which, in principle,
do not contain any information about the morphology of the vocal-tract. In the
second part of Chapter 3, a more efficient representation of the log-area vector (from
now on, the description will be done for the discretized case), which makes use of
such information, is obtained by means of principal component analysis (PCA). In
this case, the set of basis vectors is formed by the eigenvectors associated with the five
largest eigenvalues of the covariance matrix of log-area vectors, which is estimated
using all vectors present in the corpus described in Chapter 2.
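The principal component representation can be illustrated with a small standard-library sketch. Power iteration with deflation is used here only to keep the example self-contained; it is an assumption of this sketch, not necessarily the eigendecomposition routine used in the study, and the three-dimensional data set is hypothetical.

```python
import math


def mean_and_covariance(vectors):
    """Sample mean and covariance matrix of a list of equal-length vectors."""
    count, size = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / count for i in range(size)]
    cov = [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vectors) / (count - 1)
            for j in range(size)] for i in range(size)]
    return mean, cov


def leading_eigenpairs(cov, num_components, iterations=500):
    """Largest eigenvalues and eigenvectors by power iteration with deflation."""
    size = len(cov)
    work = [row[:] for row in cov]
    pairs = []
    for _component in range(num_components):
        # Fixed start vector; assumed not orthogonal to the sought eigenvector.
        vec = [1.0 / math.sqrt(size)] * size
        for _step in range(iterations):
            prod = [sum(work[i][j] * vec[j] for j in range(size)) for i in range(size)]
            norm = math.sqrt(sum(p * p for p in prod))
            if norm == 0.0:
                break
            vec = [p / norm for p in prod]
        value = sum(vec[i] * work[i][j] * vec[j]
                    for i in range(size) for j in range(size))
        pairs.append((value, vec))
        for i in range(size):
            for j in range(size):
                work[i][j] -= value * vec[i] * vec[j]  # deflate the found component
    return pairs


def project(vector, mean, pairs):
    """Principal component coefficients of a vector."""
    return [sum(e[i] * (vector[i] - mean[i]) for i in range(len(mean)))
            for _, e in pairs]


def reconstruct(coefficients, mean, pairs):
    """Vector rebuilt from its principal component coefficients."""
    return [mean[i] + sum(c * e[i] for c, (_, e) in zip(coefficients, pairs))
            for i in range(len(mean))]


# Hypothetical data: five 3-D vectors spread along two orthogonal directions.
r = 1.0 / math.sqrt(2.0)
vectors = [[1.0 + t * r, 2.0 + t * r, 3.0 + s]
           for t, s in zip([-2.0, -1.0, 0.0, 1.0, 2.0], [0.1, -0.2, 0.2, -0.2, 0.1])]
mean, cov = mean_and_covariance(vectors)
pairs = leading_eigenpairs(cov, num_components=2)
coefficients = project(vectors[0], mean, pairs)
rebuilt = reconstruct(coefficients, mean, pairs)
```

In the study itself the same idea is applied to the log-area vectors of the corpus, keeping the eigenvectors of the five largest eigenvalues.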
After the PCA procedure is carried out, some transformations are still necessary
to find a representation for the log-area vectors which exhibits a one-to-one mapping
between a subset of its components and the corresponding sets of formant
frequencies (as in the case of the Fourier representation). In this procedure, first, independent
component analysis (ICA) is used to find affine transformations which reduce the
dependence among the principal components used to represent the articulatory space,
and among the formant frequencies (represented in log-scale) used to represent the
acoustic space. Next, singular value decomposition (SVD) is used to find rotations of
both articulatory and acoustic spaces so that each component of the acoustic space
is subject to the major influence of one and only one component of the articulatory
space. In turn, each component of the articulatory space has major influence on
at most one component of the acoustic space.
When either PCA or Fourier representation is used, it is observed that the mapping
of the articulatory space onto the acoustic space, albeit nonlinear, has a strongly linear
characteristic. The Fourier representation is useful to study analytical aspects of the
articulatory-to-acoustic mapping, whereas the PCA representation is optimal from a
statistical point of view.
The last part of Chapter 3 takes advantage of the fact that the vocal-tract moves
continuously in time, and so do articulatory parameters. Based on this fact, it is
possible to think about the parametrization of trajectories of articulatory components
and form the concept of an articulatory trajectory space. This procedure is successfully
carried out using either Fourier or PCA approach. In this temporal parametrization
the basis functions (of time) obtained with PCA have basically the shape of cosine
functions. In the tests performed, sequences of ten frames were well represented as linear
combinations of four basis functions.
At the end of Chapter 3, sequences of log-area vectors can be represented in
a very compact parametric form. As a quantitative example, the log-area vectors
contained in a sequence of 10 frames (200 ms of speech) can be represented by only
20 parameters.
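The parameter count quoted above is consistent with five articulatory components per frame, each trajectory being described by four temporal basis functions (4 x 5 = 20 instead of 10 x 5 = 50 values). The sketch below illustrates this kind of compression under a stated assumption: cosine functions of time are used as a stand-in for the PCA-derived temporal basis, which is reported above to have basically the shape of cosine functions. All names and the example trajectory are hypothetical.

```python
import math


def temporal_basis(num_frames, num_basis):
    """Cosine functions of time sampled at the frame instants."""
    return [[math.cos(n * math.pi * (t + 0.5) / num_frames)
             for t in range(num_frames)]
            for n in range(num_basis)]


def parametrize(trajectory, num_basis=4):
    """Compress a p x N trajectory (p frames, N components) into q x N coefficients."""
    num_frames, num_components = len(trajectory), len(trajectory[0])
    basis = temporal_basis(num_frames, num_basis)
    coefficients = []
    for n, row in enumerate(basis):
        scale = (1.0 if n == 0 else 2.0) / num_frames
        coefficients.append([scale * sum(row[t] * trajectory[t][j]
                                         for t in range(num_frames))
                             for j in range(num_components)])
    return coefficients


def rebuild(coefficients, num_frames):
    """Trajectory (p x N) rebuilt from its q x N temporal coefficients."""
    basis = temporal_basis(num_frames, len(coefficients))
    num_components = len(coefficients[0])
    return [[sum(coefficients[n][j] * basis[n][t] for n in range(len(coefficients)))
             for j in range(num_components)]
            for t in range(num_frames)]


# Hypothetical smooth trajectory: 10 frames, 5 articulatory components.
P_FRAMES, N_COMPONENTS = 10, 5
trajectory = [[j + 0.1 * (j + 1) * math.cos(math.pi * (t + 0.5) / P_FRAMES)
               for j in range(N_COMPONENTS)]
              for t in range(P_FRAMES)]
params = parametrize(trajectory, num_basis=4)
num_parameters = len(params) * len(params[0])
approximation = rebuild(params, P_FRAMES)
```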
Chapter 4 focuses on the main objective of this work: the inversion of the
articulatory-to-acoustic mapping or, equivalently, the acoustic-to-articulatory mapping. This is,
as already stated, a one-to-many mapping, since the same set of formant frequencies
can be produced by an infinite number of area functions. In other words, a given point
in the acoustic space is associated with an entire subspace in the articulatory space.
The number of dimensions of this subspace is equal to the number of dimensions of
the articulatory space minus the number of dimensions of the acoustic space.
Among all the points contained in the articulatory space mapping onto a given
point in the acoustic space, it is possible to look for the point that can be reached with
minimum effort by the vocal-tract. This procedure is carried out by representing the
vocal-tract articulatory effort as a quadratic cost function and, subsequently, finding
the point of minimum cost among all points in the articulatory space that map onto
a given point in the acoustic space. The same procedure can be used in the case
of articulatory trajectories. In this case, the problem is to find the trajectory of
minimum cost among all points in the articulatory trajectory space that map onto a
given point in the acoustic trajectory space.
The mathematical formulation of this problem results in a non-linear system which
is numerically solved using a Newton-Raphson procedure. A stable solution was al-
ways achieved, taking three iterations on average to converge. At the end of Chapter 4
the method used to estimate area functions from formant frequencies is complete.
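The idea of selecting, among all articulatory points that map onto a given acoustic point, the one of minimum quadratic cost can be illustrated on a toy problem. The sketch below is not the system solved in Chapter 4: the two-parameter articulatory vector, the single acoustic parameter, the mildly nonlinear map and the diagonal weight matrix are all hypothetical. Each iteration linearizes the map around the current point and solves the resulting constrained quadratic problem in closed form, in the spirit of a Newton-type procedure.

```python
def acoustic_map(x):
    """Hypothetical, mildly nonlinear map from two articulatory parameters
    to one acoustic parameter."""
    return x[0] + 0.5 * x[1] + 0.1 * x[0] ** 2


def jacobian(x):
    """Partial derivatives of acoustic_map with respect to x[0] and x[1]."""
    return [1.0 + 0.2 * x[0], 0.5]


def minimum_cost_inverse(target, weights, max_iterations=50, tolerance=1e-12):
    """Minimize sum(w_i * x_i**2) subject to acoustic_map(x) == target.

    weights holds the diagonal of a (hypothetical) quadratic cost matrix H.
    Returns the solution and the number of iterations used.
    """
    x = [0.0, 0.0]
    for iteration in range(1, max_iterations + 1):
        jac = jacobian(x)
        # Linearized constraint: jac . x_new = target - f(x) + jac . x
        rhs = target - acoustic_map(x) + sum(j * xi for j, xi in zip(jac, x))
        direction = [j / w for j, w in zip(jac, weights)]  # H^-1 J^t for diagonal H
        scale = rhs / sum(j * d for j, d in zip(jac, direction))
        new_x = [scale * d for d in direction]
        # Convergence measured by the largest component (absolute value) of the change.
        change = max(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if change < tolerance:
            break
    return x, iteration


solution, iterations_used = minimum_cost_inverse(target=1.0, weights=[1.0, 4.0])
```

At convergence the acoustic constraint is met and the gradient of the cost is parallel to the Jacobian, which is the condition for a constrained minimum.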
In Chapter 5 the method is tested and the results obtained are analyzed. The
258 area vectors of the corpus that correspond to oral vowels were used for this
purpose. Isolated articulatory positions were analyzed first. In this case, continuity
constraints are not imposed. Using the numerical procedure described in Chapter 2,
the first three formant frequencies were extracted from the transfer functions of each
area vector under analysis. These sets of formant frequencies were then used to
estimate the area vectors from which they had been extracted. A comparison of the
original area vectors with those estimated from the formant frequencies shows that the
inversion procedure works satisfactorily for most of the analyzed cases. Nevertheless,
despite the good agreement observed between practically all the acoustic transfer
functions derived from original and estimated vectors, large articulatory distortions
were observed in some cases. Part of these distortions was considerably reduced
when continuity constraints were imposed, i.e. when articulatory trajectories were
estimated instead of isolated articulatory positions. The distortions that remained
can be mainly attributed to the quadratic cost function adopted to represent the
vocal-tract articulatory effort. The quadratic cost allows an efficient mathematical
solution, but does not have a physiological meaning.
Still within the scope of Chapter 5, it is interesting to note that, since the obtained
transfer functions were derived only from the formant frequencies, and since a good
spectral match was obtained, the vocal-tract transfer function can be estimated from
the formant frequencies if morphological information is available.
It remains to be shown whether human beings make use of such redundancy
and, if so, in what way.
Chapter 6 concludes the study. In summary, it is possible to say that the use
of morphological, positional, and continuity constraints can be efficiently combined
in the analysis of the acoustic-to-articulatory mapping during speech. A method
to combine these constraints was described and tested. Correlation coefficients of
0.83 were found in the articulatory domain. In the acoustic domain, the correlation
coefficients found were above 0.999, confirming that the acoustic constraints were
respected.
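A correlation coefficient between original and estimated values can be computed as in the sketch below. The Pearson definition is assumed here, since the abstract does not state which definition is used, and the example values are hypothetical rather than results of this study.

```python
import math


def correlation(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    covariance = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    variance_u = sum((a - mean_u) ** 2 for a in u)
    variance_v = sum((b - mean_v) ** 2 for b in v)
    return covariance / math.sqrt(variance_u * variance_v)


# Hypothetical original and estimated log-areas of one vocal-tract shape.
original = [0.2, 0.9, 1.4, 1.1, 0.3, -0.4, 0.5, 1.2]
estimated = [0.3, 0.8, 1.3, 1.2, 0.1, -0.2, 0.6, 1.0]
r_articulatory = correlation(original, estimated)
```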
As a final note about the future, there is still a lot to be done to improve the
model. A physiologically meaningful function to measure the cost of vocal-tract
positions must be obtained to reduce the discrepancies observed. Also important is
the development of a better representation of the acoustic space, so that the model
can be applied to sounds other than oral vowels.
Acknowledgements
This work could not have been concluded without the support received from several people.
First of all, I would like to thank Prof. Fumitada Itakura, my PhD advisor, for
guiding me through the PhD course, for teaching me many important things, and for
giving me the freedom necessary to learn several more at Nagoya University.
I would like to thank also Prof. Noboru Ohnishi, for analyzing and discussing the
contents of this study. In the same way I thank Prof. Kazuya Takeda for interesting
comments and suggestions about this work, and for helping me with important
problems that I would not have been able to solve alone.
At Nagoya University I very much appreciated studying together with Shoji Kajita,
who was always by my side during the PhD program, helped me whenever I needed it,
and shared with me all the happy and difficult moments of these years. I would like to
thank also Hong Wang, now at PictureTel, who was always a good friend during the
years we were together in Nagoya; and Motohiko Yada, from whom I always received
all possible cooperation and incentive.
I cannot forget the guidance received previously from Osvaldo Catsumi Imamura
and Prof. Fernando Toshinori Sakane, my MSc advisors, and from Prof. Marcos
Botelho, my personal counselor, who taught me how to learn during the time I was
at the Instituto Tecnologico de Aeronautica in Brazil.
Also fundamental for this work was the support received from Shinji Maeda and
Rafael Laboissiere, from whom I received most of the data used as base for this
research, and with whom I had several fruitful discussions.
I am grateful for having had the opportunity to carry out experiments at NTT
Basic Research Laboratories in Atsugi, Japan. There I had the chance to work
with several good scientists (and people). In particular, I would like to thank Masaaki
Honda, Tokihiko Kaburagi and Takemi Mochida for their perfect collaboration during
the time I was at NTT.
Currently, I am with ATR-HIP (in Kyoto, Japan), where I have found an incredibly
fertile environment for scientific studies. In particular, I would like to thank Yoh'ichi
Tohkura for his constant support and encouragement; Eric Bateson and Mark Tiede
for the many discussions we had together and for their cooperation on the development
of our research goals; Tatsuya Nomura and Shinobu Masaki for translating
the abstract of the thesis into Japanese; and Erik McDermott for sweating over his
dissertation at the same time I was sweating over mine.
I cannot forget also my friends in Brazil, Japan, and all over the world, who were
always ready to collaborate when I asked, and even when I did not ask. Friendship
is undoubtedly the most precious thing existent in the world. I hope we have the
opportunity to help each other many other times in the future.
Finally, but most importantly, I am extremely grateful to all members of my
family for all the guidance, incentive and support that they have given me
throughout my life. In particular, I thank my wife, Ana Helena da Costa Fragoso Yehia,
for the years of my life that she made happier; my parents, Camille Hani Yehia and
Tamira Hamdan Yehia, for their unconditional encouragement since I was born;
and my sister Aline, my brother Salim, and my aunt Badiah, who were also by my
side during all these years.
I apologize for not citing explicitly all the names I wanted to, including the scien-
tists whose works formed the theoretical base for this study. I hope I can return at
least partially all the things I have received through these years.
List of Symbols
x Distance from the glottis along the tract.
A(x) Cross-sectional area function.
d(x) Midsagittal distance.
(x) Proportional coefficient used in model.
(x) Exponential coefficient used in model.
yi Log-area vector of frame i.
Aki Area of the k-th uniform section used to approximate the area function
of frame i.
Li Vocal-tract length of frame i.
yki Natural logarithm of Aki, k = 1, ..., K. yKi = Li.
P Number of vectors present in the corpus.
K Number of uniform sections used to approximate the area function.
P Amplitude of the sound pressure.
U Amplitude of the volume velocity.
ω Angular frequency of the sound pressure and volume velocity.
c Velocity of sound inside the vocal-tract.
Density of air inside the vocal-tract.
m m-th eigenvalue of the Webster's horn equation.
Fm m-th formant frequency.
a Effective lip radius.
ωm Angular frequency of the mth formant for a tract with yielding walls.
ω̂m Angular frequency of the mth formant for a tract with rigid walls.
ω0 Lowest angular resonance frequency when the tract is closed at both ends.
Lw Mass of the tract walls per unit length.
Cw Compliance of the tract walls per unit length.
xi
Rw Resistance of the tract walls per unit length.
a Ratio of wall resistance to mass.
b Squared angular frequency of the mechanical resonance.
c1 Correction for thermal conductivity and viscosity.
l Section length.
M Number of formants used as acoustic parameters.
N Number of articulatory components.
an n-th coefficient of the Fourier cosine series expansion of the area function.
a Vector of Fourier cosine coefficients.
Uay Matrix of cosines used to transform a into y.
Cy Covariance matrix of log-area vectors.
y Mean log-area vector.
S Diagonal matrix containing the eigenvalues of Cy in decreasing order.
U Unitary matrix whose columns contain the normalized eigenvectors of Cy .
Vector of principal component articulatory coefficients.
Mean of the Q vectors used to "fill" the articulatory space.
Uy Matrix used to transform into y.
Vector of independent component articulatory coefficients.
T Matrix used to transform into .
f Vector formed by the first three formant log-frequencies associated with a given
area function.
f Mean of the Q vectors used to "fill" the acoustic space.
g Vector of independent component acoustic coefficients.
Tfg Matrix used to transform f into g.
Tg Matrix that approximates g as a linear transformation of .
G Matrix containing an ensemble of vectors g.
B Matrix containing an ensemble of vectors .
Q Number of vectors contained in the ensembles G and B.
Sh "Pseudo diagonal" matrix containing the eigenvalues of TgTgt.
Uhg Unitary matrix containing the normalized eigenvectors of TgTgt.
U Unitary matrix containing the normalized eigenvectors of TgtTg .
Vector of articulatory variables used in the principal component representation.
h Vector of acoustic variables used in the principal component representation.
xii
Ty Matrix used to transform y into .
0y Mean of the Q log-area vectors used to "fill" the articulatory space.
Tfh Matrix used to transform f into h.
Rh Matrix of correlation coefficients between h and .
Rf Matrix of correlation coefficients between f and .
H Matrix containing an ensemble of Q acoustic vectors h.
Matrix containing an ensemble of Q articulatory vectors .
h Column vector containing the standard deviation of h.
Column vector containing the standard deviation of .
p Number of frames contained in a given articulatory trajectory.
q Number of coefficient vectors necessary to parametrize an articulatory
trajectory of p frames.
i Matrix containing a sequence of p articulatory vectors, starting at frame i.
C "Covariance" matrix of .
P0 Number of sequences of length p contained in the corpus.
Diagonal matrix containing the eigenvalues of C .
V Matrix whose columns are the normalized eigenvectors of C .
i Principal component representation of .
V Matrix containing the first q columns of V.
Yi Matrix containing a sequence of p log-area vectors, starting at frame i.
P Quadratic positional cost of a given articulatory vector .
Pa Quadratic positional cost of a given articulatory vector a.
Py Quadratic positional cost of a given log-area vector y.
H Weight matrix containing information about the morphology of the vocal-tract.
(Principal component representation case.)
Ha Weight matrix containing information about the morphology of the vocal-tract.
(Fourier representation case.)
Hy Weight matrix containing information about the morphology of the vocal-tract.
(Log-area representation case.)
C Covariance matrix of the articulatory vectors .
CP Average of the costs of all articulatory vectors in the corpus.
h Variation in the acoustic vector h.
Variation in the articulatory vector .
xiii
a Jacobian matrix defined by dh/d .
1 Vector containing the first M components of .
2 Vector containing the last N − M components of .
1 Jacobian matrix defined by ∂h/∂ 1.
2 Jacobian matrix defined by ∂h/∂ 2.
0 Articulatory vector corresponding to the neutral position of the vocal-tract.
0 d =d 2 .
IN−M Identity matrix of order N − M.
p 0 t H .
Matrix formed by the combination of a and p .
h Vector of variations containing acoustic and minimum effort targets.
M
‖·‖ Norm function defined by the largest component (absolute value) of a vector.
"Matrix of matrices" containing the locally linear relation between a sequence
of articulatory variations and their acoustic and cost counterparts.
G Sequence of articulatory variations i vertically arranged as a column vector.
V
H Sequence of vectors hi, vertically arranged as a column vector.
Np × Nq matrix whose nonzero elements vij are the entries of V .
Each row of V is repeated N times.
X Columns of a variation i rearranged in a column vector.
E

MV
Np × 1 column vector containing the approximation error between
X and H.
The squared error E t H E .
H Weight matrix used in the computation of the squared error .
0 Neutral trajectory determined by the vocal-tract sustaining the neutral
position along the analyzed interval.
Contents
Abstract
Acknowledgements
List of Symbols
1 Introduction
2 Area Function and Formant Frequencies
2.1 Corpus
2.1.1 Sampling vocal-tract profiles
2.1.2 From profiles to midsagittal distances
2.1.3 From midsagittal distances to area function
2.1.4 Log-area function
2.2 Computation of the formant frequencies
2.2.1 Lossless tube
2.2.2 Lossy tube
2.2.3 Numerical determination of formant frequencies
2.2.4 Comparison of lossless and lossy models
3 Parametric Models for the Area Function
3.1 Fourier Analysis
3.1.1 Truncation Effects
3.1.2 Formant frequencies as functions of Fourier coefficients
3.2 Statistical Analysis
3.2.1 Principal Component Analysis
3.2.2 Independent component analysis
3.2.3 Singular Value Decomposition
3.3 Temporal Analysis
4 The Inverse Problem
4.1 Isolated frames
4.1.1 Representing Morphological Constraints
4.1.2 Solving the inverse problem
4.2 Trajectories
5 Results and Discussion
5.1 Isolated Frames
5.2 Trajectories
5.3 Quantitative Analysis
6 Conclusion
A Numeric Information
B Results for Isolated Frames
Bibliography
List of Publications
List of Tables
2.1 Sentences contained in the corpus
5.1 Numerical Results: Areas
5.2 Numerical Results: Length
5.3 Numerical Results: Formants
List of Figures
2.1 Spectral properties of the speech signal determined by different properties
of the vocal apparatus. Only the vocal-tract shape has a major influence on
the formant frequencies.
2.2 Left: midsagittal profile tracing extracted from a cineradiographic frame and
labial tracing extracted from a video frame recorded synchronously. Center:
points sampled from the grid and from the labial region to represent the
profile. Right: profile approximated by sampled points.
2.3 Procedure used to determine the midsagittal distance and the length of a
given section. The section length is determined by the distance between
the intersections of the bisector of the angle formed by the lines defined
by the section walls with the grid lines that delimit the section. The
midsagittal distance is defined as the distance between the intersections of
the line passing through the midpoint of the segment that determines the
section length, and orthogonal to it, with the section walls. Section walls
and grid lines are represented by thick and thin solid lines, respectively. In
the case shown on the left, the grid lines are part of a Cartesian region of
the grid, whereas in the case shown on the right, the grid lines belong to
the polar region of the grid.
2.4 Top: midsagittal distances of the profile shown in Fig. 2.2 plotted as a
function of the distance from the glottis. Center: area function estimated
with the model. Bottom: log-area function sampled at uniformly
spaced points along the vocal-tract.
2.5 Comparison between the vocal-tract transfer function computed from the area
function estimated from midsagittal distances, and the spectral envelope of
the speech recorded synchronously. For the example shown on the left, the
spectral envelope obtained from the pre-emphasized speech signal (gray
line) has its formants (peak frequencies) matching fairly well the formants
of the transfer function derived from the area function (black line). However,
in the case shown on the right, due to the relatively low energy of
the speech signal, the colored noise generated by the experimental apparatus
produces a spurious peak around 1.3 kHz, considerably affecting the
estimated spectral envelope.
2.6 On each column the second panel from the top shows a comparison of
the transfer functions estimated from lossy (black solid line) and lossless
(dashed line) models of the vocal-tract area function. The gray line represents
the speech power spectrum envelope. The speech signal, area
function, midsagittal distance and vocal-tract profile are also given as references.
Note that there is little discrepancy between lossy and lossless
formant frequencies.
2.7 Formant frequency variation due to yielding walls and radiation load. Each
chart shows a histogram of the relative deviation of formants derived from
a lossless vocal-tract model with respect to formants derived from a lossy
model: $(F_{\mathrm{lossless}} - F_{\mathrm{lossy}})/F_{\mathrm{lossy}}$. The ordinates show the percentage of
points in the corpus whose relative deviation falls within the abscissa
interval under a given bin.
3.1 Area function for the French /i/, and the same area approximated by
truncated Fourier cosine series expansions (thick lines), compared with
the original area (thin lines).
3.2 Formant frequencies as a function of the number of Fourier cosine coefficients
used to represent the area function of the French /i/. (Note
that N includes the 0th-order term.)
3.3 First three formants as functions of the first 6 Fourier cosine coefficients.
In each graph, all other coefficients are kept equal to zero. A histogram of
the coefficients of the log-area functions of the corpus analyzed is plotted
above each chart.
3.4 First and second formant frequencies as functions of first and third Fourier
cosine coefficients $a_1$ and $a_3$, when all other coefficients are equal to zero.
3.5 Eigenvalues of the log-area covariance matrix.
3.6 Eigenvectors corresponding to the first 5 eigenvalues obtained from the
decomposition of the log-area covariance matrix. All eigenvectors are normalized
to have unit Euclidean norm. The first K = 32 components
correspond to the log-area along the tract; and the last component corresponds
to the tract length. The corresponding eigenvalue square root is
given as a reference to the "importance" of each eigenvector.
3.7 Area function approximations by Fourier cosine series expansion (dashed
line), and by statistically optimum eigenvalue expansion (solid line). The
thick solid line shows the original area. Above: expansion with 3 components.
Below: expansion with 5 components.
3.8 Vocal-tract length, lip area, and alveopalatal area trajectories along the
sentence (in French): Ma chemise est roussie. The dashed lines show the
original measured trajectories, while the solid lines show the trajectories
parametrized by the model proposed here. For each case, the mean and the
standard deviation values of the relative difference (in percentage) between
parametrized and original trajectories are also shown.
3.9 (a) Parametric subspace determined by the first two components of .
(b) Points corresponding to realistic area functions (articulatory space).
(c) The same points shown in a coordinate system with "less dependent"
components.
3.10 (a) Normalized histograms of the first 3 formant log-frequencies corresponding
to an articulatory space filled with approximately uniformly distributed
points. (b) Histograms of the variables obtained after independent
component analysis (ICA) of the formant log-frequencies. (c) and
(d) Scatter plots of the first 2 variables shown in (a) and (b), respectively.
3.11 Basis vectors (rows of Ty ) and mean vector (y ) used to represent log-area
vectors (y). Units: the first K = 32 components are log-areas along
the tract expressed in log(cm²). The last component is the tract length
expressed in normalized units (1 unit = 0.53cm for the basis vectors and
1 unit = 200.53cm for the mean vector).
3.12 Scatterings representing the joint distributions of the components of the
acoustic variable h and the components of the articulatory variable .
Note the high correlation between 1 and h1, and between 2 and h2. See
also the nonlinear relation between 1 and h3 .
3.13 First two acoustic components (h1 and h2) expressed as functions of the
first two articulatory components ( 1 and 2 ), when all other components
( 3 ; 4 ; and 5 ) are equal to zero. Note that h1 is almost independent
of 2 , and that there are one-to-one relationships between h1 and 1 , and
between h2 and 2 .
3.14 Articulatory and acoustic component trajectories along the sentence (in
French): Ma chemise est roussie. Note the similarity between the first
two articulatory trajectories and the first two acoustic trajectories. (The
dashed lines in the acoustic trajectories indicate the intervals where the
formants cannot be reliably extracted from the speech signal due to very
narrow constrictions in the area function.)
3.15 Eigenvalues of the "covariance" matrix of sequences of parametrized
log-area vectors, and corresponding first four eigenvectors.
3.16 (a) Sequence of area functions, taken from the corpus, corresponding to
the diphthong /ui/, uttered in the (French) sentence "Louis pense à ça."
(b) Sequence of areas reconstructed from the parametric principal component
representation of the original areas shown in (a). (c) Sequence
of areas reconstructed from the parametric Fourier representation of the
original areas shown in (a). (d), (e) and (f) show formant frequency trajectories
corresponding to the sequences of areas shown in (a), (b) and
(c) respectively. The dashed lines shown in (e) and (f) are the original
formant trajectories shown in (d). For each pair of formant trajectories,
the maximum relative difference (in percentage) is also shown.
4.1 Representation of the one-to-one relationship between the N − M dimensional
subspaces that form the N dimensional articulatory space. Compare
with figures 4.2 and 4.3, where the level curves are N − M = 1 dimensional
subspaces (contained in an N = 2 dimensional articulatory space)
which map onto an M = 1 dimensional acoustic space.
4.2 Top left: the first formant $F_1$ as a function of the Fourier cosine coefficients
$a_1$ and $a_2$, when all other coefficients are equal to zero. Top right:
paraboloidal surface representing the cost function $P_a = a^t H_a a$ used to
quantify the vocal-tract effort. Bottom: The solid thick lines show level
curves of the surface shown in the top left panel. (Compare with the general
case in Fig. 4.1.) The solid thin ellipses show level curves of the cost
function shown in the top right panel. The dashed circles represent the
particular case when $P_y$ is an unweighted squared Euclidean distance (i.e.
when $H_y$ is an identity matrix.)
4.3 Top left: the first acoustic component $h_1$ as a function of the principal
component coefficients 1 and 2 ; when all other coefficients are equal
to zero. Top right: paraboloidal surface representing the cost function
P = tH used to quantify the vocal-tract effort. Bottom: The solid
thick lines show level curves of the surface shown in the top left panel.
(Compare with the general case in Fig. 4.1.) The solid thin ellipses show
level curves of the cost function shown in the top right panel. The dashed
ellipses represent the particular case when $P_y$ is an unweighted squared
Euclidean distance (i.e. when $H_y$ is an identity matrix.)
5.1 Results obtained with the inversion technique for isolated frames. In each
column, the central panel shows original (thin line) and estimated area
(thick line). The estimated area is obtained from the formant frequencies
determined by the original area. Vocal-tract profile, midsagittal distances,
transfer functions and speech signal are also shown for reference purposes.
From left to right the columns correspond to the neutral, and French /a/,
/i/, and /u/ vowels.
5.2 Problems with the inversion procedure. The columns show the following
cases. Left: French /a/ with excessively open lips. Center-left: French
/u/ with excessively large front cavity. Center-right: French /i/ with
excessively short length. Right: French /e/ with excessively closed lips
compensating underestimated length. As in Fig. 5.1, in the central panel
of each column, the thin line is the original area and the thick line is the
area estimated from the formant frequencies determined by the original
area.
5.3 Top: Sequence of area functions, taken from the corpus, corresponding
to the diphthong /ui/, uttered in the French sentence "Louis pense à ça."
Bottom: Formant frequency trajectories corresponding to the sequences
of areas shown in the top panel.
5.4 Top: Sequence of areas estimated from the formant trajectories shown
in the bottom panel of Fig. 5.3, under continuity and minimum effort
constraints. Bottom: The solid lines show the formant frequency trajectories
corresponding to the sequences of areas shown in the top panel.
The dashed lines reproduce the original formant trajectories shown in the
bottom panel of Fig. 5.3.
5.5 Top: Sequence of area functions estimated from the formant trajectories
shown in Fig. 5.3 under the following constraint: The areas are represented
by the first six components of their Fourier cosine series expansion with the
even coefficients set to zero. Bottom: The solid lines show the formant
frequency trajectories corresponding to the sequences of areas shown in
the top panel. The dashed lines reproduce the original formant trajectories
shown in the bottom panel of Fig. 5.3.
5.6 Top: Sequence of area functions estimated from the formant trajectories
shown in Fig. 5.3 under the following constraint: The areas are represented
by the first nine components of their Fourier cosine series expansion
determined under morphological and continuity constraints. Bottom: The
solid lines show the formant frequency trajectories corresponding to the
sequences of areas shown in the top panel. The dashed lines reproduce
the original formant trajectories shown in the bottom panel of Fig. 5.3.
5.7 (a) Scattering of the cross-sectional areas obtained from the parametric
principal component representation of the original areas plotted against
their original counterparts. The scattering of the formant frequencies derived
from the areas is shown in (e). (b) Cross-sectional areas estimated
from formant vectors in the case of isolated frames plotted against original
areas. The formant frequencies derived from the areas are plotted in (f).
(c) Cross-sectional areas estimated from formant vector trajectories plotted
against original areas. The formant frequencies derived from the areas
are plotted in (g). (d) Cross-sectional areas estimated from isolated frames
plotted against areas estimated from formant vector trajectories. The formant
frequencies derived from the areas are plotted in (h). The correlation
coefficients are given in the top right corner. The 258 oral vowel frames
available in the corpus were used to generate the scatterings.
Chapter 1
Introduction
"A voice cannot carry the tongue
and the lips that gave it wings.
Alone must it seek the ether."
Gibran Kahlil Gibran (1883–1931)
The Prophet
The speech production process is the result of the combination of articulatory
movements acting on and interacting with the air flowing from the lungs to the mouth.
The part of the acoustic effects of such actions and interactions that is radiated
(mainly) through the lips and nostrils constitutes the speech signal (Flanagan, 1972).
An efficient representation of speech is fundamental for obtaining good performance
in applications such as speech synthesis, coding, and recognition (Rabiner and
Schafer, 1978). While parametrization of the speech waveform itself is often sufficient,
it does not explicitly give speech features like spectral envelope and pitch information
that are important, for example, in speech recognition systems (Rabiner and
Juang, 1993). Moreover, waveform parametrization allows only a limited degree of
compression of the speech signal (Jayant and Noll, 1984).
Higher levels of compression, as well as a clearer representation of meaningful
speech acoustic parameters, can be achieved by modeling the physical process that
generates the speech signal. Today, most of the methods following this line are based
on linear prediction theory (Markel and Gray, 1976). Such methods are very efficient
in parametrizing short intervals (frames) of speech. However, since speech acoustic
parameters can vary abruptly, smoothness constraints cannot, in general, be imposed.
If such constraints were possible, they could be invoked to improve the accuracy of
parameter estimation (especially under adverse conditions), to attain even higher
compression levels, and to simplify models for speech synthesis.
Although, in general, smoothness constraints cannot be imposed on acoustic parameters,
they can be used successfully to characterize vocal-tract articulatory movements
(Perkell, 1969). Therefore, assuming that articulatory synthesis (synthesis of
speech based on vocal-tract configuration parameters) is a goal that can, in principle,
be achieved (Maeda, 1982; Sondhi and Schroeter, 1987; Scully, 1990; Lin, 1990),
articulatory parameters can be very useful for an efficient representation of the
speech process (Flanagan, 1972).
If speech is to be represented by articulatory parameters, then, besides developing
methods to generate speech from such parameters (the direct problem), it is necessary
to be able to estimate the vocal-tract configuration from the speech signal (the inverse
problem). This includes determination of subglottal and glottal conditions (voice
source), vocal-tract shape and losses, and radiation load. This study focuses on the
estimation of the vocal-tract shape, which is the primary determinant of the formant
structure of the speech signal (Fant, 1980).
The extraction of geometrical characteristics of the vocal-tract from its acoustic
features has been discussed in several previous studies: Schroeder (1967) analytically
described the relationship between the singularities (poles and zeros) of the
vocal-tract admittance measured at the lips and the vocal-tract cross-sectional log-area
function (represented by its Fourier cosine series expansion). The analysis was
performed for variations within the limit of applicability of first order perturbation
theory. For larger variations, Mermelstein (1967) developed a numerical procedure
to estimate the area function (parametrized by the first 6 coefficients of its Fourier
cosine series expansion) from the admittance singularities. He showed that the formant
frequencies, which correspond to the admittance poles, are not sufficient to
uniquely determine the log-area function. The remaining necessary information can
be obtained from the admittance zeros but, unfortunately, these cannot be estimated
from the speech signal. Schroeder (1967) then developed an experimental apparatus
to measure the vocal-tract admittance at the lips and, using a frequency domain approach,
was able to determine good approximations for the area function. However,
the problem of estimating the area function from the speech signal still remained to
be solved.
With the advent of linear prediction theory applied to speech (Itakura and Saito,
1968; Atal and Hanauer, 1971), Wakita (1973, 1979) developed an inverse filtering
technique to estimate the vocal-tract area function from the speech waveform. However,
that technique makes use of information about voice source, loss distribution,
tract length, and lip radiation that cannot be assumed to be accurately known a
priori. In fact, Sondhi (1979) showed that the speech signal alone does not contain
enough information for a unique determination of the vocal-tract area function,
confirming the conclusions of Mermelstein (1967) and Schroeder (1967).
Thus, on the one hand, in order to achieve a practical system of articulatory speech
representation, it is necessary to obtain the vocal-tract shape from the speech signal.
On the other hand, the speech signal itself does not contain enough information to
uniquely determine such a shape. Therefore, it is necessary to constrain the universe of
possible tract configurations, so that the problem can be efficiently solved. Since the
speech signal is assumed to be produced by a human vocal-tract, the human physiology
can be invoked as a natural possibility for a constraint formulation. In other
words, vocal-tract data obtained from acoustic (Sondhi and Resnick, 1983; Yehia et
al., 1995a, 1995b; Yehia and Itakura, 1995a), X-ray (Bothorel et al., 1986), magnetic
resonance imaging (MRI) (Baer et al., 1991; Tiede et al., 1996; Tiede and Yehia, 1996;
Yehia and Tiede, 1997), electromagnetic midsagittal articulometer (EMMA) (Perkell
et al., 1992; Yehia et al., 1996), or any other kind of tract measurement can be used
(directly or indirectly) as prior information for the vocal-tract shape estimation from
the speech signal.
Following this line, various frameworks have been formulated to combine the
acoustic information contained in the speech signal with the constraints determined
by the human physiology. A computer sorting technique followed by a fine optimization
procedure was used by Atal et al. (1978) and, in a more elaborate model, by
Schroeter and Sondhi (1991). Model matching techniques were used by Flanagan et
al. (1980) and by Shirai and Kobayashi (1986). Shirai (1993) also proposed a neural
network approach for the estimation of articulatory motion. Another connectionist
approach, making use of a control theory framework (whose principle was first proposed
by Jordan, 1990) was presented by Bailly et al. (1991). As a final example,
McGowan (1994) considered the use of genetic algorithms, obtaining interesting results
for the dynamic case of the inverse problem. (A much more complete description
of the techniques developed to estimate vocal-tract shapes from the speech signal can
be found in Schroeter and Sondhi, 1994.)
The approaches cited above have two points in common: the first one is that,
during the optimization procedure, acoustic and shape parameters are represented
in distinct spaces. This fact, besides resulting in a large number of optimization parameters,
often leads to the problem of a cost function with local minima (Schroeter
and Sondhi, 1991). The second point is that an explicit articulatory model is always
used. The problem here is that the design of articulatory models is normally
oriented toward the direct problem of speech production (Coker and Fujimura, 1966;
Mermelstein, 1973; Coker, 1976; Maeda, 1990). Although such models can be successfully
used in the inverse problem, problems of redundancy and ambiguity may
occur (Gupta and Schroeter, 1993).
Within the present study, these two problems are avoided: the first by mapping
acoustic and shape parameters into a common space, and the second by using only the
statistical behavior of the vocal-tract, rather than an explicit articulatory model, to
formulate a cost function (Yehia and Itakura, 1996; Yehia, Takeda and Itakura, 1996).
In the case of the method proposed here, which can be included in the model
matching category, the following series of steps must be carried out:
• Acquisition of vocal-tract morphological, dynamic and acoustic information.
• Parametrization of the articulatory and acoustic spaces.
• Representation of the mapping from the articulatory space onto the acoustic
space.
• Formulation of a cost function which quantifies morphological and dynamic
constraints.
• Combination of acoustic, morphological and dynamic information to solve the
inverse problem.
These topics are addressed one by one in the following chapters. The main point
is the mapping from the articulatory onto the acoustic space. These spaces are appropriately
represented such that the resulting mapping is simple enough to support a
one-to-one approximately linear relationship between the components of a subspace of
the articulatory space and the corresponding components of the acoustic space. This
fact is then exploited to find a plausible solution for the restricted case of the inverse
problem considered here. The results obtained are then evaluated and interpreted.
Chapter 2
Area Function and
Formant Frequencies
"Everything flows."
Heraclitus (535–475 BC)
On Nature
The area function and formant frequencies play an important role in the study
of speech production: they form a bridge between articulatory configurations of the
vocal-tract and acoustic characteristics of speech. Formant frequencies are primarily
determined by the vocal-tract shape, with little influence from other articulatory
factors (Flanagan, 1972, pp. 58–69). This is in contrast with other spectral properties
of the speech signal, over which factors other than the vocal-tract shape can also have
considerable influence (see Fig. 2.1).
For the case of sound plane wave propagation, the cross-sectional area along the
vocal-tract (the area function) is the geometric property of the vocal-tract shape that
determines the formant frequencies. But the converse does not hold; that is, the
formant frequencies do not in turn uniquely determine the area function.
In this chapter, the relationship between area function and formant frequencies
will be analyzed. The objective is to determine the amount of information about the
area function contained in the formant frequencies, and to characterize the mapping
between the spaces formed by the area function and by the formant frequencies. These
pieces of information are important in obtaining a consistent base to solve the inverse
problem.
[Diagram: spectral properties of the speech signal (formant frequencies, formant bandwidths, spectral tilt, harmonic structure, energy) linked to properties of the vocal apparatus (vocal-tract shape, wall and viscous losses, radiation load, excitation pulse).]
Figure 2.1: Spectral properties of the speech signal determined by different properties
of the vocal apparatus. Only the vocal-tract shape has major influence on the formant
frequencies.
2.1 Corpus
The corpus used in this study consists of cineradiographic data described in Bothorel
et al. (1986). (More details about the capabilities and limitations inherent to
X-ray measurements of the vocal-tract can be found in Fant (1970), and in Chiba and
Kajiyama (1941).) The procedure used to estimate the area functions from the cineradiography
is described here. The analysis starts from digital tracings of midsagittal
profiles corresponding to ten French sentences (listed in Table 2.1) as spoken by a
female subject (PB), acquired at a rate of 50 frames per second. The corresponding
labiograms were acquired simultaneously. A sample is shown in Figure 2.2.
Table 2.1: Sentences contained in the corpus
Ma chemise est roussie.
Voilà des bougies.
Donne un petit coup.
Une réponse ambiguë.
Louis pense à ça.
Mets tes beaux habits.
Une pâte à choux.
Prête-lui seize écus.
Chevalier du gué.
Il fume son tabac.
2.1.1 Sampling vocal-tract profiles
In order to convert the original profiles into area functions, a sequence of transformations
is necessary. First, each midsagittal profile is plotted on a semipolar grid,
using the hard palate as reference (Heinz and Stevens, 1964; Maeda, 1990). The grid
lines are spaced 0.5 cm apart in the linear regions, and 11 degrees apart in the polar region.
Lip and larynx regions are specified in a special manner: the lips are modeled by
a uniform elliptical tube whose shape is determined from the labiogram, and whose
length (protrusion) is defined as the distance from the upper incisors to the point of
minimum separation between upper and lower lips. The laryngeal cavity is modeled
as a trapezoidal tube (with circular cross-section) defined by the two points where
the tract profile intersects the sixth line of the semipolar grid, and by the lowest left
and right points of the tract profile. This procedure, illustrated in the central panel
of Fig. 2.2, allows the representation of all profiles by the same number of points.
In the present case, there are 29 pairs of points, each pair containing a point on the
anterior wall and a point on the posterior wall of the tract. The approximation of the
profile by the segments joining these points is shown in the right panel of Fig. 2.2.
[Figure 2.2 panels (frame PB0146, both axes in cm): Digitized Profile, Sampled Points, Regenerated Profile.]
Figure 2.2: Left: midsagittal profile tracing extracted from cineradiographic frame and
labial tracing extracted from video frame recorded synchronously. Center: points sampled
from the grid and from the labial region to represent the profile. Right: profile
approximated by sampled points.
2.1.2 From profiles to midsagittal distances
The next step is to represent the set of points plotted on each midsagittal profile by
the corresponding midsagittal distances, plotted as a function of the position along
the vocal-tract. If the midsagittal distance is interpreted as the distance between the
points where an ideal longitudinal sound wavefront propagating in the tract "touches"
the anterior and posterior walls of the profile, then an appropriate geometric procedure
to represent each section of the vocal-tract by a midsagittal distance and a section
length is as follows: each of the 28 sections defined by the 29 pairs of points sampled
from a given profile is seen as part of an infinite conical horn (or a cylindrical horn,
if the walls are parallel). The direction of propagation of the wavefront in this "horn"
is determined by the line that bisects the angle formed by the lines containing the
anterior and posterior wall profiles of the section (see Fig. 2.3). The intersection
points of this line with the grid lines that define the section determine a segment
whose length will be taken as the section length. Finally, the intersection points of
the line orthogonal to this segment passing through its midpoint with the anterior and
posterior wall profiles determine a segment whose length is taken as the midsagittal
distance of the section. The top panel of Figure 2.4 shows the midsagittal distance
plotted as a function of the distance from the glottis. The vocal-tract length is taken
as the sum of the section lengths of all sections.
This procedure follows the same physical principle adopted in Maeda (1972), but
with a different geometrical construction. Alternative procedures can be found, for
example, in Fant (1960), Beautemps et al. (1995), and Maeda (1990).
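The geometric construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used to build the corpus: the function names (`section_geometry`, `_intersect`, `_cross`), the argument layout (the wall end points are assumed to lie on the two grid lines that delimit the section) and the parallel-wall tolerance `tol` are all hypothetical.

```python
import numpy as np


def _cross(a, b):
    """z-component of the 2-D cross product."""
    return a[0] * b[1] - a[1] * b[0]


def _intersect(p, d, q, e):
    """Intersection of the lines p + t*d and q + s*e (assumed non-parallel)."""
    t = _cross(q - p, e) / _cross(d, e)
    return p + t * d


def section_geometry(a1, a2, p1, p2, tol=1e-12):
    """Section length and midsagittal distance of one open tract section.

    a1, a2: anterior-wall points on the first and second grid line.
    p1, p2: posterior-wall points on the first and second grid line.
    (Hypothetical interface; closures with a1 == p1 are not handled.)
    """
    a1, a2, p1, p2 = (np.asarray(v, dtype=float) for v in (a1, a2, p1, p2))
    u_a = (a2 - a1) / np.linalg.norm(a2 - a1)   # anterior wall direction
    u_p = (p2 - p1) / np.linalg.norm(p2 - p1)   # posterior wall direction
    if abs(_cross(u_a, u_p)) < tol:
        # Parallel walls (cylindrical horn): axis runs midway between them.
        origin, axis = 0.5 * (a1 + p1), u_a
    else:
        # Conical horn: axis is the bisector through the apex of the walls.
        origin = _intersect(a1, u_a, p1, u_p)
        axis = (u_a + u_p) / np.linalg.norm(u_a + u_p)
    # Section length: segment of the axis cut out by the two grid lines.
    c1 = _intersect(origin, axis, a1, p1 - a1)
    c2 = _intersect(origin, axis, a2, p2 - a2)
    length = np.linalg.norm(c2 - c1)
    # Midsagittal distance: chord through the midpoint, orthogonal to the axis.
    mid = 0.5 * (c1 + c2)
    normal = np.array([-axis[1], axis[0]])
    q_a = _intersect(mid, normal, a1, u_a)
    q_p = _intersect(mid, normal, p1, u_p)
    return length, np.linalg.norm(q_a - q_p)
```

For a section with parallel walls 1 cm apart, bounded by grid lines 2 cm apart, `section_geometry([0, 1], [2, 1], [0, 0], [2, 0])` gives a section length of 2 cm and a midsagittal distance of 1 cm, as expected for a cylindrical section.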
[Figure 2.3 legend: grid lines, sagittal distances, section lengths.]
Figure 2.3: Procedure used to determine the midsagittal distance and the length of a
given section. The section length is determined by the distance between the intersections
of the bisector of the angle formed by the lines defined by the section walls with the grid
lines that delimit the section. The midsagittal distance is defined as the distance between
the intersections of the line passing through the midpoint of the segment that determines
the section length, and orthogonal to it, with the section walls. Section walls and grid
lines are represented by thick and thin solid lines, respectively. In the case shown on the
left, the grid lines are part of a Cartesian region of the grid, whereas in the case shown on
the right, the grid lines belong to the polar region of the grid.
[Figure 2.4 panels: Midsagittal Distance along the Vocal-Tract (cm); Cross-Sectional Area along the Vocal-Tract (cm²); Log-Area Function along the Vocal-Tract (ln(cm²)). Horizontal axis: distance from glottis (cm).]
Figure 2.4: Top: midsagittal distances of the profile shown in Fig. 2.2 plotted as a function
of the distance from the glottis. Center: area function estimated with the model.
Bottom: Log-area function sampled at uniformly spaced points along the vocal-tract.
2.1.3 From midsagittal distances to area function
The midsagittal distances are now transformed into the corresponding cross-sectional
areas (central panel of Fig. 2.4), using the "αβ model" originally proposed by
Heinz and Stevens (1964)
$$A(x) = \alpha(x)\, d(x)^{\beta(x)}, \tag{2.1}$$
where $x$ is the distance from the glottis, $A(x)$ and $d(x)$ are respectively the cross-sectional
area and the midsagittal distance at $x$, and $\alpha(x)$ and $\beta(x)$ are coefficients
determined in an ad hoc manner. The values used here are taken from Maeda
(1990).
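As a minimal sketch of how Eq. 2.1 is applied section by section, the Python fragment below raises each midsagittal distance to a position-dependent exponent and scales it by a position-dependent factor. The function name `area_from_midsagittal` is hypothetical, and the coefficient values in the usage line are placeholders chosen only for illustration; they are not the values of Maeda (1990), which are not reproduced here.

```python
import numpy as np


def area_from_midsagittal(d_cm, alpha, beta):
    """Cross-sectional areas from midsagittal distances, A = alpha * d**beta.

    d_cm:  midsagittal distances (cm), one per section.
    alpha, beta: scalars or per-section arrays of model coefficients
                 (placeholders here; the real values depend on the
                 position along the tract).
    """
    d = np.asarray(d_cm, dtype=float)
    return np.asarray(alpha, dtype=float) * d ** np.asarray(beta, dtype=float)


# Illustrative call with made-up coefficients for three sections.
areas = area_from_midsagittal([0.5, 1.0, 2.0], alpha=[1.5, 1.5, 2.0], beta=[1.4, 1.4, 1.2])
```

Passing arrays for `alpha` and `beta` lets the coefficients vary along the tract, which is the point of writing them as functions of $x$.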
The problem with such a transformation is that the two-dimensional information
contained in the midsagittal profile is not enough to obtain an accurate estimation of
the area function. Therefore, except for the lip region, where the labiogram provides
the necessary information, there may exist non-negligible discrepancies between the
real and the estimated area functions. Even when a more elaborate model, such as
those proposed by Perrier et al. (1992) and by Beautemps et al. (1995), is used, it
is impossible to eliminate the discrepancies. This is the main reason why, for a given
frame, the formants extracted from the speech signal do not match exactly those
numerically derived from the estimated area function (see Fig. 2.5). Other reasons
for this mismatch are errors in formant measurement from speech, and inaccuracies
in the physical model used to calculate the formants from the area function. In order
to avoid such discrepancies, the corpus of formant frequencies used in this and in
the next chapters consists of the formants numerically derived from the corpus of
estimated area functions (see Section 2.2.3), and not of those extracted from the
corresponding speech signal. By doing so, it is assured that the inaccuracies that will
appear in the results shown in the next chapter are inherent in the method proposed
to solve the inverse problem, and do not depend on the factors above. Admittedly, in
order to work with real speech, it is necessary to analyze such factors, but this task
will not be carried out here.
[Figure 2.5 panels, for frames PB1549 (left) and PB1559 (right), from top to bottom: Speech Signal (time in ms); Power Spectrum (dB, frequency in kHz); Area Function (cm², distance from glottis in cm); Midsagittal Distances (cm); Vocal-Tract Profile (cm).]
Figure 2.5: Comparison between the vocal-tract transfer function computed from the area
function estimated from midsagittal distances, and the spectral envelope of the speech
recorded synchronously. For the example shown on the left, the spectral envelope obtained
from the pre-emphasized speech signal (gray line) has its formants (peak frequencies)
matching fairly well the formants of the transfer function derived from the area
function (black line). However, in the case shown on the right, due to the relatively low
energy of the speech signal, the colored noise generated by the experimental apparatus
produces a spurious peak around 1.3 kHz, considerably affecting the estimated spectral
envelope.
2.1.4 Log-area function
Instead of working directly with the cross-sectional area along the tract, each area is
transformed into a log-area vector, as shown in the bottom panel of Fig. 2.4. This
transformation not only assures that the area function will always be positive, but
is also more meaningful from an acoustic point of view, since the area ratios, rather
than their values, determine the formant frequencies (see, for example, Mermelstein
(1967)). The transformation is simply the natural logarithm of the area, with areas
smaller than a given threshold (5 mm² in this study) being clipped to the
threshold value to avoid numerical problems with closures. Such clipping, however,
does not lead to significant inaccuracy from either an articulatory or acoustic point
of view. In the case of vowels, it was observed that the minimum area is not less than
25 mm², and that, even in the case of fricative sounds, the minimum area is not less
than 15 mm². Basically, areas are smaller than the 5 mm² threshold only in the
case of closures.
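The statement that area ratios, rather than absolute areas, determine the formant frequencies can be checked numerically. The sketch below assumes the idealized lossless model with a closed glottis and zero pressure at the lips (Section 2.2.1), approximates the tract by equal-length uniform sections, and locates resonances as zeros of the lip pressure. The sound speed, the function names (`lip_pressure`, `formants`) and the scanning step are assumptions made for illustration; this is not the procedure used to derive the formant corpus.

```python
import numpy as np

C_SOUND = 35000.0  # speed of sound in cm/s (assumed value)


def lip_pressure(freq_hz, areas_cm2, length_cm):
    """Pressure at the lips for unit pressure and zero flow at the glottis."""
    k = 2.0 * np.pi * freq_hz / C_SOUND
    seg = length_cm / len(areas_cm2)
    cs, sn = np.cos(k * seg), np.sin(k * seg)
    p, v = 1.0, 0.0          # v is the volume velocity scaled by -j*rho*c
    for a in areas_cm2:
        # Lossless uniform section; the right-hand side uses the old p and v.
        p, v = cs * p - (sn / a) * v, a * sn * p + cs * v
    return p


def formants(areas_cm2, length_cm, n_formants=3, f_max=5000.0, step=10.0):
    """First resonances (Hz): sign changes of lip_pressure refined by bisection."""
    grid = np.arange(step, f_max + step, step)
    vals = [lip_pressure(f, areas_cm2, length_cm) for f in grid]
    found = []
    for f_lo, f_hi, v_lo, v_hi in zip(grid[:-1], grid[1:], vals[:-1], vals[1:]):
        if v_lo * v_hi < 0.0:
            for _ in range(60):
                f_mid = 0.5 * (f_lo + f_hi)
                v_mid = lip_pressure(f_mid, areas_cm2, length_cm)
                if v_lo * v_mid <= 0.0:
                    f_hi = f_mid
                else:
                    f_lo, v_lo = f_mid, v_mid
            found.append(0.5 * (f_lo + f_hi))
            if len(found) == n_formants:
                break
    return np.array(found)


# An arbitrary smooth area function (cm^2) and the same shape three times wider.
k_idx = np.arange(32) + 0.5
areas = np.exp(0.8 * np.cos(np.pi * k_idx / 32))
f_original = formants(areas, 17.0)
f_scaled = formants(3.0 * areas, 17.0)
```

Under these assumptions `f_original` and `f_scaled` coincide: multiplying every area by the same constant adds a constant to the log-area function and leaves all area ratios, and hence the formants, unchanged. This is only the simplest instance of the non-uniqueness discussed in this chapter.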
Due to the procedure used to plot the midsagittal profiles on the semipolar grid,
the 29 points shown in the top and central panels of Fig. 2.4 are not evenly spaced.
Even spacing allows a simpler representation of the cross-sectional area along the
tract since it can then be described by a vector of log-areas plus the vocal-tract
length. For this reason, using linear interpolation, the log-area along the vocal-tract
is resampled so that it can be represented by K = 32 uniform sections, as illustrated
by the stair-step graph shown in the bottom panel of Fig. 2.4.
Each log-area function present in the corpus, when approximated by a concatenation
of uniform tubes of equal length, can now be represented by a vector containing
the natural logarithm of the section areas and the tract length. In this study, the
following notation will be used:
\[
\mathbf{y}_i = [\, y_{1i} \;\cdots\; y_{Ki} \;\; y_{K+1,i} \,]^t, \qquad i = 1, \ldots, P, \tag{2.2}
\]
\[
y_{ki} = \ln A_i\!\left(\left(k - \tfrac{1}{2}\right)\frac{L_i}{K}\right), \qquad k = 1, \ldots, K,
\]
\[
y_{K+1,i} = L_i,
\]
where $L_i$ is the tract length of frame $i$, expressed in units normalized so that the
variance of $y_{K+1}$ is equal to the largest variance of the first $K$ components of
$\mathbf{y}$; $A_i(x)$ is the cross-sectional area of frame $i$ at distance $x$ from the
glottis; $K = 32$ is the number of uniform sections present in each area function; and
$P = 519$ is the number of vectors present in the corpus.
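The clipping, logarithm and resampling steps described above can be sketched as follows. This is only an illustrative sketch: the function name `log_area_vector` and its arguments are hypothetical, the input is assumed to be a list of (distance, area) samples in cm and cm², and the corpus-dependent normalization of the tract length is left out.

```python
import numpy as np


def log_area_vector(x_cm, area_cm2, n_sections=32, min_area_cm2=0.05):
    """Build [ln A_1, ..., ln A_K, L] from unevenly spaced area samples.

    x_cm         : distances from the glottis (increasing), in cm
    area_cm2     : cross-sectional areas at those points, in cm^2
    min_area_cm2 : clipping threshold (5 mm^2 = 0.05 cm^2, as in the text)
    """
    x = np.asarray(x_cm, dtype=float)
    area = np.asarray(area_cm2, dtype=float)

    # Clip small areas (closures) before taking the natural logarithm.
    log_area = np.log(np.maximum(area, min_area_cm2))

    # Section centers x_k = (k - 1/2) L / K, k = 1..K, measured from x[0].
    length = x[-1] - x[0]
    centers = x[0] + (np.arange(n_sections) + 0.5) * length / n_sections

    # Linear interpolation of the log-area at the section centers.
    resampled = np.interp(centers, x, log_area)

    # Tract length appended as the last component (not normalized here).
    return np.append(resampled, length)
```

The tract length is returned in cm; scaling it so that its variance matches the largest log-area variance requires the whole corpus and would be applied afterwards.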
2.2 Computation of the formant frequencies

In order to study the relationship between speech acoustic and vocal-tract geometric
(articulatory) parameters, the first step is to understand the physical basis of the
process.
2.2.1 Lossless tube

A very simplified model of the vocal-tract is a rigid, lossless tube, whose cross-sectional
area varies along its length. If, moreover, sound is assumed to propagate in longitudinal
plane waves, the acoustic pressure inside the tract is governed by Webster's
horn equation (Webster, 1919; Eisner, 1966)
\[
\frac{d}{dx}\left[A(x)\,\frac{dP}{dx}\right] + \lambda A(x) P = 0, \qquad \lambda = \frac{\omega^2}{c^2}, \tag{2.3}
\]
with boundary conditions
\[
\left.\frac{dP}{dx}\right|_{x=0} = 0 \tag{2.4}
\]
representing a closed glottis, and
\[
P(L) = 0 \tag{2.5}
\]
representing open lips without radiation load. Here, $x$ is the distance from the glottis
along the tract, $P$ and $\omega$ are respectively the amplitude and the angular frequency of
the sound pressure (for a sinusoidal time dependence), $A(x)$ is the cross-sectional area,
$L$ is the tract length, and $c$ is the velocity of sound in the air (inside the vocal-tract).
The formant frequencies are defined as the resonant frequencies (or eigenfrequencies)
of the tract
\[
F_m = \frac{c\,\sqrt{\lambda_m}}{2\pi}, \tag{2.6}
\]
where $\lambda_m$ is the m-th eigenvalue of Eq. (2.3) under the boundary conditions defined
by Eq. (2.4) and Eq. (2.5). Therefore, for a given vocal-tract length $L$ and a given
set of boundary conditions, the formant frequencies are basically determined by the
cross-sectional area function $A(x)$. In fact, rewriting Eq. (2.3) as
\[
\frac{d^2P}{dx^2} + \frac{d}{dx}\left[\ln A(x)\right]\frac{dP}{dx} + \lambda P = 0, \tag{2.7}
\]
it is possible to see that the eigenvalues, and hence the formant frequencies, depend
on the logarithm of the area rather than on the area function itself (as already stated).
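As a numerical illustration of the lossless eigenvalue problem, the sketch below discretizes the horn equation with a closed glottis and open lips using finite differences and a lumped mass matrix. This discretization is a choice made here for illustration only, not the numerical procedure adopted in this study (described in Section 2.2.3), and the function name `lossless_formants` is hypothetical.

```python
import numpy as np


def lossless_formants(areas, length_cm, n_formants=3, c=3.5e4):
    """Lowest eigenfrequencies (Hz) of d/dx[A dP/dx] + lambda A P = 0.

    areas     : areas of equal-length sections, glottis to lips (cm^2)
    length_cm : tract length (cm)
    Boundary conditions: dP/dx = 0 at the glottis, P = 0 at the lips.
    """
    areas = np.asarray(areas, dtype=float)
    n = areas.size
    h = length_cm / n

    # Unknowns are P_0 .. P_{n-1}; P_n = 0 is imposed at the lips.
    stiff = np.zeros((n, n))
    mass = np.zeros(n)
    for j in range(n):
        # Section j lies between nodes j and j + 1.
        stiff[j, j] += areas[j] / h**2
        mass[j] += 0.5 * areas[j]
        if j + 1 < n:
            stiff[j + 1, j + 1] += areas[j] / h**2
            stiff[j, j + 1] -= areas[j] / h**2
            stiff[j + 1, j] -= areas[j] / h**2
            mass[j + 1] += 0.5 * areas[j]

    # Reduce stiff p = lambda * diag(mass) p to a symmetric standard problem.
    scale = 1.0 / np.sqrt(mass)
    sym = stiff * scale[:, None] * scale[None, :]
    eigenvalues = np.linalg.eigvalsh(sym)  # ascending order

    # Eq. (2.6): F_m = c sqrt(lambda_m) / (2 pi)
    return c * np.sqrt(eigenvalues[:n_formants]) / (2.0 * np.pi)
```

For a uniform tube of length 17.5 cm the result approaches the quarter-wavelength resonances (2m − 1)c/(4L), i.e. about 500, 1500 and 2500 Hz, and multiplying all areas by a constant leaves the result unchanged, in agreement with the remark that only area ratios matter.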
At this point, it is interesting to return to the works of Schroeder (1967) and
Mermelstein (1967) mentioned in the introduction, and explain them in terms of
Webster's equation. Schroeder (1967) showed, within the limits of first-order
perturbation analysis, that the m-th formant frequency is directly related to, and
only to, the m-th odd coefficient of the Fourier cosine series of the log-area function.
So, even if the whole (infinite) set of formant frequencies were known, only "half
of the information" (odd terms) necessary to determine the area function would be
available. The remaining information could be obtained from the complete set of
eigenvalues obtained under the boundary conditions
\[
\left.\frac{dP}{dx}\right|_{x=0} = 0 \quad \text{and} \quad \left.\frac{dP}{dx}\right|_{x=L} = 0, \tag{2.8}
\]
representing a closed glottis and closed lips, respectively. Under these conditions, to
first-order perturbation theory, the m-th eigenvalue is linearly related to, and only
to, the m-th even coefficient of the Fourier cosine series of the log-area function.
Mermelstein (1967) then verified experimentally that, even for larger variations, the
(finite) set composed of the first 2M coefficients of the Fourier cosine series expansion
of the log-area function has a one-to-one relationship with the set composed of the first
M eigenvalues obtained under the closed glottis and open lips condition, together with
the first M eigenvalues obtained under the closed glottis and closed lips condition.
However, as already stated in the introduction, the latter set of eigenvalues, which
correspond to the admittance zeros at the lips, cannot be obtained from the speech
signal.
2.2.2 Lossy tube

Webster's equation describes the relationship between formant frequencies and
the log-area function for a highly simplified vocal-tract model. When a more realistic
model is considered, factors like nonplanar wave propagation, viscous and thermal
losses, glottal impedance, radiation load, and yielding walls must be taken into account.
Nevertheless, it is interesting to note that, although all those factors do affect
the spectrum of speech produced by the tract, most of them have almost no influence
on the formant frequencies. Moreover, the factors that influence the formant frequencies
can have their effects approximately compensated for by simple transformations.
Briefly examining these factors one by one, it is possible to say that: (i) Considering
that the cross-sectional dimensions of the tract normally do not exceed 4 or 5 cm, and
since $c \approx 350$ m/s inside the tract, there are no transversal resonance modes below
about 3.5 to 4 kHz. Therefore, at least for the first 3 formant frequencies (which are
normally below 3.5 kHz), the plane wave propagation assumption is valid (Sondhi,
1974). (ii) Viscous and thermal losses do affect the formant bandwidths, but have
little effect on the formant frequencies (Flanagan, 1972, pp. 58–61). (iii) The glottal
boundary condition has a strong influence on the spectral tilt and on the lower formant
bandwidths, but its effect on the formant frequencies is of the order of 1% (for voiced
speech). Thus, if only the formant frequencies are to be considered, the closed glottis
is a good approximation for the glottal boundary condition, at least in the case of
vowels (Flanagan, 1972, pp. 63–65). (iv) The approximate effect of the radiation load
at the lips on the formant frequencies is to lower them by a factor of
\[
\frac{3\pi L}{3\pi L + 8a}, \qquad a = \sqrt{\frac{A(L)}{\pi}},
\]
where $a$ is the effective lip radius and $L$ is the tract length (Flanagan, 1972, pp. 61–63).
(v) Finally, the effect of wall vibration on the formant frequencies is to increase
them. Such an effect becomes weaker for higher frequencies, and can be approximately
expressed by
\[
\omega_m^2 \approx (406\pi)^2 + \hat{\omega}_m^2, \tag{2.9}
\]
where $\omega_m$ and $\hat{\omega}_m$ are the angular frequencies of the m-th formant of a tract with
flexible and rigid walls, respectively (Sondhi, 1974) [1]. More elaborate models do exist
(Maeda, 1982; Sondhi and Schroeter, 1987) as seen in the next section, but the main
point here is that formant frequencies and the log-area function are basically related
by Eq. (2.7).
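To give a feeling for the size of corrections (iv) and (v), the sketch below applies both to a formant frequency computed with a rigid, lossless model. Chaining the two corrections in this order is an assumption made here for illustration (the text states them separately), and the function name `corrected_formant_hz` is hypothetical. The constant 203 Hz corresponds to the angular frequency 406π rad/s of Eq. (2.9).

```python
import math


def corrected_formant_hz(f_rigid_hz, tract_length_cm, lip_area_cm2,
                         wall_hz=203.0):
    """Approximate radiation-load and wall-vibration corrections.

    f_rigid_hz      : formant of the rigid, lossless tube without radiation load
    tract_length_cm : tract length L
    lip_area_cm2    : area at the lips, A(L)
    """
    # (iv) Radiation load: lower by 3*pi*L / (3*pi*L + 8a), a = sqrt(A(L)/pi).
    lip_radius = math.sqrt(lip_area_cm2 / math.pi)
    radiation = (3.0 * math.pi * tract_length_cm
                 / (3.0 * math.pi * tract_length_cm + 8.0 * lip_radius))
    f_radiated = radiation * f_rigid_hz

    # (v) Yielding walls, Eq. (2.9) written in Hz: f^2 = 203^2 + f_hat^2.
    return math.sqrt(wall_hz**2 + f_radiated**2)
```

With a vanishing lip area the radiation factor tends to one, so a 500 Hz formant is raised by the walls alone to about 540 Hz; for higher formants the wall term quickly becomes negligible, as stated in item (v).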
Such a relationship (well analyzed in Fant, 1980), relatively independent of other
factors, justifies using the formant frequencies to parametrize the acoustics of the
speech signal, and the log-area function to represent the geometry of the vocal-tract.
The point here is that the formant frequencies can be determined from the log-area
function, even when a lossy model is considered. This is in contrast with other acoustic
parameters present in the speech signal, such as formant bandwidths and spectral
tilt, which result from the combination of the tract shape with other factors, such as
glottal excitation, yielding walls, and radiation load [2]. However, as already stated, the
formant frequencies do not contain enough information to uniquely determine the area
function. In Chapter 3, information about the vocal-tract structure (morphology) will
be used to reduce the ambiguity that arises from the lack of information in the formant
frequencies about the area function.
[1] In his work, Sondhi derived an equation in the same format as Webster's equation, but taking
into account yielding walls and viscous and thermal losses. It was shown that the formant frequencies
of a lossy model and those of a lossless model are approximately related by Eq. (2.9).
2.2.3 Numerical determination of formant frequencies

Although analytical solutions exist for particular cases (Salmon, 1946; Eisner, 1966),
there is no analytical solution of Webster's equation (Eq. 2.3) when the area A(x) along the
vocal-tract is an arbitrary function. For this reason, numerical procedures are used
to determine the formant frequencies (eigenfrequencies) associated with a given area
function [3].
The method adopted here approximates the vocal-tract by a concatenation of
uniform lossy tubes (Kelly and Lochbaum, 1962; Sondhi and Schroeter, 1987; Schroeter
and Sondhi, 1991). For the particular case of a uniform tube [4], the (one-dimensional)
sound wave equation can be solved analytically. From the solution, it is possible to
express, in the frequency domain, pressure and volume velocity at one end of the tube
as the product of a matrix by pressure and volume velocity at the other end. For the
K uniform sections that approximate the vocal-tract, this relation can be written as
\[
\begin{bmatrix} P_{k-1} \\ U_{k-1} \end{bmatrix}
= \begin{bmatrix} \mathcal{A}_k & \mathcal{B}_k \\ \mathcal{C}_k & \mathcal{D}_k \end{bmatrix}
\begin{bmatrix} P_k \\ U_k \end{bmatrix}
= \mathbf{K}_k \begin{bmatrix} P_k \\ U_k \end{bmatrix}, \qquad k = 1, \ldots, K, \tag{2.10}
\]
where $P_k$, $U_k$, $P_{k-1}$ and $U_{k-1}$ are pressure and volume velocity at the section ends
closer to the lips and closer to the glottis, respectively. Using the model for losses and
yielding walls described in Sondhi (1974) and in Sondhi and Schroeter (1987), the
entries of matrix $\mathbf{K}_k$ (written in calligraphic letters to distinguish them from the
section area $A_k$) are given by
\[
\begin{aligned}
\mathcal{A}_k &= \cosh\!\left(\frac{\sigma l}{c}\right), &
\mathcal{B}_k &= \frac{\rho c}{A_k}\,\gamma\,\sinh\!\left(\frac{\sigma l}{c}\right), \\
\mathcal{C}_k &= \frac{A_k}{\rho c\,\gamma}\,\sinh\!\left(\frac{\sigma l}{c}\right), &
\mathcal{D}_k &= \cosh\!\left(\frac{\sigma l}{c}\right),
\end{aligned} \tag{2.11}
\]
where
\[
\gamma = \sqrt{\frac{\alpha + j\omega}{\beta + j\omega}}, \tag{2.12}
\]
\[
\sigma = \sqrt{(\alpha + j\omega)(\beta + j\omega)}, \tag{2.13}
\]
\[
\alpha = \sqrt{j\omega c_1}, \tag{2.14}
\]
\[
\beta = \frac{j\omega\,\omega_0^2}{(j\omega + a)\,j\omega + b} + \alpha, \tag{2.15}
\]
\[
\omega_0^2 = \frac{\rho c^2}{A_k L_w}, \tag{2.16}
\]
\[
a = \frac{R_w}{L_w}, \tag{2.17}
\]
\[
b = \frac{1}{L_w C_w}, \tag{2.18}
\]
where $L_w$, $C_w$ and $R_w$ are mass, compliance and resistance of the tract walls per unit
length; $a$ is the ratio of wall resistance to mass; $b$ is the squared angular frequency of
the mechanical resonance; $c_1$ is the correction for thermal conductivity and viscosity;
$\omega_0$ is the lowest angular resonance frequency of the tract when it is closed at both
ends; $c$ is the sound velocity inside the tract; $\rho$ is the air density; $\omega$ is the angular
frequency; $A_k$ is the cross-sectional area of the section; and $l = L/K$ is the section
length, equal to the vocal-tract length $L$ divided by the number of sections $K$. The
numerical values used are
    c = 3.5 × 10^4 (cm/s)
    ρ = 1.14 × 10^-3 (g/cm³)
    a = 130π (rad/s)
    b = (30π)² (rad/s)²
    c_1 = 4 (rad/s)
    ω_0² = (406π)² (rad/s)².
[2] The use of additional acoustic information, such as formant bandwidths, is, in principle, very
difficult to handle. This is because, to the author's knowledge, losses due to the vocal-tract and
to the glottal source cannot be well separated. Moreover, the correct estimation of the bandwidths
is a task considerably more difficult to carry out than the estimation of formant frequencies (which,
sometimes, are also difficult to estimate, as in the case of the high-pitched voice of female and child
speakers).
[3] Variational and perturbation methods were also tested (Yehia and Itakura, 1993b). It was found
that the range of applicability of perturbation analysis does not cover the entire articulatory vowel
space. Variational analysis proved to be more robust, but at a computational cost higher than
that of the numerical procedure adopted here.
[4] In Välimäki and Karjalainen (1994), the interesting alternative approach of conical sections is
analyzed.
Now, if sound pressure and volume velocity at the lips ($P_K$ and $U_K$) are known, it is
possible to determine pressure and volume velocity at the glottis ($P_0$ and $U_0$) as
\[
\begin{bmatrix} P_0 \\ U_0 \end{bmatrix}
= \left( \prod_{k=1}^{K} \mathbf{K}_k \right)
\begin{bmatrix} P_K \\ U_K \end{bmatrix}. \tag{2.19}
\]
The formant frequencies are obtained by finding the maxima of the vocal-tract transfer
function defined by $20 \log_{10} |U_K / U_0|$. In order to compute this transfer function, the
sound pressure at the lips is expressed as
\[
P_K = Z_r U_K, \tag{2.20}
\]
where $Z_r$ is the output impedance determined by the radiation load. The model used
to represent the load is a parallel RL circuit (Flanagan, 1972, pp. 36–38) with
\[
R_r = \frac{128\,\rho c}{(3\pi)^2 A_K} \tag{2.21}
\]
and
\[
L_r = \frac{8\sqrt{A_K/\pi}\;\rho c}{3\pi c\,A_K}. \tag{2.22}
\]
In the final step, the volume velocity at the lips is arbitrarily set to $U_K = 1$, and
Eq. (2.19) is used to obtain the volume velocity at the glottis $U_0$. The transfer function
is then given by $-20 \log_{10} |U_0|$. As already stated, the formant frequencies are
determined by the maxima of the transfer function, and can be found by numerical
search.
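The procedure of this section can be sketched in a few lines of code. The sketch assumes the usual chain-matrix form of Sondhi and Schroeter (1987), with positive off-diagonal entries for propagation from the lips towards the glottis, the constant value of ω0² quoted above, and the parallel combination Z_r = jωL_r R_r / (R_r + jωL_r) for the radiation load; these sign and impedance conventions, as well as the function names, are assumptions made here for illustration.

```python
import numpy as np

# Constants quoted in the text (CGS units).
C_SOUND = 3.5e4                  # sound velocity (cm/s)
RHO = 1.14e-3                    # air density (g/cm^3)
WALL_A = 130.0 * np.pi           # wall resistance / mass (rad/s)
WALL_B = (30.0 * np.pi) ** 2     # squared wall resonance ((rad/s)^2)
C1 = 4.0                         # viscosity / heat conduction term (rad/s)
OMEGA0_SQ = (406.0 * np.pi) ** 2


def transfer_function_db(areas_cm2, length_cm, freqs_hz):
    """Return -20 log10 |U_0| for U_K = 1 at each frequency."""
    areas = np.asarray(areas_cm2, dtype=float)
    sec_len = length_cm / areas.size
    lip_area = areas[-1]

    # Parallel RL radiation load (resistance and inductance).
    r_rad = 128.0 * RHO * C_SOUND / (9.0 * np.pi**2 * lip_area)
    l_rad = 8.0 * RHO * np.sqrt(lip_area / np.pi) / (3.0 * np.pi * lip_area)

    gains = np.empty(len(freqs_hz))
    for i, freq in enumerate(freqs_hz):
        jw = 2j * np.pi * freq
        alpha = np.sqrt(jw * C1)
        beta = jw * OMEGA0_SQ / ((jw + WALL_A) * jw + WALL_B) + alpha
        gamma = np.sqrt((alpha + jw) / (beta + jw))
        sigma = gamma * (beta + jw)
        ch = np.cosh(sigma * sec_len / C_SOUND)
        sh = np.sinh(sigma * sec_len / C_SOUND)

        # State [P_K, U_K] at the lips, with U_K = 1 and P_K = Z_r U_K.
        z_rad = jw * l_rad * r_rad / (r_rad + jw * l_rad)
        state = np.array([z_rad, 1.0], dtype=complex)

        # Propagate from the lips (last section) back to the glottis.
        for area in areas[::-1]:
            z0 = RHO * C_SOUND / area
            chain = np.array([[ch, z0 * gamma * sh],
                              [sh / (z0 * gamma), ch]])
            state = chain @ state

        gains[i] = -20.0 * np.log10(abs(state[1]))
    return gains


def formant_peaks(freqs_hz, gains_db):
    """Frequencies of the local maxima of the transfer function."""
    g = np.asarray(gains_db)
    idx = np.flatnonzero((g[1:-1] > g[:-2]) & (g[1:-1] >= g[2:])) + 1
    return np.asarray(freqs_hz)[idx]
```

A frequency grid of a few hertz is enough to locate the peaks approximately; a finer local search around each peak can then refine the estimate.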
2.2.4 Comparison of lossless and lossy models

Before proceeding, it is interesting to compare the formant frequencies corresponding
to lossy and lossless area functions of the same length and shape. The objective is to find
to what degree losses influence formant frequencies in the case of the areas that compose
the cineradiographic corpus analyzed here. Figure 2.6 shows detailed information for
two frames extracted from the corpus. In the power spectrum panels, the transfer
functions obtained from lossless (dashed lines) and lossy models (solid black lines)
of the area function shown below are plotted together with the spectral envelope of
the speech frame shown above (gray line) [5]. The vocal-tract profile and the midsagittal
distance along the tract are also plotted for reference purposes. As expected,
losses have a stronger influence on formant bandwidths than on formant frequencies.
Note also the imperfect matching between speech spectral envelope peaks and vocal-tract
transfer function peaks already discussed in Section 2.1.3. The small deviations
observed in the left column of Fig. 2.6 may be explained by inaccuracies in the physical
model adopted to describe yielding walls and radiation load. However, in the spectra
shown in the right column of Fig. 2.6, the larger deviation observed for the second
formant is more likely due to misestimation of the area function.
[5] LPC analysis: frame length = 20 ms; LPC order = 12; Hanning window; pre-emphasis
coefficient equal to r(1)/r(0), where r(0) is the energy and r(1) the first correlation
coefficient of the signal (Markel and Gray, 1976, p. 216).
In order to get qualitative and quantitative information about the effects of losses
over the whole corpus, Fig. 2.7 shows histograms of the relative deviation of the first
four formant frequencies obtained using a lossless vocal-tract model with respect to
formants obtained using a lossy model
\[
\frac{F_{\mathrm{lossless}} - F_{\mathrm{lossy}}}{F_{\mathrm{lossy}}}.
\]
The figure illustrates that the absence of losses has the effect of increasing the
frequencies of the second, third and fourth formants. This is due to the absence
of radiation load (see Section 2.2.2). The absence of yielding walls has little
effect on the second, third and fourth formants, but tends to substantially lower the
first formant, which is also affected by the lack of radiation load.
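The histograms of relative deviation can be computed along the following lines. This is only a sketch: the function name `relative_deviation_histogram`, the number of bins and the plotted range are assumptions, since the text does not state the binning used for Fig. 2.7.

```python
import numpy as np


def relative_deviation_histogram(f_lossless, f_lossy, bins=20,
                                 limits=(-0.2, 0.2)):
    """Histogram of (F_lossless - F_lossy) / F_lossy, in percent of frames.

    f_lossless, f_lossy : one formant (e.g. F1) for every frame of a corpus
    Returns the percentage of frames in each bin and the bin edges.
    """
    f_lossless = np.asarray(f_lossless, dtype=float)
    f_lossy = np.asarray(f_lossy, dtype=float)
    deviation = (f_lossless - f_lossy) / f_lossy

    # Frames whose deviation falls outside `limits` are not counted,
    # so the percentages may add up to less than 100.
    counts, edges = np.histogram(deviation, bins=bins, range=limits)
    return 100.0 * counts / deviation.size, edges
```

A positive deviation means that the lossless model overestimates the formant, which is the situation described above for the second, third and fourth formants.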
Comment
In this chapter it was seen how to estimate the area function given a vocal-tract
profile, and how to compute formant frequencies given an area function. The problem
of estimating the area function given a set of formant frequencies is simplified if the
area function is expressed by appropriate parameters. This is the target of the next
chapter.

[Figure 2.6 panels, one column per frame, from top to bottom: speech signal versus time (ms); power spectra; area function (cm²) versus distance from the glottis (cm); midsagittal distance (cm); vocal-tract profile (axes in cm).]
Figure 2.6: On each column the second panel from the top shows a comparison of the
transfer functions estimated from lossy (black solid line) and lossless (dashed line) models
of the vocal-tract area function. The gray line represents the speech power spectrum
envelope. The speech signal, area function, midsagittal distance and vocal-tract profile
are also given as references. Note that there is little discrepancy between lossy and lossless
formant frequencies.
[Figure 2.7 panels: one histogram for each of formants #1 to #4; abscissa: relative deviation (from −0.2 to 0.2); ordinate: percentage (%).]
Figure 2.7: Formant frequency variation due to yielding walls and radiation load. Each
chart shows a histogram of the relative deviation of formants derived from a lossless
vocal-tract model with respect to formants derived from a lossy model:
(F_lossless − F_lossy)/F_lossy. The ordinates show the percentage of points in the corpus
whose relative deviation falls within the abscissa interval under a given bin.
Chapter 3
Parametric Models
for the Area Function
"You people speak in terms of circles
and ellipses and regular velocities -- simple
movements that the mind can grasp -- very
convenient -- but suppose almighty God had
taken it into His head to make the stars move
like that... (He describes an irregular motion
with his finger through the air) ...then where
would you be?"
Bertold Brecht (1898–1956)
Galileo
The number of sections necessary to obtain a good approximation of the vocal-tract
log-area function by a concatenation of uniform tubes of equal length is considerably
larger than the dimension of the space composed by the log-area functions
that can be produced by the human vocal-tract. This space, from now on, will be
called the articulatory space. Two procedures are analyzed in this chapter to represent
it by appropriate components. In the first section, representation by a Fourier
cosine series is examined, whereas a parametric statistical representation is seen in the
second section.
Another point that can be exploited in parametrizing the articulatory space is
the fact that the temporal behavior of the area function is subject to continuity
constraints. In the last section of this chapter, time sequences of parametrized log-
area functions are represented by series expansions.
3.1 Fourier Analysis

The main reason to represent the log-area function by a Fourier cosine series (Davis,
1963, pp. 107–112) is the property pointed out by Mermelstein (1967) that the first
M formant frequencies depend mainly on the first 2M terms of the Fourier cosine
series expansion of the log-area function. Specifically, the m-th formant frequency
depends mainly on the (2m − 1)-th term. Also, except for some critical cases, it has
a one-to-one relationship with this term, when all other terms are kept constant.
In fact, except for the above mentioned critical cases, there is a one-to-one
relationship between the first M formants and the first M odd terms of the Fourier
cosine series expansion of the log-area function (Yehia and Itakura, 1993a; Yehia and
Itakura, 1996).
Due to the above reasons, whose importance will become clear along the text, the
first $N \geq 2M$ log-area function Fourier cosine coefficients [1] will be adopted here to
parametrize the geometry of the vocal-tract. Mathematically, the transformation is
defined [2] here as
\[
\ln A(x) \approx \sqrt{\frac{2}{L}} \left( \frac{a_0}{\sqrt{2}} + \sum_{n=1}^{N-1} a_n \cos\frac{n\pi x}{L} \right), \tag{3.1}
\]
\[
a_n = \sqrt{\frac{2}{L}} \int_0^L \ln A(x) \cos\frac{n\pi x}{L}\, dx, \tag{3.2}
\]
\[
a_0 = \frac{1}{\sqrt{L}} \int_0^L \ln A(x)\, dx, \tag{3.3}
\]
or, in a discrete form,
\[
\ln A_k \approx \sqrt{\frac{2}{K}} \left( \frac{a_0}{\sqrt{2}} + \sum_{n=1}^{N-1} a_n c_{nk} \right), \qquad
c_{nk} = \cos\!\left[\frac{n\pi}{K}\left(k - \tfrac{1}{2}\right)\right], \qquad k = 1, \ldots, K, \tag{3.4}
\]
\[
a_n = \sqrt{\frac{2}{K}} \sum_{k=1}^{K} \ln A_k\, c_{nk}, \qquad
A_k = A\!\left[\left(k - \tfrac{1}{2}\right)\frac{L}{K}\right], \qquad n = 1, \ldots, N-1, \tag{3.5}
\]
\[
a_0 = \frac{1}{\sqrt{K}} \sum_{k=1}^{K} \ln A_k, \tag{3.6}
\]
and, in a practical vector notation, area function and tract length can be put together
as (see Eq. 2.2),
\[
\mathbf{y} \approx \mathbf{U}_{ay}\, \mathbf{a}, \tag{3.7}
\]
\[
\mathbf{a} = \mathbf{U}_{ay}^t\, \mathbf{y}, \tag{3.8}
\]
where
\[
\mathbf{y} = [\ln A_1, \ldots, \ln A_K, L]^t, \tag{3.9}
\]
\[
\mathbf{a} = [a_1, a_3, \ldots, a_{2M-1}, a_0, a_2, \ldots, a_{2M}, \ldots, a_{N-1}, L]^t, \tag{3.10}
\]
(The reason for this unusual component order will become clear in Section 4.1.2.)
\[
\mathbf{U}_{ay} =
\begin{bmatrix}
\sqrt{\dfrac{2}{K}}\,\mathbf{C} & \mathbf{0} \\
\mathbf{0}^t & 1
\end{bmatrix}, \qquad
\mathbf{C} =
\begin{bmatrix}
c_{11} & c_{31} & \cdots & c_{2M-1,1} & \frac{1}{\sqrt{2}} & c_{21} & \cdots & c_{2M,1} & \cdots & c_{N-1,1} \\
c_{12} & c_{32} & \cdots & c_{2M-1,2} & \frac{1}{\sqrt{2}} & c_{22} & \cdots & c_{2M,2} & \cdots & c_{N-1,2} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & & \vdots \\
c_{1K} & c_{3K} & \cdots & c_{2M-1,K} & \frac{1}{\sqrt{2}} & c_{2K} & \cdots & c_{2M,K} & \cdots & c_{N-1,K}
\end{bmatrix}. \tag{3.11}
\]
In this discrete form, $K$ is the number of uniform sections used to approximate the
area function. The area of each section is obtained by sampling the continuous area
function at the points $x_k = (k - \tfrac{1}{2}) L/K$, where $L$ is the tract length, and $L/K$ is the
section length.
[1] Here, Fourier cosine coefficient means coefficient of the Fourier cosine series, and not cosine
coefficient of the Fourier series.
[2] This is a convenient definition because of its properties of symmetry.
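The discrete transformation of Eqs. (3.4) to (3.6) can be checked numerically with the short sketch below. For simplicity it keeps the coefficients in natural order (a0, a1, a2, ...) and leaves out the tract length, so it is not the reordered matrix of Eq. (3.11); the function names are hypothetical.

```python
import numpy as np


def cosine_basis(n_sections, n_terms):
    """Columns are the basis vectors of Eqs. (3.4)-(3.6), orders 0..n_terms-1."""
    k = np.arange(1, n_sections + 1)
    n = np.arange(n_terms)
    # sqrt(2/K) * cos[n pi (k - 1/2) / K]
    basis = np.sqrt(2.0 / n_sections) * np.cos(
        np.pi * np.outer(k - 0.5, n) / n_sections)
    # The zero-th order column carries the extra 1/sqrt(2) factor.
    basis[:, 0] = 1.0 / np.sqrt(n_sections)
    return basis


def analyze(log_area, n_terms):
    """Fourier cosine coefficients a_0 .. a_{N-1} of a log-area vector."""
    log_area = np.asarray(log_area, dtype=float)
    return cosine_basis(log_area.size, n_terms).T @ log_area


def synthesize(coefficients, n_sections):
    """Log-area vector rebuilt from the first N coefficients."""
    coefficients = np.asarray(coefficients, dtype=float)
    return cosine_basis(n_sections, coefficients.size) @ coefficients
```

Because the basis vectors are orthonormal, the analysis matrix is simply the transpose of the synthesis matrix, as in Eqs. (3.7) and (3.8), and the truncation error cannot increase when more terms are kept.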
3.1.1 Truncation Eects

The effects of taking a finite number of Fourier coefficients are illustrated in Figures 3.1
and 3.2 for the French vowel /i/. Figure 3.1 shows the area function represented
by a concatenation of K = 32 tubes and, following it, the results obtained by its
approximation by N = 9, 8, ..., 3 and 2 Fourier cosine coefficients. It is possible
to see in Figure 3.2 that the m-th formant frequency (calculated with the model
described in Section 2.2.3) reaches a value close to its true value when the area
function is approximated by the first N = 2m terms (including the 0-th order) of the
corresponding Fourier cosine series.
It is also possible to see that, for an approximation of N = 9 terms, the maximal
deviation observed in the figure for the first M = 3 formants is less than 3%, and so,
less than the JND (just-noticeable difference) found by Flanagan (1955). Although
this limit is exceeded in some frames of the corpus, from now on, when using Fourier
representation, the area function will be approximated by the first N = 9 terms of
its log-area Fourier cosine series expansion, plus the vocal-tract length.
[Figure 3.1 panels: area (cm²) versus position (cm) for the original /i/ and for its approximations with N = 9, 8, 7, 6, 5, 4, 3 and 2.]
Figure 3.1: Area function for the French /i/, and the same area approximated by trun-
cated Fourier cosine series expansions (thick lines), compared with the original area (thin
lines).
3.1.2 Formant frequencies as functions of Fourier coefficients

Formant frequencies can now be interpreted as functions of log-area Fourier coefficients.
With the objective of getting a better comprehension of the behavior of these
functions, Figure 3.3 shows how the first three formant frequencies vary with respect
to each of the first six Fourier cosine coefficients (excluding the zeroth order), when
all other coefficients are set to zero. A histogram of the Fourier cosine coefficients
of the areas contained in the corpus is plotted above each graph. Note that the first
three formant frequencies vary almost linearly with the first three odd Fourier cosine
coefficients.
The joint influence of the first (a1) and third (a3) Fourier cosine coefficients on
the first and second formant frequencies, when all other coefficients are set to zero,
is shown in Figure 3.4. a1 and a3 have dominant influence (only) on the first and
second formants, respectively.
[Figure 3.2 plot, titled "Formant Frequencies versus Fourier Coefficients": formant frequencies (kHz) versus number of Fourier coefficients N = 1, ..., 9, followed by the real area.]
Figure 3.2: Formant frequencies as a function of the number of Fourier cosine coefficients
used to represent the area function of the French /i/. (Note that N includes the
0th-order term.)
3.2 Statistical Analysis

The objective of this section is to find representations for both the log-area
(articulatory) space and the formant frequency (acoustic) space so that
• Each space be efficiently represented by a small number of parameters.
• The components of each space be as independent as possible.
• The mapping between both spaces be as simple as possible.
These points will be analyzed one by one in the following sections.
3.2.1 Principal Component Analysis

Articulatory space
The relationship between Fourier cosine coefficients and formant frequencies is indeed
interesting; however, for the case of the human vocal-tract, it cannot be said that the
parametrization by a truncated Fourier series is optimum. This is because cosine
functions, which are the basis functions of a Fourier cosine series expansion, are
"general purpose functions" that, in principle, are not directly related to the vocal-tract
morphology.
[Figure 3.3 panels: formant frequencies (kHz) versus each of the coefficients a1 to a6, over the range −5 to 5.]
Figure 3.3: First three formants as functions of the first 6 Fourier cosine coefficients. In
each graph, all other coefficients are kept equal to zero. A histogram of the coefficients
of the log-area functions of the corpus analyzed is plotted above each chart.
[Figure 3.4 panels: first formant F1 (kHz) and second formant F2 (kHz) plotted over a1 and a3, each ranging from −1 to 1.]
Figure 3.4: First and second formant frequencies as functions of the first and third Fourier
cosine coefficients a1 and a3, when all other coefficients are equal to zero.
In this section, Principal Component Analysis (PCA) (Horn, 1985, pp. 411–455)
will be used to parametrize the articulatory space by an appropriate number of
components (Yehia, Takeda and Itakura, 1996). The procedure is as follows: given the
corpus of log-area vectors defined in Eq. (2.2), the corresponding covariance matrix [3]
is given by
\[
\mathbf{C}_y = \frac{1}{P-1} \sum_{i=1}^{P} [\mathbf{y}_i - \bar{\mathbf{y}}][\mathbf{y}_i - \bar{\mathbf{y}}]^t, \tag{3.12}
\]
where $\bar{\mathbf{y}}$ is the mean log-area vector, and can be expressed as
\[
\mathbf{C}_y = \mathbf{U}\mathbf{S}\mathbf{U}^t, \tag{3.13}
\]
where $\mathbf{S}$ is a diagonal matrix containing the eigenvalues of $\mathbf{C}_y$ in decreasing order, and
$\mathbf{U}$ is a unitary matrix whose columns contain the corresponding normalized eigenvectors.
The expansion above is a Takagi's factorization, which is a singular value
decomposition for the particular case of symmetric matrices (Horn, 1985, pp. 201–218).
Using the same optimality principle of the Karhunen-Loeve transform (Jayant and
Noll, 1984, pp. 535–546), $\mathbf{y}$ can then be approximated by
\[
\mathbf{y} \approx \mathbf{U}_y \boldsymbol{\xi} + \bar{\mathbf{y}}, \tag{3.14}
\]
with $\boldsymbol{\xi}$ given by
\[
\boldsymbol{\xi} = \mathbf{U}_y^t (\mathbf{y} - \bar{\mathbf{y}}), \tag{3.15}
\]
where $\mathbf{U}_y$ is the matrix containing the first N columns of $\mathbf{U}$, i.e. the normalized
eigenvectors corresponding to the N largest eigenvalues of $\mathbf{C}_y$. The K + 1 = 33
eigenvalues are shown in Fig. 3.5. Note that only the first N = 5 eigenvalues have
non-negligible values, and that they "explain" more than 92% of the variance of the
corpus of log-areas. The eigenvectors associated with the largest N = 5 eigenvalues
are shown in Fig. 3.6. They will be used in this paper to form a parametric model
for the vocal-tract log-area function. Since the components of this model cannot be
explicitly interpreted as articulators, it cannot be qualified as an articulatory model
(Mermelstein, 1973; Coker, 1976; Maeda, 1990). In spite of that, it is possible to
observe in Fig. 3.6 that: the first and most important eigenvector is associated with
the tongue region; the tract length is the dominant component of the second and fifth
eigenvectors; the lips determine the dominant component of the third eigenvector; and
the tongue apex is the dominant region of the fourth eigenvector. Also, note that there
is almost no influence of the glottal region on the first three eigenvectors. In order to
illustrate the performance of this representation, Fig. 3.7 shows an area function taken
from the corpus (thick line), and its approximations by a truncated Fourier cosine
series (dashed line) and by the parametric model described here (thin solid line). Note
that, in contrast with the Fourier series representation, the parametric model is able
to "capture" the vocal-tract structure. In Fig. 3.8, original trajectories followed by
the tract length, by the area at the lips, and by the area of a section in the alveopalatal
region are shown by the dashed lines. The corresponding trajectories obtained with
the parametric model proposed here are shown by the solid lines. Since the parametric
model is derived from the log-area function, the approximation is particularly good
for small areas, which are critical from the acoustic point of view.
[Figure 3.5 panels: "Log-Area Eigenvalues" (eigenvalue versus eigenvalue number, with 5% and 1% thresholds) and "Log-Area Eigenvalue Sum" (eigenvalue sum versus eigenvalue number, with 90%, 95% and 99% thresholds).]
Figure 3.5: Eigenvalues of the log-area covariance matrix.
[3] All P vectors contained in the corpus were used to compute $\mathbf{C}_y$. The possibility of not
including a few vectors in the estimation of $\mathbf{C}_y$, and using them only in the tests carried out
in Chapter 5, was considered. However, since by doing so the entries of $\mathbf{U}_y$ varied less than 5%,
we opted for using the whole corpus to derive the model as well as to test it. Although this
procedure is, admittedly, not the most rigorous, it gives us an extensive base to analyze the model.
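The principal component procedure of Eqs. (3.12) to (3.15) can be sketched as follows. The data matrix is assumed to hold one log-area vector (plus length) per row; the function names are hypothetical, and `numpy.linalg.eigh` is used here in place of the factorization cited in the text, which gives the same eigenvectors for a symmetric covariance matrix.

```python
import numpy as np


def fit_log_area_pca(vectors, n_components=5):
    """Mean vector, truncated eigenvector matrix and all eigenvalues.

    vectors : array of shape (P, K + 1), one corpus vector per row
    """
    vectors = np.asarray(vectors, dtype=float)
    mean = vectors.mean(axis=0)

    # Covariance with the 1 / (P - 1) normalization of Eq. (3.12).
    covariance = np.cov(vectors, rowvar=False)

    # eigh returns eigenvalues in ascending order; sort them decreasingly.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    return mean, eigenvectors[:, :n_components], eigenvalues


def encode(vector, mean, basis):
    """Principal components of one vector, Eq. (3.15)."""
    return basis.T @ (np.asarray(vector, dtype=float) - mean)


def decode(components, mean, basis):
    """Approximate vector rebuilt from its components, Eq. (3.14)."""
    return basis @ np.asarray(components, dtype=float) + mean
```

The fraction of variance "explained" by the first N components is the sum of the N largest eigenvalues divided by the sum of all of them, which is how the 92% figure quoted above would be obtained for the actual corpus.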
Dimensionality and degrees of freedom

Summarizing, it was shown that vocal-tract log-area vectors can be efficiently represented
in an N = 5 dimensional articulatory space. Here, it is interesting to note that
most articulatory models are expressed by seven to nine components. This happens
because their formulation is oriented to the speech production direct problem. In
that case, it is important to consider the number of degrees of freedom of the vocal
apparatus, which is usually larger than the dimension of the articulatory space.
Acoustic Space
To each log-area vector there exist one, and only one, set of formant frequencies
associated with it. Here, the set composed by the rst three formant log-frequencies4
will be called a formant vector, and the space formed by all formant vectors that can
be generated by the vocal-tract will be called the acoustic space.
By performing an eigenvalue decomposition on the covariance matrix of the for-
mant vectors (in log-scale) derived from the P = 519 log-area vectors of the corpus,
the following eigenvalues were found
[159 94 9] 10 4 ;
4 It
was veried a higher degree of linearity in the mapping between articulatory and acoustic
spaces when formant frequencies were represented in log-scale.
Eigenvector #1 Eigenvector #4
0.5 0.5
0 0
−0.5 Sqrt(Eigenvalue) = 2.66 −0.5 Sqrt(Eigenvalue) = 0.89
Eigenvector #2 Eigenvector #5
0.5 0.5
0 0
−0.5 Sqrt(Eigenvalue) = 1.94 −0.5 Sqrt(Eigenvalue) = 0.84
Eigenvector #3 Glottis−−−−−>Lips−>Length
0.5
0
−0.5 Sqrt(Eigenvalue) = 1.48
Glottis−−−−−>Lips−>Length
Figure 3.6: Eigenvectors corresponding to the rst 5 eigenvalues obtained from the de-
composition of the log-area covariance matrix. All eigenvectors are normalized to have
unit Euclidean norm. The rst K = 32 components correspond to the log-area along
the tract; and the last component corresponds to the tract length. The corresponding
eigenvalue square root is given as a reference to the \importance" of each eigenvector.
Area Function: Vowel /i/
3 terms
Area (cm2)
0
0 5 10 15
5 terms
Area (cm2)
0
0 5 10 15
Distance from Glottis (cm)
Figure 3.7: Area function approximations by Fourier cosine series expansion (dashed
line), and by statistically optimum eigenvalue expansion (solid line). The thick solid line
shows the original area. Above: expansion with 3 components. Below: expansion with 5
components.
[Figure 3.8 panels, plotted against time (s): vocal-tract length trajectory (cm), mean
difference 0.07%, standard deviation 0.23%; lip area trajectory (cm2), mean difference 3%,
standard deviation 14%; alveopalatal area trajectory (cm2), mean difference -7%, standard
deviation 7%.]
Figure 3.8: Vocal-tract length, lip area, and alveopalatal area trajectories along the
sentence (in French): Ma chemise est roussie. The dashed lines show the original
measured trajectories, while the solid lines show the trajectories parametrized by the
model proposed here. For each case, the mean and the standard deviation values of the
relative difference (in percentage) between parametrized and original trajectories are also
shown.
the normalized eigenvectors being given by the columns of the matrix
\[
\begin{bmatrix}
0.933 & 0.360 & 0.006 \\
0.333 & 0.870 & 0.364 \\
0.137 & 0.338 & 0.931
\end{bmatrix}. \tag{3.16}
\]
It means that more than 96% of the total variance can be explained by the first two
eigenvalues. For this reason, the possibility of representing the acoustic space in two
dimensions was considered. However, since the acoustic information associated with
the third eigenvalue can be important for the inverse problem, it was decided to use
the first three formant log-frequencies to parametrize a three-dimensional acoustic
space.
Principal components and formant frequencies

Unlike Fourier cosine coefficients, there is not a one-to-one mapping between a subspace
determined by a subset of the principal components and the space determined
by the formant log-frequencies. Such a mapping is a necessary condition for the
solution of the inverse problem described in the next chapter. This obstacle can be
overcome by performing appropriate transformations on both principal components
of the log-area function and formant log-frequencies.
3.2.2 Independent component analysis

The objective of this section is to perform linear transformations on the coordinate
systems of both articulatory and acoustic spaces, so that the components of each
space become as independent as possible. The final objective is to find a mapping of
the articulatory space onto the acoustic space, where each component of the acoustic
space is mainly determined by one, and only one, component of the articulatory space.
Also, each component of the articulatory space must have major influence on at most
one component of the acoustic space. In order to attain this objective, a necessary
condition is that the components of each space be as independent as possible.
Articulatory Space
The first step is to find how the articulatory space, defined in the last section, maps
onto the acoustic space. To reach this target, firstly, the hyperrectangle defined by the
maximum and minimum values of each of the N = 5 components of the parametrized
corpus is "filled" with Q' = 30,000 uniformly distributed points.⁵ Fig. 3.9a illustrates
this operation by showing the projection on the subspace defined by $\alpha_1$ and $\alpha_2$.
However, not all the points in the hyperrectangle correspond to realistic vocal-tract
areas. For this reason, all points in the hyperrectangle that correspond to areas outside
the limits defined by the P = 519 areas present in the corpus described in Section 2.1
are discarded.⁶ The remaining Q = 7,285 points are shown in Fig. 3.9b. After that,
the independent component analysis method proposed by Bell and Sejnowski (1995)
is applied to these points to find a linear transformation
($T_{\alpha\beta}: \mathbb{R}^5 \to \mathbb{R}^5$) that changes
the coordinate system of the articulatory space into a system with statistically "less
dependent" components. (The term "less dependent" is used because, in the present
case, a simple linear transformation is not enough to obtain a complete decomposition
into independent components.) Mathematically, this transformation is written as
\[ \beta = T_{\alpha\beta} (\alpha - \mu_\alpha), \tag{3.17} \]
where $\mu_\alpha$ is the mean of the Q = 7,285 vectors $\alpha$ generated to "fill" the
articulatory space. Fig. 3.9c shows the same points shown in Fig. 3.9b, now plotted in the new
coordinate system.
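The filling-and-discarding step can be sketched as follows. This is only an illustration
under stated assumptions: the corpus, the orthonormal basis `U_y`, and the mean `mu_y` are
synthetic stand-ins, `fill_and_filter` is a hypothetical helper name, and the log-area limits
are taken as the componentwise minima and maxima of the synthetic corpus.

```python
import numpy as np

def fill_and_filter(alpha_corpus, to_log_area, y_min, y_max, n_points=30_000, seed=0):
    """Fill the bounding box of the corpus parameters with uniformly distributed
    points and keep those whose log-area vectors stay within [y_min, y_max]."""
    rng = np.random.default_rng(seed)
    lo, hi = alpha_corpus.min(axis=0), alpha_corpus.max(axis=0)
    # Uniform points in the N-dimensional hyperrectangle (one point per row).
    points = rng.uniform(lo, hi, size=(n_points, alpha_corpus.shape[1]))
    y = to_log_area(points)  # one log-area vector per row
    inside = np.all((y >= y_min) & (y <= y_max), axis=1)
    return points[inside]

# Synthetic stand-ins (assumptions, not the corpus used in the study).
rng = np.random.default_rng(1)
U_y = np.linalg.qr(rng.normal(size=(33, 5)))[0]  # 33 x 5, orthonormal columns
mu_y = np.zeros(33)
alpha_corpus = rng.normal(size=(519, 5))         # 519 parametrized frames

def to_log_area(alpha):
    """Map parameter vectors (rows) to log-area vectors (rows)."""
    return alpha @ U_y.T + mu_y

y_corpus = to_log_area(alpha_corpus)
kept = fill_and_filter(alpha_corpus, to_log_area,
                       y_corpus.min(axis=0), y_corpus.max(axis=0))
print(len(kept), "of 30000 points kept")
```

The fraction of points that survives depends entirely on the synthetic data, so it should
not be compared with the 7,285 points reported in the text.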
Acoustic Space
For a given point $\beta$ in the articulatory space, it is possible to find the corresponding
log-area vector y using the following inverse transformation
\[ y = U_y (T_{\alpha\beta}^{-1} \beta + \mu_\alpha) + \mu_y. \tag{3.18} \]
Then, using the wave propagation model described in Section 2.2.3, it is possible
to calculate the formant vector f formed by the first three formant log-frequencies
associated with y and, consequently, with $\beta$:
\[ f = f(\beta). \tag{3.19} \]
⁵ 30,000 was chosen arbitrarily as a number sufficiently large to characterize a uniform
distribution in a five-dimensional space.
⁶ All points associated with log-area vectors containing values outside the limits defined by the
maximal and minimal values of each component of the corpus are discarded. (More details are given
in Appendix A.)
[Figure 3.9 panels: (a) Parametric Space, axes Alpha 1 and Alpha 2; (b) Log-Area Projected
Space, axes Alpha 1 and Alpha 2; (c) "Less Dependent" Components, axes Beta 1 and Beta 2.]
Figure 3.9: (a) Parametric subspace determined by the first two components of $\alpha$. (b)
Points corresponding to realistic area functions (articulatory space). (c) The same points
shown in a coordinate system with "less dependent" components.
This procedure was carried out for all Q = 7,285 points shown in Fig. 3.9c. The
corresponding formant log-frequency normalized histograms, which are approximations
for the probability density functions, are shown in Fig. 3.10a, while the scattering on
the plane defined by $f_1$ and $f_2$ is shown in Fig. 3.10c. After that, the independent
component analysis (ICA) method described in Bell and Sejnowski (1995) was used
to find a linear transformation ($T_{fg}: \mathbb{R}^3 \to \mathbb{R}^3$) that changes the coordinate system
defined by the formant log-frequencies into a system with "less dependent" variables.
This transformation can be written as
\[ g = T_{fg} (f - \mu_f), \tag{3.20} \]
where $\mu_f$ is the mean of the logarithm of the Q = 7,285 formant vectors available.
The normalized histograms obtained for the components of g are shown in Fig. 3.10b,
and the scattering of the first two components of g is shown in Fig. 3.10d.
At this point, g and $\beta$ define respectively acoustic and articulatory vector variables
whose components are more independent than the components of f and $\alpha$. The next
step is to model the relationship between acoustic and articulatory spaces.
Before continuing, it is worthwhile to write some lines about the independent
component analysis (ICA) technique used here. The ICA problem consists of finding a
linear transformation which, when applied to a given ensemble of random vectors,
transforms it into an ensemble of vectors whose components are statistically
independent, in an ideal case; or as independent as possible, in practical cases. The
theoretical background of the problem is very well described in Comon (1994). The
approach described in Bell and Sejnowski (1995) (and used in this paper) is based
on entropy maximization which, under appropriate conditions, implies mutual
information minimization, and consequent independence maximization. The method was
originally used to solve the problem of blind separation of mixed sound sources, but
has a potentially larger range of applications.
3.2.3 Singular Value Decomposition

In this section, the mapping from $\beta$ onto g is approximated by a linear transformation
($T_{\beta g}: \mathbb{R}^5 \to \mathbb{R}^3$) as follows
\[ g \simeq T_{\beta g} \beta. \tag{3.21} \]
[Figure 3.10 panels: (a) Formant Dist., normalized histograms (number of samples / all
samples) of log(f1), log(f2), and log(f3), horizontal axis log[Frequency (Hz)]; (b) ICA,
normalized histograms of g1, g2, and g3, horizontal axis in normalized frequency units;
(c) log(f1)-log(f2) plane; (d) g1-g2 plane.]
Figure 3.10: (a) Normalized histograms of the first 3 formant log-frequencies corresponding
to an articulatory space filled with approximately uniformly distributed points. (b)
Histograms of the variables obtained after independent component analysis (ICA) of the
formant log-frequencies. (c) and (d) Scatter plots of the first 2 variables shown in (a)
and (b), respectively.
In such a case, once there is an ensemble of vectors g and $\beta$ available, a minimum
mean square error (MMSE) procedure can be used to estimate $T_{\beta g}$, yielding
\[ T_{\beta g} = G B^t (B B^t)^{-1}, \tag{3.22} \]
with
\[ G = [g_1 \; \dots \; g_Q], \tag{3.23} \]
and
\[ B = [\beta_1 \; \dots \; \beta_Q]. \tag{3.24} \]
In the above equations, Q = 7,285 is the number of points present in the ensembles.
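A minimal numerical sketch of this least-squares estimate is given below. The data are
synthetic (a hypothetical "true" 3 x 5 map plus small noise), and the variable names are
chosen for illustration only. The closed form with an explicit inverse is compared with
`numpy.linalg.lstsq`, which solves the same problem without forming the inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
Q = 2000
B = rng.normal(size=(5, Q))        # articulatory vectors stored as columns (synthetic)
T_true = rng.normal(size=(3, 5))   # hypothetical linear map used to generate the data
G = T_true @ B + 0.01 * rng.normal(size=(3, Q))  # acoustic vectors as columns

# Closed-form minimum mean square error estimate: T = G B^t (B B^t)^{-1}.
T_normal = G @ B.T @ np.linalg.inv(B @ B.T)

# Equivalent least-squares solution of B^t T^t = G^t.
T_lstsq = np.linalg.lstsq(B.T, G.T, rcond=None)[0].T

print(np.max(np.abs(T_normal - T_lstsq)))
```

Both estimates agree to rounding error here; the `lstsq` form is usually preferable when
the matrix of articulatory vectors is poorly conditioned.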
Once $T_{\beta g}$ is determined, a singular value decomposition procedure (Horn, 1985,
pp. 411-455) can be used to find rotations of the acoustic (g) and articulatory ($\beta$)
coordinate systems, so that each of the first three components of the articulatory
space has major influence on one, and only one, component of the acoustic space.
The singular value decomposition of $T_{\beta g}$ yields
\[ T_{\beta g} = U_{hg} S_{\gamma h} U_{\gamma\beta}^t, \tag{3.25} \]
where $U_{hg}$ is a unitary matrix containing the normalized eigenvectors of
$T_{\beta g} T_{\beta g}^t$, $U_{\gamma\beta}$ is a unitary matrix containing the normalized
eigenvectors of $T_{\beta g}^t T_{\beta g}$, and $S_{\gamma h}$ is a 3 × 5 matrix whose first
3 columns define a diagonal matrix containing the square roots of the eigenvalues of
$T_{\beta g} T_{\beta g}^t$, and the elements of the last two columns are all
equal to zero. Now, since the multiplication of a unitary matrix by a vector represents
a rotation of this vector,
\[ \gamma = U_{\gamma\beta}^t \beta \tag{3.26} \]
and
\[ h = U_{hg}^t g \tag{3.27} \]
define, respectively, "rotated" articulatory and acoustic variables. Vectors $\gamma$ and h
define parametric representations for log-area vectors y and formant log-frequency
vectors f. The relation between y and $\gamma$ is obtained straightforwardly from equations
(3.14), (3.15), (3.17), and (3.26); while the relation between f and h is obtained from
equations (3.20) and (3.27), yielding
\begin{align}
\gamma &= T_{y\gamma} (y - \mu'_y), \tag{3.28} \\
y &\simeq T_{\gamma y} \gamma + \mu'_y, \tag{3.29}
\end{align}
where
\begin{align}
T_{y\gamma} &= U_{\gamma\beta}^t T_{\alpha\beta} U_y^t, \tag{3.30} \\
T_{\gamma y} &= U_y T_{\alpha\beta}^{-1} U_{\gamma\beta}, \tag{3.31} \\
\mu'_y &= \mu_y + U_y \mu_\alpha, \tag{3.32}
\end{align}
and
\begin{align}
h &= T_{fh} (f - \mu_f), \tag{3.33} \\
f &= T_{fh}^{-1} h + \mu_f, \tag{3.34}
\end{align}
where
\[ T_{fh} = U_{hg}^t T_{fg}. \tag{3.35} \]
The basis vectors contained in the rows of $T_{y\gamma}$ as well as $\mu'_y$ are plotted in Fig. 3.11.
The numerical values are given in Appendix A.
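The effect of the two rotations can be checked numerically. The sketch below uses a random
3 x 5 matrix as a stand-in for the estimated linear map (an assumption made for
illustration) and verifies that, after rotation, each of the three acoustic components is
a scaled copy of exactly one articulatory component.

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.normal(size=(3, 5))        # stand-in for the estimated 3 x 5 linear map

# Full SVD: T = U_h @ S @ Vt, with U_h (3 x 3) and Vt (5 x 5) orthogonal.
U_h, s, Vt = np.linalg.svd(T, full_matrices=True)
S = np.zeros((3, 5))
S[:, :3] = np.diag(s)              # last two columns remain zero

beta = rng.normal(size=5)          # an articulatory vector in the unrotated system
g = T @ beta                       # its (linear) acoustic image

gamma = Vt @ beta                  # rotated articulatory variables
h = U_h.T @ g                      # rotated acoustic variables

# In the rotated systems the map is diagonal: h_k = s_k * gamma_k for k = 1, 2, 3.
print(h, s * gamma[:3])
```

Note that `numpy.linalg.svd` returns the transposed right factor directly, so no extra
transpose is needed to obtain the rotated articulatory vector.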
The matrix of correlation coefficients (Papoulis, 1991, p. 152) between $\gamma$ and h can
be estimated by
\[ R_{\gamma h} = \frac{(Q-1)^{-1} H \Gamma^t}{\sigma_h \sigma_\gamma^t}, \tag{3.36} \]
where
\begin{align}
H &= [h_1 \; \dots \; h_Q], \tag{3.37} \\
\Gamma &= [\gamma_1 \; \dots \; \gamma_Q], \tag{3.38}
\end{align}
$\sigma_h$ and $\sigma_\gamma$ are the column vectors containing respectively the standard
deviations of h and $\gamma$, Q = 7,285 is the number of points present in the ensembles, and
the division of $H \Gamma^t$ by $\sigma_h \sigma_\gamma^t$ is performed element-wise. The
numerical result obtained is shown below
\[
R_{\gamma h} =
\begin{bmatrix}
0.939 & 0.003 & 0.005 & 0.004 & 0.004 \\
0.003 & 0.953 & 0.003 & 0.004 & 0.002 \\
0.002 & 0.001 & 0.461 & 0.001 & 0.002
\end{bmatrix}.
\]
This matrix shows that there exists a high degree of correlation between the first
two acoustic components and the first two articulatory components. There is also
a non-negligible degree of correlation between the third acoustic and articulatory
components. All other correlation coefficients are very small.
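The element-wise construction of such a correlation matrix can be sketched as follows.
The function name and the synthetic data are illustrative assumptions. The means are
removed explicitly, which has no effect when the variables are already zero-mean, and the
result is compared with `numpy.corrcoef` as a sanity check.

```python
import numpy as np

def correlation_matrix(H, Gam):
    """Correlation coefficients between the rows of H (M x Q) and of Gam (N x Q)."""
    Q = H.shape[1]
    Hc = H - H.mean(axis=1, keepdims=True)
    Gc = Gam - Gam.mean(axis=1, keepdims=True)
    cov = Hc @ Gc.T / (Q - 1)                  # M x N cross-covariance
    sigma_h = Hc.std(axis=1, ddof=1)           # standard deviations of the rows of H
    sigma_g = Gc.std(axis=1, ddof=1)           # standard deviations of the rows of Gam
    return cov / np.outer(sigma_h, sigma_g)    # element-wise division

# Synthetic example: three acoustic variables driven mainly by the first three
# of five articulatory variables, plus noise.
rng = np.random.default_rng(0)
Gam = rng.normal(size=(5, 2000))
H = np.diag([0.9, 0.9, 0.5]) @ Gam[:3] + 0.3 * rng.normal(size=(3, 2000))
R = correlation_matrix(H, Gam)
print(np.round(R, 3))
```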
[Figure 3.11 panels: Basis Vectors #1 to #5 and Mean Vector, plotted from glottis to lips,
followed by the length component.]
Figure 3.11: Basis vectors (rows of $T_{y\gamma}$) and mean vector ($\mu_y$) used to represent log-area
vectors (y). Units: the first K = 32 components are log-areas along the tract expressed
in log(cm2). The last component is the tract length expressed in normalized units (1 unit
= 0.53 cm for the basis vectors and 1 unit = 20 × 0.53 cm for the mean vector).
At this point, in order to see the importance of the independent component analysis
described in Section 3.2, it is interesting to compare $R_{\gamma h}$ with the matrix of
correlation coefficients obtained when f and $\alpha$ are used in place of g and $\beta$ to obtain
h and $\gamma$, as done in Yehia and Itakura (1995b) and in Yehia et al. (1995). The result
is shown below
\[
R_{\alpha f} =
\begin{bmatrix}
0.944 & 0.270 & 0.206 & 0.308 & 0.035 \\
0.270 & 0.944 & 0.266 & 0.258 & 0.183 \\
0.112 & 0.142 & 0.511 & 0.069 & 0.184
\end{bmatrix}.
\]
Note that, although the correlation between the acoustic components and the
corresponding first three articulatory components continues to exist, the other correlation
coefficients are not negligible any more.
It should be pointed out, however, that a lack of correlation does not imply
independence. This fact is illustrated in Fig. 3.12, where scatterings representing the joint
cross-distributions of the components of h and of $\gamma$ are plotted. There exists, for
example, an apparent nonlinear relation between $h_3$ and $\gamma_1$. This kind of dependence
cannot be well approximated by the linear transformation used in this work to model
the mapping from the articulatory space onto the acoustic space. In spite of these
limitations, the model successfully extracted two acoustic variables, namely $h_1$ and
$h_2$, which depend approximately linearly on two, and only two, articulatory variables,
namely $\gamma_1$ and $\gamma_2$. The remaining articulatory components, $\gamma_3$, $\gamma_4$, and $\gamma_5$, have little
influence on $h_1$ and $h_2$. Moreover, $\gamma_2$ has little effect on $h_1$, and the influence of $\gamma_1$
on $h_2$ does not affect the one-to-one relationship between $\gamma_2$ and $h_2$. These facts are
illustrated in Fig. 3.13.
Once the parametric model is derived, and its basic characteristics are analyzed, it
is interesting to compare articulatory and acoustic component trajectories for a given
sequence of vocal-tract shapes. The trajectories associated with the French sentence
"Ma chemise est roussie" are shown in Fig. 3.14. It is possible to observe that the
first two articulatory components are indeed closely related to the first two acoustic
components. It is also possible to see that there exist some similarities between $h_3$
and $h_1$, indicating that they are not independent.
3.3 Temporal Analysis

[Figure 3.12 panels: Articulatory-Acoustic Scatterings of h1, h2, and h3 against Gamma1 to
Gamma5.]
Figure 3.12: Scatterings representing the joint distributions of the components of the
acoustic variable h and the components of the articulatory variable $\gamma$. Note the high
correlation between $\gamma_1$ and $h_1$, and between $\gamma_2$ and $h_2$. See also the nonlinear relation
between $\gamma_1$ and $h_3$.
[Figure 3.13 panels: First Acoustic Component (h1) and Second Acoustic Component (h2) as
surfaces over Gamma1 and Gamma2.]
Figure 3.13: First two acoustic components ($h_1$ and $h_2$) expressed as functions of the
first two articulatory components ($\gamma_1$ and $\gamma_2$), when all other components ($\gamma_3$, $\gamma_4$, and $\gamma_5$)
are equal to zero. Note that $h_1$ is almost independent of $\gamma_2$, and that there are one-to-one
relationships between $h_1$ and $\gamma_1$, and between $h_2$ and $\gamma_2$.
[Figure 3.14 panels: Articulatory Trajectories (Gamma1 to Gamma5) and Acoustic Trajectories
(h1 to h3), plotted against time (s).]
Figure 3.14: Articulatory and acoustic component trajectories along the sentence (in
French): Ma chemise est roussie. Note the similarity between the first two articulatory
trajectories and the first two acoustic trajectories. (The dashed lines in the acoustic
trajectories indicate the intervals where the formants cannot be reliably extracted from
the speech signal due to very narrow constrictions in the area function.)
Up to this point, each of the P = 519 log-area vectors present in the corpus was
parametrized by N (N = 5 in the statistical analysis, and N = 9 in the Fourier
analysis) coefficients, which contain also vocal-tract length information. In this section,
the objective is to take sequences of p (e.g. p = 10) frames contained in a sentence,
and represent them with fewer than pN parameters. The procedure below is carried
out for sequences of $\gamma$ parameters, but the same method can be applied to $a$ (Fourier)
parameters as well. The parametrization is as follows. Representing sequences of p
frames by
\[ \Gamma_i = [\gamma_i, \gamma_{i+1}, \dots, \gamma_{i+p-1}], \tag{3.39} \]
it is possible to compute the "covariance"⁷
\[ C_\Gamma = (P'-1)^{-1} \sum_{i=1}^{P'} \Gamma_i^t \Gamma_i, \tag{3.40} \]
where P' is the number of sequences of length p contained in the corpus. Then,
following the same method used for log-area statistical parametrization, it is possible
to express $C_\Gamma$ as (Takagi's factorization)
\[ C_\Gamma = V \Lambda V^t, \tag{3.41} \]
where $\Lambda$ is a diagonal matrix containing the eigenvalues of $C_\Gamma$ in decreasing order,
and the columns of V are the corresponding normalized eigenvectors. $\Gamma_i$ can then be
approximated by
\[ \Gamma_i \simeq \Delta_i \tilde{V}^t, \tag{3.42} \]
with $\Delta_i$ given by
\[ \Delta_i = \Gamma_i \tilde{V}, \tag{3.43} \]
where $\tilde{V}$ is the matrix containing the first q columns of V (i.e. the normalized
eigenvectors corresponding to the q largest eigenvalues of $C_\Gamma$). The components of
$\Delta_i$ are orthogonal in the sense that
\[ E[\Delta_i^t \Delta_i] = E[\tilde{V}^t \Gamma_i^t \Gamma_i \tilde{V}] = \tilde{V}^t C_\Gamma \tilde{V} = \Lambda_q, \]
E[·] denoting expected value, and $\Lambda_q$ being the diagonal matrix containing the q
largest eigenvalues of $C_\Gamma$ in decreasing order.
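The temporal parametrization can be illustrated with a short sketch. It assumes that each
sequence is stored as an N x p matrix whose columns are consecutive parameter vectors, so
that the "covariance" is a p x p matrix, and it uses synthetic random-walk trajectories in
place of the corpus. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p, q, n_frames = 5, 10, 4, 300

# Synthetic smooth trajectories standing in for the articulatory parameters (N x n_frames).
traj = np.cumsum(rng.normal(size=(N, n_frames)), axis=1)
traj -= traj.mean(axis=1, keepdims=True)

# All sequences of p consecutive frames, each stored as an N x p matrix.
windows = [traj[:, i:i + p] for i in range(n_frames - p + 1)]
n_seq = len(windows)

# p x p "covariance" between frame positions, accumulated over all sequences.
C = sum(W.T @ W for W in windows) / (n_seq - 1)

# Eigendecomposition with eigenvalues sorted in decreasing order.
eigval, V = np.linalg.eigh(C)
order = np.argsort(eigval)[::-1]
eigval, V = eigval[order], V[:, order]

V_q = V[:, :q]                        # first q eigenvectors (p x q)
coeffs = [W @ V_q for W in windows]   # N x q temporal coefficients per sequence
recon = [D @ V_q.T for D in coeffs]   # rank-q approximation of each sequence

rel_err = sum(np.sum((W - R) ** 2) for W, R in zip(windows, recon)) / sum(
    np.sum(W ** 2) for W in windows)
print(f"relative squared error with q = {q}: {rel_err:.4f}")
```

The relative squared reconstruction error equals the sum of the discarded eigenvalues
divided by the sum of all eigenvalues, which gives a direct way of choosing q.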
Thus, a sequence of p log-area vectors
\[
Y_i =
\begin{bmatrix}
y_{1,i} & \dots & y_{1,i+p-1} \\
\vdots & & \vdots \\
y_{K+1,i} & \dots & y_{K+1,i+p-1}
\end{bmatrix} \tag{3.44}
\]
containing p(K+1) elements (e.g. p(K+1) = 10 × 33 = 330) can be approximately
represented by a matrix
\[
\Delta_i =
\begin{bmatrix}
\delta_{11i} & \dots & \delta_{1qi} \\
\vdots & & \vdots \\
\delta_{N1i} & \dots & \delta_{Nqi}
\end{bmatrix} \tag{3.45}
\]
containing qN elements (e.g. qN = 4 × 5 = 20), with
\begin{align}
\Delta_i &= T_{y\gamma} (Y_i - \mu'_y) \tilde{V}, \tag{3.46} \\
Y_i &\simeq T_{\gamma y} \Delta_i \tilde{V}^t + \mu'_y. \tag{3.47}
\end{align}
⁷ "Covariance" is quoted because $\{\Gamma_i, i = 1, \dots, P'\}$ is an ensemble of matrices, and not of
vectors as it should be to define a classic covariance matrix. Nevertheless, $C_\Gamma$ contains information
about the covariance between components that parametrize the log-area function at different times.
The eigenvalues and first four normalized eigenvectors of the "covariance" matrix
$C_\Gamma$ are shown in Fig. 3.15. It can be seen that the eigenvectors have approximately
the shape of cosine functions. This indicates that a Fourier series expansion is also
appropriate to represent the temporal behavior of parametrized log-area sequences
(Yehia and Itakura, 1993a, 1994).
The representation of an area function sequence by the method described here is
illustrated in Fig. 3.16. Panel (a) shows the original sequence taken from the corpus
while panel (d) shows the corresponding first three formant trajectories. Panel (b)
shows the sequence recovered from a parametrization by N × q = 5 × 4 = 20 coefficients
obtained with the principal component analysis (PCA) described in this and in the
previous section. Finally, panel (c) shows the sequence seen in panel (a) recovered
from a two-dimensional Fourier cosine series approximation by N × q = 9 × 4 = 36
coefficients. Note the good agreement between the formant trajectories associated
with the recovered area function sequences (panels (e) and (f)). Not surprisingly, also
note that, even using considerably fewer parameters, PCA better preserves the
morphological characteristics of the vocal-tract.
Comment
Now, the vocal-tract is represented by appropriate parameters. The next task is to
estimate these parameters from formant log-frequencies. A procedure to accomplish
this task is the topic of the next chapter.
[Figure 3.15 panels: Eigenvalues of the "Covariance" Matrix (Lambda), indexed 1 to 10;
Normalized Eigenvectors #1 to #4, plotted against frame number (i) from 1 to 10.]
Figure 3.15: Eigenvalues of the "covariance" matrix of sequences of parametrized
log-area vectors, and corresponding first four eigenvectors.
[Figure 3.16 panels: (a) Original Area, (b) PCA, (c) Fourier, each showing area (cm2) as a
function of time (s) and position (cm); (d), (e), (f) Formant Frequency Trajectories,
frequency (kHz) against time (s). Maximum differences in (e): F1 7%, F2 9.8%, F3 4.7%.
Maximum differences in (f): F1 8%, F2 9.9%, F3 11%.]
Figure 3.16: (a) Sequence of area functions, taken from the corpus, corresponding to
the diphthong /ui/, uttered in the (French) sentence "Luis pense à ça." (b) Sequence
of areas reconstructed from the parametric principal component representation of the
original areas shown in (a). (c) Sequence of areas reconstructed from the parametric
Fourier representation of the original areas shown in (a). (d), (e) and (f) show formant
frequency trajectories corresponding to the sequences of areas shown in (a), (b) and (c)
respectively. The dashed lines shown in (e) and (f) are the original formant trajectories
shown in (d). For each pair of formant trajectories, the maximum relative difference (in
percentage) is also shown.
Chapter 4
The Inverse Problem
"Tangible things
become insensible
to the palm of the hand."
Carlos Drummond de Andrade (1902-1987)
Memory
Before entering the details of the speech production inverse problem, it is
interesting to analyze it from a more generic point of view: The articulatory space can be
viewed as a space whose dimension N is larger than the dimension M of the acoustic
space. The problem is then to find, among all the points in the articulatory space
that are mapped onto a given point in the acoustic space, the one that is the most
likely to occur. If such points define an articulatory subspace of dimension N − M
(see Figure 4.1), the solution for the inverse problem can be divided in two parts:
(i) mathematical description of the N − M dimensional articulatory subspaces, each
of them corresponding to one and only one point in the M dimensional acoustic
space; and (ii) formulation of a cost function to determine, for each subspace, the
point that is the most likely to occur. This is the procedure that will be described in
this chapter.
[Figure 4.1 labels: N Dimensional Articulatory Space; M Dimensional Acoustic Space;
N−M Dimensional Subspaces.]
Figure 4.1: Representation of the one-to-one relationship between the N − M dimensional
subspaces that form the N dimensional articulatory space and the points of the M
dimensional acoustic space. Compare with figures 4.2
and 4.3, where the level curves are N − M = 1 dimensional subspaces (contained in an
N = 2 dimensional articulatory space) which map onto an M = 1 dimensional acoustic
space.
The speech production inverse problem, i.e. the problem of estimating the vocal-tract
configuration from the speech signal, can be seen as a one-to-many non-linear
mapping. This mapping establishes the relationship between an articulatory space,
determined by all possible vocal-tract configurations, and an acoustic space,
determined by all possible speech signals.
The one-to-many characteristic comes from the fact that a given period of speech
can be generated by an infinite number of vocal-tract configurations.
The non-linear characteristic is inherent in the process of speech generation.
However, its degree of complexity depends on the parameters chosen to represent both
articulatory and acoustic spaces.
The objective of this chapter is to analyze a restricted case of the speech
production inverse problem, namely the estimation of the cross-sectional area along the
vocal-tract (or, for simplicity, the area function) from the corresponding formant
frequencies. For this particular case, each area function is represented by one point in
the articulatory space, while each set of formant frequencies is represented by one
point in the acoustic space.
The first difficulty in solving this problem comes from the one-to-many characteristic
of the inverse problem, i.e. each point in the acoustic space is associated with many
points in the articulatory space. To cope with this fact, two kinds of constraints
can be invoked: the first one is related to the morphology of the vocal-tract, which
determines the positions that can be reached and the effort necessary to reach them.
Under this constraint, the inverse problem can be stated in the following way: Find,
among all points in the articulatory space associated with a given point in the acoustic
space, the point that is reached with minimum effort¹ by the vocal-tract.
Morphological constraints, however, are essentially static and, therefore, cannot
account for co-articulation effects such as anticipation and retention. In order to
cope with these effects, a second kind of constraint can be invoked: it is related to
the patterns of motion of the vocal-tract, which can be called gestures, and can be
used to determine the trajectories that can be followed by the tract, and the effort
necessary to execute each of them. At this point, it is possible to expand the concept
of an articulatory space, and think about an articulatory trajectory space which is
formed by all the gestures that can be generated by the human vocal-tract. Such a
space maps onto an acoustic trajectory space which is formed by all trajectories that
can be generated by the vocal-tract in the acoustic space. Under this point of view,
the inverse problem can be restated as: Find, among all points in the articulatory
trajectory space associated with a given point in the acoustic trajectory space, the
point that is produced with minimum effort by the vocal-tract.
¹ In the case of human motor behavior, minimum effort is a concept difficult to specify. In reality
it is a combination of factors which come from the command generation level in the brain down
to articulatory motion under physiological constraints. The effort function used here is a simple
quadratic cost that takes into account vocal-tract morphological information.
4.1 Isolated frames

In this section, the mathematical formulation used to represent morphological
constraints is described for the case of isolated frames. In the following section the
method is generalized for the case of trajectories.
The procedure can be divided in two parts: first, a mathematical representation
for the "cost" of a given position of the vocal-tract is derived. After that, this cost
function is minimized under the acoustic constraint determined by a given set of
4.1.1 Representing Morphological Constraints

In order to keep a link with the works developed by Schroeder (1967) and Mermelstein
(1967), we start the explanation about morphological constraint representation using
the truncated Fourier cosine series parametrization of the log-area function. There
are many sets of Fourier cosine coefficients that are associated with the same set
of formant frequencies (Mermelstein, 1967; Atal et al., 1978). For this reason, it
is necessary to impose constraints if we wish to estimate the area function from
the formants. As an example, the thick solid lines shown in the bottom panel of
Figure 4.2 represent level curves of the surface shown in the top left panel, which
shows the first formant frequency (F1 in kHz) when the vocal-tract cross-sectional
area is represented by
\[ A_k = \exp\left[ a_1 \cos\frac{\pi}{K}\left(k - \tfrac{1}{2}\right) + a_2 \cos\frac{2\pi}{K}\left(k - \tfrac{1}{2}\right) \right], \quad L = 17\ \text{cm}. \tag{4.1} \]
It is seen that, for a given value of F1, there is an infinite number of combinations of
$a_1$ and $a_2$ associated with it. Another important fact is that, for a given $a_2$ there is
one and only one $a_1$ associated with a given formant frequency.
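The two-coefficient area function can be evaluated directly. The sketch below reads the
expression for the area of section k as exp[a1 cos(π(k − 1/2)/K) + a2 cos(2π(k − 1/2)/K)],
with k = 1, ..., K, and assumes K = 32 sections, the number of log-area components used
elsewhere in the text. The function name is illustrative, and the computation of formants
from the areas is not attempted here.

```python
import numpy as np

def area_function(a1, a2, K=32):
    """Cross-sectional areas A_k, k = 1..K, from two Fourier cosine coefficients
    of the log-area function, sampled at the section midpoints."""
    k = np.arange(1, K + 1)
    x = (k - 0.5) / K  # normalized midpoint of section k, between 0 and 1
    return np.exp(a1 * np.cos(np.pi * x) + a2 * np.cos(2.0 * np.pi * x))

areas = area_function(1.0, -0.5)
print(areas.min(), areas.max())
```

With both coefficients equal to zero the tract is uniform (unit area in every section).
The first cosine term is antisymmetric about the middle of the tract and the second is
symmetric, which is why the two coefficients shape the area function so differently.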
The same property is observed when $\gamma$ and h are used to parametrize articulatory
and acoustic spaces. This fact is illustrated in the bottom panel of Figure 4.3, where
the thick solid lines show level curves of $h_1$ expressed as a function of $\gamma_1$ and $\gamma_2$ when
all other components of $\gamma$ are equal to zero. It is seen that each point $h_1$ is associated
with a line in the plane defined by $\gamma_1$ and $\gamma_2$.
From now on, since $\gamma$ and h allow a more efficient parametrization than a and
f, most of the mathematical procedures carried out here will be based on $\gamma$ and h.
[Figure 4.2 panels: First Formant, F1 (kHz) as a surface over a1 and a2; Cost Function,
cost as a surface over a1 and a2; Level Curves of F1 and of Cost Functions in the a1-a2
plane.]
Figure 4.2: Top left: the first formant F1 as a function of the Fourier cosine coefficients
$a_1$ and $a_2$, when all other coefficients are equal to zero. Top right: paraboloidal surface
representing the cost function $P_a = a^t H_a a$ used to quantify the vocal-tract effort.
Bottom: The solid thick lines show level curves of the surface shown in the top left panel.
(Compare with the general case in Fig. 4.1.) The solid thin ellipses show level curves of
the cost function shown in the top right panel. The dashed circles represent the particular
case when $P_y$ is an unweighted squared Euclidean distance (i.e. when $H_y$ is an identity
matrix.)
[Figure 4.3 panels: First Acoustic Component, h1 as a surface over γ1 and γ2; Cost
Function, cost as a surface over γ1 and γ2; Level Curves of h1 and of Cost Functions in
the γ1-γ2 plane.]
Figure 4.3: Top left: the first acoustic component $h_1$ as a function of the principal
component coefficients $\gamma_1$ and $\gamma_2$, when all other coefficients are equal to zero. Top right:
paraboloidal surface representing the cost function $P_\gamma = \gamma^t H_\gamma \gamma$ used to quantify the
vocal-tract effort. Bottom: The solid thick lines show level curves of the surface shown
in the top left panel. (Compare with the general case in Fig. 4.1.) The solid thin ellipses
show level curves of the cost function shown in the top right panel. The dashed ellipses
represent the particular case when $P_y$ is an unweighted squared Euclidean distance (i.e.
when $H_y$ is an identity matrix.)
Nevertheless, the counterpart procedures based on a and f can be obtained basically
by substituting a for $\gamma$ and f for h.
Coming back to the topic of mathematical representation of morphological
constraints, the (static) constraint considered here is based on the following optimization
problem: Given a vector of acoustic variables h, find, among all possible vectors
of articulatory variables $\gamma$ associated with h, the one that is the "closest" to the
"minimum effort position." This position can be given by the neutral or the average
position of the vocal-tract: it is reasonable to assume that the neutral vowel position
corresponds to the minimum effort position of the tract, since no active articulation is
being performed. It is also possible to think that the minimum effort position
corresponds to the average position of the tract. If the neutral position is mathematically
interpreted as the point of maximum probability density in the articulatory space,
then it coincides with the average position if the probability density function of the
points in the articulatory space is symmetric, but may not coincide in other cases.
The neutral position seems to be more meaningful, but the average position is more
tractable from the mathematical point of view. This point will be addressed again
in due course.
A mathematical formulation for a morphological constraint can be carried out in
the following way: a log-area vector y can be efficiently parametrized by a vector $\gamma$,
as already seen in the last chapter (Eq. (3.29)),
\[ y \simeq T_{\gamma y} \gamma + \mu'_y. \]
If the "minimum effort position" is defined by $\mu'_y$, then a quadratic positional cost
$P_\gamma$ can be defined as
\[ P_\gamma = \gamma^t T_{\gamma y}^t H_y T_{\gamma y} \gamma, \tag{4.2} \]
which is an approximation for the quadratic form
\[ P_y = (y - \mu'_y)^t H_y (y - \mu'_y). \tag{4.3} \]
Now, making
\[ H_\gamma = T_{\gamma y}^t H_y T_{\gamma y} \tag{4.4} \]
yields
\[ P_\gamma = \gamma^t H_\gamma \gamma. \tag{4.5} \]
$H_\gamma$ is a positive definite matrix which contains information about the morphology of
the vocal-tract. It must be chosen so that natural positions of the tract result in a
low cost $P_\gamma$, while positions incompatible with the morphological characteristics of
the tract result in a high cost $P_\gamma$.
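To make the role of the quadratic cost concrete, the sketch below minimizes it under an
acoustic constraint that is assumed, purely for illustration, to be linear (h = S γ with a
hypothetical 3 x 5 matrix `S`). The true articulatory-to-acoustic mapping is non-linear, so
this is not the procedure of the study, only the closed-form solution of the linearized
problem: γ* = H⁻¹ Sᵗ (S H⁻¹ Sᵗ)⁻¹ h.

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 5, 3

A = rng.normal(size=(N, N))
H = A @ A.T + N * np.eye(N)   # symmetric positive definite weighting matrix (synthetic)
S = rng.normal(size=(M, N))   # hypothetical linearized acoustic map, h = S @ gamma
h = rng.normal(size=M)        # target acoustic vector

def cost(gamma):
    """Quadratic positional cost gamma^t H gamma."""
    return float(gamma @ H @ gamma)

# Minimum-cost solution of the constrained problem (Lagrange multipliers).
H_inv_St = np.linalg.solve(H, S.T)                        # H^{-1} S^t
gamma_star = H_inv_St @ np.linalg.solve(S @ H_inv_St, h)

# Any other feasible point differs from gamma_star by a vector in the null space of S.
null_basis = np.linalg.svd(S)[2][M:].T                    # N x (N - M)
other = gamma_star + null_basis @ rng.normal(size=N - M)

print(cost(gamma_star), cost(other))
```

The set of feasible points is an (N − M)-dimensional affine subspace, and the quadratic
cost selects a single point on it, which mirrors the two-part structure of the inverse
problem described at the beginning of the chapter.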
The simplest choice for $H_\gamma$ is obtained when $H_y$ is taken as the identity matrix
of order (K+1). In this case, $P_\gamma$ is simply the square of the Euclidean distance
between the parametrized log-areas of a given vector y and $\mu'_y$, the log-area vector
corresponding to the minimum effort position. This is, however, not a good cost
function, since it gives equal weight to flexible and rigid regions of the vocal-tract.
A geometric illustration of this case is given by the dashed ellipses shown in the
bottom panel of Fig. 4.3. However, the meaning of this unweighted case of the cost
function becomes clearer when it is represented by Fourier cosine coefficients: the
dashed circles shown in the bottom panel of Fig. 4.2 represent level curves of $P_a$
as a function of $a_1$ and $a_2$. It is possible to see that the minimal cost for a given
formant corresponds to the intersection of its level curve with the line $a_2 = 0$. It
is interesting to note that Mermelstein (1967) used basically the same mathematical
constraint: for N = 7 and M = 3, $\{a_0, a_2, a_4, a_6\}$ were kept equal to zero, while the
first M = 3 formant frequencies $\{F_1, F_2, F_3\}$ were used to determine the first M = 3
odd Fourier cosine coefficients $\{a_1, a_3, a_5\}$. (The problem of length determination
was not considered.) Also interesting is the fact that the results obtained with this
simple and rather artificial constraint are quite acceptable for some vowels, as shown
in Mermelstein (1967), and in the next chapter.
A more realistic possibility is obtained when $H_y$ is taken as a diagonal matrix in
which the elements of the diagonal associated with the rigid regions of the tract are
large, while the elements associated with the flexible regions are small. This approach,
combined with a smoothness constraint, was used by Yehia and Itakura (1993, 1994)
with reasonable results. The weak point in this approach is that, although the local
characteristics of each region of the tract are well represented, the global articulatory
structure (morphology) of the tract is not taken into account. As an example, the
cost of a given position of the tongue apical region may depend on the position of the
tongue dorsal region.
In order to represent the interdependence between different regions of the tract,
the covariance between those regions must be taken into account. From a probabilistic
point of view, given a corpus containing $Q > N$ linearly independent log-area vectors,
if $H_\psi$ is taken as the inverse of the covariance matrix of the log-area articulatory
vectors $\psi$,
\[
H_\psi = C_\psi^{-1}, \quad \text{where} \quad
C_\psi \simeq \frac{1}{Q-1} \sum_{i=1}^{Q} \psi_i \psi_i^t,
\tag{4.6}
\]
then
\[
P_\psi = \psi^t C_\psi^{-1} \psi,
\tag{4.7}
\]
i.e. $P_\psi$ becomes a squared Mahalanobis distance (Duda and Hart, 1973, pp. 23–24).
Under the rather strong assumption of normal distribution, it means that, given an
acoustic vector $h$, minimization of $P_\psi$ implies maximization of the probability of
occurrence of the corresponding $\psi$.
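As an illustration of the covariance-based cost, the following minimal sketch builds the cost matrix from a synthetic corpus and evaluates the quadratic cost for two displacements. The corpus, its dimensions and the names `corpus`, `scales` and `cost` are assumptions made for the example only; they are not taken from the corpus used in this study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical corpus: Q = 200 zero-mean articulatory vectors of dimension N = 5,
# with deliberately unequal spreads (large = flexible, small = rigid direction).
Q, N = 200, 5
scales = np.array([3.0, 2.0, 1.0, 0.5, 0.2])
corpus = rng.standard_normal((Q, N)) * scales
corpus -= corpus.mean(axis=0)

# Sample covariance C (cf. Eq. 4.6) and cost matrix H = C^{-1}.
C = corpus.T @ corpus / (Q - 1)
H = np.linalg.inv(C)

def cost(psi, H):
    """Quadratic cost psi^t H psi (cf. Eq. 4.7), a squared Mahalanobis distance."""
    return float(psi @ H @ psi)

# The same unit displacement is cheap along a flexible direction
# and expensive along a rigid one.
cost_flexible = cost(np.array([1.0, 0.0, 0.0, 0.0, 0.0]), H)
cost_rigid = cost(np.array([0.0, 0.0, 0.0, 0.0, 1.0]), H)

# Average cost over the corpus (cf. Eq. 4.8); with the 1/(Q-1) covariance
# estimate it equals N (Q - 1) / Q, i.e. approximately trace(C H) = N.
mean_cost = float(np.mean([cost(p, H) for p in corpus]))
```

With an identity cost matrix both displacements would cost the same; the covariance-based choice is what separates them.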
It is interesting to note that the same $H_\psi$ can be found by minimizing
\[
C_P = \frac{1}{Q} \sum_{i=1}^{Q} P_{\psi_i}
    = \frac{1}{Q} \sum_{i=1}^{Q} \psi_i^t H_\psi \psi_i
    = \operatorname{trace}(C_\psi H_\psi),
\tag{4.8}
\]
the average of the costs of all articulatory vectors in the corpus, with respect to the
elements of $H_\psi$, under the constraint
\[
|H_\psi| \ge |C_\psi^{-1}|.
\tag{4.9}
\]
The proof is based on the fact that the trace and the determinant are respectively
the sum and the product of the eigenvalues of $C_\psi H_\psi$. Then it is not difficult to show
that minimization of the trace under the above determinant constraint implies that
the eigenvalues must all be equal to 1 (one) and, hence, $C_\psi H_\psi$ is an identity matrix.
Since $C_\psi$ is a covariance matrix,
\[
H_\psi = C_\psi^{-1}
\]
is defined, and is a positive definite matrix.
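The omitted step can be sketched as follows. The sketch assumes that the search is restricted to symmetric positive definite cost matrices, so that the eigenvalues $\lambda_1, \ldots, \lambda_N$ of the product of the covariance matrix $C$ and the cost matrix $H$ are real and positive, and it reads the determinant constraint as $|H| \ge |C^{-1}|$, i.e. $|C|\,|H| \ge 1$. By the inequality between arithmetic and geometric means,
\[
\operatorname{trace}(C H) = \sum_{k=1}^{N} \lambda_k
\;\ge\; N \Bigl( \prod_{k=1}^{N} \lambda_k \Bigr)^{1/N}
= N \, \bigl( |C| \, |H| \bigr)^{1/N}
\;\ge\; N .
\]
The first inequality becomes an equality only when all $\lambda_k$ are equal, and the second only when their product is 1. The lower bound $N$ is therefore reached only for $\lambda_1 = \cdots = \lambda_N = 1$. Since $C H$ is similar to the symmetric matrix $C^{1/2} H C^{1/2}$, it is diagonalizable, and a diagonalizable matrix whose eigenvalues are all 1 is the identity.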

From a geometrical point of view, the level hypersurfaces of $P_\psi$ are hyperellipsoids
whose principal axes are determined by the eigenvectors of $C_\psi$, the eigenvalues
determining the length of these axes. An illustration for a two-dimensional case is given
by the ellipses shown in the bottom panel of Fig. 4.3, which represent level curves of
$P_\psi$ as a function of $\psi_1$ and $\psi_2$.
The cost function derived above is able to cope with the articulatory effects
determined by the morphology of the vocal-tract. However, it is important to note
that, in a wider sense, it cannot be considered optimal. This is because the
quadratic form adopted for $P_\psi$ implies the existence of a single minimum,
corresponding to the minimum effort position. Nevertheless, other stable positions,
corresponding to local minima, may exist and, if they are to be taken into account, a
more elaborate model is needed. Also, the probability distribution of points in the
articulatory space is not symmetric about the mean. When solving the inverse
problem, it was observed that slightly better results are obtained when the quadratic
cost function has its center translated to the point of maximum probability density
in the articulatory space (i.e. the neutral position). Finally, since $P_\psi$ is essentially
a static constraint, it cannot cope with co-articulation effects. (This point will be
analyzed later.)
As a final comment, note that, for the case of Fourier cosine components, following
the same procedure used for the principal components, the cost function is given by
\[
P_a = (a - \bar{a})^t H_a (a - \bar{a}); \qquad
H_a = C_a^{-1} = \Bigl[ \frac{1}{Q-1} \sum_{i=1}^{Q} (a_i - \bar{a})(a_i - \bar{a})^t \Bigr]^{-1}.
\tag{4.10}
\]
4.1.2 Solving the inverse problem

For a given acoustic vector $h$, it is now possible to derive a procedure to estimate its
articulatory counterpart, represented by vector $\psi$, under the morphological constraint
described above. The method is as follows.
Relationship between acoustic and articulatory variables

A variation $\Delta h$ in the acoustic vector $h$ is locally linearly related to a variation $\Delta\psi$ in the
articulatory vector $\psi$ (Mermelstein, 1967). Thus, it is reasonable to assume that,
for sufficiently small variations,2
\[
\Phi_a(\psi) \, \Delta\psi = \Delta h,
\tag{4.11}
\]
\[
h(\psi + \Delta\psi) = h(\psi) + \Phi_a(\psi) \, \Delta\psi,
\tag{4.12}
\]
where $\Phi_a$ is the Jacobian matrix that gives the partial derivatives $\partial h_j / \partial \psi_i$ for a
given $\psi$:
\[
\Phi_a(\psi) = \frac{dh}{d\psi},
\tag{4.13}
\]
which is an $M \times N$ matrix (the number of rows is equal to the number of acoustic
components $M$, and the number of columns is equal to the number of articulatory
components $N$). Figure 3.12 gives an idea of the "degree of linearity" between $h$
and $\psi$. It shows the $M = 3$ acoustic variables $h$ in terms of the $N = 5$ articulatory
variables: $\psi_1, \psi_2, \ldots, \psi_5$.
2 In strict terms, the equality holds only for infinitesimal variations. However, this relation is
approximately true even for fairly large variations, as seen in Fig. 3.12.
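There are two important facts to be noted here:3
1. There is a quasi-linear relationship between $h$ and $\psi$.
2. There is a one-to-one relationship between $[\psi_1, \psi_2, \psi_3]$ and $[h_1, h_2, h_3]$. (In fact,
what is apparent in Figure 3.12 is that $h_1$, $h_2$ and $h_3$ are monotonically increasing
functions of $\psi_1$, $\psi_2$ and $\psi_3$, respectively.)
This one-to-one relationship leads to the following speculation: taking $\boldsymbol{\psi}_1$ as the vector
formed by the first $M$ articulatory components of $\psi$, the matrix
\[
\Phi_1(\psi) = \frac{\partial h}{\partial \boldsymbol{\psi}_1} =
\begin{bmatrix}
\frac{\partial h_1}{\partial \psi_1} & \frac{\partial h_1}{\partial \psi_2} & \cdots & \frac{\partial h_1}{\partial \psi_M} \\
\frac{\partial h_2}{\partial \psi_1} & \frac{\partial h_2}{\partial \psi_2} & \cdots & \frac{\partial h_2}{\partial \psi_M} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial h_M}{\partial \psi_1} & \frac{\partial h_M}{\partial \psi_2} & \cdots & \frac{\partial h_M}{\partial \psi_M}
\end{bmatrix}
\tag{4.14}
\]
is not singular. In fact, numerical tests with log-area functions indicate that $\det(\Phi_1)$
is practically always positive. Exceptions do exist, but they did not cause problems in
the cases analyzed so far.
Therefore, under the assumption that $\det(\Phi_1) \neq 0$, it is possible to divide
\[
\Delta\psi = [\Delta\psi_1, \Delta\psi_2, \ldots, \Delta\psi_M, \Delta\psi_{M+1}, \ldots, \Delta\psi_N]^t
\tag{4.15}
\]
into two subvectors: $\Delta\boldsymbol{\psi}_1$, containing the first $M$ components of $\Delta\psi$, and
$\Delta\boldsymbol{\psi}_2$, containing the remaining components,
\[
\Delta\boldsymbol{\psi}_1 = [\Delta\psi_1, \Delta\psi_2, \ldots, \Delta\psi_M]^t,
\tag{4.16}
\]
\[
\Delta\boldsymbol{\psi}_2 = [\Delta\psi_{M+1}, \Delta\psi_{M+2}, \ldots, \Delta\psi_N]^t,
\tag{4.17}
\]
where $M$ is the number of acoustic components; and express $\Delta\boldsymbol{\psi}_1$ in terms of $\Delta h$
and $\Delta\boldsymbol{\psi}_2$ as follows:
\[
\Phi_a \, \Delta\psi = \Delta h,
\tag{4.18}
\]
\[
\Phi_1 \, \Delta\boldsymbol{\psi}_1 + \Phi_2 \, \Delta\boldsymbol{\psi}_2 = \Delta h,
\tag{4.19}
\]
\[
\Delta\boldsymbol{\psi}_1 = \Phi_1^{-1} \Delta h - \Phi_1^{-1} \Phi_2 \, \Delta\boldsymbol{\psi}_2,
\tag{4.20}
\]
where $\Phi_1$ was already defined in Eq. (4.14), and
\[
\Phi_2(\psi) = \frac{\partial h}{\partial \boldsymbol{\psi}_2} =
\begin{bmatrix}
\frac{\partial h_1}{\partial \psi_{M+1}} & \frac{\partial h_1}{\partial \psi_{M+2}} & \cdots & \frac{\partial h_1}{\partial \psi_N} \\
\frac{\partial h_2}{\partial \psi_{M+1}} & \frac{\partial h_2}{\partial \psi_{M+2}} & \cdots & \frac{\partial h_2}{\partial \psi_N} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial h_M}{\partial \psi_{M+1}} & \frac{\partial h_M}{\partial \psi_{M+2}} & \cdots & \frac{\partial h_M}{\partial \psi_N}
\end{bmatrix},
\tag{4.21}
\]
where $\boldsymbol{\psi}_2$ is given by the last $N - M$ components of $\psi$. (The equations above are in
general terms. In the case under analysis, $M = 3$ and $N = 5$.)
3 Similar conclusions can be taken from Figure 3.3 for the case of Fourier cosine coefficients and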
Combining acoustic and morphological constraints

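The above relation can now be used as an acoustic constraint for the cost function $P_\psi$.
By minimizing $P_\psi[\psi + \Delta\psi(\Delta\boldsymbol{\psi}_2)]$ with respect to $\Delta\boldsymbol{\psi}_2$ (using
$\Delta\boldsymbol{\psi}_1 = \Phi_1^{-1} \Delta h - \Phi_1^{-1} \Phi_2 \, \Delta\boldsymbol{\psi}_2$), it is possible to find
$\Delta\psi_{\min}$ that minimizes $P_\psi(\psi + \Delta\psi)$ under the acoustic constraint
\[
\Phi_a \, \Delta\psi = \Delta h.
\]
The mathematical formulation is as follows. Given
\[
P_\psi[\Delta\psi(\Delta\boldsymbol{\psi}_2)] =
[\Delta\psi(\Delta\boldsymbol{\psi}_2) + \psi - \psi_0]^t \, H_\psi \, [\Delta\psi(\Delta\boldsymbol{\psi}_2) + \psi - \psi_0],
\tag{4.22}
\]
where $\psi_0$ is the neutral position,4 find the family of vectors $\Delta\psi$ for which
\[
\frac{dP_\psi}{d\Delta\boldsymbol{\psi}_2} = 0.
\tag{4.23}
\]
Doing the calculation,
\[
\frac{dP_\psi}{d\Delta\boldsymbol{\psi}_2}
= \Bigl[ \frac{d\Delta\psi}{d\Delta\boldsymbol{\psi}_2} \Bigr]^t \frac{dP_\psi}{d\Delta\psi}
= \Bigl[ \frac{d\Delta\psi}{d\Delta\boldsymbol{\psi}_2} \Bigr]^t [H_\psi + H_\psi^t] (\Delta\psi + \psi - \psi_0).
\tag{4.24}
\]
Here, the derivative of $\Delta\psi$ with respect to $\Delta\boldsymbol{\psi}_2$ will be called $\Delta\psi'$ and is given by
\[
\Delta\psi' = \frac{d\Delta\psi}{d\Delta\boldsymbol{\psi}_2} =
\begin{bmatrix} -\Phi_1^{-1} \Phi_2 \\ I_{N-M} \end{bmatrix},
\tag{4.25}
\]
where $I_{N-M}$ is the identity matrix of order $N - M$. Now, since $H_\psi$ is symmetric (it
is the inverse of a covariance matrix),
\[
\frac{dP_\psi}{d\Delta\boldsymbol{\psi}_2} = 2 \, \Delta\psi'^t H_\psi (\Delta\psi + \psi - \psi_0)
\tag{4.26}
\]
and, finally, $dP_\psi / d\Delta\boldsymbol{\psi}_2 = 0$ implies that
\[
\Delta\psi'^t H_\psi \, \Delta\psi = \Delta\psi'^t H_\psi (\psi_0 - \psi),
\tag{4.27}
\]
which can be rewritten as
\[
\Phi_p \, \Delta\psi = \Phi_p (\psi_0 - \psi),
\tag{4.28}
\]
where
\[
\Phi_p = \Delta\psi'^t H_\psi.
\tag{4.29}
\]
The linear system above gives the necessary $N - M$ equations to complete the
underdetermined system given by Eq. (4.11):
\[
\left.
\begin{array}{l}
\Phi_a \, \Delta\psi = \Delta h \\
\Phi_p \, \Delta\psi = \Phi_p (\psi_0 - \psi)
\end{array}
\right\}
\implies \Phi \, \Delta\psi = \Delta\tilde{h}
\implies \Delta\psi = \Phi^{-1} \Delta\tilde{h},
\tag{4.30}
\]
where
\[
\Phi = \begin{bmatrix} \Phi_a \\ \Phi_p \end{bmatrix}
\quad \text{and} \quad
\Delta\tilde{h} = \begin{bmatrix} \Delta h \\ \Phi_p (\psi_0 - \psi) \end{bmatrix}.
\tag{4.31}
\]
4 As mentioned before, better results were obtained when the neutral position was used instead
of the average position as the minimum effort position.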
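The construction above can be checked numerically with a small sketch. All quantities below (the acoustic Jacobian `Phi_a`, the cost matrix `H`, the current vector `psi`, the neutral position `psi0` and the acoustic variation `dh`) are random stand-ins chosen for illustration, not values from the corpus. The sketch stacks the acoustic and positional-cost equations into one square system and verifies that its solution costs less than another step satisfying the same acoustic constraint.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 3, 5                      # acoustic and articulatory dimensions, as in the text

# Hypothetical local quantities (random stand-ins, not corpus data).
Phi_a = rng.standard_normal((M, N))      # acoustic Jacobian dh/dpsi
A = rng.standard_normal((N, N))
H = A @ A.T + N * np.eye(N)              # symmetric positive definite cost matrix
psi = rng.standard_normal(N)             # current articulatory vector
psi0 = np.zeros(N)                       # neutral (minimum effort) position
dh = rng.standard_normal(M)              # desired acoustic variation

# Partition the Jacobian into its first M and last N - M columns.
Phi_1, Phi_2 = Phi_a[:, :M], Phi_a[:, M:]

# Derivative of the full step with respect to its last N - M components.
dpsi_prime = np.vstack([-np.linalg.solve(Phi_1, Phi_2), np.eye(N - M)])

# Positional-cost rows and the stacked N x N system.
Phi_p = dpsi_prime.T @ H
Phi = np.vstack([Phi_a, Phi_p])
rhs = np.concatenate([dh, Phi_p @ (psi0 - psi)])
dpsi = np.linalg.solve(Phi, rhs)

def cost(v):
    """Quadratic cost (v - psi0)^t H (v - psi0)."""
    return float((v - psi0) @ H @ (v - psi0))

# Any other step with the same acoustic effect differs from dpsi by a
# combination of the columns of dpsi_prime (the null space of Phi_a).
other = dpsi + dpsi_prime @ np.array([0.3, -0.2])
```

The columns of `dpsi_prime` span the directions that leave the linearized acoustic vector unchanged, which is why the comparison step `other` is built from them.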
Iterative solution
For larger variations, the system above is an approximation, since $\Phi$ depends on $\psi$.
However, as seen in Fig. 3.12, there is a high degree of linearity in this non-linear
system. In the experiments performed, it was successfully solved by the following
Newton-Raphson procedure:

    psi_1 = psi(h_1, H)        /* Function to compute psi_1 from h_1 and prior information H. */
    {
        psi = psi_0;           /* Initialize psi to the neutral position and compute */
        h = h(psi_0);          /* the corresponding acoustic vector. */
        Dh = h_1 - h;
        while (||Dh|| > eps)
        {
            Phi = Phi(psi, H);             /* Calculate new Phi, */
            Dh~ = Dh~(Dh, Phi, psi);       /* Dh~, */
            Dpsi = Phi^(-1) Dh~;           /* and Dpsi. */
            psi = psi + Dpsi;              /* Update psi. */
            h = h(psi);                    /* Update h. */
            Dh = h_1 - h;
        }
        psi_1 = psi;           /* Return psi_1. */
    }

Here psi, Dpsi, Dh, Dh~ and Phi stand for $\psi$, $\Delta\psi$, $\Delta h$, $\Delta\tilde{h}$ and $\Phi$ of Eqs. (4.30)
and (4.31). $\|\cdot\|$ is a norm function. In the implemented system it is the maximal deviation
between the desired and obtained acoustic vectors ($\Delta h$). $\varepsilon$ (eps) is an error criterion. A
value around 0.01 (1%) was found to be a good compromise between precision and
computation cost. The input $h_1$ is obtained from the first $M$ formant frequencies using
Eq. (3.33), while the area vector is obtained from the output $\psi_1$ using Eq. (3.29). For
the analyzed cases (see next chapter) this procedure took on average three iterations
to converge, the hardest cases taking six iterations.
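The iterative procedure can be sketched in runnable form as follows. The forward map `forward`, its Jacobian `jacobian`, the matrix `A`, the diagonal cost matrix `H` and the absolute (rather than relative) stopping criterion are all assumptions made for this toy example; the acoustic model actually used in this study is not reproduced here.

```python
import numpy as np

M, N = 3, 5  # acoustic and articulatory dimensions used in the text

# Hypothetical forward map h(psi): smooth and only mildly non-linear.
A = np.array([[1.0, 0.2, 0.0, 0.3, 0.1],
              [0.1, 1.0, 0.2, -0.2, 0.3],
              [0.0, 0.1, 1.0, 0.1, -0.2]])

def forward(psi):
    u = A @ psi
    return u + 0.1 * np.sin(u)

def jacobian(psi):
    """Analytic M x N Jacobian of the toy forward map."""
    u = A @ psi
    return (1.0 + 0.1 * np.cos(u))[:, None] * A

def invert(h_target, H, psi0, eps=0.01, max_iter=20):
    """Newton-Raphson search for a low-cost psi whose image matches h_target."""
    psi = psi0.copy()                        # start from the neutral position
    dh = h_target - forward(psi)
    n_iter = 0
    while np.max(np.abs(dh)) > eps and n_iter < max_iter:
        Phi_a = jacobian(psi)
        Phi_1, Phi_2 = Phi_a[:, :M], Phi_a[:, M:]
        dpsi_prime = np.vstack([-np.linalg.solve(Phi_1, Phi_2), np.eye(N - M)])
        Phi_p = dpsi_prime.T @ H             # positional-cost rows
        Phi = np.vstack([Phi_a, Phi_p])      # stacked N x N system
        rhs = np.concatenate([dh, Phi_p @ (psi0 - psi)])
        psi = psi + np.linalg.solve(Phi, rhs)
        dh = h_target - forward(psi)
        n_iter += 1
    return psi, n_iter

def cost(psi, H, psi0):
    return float((psi - psi0) @ H @ (psi - psi0))

# Usage: recover a plausible vector from the image of a known one.
H = np.diag([1.0, 1.0, 1.0, 25.0, 25.0])     # hypothetical cost matrix
psi0 = np.zeros(N)
psi_true = np.array([0.5, -0.3, 0.4, 0.2, -0.1])
h_target = forward(psi_true)
psi_est, n_iter = invert(h_target, H, psi0, eps=1e-6)
```

The estimate reproduces the target acoustic vector but need not coincide with `psi_true`: among all vectors with the same image, the procedure selects one of lower cost, which illustrates the non-uniqueness discussed in the text.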
4.2 Trajectories
The log-area function moves smoothly due to dynamical constraints. Here, instead of
physically modeling the dynamics of the vocal-tract, only a constraint of smoothness
in time will be imposed. The dynamic case can be derived as an expansion of the
static case. At the end of Chapter 3, it was seen that, for a given time interval,
the trajectories of the components of vector $\psi$ (or $a$, in the case of representation by
Fourier cosine series) can be approximated by a linear combination of eigenvectors
whose components follow the approximate shape of cosine functions, as expressed by
Eq. (3.42), rewritten below as
\[
\Psi_i = \Gamma_i V^t + E_i,
\tag{4.32}
\]
where $E_i$ is the approximation error matrix, $\Psi_i$ is a matrix whose columns are a
sequence of $p$ articulatory vectors $(\psi_i, \ldots, \psi_{i+p-1})$, and the $q$ columns of $V$ are
eigenvectors whose sum, weighted by the coefficients contained in $\Gamma_i$, approximates
$\Psi_i$.
Now, using Eq. (4.30), and dropping the time index $i$ for convenience, it is possible
to write
\[
\mathcal{M} \mathcal{G} = \mathcal{H},
\tag{4.33}
\]
where
\[
\mathcal{M} =
\begin{bmatrix}
\Phi(\psi_i) & 0 & \cdots & 0 \\
0 & \Phi(\psi_{i+1}) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \Phi(\psi_{i+p-1})
\end{bmatrix}
\tag{4.34}
\]
is a "matrix of matrices" containing the locally linear relation between a sequence of
articulatory variations and their acoustic and cost counterparts;5
\[
\mathcal{G} =
\begin{bmatrix}
\Delta\psi_i \\ \Delta\psi_{i+1} \\ \vdots \\ \Delta\psi_{i+p-1}
\end{bmatrix}
\tag{4.35}
\]
contains a sequence of articulatory variations $\Delta\psi_i$ vertically arranged as a column
vector; and
\[
\mathcal{H} =
\begin{bmatrix}
\Delta\tilde{h}_i \\ \Delta\tilde{h}_{i+1} \\ \vdots \\ \Delta\tilde{h}_{i+p-1}
\end{bmatrix}
\tag{4.36}
\]
contains a sequence of vectors $\Delta\tilde{h}_i$, vertically arranged as a column vector, composed
of the acoustic and positional cost constraints associated with the articulatory
variation vectors contained in $\mathcal{G}$.
5 Compare with Eq. (4.30) and observe that the dependence of $\Phi$ on $\psi$ is made explicit here.
The relation given by Eq. (4.32) can be applied to Eq. (4.33), yielding
\[
\mathcal{M} \mathcal{V} X = \mathcal{H} + E,
\tag{4.37}
\]
where
\[
\mathcal{V} =
\begin{bmatrix}
v_{11} \cdots v_{1q} & 0 \cdots 0 & \cdots & 0 \cdots 0 \\
0 \cdots 0 & v_{11} \cdots v_{1q} & \cdots & 0 \cdots 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 \cdots 0 & 0 \cdots 0 & \cdots & v_{11} \cdots v_{1q} \\
v_{21} \cdots v_{2q} & 0 \cdots 0 & \cdots & 0 \cdots 0 \\
0 \cdots 0 & v_{21} \cdots v_{2q} & \cdots & 0 \cdots 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 \cdots 0 & 0 \cdots 0 & \cdots & v_{21} \cdots v_{2q} \\
\vdots & \vdots & & \vdots \\
v_{p1} \cdots v_{pq} & 0 \cdots 0 & \cdots & 0 \cdots 0 \\
0 \cdots 0 & v_{p1} \cdots v_{pq} & \cdots & 0 \cdots 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 \cdots 0 & 0 \cdots 0 & \cdots & v_{p1} \cdots v_{pq}
\end{bmatrix}
\tag{4.38}
\]
is an $Np \times Nq$ matrix whose nonzero elements $v_{ij}$ are the entries of $V$. Each row of
$V$ is repeated $N$ times in the matrix shown above.
\[
X =
\begin{bmatrix}
\Delta\gamma_{11i} \\ \Delta\gamma_{12i} \\ \vdots \\ \Delta\gamma_{1qi} \\
\Delta\gamma_{21i} \\ \Delta\gamma_{22i} \\ \vdots \\ \Delta\gamma_{2qi} \\
\vdots \\
\Delta\gamma_{N1i} \\ \Delta\gamma_{N2i} \\ \vdots \\ \Delta\gamma_{Nqi}
\end{bmatrix}
\tag{4.39}
\]
contains the entries of a variation $\Delta\Gamma_i$ rearranged in a column vector and, finally,
$E$ is an $Np \times 1$ column vector containing the approximation error.
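Eq. (4.37) defines an overdetermined system ($Np$ equations in $Nq$ unknowns), which can
be solved by minimizing a weighted version of the squared error
\[
\varepsilon(X) = E^t \mathbf{H} E
\tag{4.40}
\]
\[
\phantom{\varepsilon(X)} = [\mathcal{M}\mathcal{V} X - \mathcal{H}]^t \, \mathbf{H} \, [\mathcal{M}\mathcal{V} X - \mathcal{H}],
\tag{4.41}
\]
where $\mathbf{H}$ is an $Np \times Np$ positive definite (Horn, 1985, p. 250) matrix which can
be used to give different weights to acoustic and morphological constraints, and to
different subintervals of the speech interval under analysis. It may be interesting when
dealing with more complex speech intervals, or when studying the trade-off between
effort and acoustic accuracy in speech (Lindblom, 1990). However, at the present
stage of this study, such possibilities are not yet being explored, and $\mathbf{H}$ is
taken as an identity matrix.
Minimization of $\varepsilon$ with respect to $X$ is carried out as follows:
\[
\frac{d\varepsilon}{dX} = 0
\]
\[
\implies [\mathcal{M}\mathcal{V}]^t (\mathbf{H} + \mathbf{H}^t) [\mathcal{M}\mathcal{V} X - \mathcal{H}] = 0
\]
\[
\implies [\mathcal{M}\mathcal{V}]^t (\mathbf{H} + \mathbf{H}^t) [\mathcal{M}\mathcal{V}] \, X
= [\mathcal{M}\mathcal{V}]^t (\mathbf{H} + \mathbf{H}^t) \, \mathcal{H}
\]
\[
\implies X = \bigl\{ [\mathcal{M}\mathcal{V}]^t (\mathbf{H} + \mathbf{H}^t) [\mathcal{M}\mathcal{V}] \bigr\}^{-1}
[\mathcal{M}\mathcal{V}]^t (\mathbf{H} + \mathbf{H}^t) \, \mathcal{H}
\tag{4.42}
\]
and, for the particular case of $\mathbf{H}$ being an identity matrix,
\[
X = \bigl\{ [\mathcal{M}\mathcal{V}]^t [\mathcal{M}\mathcal{V}] \bigr\}^{-1} [\mathcal{M}\mathcal{V}]^t \, \mathcal{H}.
\tag{4.43}
\]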
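The structure of the trajectory system can be made concrete with a short sketch. The sizes, the cosine-shaped basis `V`, the random per-frame blocks and the names `V_big`, `M_big`, `H_cal` and `solve_trajectory` are assumptions introduced for illustration; the eigenvectors of Eq. (3.42) and the actual per-frame matrices are not reproduced. The sketch builds the two large matrices, checks that the stacked frames agree with the product of the coefficient matrix and the transposed basis, and solves the normal equations, optionally with a weighting matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p, q = 5, 8, 3   # articulatory dimension, frames per interval, basis vectors

# Hypothetical temporal basis V (p x q) with cosine-shaped columns.
t = (np.arange(p) + 0.5) / p
V = np.cos(np.pi * np.outer(t, np.arange(q)))

# Large basis matrix (Np x Nq): the block for frame k repeats row k of V
# once per articulatory component, i.e. it is kron(I_N, v_k^T).
V_big = np.vstack([np.kron(np.eye(N), V[k:k + 1, :]) for k in range(p)])

# Large block-diagonal matrix (Np x Np) of per-frame N x N matrices
# (random, well-conditioned stand-ins).
M_big = np.zeros((N * p, N * p))
for k in range(p):
    block = rng.standard_normal((N, N)) + N * np.eye(N)
    M_big[k * N:(k + 1) * N, k * N:(k + 1) * N] = block

def solve_trajectory(MV, H_cal, W=None):
    """Weighted least-squares solution; W = None means identity weights."""
    if W is None:
        W = np.eye(MV.shape[0])
    Ws = W + W.T
    return np.linalg.solve(MV.T @ Ws @ MV, MV.T @ Ws @ H_cal)

# Usage: coefficients are stored row by row in the unknown vector X.
Gamma_true = rng.standard_normal((N, q))
X_true = Gamma_true.reshape(-1)
G = V_big @ X_true                  # stacked per-frame articulatory variations
MV = M_big @ V_big
H_cal = MV @ X_true                 # noise-free right-hand side
X_hat = solve_trajectory(MV, H_cal)
Gamma_hat = X_hat.reshape(N, q)
```

In practice a solver such as `np.linalg.lstsq` is usually preferable to forming the normal equations explicitly, since it is less sensitive to ill-conditioning; the explicit form is kept here only because it mirrors the derivation.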
Iterative solution
As in the case described in Section 4.1, the equality of Eq. (4.43) holds only for
sufficiently small variations. For larger variations, the problem is solved by the same kind
of Newton-Raphson procedure used in the previous section. The acoustic components
of $\mathcal{H}$ are initialized by the trajectory given by the difference between a given sequence
of acoustic vectors6 and the acoustic vector determined by the articulatory neutral
position. The minimum cost components of $\mathcal{H}$ are initially set to zero, and adapted
as the articulatory trajectory vector $X$ changes during the iterative procedure. For a
given $X$, $\Delta\Gamma_i$ is obtained by simply rearranging the entries of $X$ as (see Eq. (4.39))
\[
\Delta\Gamma_i =
\begin{bmatrix}
\Delta\gamma_{11i} & \cdots & \Delta\gamma_{1qi} \\
\vdots & \ddots & \vdots \\
\Delta\gamma_{N1i} & \cdots & \Delta\gamma_{Nqi}
\end{bmatrix},
\tag{4.44}
\]
and the corresponding articulatory trajectory is approximated by (see Eq. (3.42))
\[
\Psi_i \simeq \Delta\Gamma_i V^t + \Psi_0,
\tag{4.45}
\]
where $\Psi_0$ is the "neutral trajectory" determined by the vocal-tract sustaining the
neutral position along the analyzed interval. Finally, the log-area vector trajectory is
obtained from (see Eq. (3.29))
\[
Y \simeq T_y \Psi + \Psi_{0y}.
\tag{4.46}
\]
6 The acoustic vectors are determined from formant log-frequency vectors using Eq. (3.33).
Comment
Now it is possible to estimate a plausible articulatory trajectory from a given acoustic
trajectory. Conceptually, the method looks consistent. However, it must not
be forgotten that the vocal-tract articulation effort was arbitrarily represented by a
quadratic cost function, which is convenient from the mathematical point of view,
but has no physiological basis. From the results presented in the next
chapter, it will be possible to evaluate the performance of the method under the
limitations imposed by this admittedly artificial effort measure.
Chapter 5
Results and Discussion
"Computers are useless,
they only give answers."
Pablo Ruiz y Picasso (1881–1973)
Some results obtained with the method described in the previous chapter are given
here. The isolated frame procedure is analyzed in the first section. The second section
analyzes the case of trajectories. Not surprisingly, the results obtained are not
always perfect, since the quadratic cost function used to measure the effort of each
vocal-tract position (or, in a broader sense, trajectory) was chosen more because it
allows a simple mathematical minimization procedure than for physiological
characteristics of the vocal-tract. Nevertheless, the results are enough to show that using
a simple relation between acoustic and articulatory parameters it is possible to
represent acoustic constraints in the articulatory space, and combine them directly with
minimum effort and continuity constraints.
5.1 Isolated Frames

The procedure for inversion of the articulatory-to-acoustic mapping for isolated frames
described in the previous chapter was applied to the oral vowel frames contained in
the corpus. All 258 analyzed frames are shown in Appendix B. Some selected frames
are used here to interpret the results.
The four frames shown in Fig. 5.1 show good results obtained for different vowels.
In general, the good agreement between original areas (derived from midsagittal
distances) and estimated areas (obtained from the formant frequencies determined
by the original areas) prevails in most of the frames analyzed. In spite of that, large
discrepancies also exist, and it is precisely the understanding of the different types of
discrepancies that may allow future improvements to the system.
Four types of error are shown in Fig. 5.2. From left to right, the first column
shows the case of excessively open lips. A possible explanation for that comes from
the fact that the inversion procedure is carried out in the log-area domain, which
makes large areas less sensitive to errors than small areas. From the acoustic point of
view it makes sense, since formant frequency variations also depend on the log-area
rather than on the area function itself. However, from the articulatory point of view,
considerably large, sometimes unacceptable, errors can occur.
The second column shows the case of an excessively large oral cavity behind a very
narrow constriction at the lips. From the acoustic point of view, quite large variations
in wide areas behind narrow constrictions have small effects on formant frequencies.
From the articulatory point of view, the quadratic cost function used to evaluate
vocal-tract effort to reach a position seems to be very "mild" with respect to large
variations in the oral cavity. A possible explanation for that comes from the axial
symmetry of the quadratic cost function: since complete closures are accomplished
in the oral cavity with little effort, and since a complete closure is associated with a
"minus infinite log-area," very large oral areas also become associated with little effort.
The clipping procedure to avoid closures used in Chapter 2 during the construction of
the log-area corpus apparently was not enough to avoid this discrepancy. A possible
solution for this problem would be the use of an asymmetric cost function, but this
would make the cost minimization procedure more complex. Another possibility is to
use continuity constraints in time. This is the subject of the next section. Although
an exhaustive analysis of trajectory estimation has not been carried out yet, the cases
analyzed did not present this kind of discrepancy.

The third column of Fig. 5.2 shows a severely underestimated vocal-tract length. A
clear explanation for this error has not been found yet, but this is one more case
where continuity constraints in time could help.
The right column of Fig. 5.2 shows a case where an underestimation of the vocal-tract
length is compensated by a partial lip closure. This kind of compensation is not
unlikely to happen in real speech. This error shows that, even under morphological
constraints represented correctly, more than one plausible shape of the vocal-tract
can produce the same set of formants.
Finally, we call attention to the fact that, even when the inversion procedure
failed to estimate the correct area function, the transfer functions associated with
original and estimated areas match fairly well. This fact indicates that most of the
articulatory errors observed have small acoustic effects.
5.2 Trajectories
Adding the continuity constraints explained in Section 4.2 to the method of
combination of acoustic and morphological information described in Section 4.1, it was
possible to estimate sequences of area functions from the corresponding first three
formant trajectories.
Some characteristics of the inversion procedure are illustrated in the following
way. In the example given in Fig. 5.3, the sequence of area functions shown in the
top panel was used to generate the formant trajectories shown in the bottom panel.
These trajectories were then used to recover the original sequence of areas, under
minimum effort and continuity constraints. The result is shown in the top panel of
Fig. 5.4. The search for the best sequence of areas was performed in the articulatory
trajectory space. Note, however, that the sequence of areas shown in Fig. 5.4
is close, but not identical, to that shown in Fig. 5.3. A possible reason for this is
associated with the fact that the mathematical cost function used does not perfectly
reflect the articulation effort determined by the human physiology. Another reason
is that the parametrization procedure allows only an approximate reconstruction of
the original sequence of areas.
[Figure 5.1: four columns, one per frame (PB0311, PB2854, PB1560, PB1754). Panels in each
column, from top to bottom: Speech Signal vs. Time (ms); Power Spectrum (dB) vs.
Frequency (kHz); Area Function, Area (cm2) vs. Distance from Glottis (cm); Midsagittal
Distances, Midsag. Dist. (cm); Vocal-Tract Profile (cm vs. cm).]
Figure 5.1: Results obtained with the inversion technique for isolated frames. In each
column, the central panel shows original (thin line) and estimated area (thick line). The
estimated area is obtained from the formant frequencies determined by the original area.
Vocal-tract profile, midsagittal distances, transfer functions and speech signal are also
shown for reference purposes. From left to right the columns correspond to the neutral,
and French /a/, /i/, and /u/ vowels.
[Figure 5.2: four columns with the same panel layout and axes as Fig. 5.1.]
Figure 5.2: Problems with the inversion procedure. The columns show the following cases.
Left: French /a/ with excessively open lips. Center-left: French /u/ with excessively large
front cavity. Center-right: French /i/ with excessively short length. Right: French /e/
with excessively closed lips compensating underestimated length. As in Fig. 5.1, in the
central panel of each column, the thin line is the original area and the thick line is the
area estimated from the formant frequencies determined by the original area.
For comparison purposes, it is interesting to see the results when the same problem
is solved using a truncated Fourier series to represent the log-area function. We start
with the analysis carried out by Mermelstein (1967), who parametrized the vocal-tract
log-area function by the first six coefficients of its Fourier cosine series expansion. It
was verified that, when the even coefficients are all equal to zero, as already mentioned,
there exists a one-to-one relationship between the first three formant frequencies
and the three odd Fourier coefficients. Using this property, an interactive procedure
was implemented to find the unique set of odd Fourier coefficients associated with a
given set of formant frequencies, when all even Fourier coefficients are equal to zero.
This procedure was used to obtain the sequence of areas shown in Fig. 5.5 from the
formant trajectories shown in Fig. 5.3. Note that the result is substantially different
from the original sequence of areas (top panel of Fig. 5.3). This confirms the fact
that setting all even Fourier coefficients to zero is an artificial constraint that does
not reflect the geometrical constraints determined by the vocal-tract morphology.
When the mathematical framework described in the previous chapter to incorporate
such morphological constraints is used with the Fourier representation described in
Chapter 3, the sequence of areas shown in Fig. 5.6 is obtained. It can be seen
that it resembles the original sequence of areas shown in Fig. 5.3. However, abrupt
variations, inherent in some regions of the vocal-tract, cannot be well approximated,
due to the smooth character of the cosine functions, which form the basis of the
Fourier cosine series representation. This is in contrast with the eigenvectors used
in the principal component representation, which allow a good representation of the
vocal-tract structure. When the principal component representation is used in place
of the Fourier representation, the result obtained is the sequence of areas shown in
Fig. 5.4.
As a final observation, the similarity of the formant frequency trajectories associated
with the sequences of areas shown in Figures 5.3, 5.4, 5.5, and 5.6 shows that,
even under continuity constraints, substantially different sequences of area functions
can generate basically the same formant trajectories.
[Figure 5.3. Top panel: "Original Area", Area (cm2) over Position (cm) and Time (s).
Bottom panel: "Formant Frequency Trajectories", Frequency (kHz) vs. Time (s).]
Figure 5.3: Top: Sequence of area functions, taken from the corpus, corresponding to
the diphthong /ui/, uttered in the French sentence "Luis pense à ça." Bottom: Formant
frequency trajectories corresponding to the sequences of areas shown in the top panel.
[Figure 5.4. Top panel: "Principal Components", Area (cm2) over Position (cm) and Time (s).
Bottom panel: Frequency (kHz) vs. Time (s). Maximum differences annotated in the bottom
panel: F1: 7.8%, F2: 6.4%, F3: 2.6%.]
Figure 5.4: Top: Sequence of areas estimated from the formant trajectories shown in
the bottom panel of Fig. 5.3, under continuity and minimum effort constraints. Bottom:
The solid lines show the formant frequency trajectories corresponding to the sequences of
areas shown in the top panel. The dashed lines reproduce the original formant trajectories
shown in the bottom panel of Fig. 5.3.
[Figure 5.5. Top panel: "Fourier (Odd Terms)", Area (cm2) over Position (cm) and Time (s).
Bottom panel: Frequency (kHz) vs. Time (s). Maximum differences annotated in the bottom
panel: F1: 3.2%, F2: 6.5%, F3: 4.7%.]
Figure 5.5: Top: Sequence of area functions estimated from the formant trajectories
shown in Fig. 5.3 under the following constraint: The areas are represented by the first
six components of their Fourier cosine series expansion with the even coefficients set to
zero. Bottom: The solid lines show the formant frequency trajectories corresponding to
the sequences of areas shown in the top panel. The dashed lines reproduce the original
formant trajectories shown in the bottom panel of Fig. 5.3.
[Figure 5.6. Top panel: "Fourier (All Terms)", Area (cm2) over Position (cm) and Time (s).
Bottom panel: Frequency (kHz) vs. Time (s). Maximum differences annotated in the bottom
panel: F1: 2.1%, F2: 5.4%, F3: 3.2%.]
Figure 5.6: Top: Sequence of area functions estimated from the formant trajectories
shown in Fig. 5.3 under the following constraint: The areas are represented by the first
nine components of their Fourier cosine series expansion determined under morphological
and continuity constraints. Bottom: The solid lines show the formant frequency
trajectories corresponding to the sequences of areas shown in the top panel. The dashed lines
reproduce the original formant trajectories shown in the bottom panel of Fig. 5.3.
5.3 Quantitative Analysis

The qualitative analysis done in Sections 5.1 and 5.2 was important to understand
the limitations of the method as well as the reasons for the discrepancies observed.
In this section, a quantitative analysis based on the 258 frames of the corpus that
correspond to oral vowels is carried out. The objective is to give a measure of the
performance of the method and to understand its global behavior.
The first step in this analysis is to verify the influence of the parametrization
procedure on the original cross-sectional areas. It is illustrated in Fig. 5.7a, where
all cross-sectional areas in logarithmic scale1 recovered from the parametric
representation as vectors $\psi$ are plotted against their original counterparts. The correlation
coefficient2 (Papoulis, 1991, p. 152) of 0.965 indicates that the parametric
representation is good, but the error implied by it is not negligible. Looking at Table 5.1, the
mean relative error of 0.9 % indicates that, globally, the parametrization procedure
does not cause any significant bias. The standard deviation of the relative error of
21 % is indeed significant, but still acceptable. In particular, note that the errors due
to parametrization caused only small deviations in the acoustic space (see Fig. 5.7e
and Table 5.3).
Next, original areas and areas estimated from isolated frames are compared (Fig.
5.7b and Table 5.1). A reasonably good correlation coefficient of 0.828 is obtained.
Note, however, the minimal error in the acoustic space (Fig. 5.7f and Table 5.3). The
standard deviation of the relative error of 56 % is high, but lower bounded by the 21 %
standard deviation of the parametrization error. Also, the very good matching observed in
the acoustic space affects the accuracy of the areas estimated in the articulatory
space.
When sequences of log-area vectors are estimated instead of isolated vectors, the
correlation coefficient increases only marginally from 0.828 to 0.832 (Fig. 5.7c and
Table 5.1). Nevertheless, observing the scattering shown in Fig. 5.7d and the standard
deviation of the relative error of 17 % listed in Table 5.1, it is seen that the areas
estimated are significantly different. The estimation based on sequences of vectors
yields, naturally, smoother trajectories of log-area vectors. The price paid for that
is a small degradation in the matching observed in the acoustic space (Fig. 5.7g and
Table 5.3).
Finally, the results obtained for length estimation (Table 5.2) show correlation
coefficients considerably lower than those obtained for cross-sectional areas. This
fact indicates the need for a more appropriate method to handle length information.
The small values observed for the standard deviation of the relative error are due to
the fact that length variations are small compared with total vocal-tract length. This
is in contrast with cross-sectional areas, which vary from values very close to zero up
to several square centimeters.
Summarizing, the high correlation coefficients observed in the acoustic space
confirm that the acoustic constraint imposed by the formant vectors is respected during
the log-area vector estimation. In the articulatory space, correlation coefficients
around 0.83 indicate that the model works, but still has to be improved.
1 There are 258 log-area vectors, each of them containing 32 log-areas. So, each scattering in
the top row of Fig. 5.7 contains 258 × 32 = 8256 points. In the bottom row, since there are three
formants per vector, each scattering contains 774 points.
2 The correlation coefficient was computed in logarithmic scale as
$E[\log A_1 \log A_2] / \sqrt{E[(\log A_1)^2] \, E[(\log A_2)^2]}$.
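For reference, the statistics reported in the tables of this section can be computed along the following lines. The sketch uses synthetic areas; the definition of the relative error as (estimated − original) / original and the function names `log_correlation` and `relative_error_stats` are assumptions for the example, since the exact definitions are not spelled out in the text beyond the log-scale correlation formula.

```python
import numpy as np

def log_correlation(a1, a2):
    """Uncentered correlation of log-values: E[l1 l2] / sqrt(E[l1^2] E[l2^2])."""
    l1, l2 = np.log(a1), np.log(a2)
    return float(np.mean(l1 * l2) / np.sqrt(np.mean(l1 ** 2) * np.mean(l2 ** 2)))

def relative_error_stats(estimated, original):
    """Mean and standard deviation of (estimated - original) / original."""
    rel = (estimated - original) / original
    return float(rel.mean()), float(rel.std())

# Synthetic stand-in for the corpus: 258 frames of 32 cross-sectional areas (cm2).
rng = np.random.default_rng(3)
original = np.exp(rng.standard_normal((258, 32)))
estimated = original * np.exp(0.2 * rng.standard_normal((258, 32)))

r = log_correlation(estimated, original)
mean_rel, std_rel = relative_error_stats(estimated, original)
```

Because the correlation is taken without removing the mean of the log-values, it measures agreement with the line through the origin in the log-log scatter plots rather than agreement up to an offset.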
Table 5.1: Numerical Results: Areas

Comparison                          Mean Diff.   Std. Dev.   Corr. Coef.
Parametrized vs. Original             0.9 %        21 %        0.965
Isolated Frames vs. Original          4.1 %        56 %        0.828
Trajectories vs. Original             3.9 %        54 %        0.832
Isolated Frames vs. Trajectories     -0.2 %        17 %        0.979
Trajectories vs. Parametrized         3.0 %        45 %        0.872

Table 5.2: Numerical Results: Length

Comparison                          Mean Diff.   Std. Dev.   Corr. Coef.
Parametrized vs. Original             0.44 %       0.7 %       0.987
Isolated Frames vs. Original         -0.19 %       4.2 %       0.607
Trajectories vs. Original             0.19 %       3.9 %       0.635
Isolated Frames vs. Trajectories     -0.38 %       1.9 %       0.928
Trajectories vs. Parametrized        -0.25 %       4.0 %       0.603

Table 5.3: Numerical Results: Formants

Comparison                          Mean Diff.   Std. Dev.   Corr. Coef.
Parametrized vs. Original            -0.40 %       1.5 %       0.99925
Isolated Frames vs. Original         -0.04 %       0.2 %       0.99999
Trajectories vs. Original            -0.10 %       1.3 %       0.99936
Isolated Frames vs. Trajectories      0.06 %       1.3 %       0.99937
Trajectories vs. Parametrized         0.30 %       1.8 %       0.99884
[Figure 5.7 panels. Top row, Area (cm2) on logarithmic axes from 0.01 to 100: (a) Param. vs. Orig., 0.965;
(b) Isol. Frm. vs. Orig., 0.828; (c) Traject. vs. Orig., 0.832; (d) Isol. Frm. vs. Traject., 0.979.
Bottom row, Freq. (kHz) on logarithmic axes from 0.1 to 10: (e) 0.99925; (f) 0.99999; (g) 0.99936; (h) 0.99937.]
Figure 5.7: (a) Scattering of the cross-sectional areas obtained from the parametric
principal component representation of the original areas plotted against their original
counterparts. The scattering of the formant frequencies derived from the areas is shown
in (e). (b) Cross-sectional areas estimated from formant vectors in the case of isolated
frames plotted against original areas. The formant frequencies derived from the areas are
plotted in (f). (c) Cross-sectional areas estimated from formant vector trajectories plotted
against original areas. The formant frequencies derived from the areas are plotted in (g).
(d) Cross-sectional areas estimated from isolated frames plotted against areas estimated
from formant vector trajectories. The formant frequencies derived from the areas are
plotted in (h). The correlation coefficients are given in the top right corner. The 258 oral
vowel frames available in the corpus were used to generate the scatterings.
Chapter 6
Conclusion
"Words are words."
William Shakespeare (1564-1616)
Othello, Act I, Scene III
In this study, a method to combine different pieces of information in a restricted
case of the speech production inverse problem, namely the formant-to-area determination
problem, was presented. The initial formulation is based on a Fourier analysis
of the vocal-tract log-area function, already described by Mermelstein (1967). The
novelty is that vocal-tract morphological constraints are invoked to cope with the
underdetermined problem of obtaining a complete set of log-area Fourier coefficients
from formant frequencies. After that, the Fourier representation is substituted by an
optimal principal component representation of the log-area function which allows a
better characterization of the vocal-tract. As a final point, the analysis is generalized
from isolated frames to trajectories of log-area parameters. This allows a natural
implementation of continuity constraints in the articulatory domain.
The implemented system uses a Newton-Raphson iterative procedure to solve the
non-linear system that arises in the framework formulation. The procedure took on
average four iterations to converge and usually, but not always, converged to a position
close to the right solution. It is, in principle, more efficient than analysis-by-synthesis
techniques (Shirai and Kobayashi, 1986; Schroeter and Sondhi, 1991), which require a
much larger number of iterations. Also, it gives better insight into the problem than the
neural network (Shirai, 1993) and the genetic algorithm (McGowan, 1994) approaches.
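For readers unfamiliar with the technique, the following is a generic sketch of a Newton-Raphson iteration for a small non-linear system. It is not the system solved in this study: the two-equation toy problem, the function names and the tolerance are assumptions chosen only to show how few iterations the method needs when it starts near a solution.

```python
import numpy as np

def newton_raphson(residual, jacobian, x0, tol=1e-10, max_iter=50):
    """Solve residual(x) = 0 by Newton-Raphson; return (x, iterations used)."""
    x = np.asarray(x0, dtype=float)
    for iteration in range(max_iter):
        r = residual(x)
        if np.linalg.norm(r) < tol:
            return x, iteration
        # Linearize around x and solve J(x) * step = -r for the update
        x = x + np.linalg.solve(jacobian(x), -r)
    raise RuntimeError("Newton-Raphson did not converge")

# Toy system (hypothetical): x0^2 + x1^2 = 4 and x0 * x1 = 1
def toy_residual(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 4.0, x[0] * x[1] - 1.0])

def toy_jacobian(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [x[1], x[0]]])

solution, n_iter = newton_raphson(toy_residual, toy_jacobian, x0=[2.0, 0.5])
print(solution, n_iter)
```

As in the text, convergence is only local: a poor starting point can lead to a different root or to no convergence at all, which is why the sketch raises an error after `max_iter` iterations.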
The main weak point of the method developed in this study is the limited flexibility
of the cost function chosen to quantify the vocal-tract effort during speech production.
The quadratic form used has the merit of allowing a simple minimization procedure,
but does not reflect well the vocal-tract effort for positions far from the neutral
articulatory position. In spite of that, the method worked satisfactorily for most of
the analyzed cases.
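Why a quadratic cost leads to a simple minimization can be seen in a small sketch. It assumes, purely for illustration, an effort of the form x^T W x (with x the deviation from the neutral position and W symmetric positive definite) and a linear constraint A x = b standing in for one linearized step of the acoustic constraint. The symbols W, A and b and the numbers below are hypothetical and are not the quantities used in this study.

```python
import numpy as np

def min_quadratic_cost(W, A, b):
    """Minimize x^T W x subject to A x = b (W symmetric positive definite).

    Closed form from the Lagrangian conditions:
        x = W^{-1} A^T (A W^{-1} A^T)^{-1} b
    """
    W_inv_At = np.linalg.solve(W, A.T)           # W^{-1} A^T
    multipliers = np.linalg.solve(A @ W_inv_At, b)
    return W_inv_At @ multipliers

# Hypothetical example: the second coordinate is four times as "costly"
W = np.diag([1.0, 4.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x_opt = min_quadratic_cost(W, A, b)
print(x_opt)  # most of the displacement goes to the cheaper coordinate
```

The closed form exists only because the cost is quadratic; a cost that grows faster far from the neutral position would generally require an iterative minimization.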
In the experimental results, there was always a very good match between reference
and estimated transfer functions (at least up to the third formant region), whereas
the match between reference and estimated areas was not perfect. It is therefore
possible to conclude that the regions of the area function that did not match well
were mainly those that have little influence on the vocal-tract acoustic response up
to the third formant region.
One important point observed during the analysis is that, when appropriately
represented, the mapping between articulatory and acoustic properties of the human
vocal-tract is not complex, having a dominant linear component.
Another interesting conclusion concerns the transfer functions. They were derived
only from the formant frequencies, making use of prior information about morphological
and continuity constraints, and a good spectral matching was obtained (up to the
third formant region). It is therefore possible to say that, if morphological information
is available, the vocal-tract transfer function can be derived from the formant
frequencies (cf. Fant, 1956). It remains to be shown whether human beings make use
of such redundancy and, if so, in what way.
Appendix A
Numeric Information
Most of the numeric information used in the implementation of the vocal-tract parametric
model described in this study was not included in the main text. Instead, for
practical purposes, it is given in this appendix, and can be used by the interested
reader to implement, test and analyze the model proposed.
In order to do this, some observations are important. The first one is that the
tract length is expressed in normalized units, which can be converted into centimetres
as follows:
    1 length unit = 0.534 cm.
The second observation is about the procedure used to "fill" the articulatory space in
Section 3.2.2: first, a sufficiently high number of points is uniformly generated in the
hyperrectangle defined by min and max. After that, the corresponding log-area
vectors are calculated, and those that exceed the limits defined by ymin and ymax
are discarded, since they probably correspond either to unrealistic area functions or
to areas with constrictions that are too narrow. The final observation is about the
procedure used to estimate the formants associated with a given area function: they
can be determined using the wave propagation model described in Section 2.2.3.
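The procedure used to fill the articulatory space can be sketched as a simple rejection-sampling loop. The sketch assumes that a log-area vector is obtained from the parameters by a linear model (mean vector plus basis matrix times parameter vector), which is an assumption made here and not restated in this appendix. The function name, argument names and the small toy values are hypothetical; only the conversion factor 0.534 cm per length unit comes from the text.

```python
import numpy as np

LENGTH_UNIT_CM = 0.534  # 1 normalized length unit in centimetres (from the text)

def fill_articulatory_space(mean_y, basis, p_min, p_max, y_min, y_max,
                            n_points, seed=0):
    """Draw parameters uniformly in a hyperrectangle and keep only those whose
    log-area vectors stay within [y_min, y_max] in every component."""
    rng = np.random.default_rng(seed)
    params = rng.uniform(p_min, p_max, size=(n_points, len(p_min)))
    # Assumed linear model: y = mean_y + basis @ p, applied to all rows at once
    log_areas = mean_y + params @ basis.T
    keep = np.all((log_areas >= y_min) & (log_areas <= y_max), axis=1)
    return params[keep], log_areas[keep]

# Toy dimensions (hypothetical): 3 log-areas driven by 2 parameters
mean_y = np.zeros(3)
basis = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
p_min, p_max = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
y_min, y_max = np.full(3, -1.5), np.full(3, 1.5)

kept_params, kept_log_areas = fill_articulatory_space(
    mean_y, basis, p_min, p_max, y_min, y_max, n_points=1000)
print(len(kept_params), "of 1000 points kept")
print("17 length units =", 17 * LENGTH_UNIT_CM, "cm")
```

Uniform sampling followed by rejection is easy to implement but becomes wasteful when the admissible region is a small fraction of the hyperrectangle, which is why a sufficiently high number of points has to be generated.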
Uy =
    0.006  0.019  0.004  0.004  0.052
    0.010  0.035  0.026  0.010  0.130
    0.017  0.045  0.036  0.016  0.177
    0.030  0.026  0.012  0.110  0.142
    0.039  0.050  0.002  0.030  0.000
    0.057  0.087  0.023  0.013  0.136
    0.083  0.082  0.029  0.004  0.142
    0.101  0.071  0.042  0.034  0.161
    0.111  0.054  0.053  0.059  0.159
    0.113  0.033  0.059  0.079  0.137
    0.105  0.014  0.060  0.088  0.112
    0.091  0.001  0.058  0.090  0.091
    0.074  0.009  0.052  0.084  0.080
    0.072  0.017  0.048  0.106  0.102
    0.078  0.043  0.056  0.154  0.159
    0.076  0.059  0.055  0.170  0.170
    0.056  0.136  0.063  0.208  0.177
    0.011  0.226  0.092  0.233  0.199
    0.043  0.220  0.128  0.199  0.232
    0.102  0.205  0.135  0.131  0.227
    0.184  0.216  0.136  0.047  0.217
    0.264  0.205  0.130  0.048  0.192
    0.307  0.169  0.099  0.120  0.129
    0.324  0.125  0.062  0.157  0.068
    0.334  0.081  0.028  0.156  0.019
    0.340  0.026  0.010  0.131  0.020
    0.348  0.077  0.052  0.062  0.048
    0.331  0.208  0.106  0.112  0.020
    0.288  0.296  0.165  0.355  0.006
    0.202  0.262  0.247  0.537  0.002
    0.005  0.157  0.514  0.165  0.111
    0.099  0.304  0.712  0.259  0.269
    0.006  0.584  0.002  0.353  0.599

      y       [ymin    ymax]
     0.82     0.28     1.22
     0.46     0.12     1.14
     0.02     0.82     1.17
     0.39     0.76     1.13
     0.81     0.36     1.47
     1.00     0.19     1.87
     1.19     0.20     1.93
     1.13     0.12     1.80
     1.04     0.26     1.67
     0.97     0.49     1.58
     0.98     0.55     1.54
     1.07     0.37     1.56
     1.21     0.05     1.64
     1.35     0.21     1.77
     1.35     0.49     1.91
     1.25     0.14     1.93
     1.08     1.47     1.90
     0.62     2.74     1.68
     0.31     3.00     1.43
     0.39     3.00     1.44
     0.43     3.00     1.65
     0.36     3.00     1.84
     0.34     3.00     1.92
     0.32     3.00     1.95
     0.27     3.00     2.01
     0.18     3.00     2.04
     0.04     3.00     2.09
     0.02     3.00     2.15
     0.11     3.00     2.17
     0.19     3.00     2.11
     0.22     3.00     1.77
     0.11     3.00     1.86
    28.19    25.92    33.40

       =          T =
    0.334         0.053  0.655  0.356  0.743  0.701
    1.300         0.348  0.376  0.541  0.121  0.624
    0.027         0.225  0.006  0.440  0.859  1.001
    0.572         0.143  0.195  0.747  0.542  0.018
    0.407         0.519  0.142  0.248  0.638  0.469

    [min    max] =       U =
    6.120  8.763         0.164  0.458  0.185  0.854  0.000
    7.643  3.452         0.813  0.030  0.264  0.083  0.511
    4.146  4.535         0.378  0.046  0.429  0.045  0.818
    2.678  2.194         0.410  0.012  0.836  0.253  0.263
    2.400  2.279         0.032  0.887  0.115  0.444  0.029

    f =         Tfg =                   Uhg =
    2.60        15.4   7.0  18.5        0.063  0.970  0.236
    3.24         8.3  23.0  12.7        0.692  0.128  0.711
    3.43         5.2   7.2  35.4        0.719  0.208  0.663

    Tfh =                   Sh =
     2.8  10.1  18.3        0.457  0.007  0.068  0.109  0.810
    17.0   1.7   8.1        0.105  0.641  0.399  0.008  0.126
     5.6  22.8  37.2        0.084  0.499  0.140  0.560  0.237
     0y       Tty
     0.83     0.011  0.019  0.007  0.043  0.056
     0.45     0.021  0.043  0.005  0.089  0.161
     0.02     0.035  0.065  0.001  0.150  0.191
     0.55     0.027  0.023  0.020  0.025  0.256
     0.90     0.016  0.045  0.008  0.014  0.021
     1.08     0.032  0.092  0.016  0.102  0.153
     1.26     0.016  0.103  0.015  0.095  0.176
     1.17     0.002  0.110  0.009  0.085  0.226
     1.05     0.019  0.108  0.001  0.063  0.247
     0.95     0.034  0.096  0.011  0.030  0.241
     0.94     0.043  0.080  0.018  0.003  0.219
     1.01     0.046  0.063  0.023  0.012  0.195
     1.14     0.043  0.049  0.021  0.013  0.176
     1.25     0.054  0.042  0.013  0.011  0.218
     1.17     0.080  0.036  0.009  0.003  0.324
     1.04     0.092  0.027  0.007  0.004  0.349
     0.75     0.130  0.018  0.023  0.002  0.383
     0.13     0.153  0.069  0.060  0.016  0.424
     0.18     0.103  0.069  0.093  0.057  0.432
     0.05     0.047  0.080  0.110  0.104  0.365
     0.01     0.009  0.112  0.128  0.171  0.271
     0.02     0.075  0.136  0.140  0.228  0.154
     0.05     0.122  0.151  0.127  0.231  0.016
     0.13     0.148  0.156  0.102  0.205  0.087
     0.15     0.162  0.162  0.071  0.151  0.139
     0.13     0.171  0.166  0.030  0.079  0.158
     0.08     0.188  0.160  0.031  0.044  0.117
     0.13     0.175  0.145  0.132  0.224  0.085
     0.20     0.105  0.149  0.240  0.451  0.344
     0.17     0.032  0.184  0.335  0.584  0.491
     0.09     0.300  0.279  0.478  0.183  0.178
     0.18     0.353  0.269  0.604  0.896  0.092
    28.89     0.409  0.370  0.149  0.661  0.401

T y =
    0.024  0.024  0.011  0.019  0.026
    0.047  0.044  0.005  0.042  0.077
    0.070  0.069  0.003  0.070  0.093
    0.002  0.069  0.029  0.017  0.126
    0.017  0.089  0.044  0.000  0.016
    0.056  0.143  0.051  0.040  0.066
    0.023  0.167  0.046  0.036  0.074
    0.009  0.176  0.029  0.032  0.096
    0.042  0.172  0.007  0.024  0.105
    0.073  0.154  0.014  0.011  0.102
    0.090  0.128  0.031  0.000  0.093
    0.094  0.099  0.042  0.005  0.084
    0.086  0.074  0.044  0.005  0.077
    0.096  0.061  0.046  0.003  0.097
    0.129  0.042  0.070  0.006  0.147
    0.144  0.024  0.081  0.008  0.159
    0.203  0.064  0.148  0.014  0.180
    0.240  0.184  0.248  0.036  0.211
    0.161  0.226  0.283  0.063  0.226
    0.066  0.270  0.288  0.089  0.204
    0.031  0.355  0.308  0.127  0.173
    0.145  0.422  0.305  0.158  0.129
    0.228  0.438  0.256  0.156  0.068
    0.278  0.425  0.193  0.137  0.017
    0.312  0.410  0.129  0.104  0.012
    0.346  0.388  0.051  0.062  0.026
    0.409  0.340  0.069  0.009  0.015
    0.447  0.263  0.229  0.112  0.065
    0.396  0.203  0.365  0.236  0.168
    0.209  0.180  0.433  0.312  0.215
    0.228  0.244  0.370  0.039  0.045
    0.300  0.184  0.440  0.357  0.094
    0.765  0.526  0.437  0.265  0.190

Appendix B
Results for Isolated Frames
The inverse problem algorithm for isolated frames was applied to all oral vowel frames
present in the analyzed corpus. The results are shown in the next pages. From the
bottom, each set of panels shows:
• Vocal-tract midsagittal profile extracted from cineradiographic data (Bothorel
  et al., 1986) plotted on semi-polar grid; and labiogram simultaneously acquired.
• Midsagittal distances computed with the method described in Section 2.1.2.
• The thin line shows the area function estimated from the midsagittal profile
  using the model described in Section 2.1.3. The formants determined
  by this area function (using the method described in Section 2.2.3) are used to
  estimate the area function shown by the thick line using the method described
  in Chapter 4.
• Thin black line: vocal-tract transfer function determined from the area function
  estimated from the midsagittal profile. Thick black line: vocal-tract transfer
  function determined from the area function estimated from formant frequencies.
  Gray line: power spectrum envelope estimated from the speech signal.
• Speech signal recorded during the acquisition of the midsagittal profile shown
  in the bottom panel.
[Figure panels for the isolated-frame results, four frames per row. Labelled axes: Area (cm2),
Midsag. Dist. (cm), and midsagittal profile coordinates (cm).]
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
111
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)

4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)

Midsag. Dist. (cm)
Midsag. Dist. (cm)
4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)

Midsag. Dist. (cm)
4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
113
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)

4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)

4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
115
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)

Midsag. Dist. (cm)
4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)

4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
117
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)

4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)

4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
119
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)

4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)

Midsag. Dist. (cm)
Midsag. Dist. (cm)
4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)

4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)

Midsag. Dist. (cm)
4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
121
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)

4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)

Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)

4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
123
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)

4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)
4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)

Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)

4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
0 0 0 0
−1 −1 −1 −1
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
40 40 40 40
20 20 20 20
0 0 0 0
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
10 10 10 10
Area (cm2)
Area (cm2)
Area (cm2)
Area (cm2)
5 5 5 5
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)
Midsag. Dist. (cm)

Midsag. Dist. (cm)
4 4 4 4
2 2 2 2
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
15 15 15 15
25 20 25 20 25 20 25 20
30 30 30 30
10 10 10 10
15 15 15 15
cm
cm
cm
cm
10 10 10 10
5 5 5 5
5 5 5 5
0 0 0 0
0 0 0 0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
cm cm cm cm
125

0 0
−1 −1
0 5 10 15 20 0 5 10 15 20
Time (ms) Time (ms)
40 40
20 20
0 0
0 1 2 3 4 5 0 1 2 3 4 5
10 10
Area (cm2)
Area (cm2)
5 5
0 0
0 5 10 15 0 5 10 15
Midsag. Dist. (cm)
Midsag. Dist. (cm)

4 4
2 2
0 0
0 5 10 15 0 5 10 15
15 15
25 20 25 20
30 30
10 10
15 15
cm
cm
10 10
5 5
5 5
0 0
0 0
0 5 10 15 0 5 10 15
cm cm
Bibliography
[ACT78] B. S. Atal, J. J. Chang, and J. W. Tukey. Inversion of
articulatory-to-acoustic transformation in the vocal-tract by a
computer-sorting technique. The Journal of the Acoustical Society of
America, 63(5):1535-1555, 1978.
[AH71] B. S. Atal and S. L. Hanauer. Speech analysis and synthesis by linear
prediction of the speech wave. The Journal of the Acoustical Society of
America, 50:637-655, 1971.
[BBL95] D. Beautemps, P. Badin, and R. Laboissiere. Deriving vocal-tract area
functions from midsagittal profiles and formant frequencies: a new model
for vowels and fricative consonants based on experimental data. Speech
Communication, 16:27-47, 1995.
[BGGN91] T. Baer, J. C. Gore, L. C. Gracco, and P. W. Nye. Analysis of vocal tract
shape and dimensions using magnetic resonance imaging. The Journal
of the Acoustical Society of America, 90(2):799-828, 1991.
[BLS91] G. Bailly, R. Laboissiere, and J. L. Schwartz. Formant trajectories as
audible gestures: an alternative for speech synthesis. Journal of Phonetics,
19:9-23, 1991.
[BS95] A. Bell and J. Sejnowski. Blind separation and blind deconvolution: an
information-theoretic approach. In Proc. IEEE International Conference
on Acoustics, Speech and Signal Processing, 1995.
[BSWZ86] A. Bothorel, P. Simon, F. Wioland, and J. P. Zerling. Cinéradiographie de
voyelles et consonnes du Français. Institut de Phonétique de Strasbourg,
1986.
[CF66] C. Coker and O. Fujimura. Model for specification of the vocal tract area
function. The Journal of the Acoustical Society of America, 40:1271,
1966.
[CK41] T. Chiba and M. Kajiyama. The Vowel: Its Nature and Structure.
Tokyo, 1941.
[Cok76] C. Coker. A model of articulatory dynamics and control. Proc. IEEE,
64(4):452-460, 1976.
[Com94] P. Comon. Independent component analysis, a new concept? Signal
Processing, 36:287-314, 1994.
[Dav63] H. F. Davis. Fourier Series and Orthogonal Functions. Dover, 1963.

[Eis66] E. Eisner. Complete solutions of the 'Webster' horn equation. The Journal
of the Acoustical Society of America, 41(4):1126-1146, 1966.
[Fan67] G. Fant. On the predictability of formant levels and spectrum envelopes
from formant frequencies. In I. Lehiste, editor, Readings in Acoustic
Phonetics, pages 44-56. MIT, 1967. (Rep. from R. Jakobson, M. Halle,
and H. MacLean, eds., Mouton, 1956.).
[Fan70] G. Fant. Acoustic Theory of Speech Production. The Hague, 1970.
[Fan80] G. Fant. The relations between the area functions and the acoustic signal.
Phonetica, 37:55-86, 1980.
[FIS79] J. L. Flanagan, K. Ishizaka, and K. L. Shipley. Signal models for low
bit-rate coding of speech. The Journal of the Acoustical Society of America,
68(3):780-791, 1979.
[Fla55] J. L. Flanagan. A difference limen for vowel formant frequency. The
Journal of the Acoustical Society of America, 27:613-617, 1955.
[Fla72] J. L. Flanagan. Speech Analysis, Synthesis, and Perception.
Springer-Verlag, 1972.
[GS93] S. K. Gupta and J. Schroeter. Pitch synchronous frame-by-frame and
segment-based articulatory analysis by synthesis. The Journal of the
Acoustical Society of America, 94(5):2517-2530, 1993.
[HJ85] R. Horn and C. Johnson. Matrix Analysis. Cambridge, 1985.
[HS64] J. M. Heinz and K. N. Stevens. On the derivation of area functions and
acoustic spectra from cineradiographic films of speech. The Journal of
the Acoustical Society of America, 36:1037, 1964.
[IS73] F. Itakura and S. Saito. Analysis synthesis telephony based on the
maximum likelihood method. In J. Flanagan and R. Rabiner, editors, Speech
Synthesis, pages 289-292. Dowden, Hutchinson & Ross, 1973. (Rep.
from 6th Int. Cong. Acoust., Tokyo, 1968.).
[JN84] N. Jayant and P. Noll. Digital Coding of Waveforms. Springer-Verlag,
1984.
[Jor90] M. Jordan. Motor learning and the degrees of freedom problem. In
M. Jeannerod, editor, Attention and Performance, vol. XIII, pages
797-836. Erlbaum, 1990.
[KL73] J. L. Kelly and C. C. Lochbaum. Speech synthesis. In J. Flanagan and
R. Rabiner, editors, Speech Synthesis, pages 127-130. Dowden,
Hutchinson & Ross, 1973. (Rep. from 4th Int. Cong. Acoust., Copenhagen,
1962.).
[Lin90a] Q. Lin. Speech production theory and articulatory speech synthesis. PhD
thesis, Royal Institute of Technology (KTH), Stockholm, 1990.
[Lin90b] B. Lindblom. Explaining phonetic variation: a sketch of the H & H theory.
In W. J. Hardcastle and A. Marchal, editors, Speech Production and
Speech Modelling, pages 403-439. Kluwer Academic Publishers, 1990.
[Mae72] S. Maeda. On the conversion of x-ray data into formant frequencies.
Technical report, Bell Laboratories, Murray Hill, N.J., 1972.
[Mae82] S. Maeda. A digital simulation method of the vocal-tract system. Speech
Communication, 1(3-4):199-229, 1982.
[Mae90] S. Maeda. Compensatory articulation during speech: evidence from the
analysis and synthesis of vocal-tract shapes using an articulatory model.
In W. J. Hardcastle and A. Marchal, editors, Speech Production and
Speech Modelling, pages 131-149. Kluwer Academic Publishers, 1990.
[McG94] R. S. McGowan. Recovering articulatory movement from formant
frequency trajectories using task dynamics and a genetic algorithm:
preliminary model tests. Speech Communication, 14:19-48, 1994.
[Mer67] P. Mermelstein. Determination of vocal-tract shape from measured
formant frequencies. The Journal of the Acoustical Society of America,
41(5):1283-1294, 1967.
[Mer73] P. Mermelstein. Articulatory model for the study of speech production.
The Journal of the Acoustical Society of America, 53(4):1070-1082, 1973.
[MG76] J. D. Markel and A. H. Gray. Linear Prediction of Speech.
Springer-Verlag, 1976.
[Pap91] A. Papoulis. Probability, Random Variables, and Stochastic Processes.
McGraw-Hill, 1991.
[PBS92] P. Perrier, L. J. Boe, and R. Sock. Vocal tract area function estimation
from midsagittal dimensions with CT scans and a vocal tract cast:
modelling the transition with two sets of coefficients. Journal of Speech and
Hearing Research, 35:53-67, 1992.
[PCS+92] J. S. Perkell, M. H. Cohen, M. A. Svirsky, M. L. Matthies, I. Garabieta,
and M. T. T. Jackson. Electromagnetic midsagittal articulometer
systems for transducing speech articulatory movements. The Journal of the
[Per69] J. S. Perkell. Physiology of speech production: Results and implications
of a quantitative cineradiographic study. Master's thesis, M.I.T., 1969.
[RJ93] L. Rabiner and B. W. Juang. Fundamentals of Speech Recognition. Pren-
tice Hall, 1993.
[RS78] L. Rabiner and R. Schafer. Digital Processing of Speech Signals. Prentice
Hall, 1978.
[Sal46] V. Salmon. Generalized plane wave horn theory. The Journal of the
[Sch67] M. R. Schroeder. Determination of the geometry of the human
vocal-tract by acoustical measurements. The Journal of the Acoustical Society
of America, 41(4):1002-1010, 1967.
[Scu90] C. Scully. Articulatory synthesis. In W. J. Hardcastle and A. Marchal,
editors, Speech Production and Speech Modelling, pages 151-186. Kluwer
Academic Publishers, 1990.
[Shi93] K. Shirai. Estimation and generation of articulatory motion using neural
networks. Speech Communication, 13:45-51, 1993.
[SK86] K. Shirai and T. Kobayashi. Estimating articulatory motion from the
speech wave. Speech Communication, 5:159-170, 1986.
[Son74] M. M. Sondhi. Model for wave propagation in a lossy vocal-tract. The
Journal of the Acoustical Society of America, 55(5):1070-1075, 1974.
[Son79] M. M. Sondhi. Estimation of vocal-tract areas: The need for acoustical
measurements. IEEE Transactions on Acoustics, Speech, and Signal
Processing, 27(3):268-273, 1979.
[SR83] M. M. Sondhi and J. R. Resnick. The inverse problem for the vocal-tract:
Numerical methods, acoustical experiments, and speech synthesis. The
Journal of the Acoustical Society of America, 73(3):985-1002, 1983.
[SS87] M. M. Sondhi and J. Schroeter. A hybrid time-frequency domain
articulatory speech synthesizer. IEEE Transactions on Acoustics, Speech, and
Signal Processing, 35(7):955-967, 1987.
[SS91] J. Schroeter and M. Sondhi. Speech coding based on physiological models
of speech production. In M. M. Sondhi and S. Furui, editors, Advances
in Speech Processing, pages 231-268. Marcel Dekker, 1991.
[SS94] J. Schroeter and M. M. Sondhi. Techniques for estimating vocal-tract
shapes from the speech signal. IEEE Transactions on Speech and Audio
Processing, 2(1):133-150, 1994.
[TY96] M. K. Tiede and H. Yehia. A shape-based approach to vocal tract area
function estimation. To appear in The Proceedings of the 1996 Joint
Meeting of the Acoustical Society of America and the Acoustical Society
of Japan, 1996.
[TYVB96] M. K. Tiede, H. Yehia, and E. Vatikiotis-Bateson. A shape-based
approach to vocal tract area function estimation. In Proceedings of the 1st
ESCA Tutorial and Research Workshop on Speech Production Modeling
& 4th Speech Production Seminar, pages 41-44, 1996.
[VK94] V. Valimaki and M. Karjalainen. Improving the Kelly-Lochbaum vocal
tract model using conical tube sections and fractional delay filtering
techniques. In Proc. International Conference on Spoken Language
Processing, pages S12-12.1-S12-12.4, 1994.
[Wak73] H. Wakita. Direct estimation of the vocal-tract shape by inverse filtering
of the acoustic speech waveforms. IEEE Transactions on Audio and
Electroacoustics, AU-21(5):417-427, 1973.
[Wak79] H. Wakita. Estimation of vocal-tract shapes from acoustical analysis of
the speech wave: the state of the art. IEEE Transactions on Acoustics,
Speech, and Signal Processing, ASSP-27(3):281-285, 1979.
[Web19] A. G. Webster. Acoustical impedance, and the theory of horns and of
the phonograph. Proc. Natl. Acad. Sci. (U.S.), 5:275-282, 1919.
[YHI95a] H. Yehia, M. Honda, and F. Itakura. Acoustic measurements of the
vocal-tract area function: sensitivity analysis and experiments. In Proc.
IEEE International Conference on Acoustics, Speech and Signal
Processing, pages 652-655, 1995.
[YHI95b] H. Yehia, M. Honda, and F. Itakura. Acoustical measurements of the
vocal-tract area function: System modelling and experimental results.
In Proceedings of the 1995 Spring Meeting of the Acoustical Society of
Japan, pages 305-306, 1995.
[YI93a] H. Yehia and F. Itakura. Dynamic vocal-tract shape determination
from formant frequencies using two-dimensional Fourier analysis. SP-92
143, Institute of Electronics, Information and Communication Engineers,
1993.
[YI93b] H. Yehia and F. Itakura. Variational and perturbation analysis applied to
determination of vocal-tract formants. In Proceedings of the 1993 Autumn
Meeting of the Acoustical Society of Japan, pages 285-286, 1993.
[YI94] H. Yehia and F. Itakura. Determination of human vocal-tract dynamic
geometry from formant trajectories using spatial and temporal Fourier
analysis. In Proc. IEEE International Conference on Acoustics, Speech
and Signal Processing, pages 477-480, 1994.
[YI95a] H. Yehia and F. Itakura. Analysis of a technique to measure the vocal-
tract cross-sectional area based on the impulse response at the lips. SP-94
107, Institute of Electronics, Information and Communication Engineers,
1995.
[YI95b] H. Yehia and F. Itakura. Combining dynamic and acoustic constraints in
the speech production inverse problem. SP-95 13, Institute of Electronics,
Information and Communication Engineers, 1995.
[YI96] H. Yehia and F. Itakura. A method to combine acoustical and
morphological constraints in the speech production inverse problem. Speech
Communication, 18(2):151-174, 1996.
[YT97] H. Yehia and M. Tiede. A parametric three-dimensional model of the
vocal-tract based on MRI data. To appear in Proc. IEEE International
Conference on Acoustics, Speech and Signal Processing, 1997.
[YTI95] H. Yehia, K. Takeda, and F. Itakura. A vocal-tract area function
trajectory representation oriented to the speech production inverse problem.
In Proceedings of the 1995 Autumn Meeting of the Acoustical Society of
Japan, pages 339-340, 1995.
[YTI96] H. Yehia, K. Takeda, and F. Itakura. An acoustically oriented
vocal-tract model. IEICE Transactions on Information and Systems,
E79-D(8):1198-1208, 1996.
[YTVBI96] H. Yehia, M. K. Tiede, E. Vatikiotis-Bateson, and F. Itakura. Applying
morphological constraints to estimate three-dimensional vocal-tract
shapes from partial profile and acoustic information. To appear in The
Proceedings of the 1996 Joint Meeting of the Acoustical Society of
America and the Acoustical Society of Japan, 1996.
List of Publications
Journal Papers
H. Yehia and F. Itakura, "A method to combine acoustical and morphological
constraints in the speech production inverse problem," Speech Communication,
18(2):151-174, 1996.
H. Yehia, K. Takeda, and F. Itakura, "An acoustically oriented vocal-tract model,"
IEICE Transactions on Information and Systems, E79-D(8):1198-1208, 1996.
H. Yehia, K. Takeda, and F. Itakura, "An analysis of the acoustic-to-articulatory
mapping during speech under morphological and continuity constraints,"
submitted to Speech Communication.
International Conferences
H. Yehia and F. Itakura, "Determination of human vocal-tract dynamic geometry
from formant trajectories using spatial and temporal Fourier analysis," In
Proc. IEEE International Conference on Acoustics, Speech and Signal
Processing, pages 477-480, 1994.
H. Yehia, M. Honda, and F. Itakura, "Acoustic measurements of the vocal-tract area
function: sensitivity analysis and experiments," In Proc. IEEE International
Conference on Acoustics, Speech and Signal Processing, pages 652-655, 1995.
M. K. Tiede, H. Yehia, and E. Vatikiotis-Bateson, "A shape-based approach to vocal
tract area function estimation," In Proceedings of the 1st ESCA Tutorial and
Research Workshop on Speech Production Modeling & 4th Speech Production
Seminar, pages 41-44, 1996.
E. Vatikiotis-Bateson, K. G. Munhall, M. Hirayama, Y. Kasahara, and H. Yehia,
"Physiology-Based Synthesis of Audiovisual Speech," In Proceedings of the 1st
ESCA Tutorial and Research Workshop on Speech Production Modeling & 4th
Speech Production Seminar, pages 241–244, 1996.
E. Vatikiotis-Bateson, K. G. Munhall, Y. Kasahara, F. Garcia, and H. Yehia,
"Characterizing audiovisual information during speech," In Proceedings of the
International Conference on Spoken Language Processing, pages 1485–1488, 1996.
H. Yehia, M. K. Tiede, E. Vatikiotis-Bateson, and F. Itakura, "Applying morphological
constraints to estimate three-dimensional vocal-tract shapes from partial
profile and acoustic information," In The Proceedings of the 1996 Joint Meeting
of the Acoustical Society of America and the Acoustical Society of Japan, pages
855–860, 1996.
T. Taniguchi, H. Yehia, S. Kajita, T. Takeda and F. Itakura, "On the problems of
applying Bell's blind separation to real environments," In The Proceedings of
the 1996 Joint Meeting of the Acoustical Society of America and the Acoustical
Society of Japan, pages 1257–1260, 1996.
M. K. Tiede and H. Yehia, "A shape-based approach to vocal tract area function
estimation," In The Proceedings of the 1996 Joint Meeting of the Acoustical
Society of America and the Acoustical Society of Japan, pages 861–866, 1996.
E. Vatikiotis-Bateson and H. Yehia, "Synthesizing audiovisual speech from physiological
signals," In The Proceedings of the 1996 Joint Meeting of the Acoustical
Society of America and the Acoustical Society of Japan, pages 811–816, 1996.
H. Yehia and M. Tiede, "A parametric three-dimensional model of the vocal-tract
based on MRI data," To appear in The Proceedings of ICASSP-97, 1997.
Technical Meetings and Symposia
H. Yehia and F. Itakura, "A method to estimate LPC parameters exploring frame
segmentation," In Proceedings of the 1992 Spring Meeting of the Acoustical
Society of Japan, pages 305–306, 1992.
H. Yehia and F. Itakura, "Dynamic vocal-tract shape determination from formant
frequencies using two-dimensional Fourier analysis," SP-92 143, Institute of
Electronics, Information and Communication Engineers, pages 49–56, 1993.
H. Yehia and F. Itakura, "Variational and perturbation analysis applied to determination
of vocal-tract formants," In Proceedings of the 1993 Autumn Meeting of
the Acoustical Society of Japan, pages 285–286, 1993.
H. Yehia and F. Itakura, "Analysis of a technique to measure the vocal-tract
cross-sectional area based on the impulse response at the lips," SP-94 107, Institute
of Electronics, Information and Communication Engineers, pages 69–76, 1995.
H. Yehia, M. Honda, and F. Itakura, "Acoustical measurements of the vocal-tract
area function: System modelling and experimental results," In Proceedings of
the 1995 Spring Meeting of the Acoustical Society of Japan, pages 305–306, 1995.
H. Yehia and F. Itakura, "Combining dynamic and acoustic constraints in the speech
production inverse problem," SP-95 13, Institute of Electronics, Information and
Communication Engineers, pages 23–30, 1995.
H. Yehia, K. Takeda, and F. Itakura, "A vocal-tract area function trajectory
representation oriented to the speech production inverse problem," In Proceedings
of the 1995 Autumn Meeting of the Acoustical Society of Japan, pages 339–340,
1995.
I. Masuda, H. Yehia, and H. Kawahara, "A study of a method for signal separation
by spectral interpolation using Bartlett window properties (in Japanese),"
EA-96 29, Institute of Electronics, Information and Communication Engineers,
pages 17–24, 1996.
E. Vatikiotis-Bateson and H. Yehia, "Physiological modeling of facial motion during
speech," H-96 65, The Acoustical Society of Japan, 1996.
H. Yehia, "Vocal-tract profile to area function mapping taking formant frequency
constraints into account," In Proceedings of the 1996 Autumn Meeting of the
Acoustical Society of Japan, pages 321–322, 1996.
