Steven Keyes
24.915 Final Paper
16 December 2013

Comparison of Acoustic Models of Overtone Singing

1. Introduction
Overtone singing, also called throat singing, is a singing technique that produces the
perceptual effect of a singer chanting a low drone note as well as a high, whistle-like melody.
This technique is used in a variety of regions and cultures and is common in the music of Tuva,
Mongolia, and Tibet among others. Although it is sometimes confusingly described as singing
two notes at once, the production of overtone singing can be described by a number of models
incorporating vocal cavity resonances and voicing production. In this paper, we will compare
and summarize some of these models.
In overtone singing, the singer first produces a voiced sound that is low in pitch and
rich in harmonics. These harmonics are components of the waveform at integer multiples
of the fundamental frequency. For example, if the fundamental, or first harmonic, of a waveform
is 150 Hz, then the second harmonic will be at 300 Hz, the third at 450 Hz, and so on, such that
the nth harmonic is at n × 150 Hz. After producing this tone, overtone singers manipulate the
resonances of their vocal tract to enhance a specific high-frequency harmonic. High harmonics
are perceived as close together in pitch; for example, the 6th through 12th harmonics
correspond to several notes of a scale spanning one octave, which gives singers the potential
to generate melodies. The enhanced harmonic is narrow and prominent, lies a few octaves above
the fundamental, and can move across several different harmonics while the fundamental remains
fixed. These properties may contribute to why listeners perceive it as a second tone.
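
The arithmetic of the harmonic series can be checked with a short sketch. The Python below is
illustrative only: the function names are hypothetical, the 150 Hz fundamental is the example
value from the text, and expressing intervals in equal-tempered semitones is an assumed
convenience for comparing pitch spacing.

```python
import math


def harmonic_frequencies(f0_hz, count):
    """Return the first `count` harmonics of f0_hz (harmonic n is n * f0_hz)."""
    return [n * f0_hz for n in range(1, count + 1)]


def interval_semitones(f_low_hz, f_high_hz):
    """Interval between two frequencies in equal-tempered semitones."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


f0 = 150.0  # example fundamental from the text
harmonics = harmonic_frequencies(f0, 12)

# Spacing between neighboring harmonics shrinks as the harmonic number grows.
for n in range(1, 12):
    low, high = harmonics[n - 1], harmonics[n]
    step = interval_semitones(low, high)
    print(f"H{n} {low:6.0f} Hz -> H{n + 1} {high:6.0f} Hz: {step:5.2f} semitones")
```

The step from the 1st to the 2nd harmonic is a full octave (12 semitones), whereas the 6th and
12th harmonics are themselves exactly one octave apart, so the harmonics between them act as
closely spaced scale steps.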

2. Model Overview
In general, we will rely on a source-filter model of the vocal tract. The vocal folds,
vibrating in some mode chosen by the singer, produce a source waveform at the start of the vocal
tract. The rest of the vocal tract can be modeled as a resonator, which emphasizes or
de-emphasizes certain frequencies of the source signal; the resonant peaks are the formants.
A simple approximation of the vocal tract is a tube, or several tubes connected in series, with
open or closed ends depending on the location of constrictions in the throat and mouth. The
nasal cavity adds poles and zeros to this transfer function, as mentioned later. Finally, in
addition to the resonances of this system, various factors damp the system, increasing the
bandwidth of the resonant poles.
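
As a rough illustration of the tube approximation, the sketch below computes the resonances of a
single uniform tube closed at the glottis and open at the lips, using the standard
quarter-wavelength formula F_k = (2k - 1)c / (4L). The tract lengths, the speed of sound of
350 m/s, and the function name are assumptions chosen for illustration; they do not come from
the studies discussed in this paper.

```python
def uniform_tube_resonances(length_m, count, speed_of_sound_m_s=350.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Quarter-wavelength formula: F_k = (2k - 1) * c / (4 * L), k = 1, 2, ...
    """
    if length_m <= 0:
        raise ValueError("length_m must be positive")
    return [(2 * k - 1) * speed_of_sound_m_s / (4.0 * length_m)
            for k in range(1, count + 1)]


# Assumed neutral tract length of 17.5 cm, then a slightly longer tract
# (for example, with protruded lips). Both lengths are illustrative values.
for length_m in (0.175, 0.190):
    formants = uniform_tube_resonances(length_m, 3)
    rounded = ", ".join(f"{f:.0f}" for f in formants)
    print(f"L = {length_m * 100:.1f} cm: F1-F3 = {rounded} Hz")
```

In this idealized model, lengthening the tube lowers every resonance. A real vocal tract is not
uniform, so the sketch only indicates the general trend.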

3. Effects of Formants in Overtone Singing

Phoneticians have proposed various mechanisms to explain why the emphasized harmonic in overtone
singing is so prominent. Based on a case study of an overtone singer, Bloothooft et al. (1992)
derived formant spectra for a range of sung overtone frequencies. From these spectra,
Bloothooft describes two modes of throat singing. In mode 1, for harmonics below 800 Hz, he
notes a dip in the formant spectra around 400 Hz, indicating nasalization. He suggests this helps
separate the overtone resonance from the lower harmonics.
For this mode, Bloothooft proposes two possible mechanisms that could give rise to this
effect. In the first, the harmonic emphasized by the singer, roughly between 500 and 800 Hz, is
the first formant, and the lower resonance at 200-350 Hz corresponds to the nasal pole; the dip
at 400 Hz corresponds to the nasal zero. To achieve this, the singer could articulate a
nasalized back vowel and manipulate jaw height and lip roundedness along the //-// vowel
range. This would change the length of the vocal tract, allowing the singer to manipulate the first
formant frequency.
Alternatively, the emphasized harmonic could be the second formant. For this
explanation, the singer could keep the lips and jaw fixed but move the tongue slightly forward
and back to manipulate the second formant frequency. Based on observations of the mouth of the
singer, Bloothooft concluded that the first method was more likely.

In mode 2, in which singers produce harmonics above 800 Hz, no nasal dip at 400
Hz is produced. As mode 1 corresponds to only about three harmonics, I will mainly consider mode
2 in the remaining examples. Bloothooft suggests there may be a nasal zero around 800-1000 Hz,
but the effect is not very strong. If this were the case, the corresponding nasal pole could be
conflated with the first formant. In this mode, Bloothooft also suggests that the second formant
combines with the third formant to form a prominent peak. Mouth shapes that bring F2 and F3
together would support this explanation.
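
One way to see why merging two formants yields a stronger peak is to model each formant as a
second-order resonance and cascade them. The sketch below uses the standard all-pole formant
magnitude with unit gain at 0 Hz. The formant frequencies, the 100 Hz bandwidths, and all
function names are hypothetical values picked for illustration, not measurements from
Bloothooft et al.

```python
import math


def formant_gain(f_hz, center_hz, bandwidth_hz):
    """Magnitude response of one formant (a complex-conjugate pole pair).

    Poles sit at -pi*B +/- j*2*pi*F; the gain is normalized to 1 at 0 Hz.
    """
    half_bw = bandwidth_hz / 2.0
    numerator = center_hz ** 2 + half_bw ** 2
    denominator = (math.hypot(f_hz - center_hz, half_bw)
                   * math.hypot(f_hz + center_hz, half_bw))
    return numerator / denominator


def cascade_gain_db(f_hz, formants):
    """Gain in dB of several formants in series; formants = [(center, bandwidth), ...]."""
    gain = 1.0
    for center_hz, bandwidth_hz in formants:
        gain *= formant_gain(f_hz, center_hz, bandwidth_hz)
    return 20.0 * math.log10(gain)


def peak_gain_db(formants, f_min_hz=1000, f_max_hz=3000, step_hz=5):
    """Largest cascade gain found on a frequency grid (inclusive of f_max_hz)."""
    return max(cascade_gain_db(f, formants)
               for f in range(f_min_hz, f_max_hz + 1, step_hz))


# Hypothetical F2/F3 pairs: well separated versus nearly merged.
separated = [(1500.0, 100.0), (2500.0, 100.0)]
converged = [(1700.0, 100.0), (1800.0, 100.0)]

print(f"separated peak: {peak_gain_db(separated):.1f} dB")
print(f"converged peak: {peak_gain_db(converged):.1f} dB")
```

With these assumed values the converged pair produces a markedly higher peak than the separated
pair, because each resonance boosts the region where the other is already strong.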
Chernov and Maslov (1987, 1991), using x-ray pictures to study the shape of the mouth
when performing overtone singing, found tongue positions that resembled the articulation of /l/
and /li/, with a retroflex tongue position, and Hai (1991) confirms this shape. This tongue
position implies the mouth may be divided into a front cavity and a back cavity, each with a
resonance that contributes to the final waveform. In fact, many studies of overtone singing
suggest that the combination of these cavities, tuned such that they have the same resonance,
may contribute to the prominence of the emphasized harmonic.
For example, Bloothooft suggests that overtone singing involves a retroflex /r/-like mouth
shape. This supports the combination of F2 and F3 because r-colored vowels have a lowered
F3. This can be seen in Figure 1, which compares recordings of a regular vowel and an r-colored
vowel.

Figure 1: The formants of a regular vowel [] and r-colored vowel [] demonstrate how F3 is lowered when the
tongue moves upward into a retroflex position. The lines, from bottom to top, are F1, F2, F3, and F4 of a 2005
recording demonstrating r-colored vowels, measured using Praat. Formants other than F3 remain in approximately
the same position.

Lowering F3 to approach the F2 frequency by positioning the tongue in a retroflex
position could account for the production of throat singing, especially if the lips are rounded and
protruded, increasing the size of the cavity in front of the constriction and thus further decreasing
the F3 frequency. This lowering effect is achieved by positioning the tongue near an appropriate
node, as explained by Stevens (1989).
Furthermore, this effect is observable when overtone singers transition from a regular
sung vowel to an overtone-sung vowel, as shown in Figure 2.

Figure 2: In this excerpt from Borbanngadyr, a piece performed by a Tuvan-style overtone singer, the singer starts
by singing a //-like vowel but then transitions to singing in overtone style. F1, F2, and F3 are visible at the
beginning of the clip, but in the middle F2 and F3 clearly converge as F2 rises and F3 falls. They form a
combined peak at roughly 1750 Hz, the 12th partial of the measured fundamental.

Bloothooft, as well as Klingholz (1993), who performed another case study on a recording
of overtone singing, also observed that overtone singers hold their fundamental frequency
very stable in pitch, and they suggest this helps the singers align their
formants with the harmonics of their voice. Alternatively, they suggest this gives singers another
creative parameter, because modulating the pitch, such as by vibrato, would cause a
corresponding change in the intensity of the harmonics as they oscillate around the resonant point
of the vocal tract.
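
The interaction between vibrato and a narrow resonance can be sketched numerically. The code
below approximates the resonance near its peak with a simple single-pole (Lorentzian) shape and
tracks the level of one harmonic as the fundamental wobbles. The 1750 Hz center and the 12th
harmonic echo the Figure 2 example; the 40 Hz bandwidth, the 1% vibrato extent at a 5 Hz rate,
and the function names are assumptions for illustration.

```python
import math


def relative_level_db(f_hz, center_hz, bandwidth_hz):
    """Level relative to the resonance peak, using a near-peak approximation.

    The level is 0 dB at center_hz and about -3 dB at center_hz +/- bandwidth_hz / 2.
    """
    detuning = 2.0 * (f_hz - center_hz) / bandwidth_hz
    return -10.0 * math.log10(1.0 + detuning ** 2)


def harmonic_level_track(f0_hz, harmonic, center_hz, bandwidth_hz,
                         vibrato_extent=0.01, vibrato_rate_hz=5.0,
                         duration_s=0.4, steps=40):
    """Sample the level (dB) of one harmonic while f0 undergoes sinusoidal vibrato."""
    track = []
    for i in range(steps + 1):
        t = duration_s * i / steps
        wobble = vibrato_extent * math.sin(2.0 * math.pi * vibrato_rate_hz * t)
        f0_now = f0_hz * (1.0 + wobble)
        track.append(relative_level_db(harmonic * f0_now, center_hz, bandwidth_hz))
    return track


center_hz = 1750.0
f0_hz = center_hz / 12  # fundamental whose 12th harmonic sits on the resonance

steady = harmonic_level_track(f0_hz, 12, center_hz, 40.0, vibrato_extent=0.0)
wobbly = harmonic_level_track(f0_hz, 12, center_hz, 40.0, vibrato_extent=0.01)

print(f"steady pitch: level varies by {max(steady) - min(steady):.2f} dB")
print(f"1% vibrato:   level varies by {max(wobbly) - min(wobbly):.2f} dB")
```

Under these assumptions a steady fundamental keeps the harmonic on the peak, while even a small
vibrato moves it up and down the flank of the resonance and modulates its level by a few dB.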
Also, Klingholz pointed out that the bandwidth of the formant used by the singer is very
narrow relative to its amplitude. A normal formant spans multiple harmonics, but the emphasized
formant of overtone singers generally spans little more than a single harmonic. Klingholz claims
this is because singers tense the cheeks, reducing damping by as much as 40%. A small mouth
opening further reduces damping. Finally, the formant is very prominent for other reasons
discussed earlier, such as the combination of F2 and F3, further increasing its amplitude relative
to its bandwidth. This can be seen in a wide-window spectrogram of overtone singing, as in
Figure 3.

Figure 3: This wide window (15 ms) spectrogram makes harmonics easily observable as the lines that span the
graph, starting with the 1st harmonic, the fundamental. In this excerpt from Borbanngadyr, the overtone singer
creates a melody by alternating between the 12th, 10th, and 9th harmonics. F1 is visible as the slight darkening on the
1st and 2nd harmonics. The overtone harmonic, however, is very prominent and narrow; it significantly darkens
roughly one harmonic at a time. Listeners perceive this as a distinct tone.
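
To make the contrast between a broad formant and the narrow overtone peak concrete, the sketch
below counts how many harmonics fall inside a formant's half-power (-3 dB) band. The fundamental
is inferred from the Figure 2 observation that 1750 Hz is the 12th partial, and it is assumed to
hold for the Figure 3 excerpt as well. The two bandwidths and the function name are hypothetical
values chosen for illustration.

```python
def harmonics_in_band(f0_hz, center_hz, bandwidth_hz):
    """Harmonic numbers n whose frequency n * f0_hz lies within center +/- bandwidth / 2."""
    low = center_hz - bandwidth_hz / 2.0
    high = center_hz + bandwidth_hz / 2.0
    first = max(1, int(low // f0_hz))
    last = int(high // f0_hz) + 1
    return [n for n in range(first, last + 1) if low <= n * f0_hz <= high]


f0_hz = 1750.0 / 12  # about 145.8 Hz, inferred from the 12th partial at 1750 Hz

# Follow the melody of Figure 3 (12th, 10th, 9th harmonics) with two assumed bandwidths.
for target in (12, 10, 9):
    center_hz = target * f0_hz
    broad = harmonics_in_band(f0_hz, center_hz, 400.0)   # assumed ordinary formant
    narrow = harmonics_in_band(f0_hz, center_hz, 60.0)   # assumed overtone-singing peak
    print(f"center {center_hz:7.1f} Hz: broad covers {broad}, narrow covers {narrow}")
```

With these assumed numbers the broad formant spans three harmonics at each melody step, while the
narrow peak isolates a single harmonic.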

4. Summary of Sources for the Source-Filter Model

In addition to formant strength, the source signal affects the distinct sound of
overtone singing, and there is some debate about exactly how the source signal is produced. Large
and Murry (1981) suggested that Tibetan chant is in the vocal fry register. However, Bloothooft
disagrees in the case of his own study, writing that the signal of the singer he measured was too
regular and periodic.
Instead, Bloothooft suggests the use of the modal register but with a relatively long
glottal closure. This would result in muscular hypertension in the pharyngeal region, which was
observed by Dmitriev et al. (1983) in Touvinian singers. Bloothooft also qualitatively observed
damping in the voice signal, which he thought corresponded with the effects of long glottal
closure. Finally, Dmitriev et al. (1983) and later Chernov and Maslov (1987, 1991) found that the
false vocal folds may have some involvement in Touvinian double-voice singing in addition
to the glottis.

5. Conclusion
In general, the source-filter model and formant model provide a strong explanation for the
production of overtone singing. The convergence of F2 and F3, as well as other factors related to
damping, is evident in singers of this style.

Bloothooft, G., Bringmann, E., Van Cappellen, M., Van Luipen, J. B., and Thomassen, K. P. (1992).
"Acoustics and perception of overtone singing," Journal of the Acoustical Society of
America 92(4), 1827-35.
Chernov, B., and Maslov, V. (1987). "Larynx double sound generator," Proc. XI Congress of
Phonetic Sciences, Tallinn, Vol. 6, pp. 40-43.
Chernov, B., and Maslov, V. (1991). "Phonation without text. From whistle to speech," Proc. XII
Congress of Phonetic Sciences, Aix-en-Provence, Vol. 3, pp. 370-373.
Dmitriev, L. B., Chernov, B. P., and Maslov, V. T. (1983). "Functioning of the Voice
Mechanism in Double-Voice Touvinian Singing," Fol. Phoniatr. 35, 193-197.
Hai, T. Q. (1991). "New experiments about the Overtone Singing Style," Proc. Conference New
ways of the voice, Besançon, 61.
Huun-Huur-Tu. (1994). Borbanngadyr. Recording.
Klingholz, F. (1993). "Overtone Singing: Productive Mechanisms and Acoustic Data," Journal of
Voice 7(2), 118-122.
Large, J., and Murry, T. (1981). "Observations on the nature of Tibetan chant," J. Exp. Res.
Singing 5, 22-28.
Regular and r-colored vowels (2005). Recording submitted by user Denelson83.
Stevens, K. N. (1989). "On the Quantal Nature of Speech," Journal of Phonetics 17, 3-45.