Anda di halaman 1dari 3

VOCODERS

The coders (encoding/decoding algorithms) we have studied so far could have


been applied to any signals, not just speech. Vocoders are specifically voice
coders.
Phase Insensitivity
Phase is a shift in the time argument, inside a function of time. Suppose
we strike a piano key and generate a roughly sinusoidal sound cos(t), with
= 2nt. If we wait sufficient time to generate a phase shift /2 and then
strike another key, with sound cos(2t + /2), we generate a waveform like
the solid line in Figure 1. This waveform is the sum cos(t) + cos(2t + /2).
If we did not wait before striking the second note (1/4 msec, in Figure 1), our
waveform would be cos(t)+cos(2t). But perceptually, the two notes would
sound the same, even though in actuality they would be shifted in phase.
Hence, if we can get the energy spectrum right - where we hear loudness and
quiet then we dont really have to worry about the exact waveform.

Figure 1: Solid line: Superposition of two cosines, with a phase shift. Dashed
line: No phase shift. The wave is very different, yet the sound is the same,
perceptually.

Channel Vocoder
Sub band filtering is the process of applying a bank of band-pass filters
to the analog signal, thus actually carrying out the frequency decomposition
indicated in a Fourier analysis. Sub band coding is the process of making use
of the information derived from this filtering to achieve better compression.
Vocoders can operate at low bitrates, 1-2 kbps. To do so, a channel vocoder

1
first applies a filter bank to separate out the different frequency components,
as in Figure 2. However, as we saw above, only the energy is important,
so first the waveform is rectified to its absolute value. The filter bank
derives relative power levels for each frequency range. A subband coder
would not rectify the signal and would use wider frequency bands. A channel
vocoder also analyzes the signal to determine the general pitch of the speech
-low (bass), or high (tenor) - and also the excitation of the speech. Speech
excitation is mainly concerned with whether a sound is voiced or unvoiced.
A sound is unvoiced if its signal simply looks like noise: the sounds s and!
are unvoiced. Sounds such as the vowels a, e, and 0 are voiced, and their
waveform looks periodic. The 0 at the end of the word audio in Figure 13.1
is fairly periodic. During a vowel sound, air is forced through the vocal cords
in a stream of regular, short puffs, occurring at the rate of 75-150 pulses per
second for men and 150 250 per second for women.
Because voiced sounds can be approximated by sinusoids, a periodic pulse

Figure 2: Channel Vocoder

generator recreates voiced sounds. Since unvoiced sounds are noise-like, a


pseudo-noise generator is applied, and all values are scaled by the energy
estimates given by the band-pass filter set. A channel vocoder can achieve
an intelligible but synthetic voice using 2,400 bps.

2
Formant Vocoder
It turns out that not all frequencies present in speech are equally repre-
sented. Instead, only certain frequencies show up strongly, and others are
weak. This is a direct consequence of how speech sounds are formed, by reso-
nance in only a few chambers of the mouth, throat, and nose. The important
frequency peaks are called formants [2]. Figure 3. shows how this appears:
only a few, usually just four or so, peaks of energy at certain frequencies are
present. However, just where the peaks occur changes in time, as speech con-
tinues. For example, two different vowel sounds would activate different sets
of formants - this reflects the different vocal tract configurations necessary to
form each vowel. Usually, a small segment of speech is analyzed, say 1040
msec, and formants are found. A Formant Vocoder works by encoding only
the most important frequencies. Formant vocoders can produce reasonably
intelligible speech at only 1,000 bps.

Figure 3: The solid line shows frequencies present in the first 40 msec of
the speech sample of digital speech signal. The dashed line shows that while
similar frequencies are still present one second later, these frequencies have
shifted.

Anda mungkin juga menyukai