Linear Pulse Code Modulation (LPCM) dominates all other audio cod-
ing schemes for high-quality original recording for a number of rea-
sons. First, it is conceptually simple and has had the most research and
development compared to other methods of coding. Second, for pro-
fessionals who manipulate signals, the mathematics of digital signal
processing such as equalization is most easily performed on linearly
represented signals. Third, it has been shown that the theory of LPCM is
just that, linear, that is, distortion free in its theoretical foundation even
to well below the smallest signal that can be represented by the least
significant bit, when the proper precautions are taken.
As new media come onto the scene, producers may have a choice of
word length for representing the audio. Digital Versatile Disc Audio
(DVD-A) offers 20- and 24-bit word lengths in addition to 16 bit, and
Meridian Lossless Packing allows for one-bit increments from 16 to
24 bits. Once again, the producer is faced with a choice, just as in the
choice of sample rate discussed in the previous appendix. While the
marketing race may dictate 24-bit audio, reality is, as always, a bit more
complicated. Word length is important to professionals because it is one
of the three contenders, along with sample rate, and number of audio
channels, for space on a medium and transfer rate on and off a medium.
Conversion
Word length in LPCM delivers the dynamic range of the system, just
as sample rate delivers the frequency range. It is a familiar story, but
215
Sound for Film and Television, Third Edtion - Tomlinson Holman Copyright 2010 Tomlinson Holman
Appendix 2
there are some nuances that may not be widely known. While the
number for dynamic range is roughly 6 dB for each bit, to be precise
one has to examine the quantization process in a little more detail. For
each short snippet of time, at the sample rate, a device at the heart
of digital audio called a quantizer picks the closest “bin” to represent
the amplitude of the signal for that sample from one of an ascending
series of bins, and assigns a number to it. In LPCM, all the bins have
the same “height.” There is a likely residual error, because the quant-
izer cannot represent a number like 1/3 of a bin: either it’s in this bin
or the next, with nothing in between. Only a few of the samples will
probably fall exactly on a bin height; for all other samples there is a
“quantization error.”
The name given to this error in the past has sometimes been “quant-
ization noise.” That’s a lousy term because once you’ve heard it, you’d
never call it a mere “noise.” A better name is “quantization distortion,”
which comes closer to offering the correct flavor to this problem—a
distortion of a type unique to digital audio. A low-level sine wave just
barely taller than one bin crosses over from that bin to the next and
gets converted as two bin values, alternating with each other, at the
frequency of the sine wave. At the other end of the chain, upon digital-
to-analog conversion, reconstruction of the analog output produces
a square wave, a terrifically severe distortion of the input sine wave.
Even this simplified case doesn’t tell the whole story; the sheer nasti-
ness of quantization distortion is underestimated because at least here
the sine wave is converted to a square wave, and a sine wave and
a square wave of the same frequency are related by the fact that the
square wave has added harmonics (1/3 of the level of the fundamental of
3rd harmonic, 1/5 of 5th harmonic, 1/7 of 7th…). Quantization distortion
in many cases is worse, as inharmonic frequencies as well as harmonic
ones are added to the original through the process of quantization. Also
it should be pointed out that once quantization distortion has occurred
on the conversion from analog to digital, there is no method to reduce
the distortion: the signal has been irrevocably distorted.
216
Sound for Film and Television, Third Edtion - Tomlinson Holman Copyright 2010 Tomlinson Holman
that once would not have triggered any conversion because they
failed to cross from one bin to the next are now distinguished. This
is because, with the added noise, the likelihood that tones lower than
one bit cross the threshold from one bin to the next is a function of the
signal plus the dither noise, and the noise is agitating the amplitude of
the signal from sample to sample up and down. On the average of
multiple samples, the tone is converted, and audible, even below the
noise floor.
The type of dither that is best for audio is widely debated, with some
digital audio workstations offering a variety of types selectable by the
producer (who sometimes seems in today’s world to be cursed with
the variety of choices to be made!). One well researched dither noise
that overcomes quantization distortion, and also noise modulation
(which is a problem with other dither types), while offering the least
additional measured noise, is white noise having a triangular prob-
ability density function at a peak-to-peak level of ⫾1 least significant
bit. This is a fancy way of saying that for any one sample, the dither
will fall in the range of ⫾1 LSB around zero, and over time the prob-
ability for any one sample being at a particular level follows a triangle
function. The jargon for this is “TPDF” dither. But there is a noise pen-
alty for dither: it exchanges distortion for noise, where the noise is a
more benign “distortion” of the original signal than is the quantization
distortion.
So every LPCM digital audio system requires dither, for low distor-
tion. Often in the past this dither has been supplied accidentally, from
analog noise, or even room noise, occurring before conversion. As the
quality of equipment before the quantizer improves, and rooms get
quieter to accommodate digital recording, dither has become the greatest
limitation on signal-to-noise ratio of digital audio systems. The amount
of noise added for TPDF dither is 4.77 dB, compared to the theoretical
noise level of a perfect quantizer without dither.
217
Sound for Film and Television, Third Edtion - Tomlinson Holman Copyright 2010 Tomlinson Holman
Appendix 2
Dynamic Range
So, to dot all the “i’s” and cross all the “t’s,” the dynamic range of a
LPCM digital audio ADC conversion, performed with TPDF white-noise
dither and no pre- or de-emphasis, is:
and approximately
6n – 3 dB
218
Sound for Film and Television, Third Edtion - Tomlinson Holman Copyright 2010 Tomlinson Holman
tape levels. Also note that there are special requirements for multichan-
nel dither, described by Robert A. Wannamaker in “Efficient Generation
of Multichannel Dither Signals,” JAES, Vol. 52, No. 6, 2004 June,
pp. 579–586.
For high playback levels, the noise floor of 16-bit digital audio with
a TPDF, white-noise floor, intrudes into the most sensitive region
of human hearing, between 1 and 5 kHz. Thus action is called for to
reduce noise, and there are several ways to do this. But first, let us
think about whether we are in fact getting 16-bit performance from
today’s recordings.
Large film productions are using mostly 16-bit word length for their
recordings. Let us just consider the noise implications on one gen-
eration of a multiple generation system, the stage of premixes sum-
ming to the final mix. Perhaps eight premixes are used, and “Zero
Reference” mixes are the order of the day. This means that most of
the heavy work has already been done in the pre-mixing, allowing
the final mix to be a simple combination of the premixes, summed
together 1:1:1… . Now eight noise sources that are from different noise
generators add together in a random manner, with double the number
of sources resulting in 3 dB worse noise. Therefore, for our eight pre-
mix to final mix example, the final mix will be 9 dB noisier than one
of the source channels (1:2 channels makes it 3 dB worse, 2:4 another
3 dB, and 4:8 another 3 dB, for a total of 9 dB). Our 76 dB weighted sig-
nal-to-noise ratio from Table A2-2 for one source channel has become
67 dB, and the actual performance heard in theaters is 14.5 bits!
Similar things happened in music recording on 24-track analog
machines, mixed down to 2-track. Noise of the 24-tracks is summed
into the two, albeit rarely at equal gain. Thus the noise performance
of the 2-track is probably thoroughly swamped by noise from the
24-track source, so long as the two use similar technology.
These examples of noise summing in both analog and digital record-
ing systems illustrate one principle that must not be overlooked in
the ongoing debate about the number of bits of word length to use
for a given purpose: professionals need a longer word length than
contained on release media, because they use more source channels to
sum to fewer output channels, in most cases. Only live classical record-
ing direct to 2-track digital for release does not require any longer
word length, and then, if you want to be able to adjust the levels
between recording and release, you should have a longer word length
too. So there is virtually no professional setting that does not require
a longer word length than release media, just to get up to the quality
level implied by the number of bits of word length on the release.
219
Sound for Film and Television, Third Edtion - Tomlinson Holman Copyright 2010 Tomlinson Holman
Appendix 2
Actual Performance
Today’s conversion technology has surpassed the 16-bit level. On the
other hand, I have measured a “24-bit” equalizer and its dynamic range
was 95 dB, just 16-bit performance. So what’s the difference between
“marketing bits” and real ones? The dynamic range determines the real
word length, and how close a particular part comes to the range implied
by the number of bits, not the top of the data page that screams 24 bit.
As an example, here are two of the widest dynamic range converters
on the market. One ADC converter chip has 123 dB dynamic range,
A weighted. A corresponding DAC also has 123 dB dynamic range. Both
specifications are “typical,” not worst case which is some 6 dB worse,
so these are not a basis for specifying a finished unit made with
them. Each of the ADC and DAC actually performs practically at 20.5-
bit dynamic range (although it is A weighting that delivers that extra
1/2 bit). For the pair, the noise power adds, and the total will be 120 dB
dynamic range A weighted, or call it 20-bit performance. By the way,
this “matching” of dynamic range between ADC and DAC is not very
common.
There are ways to get “beyond the state of the art.” By paralleling mul-
tiple converters, the signal is the same in all the converters, but the
noise is (supposedly) random among them. By summing the outputs
of paralleled converters together, the contribution of the signal grows
faster (twice as much in-phase signal is 6 dB) than the contribution of
the noise (twice as much noise is 3 dB, so long as the noises being
added are uncorrelated), so noise is reduced compared to the signal
and dynamic range is increased. Using our example of the premixes
in reverse, eight each of the ADCs and DACs, paralleled, can theoret-
ically increase the dynamic range by 9 dB, or 1.5 bits. So the 20-bit pair
becomes 21.5 bits, or 129 dB, if you’re willing to put up with the cost
and power of 8 converters for each of the ADCs and DACs in parallel.
220
Sound for Film and Television, Third Edtion - Tomlinson Holman Copyright 2010 Tomlinson Holman
140 dB with the input of the ADC shorted, showing that the conversion
noise is not contributing to the output significantly. An example of an
auto-ranging microphone preamplifier with a clever matching scheme
so that there are no glitches is the StageTec TrueMatch model with
153 dB A weighted specified dynamic range, so this goes even beyond
the ability of AES 3 to represent.
221
Sound for Film and Television, Third Edtion - Tomlinson Holman Copyright 2010 Tomlinson Holman
Appendix 2
222
Sound for Film and Television, Third Edtion - Tomlinson Holman Copyright 2010 Tomlinson Holman
1
Stanley P. Lipshitz, et. al., “Dithered Noise Shapers and Recursive Digital Filters,”
J. Audio Eng. Soc., Vol. 52, No. 11, 2004 November, pp. 1124–1141.
223
Sound for Film and Television, Third Edtion - Tomlinson Holman Copyright 2010 Tomlinson Holman
Appendix 2
224
Sound for Film and Television, Third Edtion - Tomlinson Holman Copyright 2010 Tomlinson Holman
than 30 years ago, to 250, 320, 355 and even higher reference levels,
all in nWb/m, before passing into obsolescence. This outlook takes the
headroom as constant, that is, keeping reference level a certain num-
ber of decibels below a specific distortion as tape improves, such as
⫺16 or ⫺18 dB re the point of reaching 3% THD. This has the effect of
keeping distortion constant and improving the signal-to-noise ratio as
time goes by.
I think the difference in these philosophies is due to the differences in
the industries. Film and television wanted a constant reference level
over time so that things like sound effects libraries had interchange-
able levels over a long period of time, with newer recordings simply
having more potential headroom than older ones. Also, film practice
means that every time you put up a legacy recording you don’t have
to recalibrate the machine. There is more emphasis placed on inter-
changeability than in the music industry, where it is not considered a
big burden to realign the playback machine each time an older tape is
put up.
225
Sound for Film and Television, Third Edtion - Tomlinson Holman Copyright 2010 Tomlinson Holman
Sound for Film and Television, Third Edtion - Tomlinson Holman Copyright 2010 Tomlinson Holman