
CHAPTER 1.15
Digital Audio Standards and Practices
CHIP MORGAN, CMBE
El Dorado Hills, California

RANDALL HOFFNER
ABC, Inc., New York, New York

Updated for the 10th Edition by

TIM CARROLL
Linear Acoustic Inc., Lancaster, Pennsylvania

INTRODUCTION
Digital audio technology has supplanted analog audio technology in U.S. television and radio production and broadcast facilities. Like digital video, digital audio offers many advantages in production, editing, distribution, and routing. Digital audio is remarkably robust and far less susceptible than analog audio to degradation from hum, noise, level anomalies, and stereo phase errors. Each analog audio recording generation and processing step adds its own measure of noise to the signal, but in the digital domain audio is not subject to such noise buildup. However, perceptual coding artifacts can be a problem; see Chapter 3.7, Digital Audio Data Compression, for additional information. Digital audio may be stored on magnetic, optical, or magneto-optical discs and in solid state memories. When audio samples have been reduced to a series of numbers, processing and manipulation become largely mathematical operations, easily accommodated by microprocessors. Nonlinear editing is an example of a process that cannot be done in the analog domain. The technology of digital compression creates new economies in the storage and transport of digital audio and permits the broadcast of digital audio within a reasonable segment of spectral bandwidth. The distribution and routing of audio in the digital domain present the broadcaster with new options as well, such as the capability to embed digital audio within a serial digital video signal, facilitating the carriage of video and multiple audio channels on a single coaxial cable. Although digital audio presents its own unique set of challenges, its advantages far outweigh its disadvantages.

Digital audio systems are inherently free of the hum and noise problems that can invade analog audio systems. The nature of the digital domain gives rise to a new set of considerations for the facility planner and designer: digital audio signals operate in the multiple megahertz frequency domain that video engineers are well acquainted with, raising such considerations as signal reflections and impedance discontinuities. In digital audio system engineering, just as in analog audio system engineering, cognizance of the potential pitfalls to be avoided and the application of good engineering practices will result in facilities that function well.

Since the 9th edition of the NAB Engineering Handbook was published, digital audio standards and practices have not seen a dramatic change in their basic definitions, but their use has grown in popularity. Digital audio is no longer a format used only by the largest broadcasters and postproduction facilities; it has come to be relied upon as the only way to handle modern broadcast audio requirements. Digital audio has proven to be as robust and flexible as it was originally designed to be. Routing, distribution, storage, and signal processing have advanced to the point of making it difficult or impossible to find analog versions of these processes with the same features and capabilities. With multichannel and surround sound audio having gained remarkable popularity, the thought of handling these signals in the analog domain quickly becomes overwhelming, and the efficiency and consistency of digital audio truly make it possible to process multichannel sound with the precision required.¹

Some useful new formats and standards have emerged, along with necessary revisions to existing standards. For example, new transport methods based on TCP/IP computer networks have appeared that, although not yet standardized, are gaining popularity, and the requirement for audio metadata in digital television is presenting new challenges. In addition, new audio data rate reduction (i.e., compression) schemes have found their place in the chain, and better standards have been developed to support them. Proper synchronization of the digital audio plant still seems to be a challenge; it is difficult for audio-centric facilities and staff to realize that they now have timing requirements like their video counterparts, but AES11 provides accurate guidance that, when followed, results in a properly timed plant and high-quality audio.




DIGITAL AUDIO STANDARDS


Following are standards that are relevant to digital audio in broadcast facilities. Many of them have been presented before and remain important parts of standardized digital audio systems. In recent years, there have been significant advances made by the Society of Motion Picture and Television Engineers (SMPTE), and it is necessary to consider these standards along with their Audio Engineering Society (AES) counterparts if they exist.

AES3-2003
AES3-2003 is the AES Recommended Practice for Digital Audio Engineering – Serial Transmission Format for Two-Channel Linearly Represented Digital Audio Data. This is the baseline standard for digital audio developed by the AES and the European Broadcasting Union (EBU) and is commonly referred to as the AES/EBU standard. AES3 defines a digital protocol and physical and electrical interfaces for the carriage of two discrete audio channels, accompanied by various housekeeping, status, and user information, in a single serial digital bitstream. As its title indicates, AES3 was designed to carry linearly quantized (uncompressed PCM) digital audio. Compressed digital audio may be carried on the IEC 958 digital audio interface.² IEC 958 is identical to AES3 in protocol but can have slightly different electrical characteristics to support consumer electronics; it addresses a professional implementation (AES/EBU) and a consumer implementation (S/PDIF). The AES3 interface has the capacity to carry linearly sampled digital audio at bit depths from 16 to 24, data descriptive of such factors as channel status and sample validity, along with parity checking data and user data. Total bit count per sample, including audio and housekeeping, is 32 bits. An ancillary standard, AES5 (discussed later), recommends use of the professional audio sample rate of 48 kHz on AES3, while recognizing the use of sample rates of 44.1 kHz, 32 kHz, and 96 kHz. AES3 carries audio samples using time-division multiplexing, in which samples from each of the two represented audio channels alternate.

Data Structure

The data carried on the AES/EBU interface is divided into blocks, frames, and subframes. An AES block is constructed of 192 frames, each frame being composed of two subframes, each subframe containing a single audio sample. A subframe begins with a preamble that provides sync information and describes what type of subframe it is, and ends with a validity bit, a user bit, a channel status bit, and a parity bit. The subframe is divided into 32 time slots, each time slot being one sample bit in duration. The first four time slots are filled with a 4-bit preamble. The 24 time slots following the preamble may be filled in one of two ways. As shown in Figure 1.15-1(a),³ an audio sample word of up to 24 bits may fill all the time slots. Figure 1.15-1(b)⁴ illustrates that the first four time slots of the audio sample word space may be filled with auxiliary bits, which can represent user data or low-quality audio for informational or cueing purposes, for example. In all cases, the audio word is represented least significant bit (LSB) first, most significant bit (MSB) last. If digital audio words of bit depth less than the maximum are represented, the unused bits are set to logic 0. Time slots 28, 29, 30, and 31 are filled with a validity bit (V), a user bit (U), a channel status bit (C), and a parity bit (P), respectively. The subframes are assembled into frames and blocks as shown in Figure 1.15-2.
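To make the layout concrete, here is a minimal illustrative sketch in Python (the function name and the choice of the 20-bit-word-plus-auxiliary-bits variant are assumptions for this example, and the preamble is shown only as placeholder slots):

```python
def pack_subframe(sample_20bit, v=0, u=0, c=0, aux=0):
    """Model one AES3 subframe as a list of 32 time-slot bit values.
    Slots 0-3 carry the preamble (placeholders here; real preambles are
    biphase-mark violations, not data bits). Slots 4-7 carry auxiliary
    bits, slots 8-27 the 20-bit audio word LSB first, and slots 28-31
    the V, U, C, and even-parity bits."""
    slots = [0, 0, 0, 0]                                   # preamble placeholder
    slots += [(aux >> i) & 1 for i in range(4)]            # auxiliary bits
    slots += [(sample_20bit >> i) & 1 for i in range(20)]  # audio word, LSB first
    slots += [v, u, c]
    slots.append(sum(slots[4:]) & 1)   # parity: slots 4-31 carry an even number of 1s
    return slots
```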
¹ It would be virtually impossible to handle multichannel audio to the precision required if the audio were handled via traditional analog means.
² Since 1997, the IEC document numbering system has added 60000 to the old IEC standard number, so the official number of this standard is now IEC 60958. Because the three-digit number is more widely known, this number will be used descriptively within the chapter.

FIGURE 1.15-1 Subframe formats: (a) 16–20 bit audio word; (b) 16–24 bit audio word.


³ AES3-1992 (r1997), AES Recommended Practice for Digital Audio Engineering – Serial Transmission Format for Two-Channel Linearly Represented Digital Audio Data, Figure 1.
⁴ Ibid.



FIGURE 1.15-2 AES3 block and frame structure.

Each subframe begins with one of three preambles. The first subframe in the 192-frame block, a Channel 1 subframe, starts with Preamble Z. All other Channel 1 subframes in the block start with Preamble X. All Channel 2 subframes start with Preamble Y. Figure 1.15-2 represents the last frame of a block and the first two frames of the following block. Subframe 1 of Frame 0, the first subframe of the block, begins with Preamble Z, uniquely identifying the beginning of the block. After the first subframe, the successive subframes are marked by Preamble Y and Preamble X, to identify Channel 2 and Channel 1 subframes, respectively. A frame, consisting of two 32-bit subframes, is made up of 64 bits, and the data rate of the interface signal may be readily calculated by multiplying the sampling rate by 64. In the case of the 48 kHz sample rate, the total data rate of the signal is 64 × 48,000, or 3.072 Mbps. As will be explained later, the interface employs an embedded clock that runs at twice the data rate, making the actual frequency of this signal about 6.1 MHz.
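The rate arithmetic can be checked with a few lines (a sketch; the constant names are ours, not the standard's):

```python
SUBFRAME_BITS = 32
FRAME_BITS = 2 * SUBFRAME_BITS   # two subframes (one per channel) per frame

for fs in (32_000, 44_100, 48_000, 96_000):
    data_rate = FRAME_BITS * fs  # interface bit rate in bits per second
    clock = 2 * data_rate        # biphase-mark cells run at twice the bit rate
    print(f"{fs/1000:g} kHz -> {data_rate/1e6:.3f} Mbps, clock {clock/1e6:.3f} MHz")
# 48 kHz -> 3.072 Mbps, clock 6.144 MHz (the "about 6.1 MHz" above)
```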

Encoding

All time slots except the preambles are encoded using biphase-mark coding to prevent the transmission of long strings of logic 0s or logic 1s on the interface, and thereby minimize the dc component on the transmission line; facilitate rapid clock recovery from the serial data stream; and make the interface insensitive to the polarity of connections. The preambles intentionally violate the rules of biphase-mark coding by differing in at least two states from any valid biphase code, to avoid the possibility of other data being mistaken for a preamble. Biphase-mark coding requires a clock that runs at twice the rate of the data being transmitted, and each bit that is transmitted is represented by a symbol that is composed of two binary states. Figure 1.15-3 illustrates these relationships. The top sequence of Figure 1.15-3 illustrates the interface clock pulses, running at twice the source coded data rate. The middle sequence shows the source coding, which is the series of pulse code modulated (PCM) digital audio samples. The bottom sequence shows how the source coded data is represented in biphase-mark coding. In biphase-mark coding, each source coded bit is represented by a symbol that is composed of two consecutive binary states. The first binary state of a biphase-mark symbol is always different from the second state of the symbol preceding it. A logic 0 is represented by a symbol containing two identical binary states; a logic 1 is represented by a symbol containing two different binary states. This relationship may be seen by examining the first full source coding bit at the left in the figure, which is a logic 1. Note that the duration of this bit is two clock pulses. Because the symbol immediately before it ended with a logic 0, the biphase-mark symbol representing it begins with a logic 1. As the bit to be transmitted is a logic 1, the second state of the biphase-mark symbol representing it is different from the first: a logic 0. The second source coded bit to be transmitted is a logic 0. Its first biphase-mark binary state is a logic 1, because the immediately previous state was a logic 0, and the second state is also a logic 1. The fact that the first binary state of a biphase-mark symbol is always different from the last binary state of the previous symbol ensures that the signal on the interface does not dwell at either logic 0 or logic 1 for a period longer than two clock pulses. Because biphase-mark coding does not depend on the absolute logic states of the symbols representing the source coded data, but rather on their relative states, the absolute polarity of a biphase-mark coded signal has no effect on the information transmitted, and the interface is insensitive to the polarity of connections.



FIGURE 1.15-3 AES channel coding.
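As an illustration of the rule just described (a sketch only; function and variable names are assumed), a biphase-mark encoder can be written as:

```python
def biphase_mark_encode(bits, last_level=0):
    """Biphase-mark encode a bit sequence into half-bit cells.
    Every symbol starts by inverting the previous level; a logic 1
    inverts again mid-symbol, while a logic 0 holds the level."""
    cells = []
    level = last_level
    for bit in bits:
        level ^= 1          # transition at every symbol boundary
        cells.append(level)
        if bit:
            level ^= 1      # mid-symbol transition encodes a logic 1
        cells.append(level)
    return cells

# The encoded stream never dwells longer than two cells at one level,
# and inverting every cell conveys exactly the same data.
print(biphase_mark_encode([1, 0, 1, 1, 0]))
```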

Ancillary Data

The last four time slots in a subframe are occupied by various housekeeping and user data. The validity bit (V) indicates whether the audio sample word is suitable for conversion to an analog audio signal. The channel status bit (C) from each subframe is assembled into a sequence spanning the duration of an entire AES3 block, and these 192-bit blocks of channel status data describe a number of aspects of the signal. Examples of channel status data include the length of audio sample words, sampling frequency, the sampling frequency scaling flag, the number of audio channels in use, emphasis information, whether the consumer or professional interface is implemented, whether audio or data is being transmitted on the interface, and a variety of other possible information.

The 192 channel status bits per block are subdivided into 24-byte units. There is a separate channel status block for each audio channel, so channel status may be different for each of the audio channels. User data, or U-bits, may be used in any way desired. The parity bit (P) facilitates the detection of data errors in the subframe by applying even parity, ensuring that time slots 4–31 carry an even number of logic 1s and logic 0s.
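As a sketch only, the C bits received over one block can be gathered into the 24-byte channel status word; the LSB-first packing within each byte is an assumption here, so consult AES3 for the normative bit numbering:

```python
def channel_status_block(c_bits):
    """Gather the 192 C bits received over one AES3 block (one per
    frame, for one channel) into 24 bytes of channel status data."""
    assert len(c_bits) == 192
    out = bytearray(24)
    for i, bit in enumerate(c_bits):
        if bit:
            out[i // 8] |= 1 << (i % 8)  # assumed LSB-first packing
    return bytes(out)
```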

Electrical Interface

The electrical interface specified by AES3 is a two-wire, transformer balanced signal. The AES interface was devised by audio engineers with the intent of creating a digital audio signal that could be carried on the same balanced, shielded, twisted pair cables and XLR-3 type connectors that are used for analog audio signals. The specified source impedance for AES3 line drivers and the specified input impedance for AES3 line receivers is 110 Ω, which is the approximate characteristic impedance of shielded twisted pair cable as used for analog audio. The permitted signal level on the interface ranges from 2 to 7 V peak-to-peak. The balanced, twisted pair electrical interface can give rise to some problems in implementation. XLR type connectors and audio patch panels, for example, are not impedance matched devices. This is not critical when the highest frequency of interest is 20 kHz, but it can cause serious problems when a 6 MHz signal must be passed. These considerations, plus the familiarity of television engineers with unbalanced coaxial transmission of analog video and the need for higher connector density for a given product size, generated the requirement for standardization of an unbalanced, coaxial electrical interface for the AES3 signal. Such an electrical interface is standardized in SMPTE 276M, which describes carriage of the AES/EBU interface on standard 75 Ω video cable using BNC connectors, at a signal level of 1 V peak-to-peak. Because the 110 Ω balanced and 75 Ω unbalanced signal formats coexist in many systems, it is frequently necessary to translate between the two signals. Devices to perform such translations are readily available, and SMPTE 276M has an informative annex explaining how to build them. For density and compatibility reasons, most modern multichannel audio equipment is designed to support SMPTE 276M. AES-2id-1996 (r2001) is an information document containing guidelines for the use of the AES3 interface. AES-3id-2001 is an information document containing descriptive information about the unbalanced coaxial interface for AES3 audio.

AES5-2003
AES5-2003 is the AES Recommended Practice for Professional Digital Audio – Preferred Sampling Frequencies for Applications Employing Pulse-Code Modulation. This companion document to AES3 contains the recommended digital audio sample rate for signals to be carried on the interface. The professional digital audio sample rate of 48 kHz is recommended, with recognition given to the use of the compact disc sample rate of 44.1 kHz, a low bandwidth sample rate of 32 kHz, and higher sampling frequencies, also referred to as Double Rate (62–108 kHz) and Quadruple Rate (124–216 kHz), for applications requiring a higher bandwidth or more relaxed anti-alias filtering. SMPTE EG 32, the engineering guideline on AES/EBU audio emphasis and sample rates for use in television systems, also recommends that the 48 kHz sample rate be used. Variations on these sample rates are encountered: varispeed operation requires the ability to adjust these sample rates by about ±12%, and of course, accommodation to 59.94 Hz video requires operation at 48 kHz/1.001.
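A quick illustration of these two variations (names assumed for the example):

```python
from fractions import Fraction

nominal = 48_000
pulldown = Fraction(nominal) / Fraction(1001, 1000)  # 48 kHz / 1.001
print(float(pulldown))                 # ~47952.05 Hz for 59.94 Hz video
print(nominal * 0.88, nominal * 1.12)  # approximate ±12% varispeed range
```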



AES10-2003
AES10-2003 is the AES Recommended Practice for Digital Audio Engineering – Serial Multichannel Audio Digital Interface (MADI). MADI is a multichannel digital audio interface that is based on AES3. It is designed for the carriage of a maximum of 64 audio channels (at a 48 kHz sample rate) on a single coaxial cable or optical fiber. MADI preserves the AES3 subframe protocol except for the preamble. A MADI frame is composed of 64 channels, which are analogous to AES3 subframes. Each MADI channel contains 32 time slots, as does an AES3 subframe. The first four time slots contain synchronization data, channel activity status (channel on/off), and other such information. The following 28 time slots are filled in the same way as in an AES3 subframe: 24 audio bits, followed by a V bit, a U bit, a C bit, and a P bit. The MADI coaxial cable interface is based on the fiber distributed data interface (FDDI) standardized in ISO 9314, for which chip sets are available. Data is transmitted using non-return-to-zero inverted (NRZI), polarity-free coding and a 4-bit to 5-bit encoding format, in which each channel's 32 bits are grouped into eight words of 4 bits each, and each 4-bit word is then encoded into a 5-bit word. The data rate on the interface is a constant 125 Mbps, with the payload data rate running between approximately 50 and 100 Mbps, depending on the sample rate in use. Sample rates may vary from 32 to 96 kHz ±12.5%. The specified coaxial cable length for the MADI signal is up to 50 m. A standard for carriage on optical fiber is under consideration. MADI finds frequent use in multitrack audio facilities, for example, as an interface between multitrack audio recorders and consoles. It is conceivable that the MADI interface could be transmitted over very long distances using, for example, a synchronous optical network (SONET) circuit. AES-10id-2005 is an information document containing engineering guidelines for the implementation and use of the MADI interface.
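A sketch of the rate budget these figures imply (constant and function names are assumptions):

```python
CHANNELS, SLOT_BITS = 64, 32

def madi_rates(fs):
    """Payload bits/s and 4-bit-to-5-bit encoded line bits/s at one sample rate."""
    payload = CHANNELS * SLOT_BITS * fs  # e.g. 98.304 Mbps at 48 kHz
    encoded = payload * 5 // 4           # the 4-to-5 encoding adds 25% overhead
    return payload, encoded

payload, encoded = madi_rates(48_000)
print(payload / 1e6, encoded / 1e6)  # 98.304, 122.88 -> fits the 125 Mbps link
```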

AES17-1998 (r2004)
AES17-1998 (r2004) is the AES Standard Method for Digital Audio Engineering – Measurement of Digital Audio Equipment. This standard defines a number of tests and test conditions for specifying digital audio equipment. Many of these tests are substantially the same as those used for testing analog audio equipment, but the unique nature of digital audio dictates that additional tests are necessary beyond those used for analog audio equipment.

AES18-1996 (r2002)
AES18-1996 (r2002) is the AES Recommended Practice for Digital Audio Engineering – Format for the User Data Channel of the AES Digital Audio Interface. This standard describes a method of formatting the user data channels within the AES3 digital audio interface using a packet-based transmission format. This method has gained popularity in some broadcast facilities for carrying nonaudio ancillary data such as song titles and other information. It is critical to note, however, that user and other channel status bits are notoriously unreliable. In an effort to save data space, most storage equipment does not preserve this data and instead generates static values prior to output. If a facility design relies on using this data space, it is imperative to verify that all equipment in the chain supports it.

AES11-2003
AES11-2003 is the AES Recommended Practice for Digital Audio Engineering – Synchronization of Digital Audio Equipment in Studio Operations. This document describes a systematic approach to the synchronization of AES3 digital audio signals. Synchronism between two digital audio signals is defined as that state in which the signals have identical frame frequencies, and the timing difference between them is maintained within a recommended tolerance on a sample-by-sample basis. AES11 recommends that each piece of digital audio equipment have an input connector dedicated to the reference signal. Four methods of synchronization are proposed: (a) the use of a master digital audio reference signal (DARS), ensuring that all input/output equipment sample clocks are locked to a single reference; (b) the use of the sample rate clock embedded within the digital audio program signal that is input to the equipment; (c) the use of video, from which a DARS signal is developed; and (d) the use of GPS to reference a DARS generator, providing frequency and phase from one-second pulses, as well as time-of-day sample address codes in bytes 18–21 of channel status. Methods (a), (c), and (d) are preferred for normal studio practice, as method (b) may increase the timing error between pieces of equipment in a cascaded implementation. The digital audio reference signal is to have the format and electrical configuration of the two-channel AES3 interface, but implementation of only the basic structure of the interface format, where only the preamble is active, is acceptable as a reference signal. A digital audio reference signal may be categorized in one of two grades: a grade 1 reference signal must maintain a long-term frequency accuracy within ±1 ppm, whereas a grade 2 reference signal has a tolerance of ±10 ppm.
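For a sense of scale, the grade tolerances translate into small absolute frequency deviations at 48 kHz (an illustrative sketch):

```python
def reference_tolerance_hz(fs, ppm):
    """Allowable frequency deviation of a DARS of the given grade."""
    return fs * ppm / 1e6

print(reference_tolerance_hz(48_000, 1))   # grade 1: +/-0.048 Hz
print(reference_tolerance_hz(48_000, 10))  # grade 2: +/-0.48 Hz
```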

ATSC A/52B-2005 Digital Audio Compression (AC-3) Standard


Digital television broadcasting as described by the Advanced Television Systems Committee (ATSC) standard utilizes the AC-3 digital audio standard. Use of this standard necessitates the carriage of AC-3 compressed digital audio streams between pieces of DTV equipment; an example is the interface between an AC-3 encoder and the program data stream multiplexer of a DTV transmission system. The former Annex B of the ATSC AC-3 Digital Audio Standard, which described the carriage of compressed AC-3 elementary streams on the IEC 958 digital audio interface, has been replaced by IEC 61937.



SMPTE 276M-1995
SMPTE 276M-1995 is Television – Transmission of AES/EBU Digital Audio Signals over Coaxial Cable. This SMPTE standard defines the unbalanced 75 Ω coaxial cable electrical interface for the AES3 bitstream.

IEC 60958 Digital Audio Interface


IEC 60958 (IEC 958) is logically identical to the AES3 digital audio interface. Electrically, it provides for both the 110 Ω balanced and the 75 Ω unbalanced interfaces. Two versions are described: a consumer version, the Sony/Philips Digital Interface (S/PDIF), in which bit 0 of the channel status word is set to logic 0; and a professional version, the AES/EBU interface, in which bit 0 of the channel status word is set to logic 1. Provision is made in the location of time slots 12–27, which are normally used to carry linear 16-bit PCM audio words, to permit some recording equipment to record and play back either linear 16-bit PCM audio or encoded data streams (compressed digital audio). The consumer implementation permits only the 32-bit mode, in which the channel 1 and channel 2 subframes are simultaneously employed to carry 32-bit words. The professional implementation permits either the 32-bit mode or the 16-bit mode, in which each subframe carries a 16-bit digital audio word. The consumer implementation may carry either two channels of linear PCM digital audio or one or more compressed audio bitstreams accompanied by time stamps. The professional implementation may carry two channels of linear PCM digital audio, two sets of compressed audio bitstreams with time stamps, or one channel of linear PCM digital audio and one set of compressed audio bitstreams with time stamps. Note that the consumer implementation may also present output levels that are lower than the specified 1 V peak-to-peak of the professional version, and care is advised when connecting consumer and professional devices.
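As an illustrative, non-normative sketch, the two implementations can be told apart from channel status byte 0 (LSB-first placement of bit 0 within the byte is assumed here):

```python
def interface_type(channel_status):
    """Inspect bit 0 of channel status byte 0 of an IEC 60958 block:
    logic 1 marks the professional (AES/EBU) implementation, logic 0
    the consumer (S/PDIF) implementation."""
    return "professional" if channel_status[0] & 0x01 else "consumer"

print(interface_type(bytes([0x01])))  # professional
print(interface_type(bytes([0x00])))  # consumer
```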

SMPTE 299M-2004
SMPTE 299M-2004 is Television – 24-Bit Digital Audio Format for HDTV Bit-Serial Interface. This standard defines the embedding of AES/EBU digital audio data into the high-definition serial digital video interface specified in SMPTE 292M, Bit-Serial Digital Interface for High-Definition Television Systems. It is the high-definition counterpart to SMPTE 272M.

SMPTE 302M-2002
SMPTE 302M-2002 is Television – Mapping of AES3 Data into MPEG-2 Transport Stream. This SMPTE standard describes how the 20-bit audio payload of an AES/EBU signal is mapped into an MPEG-2 transport stream in a bit-for-bit accurate manner. This format can be found in most modern MPEG-2 encoders and integrated receiver/decoders (IRDs), and it is a method used to carry uncompressed 20-bit PCM audio as well as mezzanine compressed audio such as Dolby E, high-density multiplexed AC-3, and Linear e-squared formats. Although it can be used to carry a single AC-3 stream, doing so is very inefficient and is incompatible with consumer equipment. ISO/IEC 13818 describes the proper manner of multiplexing an AC-3 stream into an MPEG-2 transport stream.

SMPTE 320M-1999
SMPTE 320M-1999 is Television – Channel Assignments and Levels on Multichannel Audio Media. This often-overlooked standard defines proper channel ordering for multichannel audio soundtracks. The ordering for television is as follows: 1 = Left front, 2 = Right front, 3 = Center, 4 = Low Frequency Effects (LFE, or subwoofer), 5 = Left surround, 6 = Right surround, 7 = Left or Lt (left total, for matrix surround encoded systems), 8 = Right or Rt (right total, for matrix surround encoded systems). Film formats may differ slightly, and that channel ordering is detailed in the standard, but for use within television facilities, film channel formatting should be corrected to match the order shown above.
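For reference, the television ordering above expressed as a simple lookup (an illustrative sketch; the abbreviated names are ours):

```python
SMPTE_320M_TV = {
    1: "Left front", 2: "Right front", 3: "Center", 4: "LFE",
    5: "Left surround", 6: "Right surround", 7: "Lt", 8: "Rt",
}
print(SMPTE_320M_TV[4])  # LFE
```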

SMPTE STANDARDS AND RECOMMENDED PRACTICES CONCERNING THE USE OF AES DIGITAL AUDIO IN TELEVISION SYSTEMS

SMPTE 272M-2004
SMPTE 272M-2004 is Television – Formatting AES/EBU Audio and Auxiliary Data into Digital Video Ancillary Data Space. This standard defines the embedding of AES/EBU digital audio into the standard definition serial digital interface specified in SMPTE 259M, 10-Bit 4:2:2 Component and 4fsc NTSC Composite Digital Signals – Serial Digital Interface. With such embedding, up to 16 channels of digital audio in the AES3 format may be carried on the serial digital video interface signal that travels on a single coaxial cable.

SMPTE 337M through 341M


These standards, titled Formatting of Non-PCM Audio and Data in AES3 Serial Digital Audio Interface, describe standardized methods for carrying compressed audio and other data types within AES3 signals, specifically:

SMPTE 338M-2000: Data types
SMPTE 339M-2000: Generic data types
SMPTE 340M-2000: ATSC A/52 (AC-3) data type
SMPTE 341M-2000: Captioning data type



They will become increasingly important as new professional equipment is developed to support compressed audio formats.

SMPTE RP 155-2004
SMPTE RP 155-2004 is Reference Levels for Digital Audio Systems. This recommended practice describes a reference level lineup signal for use in digital audio recording on digital television tape recorders, and recommends the proper setting for the lineup signal on the recorder's digital audio level meters. The reference signal is the digital representation of a 1000 Hz sine wave, the level of which is 20 dB below the system maximum (full-scale digital). Meters are to be calibrated with this signal to indicate −20 dBFS (i.e., 20 dB below full-scale digital).
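A minimal sketch of such a lineup signal, assuming a 48 kHz sample rate and floating point samples normalized to full scale:

```python
import math

def lineup_tone(fs=48_000, seconds=1.0, dbfs=-20.0, freq=1000.0):
    """Generate a 1 kHz sine at a given level relative to full scale."""
    amp = 10 ** (dbfs / 20)  # -20 dBFS -> 0.1 of full scale
    n = int(fs * seconds)
    return [amp * math.sin(2 * math.pi * freq * i / fs) for i in range(n)]

tone = lineup_tone()
print(max(tone))  # ~0.1, i.e. 20 dB below full scale
```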

Digital Audio Distribution


The use of analog video distribution and routing equipment is generally not recommended for AES3 signals, as such equipment may distort AES signal shapes and rise times, adversely affecting the decoding of the signal at the receiving equipment. The spurious high frequency signal energy that may be generated by such distortions of signal shape can cause crosstalk-related bit errors that are difficult to detect and analyze. Distribution of the AES3 signal using high-quality digital audio distribution amplifiers will maintain the proper frequency and phase relationships, as well as signal shapes and rise times.

SMPTE EG 32-1996
SMPTE EG 321996 is Emphasis of AES/EBU Audio in Television Systems and Preferred Audio Sampling Rate. This engineering guideline recommends that no emphasis be used on digital audio recordings for television applications and that the professional digital audio sample frequency of 48 kHz be used.

System Synchronization
When possible, all digital audio signals should be synchronous in order to avoid objectionable digital artifacts. In a large plant, it is necessary to provide a single master reference signal to which all interconnected systems are synchronized. The master reference, fed to all pieces of equipment, allows audio data to be retimed and synchronized within specified tolerances. Large facilities, in particular, will benefit from the conversion of digital audio signals from sources without external sync capability to a standard, synchronized audio sample rate. Broadcast digital audio plants typically contain consumer and other nonsynchronizable equipment that requires sample rate conversion. Audio sample rate converters perform a function similar to video standards converters, in that a dynamic low pass filter continually adjusts the offending signal's phase at the output of the converter. In some cases, the output and input sample rates can be locked together via an integer relationship in a process known as synchronous sample rate conversion; for example, 48 kHz and 44.1 kHz are related by the integer ratio of 160 to 147. Modern sample rate conversion can be accomplished with full 24-bit resolution and THD+N below −140 dB, and as such it has become an audibly lossless process. However, it is important to note that in systems utilizing compressed audio, such as AC-3, Dolby E, and Linear e-squared, bit-for-bit accuracy of the AES3 audio payload is imperative and will be corrupted by sample rate conversion, even when it is used in 1:1 modes for retiming (i.e., 48 kHz reclocked to a local 48 kHz reference).
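The integer relationship mentioned above falls out directly from the rates (an illustrative sketch):

```python
from fractions import Fraction

# Synchronous sample rate conversion relies on an integer rate ratio.
ratio = Fraction(48_000, 44_100)
print(ratio)  # 160/147
```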

IMPLEMENTATION ISSUES
The key to realizing the benefits of digital audio on a systemwide scale is a thorough understanding of the principles underlying digital signal distribution, routing, and switching. There are, as explained, two electrical interfaces available for AES3 signals, and both require good engineering practices for successful implementation. Digital audio's data rate dictates that uncompressed digital audio signals occupy a bandwidth similar to that of analog video. Regardless of the electrical interface, a well-engineered interconnect requires a proper match of source, destination, and characteristic cable impedances. Prior to the 1992 revision, any equipment manufactured to AES3-1985 violated this principle, as that standard specified a 250 Ω load impedance for receivers and a 110 Ω source impedance for transmitters. Beginning in 1992, AES3 specifies impedance matching among transmitter, receiver, and cable.

Choice of Cable
The use of the unbalanced coaxial cable interface for AES3 data transmission is often preferred by video engineers. SMPTE 276M and AES-3id provide guidance for using the 75 Ω unbalanced AES3 interface. Any high-quality video cable will be found quite acceptable for unbalanced AES3 signals. Engineers designing facilities dealing only with audio may prefer the use of balanced, shielded, twisted pair cables with XLR-type connectors to carry AES3 signals, but they should be aware of the cable length restrictions of this implementation and of the possibility that problems will arise from impedance mismatches at connectors and patch panels. For balanced transmission of AES3 signals, special low capacitance twisted pair cable intended especially for digital audio use is recommended over the standard twisted pair cables used for analog audio, as the higher capacitance of analog audio cable tends to distort square wave signals by rolling off the higher frequency components.



MADI Synchronization
It is necessary for the equipment transmitting MADI data to include timing information that the receiving equipment can extract and use for synchronization. At least one sync code must be sent per frame; a sync code consists of two consecutive 5-bit words not used in the 4-bit to 5-bit encoding scheme. The total MADI interface data rate is higher than the payload data rate required, the difference between these two rates being sufficient to include sync codes within each frame. The fiber distributed data interface (FDDI) chip sets used for MADI implementation automatically handle the required synchronizing and coding operations.

Signal Routing
Asynchronous routing is the simplest and most cost-efficient method of routing digital audio. It passes digital audio signals at any sample rate, a degree of flexibility that is ideal in situations in which a number of different audio sample rates are encountered. However, the lack of synchronization to a master reference makes it a poor choice for on-air applications or any other situation in which frame accurate switching or editing is required. An asynchronous router may be thought of as an electronic patching system, functioning as though simple wires were used to connect inputs to outputs. In an asynchronous system, it is imperative that the destination equipment be capable of locking to the sample rate of the signal routed to it; otherwise, muting usually takes place. The disadvantage of asynchronous routers is that their output signal is almost always corrupted when a switch is made between input signals. A switch typically results in one or more AES frames being damaged, and this may cause destination equipment to momentarily lose lock, causing muting or the generation of pops and clicks. Synchronous routing ensures precise timing and no corruption of the data stream during switches. It is considerably more complex and costly than asynchronous routing, as it requires that a transition between two inputs be made at an AES frame boundary. All inputs to a synchronous router must be locked to a common digital audio reference. A digital audio console is essentially a synchronous router with many controls. Note that when routing compressed audio such as Dolby E, switching must occur not only at an AES frame boundary, but at an AES frame boundary located near the video vertical interval switch point, to prevent corruption of the compressed audio packets. Systems like the Linear Acoustic StreamStacker-HD require switching only on the AES frame boundary. Routing and switching AC-3 encoded signals is more difficult, as the encoded packets from one stream to the next are not phase aligned.

AES3 Synchronization
AES3 is inherently synchronous, the clock signal being readily recovered from the AES3 bitstream. However, the use of a master digital audio reference ensures that all digital audio equipment in a system will be frequency and phase locked and free of cascaded timing errors, and is highly recommended by AES11. The master reference signal may come from the digital audio console in a facility on the scale of a single room, or from an external reference generator in larger facilities. The master sync signal should be sent to all equipment capable of accepting external sync signals. Digital audio phase integrity must remain intact during the conversion of multiple audio channels between the digital and analog domains. Perfect phase synchronization requires use of an SDIF-2 word clock or an AES3 signal as the common master clock. Digital audio recording and processing equipment forces any AES3 input signal into a common AES3 frame phase. When such an AES3 frame alignment is performed, a phase error will result if there are any deviations in the frame phase of analog-to-digital (A/D) converters. When digital audio signals are transferred to a piece of equipment that is not synchronized using a master sync signal, sample rate converters must be used at the inputs to the receiving equipment to prevent clicks and pops.

Word Clock Synchronization


SDIF-2 word clock, commonly referred to simply as word clock, is a square wave signal at the digital audio sample rate. Word clock is commonly used as a reference signal in small, audio-only facilities. In facilities that handle both video and audio, black burst is commonly used as the reference for both video and AES audio signal synchronization. Note that most professional audio equipment does not accept word clock as a reference signal, but instead relies on the AES11 standard, whereby an AES/EBU signal with its embedded clock reference is used to derive proper synchronization. This eliminates the difficulties of distributing a high-frequency word clock square wave throughout a facility.

Jitter
Jitter is short-term frequency variation in the input data stream to a digital audio device. It can result from a number of causes, including the coupling of excessive noise into a transmission link. Some jitter buildup is inevitable in a system, as certain components of the system inherently generate some amount of jitter; for example, noise in the phase-locked loops that control clock frequencies in the components of the system unavoidably generates some jitter.



FIGURE 1.15-4 Representative digital audio level meter. (Courtesy Dorrough Electronics.)

The presence of out-of-specification jitter on a digital audio signal or clock can result in bit errors that generate clicking and popping sounds. High levels of jitter may cause a receiving device to lose lock, while a relatively small amount may have no apparent negative effect unless it is present in devices performing analog-to-digital (A/D) or digital-to-analog (D/A) conversions. Excessive jitter is seldom a problem when only two pieces of equipment are involved, but it typically builds up when larger numbers of equipment are interconnected. Jitter may be eliminated through the use of synchronizing digital-to-analog converters or a common synchronization signal. Jitter on the synchronization signal itself can cause degradation of all digital audio in devices locked to it.

Levels and Metering


When an analog audio signal is converted to digital, the greatest analog voltage level that may be represented digitally is called full-scale digital (FSD). When quantized, this voltage level causes all digital audio bits to be set to logic 1, and this level is called 0 dBFS (full scale). This is an inflexible limit, and any excursion of the analog signal above this level will be clipped off, as the digital audio word does not have the capacity to faithfully represent it. In practice, the FSD level is often set about 1 dB above the analog clip level in an effort to ensure that digital clipping never occurs. When signals are converted between the analog and digital domains, the analog reference levels of A/D and D/A converters may be set to any number of values. If the analog reference level is improperly calibrated in any of the converters in the path, A/D and D/A conversions may result in an increase or a decrease in the level of the recovered analog signal. Consistency in the type of digital audio metering device used, good operator training, and the establishment of strict house standard reference levels and alignment practices are the best defenses when it comes to accurate audio level control.

There is no U.S. standard for a specific digital audio level meter. Digital audio meters are often of the instantaneous response type, with no integration time, permitting them to respond with full excursion to a peak as brief as a single digital audio sample. Contrast this with the standard volume indicator (VU meter), which is an average-responding device, and the typical peak-program meter, which does not respond with full excursion to peaks with durations less than 10 ms. Typically, digital audio metering devices display a maximum value of 0 dBFS, and reference level lineup tone is set to a designated point below 0 dBFS to accommodate peaks without digital clipping. Figure 1.15-4 shows a representative digital audio meter, the display device of which is usually an array of light emitting diodes or similar devices. This representative meter displays a range of −40 dB to 0 dBFS, with lineup tone calibrated at −20 dBFS. For television applications, SMPTE RP 155 recommends adjusting the level of lineup tone to read −20 dBFS on digital audio meters used on digital videotape recorders. Other industry segments have variously used lineup tone levels of −15, −18, and −20 dBFS. These varying reference levels may cause inconsistent results when digital audio recordings are interchanged; it is therefore important to establish common digital audio reference and operating levels when exchanging digital audio recordings. Loudness metering is best accomplished with meters designed to measure loudness. VU- and PPM-type meters are not truly appropriate for accurately judging loudness, as the results are often a mixture of meter readout and user interpretation and are thus unreliable for producing consistent results.
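As an illustrative sketch (not a metering standard), an instantaneous-response reading is simply the most recent sample magnitude expressed relative to full scale:

```python
import math

def sample_dbfs(sample, full_scale=1.0):
    """Instantaneous level of one sample relative to full scale."""
    if sample == 0:
        return float("-inf")
    return 20 * math.log10(abs(sample) / full_scale)

print(sample_dbfs(0.1))  # -20.0, the RP 155 lineup level
```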

SUMMARY
Digital audio, with its many advantages, is inherently not susceptible to many of the problems that are encountered in analog audio systems. It does harbor some potential hazards of its own, however. With care and attention to good engineering practices in the design and maintenance of digital audio facilities and observance of the recommendations described in AES/EBU, IEC, and SMPTE standards, outstanding results will be realized.

Standards
[1] AES3-1992 (r2003), AES Recommended Practice for Digital Audio Engineering – Serial Transmission Format for Two-Channel Linearly Represented Digital Audio Data, New York, Audio Engineering Society, 2003.
[2] AES5-1998 (r2003), AES Recommended Practice for Professional Digital Audio – Preferred Sampling Frequencies for Applications Employing Pulse-Code Modulation, New York, Audio Engineering Society, 2003.
[3] AES10-1991 (r2003), AES Recommended Practice for Digital Audio Engineering – Serial Multichannel Audio Digital Interface (MADI), New York, Audio Engineering Society, 2003.
[4] AES11-1997 (r2003), AES Recommended Practice for Digital Audio Engineering – Synchronization of Digital Audio Equipment in Studio Operations, New York, Audio Engineering Society, 2003.
[5] AES17-1998, AES Standard Method for Audio Engineering – Measurement of Digital Audio Equipment, New York, Audio Engineering Society, 1998.
[6] AES18-1996, AES Recommended Practice for Digital Audio Engineering – Format for the User Data Channel of the AES Digital Audio Interface, New York, Audio Engineering Society, 1996.
[7] ATSC A/52B-2005, Digital Audio Compression (AC-3) Standard, Washington, Advanced Television Systems Committee, 2005.
[8] IEC 60958 (1999), Digital Audio Interface, Geneva, International Electrotechnical Commission, 1999.
[9] SMPTE 259M-1993, 10-Bit 4:2:2 Component and 4fsc NTSC Composite Digital Signals – Serial Digital Interface, White Plains, Society of Motion Picture and Television Engineers, 1993.
[10] SMPTE 272M-1994 (r2004), Formatting AES/EBU Audio and Auxiliary Data into Digital Video Ancillary Data Space, White Plains, Society of Motion Picture and Television Engineers, 2004.
[11] SMPTE 276M-1995, Transmission of AES/EBU Digital Audio Signals over Coaxial Cable, White Plains, Society of Motion Picture and Television Engineers, 1995.
[12] SMPTE 292M-1996, Bit-Serial Digital Interface for High-Definition Television Systems, White Plains, Society of Motion Picture and Television Engineers, 1996.
[13] SMPTE 299M-1997 (r2004), 24-Bit Digital Audio Format for HDTV Bit-Serial Interface, White Plains, Society of Motion Picture and Television Engineers, 2004.
[14] SMPTE 302M-1998/2000, Mapping of AES3 Data into MPEG-2 Transport Stream, White Plains, Society of Motion Picture and Television Engineers, 2000.
[15] SMPTE 320M-1999, Channel Assignments and Levels on Multichannel Audio Media, White Plains, Society of Motion Picture and Television Engineers, 1999.
[16] SMPTE 337M through SMPTE 340M, Formatting of Non-PCM Audio and Data in AES3 Serial Digital Audio Interface, White Plains, Society of Motion Picture and Television Engineers.
[17] SMPTE RP 155, Audio Levels for Digital Audio Records on Digital Television Tape Recorders, White Plains, Society of Motion Picture and Television Engineers, 2004.
[18] IEC 61937-1, Digital Audio – Interface for Non-Linear PCM Encoded Bitstreams Applying IEC 60958, Part 1: General, Geneva, International Electrotechnical Commission.
[19] IEC 61937-3, Digital Audio – Interface for Non-Linear PCM Encoded Bitstreams Applying IEC 60958, Part 3: Non-Linear PCM Bitstreams According to the AC-3 Format, Geneva, International Electrotechnical Commission.

Recommended Practices and Information Documents


AES-2id-1996, AES Information Document for Digital Audio Engineering – Guidelines for the Use of the AES3 Interface, New York, Audio Engineering Society, 1996.
AES-3id-1995, AES Information Document for Digital Audio Engineering – Transmission of AES3 Formatted Data by Unbalanced Coaxial Cable, New York, Audio Engineering Society, 1995.
AES-10id-1995, AES Information Document for Digital Audio Engineering – Engineering Guidelines for the Multichannel Audio Digital Interface (MADI) AES10, New York, Audio Engineering Society, 1995.
SMPTE RP 155-1997, Audio Levels for Digital Audio Records on Digital Television Tape Recorders, White Plains, Society of Motion Picture and Television Engineers, 1997.
SMPTE EG 32-1996, Emphasis of AES/EBU Audio in Television Systems and Preferred Audio Sampling Rate, White Plains, Society of Motion Picture and Television Engineers, 1996.

