Advanced Audio Coding

Introduction
Audio coding, or audio compression, algorithms are used to obtain a compact digital representation of high-fidelity (wideband) audio signals for efficient transmission or storage. The central objective in audio coding is to represent the signal with the minimum number of bits while achieving transparent signal reproduction. The Moving Picture Experts Group (MPEG) audio compression algorithm is an International Organization for Standardization (ISO) standard for high-fidelity audio compression.

Advanced Audio Coding (AAC)


AAC is a standardized, lossy digital audio compression scheme. It was developed with the cooperation and contributions of companies including Dolby, Fraunhofer (FhG), AT&T, Sony and Nokia, and was officially declared an international standard by the Moving Picture Experts Group in April 1997. It is not backward compatible with other MPEG audio standards (such as MP3).

AAC was promoted as the successor to MP3 for audio coding at medium to high bitrates. AAC follows the same basic coding paradigm as Layer 3 (high-frequency-resolution filter bank, non-uniform quantization, Huffman coding, iteration loop structure using analysis-by-synthesis), but improves on Layer 3 in many details and uses new coding tools for improved quality at low bitrates. Its popularity is currently sustained by its role as the default codec of iTunes, the media player that powers the iPod, the most popular digital audio player on the market. Furthermore, the iTunes Music Store, whose sales account for 85% of the market for legal online downloads, sells AAC-encoded songs.

AAC's improvements over MP3


- Sample frequencies from 8 kHz to 96 kHz (official MP3: 16 kHz to 48 kHz)
- Up to 48 channels
- More efficient and simpler filter bank (MP3: hybrid; AAC: pure MDCT)
- Higher coding efficiency for stationary signals (block size: 576 samples in MP3, 1024 samples in AAC)
- Higher coding efficiency for transient signals (block size: 192 samples in MP3, 128 samples in AAC)
- Can use a Kaiser-Bessel derived window function to reduce spectral leakage at the expense of widening the main lobe
- Much better handling of frequencies above 16 kHz
- More flexible joint stereo (separate for every scale band)

- Both mid/side coding and intensity coding are more flexible, so they can be applied more often to reduce the bitrate.
- An optional backward prediction, computed line by line, achieves better coding efficiency, especially for very tone-like signals. This feature is only available in the rarely used Main profile.
- Improved Huffman coding: in AAC, coding by quadruples of frequency lines is applied more often. In addition, the assignment of Huffman code tables to coder partitions can be much more flexible.
- AAC and HE-AAC are far better than MP3 at very low bitrates, but at medium to higher bitrates the two formats are more comparable.
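The Kaiser-Bessel derived (KBD) window mentioned in the list above can be built from an ordinary Kaiser window. The following is a minimal sketch, not taken from the standard text: the function name `kbd_window` and the default `alpha = 4.0` are assumptions chosen for illustration. It checks the power-complementarity condition that an overlapped MDCT filter bank relies on.

```python
import numpy as np

def kbd_window(n: int, alpha: float = 4.0) -> np.ndarray:
    """Kaiser-Bessel derived window of even length n (illustrative helper)."""
    if n % 2:
        raise ValueError("n must be even")
    half = n // 2
    # Symmetric Kaiser prototype with half + 1 points.
    kaiser = np.kaiser(half + 1, np.pi * alpha)
    # Rising half: square root of the normalized running sum of the prototype.
    rising = np.sqrt(np.cumsum(kaiser[:half]) / kaiser.sum())
    # Falling half is the mirror image of the rising half.
    return np.concatenate([rising, rising[::-1]])

w = kbd_window(2048)
# Power complementarity: w[n]^2 + w[n + N/2]^2 should equal 1 for every n.
power_sum = w[:1024] ** 2 + w[1024:] ** 2
print(np.allclose(power_sum, 1.0))
```

A larger `alpha` trades a wider main lobe for lower side lobes, which is the trade-off the list item describes.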

MPEG Audio Codec Family


- MPEG-1 (ISO/IEC 11172-3) Layer 2 (mp2)
- MPEG-1 Layer 3 (mp3)
- MPEG-2 AAC (ISO/IEC 13818-7)
- MPEG-4 AAC (ISO/IEC 14496-3)
- MPEG-4 HE-AAC
- MPEG-4 HE-AAC v2

Evolution from MPEG-2 AAC LC to MPEG-4 AAC LC to HE-AACv2


MPEG-2 AAC (ISO/IEC 13818-7) → MPEG-4 AAC (ISO/IEC 14496-3) → MPEG-4 HE-AAC → MPEG-4 HE-AAC v2

MPEG layers
Layer 1: DCT-type filter with equal frequency spread per band. The psychoacoustic model only uses frequency masking.
Layer 2 (MUSICAM): Same filter bank as Layer 1. The psychoacoustic model makes limited use of temporal masking.
Layer 3 (MP3): Layer 1 filter bank followed by an MDCT per band to obtain a non-uniform frequency division similar to critical bands. The psychoacoustic model includes temporal masking effects and takes stereo redundancy into account, and a Huffman coder is used.
At the time of MPEG-1 audio development (finalized in 1992), Layer 3 was considered too complex to be practically useful. Today, Layer 3 (known as MP3) is the most widely deployed audio coding method, because it provides good quality at an acceptable bitrate and because code for Layer 3 is distributed freely.

Basic idea for the coding technique


- Decompose the signal into separate frequency bands using a filter bank.
- Analyze the signal energy in the different bands and determine the total masking threshold of each band caused by signals in other bands and at other times.
- Quantize the samples in each band with an accuracy determined by the masking level:
1) Any signal below the masking level does not need to be coded.
2) Signals above the masking level are quantized with a step size chosen according to the masking level, and bits are assigned across bands so that each additional bit provides the maximum reduction in perceived distortion.
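The bit assignment rule in the last step can be sketched as a greedy loop. This is a toy model under stated assumptions, not the allocation procedure of any MPEG encoder: each band is described only by a hypothetical signal-to-mask ratio (SMR) in dB, and each extra bit is assumed to lower the quantization noise by about 6.02 dB.

```python
def allocate_bits(smr_db, total_bits, db_per_bit=6.02):
    """Greedy bit allocation over bands (illustrative, one value per band).

    smr_db: signal-to-mask ratio of each band in dB; values <= 0 mean the
    band is already below the masking threshold and needs no bits.
    """
    bits = [0] * len(smr_db)
    # Noise-to-mask ratio of each band if it were coded with its current bits.
    nmr = list(smr_db)
    for _ in range(total_bits):
        # The band whose noise is most audible gains the most from one more bit.
        worst = max(range(len(nmr)), key=lambda band: nmr[band])
        if nmr[worst] <= 0:
            break  # all noise is masked; remaining bits are not needed
        bits[worst] += 1
        nmr[worst] -= db_per_bit
    return bits

smr = [18.0, -3.0, 7.0, 30.0]
print(allocate_bits(smr, 8))    # budget-limited case
print(allocate_bits(smr, 100))  # stops early once every band is masked
```

The second band never receives bits because it lies below the masking threshold, which corresponds to rule 1) above.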

Audio Coding Basics


- Human hearing is limited to frequencies below about 20 kHz in most cases.
- Human hearing is insensitive to quiet frequency components that accompany other, stronger frequency components.
- Stereo audio streams contain largely redundant information.
- MPEG audio compression takes advantage of these facts to reduce the extent and detail of mostly inaudible frequency ranges.
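The stereo redundancy point can be illustrated with mid/side coding. The sketch below is an assumption-laden toy (synthetic channels that share a common component, helper names `ms_encode` and `ms_decode` invented here); it only shows that the side channel carries little energy when left and right are similar, and that the transform is exactly invertible.

```python
import numpy as np

def ms_encode(left, right):
    """Convert left/right into mid (average) and side (half difference)."""
    return 0.5 * (left + right), 0.5 * (left - right)

def ms_decode(mid, side):
    """Invert ms_encode exactly."""
    return mid + side, mid - side

rng = np.random.default_rng(0)
common = rng.standard_normal(1024)                # content shared by both channels
left = common + 0.05 * rng.standard_normal(1024)  # small channel-specific parts
right = common + 0.05 * rng.standard_normal(1024)

mid, side = ms_encode(left, right)
left_rec, right_rec = ms_decode(mid, side)

# Fraction of the left channel's energy that remains in the side channel.
side_energy_ratio = np.sum(side ** 2) / np.sum(left ** 2)
print(side_energy_ratio)
```

Because the side signal is small, it can be coded with far fewer bits than a second full channel.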

Masking
What is masking? Masking refers to a process in which one sound is rendered inaudible by the presence of another sound.
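A rough way to picture masking is a threshold that is highest at the masker's frequency and falls off on either side. The sketch below is purely illustrative: the triangular shape, the 10 dB offset and the slopes of 27 and 12 dB per Bark are assumed values, not figures from any MPEG psychoacoustic model. Only the Hz-to-Bark conversion follows the commonly used Zwicker and Terhardt approximation.

```python
import math

def hz_to_bark(f_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def masked_threshold_db(masker_hz, masker_db, probe_hz,
                        offset_db=10.0, lower_slope=27.0, upper_slope=12.0):
    """Toy triangular masking threshold at probe_hz (all defaults are assumptions)."""
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    # Masking spreads further toward higher frequencies than toward lower ones.
    slope = upper_slope if dz >= 0 else lower_slope
    return masker_db - offset_db - slope * abs(dz)

def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    """True if the probe tone lies below the toy threshold of the masker."""
    return probe_db < masked_threshold_db(masker_hz, masker_db, probe_hz)

# A loud 1 kHz tone hides a quiet neighbour but not a distant tone.
print(is_masked(1000.0, 80.0, 1100.0, 40.0))
print(is_masked(1000.0, 80.0, 4000.0, 40.0))
```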

Basic structure of audio Encoder and Decoder


How does it work?

Advanced Audio Coding


AAC belongs to the perceptually oriented traditional waveform codecs: it reproduces the waveform of the original input audio signal with a minimum amount of data while considering psychoacoustic principles to minimize the audibility of coding effects. Other well-known representatives of this class of codecs are 1) MPEG-1/2 Layer 2 and MPEG-1/2 Layer 3 (mp3) and 2) Dolby AC-3.
The main tools of AAC are:
1) Modified Discrete Cosine Transform (MDCT): a filter bank using window switching. Transforming the signal into a spectral representation is the key to applying psychoacoustic principles and to redundancy reduction for audio content.
2) Stereo Processing: increases the compression efficiency for stereo signals.
3) Temporal Noise Shaping (TNS): avoids undesirable effects caused by the relatively coarse time resolution of the MDCT filter bank. The tool shapes the quantization noise in the time domain by running a prediction across frequency on the spectral data.
4) Quantization and Coding: improves compression efficiency by using tools that quantize and code the spectrum according to the AAC bit stream syntax.
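The MDCT filter bank in item 1) can be sketched with a direct matrix implementation. This is a minimal illustration assuming a sine window and a small block size of 64 coefficients for speed (AAC long blocks use 1024); the helper names are invented here, and real codecs use fast algorithms rather than a dense basis matrix. The example shows that 50 percent overlapped blocks reconstruct the signal even though each block yields only half as many coefficients as input samples.

```python
import numpy as np

def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return np.sin(np.pi * (np.arange(length) + 0.5) / length)

def mdct_basis(n_coeffs):
    """Cosine basis mapping 2 * n_coeffs samples to n_coeffs coefficients."""
    n = np.arange(2 * n_coeffs)
    k = np.arange(n_coeffs)
    return np.cos(np.pi / n_coeffs * np.outer(k + 0.5, n + 0.5 + n_coeffs / 2))

def mdct(frame, window, basis):
    """Forward transform of one windowed frame."""
    return basis @ (window * frame)

def imdct(coeffs, window, basis):
    """Inverse transform; the output still contains time-domain aliasing."""
    return (2.0 / len(coeffs)) * window * (basis.T @ coeffs)

N = 64  # coefficients per block, also the hop size between frames
window = sine_window(2 * N)
basis = mdct_basis(N)

rng = np.random.default_rng(0)
signal = rng.standard_normal(8 * N)
output = np.zeros_like(signal)
for start in range(0, len(signal) - 2 * N + 1, N):
    coeffs = mdct(signal[start:start + 2 * N], window, basis)
    # Overlap-add: aliasing from neighbouring frames cancels in the overlap.
    output[start:start + 2 * N] += imdct(coeffs, window, basis)

# The first and last N samples are covered by one frame only, so skip them.
max_error = np.max(np.abs(output[N:-N] - signal[N:-N]))
print(max_error)
```

In an encoder, quantization would be applied to `coeffs` between the forward and inverse transforms; here the coefficients are passed through unchanged to isolate the filter bank.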

AAC
Spectral Band Replication (SBR): a bandwidth extension technology used to improve compression. Instead of transmitting the upper part of the spectrum with AAC, SBR regenerates it from the lower part with the help of some low-bitrate guidance data. The missing high-frequency components are regenerated using a QMF (Quadrature Mirror Filter) bank analysis/synthesis system.
The main tools are:
1) High Frequency Reconstruction. Transposer (or generator): generates the high-frequency spectrum by copying and shifting the lower part of the transmitted spectrum. Constructor: adds missing sinusoids.
2) Envelope Adjustment: the upper spectrum generated by the transposer is subsequently shaped with respect to frequency and time.
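The transposer and envelope adjustment steps can be pictured with a toy example on a plain array of spectral values. This is a sketch under strong simplifications and not the SBR algorithm itself: it works on a single real-valued spectrum instead of QMF subband samples, the guidance data is reduced to one mean energy per band, and the names `sbr_encode` and `sbr_decode` are hypothetical.

```python
import numpy as np

def sbr_encode(spectrum, crossover, n_env_bands):
    """Keep the low band; describe the high band by per-band mean energy."""
    low = spectrum[:crossover].copy()
    high_bands = np.array_split(spectrum[crossover:], n_env_bands)
    envelope = np.array([np.mean(band ** 2) for band in high_bands])
    return low, envelope

def sbr_decode(low, envelope, n_bins):
    """Rebuild the high band from the low band plus the envelope."""
    n_high = n_bins - len(low)
    # Transposer: copy the low band upward, repeating it if necessary.
    patch = np.resize(low, n_high)
    shaped = []
    for band, target in zip(np.array_split(patch, len(envelope)), envelope):
        current = np.mean(band ** 2)
        # Envelope adjustment: scale the copied band to the transmitted energy.
        gain = np.sqrt(target / current) if current > 0 else 0.0
        shaped.append(gain * band)
    return np.concatenate([low] + shaped)

rng = np.random.default_rng(1)
spectrum = rng.standard_normal(64) * np.linspace(1.0, 0.1, 64)
low, envelope = sbr_encode(spectrum, crossover=32, n_env_bands=4)
rebuilt = sbr_decode(low, envelope, n_bins=64)
print(len(low) + len(envelope), "values sent instead of", len(spectrum))
```

The rebuilt high band does not match the original waveform; it only matches its energy envelope, which is the point of the technique.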

AAC
Parametric Stereo (PS): joint coding of stereo audio. Only a mono downmix is transmitted, along with a small data stream describing how it is to be upmixed in the decoder.
Noiseless Coding: a lossless packing of the quantized spectral data that exploits statistical dependencies and other properties. It achieves a further reduction in the required data rate by reducing redundancy in the representation of the transmitted data.
Predictive Coding:
Forward prediction: the correlation between subsequent input samples is exploited by quantizing and coding the prediction error based on the unquantized input samples.
Backward prediction: the more widely used alternative, in which the prediction is based on previously quantized values.
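Prediction from previously quantized values can be sketched with a first-order DPCM loop. This is a generic textbook illustration, not the AAC prediction tool: the predictor simply reuses the last reconstructed sample, and the step size and sample values are made up. Because encoder and decoder both predict from reconstructed values, they stay in step and the error does not accumulate.

```python
def dpcm_encode(samples, step):
    """Quantize the prediction error; predict from reconstructed values."""
    indices = []
    prediction = 0.0  # last reconstructed sample, known to the decoder too
    for x in samples:
        q = round((x - prediction) / step)  # quantized prediction error
        indices.append(q)
        prediction += q * step  # same update the decoder performs
    return indices

def dpcm_decode(indices, step):
    """Rebuild samples by accumulating the dequantized prediction errors."""
    out = []
    prediction = 0.0
    for q in indices:
        prediction += q * step
        out.append(prediction)
    return out

samples = [0.0, 0.9, 2.1, 2.8, 3.1, 2.2]
step = 0.5
indices = dpcm_encode(samples, step)
decoded = dpcm_decode(indices, step)
worst = max(abs(a - b) for a, b in zip(samples, decoded))
print(indices, worst)
```

The reconstruction error of every sample stays within half a quantizer step, regardless of signal length.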

Diagram for the predictive analysis

AAC Compression

Extensions and Improvements


Some extensions have been added to the original AAC standard:
- MPEG-4 Scalable To Lossless (SLS)
- High Efficiency AAC (HE-AAC), a.k.a. aacPlus v1 or AAC+: the combination of SBR (Spectral Band Replication) and AAC, used for low bitrates
- HE-AAC v2, a.k.a. aacPlus v2: the combination of Parametric Stereo (PS) and HE-AAC
- Perceptual Noise Substitution (PNS)
- Long Term Predictor (LTP), added in MPEG-4 Part 3

Architecture of HE-AAC

Block Diagram of MPEG-2 AAC

MPEG-4 HE-AAC
HE-AAC is the low-bitrate codec in the AAC family. It combines the AAC LC (Advanced Audio Coding Low Complexity) audio coder with the SBR (Spectral Band Replication) bandwidth expansion tool. This combination already achieves good stereo quality at bitrates of 32 to 48 kbit/s. HE-AAC is also known as aacPlus and can be used in multichannel operation.

Diagram for HE-AAC v2

MPEG-4 HE-AAC v2
Combined with Parametric Stereo, the HE-AAC codec provides good audio quality starting at bitrates of around 16 to 24 kbit/s for stereo content. HE-AAC v2 is also known as aacPlus v2.
