
Comparative Analysis of H.264 and Motion-JPEG2000 Compression for Video Telemetry


Kurt Hallamasek, Karen Hallamasek, Brad Schwagler, Les Oxley
Ampex Data Systems Corporation
Redwood City, CA USA
Kurt_Hallamasek@ampex.com Karen_Hallamasek@ampex.com Brad_Schwagler@ampex.com Les_Oxley@ampex.com

ABSTRACT

The H.264/AVC standard, popular in commercial video recording and distribution, has also been widely adopted for high-definition video compression in Intelligence, Surveillance and Reconnaissance and for Flight Test applications. H.264/AVC is the most modern and bandwidth-efficient compression algorithm specified for video recording in the Digital Recording IRIG Standard 106-11, Chapter 10. This bandwidth efficiency is largely derived from the inter-frame compression component of the standard. Motion JPEG-2000 compression is often considered for cockpit display recording, due to the concern that details in the symbols and graphics suffer excessively from artifacts of inter-frame compression and that critical information might be lost. In this paper, we report on a quantitative comparison of H.264/AVC and Motion JPEG-2000 encoding for HD video telemetry. Actual encoder implementations in video recorder products are used for the comparison.

INTRODUCTION

Commercial off-the-shelf (COTS) technologies have increasingly been sought out for military and aerospace solutions, with the goal of saving cost and development time. Oftentimes, technology that was originally developed to serve the needs of the consumer or enterprise market is adapted to a specific purpose for which it was not originally intended. This has been particularly true for the technologies in support of capturing, distributing and storing video. The entertainment industry has long been a driver of video technology, first for broadcasting video and more recently for streaming video over the internet. Video transmission is also evolving to better serve land-based and mobile telecommunication applications.
For flight and range tests, video has become part of the measurement set; transmitting and recording video, along with other instrumentation data, is common in these and in intelligence, surveillance and reconnaissance (ISR) applications. In consumer and enterprise markets a diverse range of protocols and formats exist for video compression, transmission and file containers. In an attempt to assure interoperability in products for the defense market, standardization bodies have narrowed the selection and specified implementation details for this selection. The NATO standardization agency and the Motion Imagery Standards Board (MISB) in the United States have both essentially limited the choices for video compression to MPEG-2, H.264/MPEG-4 AVC Part 10 and Motion JPEG-2000 encoding. The latter two are recommended for new developments. For brevity, we will refer to the two encoding schemes simply as H.264 and MJ2K, respectively. Both are modern and mature encoders, built on improvements of previous generations of encoders.

For video transmission, the MPEG-2 transport stream is the most common way to encapsulate video that is encoded with either MPEG-2 or H.264 for transmission over IP. The transport stream also carries audio and sensor metadata packets with the video packets. Since accurate time stamping is a requirement for test and mission acquisition that is not found in commercial applications, implementations for time stamping using mechanisms allowed by the commercial standard have been specified. The IRIG 106 Chapter 10 digital recording standard of the Telemetry Group of the Range Commanders Council specifies MPEG-2 and H.264 as video formats, and JPEG-2000 as a still image format.

Video compression algorithms are optimized for human visual perception and aesthetics. The analyst examining the flight test video requires a level of fidelity in each video frame that contains important measurement information.
When flight test video is part of the data set sent over a telemetry link, is it best to rely on MJ2K, which compresses each frame independently, or should H.264 be considered, which compresses frames as part of a group of pictures? At what level of compression do the artifacts of inter-frame compression become problematic?

VIDEO COMPRESSION

The enabling technology that has contributed most to shaping the way in which video is disseminated and stored today is video compression. Network bandwidth is a precious resource in mobile and terrestrial networks. Video compression has made it possible for video to be distributed over IP networks, changing the way video content is marketed and consumed.

Preserving bandwidth is particularly relevant in telemetry when video is part of the measurement set. To illustrate the advances in video compression, consider that the first airborne digital recorder capable of recording standard definition digital video, in 1984, boasted a data rate of 120 Mbit/sec [1]. Today, HD video, containing six times the number of pixels of standard definition video, can be encoded into a bit stream with one tenth of the bit rate while maintaining Blu-ray quality. While recording a bit stream of 120 Mbps is no longer particularly challenging, managing the ever more crowded spectrum and using data link capacity efficiently is increasingly so. Analog FM links, which consume about 20 to 25 MHz for standard definition video, are being replaced by bandwidth-efficient digital modems. In this paper we examine the quality of HD video that is available in the 2 Mbps to 15 Mbps range afforded by these modems.

A video stream is captured as a sequence of pictures. Each picture is acquired as a frame of pixels (compressive sensing technologies notwithstanding). The large amount of spatial redundancy in each picture (neighboring pixels are often similar) provides an opportunity to compress the image without noticeable loss of fidelity. Both the MJ2K and the H.264 encoder use intra-frame encoding to dramatically reduce the amount of data required to represent each picture; however, the compression methods differ substantially.

The light incident on the image sensor is typically sensed as separated color components, so it is natural to represent the image as color components in the RGB color space. The RGB color space can be mapped, in a linear fashion, to the luminance-chrominance space (YCbCr), where the image is represented by intensity (luminance) and color (chrominance) components. This representation lends itself to reducing image data by color sub-sampling: human vision is much less sensitive to changes in color than in brightness.
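The linear mapping from RGB to YCbCr can be sketched in a few lines. The paper does not name a conversion matrix; the sketch below assumes the luma weights of ITU-R BT.709, which are commonly used for HD video, and works on normalized values rather than on the integer code levels used by real encoders. The function names are illustrative only.

```python
# Assumed luma weights (ITU-R BT.709); the paper does not specify a matrix.
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB


def rgb_to_ycbcr(r, g, b):
    """Map normalized RGB in [0, 1] to (Y, Cb, Cr).

    Y lies in [0, 1]; Cb and Cr lie in [-0.5, 0.5].
    """
    y = KR * r + KG * g + KB * b          # luminance: weighted sum of R, G, B
    cb = (b - y) / (2.0 * (1.0 - KB))     # blue-difference chrominance
    cr = (r - y) / (2.0 * (1.0 - KR))     # red-difference chrominance
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse mapping; the transform is linear and therefore reversible."""
    r = y + 2.0 * (1.0 - KR) * cr
    b = y + 2.0 * (1.0 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b
```

A gray pixel (equal R, G and B) maps to zero chrominance, which is why the chrominance planes of natural video tend to carry little energy and tolerate sub-sampling well.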
A common first step in video compression is to use fewer samples to represent color (about half or a quarter as many as are used to represent intensity). Color sub-sampling reduces the amount of data that needs to be processed by the computationally more complex data compression algorithms. The JPEG standard does not explicitly offer a sub-sampled YCbCr representation for color image compression [2], but it does allow for it and is often implemented with color sub-sampling. The H.264 standard prescribes 4:2:0 color sub-sampling, meaning that each chrominance component is represented with one fourth as many samples as the luminance. Fidelity Range Extensions (FRExt) of the H.264 standard do allow for higher-resolution color sampling, but these are not typically found in avionic hardware encoders.

Motion JPEG-2000

The MJ2K encoder is built on JPEG-2000 image compression, which has the distinctive feature that it uses the wavelet transform for image compression. Wavelet compression is highly efficient; a good overview is given in [3]. One of the remarkable properties of JPEG 2000 is its scalability in both resolution and quality. Lower resolution images are embedded in higher resolution images. A low resolution image can be accessed by decoding only part of the bit stream. The JPIP protocol allows the client to ask the server for further components of the bit stream to support increasingly higher resolutions; no trans-coding is required to support reconstructing images at different resolutions. The protocol also supports region-of-interest viewing based on the scalability of the encoding scheme.

An airborne application taking advantage of the scalability of JPEG-2000 encoding is the DARPA-funded ARGUS-IS project [4]. There, a 1.8 gigapixel sensor images several square miles with high resolution. The 425 Gbits/sec stream of image data is compressed using JPEG-2000 and sent over a 274 Mbps Common Data Link (CDL). Another notable application of MJ2K is digital cinema. The Digital Cinema Initiative Group has selected MJ2K as the means to distribute digital content for the large movie screens.

In flight test applications, cockpit display video or natural video with synthetic overlays is recorded to validate avionic systems. When the Navy retrofitted the EA-18G Growler with high resolution 8x10 inch displays, it commissioned a High Resolution Recorder with MJ2K video encoding for flight tests. The comparatively low compression rate produces good quality video, but the data rates are more suitable for on-board recording than for remote sensing. The lack of inter-frame compression eliminates the possibility of the associated artifacts. However, it also means that the only way to exploit temporal redundancy in the video stream to reduce the bit rate is to decimate the frame rate.

H.264

It is common in a video sequence for a portion of a frame to remain constant, or to only change in a minor way, between successive frames.
Motion-compensated prediction with multiple reference frames is a mechanism built into H.264 that allows a portion of an image to be predicted from one or more images already transmitted [5]. This use of inter-frame prediction is a fundamental difference between H.264 and MJ2K. Motion estimation, a computationally intensive algorithm, is used to predict how the picture changes. A construct of I-frames, P-frames and B-frames is used to reduce the temporal redundancy in the compressed video stream: I-frames encode a source frame independent of other frames; P-frames encode the differences between the forward-predicted frame and the source frame; B-frames use bi-directional prediction, encoding the difference between the source frame and the average of the forward-predicted frame and the frame predicted backward from a future frame. H.264 does not have the built-in scalability for quality and resolution that MJ2K has, but it is still practical to build encoders with quality, resolution and bandwidth tradeoffs as outlined by the Engineering Guideline MISB EG 0904. An H.264-based encoder that allows interactive control of resolution and quality has been demonstrated in [6].

H.264 is widely used in consumer and professional markets. It is readily accessible on a wide range of software and hardware products. It is used by Blu-ray Disc and for streaming video on the internet. Popular media players, such as Windows Media Player and the VLC Media Player, have built-in H.264 codecs.

ENCODER PERFORMANCE ANALYSIS

The following analysis is intended to provide some practical guidance for codec selection and configuration for transmitting HD video in a bandwidth budget in the range of 2 to 20 Mbps. These data rates are typical for digital data links for telemetry and ISR use cases. We encode an uncompressed test sequence at different compression rates, analyze the quality and look for artifacts particular to each compression scheme.

The Test Sequence

A test sequence for encoder evaluation should be relevant to the particular application, because different source material can favor different encoding schemes. The test sequence chosen for the comparison here is from a video captured during a flight test with an E/O sensor. This test sequence contains a view of an urban landscape, with buildings, cars and trees. A graphic overlay contains computer-generated symbols and characters. This type of compound video is typical of ISR video, synthetic video and heads-up-display video. The aerial view from the moving vantage point of the camera on the aircraft results in video in which large areas of the picture have a low common motion component. The camera has a long focal length and occasionally pans, increasing the motion content. The temporal redundancy in this type of video may favor the H.264 encoder. The video overlay with the symbols and characters does not share the common motion of the physical background video. One would expect the wavelet transform of the MJ2K encoder to excel in rendering the symbol content. The test sequence analyzed is 500 frames long. One frame of the video (frame 274) is shown in Figure 1.

The Encoders

To provide results indicative of practically attainable performance, encoders in actual products manufactured by Ampex that process and record HD video in real time were used for the analysis.

The MJ2K encoder available on the miniR was used in this study. The miniR is an IRIG Chapter 10 instrumentation recorder that can be configured with any of 20 or so interface cards, including a range of video interfaces and H.264 and MJ2K encoders. (The miniR was the aforementioned High Resolution Display Recorder on the EA-18G Growler.) Video on the MJ2K encoder is converted to the YUV color space using 4:2:2 sampling; that is, color is sampled at full vertical resolution and half the horizontal resolution before the JPEG2000 compression is applied.

The H.264 analysis was done using the video encoder function in the TS 100v. The TS 100v is an HD video recorder/encoder, purpose-designed for airborne ISR applications. This encoder supports Baseline Profile and up to level 4.0, i.e. the maximum data rate of the video stream is 20 Mbps; the largest format supported is 1080p30. Per the H.264 standard, 4:2:0 sampling is used in the YUV color space. Color information is sampled at half the rate of intensity information in both the vertical and horizontal dimensions.
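The difference between the two sampling schemes can be made concrete by counting samples per frame. The sketch below is an illustration only: the helper name and the divisor table are assumptions introduced here, and the 1280x720 frame size is taken from the 720p30 format reported in the results.

```python
# (horizontal divisor, vertical divisor) applied to each of the two chroma planes
CHROMA_DIVISORS = {
    "4:4:4": (1, 1),  # no sub-sampling
    "4:2:2": (2, 1),  # half horizontal, full vertical resolution (MJ2K encoder)
    "4:2:0": (2, 2),  # half horizontal, half vertical resolution (H.264 encoder)
}


def samples_per_frame(width, height, scheme):
    """Total luma plus chroma samples in one frame for a sampling scheme."""
    h_div, v_div = CHROMA_DIVISORS[scheme]
    luma = width * height
    chroma = 2 * (width // h_div) * (height // v_div)  # Cb and Cr planes
    return luma + chroma


for scheme in CHROMA_DIVISORS:
    total = samples_per_frame(1280, 720, scheme)
    print(scheme, total, total / (1280 * 720), "samples per pixel")
```

With these assumptions, 4:2:2 carries 2 samples per pixel and 4:2:0 carries 1.5, so the MJ2K encoder starts with twice the vertical color resolution of the H.264 encoder before any compression is applied.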

Figure 1: Frame number 274 of the test sequence used for encoder evaluation

Results

A metric often used to assess encoder performance is the Peak Signal-to-Noise Ratio (PSNR). Figure 2 shows the PSNR computed for each frame of the test sequence when processed with each of the encoders. The H.264 encoder was set to produce a bit stream at a constant rate of 8 Mbps. The Group-of-Pictures (GOP) was set to 30, meaning that video is encoded in a sequence where one I-frame, which uses only intra-frame encoding, is followed by 29 P-frames, which also use motion prediction. It can be seen that the PSNR of the H.264 stream varies more than the PSNR in the JPEG2000 stream, and that it is typically a couple dB better for the I-frame than for the following P-frames. Indeed, comparisons have shown that the I-frame only H.264 encoder makes a good intra-frame only encoder [7]. The average PSNR is 44 dB for the H.264 encoder at 8 Mbps.
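For reference, the per-frame PSNR can be computed as sketched below. This is a minimal illustration using the standard definition, PSNR = 10 log10(peak^2 / MSE); the paper does not state which color components enter the calculation or how the sequence average is formed, so the flat list of 8-bit samples and the plain mean over frames are assumptions, as are the function names.

```python
import math


def psnr(reference, decoded, peak=255):
    """PSNR in dB between two equally sized sequences of sample values."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("frames must be non-empty and of equal size")
    # Mean squared error between source samples and decoded samples
    mse = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: no distortion
    return 10.0 * math.log10(peak ** 2 / mse)


def average_psnr(reference_frames, decoded_frames):
    """Mean of the per-frame PSNR values (assumed averaging method)."""
    values = [psnr(r, d) for r, d in zip(reference_frames, decoded_frames)]
    return sum(values) / len(values)
```

An error of one code level in every 8-bit sample gives an MSE of 1 and a PSNR of about 48.1 dB, which puts the 44 dB and 38 dB averages reported here in perspective.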

Figure 2: Peak Signal-to-Noise Ratio for each frame in the sequence compressed to 8 Mbps

The Motion JPEG2000 encoder was set to encode the video stream with a rate of 0.3 bits per pixel, yielding an average bit rate of 8.29 Mbps. The average PSNR is 38 dB. The video quality is acceptable for both cases. The H.264 encoder does better in this use case than the MJ2K encoder. This is consistent with other comparisons of these codecs: MJ2K excels at larger image frames and lower compression rates [8]. For this bandwidth-constrained telemetry case, H.264 provides superior results at 720p30.

Figure 3: Average PSNR vs. Data Rate

Performance values listed beside the H.264 and JPEG2000 image details in Figure 4:

Panel  Encoder  Bit rate   Average PSNR  Rate
(a)    H.264    20 Mbps    49 dB         0.72 bpp
(a)    MJ2K     22 Mbps    45 dB         0.8 bpp
(b)    H.264    8 Mbps     44 dB         0.29 bpp
(b)    MJ2K     8.3 Mbps   38 dB         0.3 bpp
(c)    H.264    2 Mbps     36.7 dB       0.07 bpp
(c)    MJ2K     1.9 Mbps   32 dB         0.07 bpp
(d)    H.264    1 Mbps     33 dB         0.04 bpp
(d)    MJ2K     1.4 Mbps   31 dB         0.05 bpp

Figure 4: Two detail areas of frame 274 at various compression ratios

The average PSNR performance of the whole test sequence was calculated at several encoder settings; the results are plotted in Figure 3. The maximum bit rate for the profile that the H.264 encoder supports is 20 Mbps; Figure 3 indicates better performance of the H.264 encoder up to that rate. At lower compression rates, the MJ2K encoder further improves the PSNR.

A related metric for encoder performance is the Rate-Distortion curve, shown in Figure 5. The rate of the encoder measures how many bits, on average, are used to represent one pixel. For typical 8-bit depth video, each pixel starts out as a 24-bit RGB representation. Figure 5 shows that, using only 0.15 bits per pixel (bpp), the H.264 encoder yields a better than 40 dB PSNR representation of the image. The MJ2K encoder has a rate of 0.4 bpp for the same distortion.
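The relation between bit rate and rate in bits per pixel is simple arithmetic, sketched below. The default 1280x720 frame at 30 frames per second is an assumption based on the 720p30 format named in the results; the function names are illustrative.

```python
def bits_per_pixel(bit_rate_bps, width=1280, height=720, fps=30):
    """Average number of coded bits spent on each pixel."""
    return bit_rate_bps / (width * height * fps)


def bit_rate_for(bpp, width=1280, height=720, fps=30):
    """Bit rate in bit/s that results from a given rate in bits per pixel."""
    return bpp * width * height * fps


def compression_ratio(bpp, source_bits_per_pixel=24):
    """Ratio of the 24-bit RGB source representation to the coded rate."""
    return source_bits_per_pixel / bpp


print(round(bits_per_pixel(8e6), 2))     # H.264 at 8 Mbps
print(bit_rate_for(0.3) / 1e6)           # MJ2K at 0.3 bpp, in Mbps
print(round(compression_ratio(0.15)))    # H.264 at 0.15 bpp
```

Under the 720p30 assumption, 8 Mbps corresponds to about 0.29 bpp and 0.3 bpp corresponds to about 8.29 Mbps, in agreement with the values reported for the two encoders; 0.15 bpp amounts to a compression ratio of 160:1 relative to 24-bit RGB.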

Figure 5: Rate Distortion curve for the H.264 and MJ2K encoders.

We relate the performance metrics to the quality of the encoded and decoded image by examining a part of one frame that has gone through the encoding and decoding process. Figure 4 shows two details from the video frame in Figure 1, containing natural video and computer-generated overlays. The same details are shown at selected compression rates to give a qualitative indication of the image quality and its deterioration when compression is pushed too far. The performance metrics (bit rate of the video stream, average PSNR of the sequence, and rate in bits per pixel, i.e. how many bits the encoder uses on average to encode one pixel) are summarized next to the image details. A PSNR above forty dB yields images that are perceptibly free of artifacts, as illustrated in the image details in Figure 4(a) and (b). For the H.264 encoder this holds true for bit rates from 4 Mbps to 20 Mbps.

Figure 4(c) and (d) illustrate what happens when compression is pushed too far. Obviously there is a point at which artifacts dominate the image. These settings are only of interest to learn how compression ultimately falls apart. Consider the image detail of the parking lot with the graphic overlay shown in Figure 4(c). As expected, the MJ2K encoder renders the dots on the scale, while they are gone or have moved in the H.264 picture. Surprisingly, the H.264 encoder does better at keeping the text readable. Also unexpected was that, in the heavily compressed case, color is preserved better by the H.264 encoder. Comparing the yellow parking lot striping of the MJ2K encoder at a bit rate of 1.9 Mbit/sec (Figure 4c) to the H.264 encoder at 1 Mbit/sec shows that the H.264 encoder still gets the color right (Figure 6). Both images are at about the same PSNR, yet the MJ2K encoder, which started out with twice the vertical color resolution, ends up with a less faithful representation.

Figure 6: MJ2K color compression artifact at PSNR ~ 32 dB; H.264 (middle) and MJ2K (right)

Inter-frame compression artifacts in H.264 became noticeable at the 2 Mbps bit rate (0.07 bpp) in sections of the test video where the camera panned. Small features in the overlay sometimes had ghost trails that panned with the scenery (Figure 7). A more severe inter-frame compression artifact can be seen in Figure 4(d), at 1 Mbps: the crosshair on the group of cars begins to disintegrate, and the right section of the crosshair moves down. On the MJ2K encoder, the crosshair is completely gone.

Figure 7: H.264 inter-frame compression artifact at 2 Mbps

CONCLUSION

We have shown that H.264 encoding indeed makes it possible to send good quality 720p HD video over telemetry modems operating in the range of 3 Mbps to 20 Mbps. H.264 outperforms MJ2K at these bit rates due to the additional flexibility that inter-frame encoding affords. Symbolics and natural video remain preserved faithfully at data rates down to 4 Mbps. Inter-frame compression artifacts become pronounced at 2 Mbps, but are not noticeable at higher bit rates.

REFERENCES

[1] J. Mallinson, "Achievements in Rotary Head Recording," Proceedings of the IEEE, vol. 78, no. 6, pp. 1004-1016, June 1990.
[2] D. Taubman and M. Marcellin, JPEG2000: Image Compression Fundamentals, Standards, and Practice, Kluwer Academic Publishers, 2002, p. 12.
[3] A. Skodras, C. Christopoulos and T. Ebrahimi, "The JPEG 2000 Still Image Compression Standard," IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 36-58, 2001.
[4] B. Leininger et al., "Autonomous Real-time Ground Ubiquitous Surveillance - Imaging System (ARGUS-IS)," Proceedings of SPIE, vol. 6981, 2008.
[5] D. Marpe, T. Wiegand and G. Sullivan, "The H.264/MPEG4 advanced video coding standard and its applications," IEEE Communications Magazine, vol. 44, no. 8, pp. 134-143, 2006.
[6] K. Hallamasek, D. Reher and M. Makar, "Compression-Aware Region-of-Interest Video Encoding for Full-Motion Video Telemetry and Surveillance," Proceedings of the 2010 European Telemetry Conference, 2010.
[7] R. de Queiroz, R. Ortis, A. Zaghetto and T. Fonseca, "Fringe benefits of the H.264/AVC," International Telecommunications Symposium 2006, pp. 166-170, 2006.
[8] D. Marpe, V. George, H. L. Cycon and K. U. Barthel, "Performance Evaluation of Motion-JPEG2000 in Comparison with H.264 / AVC Operated in Intra Coding Mode," Proceedings of SPIE, vol. 5266, pp. 129-137, 2004.