
FPGA Implementation and Verification System of H.264/AVC Encoder for HDTV Applications


Teng Wang, Chih-Kuang Chen, Qi-Hua Yang, and Xin-An Wang
Key Lab of Integrated Microsystems Science and Engineering Applications, Peking University Shenzhen Graduate School, Shenzhen 518055, China
wangteng@sz.pku.edu.cn, anxinwang@pku.edu.cn

Abstract. For large systems such as video processors, FPGA prototyping plays an important role before tape-out. In this paper, a verification system for H.264/AVC encoders based on FPGA prototyping is proposed and implemented. An H.264 encoder with baseline profile at Level 3.2 was implemented at a clock frequency of 200MHz on a Xilinx Virtex-6 FPGA connected to DDR3 memory; it satisfies real-time encoding for HDTV applications (720P@60fps) with a PSNR of around 34 dB. The encoder was finally implemented in SMIC 65nm CMOS technology for silicon verification.

Keywords: FPGA Prototyping, Verification, H.264/AVC, HDTV.

Introduction

Moore's law describes the steady improvement of IC manufacturing capability, and the average annual growth rate can approach 58% [1]. At the same time, the leading FPGA companies Xilinx and Altera announced FPGA products in 28nm process technology in 2011 [2][3]. This makes it possible to integrate a whole verification system on an FPGA and run it in real time. Fig. 1 shows the trade-off between performance and flexibility of a variety of verification methods [4]: FPGA prototyping strikes a good balance between the two. Therefore, for complex ICs such as ASICs or SoCs (System on Chip), FPGA verification remains an effective method before tape-out.

Fig. 1. Trade-off between performance and flexibility of a variety of verification methods


D. Jin and S. Lin (Eds.): Advances in CSIE, Vol. 2, AISC 169, pp. 345-352. springerlink.com © Springer-Verlag Berlin Heidelberg 2012


T. Wang et al.

Modern FPGAs offer ever higher integration, and many applications now use them to implement complex systems. To design quickly and flexibly, FPGA-based development boards have become increasingly popular. A development board comprises one or more of the latest high-capacity, high-speed FPGAs, a variety of proven peripherals, industry-standard interfaces, power supply circuits, status indicators, control switches and a debug interface, making it easy to create a prototype for even the most complex applications. In this paper, a convenient and effective verification approach is proposed for an H.264/AVC encoder design and then implemented on a highly integrated FPGA platform. The system is first simulated for functional debugging and then implemented on the FPGA platform for real-time verification at a clock frequency of 200MHz. The paper is organized as follows: Section 1 is a general introduction; Section 2 gives an overview of the H.264 encoder and the FPGA platform used in this paper; Section 3 introduces the proposed scheme, the encoder architecture and the verification system; Section 4 presents the implementation results; and Section 5 draws conclusions.

Overview of H.264 Encoder and FPGA Prototype

To provide better compression of video, the H.264/AVC standard was jointly developed by the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) and published in 2003 [5]. Fig. 2 shows the block diagram of the H.264 encoder [6]. In Fig. 2, the current frame Fn is processed in units of macroblocks (MBs), each of which is encoded in intra or inter mode. A predicted MB, P, is formed from reconstructed samples. In intra mode, P is formed from samples of the current frame that have previously been encoded and reconstructed but not yet filtered (uF'n); in inter mode, P is formed by motion compensation (MC) from the reference picture F'n-1, a previously encoded frame. P is subtracted from the current MB to produce a residual block Dn, which is transformed (DCT) and quantized to obtain a set of quantized transform coefficients X. X is then entropy coded and packed into NAL units for transmission or storage. The coefficients X are also inverse quantized and inverse transformed (IDCT) to produce a reconstructed residual block D'n. P is added to D'n to create the unfiltered reconstructed block uF'n, which is then filtered by the de-blocking (DB) filter to reduce blocking distortion, yielding the reconstructed reference frame F'n.
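The forward and reconstruction paths of Fig. 2 can be sketched for a single 4x4 block. The matrix CF below is the standard H.264 4x4 forward core transform, but the uniform quantizer with step `qstep` is a simplification assumed here: the real encoder folds the transform scaling into QP-dependent multiplier tables and uses a different integer inverse transform. All function names are hypothetical. The point the sketch shows is that the encoder rebuilds uF'n from the quantized coefficients X rather than from Dn, so its prediction stays identical to the decoder's.

```python
from fractions import Fraction

# H.264 4x4 forward core transform (integer approximation of the DCT).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]
# The rows of CF are mutually orthogonal with these squared norms.
ROW_NORM2 = [4, 10, 4, 10]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_transform(block):
    """W = CF . D . CF^T"""
    return matmul(matmul(CF, block), transpose(CF))


def inverse_transform(coeffs):
    """Exact inverse, using CF^-1 = CF^T . diag(1 / ROW_NORM2)."""
    inv = [[Fraction(CF[j][i], ROW_NORM2[j]) for j in range(4)] for i in range(4)]
    return matmul(matmul(inv, coeffs), transpose(inv))


def encode_block(cur, pred, qstep):
    """Return (X, uF'n) for one 4x4 block; the quantizer is a simplified assumption."""
    # Dn = Fn - P
    resid = [[cur[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    # X = Q(T(Dn)) with a plain uniform quantizer
    levels = [[round(v / qstep) for v in row] for row in forward_transform(resid)]
    # D'n = T^-1(Q^-1(X)): the encoder rebuilds exactly what a decoder would
    d_rec = inverse_transform([[x * qstep for x in row] for row in levels])
    # uF'n = P + D'n, clipped to the 8-bit sample range
    recon = [[min(255, max(0, pred[i][j] + round(d_rec[i][j])))
              for j in range(4)] for i in range(4)]
    return levels, recon
```

With `qstep=1` the loop is lossless and `recon` equals the input block; larger steps discard precision in X, and the same loss appears in both the encoder's and the decoder's reconstruction.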

Fig. 2. The architecture of H.264 Encoder

FPGA Implementation and Verification System


Fig. 3. The architecture of the proposed FPGA prototype

Fig. 3 shows the architecture of the FPGA platform [7] used in this paper. The Marvell MV78200 CPU, connected to the configuration FPGA (Xilinx Virtex-5 XC5VLX85T), is used to configure the two user FPGAs (Xilinx Virtex-6 XC6VLX240T). The XC6VLX240T user FPGA is built in 40nm copper CMOS process technology, and the platform can emulate up to 5 million gates of logic as measured by a reasonable ASIC gate-counting standard. The Marvell MV78200 can communicate with a PC through the USB, Ethernet or PCIe interface. The Xilinx ISE Project Navigator 12.3 suite with ChipScope provides the simulation and debug environment. A Linux kernel (Linux 2.6.22.18) provides the basic services and device drivers used on the Marvell CPU. The external memory of the FPGA platform is a DDR3 SODIMM, which stores the raw input data and the encoded data of the H.264 encoder. A JTAG cable connects the PC to the FPGAs and provides a configuration and debug interface via the JTAG chain.
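How much DDR3 space and bandwidth the raw test sequence needs can be estimated with simple arithmetic. The sketch below assumes 8-bit 4:2:0 input, the only chroma format the baseline profile supports; the helper names are hypothetical, and the 1 GiB capacity in the usage lines is an illustrative value, not the capacity of the board's SODIMM.

```python
def yuv420_frame_bytes(width, height, bits_per_sample=8):
    """Size of one raw 4:2:0 frame: a full-size luma plane plus two
    chroma planes subsampled by 2 horizontally and vertically."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bits_per_sample // 8


def raw_input_rate(width, height, fps):
    """Bytes per second needed just to stream raw input frames."""
    return yuv420_frame_bytes(width, height) * fps


def frames_that_fit(capacity_bytes, width, height):
    """How many complete raw frames a buffer of the given size holds."""
    return capacity_bytes // yuv420_frame_bytes(width, height)


print(yuv420_frame_bytes(1280, 720))        # 1382400
print(raw_input_rate(1280, 720, 60))        # 82944000
print(frames_that_fit(1 << 30, 1280, 720))  # 776 (hypothetical 1 GiB buffer)
```

Reading raw 720P@60fps input alone takes about 83 MB/s, before counting the reference-frame traffic generated during motion estimation.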

Architecture of Proposed Scheme

An H.264/AVC encoder for HDTV (720P@60fps) applications, targeting the baseline profile at Level 3.2, is designed. As shown in Fig. 4, the design uses a five-stage pipeline controlled by a control unit. The first stage is ME (motion estimation); the second stage is FME (fractional motion estimation); the third stage consists of intra prediction, DCT/IDCT, quantization and inverse quantization; the fourth stage comprises VLC (variable length coding) and the DB (de-blocking) filter; and the fifth stage is NAL (network abstraction layer) coding.
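The macroblock-level behavior of a five-stage pipeline like this one can be illustrated with a simple schedule. The sketch assumes every stage finishes one MB within a fixed MB slot and all stages advance in lockstep; the actual design is data-driven through shared memories, so its real timing is handshake-based. The stage labels and the function name are hypothetical.

```python
# Stage labels follow the five-stage partition described above.
STAGES = ["ME", "FME", "Intra/DCT/Q", "VLC+DB", "NAL"]


def pipeline_schedule(num_mbs):
    """For each MB slot, list which macroblock index each stage works on.

    Assumes lockstep advance, one MB per stage per slot; None marks an
    idle stage while the pipeline fills or drains.
    """
    slots = []
    for slot in range(num_mbs + len(STAGES) - 1):
        row = []
        for stage in range(len(STAGES)):
            mb = slot - stage  # this MB entered the first stage `stage` slots earlier
            row.append(mb if 0 <= mb < num_mbs else None)
        slots.append(row)
    return slots


for slot, row in enumerate(pipeline_schedule(3)):
    print(slot, row)
```

In steady state five MBs are in flight at once and one MB leaves the pipeline per slot, so the throughput is set by the slowest stage rather than by the total latency of all five.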

Fig. 4. Architecture of the proposed H.264 encoder



Fig. 5. The proposed simulation and debugging system for H.264 encoder

Fig. 6. FPGA prototyping system based on DN-DualV6-PCIe-4 platform

Shared memories and distributed memories are used between the pipeline stages for data transfer, so the flow of the pipeline is data-driven. This increases memory utilization and yields better timing performance. Controller modules generate the control signals that govern the coding behavior of the H.264 encoder, and a Direct Memory Access controller (DMAC) module serves as the interface to the external DDR3 memory.

Fig. 5 shows the architecture of the proposed simulation and debugging system for the encoder. The simulation procedure is divided into three steps. First, the system controller directs the arbiter to select the raw-data path and write the raw data into DDR3 memory. Second, the arbiter switches to the encoder path, and the encoder accesses the DDR3 while coding is in progress. Third, the encoded bitstream in DDR3 is transferred through the arbiter to the result monitor. An H264-to-DDR3 module and a Simtop-to-DDR3 module are designed to adapt to the DDR3 protocol. The DDR3 simulation model is that of the Micron MT8JSF12864HY SODIMM, and the DDR3 controller with PHY is generated with the Xilinx Memory Interface Generator (MIG) tool [8].

Fig. 6 shows the FPGA prototyping system based on the DN-DualV6-PCIe-4 platform [7], developed by the DINI Group of California, USA. The platform can be connected to a PC through a USB, Ethernet or PCIe interface, and a Xilinx USB cable is used to configure and debug the FPGAs via the JTAG chain.
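The three-step simulation procedure amounts to an ownership protocol around the arbiter: only the currently selected path may access DDR3. The toy model below illustrates that sequencing. The `Arbiter` class, the `Path` names, the address map and the `encode` callback are hypothetical stand-ins for the testbench blocks in Fig. 5, and a dictionary replaces the Micron DDR3 model.

```python
from enum import Enum


class Path(Enum):
    RAW_DATA = 1  # step 1: testbench writes raw frames into DDR3
    ENCODER = 2   # step 2: encoder reads raw data and writes the bitstream
    MONITOR = 3   # step 3: result monitor reads the bitstream back


class Arbiter:
    """Only the selected path may touch the (modeled) DDR3."""

    def __init__(self):
        self.ddr3 = {}  # address -> word; stands in for the DDR3 model
        self.path = Path.RAW_DATA

    def select(self, path):
        self.path = path

    def _check(self, requester):
        if requester is not self.path:
            raise PermissionError(f"{requester.name} is not the selected path")

    def write(self, requester, addr, data):
        self._check(requester)
        self.ddr3[addr] = data

    def read(self, requester, addr):
        self._check(requester)
        return self.ddr3[addr]


RAW_BASE, BS_BASE = 0x0000, 0x8000  # hypothetical address map


def run_simulation(raw_words, encode):
    arb = Arbiter()
    # Step 1: the system controller selects the raw-data path and loads raw data.
    arb.select(Path.RAW_DATA)
    for i, w in enumerate(raw_words):
        arb.write(Path.RAW_DATA, RAW_BASE + i, w)
    # Step 2: the arbiter switches to the encoder, which reads raw data
    # and writes its bitstream back into DDR3.
    arb.select(Path.ENCODER)
    raw = [arb.read(Path.ENCODER, RAW_BASE + i) for i in range(len(raw_words))]
    bitstream = encode(raw)
    for i, b in enumerate(bitstream):
        arb.write(Path.ENCODER, BS_BASE + i, b)
    # Step 3: the bitstream is forwarded through the arbiter to the result monitor.
    arb.select(Path.MONITOR)
    return [arb.read(Path.MONITOR, BS_BASE + i) for i in range(len(bitstream))]
```

For example, `run_simulation([1, 2, 3], lambda raw: [x * 2 for x in raw])` returns the stand-in "bitstream" as seen by the monitor. Making illegal accesses raise an error mirrors the value of the arbiter in a real testbench: a path that touches DDR3 out of turn is caught immediately.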



Table 1 lists the hardware and software used in the proposed system. ModelSim and the Xilinx ISE suite are used for simulation, debugging and FPGA implementation. A Linux kernel is installed on the FPGA prototype board, providing the configuration and communication environment between the FPGA platform and the PC.
Table 1. (a) Hardware and (b) Software environment in the proposed scheme

(a) Hardware           Description
PC                     MB: ASUS M4A88TD-M; CPU: AMD Athlon II X2 250 processor, 3.0 GHz; RAM: DDR3-1333, 4GB x 4
FPGA Prototype Board   Marvell MV78200 CPU (dual); Configuration FPGA: Xilinx Virtex-5 LX85; User FPGA: Xilinx Virtex-6 XC6VLX240T x 2
Programming Cable      Xilinx USB Cable
Router                 802.11 b/g router, used for the Ethernet interface

(b) Software   Description                   Note
Windows 7      Win7 Professional 64          OS of PC
Linux          Linux 2.6.22.18               OS of FPGA prototype board
Xilinx ISE     Xilinx ISE Navigator v12.3    Xilinx FPGA design suite v12.3
               Xilinx ChipScope              Xilinx debug tools
ModelSim       ModelSim SE PLUS 6.5          Simulation tool

Results and Discussion

Before FPGA implementation, functional simulation is first conducted with Xilinx ISE using the simulation system proposed in Fig. 5; the simulation waveform is shown in Fig. 7(a). The design is then implemented on the platform presented above, and Fig. 7(b) shows the debug waveform captured with Xilinx ChipScope. Both the simulation and the implementation results provide good evidence for the correctness of the encoder. The overall system is implemented on a Xilinx Virtex-6 xc6vlx240t-1156 FPGA with a working frequency of 200MHz, which supports real-time HDTV applications. The design occupies 92,109 slice LUTs (33,718 occupied slices) of the FPGA; Table 2 shows the FPGA utilization summary. Table 3 compares the proposed implementation with those in Ref. [9] and [10]. Note that the 92K LUTs include the DDR3 MIG and the H264-to-DDR3 module. Furthermore, the proposed design is implemented in SMIC 65nm CMOS technology with a core size of about 3.24 mm² under a clock constraint of 350MHz. Fig. 10 shows the layout of the ASIC design.
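Whether a given clock frequency is enough for real time can be checked by counting macroblocks. 720P@60fps needs 80 x 45 = 3,600 MBs per frame, or 216,000 MBs per second, which equals the maximum macroblock processing rate (MaxMBPS) that H.264 Annex A allows for Level 3.2. The helper below is a hypothetical sketch of that arithmetic; it assumes the encoder retires one macroblock per pipeline slot, so the result is the cycle budget each pipeline stage must meet per MB.

```python
def mb_cycle_budget(width, height, fps, clock_hz, mb_size=16):
    """Return (MBs per frame, MBs per second, clock cycles available per MB)."""
    mbs_per_frame = ((width + mb_size - 1) // mb_size) * ((height + mb_size - 1) // mb_size)
    mb_rate = mbs_per_frame * fps  # macroblocks that must be encoded each second
    return mbs_per_frame, mb_rate, clock_hz / mb_rate


print(mb_cycle_budget(1280, 720, 60, 200e6))  # FPGA clock: about 925.9 cycles per MB
print(mb_cycle_budget(1280, 720, 60, 350e6))  # ASIC clock: about 1620.4 cycles per MB
```

At 200MHz each stage therefore has roughly 925 cycles per macroblock; the 350MHz ASIC constraint leaves about 1,620 cycles, i.e. some headroom beyond the 720P@60fps target.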



Fig. 7. (a) Simulation waveform with Xilinx ISE; (b) Debug waveform with Xilinx ChipScope

Table 2. FPGA utilization summary of H.264 encoder implementation (xc6vlx240t)
Slice logic utilization    Used / Available     Utilization
Slice Registers            77,646 / 301,440     25%
Slice LUTs                 92,109 / 150,720     61%
Occupied Slices            33,718 / 37,680      89%
RAMB36E1/FIFO36E1s         92 / 416             22%
DSP48E1s                   28 / 768             3%
Bonded IOBs                183 / 600            30%

Table 3. Specification and performance comparison with other work

Items            Ref. [9]                        Ref. [10]                       Proposed
Spec             Baseline profile, Level 3.0;    Baseline profile, Level 3.1;    Baseline profile, Level 3.2;
                 1024 x 768@30fps                1280 x 720@30fps                1280 x 720@60fps
Platform &       Xilinx Virtex II with SDRAM;    -                               Xilinx Virtex-6 with DDR3 SDRAM;
performance      12K slices; 50MHz                                               92K slice LUTs; 200MHz
ASIC design      UMC 0.18um, 3.88mm2, 100MHz     UMC 0.18um, 31.7mm2, 108MHz     SMIC 65nm, 3.24mm2, 350MHz



The bitstream produced by real-time encoding is decoded with a standard H.264 decoder, and Fig. 8 compares an original frame with the corresponding decoded frame. Fig. 9 presents the PSNR of the Y, U and V components at a bitrate of 3.7Mbps with a QP of 32. The average PSNR over the first 30 frames is about 34 dB.
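For 8-bit video the usual definition is PSNR = 10 log10(255² / MSE), computed per component plane (Y, U or V) between the original and the decoded frame. The sketch below implements that definition; averaging the per-frame dB values with an arithmetic mean is an assumption, since the paper does not state how the 30-frame average was formed. Function names are hypothetical.

```python
import math


def psnr(ref, test, peak=255):
    """PSNR in dB between two equally sized 8-bit planes (lists of rows)."""
    n = 0
    se = 0
    for r_row, t_row in zip(ref, test):
        for r, t in zip(r_row, t_row):
            se += (r - t) ** 2
            n += 1
    mse = se / n
    # Identical planes have zero error, i.e. infinite PSNR.
    return math.inf if mse == 0 else 10 * math.log10(peak * peak / mse)


def average_psnr(frame_pairs):
    """Arithmetic mean of per-frame PSNR values (one common convention)."""
    values = [psnr(ref, dec) for ref, dec in frame_pairs]
    return sum(values) / len(values)
```

An MSE of 1 corresponds to about 48.1 dB, so the reported 34 dB corresponds to an MSE of roughly 26 per sample.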

Fig. 8. Comparison of the original frame with the decoded frame after H.264 encoding

Fig. 9. PSNR of the first 30 frames within the testing case

Fig. 10. The layout of ASIC implementation of the proposed encoder (with pads)

Conclusion

In this paper, a verification system for H.264/AVC encoders based on FPGA prototyping is proposed and implemented on the DINI DN-DualV6-PCIe-4 platform. An H.264 encoder of baseline profile at Level 3.2, with a DMAC interface to DDR3 SDRAM, was implemented at a clock frequency of 200MHz on a Xilinx Virtex-6 FPGA; it satisfies real-time encoding for HDTV applications (720P@60fps) with a PSNR of around 34 dB. The encoder was finally implemented in SMIC 65nm CMOS technology for silicon verification, with a core size of 3.24 mm² and a working frequency of 350MHz.

Acknowledgment. This work is supported by the National High Technology Research and Development Program ("863" Program, Grant No. 2009AA01Z127) of China and by the Shenzhen Science & Technology Program (Grant No. JC201005270278A).

References
1. Wang, Y., Wang, Y.: China's IC Industry Development - from the country of consumption to the power industry, p. 241. Science Press, Beijing (2008)
2. Xilinx website, http://www.xilinx.com
3. Altera website, http://www.altera.com
4. Huang, W., Wang, X., et al.: Implementation of high-speed verification platform based on emulator for reDSP & reMAP. In: IEEE 8th International Conference on AISC, pp. 682-685 (2009)
5. Wiegand, T., Sullivan, G.J., et al.: Overview of the H.264/AVC Video Coding Standard. IEEE Trans. Circuits Syst. Video Technol. 13(7), 560-576 (2003)
6. Puri, A., Chen, X., et al.: Video Coding Using the H.264/MPEG-4 AVC Compression Standard. Signal Processing: Image Communication 19, 793-849 (2004)
7. DN-DualV6-PCIe-4 User Manual, http://www.dinigroup.com/new/DN-DualV6-PCIe-4.php
8. Xilinx Virtex-6 FPGA Memory Interface Solutions User Guide V3.91, http://www.xilinx.com/support/documentation/ip_documentation/mig/v3_91/ug406.pdf
9. Babionitakis, K., Lentaris, G., et al.: An Efficient H.264 VLSI Advanced Video Encoder. In: 13th IEEE International Conference on Electronics, Circuits and Systems, pp. 545-548 (2006)
10. Chen, T.C., Chien, S.Y., et al.: Analysis and architecture design of an HDTV720p 30frames/s H.264/AVC encoder. IEEE Trans. Circuits Syst. Video Technol. 16(6), 673-688 (2006)
