Best Paper Award
A High Performance HEVC De-Blocking Filter and SAO Architecture for UHDTV Decoder
Jiayi Zhu, Dajiang Zhou, Satoshi Goto
[Trans. Fundamentals, Vol. E96-A, No. 12, Dec. 2013]

Jiayi Zhu

Dajiang Zhou

Satoshi Goto
 
  HEVC (High Efficiency Video Coding) is the latest video compression standard, published in April 2013. At the same image quality, it achieves approximately four times the compression rate of MPEG-2 and twice that of H.264. HEVC is expected to find wide application in 4K and 8K TV broadcasting, the internet, and video devices. Its high compression rate comes with challenges: more complex algorithms, a greater amount of computation, and larger hardware circuits. The ILF (In-Loop Filter), a new feature in HEVC, consists of not only the traditional DBF (De-Blocking Filter) function but also the novel SAO (Sample Adaptive Offset) function. Its algorithm is therefore more complex, and achieving high-performance, low-cost hardware for the ILF is an important issue.
  In this paper, a new LSI architecture for the HEVC ILF is proposed to address this issue. In this architecture, the SAO and DBF are pipelined on an 8x8 block basis: while the DBF processes the current 8x8 block, the SAO processes the previous one. Each 8x8 block contains four edges for the DBF and four 4x4 blocks for the SAO. The combinational logic engines for the DBF and SAO allow one edge and one 4x4 block, respectively, to be processed per cycle, so each stage takes four cycles to finish one 8x8 block. The 8x8 block passed from the DBF to the SAO is shifted one sample up and to the left, which satisfies the input pattern requirement of the SAO and eases the coupling of the two stages. In addition, the luma and chroma samples of each 4x4 block are stored in the same memory unit and processed simultaneously to raise parallelism. The proposed ILF architecture, targeting 8Kx4K video decoding, can be synthesized at 240 MHz in 65 nm technology. The circuit cost is 31.0K gates for the DBF and 36.7K gates for the SAO. Sixteen pixels are processed in each cycle. Hence, the design can process up to 3.84G pixels/s at the maximum synthesizable frequency, and UHDTV 4320p (7680x4320) 60 fps video can be decoded at only 124.4 MHz.
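The throughput figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and it assumes that "pixels" counts picture positions (width x height), with the chroma samples handled alongside the luma samples as the text describes.

```python
# Figures quoted in the summary above.
BLOCK_SIZE = 8          # the pipeline works on 8x8 blocks
CYCLES_PER_BLOCK = 4    # four edges (DBF) or four 4x4 blocks (SAO), one per cycle
MAX_FREQ_HZ = 240e6     # maximum synthesizable frequency
WIDTH, HEIGHT, FPS = 7680, 4320, 60   # UHDTV 4320p at 60 fps


def pixels_per_cycle(block_size=BLOCK_SIZE, cycles=CYCLES_PER_BLOCK):
    """Pixels finished per clock cycle: one 8x8 block every four cycles."""
    return block_size * block_size // cycles


def peak_throughput(freq_hz=MAX_FREQ_HZ):
    """Peak pixel rate (pixels/s) at the given clock frequency."""
    return pixels_per_cycle() * freq_hz


def required_freq_hz(width=WIDTH, height=HEIGHT, fps=FPS):
    """Clock frequency needed to sustain the given resolution and frame rate.

    Assumption: one "pixel" per picture position; no stall cycles are modeled.
    """
    return width * height * fps / pixels_per_cycle()


if __name__ == "__main__":
    print(f"pixels per cycle : {pixels_per_cycle()}")
    print(f"peak throughput  : {peak_throughput() / 1e9:.2f} Gpixels/s")
    print(f"required clock   : {required_freq_hz() / 1e6:.1f} MHz")
```

Under these assumptions, one 8x8 block every four cycles gives the 16 pixels per cycle quoted above, and the two derived values line up with the 3.84G pixels/s and 124.4 MHz figures in the text.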
