Design and Implementation of Turbo Decoder for Advanced T-DMB System

Hyuk Kim, Jinkyu Kim, Juehyun Lee, Duckwhan Kim and Bon-tae Koo
Electronic and Telecommunications Research Institute
138 Gajeongno, Yuseong-gu, Daejeon, KOREA
haggy@etri.re.kr

Abstract: Iterative decoders such as turbo decoders have become integral components of modern broadband communication systems, including the Advanced Terrestrial DMB (AT-DMB) system, because they provide substantial coding gains. This paper presents the design and FPGA implementation of a turbo decoder for the AT-DMB system, which enables high-quality mobile multimedia broadcasting services that exceed the quality and content capacity of the current DMB service. The turbo decoder uses the Max-Log-MAP algorithm with a simple min function. It features an interleaver address generator that computes the interleaved addresses for four block sizes, enabling it to switch context quickly to support different data services. The decoder was implemented on a Xilinx Virtex-4 XC4VLX200 device together with other functional blocks.

1. Introduction

The turbo code, introduced in 1993, is one of the most powerful forward error correction channel codes and provides near-optimal bit error rates [1]. The price of this superior performance is high computational complexity, particularly in the decoding process. There are two main algorithms for turbo decoding: the MAP algorithm [2] and the soft-output Viterbi algorithm (SOVA) [3]. Recently, nonbinary turbo codes have received great attention and have been adopted in several mobile radio systems such as DVB-RCS, the 802.16 standard (WiMAX) [4] and the AT-DMB system, as they offer several advantages over classical single-binary turbo codes. To avoid the spectrum waste caused by tail bits, these turbo codes also employ the circular coding technique called tail-biting.

Digital Multimedia Broadcasting (DMB) is a next-generation broadcasting service that gives mobile users access to various digital multimedia contents, i.e., audio, video and data. However, due to the bandwidth limitation, the spatial resolution is limited to CIF (Common Intermediate Format). AT-DMB secures additional bandwidth by adopting hierarchical modulation and provides a higher data rate and quality for mobile multimedia broadcasting services, using scalable video coding (SVC) and MPEG Surround audio technology.

This paper is organized as follows. Section 2 presents an overview of the AT-DMB system. Section 3 describes the turbo code adopted for the AT-DMB system. Section 4 describes the design and FPGA implementation results of the turbo decoder. Finally, conclusions are given in Section 5.

2. Advanced Terrestrial DMB System

The AT-DMB system is backward compatible with the conventional T-DMB system while providing high-rate, high-quality services. To achieve this, it applies two key techniques: hierarchical modulation and scalable video coding. Hierarchical modulation preserves backward compatibility with the conventional T-DMB system and improves the effective data rate. The conventional T-DMB system adopts the \( \pi /4\)-DQPSK (Differential Quadrature Phase Shift Keying) modulation scheme, which can be extended to 16-QAM (Quadrature Amplitude Modulation) to obtain a higher data rate without losing backward compatibility. The service concept of AT-DMB is depicted in Figure 1.

![Figure 1. Service concept of AT-DMB system](image)

The HP (High Priority) channel in Figure 1 is the conventional T-DMB channel modulated by the \( \pi /4\)-DQPSK method, and the LP (Low Priority) channel is an extra channel provided by the hierarchical modulation scheme. The newly added LP channel can be used for additional service channels, and a high-quality AV service with backward compatibility can be realized by adopting a scalable AV coding technique. The enhancement information for the scalable encoding consists of frame rate, resolution and bit rate.

The base layer stream and the enhancement layer stream of the scalable AV encoder are fed into the HP and LP channels, respectively, and modulated hierarchically. In this case, a conventional T-DMB receiver decodes only the base layer AV stream and provides an AV service of standard quality, whereas an AT-DMB receiver decodes the base layer and enhancement layer streams simultaneously and provides high-quality AV services. Figure 2 shows the AT-DMB system with hierarchical modulation.

![Figure 2. AT-DMB system with hierarchical modulation](image)

3. Turbo Code for AT-DMB

The channel coding process of the AT-DMB enhancement layer uses a turbo code instead of a convolutional code. This code supports equal error protection only. The architecture of the turbo encoder, including its constituent encoder, is depicted in Figure 3. It uses a double-binary circular recursive systematic convolutional code.

The bits of the data to be encoded are alternately fed into \( x_e \) and \( x_r \), starting with the MSB of the first data byte, which is fed to \( x_e \). The encoder is fed with blocks of \( N \) couples. \( N \) is chosen according to the bit rate and is limited to \( N=384 \cdot l \), where \( l=1,2,3,4 \).

Figure 3. Turbo code for AT-DMB system

The generator polynomials are defined as follows:
- Feedback branch: \( 1+D+D^3 \)
- Parity bit \( p1 \): \( 1+D^2+D^3 \)
- Parity bit \( p2 \): \( 1+D^2 \)
- Parity bit \( p3 \): \( 1+D+D^2+D^3 \)

The state of the encoder is denoted by \( S \) (\( 0 \leq S \leq 7 \)) with \( S \) the value read binary (left to right) out of the constituent encoder memory. The circulation states \( S_{c1} \) and \( S_{c2} \) are determined by the following operations:
1) Initialize the encoder with zero state. Encode the sequence in the natural order for the determination of \( S_{c1} \) or in the interleaved order for determination of \( S_{c2} \). In both cases the final state of the encoder is \( S_{0N} \);
2) According to the length \( N \) of the sequence, use Table 1 to find \( S_{c1} \) or \( S_{c2} \).

Table 1. Circulation state \( S_{c} \)

<table>
<thead>
<tr>
<th>\( N \bmod 7 \) (rows) / final state \( S_{0N} \) (columns)</th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>6</td>
<td>4</td>
<td>2</td>
<td>7</td>
<td>1</td>
<td>3</td>
<td>5</td>
</tr>
<tr>
<td>2</td>
<td>0</td>
<td>3</td>
<td>7</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>2</td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td>0</td>
<td>5</td>
<td>3</td>
<td>6</td>
<td>2</td>
<td>7</td>
<td>1</td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>0</td>
<td>4</td>
<td>1</td>
<td>5</td>
<td>6</td>
<td>2</td>
<td>7</td>
<td>3</td>
</tr>
<tr>
<td>5</td>
<td>0</td>
<td>2</td>
<td>5</td>
<td>7</td>
<td>1</td>
<td>3</td>
<td>4</td>
<td>6</td>
</tr>
<tr>
<td>6</td>
<td>0</td>
<td>7</td>
<td>6</td>
<td>1</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>2</td>
</tr>
</tbody>
</table>
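
The lookup in step 2 can be sketched in a few lines of Python. This is only an illustration under the assumption that the rows of Table 1 are indexed by \( N \bmod 7 \) and the columns by the final state \( S_{0N} \); the names `CIRCULATION_TABLE` and `circulation_state` are hypothetical and do not come from the AT-DMB specification.

```python
# Table 1 transcribed as a dictionary.
# Assumption: key = N mod 7 (1..6), tuple index = final state S_0N (0..7).
CIRCULATION_TABLE = {
    1: (0, 6, 4, 2, 7, 1, 3, 5),
    2: (0, 3, 7, 4, 5, 6, 2, 1),
    3: (0, 5, 3, 6, 2, 7, 1, 4),
    4: (0, 4, 1, 5, 6, 2, 7, 3),
    5: (0, 2, 5, 7, 1, 3, 4, 6),
    6: (0, 7, 6, 1, 3, 4, 5, 2),
}


def circulation_state(n_couples: int, final_state: int) -> int:
    """Return the circulation state for a block of n_couples couples,
    given the final state reached after encoding from the zero state."""
    row = n_couples % 7
    if row == 0:
        raise ValueError("Table 1 has no row for N that is a multiple of 7")
    if not 0 <= final_state <= 7:
        raise ValueError("final_state must be in the range 0..7")
    return CIRCULATION_TABLE[row][final_state]


if __name__ == "__main__":
    # N = 384 gives N mod 7 = 6, so row 6 of the table is used.
    print(circulation_state(384, 1))
```

None of the four block sizes \( N=384 \cdot l \) is a multiple of 7, so the missing row for \( N \bmod 7 = 0 \) is never needed.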

The encoding process is performed as follows:

First, the constituent encoder (after initialization with the circulation state \( S_{c1} \)) is fed the sequence in the natural order (switch in position 1) with the incremental address \( i=0\sim N-1 \). It generates the first triple-binary parity vector. This first encoding is called \( C_1 \) encoding. Then the constituent encoder (after initialization with the circulation state \( S_{c2} \)) is fed the interleaved sequence (switch in position 2) with the incremental address \( j=0\sim N-1 \). It generates the second triple-binary parity vector. This second encoding is called \( C_2 \) encoding.

The two level interleaving process is performed as follows:

Level 1:

if \( j \) mod 2 = 0, let \( (x_r,x_e) = (x_{pr},x_{pe}) \) : switch the couple

Level 2:

case \( j \) mod 4:

- when 0: \( P = 0 \)
- when 1: \( P = N/2 + P1 \)
- when 2: \( P = P2 \)
- when 3: \( P = N/2 + P3 \)

\( i = (P0 \times j + P + 1) \) mod \( N \)

This process is described in detail in [4]. Tables 2 and 3 give the interleaver parameters \( P0, P1, P2, P3 \) and the puncturing patterns for the parity vectors, respectively.

Table 2. Turbo code permutation parameters

<table>
<thead>
<tr>
<th>( l )</th>
<th>( N )</th>
<th>bytes</th>
<th>( P0 )</th>
<th>( P1 )</th>
<th>( P2 )</th>
<th>( P3 )</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>384</td>
<td>96</td>
<td>13</td>
<td>94</td>
<td>192</td>
<td>286</td>
</tr>
<tr>
<td>2</td>
<td>768</td>
<td>192</td>
<td>19</td>
<td>12</td>
<td>332</td>
<td>376</td>
</tr>
<tr>
<td>3</td>
<td>1152</td>
<td>288</td>
<td>23</td>
<td>23</td>
<td>388</td>
<td>16</td>
</tr>
<tr>
<td>4</td>
<td>1536</td>
<td>384</td>
<td>31</td>
<td>31</td>
<td>384</td>
<td>20</td>
</tr>
</tbody>
</table>
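
As a rough illustration, the level 2 address computation can be written directly from the formula \( i = (P0 \times j + P + 1) \) mod \( N \). The sketch below is not the decoder's implementation; the function names are hypothetical, and the level 1 couple swap is left out. It only uses the \( l=1 \) and \( l=2 \) rows of Table 2.

```python
def interleaved_address(j, n, p0, p1, p2, p3):
    """Level 2 permutation: map index j to address i for a block of n couples."""
    # Offset P selected by j mod 4, as in the case statement of level 2.
    offsets = (0, n // 2 + p1, p2, n // 2 + p3)
    return (p0 * j + offsets[j % 4] + 1) % n


def interleaver_addresses(n, p0, p1, p2, p3):
    """Return the full list of interleaved addresses for j = 0 .. n-1."""
    return [interleaved_address(j, n, p0, p1, p2, p3) for j in range(n)]


def is_permutation(addresses):
    """True if every address 0 .. len-1 appears exactly once."""
    return sorted(addresses) == list(range(len(addresses)))


# Parameter rows copied from Table 2 (l = 1 and l = 2 only).
ROW_L1 = dict(n=384, p0=13, p1=94, p2=192, p3=286)
ROW_L2 = dict(n=768, p0=19, p1=12, p2=332, p3=376)

if __name__ == "__main__":
    addrs = interleaver_addresses(**ROW_L1)
    print(addrs[:4])              # first four addresses for N = 384
    print(is_permutation(addrs))  # sanity check on the whole block
```

The `is_permutation` check is a cheap way to validate a parameter row before using it. With the formula as written, the printed \( l=3 \) and \( l=4 \) rows did not pass this check, which may point to a transcription issue in those rows, so they are not used here.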

Table 3. Puncturing patterns

<table>
<thead>
<tr>
<th>Code rate</th>
<th>( V_{p1} )</th>
<th>( V_{p2} )</th>
<th>( V_{p3} )</th>
</tr>
</thead>
<tbody>
<tr>
<td>( R=1/2 )</td>
<td>1111 1111</td>
<td>0000 0000</td>
<td>0000 0000</td>
</tr>
<tr>
<td>( R=2/5 )</td>
<td>1111 1111</td>
<td>1010 1010</td>
<td>0000 0000</td>
</tr>
<tr>
<td>( R=1/3 )</td>
<td>1111 1111</td>
<td>1111 1111</td>
<td>0000 0000</td>
</tr>
<tr>
<td>( R=1/4 )</td>
<td>1111 1111</td>
<td>1111 1111</td>
<td>1111 1111</td>
</tr>
</tbody>
</table>
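
The relation between the puncturing patterns and the nominal code rates can be checked with a short calculation. The sketch below assumes that a `1` marks a transmitted parity bit, that the same patterns apply to the parity vectors of both \( C_1 \) and \( C_2 \), and that the two systematic bits of every couple are always sent once. These are assumptions made for illustration, and the names `PUNCTURING` and `effective_rate` are hypothetical.

```python
from fractions import Fraction

# Patterns from Table 3, written without the space: (V_p1, V_p2, V_p3).
PUNCTURING = {
    "1/2": ("11111111", "00000000", "00000000"),
    "2/5": ("11111111", "10101010", "00000000"),
    "1/3": ("11111111", "11111111", "00000000"),
    "1/4": ("11111111", "11111111", "11111111"),
}


def effective_rate(patterns, constituent_encoders=2, bits_per_couple=2):
    """Code rate over one puncturing period.

    Assumes '1' = parity bit kept, the same patterns for both constituent
    encodings, and unpunctured systematic bits.
    """
    period = len(patterns[0])                # couples per puncturing period
    info_bits = bits_per_couple * period     # systematic bits in the period
    kept = sum(p.count("1") for p in patterns)
    parity_bits = constituent_encoders * kept
    return Fraction(info_bits, info_bits + parity_bits)


if __name__ == "__main__":
    for label, patterns in PUNCTURING.items():
        print(label, effective_rate(patterns))
```

Under these assumptions, each row of Table 3 reproduces its nominal rate. For example, \( R=2/5 \) keeps 12 parity bits per encoder over 8 couples, giving 16 information bits out of 40 transmitted bits.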

4. Design and FPGA Implementation Results

The specification of the turbo decoder is as follows:

- Algorithm : Max-Log-MAP
- Supported code rates : \( R=1/2, 2/5, 1/3, 1/4 \)
- Data Rate : 1.5 Mbps
- Soft Input Symbol : 6 or 8-bits
- BMV/SMV/LLR/EXT = 9/9/9/9-bits
- Iteration : max 8
- Sliding Window : 256
- Overlapped Window : 32
- Scaling factor of extrinsic information: 0.75
- Operation Frequency : 24.576 MHz
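
Two items in this list, the Max-Log-MAP algorithm and the extrinsic scaling factor of 0.75, can be illustrated with a small floating-point sketch. It shows the general technique only: Max-Log-MAP replaces the exact Jacobian logarithm \( \ln(e^a+e^b) \) with \( \max(a,b) \), and scaling the extrinsic information is a common way to offset the resulting overestimation. The function names are hypothetical, the max-based sign convention is an assumption (a decoder working on negated metrics would use a min instead), and the fixed-point details of the actual decoder are not modeled.

```python
import math


def max_star_exact(a, b):
    """Exact Jacobian logarithm ln(e^a + e^b), as used by Log-MAP."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_star_maxlog(a, b):
    """Max-Log-MAP approximation: the correction term is dropped."""
    return max(a, b)


def scale_extrinsic(value, factor=0.75):
    """Scale extrinsic information before passing it to the other decoder."""
    return factor * value


if __name__ == "__main__":
    a, b = 1.0, 1.5
    print(max_star_exact(a, b), max_star_maxlog(a, b))
    print(scale_extrinsic(4.0))
```

The dropped correction term is at most \( \ln 2 \), reached when the two metrics are equal, and it vanishes as their difference grows.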

In a conventional turbo decoder, the trellis is known to start and end at the zero state, so the initial conditions for the forward and backward metrics of the MAP decoding procedure are easily defined. For a tail-biting turbo code, however, we only know that the final trellis state equals the initial trellis state; we do not know which state is the circulation state used by the encoder. The initial values of the forward and backward state metrics are therefore not explicitly specified.

In this paper, the final metric values of the previous iteration are used as the initial values. For a frame of $N$ couples, the initial forward and backward state metrics are therefore determined as follows.

$$
\alpha_0(s) = \begin{cases} 
0, & \text{for the 1st iteration} \\
\alpha_N(s) \text{ of the previous iteration}, & \text{otherwise}
\end{cases}
$$

$$
\beta_N(s) = \begin{cases} 
0, & \text{for the 1st iteration} \\
\beta_0(s) \text{ of the previous iteration}, & \text{otherwise}
\end{cases}
$$
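
The initialization rule can be sketched as a small helper that remembers the boundary metrics between iterations: the forward recursion starts from the forward metrics it ended with in the previous iteration, and the backward recursion likewise. This is a minimal sketch, assuming one such object per constituent decoder and eight trellis states; the class and method names are hypothetical.

```python
NUM_STATES = 8  # the constituent encoder state S ranges over 0..7


class CircularMetricInit:
    """Keeps the boundary state metrics of a tail-biting trellis
    from one iteration to the next."""

    def __init__(self):
        self.alpha_end = None   # forward metrics at the end of the frame
        self.beta_start = None  # backward metrics at the start of the frame

    def initial_metrics(self):
        """Return (initial forward metrics, initial backward metrics)."""
        if self.alpha_end is None:
            # First iteration: all states are equally likely.
            return [0] * NUM_STATES, [0] * NUM_STATES
        # Later iterations: reuse the final metrics of the previous one.
        return list(self.alpha_end), list(self.beta_start)

    def store(self, alpha_end, beta_start):
        """Save the final forward/backward metrics of the current iteration."""
        self.alpha_end = list(alpha_end)
        self.beta_start = list(beta_start)


if __name__ == "__main__":
    init = CircularMetricInit()
    print(init.initial_metrics())
    init.store([3, 0, 1, 2, 5, 4, 7, 6], [1, 1, 0, 2, 3, 0, 4, 2])
    print(init.initial_metrics())
```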

Figure 4 presents the BER performance of turbo decoding with 8 iterations over an AWGN channel, using the two-level interleaving process with a length of $N=192$ bits, code rates $R=1/2, 1/3, 1/4$ and BPSK modulation.

![Figure 4. BER performance of turbo decoding](image)

In turbo codes, the interleaver is involved in both encoding and decoding. The most straightforward way to implement address interleaving is to store the interleaved addresses in a memory. However, such a large memory leads to significant area occupation and power consumption.

Figure 5 depicts the on-the-fly calculation of the interleaved or de-interleaved addresses for the decoder, where $P0, P1, P2$ and $P3$ are determined according to the frame length $N$.

While the forward state metric calculation advances through the current sliding window, the forward direction address generator calculates the intermediate values for the first address of the extrinsic information needed by the backward state metric calculation in the next sliding window. It then passes these intermediate values to the forward and backward direction address generator, which produces the actual interleaved or de-interleaved addresses. Because the permuted addresses are computed on the fly, the decoder can switch block sizes quickly. A selection signal controls the switching of the input and output symbols.

![Figure 5. Architecture of the permutation address generator](image)
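
One way to see why on-the-fly generation is cheap is that consecutive addresses of the level 2 formula \( i = (P0 \times j + P + 1) \) mod \( N \) differ by one of only four constants, so each address can be obtained from the previous one with an addition and a conditional subtraction. The sketch below illustrates this idea only. It is not the architecture of Figure 5, the function names are hypothetical, and the parameters used are the \( l=1 \) row of Table 2.

```python
def addresses_direct(n, p0, p1, p2, p3):
    """Reference: evaluate i = (P0*j + P + 1) mod N for every j."""
    offsets = (0, n // 2 + p1, p2, n // 2 + p3)
    return [(p0 * j + offsets[j % 4] + 1) % n for j in range(n)]


def addresses_incremental(n, p0, p1, p2, p3):
    """Same addresses, computed without multiplication or division.

    steps[r] is the increment from address j to address j+1 when j % 4 == r.
    """
    offsets = (0, n // 2 + p1, p2, n // 2 + p3)
    steps = [(p0 + offsets[(r + 1) % 4] - offsets[r]) % n for r in range(4)]
    addr = 1 % n          # address for j = 0, where P = 0
    out = []
    for j in range(n):
        out.append(addr)
        addr += steps[j % 4]
        if addr >= n:     # both terms are below n, so one subtraction suffices
            addr -= n
    return out


if __name__ == "__main__":
    params = dict(n=384, p0=13, p1=94, p2=192, p3=286)
    print(addresses_incremental(**params) == addresses_direct(**params))
```

In hardware the four step constants could be precomputed per block size, so switching block sizes amounts to loading a different set of constants.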

The decoder is currently implemented on a Xilinx Virtex-4 XC4VLX200 device together with other functional blocks. Figure 7 depicts the test board for the AT-DMB receiver system.

![Figure 7. Test board for AT-DMB receiver system](image)

Figure 8 compares the video quality of the conventional T-DMB system with that of the AT-DMB system. The video quality of AT-DMB is better than that of conventional T-DMB. Because the bit rate of conventional T-DMB is lower than that of AT-DMB, T-DMB shows more blocking artifacts than AT-DMB when displayed in full screen.

Figure 8. Video quality comparison between (a) the conventional T-DMB and (b) AT-DMB

5. Conclusions

In this paper, we presented the design and FPGA implementation of a turbo decoder for the AT-DMB system. The turbo decoder uses the Max-Log-MAP algorithm with an interleaver address generator that computes the interleaved addresses for four block sizes, enabling it to switch context quickly to support different data services. The performance of the decoder for different code rates was presented. The decoder was implemented on a Xilinx Virtex-4 XC4VLX200 device together with other functional blocks and integrated into the AT-DMB system.

References