Test Framework for Reducing Power in NoC

Seok-hee Yi¹, Byung-Gyu Ahn², Jong-Wha Chong³
Department of ECE, HANYANG University
702ho R&D Building, 17 Haengdang-Dong, Seongdong-Gu, Seoul, 133-791, Korea
E-mail: ¹sayonara1013@hanmail.net, ²mon809@chor.com, ³jchong@hanyang.ac.kr

Abstract:
In this paper, we propose a test framework for reducing power in a Network-on-Chip (NoC). First, we introduce the use of an embedded processor and the on-chip network to test the other embedded cores, and we evaluate this approach on a benchmark system. Second, we present a new test pattern generation method, called ‘don’t care mapping’, that reduces the power consumption of the on-chip network. The experimental results show that the power consumption of the communication components is reduced by up to 8%.

1. Introduction
Modern SoCs (Systems-on-Chip) typically contain dozens of IP (Intellectual Property) cores. Such systems, which must also be reusable to meet time-to-market requirements, require communication templates with bandwidths of several dozen Gbit/s. To provide reusability, scalability, high bandwidth, and low latency, the Network-on-Chip (NoC) design paradigm has recently been proposed as an alternative to traditional broadcast and shared-bus architectures for core-based systems [1]. A study presented in [2] shows that NoCs achieve better communication performance than conventional bus architectures. NoCs are therefore likely to become the preferred interconnection approach for SoCs developed in the near future [3].

However, the growing design complexity of chips, device miniaturization, increasing transistor counts, and high clock frequencies have led to a dramatic increase in the number of possible fault sites and fault types [4]. Consequently, high-quality testing requires a large volume of test data, which in turn leads to long test times and large memory requirements.

In addition, the design of the test access mechanism (TAM) is one of the main issues in testing the dozens of embedded cores in a NoC. NoC testing also consumes considerable power: because the embedded cores can be tested in parallel, the cores, routers, and channels are all active at the same time. A new low-power method for NoC testing is therefore necessary to ensure that the test can be performed safely.

This paper proposes a power-aware test framework for NoCs based on an embedded processor and the on-chip network. First, we introduce the use of the embedded processor and the on-chip network to test the other cores, and we evaluate this approach on a benchmark system. Second, we present a new test pattern generation method, called ‘don’t care mapping’, that reduces the power consumption of the on-chip network. The experimental results show that the embedded processor can act as automatic test equipment (ATE), and that the power consumption of the communication components is reduced by up to 8%.

2. Related works and motivations
With the introduction of NoCs, several works have proposed embedded core testing based on this new architecture. In [5], a TAM architecture based on packet-switching communication is presented; the proposed model, called NIMA, is designed to provide modularity, generality, and configurability for the test architecture. In [6], the authors extended previous on-chip network research [3] to a test scheduling algorithm that considers power constraints. A new test data transportation method using multiple data flit formats is proposed in [7]. In this method, a data flit can carry multiple bits for each wrapper scan chain, instead of only 1 bit per chain as in traditional test application methods.

However, these approaches mainly consider how to support embedded core testing using the on-chip network. One major problem with these works is that a low-bandwidth external ATE cannot easily control the test procedure in both the cores and the on-chip network. Recently, in [8-11], the authors proposed a test platform for embedded-processor-based SoCs, in which the embedded processor serves as a controllable ATE that executes the test programs for all the other embedded cores through the SoC system bus. In this paper, we extend this method to a NoC whose communication protocols and functional wrappers are already well defined.

The power consumption of on-chip networks is becoming increasingly dominant in today's systems, which poses critical challenges that must be addressed if these networks are not to become a bottleneck in the development of high-performance systems [12]. The routers and links of the Alpha 21364 microprocessor were found to consume about 20% of the total power [13]. In the MIT Raw on-chip network, the network components account for 36% of the total power consumption [14]. As noted in [15], these numbers indicate the importance of managing interconnect power consumption.

Nevertheless, most existing works [3, 6, 16] have focused on the test scheduling problem, minimizing test application time under power constraints. In this paper, therefore, we propose a new approach that reduces the power consumption of the on-chip network itself during embedded core testing.

3. The proposed test framework
3.1 NoC test framework based on embedded processor
Fig. 1 gives an overview of the proposed framework. A conventional SoC platform contains an embedded processor, an embedded memory, and a system bus to execute the main program. Likewise, a NoC architecture is very likely to include a main PE (processing element) that manages the overall system. Following [10, 11], we extend this concept to the on-chip communication network. The proposed framework can be divided into hardware and software parts.

![Figure 1. Conceptual overview of the proposed framework based on embedded processor](image)

The software part consists of the test data and the test program, because the proposed methodology is essentially C-based testing; the test program is therefore written in a language such as C or C++. After the test program is compiled together with the test data, the binary image is loaded into the external memory for execution. The embedded processor then executes the test program and transfers the test patterns to the other embedded cores through the network interface.
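The software flow just described can be modeled on a host with the minimal sketch below, written in Python for brevity rather than the C the framework actually uses. It assumes, hypothetically, that the compiled test data is a list of (destination node, payload flits) pairs and that the network interface exposes a `send_flit` operation. `NetworkInterface`, `TEST_IMAGE`, `HEADER_FLAG`, and `run_test_program` are illustrative names, and the stub records flits in a list instead of driving a real router port.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkInterface:
    """Hypothetical NI stub: records injected flits instead of driving a router port."""
    sent: list = field(default_factory=list)

    def send_flit(self, flit: int) -> None:
        self.sent.append(flit)


# Hypothetical test data image: (destination node, payload flits) for each core.
TEST_IMAGE = [
    (1, [0x1234, 0x5678, 0x9ABC]),
    (2, [0xFFFF, 0x0000]),
]

HEADER_FLAG = 0x8000  # assumed marker bit that identifies a 16-bit header flit


def run_test_program(ni: NetworkInterface, image) -> int:
    """Send each core's patterns as one packet (header flit, then payload flits).

    Returns the total number of flits the NI has recorded.
    """
    for dest, payload in image:
        ni.send_flit(HEADER_FLAG | dest)  # header carries the destination node
        for flit in payload:
            ni.send_flit(flit)
    return len(ni.sent)


if __name__ == "__main__":
    print(run_test_program(NetworkInterface(), TEST_IMAGE), "flits sent")
```

On the real platform, the loop body would become writes to the network interface, and the packet format would follow the NoC's own protocol.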

The hardware part consists of a scan-based core, a network interface, and a TAM controller. Since all packets, including functional and test data, are sent and received over the on-chip network, the test packets are also delivered to each core through the routers and the network interface. The TAM controller is activated only when the status register indicates test mode. To make this architecture effective, we assume scan-based cores with IEEE Std. 1500 wrappers [17], which minimizes the hardware overhead and the design complexity.
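The mode-dependent delivery path can be sketched as a small behavioral model. Everything here is a hypothetical illustration: the position of the test-mode bit in the status register (`TEST_MODE_BIT`), the single wrapper scan chain, and the class names `TAMController` and `NetworkInterfaceRx` are assumptions, not the framework's actual hardware interface.

```python
TEST_MODE_BIT = 0x1  # assumed bit in the NI status register that selects test mode


class TAMController:
    """Hypothetical TAM controller that shifts payload bits into one wrapper scan chain."""

    def __init__(self, chain_length: int):
        self.chain = [0] * chain_length

    def shift_in(self, bits):
        for b in bits:
            # Each new bit enters at position 0; the oldest bit falls off the end.
            self.chain = [b] + self.chain[:-1]


class NetworkInterfaceRx:
    """Hypothetical receive side of a network interface."""

    def __init__(self, tam: TAMController):
        self.status = 0          # status register: functional mode after reset
        self.tam = tam
        self.functional_rx = []  # payloads delivered to the core's functional port

    def deliver(self, payload_bits):
        # The TAM controller is used only while the status register says "test".
        if self.status & TEST_MODE_BIT:
            self.tam.shift_in(payload_bits)
        else:
            self.functional_rx.append(payload_bits)
```

In functional mode the same packets simply reach the core's normal port, which is why no separate test bus is needed.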

3.2 Generation of power-aware test packets

In the proposed framework, the overall test flow is determined by software, but the test packets can be generated beforehand. Since a data packet generally consists of a header, a payload, and a trailer, the test data is placed in the payload. The packet is then transferred flit by flit, with the flit size set by the physical channel width. As shown in Fig. 2, we generate the test packets so as to minimize the power consumption of the on-chip network. Test patterns for full-scan cores typically contain many ‘don’t care bits’ that have no influence on the test. We map these bits to 0 or 1 so as to minimize the Hamming distance between consecutive flits, which significantly reduces the number of bit transitions (switching) in the communication components.
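Why fewer transitions translate into lower power can be seen from the standard first-order model of CMOS dynamic power. Here α is the average number of bit toggles per wire per cycle (counting both rising and falling transitions, so each toggle costs on average ½CV_dd²), C is the switched capacitance of a link or buffer, V_dd is the supply voltage, and f is the clock frequency. When the same flits are sent over the same links, C, V_dd, and f are unchanged, so the data-dependent dynamic power scales with the switching count:

$$
P_{\mathrm{dyn}} \approx \tfrac{1}{2}\,\alpha\,C\,V_{dd}^{2}\,f
\qquad\Longrightarrow\qquad
\frac{P_{\mathrm{dyn}}^{X}}{P_{\mathrm{dyn}}^{0}} = \frac{\alpha_{X}}{\alpha_{0}} = \frac{N_{X}}{N_{0}}
$$

where N_0 and N_X denote the switching counts under 0 mapping and under the proposed mapping. This ratio covers only the switching-dependent part of the link and buffer power; components whose power does not depend on the data values are not affected by the mapping.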

![Figure 2. Generation flow of test packet](image)

This approach is effective because the dynamic power induced by flit traversals is the primary source of power consumption in the on-chip network. It can also be applied to other approaches without any modification. The detailed mapping algorithm is described in Fig. 3.

![Figure 3. Algorithm: Low-Power Don’t Care Mapping for Test Packet](image)
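One way to realize the don't care mapping is sketched below. It is a minimal sketch under stated assumptions, not necessarily the exact algorithm of Fig. 3: patterns are strings over `0`, `1`, and `X`; the payload is cut into fixed-width flits; and only transitions between consecutive payload flits on the same wire are counted. Under these assumptions every wire can be filled independently: a don't care bit copies the most recent specified value on its wire, and leading don't care bits copy the first specified value. The helper names `to_flits`, `fill_zero`, `fill_min_transition`, and `count_switching` are hypothetical.

```python
def to_flits(pattern: str, width: int) -> list:
    """Split a pattern over {'0','1','X'} into flits, padding the last flit with 'X'."""
    pad = (-len(pattern)) % width
    padded = pattern + "X" * pad
    return [padded[i:i + width] for i in range(0, len(padded), width)]


def fill_zero(flits):
    """Baseline '0 mapping': every don't care bit becomes 0."""
    return [f.replace("X", "0") for f in flits]


def fill_min_transition(flits):
    """Fill don't care bits wire by wire so consecutive flits differ as little as possible.

    Assumed cost model: only consecutive flits on the same wire are compared.
    """
    width = len(flits[0])
    cols = []
    for pos in range(width):
        column = [f[pos] for f in flits]
        specified = [b for b in column if b != "X"]
        # Leading don't cares take the first specified value (or 0 if the wire has none).
        last = specified[0] if specified else "0"
        filled = []
        for b in column:
            if b != "X":
                last = b
            filled.append(last)
        cols.append(filled)
    # Transpose the filled wires back into flits.
    return ["".join(col[i] for col in cols) for i in range(len(flits))]


def count_switching(flits):
    """Total Hamming distance between consecutive flits (bit transitions on the link)."""
    return sum(
        sum(a != b for a, b in zip(prev, cur))
        for prev, cur in zip(flits, flits[1:])
    )
```

For example, the 12-bit pattern `11X0XX101X1X` cut into 4-bit flits gives 4 transitions under 0 mapping but none after the fill above. Because transitions on one wire do not affect another, filling each wire this way reaches the fewest transitions possible for the specified bits; header and trailer flits, which the sketch ignores, would add transitions of their own.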

4. Case study

4.1 Benchmark System

In this paper, we do not use the ITC 2002 SoC test benchmarks [18] to evaluate our methodology, because these benchmark circuits are not suitable for estimating the total power consumption of the communication components. Instead, as shown in Fig. 4, we assume a new benchmark system consisting of the ISCAS 89 benchmark circuits s5378, s9234, s13207, s15850, s38417, and s38584.

The embedded processor and the 7 nodes above are organized as a 3x3 mesh network, and each router has five bidirectional physical ports (north, south, east, west, and injection/ejection). In this sample system, each core is assumed to be a full-scan circuit with an IEEE Std. 1500 wrapper. As mentioned above, the on-chip network and the wrappers together act as the TAM.

4.2 Experimental Environment

First, we generated test patterns for each core using the MinTest ATPG (Automatic Test Pattern Generation) program, which is based on the dynamic compaction method [19]; the generated patterns achieve 100% fault coverage. We then applied the proposed method to these test patterns to generate the test packets. The test packet generation program was developed in C/C++.

To evaluate the performance and power consumption of the proposed test framework, we modified Orion [20] and PoPNet [21]. We assume an on-chip network operating at 250 MHz with Vdd = 1.8 V in a 0.18 µm process technology, as in [20]. Each router has 2 virtual channels (VCs) per port and 12-flit input/output buffers per VC, and each link is 1000 µm long. Dimension-order routing is used for the mesh topology, and the propagation delay across the data and credit channels is assumed to be a single cycle.
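Assuming that the dimension routing above is the usual dimension-order (XY) routing for a 2D mesh, the per-hop route computation in the 3x3 mesh can be sketched as follows. The row-major node numbering, the convention that y grows toward the north, and the names `coords`, `xy_output_port`, and `xy_path` are assumptions for illustration.

```python
MESH_DIM = 3  # 3x3 mesh, as in the case study


def coords(node: int):
    """Assumed row-major numbering: node = y * MESH_DIM + x."""
    return node % MESH_DIM, node // MESH_DIM


def xy_output_port(current: int, dest: int) -> str:
    """Dimension-order routing: resolve the X offset first, then Y, then eject."""
    cx, cy = coords(current)
    dx, dy = coords(dest)
    if dx > cx:
        return "east"
    if dx < cx:
        return "west"
    if dy > cy:
        return "north"  # assumed convention: y grows toward the north
    if dy < cy:
        return "south"
    return "local"      # injection/ejection port


def xy_path(src: int, dest: int):
    """Nodes visited from src to dest under XY routing."""
    step = {"east": 1, "west": -1, "north": MESH_DIM, "south": -MESH_DIM}
    path = [src]
    node = src
    while True:
        port = xy_output_port(node, dest)
        if port == "local":
            return path
        node += step[port]
        path.append(node)
```

Because the X offset is always resolved before the Y offset, every packet between a given pair of nodes follows the same path; in a 2D mesh this ordering also prevents routing deadlock.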

4.3 Experimental Results

We ran the simulations on an AMD Athlon™ 64 X2 3800+ processor with 1024 MB of main memory; most of the simulations finished within several minutes.

Table 1 shows the effectiveness of our methodology when the flit size is 16 bits. For each benchmark, the proposed mapping reduces the number of switching transitions by 35.88% on average compared with 0 mapping. In the tables, ‘X’ denotes the proposed mapping method.

<table>
<thead>
<tr>
<th>Benchmark</th>
<th>Size of pattern (bits)</th>
<th>Number of switching (0 mapping)</th>
<th>Number of switching (X mapping)</th>
<th>Reduction Rate (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>s5378</td>
<td>23744</td>
<td>5546</td>
<td>3240</td>
<td>41.58</td>
</tr>
<tr>
<td>s9234</td>
<td>39264</td>
<td>7928</td>
<td>5247</td>
<td>33.82</td>
</tr>
<tr>
<td>s13207</td>
<td>165200</td>
<td>8321</td>
<td>5433</td>
<td>34.71</td>
</tr>
<tr>
<td>s15850</td>
<td>76976</td>
<td>8370</td>
<td>5662</td>
<td>32.35</td>
</tr>
<tr>
<td>s38417</td>
<td>164736</td>
<td>39326</td>
<td>24631</td>
<td>37.37</td>
</tr>
<tr>
<td>s38584</td>
<td>199104</td>
<td>25186</td>
<td>16252</td>
<td>35.47</td>
</tr>
<tr>
<td>Average</td>
<td></td>
<td></td>
<td></td>
<td>35.88</td>
</tr>
</tbody>
</table>

Table 1. Reduction rate of the switching number
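The reduction rate in Table 1 is the fraction of switching removed relative to 0 mapping, (N_0 - N_X) / N_0 × 100, where N_0 and N_X are the switching counts under 0 mapping and under the proposed mapping. The short script below recomputes that column and its average from the table's own counts; the names `TABLE1` and `reduction_rate` are illustrative.

```python
# (benchmark, switching with 0 mapping, switching with proposed X mapping), from Table 1
TABLE1 = [
    ("s5378", 5546, 3240),
    ("s9234", 7928, 5247),
    ("s13207", 8321, 5433),
    ("s15850", 8370, 5662),
    ("s38417", 39326, 24631),
    ("s38584", 25186, 16252),
]


def reduction_rate(n_zero: int, n_x: int) -> float:
    """Percentage of switching removed relative to 0 mapping."""
    return (n_zero - n_x) / n_zero * 100.0


rates = {name: round(reduction_rate(n0, nx), 2) for name, n0, nx in TABLE1}
average = round(sum(reduction_rate(n0, nx) for _, n0, nx in TABLE1) / len(TABLE1), 2)

if __name__ == "__main__":
    for name, rate in rates.items():
        print(f"{name}: {rate:.2f}%")
    print(f"average: {average:.2f}%")
```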

We ran the cycle-accurate simulator with 0 mapping, 1 mapping, and the proposed don’t care mapping for flit sizes of 16, 32, and 64 bits. In each case, the embedded processor is assumed to inject test packets at a rate of 0.2 packets per cycle. As shown in Fig. 5, performance improves as the channel width increases. However, this improvement comes at the cost of higher power consumption, as indicated in Table 2.

Table 2 shows the simulated average total power consumption for each mapping method. There is little difference between 0 and 1 mapping, whereas the proposed mapping reduces the average power consumption by about 2% to 8% compared with 0 or 1 mapping.

<table>
<thead>
<tr>
<th>Flit size (bits)</th>
<th>Power, 0 mapping (mW)</th>
<th>Power, 1 mapping (mW)</th>
</tr>
</thead>
<tbody>
<tr>
<td>16</td>
<td>226.16</td>
<td>229.13</td>
</tr>
<tr>
<td>32</td>
<td>254.83</td>
<td>258.40</td>
</tr>
<tr>
<td>64</td>
<td>305.80</td>
<td>308.85</td>
</tr>
<tr>
<td>Average</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Table 2. Reduction rate of the average power consumption
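The observation that 0 and 1 mapping differ little can be quantified from the 0 and 1 mapping columns of Table 2. The sketch below computes how much more power 1 mapping draws than 0 mapping at each flit size (about 1.0% to 1.4%) and how the 0-mapping power grows with flit size relative to 16 bits (about 35% at 64 bits); `TABLE2` and `pct_change` are illustrative names.

```python
# Average total power (mW) from Table 2: flit size -> (0 mapping, 1 mapping)
TABLE2 = {16: (226.16, 229.13), 32: (254.83, 258.40), 64: (305.80, 308.85)}


def pct_change(new: float, base: float) -> float:
    """Relative change of `new` with respect to `base`, in percent."""
    return (new - base) / base * 100.0


# Extra power drawn by 1 mapping compared with 0 mapping at each flit size.
one_vs_zero = {w: round(pct_change(p1, p0), 2) for w, (p0, p1) in TABLE2.items()}

# Growth of 0-mapping power relative to the 16-bit configuration.
growth_vs_16 = {w: round(pct_change(p0, TABLE2[16][0]), 2) for w, (p0, _) in TABLE2.items()}

if __name__ == "__main__":
    print(one_vs_zero)
    print(growth_vs_16)
```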

5. Conclusion

In this paper, we proposed a power-aware test framework for NoCs based on an embedded processor and the on-chip network. The experimental results showed that the embedded processor can act as a low-cost, high-bandwidth ATE, and that the power consumption of the communication components can be reduced. The main contribution of this paper is manageable testing through embedded software. In addition, the proposed test packet generation method can be applied without modification to other TAM architectures based on packet-switching networks.

However, the current work does not address the detailed hardware architecture of the test framework or a power-constrained test scheduling algorithm for the embedded cores. In future work, we will develop a more accurate simulation environment to estimate the power consumption of the embedded cores and the on-chip network simultaneously.

6. Acknowledgment

This research was sponsored by the Seoul R&BD Program and the ETRI SoC Industry Promotion Center, Human Resource Development Project for IT-SoC Architect. This research was also supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment) (IITA-2008-C1090-0801-0019). IDEC provided research facilities for this study.

References