Low Power and Area Consumption Custom Networks-On-Chip Architectures Using RST Algorithms
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 6, September 2010
http://sites.google.com/site/ijcsis/ ISSN 1947-5500

P. Ezhumalai (1), Dr. C. Arun (2)
(1) Professor, Dept. of Computer Science Engineering
(2) Asst. Professor, Dept. of Electronics and Communication
Rajalakshmi Engineering College, Thandalam-602 105, Chennai, India
firstname.lastname@example.org, email@example.com

Abstract: Network-on-Chip (NoC) architectures with optimized topologies have been shown to be superior to regular architectures (such as mesh) for application-specific multiprocessor System-on-Chip (MPSoC) devices. The application-specific NoC design problem takes as input the system-level floorplan of the computation architecture. The objective is to generate an area- and power-optimized NoC topology. In this work, we consider the problem of synthesizing custom NoC architectures that are optimized for a given application. Both the physical links and the routers determine the power consumption of the NoC architecture. Our problem formulation is based on the decomposition of the problem into the inter-related steps of finding good flow partitions and providing an optimized network implementation for the derived topologies. We used Rectilinear Steiner Tree (RST)-based algorithms for generating efficient and optimized network topologies. Experimental results on a variety of NoC benchmarks showed that our synthesis results achieve reductions in power consumption and average hop count over different mesh implementations. We analyze the quality of the results and the solution times of the proposed techniques by extensive experimentation with realistic benchmarks and comparisons with regular mesh-based NoC architectures.

Index Terms: Multicast routing, network-on-chip (NoC), synthesis, system-on-chip (SoC), topology.

1. Introduction

Network-on-Chip (NoC) is an emerging paradigm for communications within large VLSI systems implemented on a single silicon chip. The layered-stack approach to the design of on-chip intercore communications is the NoC methodology. In a NoC system, modules such as processor cores, memories and specialized IP blocks exchange data using a network as a "public transportation" sub-system for the information traffic. A NoC is constructed from multiple point-to-point data links interconnected by switches (a.k.a. routers), such that messages can be relayed from any source module to any destination module over several links, by making routing decisions at the switches.

A NoC is similar to a modern telecommunications network, using digital bit-packet switching over multiplexed links. Although packet switching is sometimes claimed to be a necessity for a NoC, there are several NoC proposals that use circuit-switching techniques. This router-based definition is usually interpreted to mean that a single shared bus, a single crossbar switch or a point-to-point network is not a NoC, but practically all other topologies are. This is somewhat confusing, since all of the above are networks (they enable communication between two or more devices) yet they are not considered networks-on-chip. Note that some authors erroneously use NoC as a synonym for mesh topology, although the NoC paradigm does not dictate the topology. Likewise, regularity of the topology is sometimes considered a requirement, which is obviously not the case in research concentrating on "application-specific NoC topology synthesis".
Figure 1. Topological illustration of a 4-by-4 grid structured NoC.

The wires in the links of the NoC are shared by many signals. A high level of parallelism is achieved, because all links in the NoC can operate simultaneously on different data packets. Therefore, as the complexity of integrated systems keeps growing, a NoC provides enhanced performance (such as throughput) and scalability in comparison with previous communication architectures (e.g., dedicated point-to-point signal wires, shared buses, or segmented buses with bridges). Of course, the algorithms must be designed in such a way that they offer large parallelism and can hence utilize the potential of the NoC. Figure 1 shows how a NoC, in comparison with shared buses, could be populated with various components as resources.

Traditionally, ICs have been designed with dedicated point-to-point connections, with one wire dedicated to each signal. For large designs in particular, this has several limitations from a physical design viewpoint. The wires occupy much of the area of the chip, and in nanometer CMOS technology interconnects dominate both performance and dynamic power dissipation, as signal propagation in wires across the chip requires multiple clock cycles. NoC links can reduce the complexity of designing wires for predictable speed, power, noise, reliability, etc., because of their regular, well-controlled structure. From a system design viewpoint, with the advent of multi-core processor systems, a network is a natural architectural choice. A NoC can provide separation between computation and communication, support modularity and IP reuse via standard interfaces, handle synchronization issues, serve as a platform for system test, and hence increase engineering productivity.

Although NoCs can borrow concepts and techniques from the well-established domain of computer networking, it is impractical to blindly reuse features of "classical" computer networks and symmetric multiprocessors. In particular, NoC switches should be small, energy-efficient, and fast. Neglecting these aspects, along with a lack of proper quantitative comparison, was typical of early NoC research, but nowadays they are considered in more detail. The routing algorithms should be implemented by simple logic, and the number of data buffers should be minimal. Network topology and properties may be application-specific. Research on NoCs is now expanding very rapidly, and several companies and universities are involved.

2. EXISTING RELATED WORKS

So far, the communication problems faced by Systems-on-Chip have been tackled by making use of regular NoC architectures. The following is a list of popular regular NoC architectures:

• Mesh Architecture.
• Torus Architecture.
• Butterfly Fat Tree Architecture.
• Extended Butterfly Fat Tree Architecture.

The NoC design problem has received considerable attention in the literature. Towles and Dally and Benini and De Micheli motivated the NoC paradigm. Several existing NoC solutions have addressed the mapping problem to a regular mesh-based NoC architecture. Hu and Marculescu proposed a branch-and-bound algorithm for the mapping of computation cores onto mesh-based NoC architectures. Murali et al. described a fast algorithm for mesh-based NoC architectures that considers different routing functions, delay constraints, and bandwidth requirements.

On the problem of designing custom NoC architectures without assuming an existing network architecture, a number of techniques have been proposed. Pinto et al. presented techniques for the constraint-driven communication architecture synthesis of point-to-point links by using heuristic-based k-way merging. Their technique is limited to topologies with specific structures that have only two routers between each source and sink pair. Ogras et al. proposed graph decomposition and long link insertion techniques for application-specific NoC architectures. Srinivasan et al. presented NoC synthesis algorithms that consider system-level floorplanning, but their solutions were restricted to a slicing floorplan in which router locations are limited to the corners of cores and links run around cores. Murali et al. presented an innovative deadlock-free NoC synthesis flow with detailed backend integration that also considers the floorplanning process. That approach is based on the min-cut partitioning of cores to routers. This work presents a synthesis approach based on a set partitioning formulation that considers multicast traffic.

Although different in topology and some other aspects, all the above papers essentially advocate the advantages of using NoCs and regularity as effective means to design high-performance SoCs. While these papers mostly focus on the concept of regular NoC architecture (discussing the overall advantages and challenges), to the best of our knowledge our work improves on previous custom NoC synthesis formulations and provides an efficient way to solve the problem.

3. PROPOSED SYSTEM

3.1 PROBLEM DEFINITION

• We consider the problem of synthesizing custom networks-on-chip (NoC) architectures that are optimized for a given application.
• We divide the problem statement into the following interrelated steps:
  - Physical topology construction.
  - Power and area comparisons.

3.2 SYSTEM ARCHITECTURE

Figure 2. Proposed System Architecture

Our NoC synthesis design flow is depicted in Figure 2. The major elements in the design flow are elaborated as follows.

Input Specification: The input specification to our design flow consists of a list of modules. As observed in recent trends, many modern SoC designs combine both hard and soft modules as well as both packet-based network communications and conventional wiring. Modules can correspond to a variety of different types of intellectual property (IP) cores such as embedded microprocessors, large embedded memories, digital signal processors, graphics and multimedia processors, and security encryption engines, as well as custom hardware modules. These modules can come in a variety of sizes and can be either hard or soft macros, possibly just black boxes with area and power estimates and constraints on aspect ratios. To facilitate modularity and interoperability of IP cores, packet-based communication with standard network interfaces is rapidly gaining adoption. Custom NoC architectures are being advocated as a scalable solution to packet-based communication. In general, a mixture of network-based communications and conventional wiring may be utilized as appropriate, and not all inter-module communications are necessarily over the on-chip network. For example, an embedded microprocessor may have dedicated connections to its instruction and data cache modules. Our design flow and input specification allow for both interconnection models. Figure 3 shows an example of a communication demand graph (CDG).

Figure 3. Sample Input Specification

NoC Synthesis: Given the input specification, the NoC synthesis step proceeds to synthesize a NoC architecture that is optimized for the given specification. Consider Figure 3, which depicts a small illustrative example. It shows only the portion of the input specification that corresponds to the network-attached modules and their traffic flows. The nodes represent modules, edges represent traffic flows, and edge labels represent the distance between the two vertices. The NoC synthesis step generates topologies based on the communication demand graph, compares them on parameters such as power consumption and area usage, and chooses the best architecture. Figure 4 shows an example of two architectures generated from the given CDG.

Figure 4. Sample Topologies Generated

NoC Power and Area Estimation: To evaluate the power and area of the synthesized NoC architecture, we use a state-of-the-art NoC power-performance simulator called Orion that can provide detailed power characteristics for the different power components of a router for different input/output port configurations. It accurately considers leakage power as well as dynamic switching power, which is important since it is well known that leakage power is becoming increasingly dominant. Orion also provides area estimates based on a state-of-the-art router microarchitecture.

4. MODULE DESCRIPTION

Figure 5. Formulation of Synthesis Problem

4.1 Flow Partitioning

Flow partitioning is performed in the outer loop of our synthesis formulation to explore different partitionings of flows into separate subnetworks. We make use of the algorithm shown in Figure 6 to implement flow partitioning.

Figure 6. Flow Partitioning Algorithm

4.2 STEINER TREE BASED TOPOLOGY CONSTRUCTION

For each flow partition considered, physical network topologies must be decided. In current process technologies, layout rules for implementing wires dictate physical topologies in which the network links run horizontally or vertically. Thus, the problem is similar to the Rectilinear Steiner Tree (RST) problem, which has been extensively studied for conventional VLSI routing. Given a set of nodes, the RST problem is to find a network of shortest total edge length, using only horizontal and vertical edges, such that all nodes are interconnected. The RST problem is well studied, with very fast implementations available. We call an RST solver in the inner loop of flow partitioning to generate topologies for the set partitions considered.

5. IMPLEMENTATION RESULTS

5.1. EXPERIMENTAL SETUP

We have implemented our proposed algorithm in C. In our implementation, we have designed a Rectilinear Steiner Tree solver to generate the physical network topologies in the inner loop of the algorithm. The ORION 2.0 simulator provides the power and area estimates. The results obtained are shown as line charts for comparison, and snapshots of the results are shown later in this section. All experimental results were obtained on a 3.06-GHz Intel P4 processor machine with 512 MB of memory running Linux.

5.2. EXPERIMENTAL RESULTS

To evaluate the effectiveness of our algorithms, we compare against the full mesh implementation of each benchmark, taken from previously published papers. These comparisons are intended to show the benefits of custom NoC architectures.

Figure 7. Snapshot of all the FSTs generated (all FSTs: 64 points)

Figure 8. Steiner minimal tree generated (64 points, length = 56729)

Method of Evaluation: In all our experiments, we aim to evaluate the performance of the proposed algorithms on all benchmarks with the objective of minimizing the total area as well as the power consumption of the synthesized NoC architectures. The total area and power consumption include all network components. We applied the design parameters of 1 GHz clock frequency, 4-flit buffers, and 128-bit flits. A fair direct comparison with previously published NoC synthesis results is difficult, in part because of vast differences in the parameters assumed.

Table 1. NoC Power Comparisons

Figure 9. NoC Power Comparisons

The area results, power results, execution times, and the area and power improvements of the algorithm are reported. The results show that the algorithm can efficiently synthesize NoC architectures that minimize power and area consumption as compared with regular topologies such as mesh and optimized mesh topologies.

Table 2. NoC Area Comparisons

Figure 10. NoC Area Estimates (line chart titled "NoC Area Comparisons": area in sq mm, on a 0 to 2.5 scale, versus number of vertices 6, 7, 8, 12, 14, 20, 24, 25, 36, 44, for the custom and optimized mesh topologies)

Thus, the two line charts in Figures 9 and 10 clearly show a reduction in the power and area estimates of the custom NoC compared with mesh and optimized mesh topologies. Mesh topologies were introduced in Section 2. Optimized mesh topologies are formed by eliminating router ports and links that are not used. The power reduction is on average 83.43 percent and 50 percent as compared to mesh and optimized mesh topologies respectively. The area reduction is on average 70.95 percent as compared to optimized mesh topologies.

6. CONCLUSION AND FUTURE WORK

Previous work in this area has been carried out largely in the context of regular topologies such as mesh and torus. This work presented an approach to building customized networks-on-chip with better flow partitioning, and considered power and area reduction as compared to the established regular topologies. We proposed a formulation of the custom NoC synthesis problem based on the decomposition of the problem into the inter-related steps of deriving a good physical network topology and providing a comparison in terms of area and power with the well-established regular topologies. We used the algorithm called CLUSTER for systematically examining different possible set partitionings of flows, and we proposed the use of RST algorithms for constructing good physical network topologies. Our solution framework decouples the evaluation cost function from the exploration process, thereby enabling different user objectives and constraints to be considered. Although we use Steiner trees to generate a physical network topology for each group in the set partition, the final NoC architecture synthesized is not necessarily limited to just trees, as the Steiner tree implementations of different groups may be connected to each other to form non-tree structures.

This work does not differentiate the routers/switches (communication modules) from the operating modules present in the chip. In the near future, identifying the best placement of routers, minimizing the number of routers, and evaluating the effectiveness of the customized NoC in terms of other parameters such as throughput, latency, link utilization and buffer utilization can be taken into account.

REFERENCES

Shan Yan and Bill Lin, "Custom networks-on-chip architectures with multicast routing," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 3, pp. 342–355, Mar. 2009.

K. Srinivasan, K. S. Chatha, and G. Konjevod, "Linear-programming based techniques for synthesis of network-on-chip architectures," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 4, pp. 407–420, Apr. 2006.

K. Srinivasan, K. S. Chatha, and G. Konjevod, "Application specific network-on-chip design with guaranteed quality approximation algorithms," in Proc. ASPDAC, 2007, pp. 184–190.

S. Murali, P. Meloni, F. Angiolini, D. Atienza, S. Carta, L. Benini, G. De Micheli, and L. Raffo, "Designing application-specific networks on chips with floor plan information," in Proc. ICCAD, 2006, pp. 355–362.

L. Zhang, H. Chen, H. Chen, B. Yao, K. Hamilton, and C.-K. Cheng, "Repeated on-chip interconnect analysis and evaluation of delay, power, and bandwidth metrics under different design goals," in Proc. ISQED, 2007, pp. 251–256.

R. Mullins, "Minimizing dynamic power consumption in on-chip networks," in Proc. Int. Symp. Syst.-on-Chip, 2006, pp. 1–4.

C.-W. Lin, S.-Y. Chen, C.-F. Li, Y.-W. Chang, and C.-L. Yang, "Efficient obstacle-avoiding rectilinear Steiner tree construction," in Proc. Int. Symp. Phys. Des., 2007, pp. 127–134.

D. Greenfield, A. Banerjee, J.-G. Lee, and S. Moore, "Implications of Rent's rule for NoC design and its fault-tolerance," in Proc. NOCS, May 2007, pp. 283–294.

S. Yan and B. Lin, "Application-specific network-on-chip architecture synthesis based on set partitions and Steiner trees," in Proc. ASPDAC, 2008, pp. 277–282.

Xilinx, San Jose, CA, "UMC delivers leading-edge 65 nm FPGAs to Xilinx," Des. Reuse, Nov. 8, 2006 [Online]. Available: http://www.design-reuse.com/news/14644/umc-edge-65nm-fpgas-xilinx.html

P. Gratz, K. Sankaralingam, H. Hanson, P. Shivakumar, R. McDonald, S. W. Keckler, and D. Burger, "Implementation and evaluation of a dynamically routed processor operand network," in Proc. NOCS, May 2007, pp. 7–17.

N. Enright-Jerger, M. Lipasti, and L.-S. Peh, "Circuit-switched coherence," IEEE Comput. Archit. Lett., vol. 6, no. 1, pp. 193–202, Mar. 2007.

Ezhumalai Periyathambi received the B.E. degree in Computer Science and Engineering from Madras University, Chennai, India, in 1992 and the Master of Technology (M.Tech.) degree in Computer Science and Engineering from J N T University, Hyderabad, India, in 2006. He is currently working towards the Ph.D. degree in the Department of Information and Communication, Anna University, Chennai, India. He is working as Professor in the Department of Computer Science and Engineering, Rajalakshmi Engineering College, Chennai, Tamilnadu, India. His research interests include reconfigurable architecture, multi-core technology, CAD algorithms for VLSI architecture, theoretical computer science, and mobile computing.
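Section 4.1 describes flow partitioning as an outer loop that explores different groupings of flows into subnetworks. The sketch below illustrates that idea only and is not the CLUSTER algorithm of Figure 6, whose steps are not reproduced in the text. It enumerates every set partition exhaustively, which is feasible only for a handful of flows. The names `set_partitions`, `hpwl` and `best_partition`, the bounding-box cost and the `per_group_cost` term are all assumptions introduced for illustration. Python is used for brevity, although the implementation reported in Section 5.1 is in C.

```python
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[int, int]
Flow = Tuple[Point, Point]  # (source position, sink position) -- hypothetical encoding


def set_partitions(items: Sequence) -> Iterator[List[list]]:
    """Yield every way to split `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partial in set_partitions(rest):
        # Put `first` into each existing group in turn ...
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # ... or give it a group of its own.
        yield [[first]] + partial


def hpwl(flows: Sequence[Flow]) -> int:
    """Half-perimeter of the bounding box of all endpoints.

    Stand-in cost (an assumption): a real flow would call an RST solver here.
    """
    xs = [p[0] for flow in flows for p in flow]
    ys = [p[1] for flow in flows for p in flow]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def best_partition(flows: Sequence[Flow], per_group_cost: int = 1) -> List[list]:
    """Outer loop: score every partition and keep the cheapest one.

    `per_group_cost` is a hypothetical fixed overhead per subnetwork.
    """
    def cost(partition: List[list]) -> int:
        return sum(hpwl(group) + per_group_cost for group in partition)

    return min(set_partitions(list(flows)), key=cost)
```

With two flows close together and a third far away, this sketch groups the two neighbours and leaves the distant flow in its own subnetwork. The number of set partitions grows very quickly with the number of flows, which is presumably why a dedicated exploration algorithm is used instead.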
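Section 4.2 frames topology construction as a Rectilinear Steiner Tree problem. The sketch below is not the authors' solver. It uses a simpler, well-known substitute: a minimum spanning tree under Manhattan distance, built with Prim's algorithm. Each tree edge can be routed as horizontal and vertical segments, and the resulting length is known to be at most 1.5 times the optimal rectilinear Steiner length. The function names are assumptions introduced for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]
Edge = Tuple[Point, Point]


def manhattan(a: Point, b: Point) -> int:
    """Rectilinear (horizontal plus vertical) distance between two nodes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def rectilinear_mst(points: Sequence[Point]) -> Tuple[int, List[Edge]]:
    """Prim's algorithm under Manhattan distance.

    Returns (total wire length, list of tree edges). This is an upper
    bound on the RST length, not the Steiner minimal tree itself.
    """
    if not points:
        return 0, []
    in_tree = [points[0]]
    remaining = list(points[1:])
    edges: List[Edge] = []
    total = 0
    while remaining:
        # Cheapest edge from the current tree to any node outside it.
        u, v = min(
            ((a, b) for a in in_tree for b in remaining),
            key=lambda e: manhattan(e[0], e[1]),
        )
        edges.append((u, v))
        total += manhattan(u, v)
        in_tree.append(v)
        remaining.remove(v)
    return total, edges
```

For the nodes (0, 0), (2, 0) and (1, 2), the spanning tree has length 5, while adding a Steiner point at (1, 0) connects all three with length 4. That gap is what a true RST solver recovers.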
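Section 5.2 states that optimized mesh topologies are formed by eliminating router ports and links that are not used. The sketch below shows one way to identify the used links. It assumes dimension-ordered (XY) routing and directed links, neither of which is stated in the paper, and all names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Node = Tuple[int, int]     # (column, row) of a router in the mesh
Link = Tuple[Node, Node]   # directed link between adjacent routers


def xy_route(src: Node, dst: Node) -> List[Link]:
    """Links traversed when routing along X first, then along Y (assumed policy)."""
    links: List[Link] = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def used_links(flows: Iterable[Tuple[Node, Node]]) -> Set[Link]:
    """Union of the links touched by any flow; all other links can be removed."""
    used: Set[Link] = set()
    for src, dst in flows:
        used.update(xy_route(src, dst))
    return used


def full_mesh_link_count(cols: int, rows: int) -> int:
    """Directed links in a cols x rows mesh (two per adjacent router pair)."""
    return 2 * ((cols - 1) * rows + (rows - 1) * cols)
```

Comparing `len(used_links(flows))` with `full_mesh_link_count(cols, rows)` gives a rough sense of how much of a full mesh a given traffic pattern leaves idle.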
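The average reductions reported in Section 5.2 are percentages relative to a baseline topology. The paper does not say how the average is formed, so the sketch below assumes the mean of per-benchmark reductions. The function names are hypothetical, and the numbers in the usage note are illustrative rather than taken from Tables 1 and 2.

```python
from typing import Iterable, Tuple


def percent_reduction(baseline: float, custom: float) -> float:
    """Reduction of `custom` relative to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - custom) / baseline


def average_reduction(pairs: Iterable[Tuple[float, float]]) -> float:
    """Mean of per-benchmark reductions over (baseline, custom) pairs.

    Assumption: the average is taken over benchmarks, not over summed totals.
    """
    values = [percent_reduction(b, c) for b, c in pairs]
    if not values:
        raise ValueError("at least one benchmark is required")
    return sum(values) / len(values)
```

For example, a baseline of 2.0 against a custom value of 0.5 is a 75 percent reduction. Averaging per-benchmark percentages can differ from the reduction computed on summed totals, so the choice should be stated when results are compared.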