									                                              (IJCSIS) International Journal of Computer Science and Information Security,
                                              Vol. 8, No. 6, September 2010

           Low Power and Area Consumption Custom Networks-On-Chip
                      Architectures Using RST Algorithms
                                         P. Ezhumalai, 2Dr. C. Arun
                               Professor, Dept of Computer Science Engineering
                            Asst. Professor, Dept of Electronics and Communication
                     Rajalakshmi Engineering College, Thandalam-602 105, Chennai, India
                                 carunece@gmail.com, 2ezhu.pubs@gmail.com

Abstract: Network-on-Chip (NoC) architectures with optimized topologies have been shown to be superior to regular architectures (such as mesh) for application-specific multiprocessor System-on-Chip (MPSoC) devices. The application-specific NoC design problem takes as input the system-level floorplan of the computation architecture. The objective is to generate an area- and power-optimized NoC topology. In this work, we consider the problem of synthesizing custom networks-on-chip (NoC) architectures that are optimized for a given application. Both the physical links and the routers determine the power consumption of the NoC architecture. Our problem formulation is based on the decomposition of the problem into the inter-related steps of finding good flow partitions and providing an optimized network implementation for the derived topologies. We use Rectilinear Steiner Tree (RST)-based algorithms to generate efficient, optimized network topologies. Experimental results on a variety of NoC benchmarks show that our synthesis results achieve reductions in power consumption and average hop count over different mesh implementations. We analyze the quality of the results and the solution times of the proposed techniques through extensive experimentation with realistic benchmarks and comparisons with regular mesh-based NoC architectures.

Index Terms: Multicast routing, network-on-chip (NoC), synthesis, system-on-chip (SoC), topology.

1. Introduction

Network-on-Chip (NoC) is an emerging paradigm for communications within large VLSI systems implemented on a single silicon chip. The layered-stack approach to the design of on-chip intercore communications is the Network-on-Chip (NoC) methodology. In a NoC system, modules such as processor cores, memories, and specialized IP blocks exchange data using a network as a "public transportation" sub-system for the information traffic. A NoC is constructed from multiple point-to-point data links interconnected by switches (a.k.a. routers), such that messages can be relayed from any source module to any destination module over several links by making routing decisions at the switches.

A NoC is similar to a modern telecommunications network, using digital bit-packet switching over multiplexed links. Although packet switching is sometimes claimed to be a necessity for a NoC, there are several NoC proposals that use circuit-switching techniques. This router-based definition is usually interpreted to mean that a single shared bus, a single crossbar switch, or a point-to-point network is not a NoC, but practically all other topologies are. This is somewhat confusing, since all of the above are networks (they enable communication between two or more devices) but are not considered networks-on-chip. Note that some authors erroneously use NoC as a synonym for mesh topology, although the NoC paradigm does not dictate the topology. Likewise, regularity of topology is sometimes considered a requirement, which is clearly not the case in research concentrating on "application-specific NoC topology synthesis".
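To make the router-based definition concrete, the sketch below models a NoC as an undirected graph of switches joined by point-to-point links and counts the hops a message needs between two switches. The 4-by-4 grid, the helper names `mesh_links`, `hop_count`, and `average_hop_count`, and the use of breadth-first search are illustrative assumptions, not part of the proposed method.

```python
from collections import deque


def mesh_links(rows, cols):
    """Adjacency lists for a rows x cols grid of switches (hypothetical helper)."""
    adj = {(r, c): [] for r in range(rows) for c in range(cols)}
    for r in range(rows):
        for c in range(cols):
            # Link each switch to its neighbour below and to its right.
            for nr, nc in ((r + 1, c), (r, c + 1)):
                if nr < rows and nc < cols:
                    adj[(r, c)].append((nr, nc))
                    adj[(nr, nc)].append((r, c))
    return adj


def hop_count(adj, src, dst):
    """Minimum number of links between src and dst, found by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None  # dst is unreachable from src


def average_hop_count(adj):
    """Mean hop count over all ordered pairs of distinct switches."""
    nodes = list(adj)
    total = sum(hop_count(adj, s, d) for s in nodes for d in nodes if s != d)
    return total / (len(nodes) * (len(nodes) - 1))


mesh = mesh_links(4, 4)
print(hop_count(mesh, (0, 0), (3, 3)))
print(average_hop_count(mesh))
```

The same two functions work on any adjacency dictionary, so a custom (non-mesh) topology can be compared against the grid by passing its own link structure.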

                                                       107                               http://sites.google.com/site/ijcsis/
                                                                                         ISSN 1947-5500

Figure 1. Topological illustration of a 4-by-4 grid-structured NoC.

The wires in the links of the NoC are shared by many signals. A high level of parallelism is achieved because all links in the NoC can operate simultaneously on different data packets. Therefore, as the complexity of integrated systems keeps growing, a NoC provides enhanced performance (such as throughput) and scalability in comparison with previous communication architectures (e.g., dedicated point-to-point signal wires, shared buses, or segmented buses with bridges). Of course, the algorithms must be designed so that they offer large parallelism and can hence exploit the potential of the NoC.

Traditionally, ICs have been designed with dedicated point-to-point connections, with one wire dedicated to each signal. For large designs in particular, this has several limitations from a physical design viewpoint. The wires occupy much of the area of the chip, and in nanometer CMOS technology, interconnects dominate both performance and dynamic power dissipation, as signal propagation in wires across the chip requires multiple clock cycles. NoC links can reduce the complexity of designing wires for predictable speed, power, noise, reliability, etc., because of their regular, well-controlled structure. From a system design viewpoint, with the advent of multi-core processor systems, a network is a natural architectural choice. A NoC can provide separation between computation and communication, support modularity and IP reuse via standard interfaces, handle synchronization issues, serve as a platform for system test, and hence increase engineering productivity.

Although NoCs can borrow concepts and techniques from the well-established domain of computer networking, it is impractical to blindly reuse features of "classical" computer networks and symmetric multiprocessors. In particular, NoC switches should be small, energy-efficient, and fast. Early NoC research typically neglected these aspects, along with proper quantitative comparison, but they are now considered in more detail. The routing algorithms should be implemented by simple logic, and the number of data buffers should be minimal. Network topology and properties may be application-specific. Research on NoCs is now expanding very rapidly, and several companies and universities are involved. Figure 1 shows how a NoC, in comparison with shared buses, can be populated with various components as resources.

2. EXISTING RELATED WORKS

So far, the communication problems faced by Systems-on-Chip have been tackled by using regular Network-on-Chip architectures. The following is a list of popular regular NoC architectures:

 Mesh Architecture.
 Torus Architecture.
 Butterfly Fat Tree Architecture.


 Extended Butterfly Fat Tree Architecture.

The NoC design problem has received considerable attention in the literature. Towles and Dally [1] and Benini and De Micheli [2] motivated the NoC paradigm. Several existing NoC solutions have addressed the mapping problem for a regular mesh-based NoC architecture [3], [4]. Hu and Marculescu [3] proposed a branch-and-bound algorithm for mapping computation cores onto mesh-based NoC architectures. Murali et al. [4] described a fast algorithm for mesh-based NoC architectures that considers different routing functions, delay constraints, and bandwidth requirements. On the problem of designing custom NoC architectures without assuming an existing network architecture, a number of techniques have been proposed [5]–[10]. Pinto et al. [7] presented techniques for the constraint-driven communication architecture synthesis of point-to-point links by using heuristic-based k-way merging. Their technique is limited to topologies with specific structures that have only two routers between each source and sink pair. Ogras et al. [5], [6] proposed graph decomposition and long link insertion techniques for application-specific NoC architectures. Srinivasan et al. [8], [9] presented NoC synthesis algorithms that consider system-level floorplanning, but they only considered solutions based on a slicing floorplan in which router locations are restricted to the corners of cores and links run around cores. Murali et al. [10] presented an innovative deadlock-free NoC synthesis flow with detailed backend integration that also considers the floorplanning process. Their approach is based on min-cut partitioning of cores to routers. In contrast, this work presents a synthesis approach based on a set partitioning formulation that considers multicast traffic. Although different in topology and some other aspects, all the above papers essentially advocate the advantages of using NoCs and regularity as effective means to design high-performance SoCs. While these papers mostly focus on the concept of regular NoC architecture (discussing the overall advantages and challenges), to the best of our knowledge, our work improves on previous custom NoC synthesis formulations and offers an efficient way to solve the problem.

3. PROPOSED SYSTEM

3.1 PROBLEM DEFINITION

•   We consider the problem of synthesizing custom networks-on-chip (NoC) architectures that are optimized for a given application.
•   We divide the problem statement into the following interrelated steps:

    Physical topology construction.
    Power and area comparisons.

3.2 SYSTEM ARCHITECTURE

Figure 2. Proposed System Architecture

Our NoC synthesis design flow is depicted in Figure 2. The major elements in the design flow are elaborated as follows.

Input Specification: The input specification to our design flow consists of a list of modules. As observed in recent trends, many modern SoC designs combine both hard and soft modules as well as both packet-based network communications and conventional


wiring. Modules can correspond to a variety of different types of intellectual property (IP) cores such as embedded microprocessors, large embedded memories, digital signal processors, graphics and multimedia processors, and security encryption engines, as well as custom hardware modules. These modules can come in a variety of sizes and can be either hard or soft macros, possibly as just black boxes with area and power estimates and constraints on aspect ratios. To facilitate modularity and interoperability of IP cores, packet-based communication with standard network interfaces is rapidly gaining adoption. Custom NoC architectures are being advocated as a scalable solution to packet-based communication. In general, a mixture of network-based communications and conventional wiring may be utilized as appropriate, and not all inter-module communications are necessarily over the on-chip network. For example, an embedded microprocessor may have dedicated connections to its instruction and data cache modules. Our design flow and input specification allow for both interconnection models. Below is an example of a communication demand graph (CDG):

Figure 3. Sample Input Specification

NoC Synthesis: Given the input specification information, the NoC synthesis step then proceeds to synthesize a NoC architecture that is optimized for the given specification. Consider the above diagram, which depicts a small illustrative example. It only shows the portion of the input specification that corresponds to the network-attached modules and their traffic flows. The nodes represent modules, edges represent traffic flows, and edge labels represent the distance between the two vertices. The NoC synthesis step generates topologies based on the communication demand graph and, by comparing parameters such as power consumption and area usage, chooses the best architecture. Below is an example of two architectures generated from the given CDG.

Figure 4. Sample Topologies Generated

NoC Power and Area Estimation: To evaluate the power and area of the


synthesized NoC architecture, we use a state-of-the-art NoC power-performance simulator called Orion that can provide detailed power characteristics for different power components of a router for different input/output port configurations. It accurately considers leakage power as well as dynamic switching power, which is important since it is well known that leakage power is becoming increasingly dominant. Orion also provides area estimates based on a state-of-the-art router microarchitecture.

Figure 5. Formulation of Synthesis Problem

4.1 Flow Partitioning

Flow partitioning is performed in the outer loop of our synthesis formulation to explore different partitionings of flows into separate subnetworks. We use the following algorithm to implement flow partitioning:

Figure 6. Flow Partitioning Algorithm

4.2 STEINER TREE BASED TOPOLOGY CONSTRUCTION

For each flow partition considered, physical network topologies must be decided. In current process technologies, layout rules for implementing wires dictate physical topologies in which the network links run horizontally or vertically. Thus, the problem is similar to the Rectilinear Steiner Tree (RST) problem, which has been extensively studied for conventional VLSI routing. Given a set of nodes, the RST problem is to find a network with the shortest total edge length, using only horizontal and vertical edges, such that all nodes are interconnected. The RST problem is well studied, with very fast implementations available. We use an RST solver in the inner loop of flow partitioning to generate topologies for the set partitions considered.

5. IMPLEMENTATION RESULTS

5.1. EXPERIMENTAL SETUP

We have implemented our proposed algorithm in C. In our implementation, we have designed a Rectilinear Steiner Tree solver to generate the physical network topologies in the inner loop of the algorithm. The ORION 2.0 simulator provides the power and area estimates. The results obtained are shown in line charts for comparison. A snapshot of all the results is shown later in this section. All experimental results were obtained on a 3.06-GHz Intel P4 processor machine with 512 MB of memory running Linux.
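The topology-construction step of Section 4.2 can be illustrated with a small sketch. An exact RST solver is beyond a short example, so the code below swaps in a simpler technique: a minimum spanning tree under Manhattan (rectilinear) distance, built with Prim's algorithm. Its length is an upper bound on the RST length, and a classical result bounds it by 1.5 times the optimum. The function names and the sample coordinates are hypothetical, the example is written in Python rather than the C used for the experiments, and it is not the solver used in the paper.

```python
def manhattan(p, q):
    """Rectilinear distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def rectilinear_mst(points):
    """Prim's algorithm under Manhattan distance.

    Returns (total_length, edges), where each edge is a pair of points.
    This approximates, but does not solve, the RST problem.
    """
    n = len(points)
    if n == 0:
        return 0, []
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0
    total, edges = 0, []
    for _ in range(n):
        # Pick the cheapest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        if parent[u] >= 0:
            edges.append((points[parent[u]], points[u]))
        # Relax connection costs of the remaining nodes.
        for v in range(n):
            if not in_tree[v]:
                d = manhattan(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return total, edges


def l_shaped_route(p, q):
    """Embed one tree edge as a horizontal segment followed by a vertical one."""
    corner = (q[0], p[1])
    return [seg for seg in ((p, corner), (corner, q)) if seg[0] != seg[1]]


terminals = [(0, 0), (2, 0), (1, 1)]   # hypothetical module positions
length, tree = rectilinear_mst(terminals)
print(length, tree)
```

For the three terminals above, the spanning tree has length 4, whereas adding a Steiner point at (1, 0) would connect them with length 3. That gap is what a true RST solver recovers.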


5.2. EXPERIMENTAL RESULTS

ALL FSTs: 64 Points
Figure 7. Snapshot of All the FSTs

Steiner Minimal Tree: 64 Points, length =
Figure 8. Steiner Minimal Tree Generated

Method of Evaluation: In all our experiments, we evaluate the performance of the proposed algorithms on all benchmarks with the objective of minimizing the total area and power consumption of the synthesized NoC architectures. The total area and power consumption include all network components. We applied the design parameters of a 1 GHz clock frequency, 4-flit buffers, and 128-bit flits. A fair direct comparison with previously published NoC synthesis results is difficult, in part because of vast differences in the parameters assumed. To evaluate the effectiveness of our algorithms, we took the full mesh implementation for each benchmark from previously published papers for comparison. These comparisons are intended to show the benefits of custom NoC architectures.

Table 1. NoC Power Comparisons

Figure 9. NoC Power Comparisons

The area results, power results, execution times, and the area and power improvements of the algorithm are reported. The results show that the algorithm can efficiently synthesize NoC architectures that reduce power and area consumption compared with regular topologies such as mesh and optimized mesh topologies.
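The design parameters quoted under Method of Evaluation fix a few derived quantities that are easy to check. The sketch below computes the raw bandwidth of one link and the buffer storage per input port. It assumes that one flit crosses a link per clock cycle and that each input port has a single 4-flit buffer; both are assumptions for illustration, since the paper does not state them, and the function names are hypothetical.

```python
CLOCK_HZ = 1_000_000_000   # 1 GHz clock frequency
FLIT_BITS = 128            # 128-bit flits
BUFFER_FLITS = 4           # 4-flit buffers


def link_bandwidth_bits_per_s(clock_hz, flit_bits):
    """Raw link bandwidth, assuming one flit per clock cycle (assumption)."""
    return clock_hz * flit_bits


def buffer_bits_per_port(buffer_flits, flit_bits):
    """Storage of one input buffer, assuming one buffer per port (assumption)."""
    return buffer_flits * flit_bits


bandwidth = link_bandwidth_bits_per_s(CLOCK_HZ, FLIT_BITS)
print(bandwidth / 8 / 1e9, "GB/s per link")
print(buffer_bits_per_port(BUFFER_FLITS, FLIT_BITS), "bits per input buffer")
```

Under these assumptions each link carries 128 Gbit/s (16 GB/s), and each input buffer holds 512 bits.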


Table 2. NoC Area Comparisons

[Chart: NoC Area Comparisons. y-axis: Area (sq mm), 0.0000 to 2.5000; x-axis: Vertices (6, 7, 8, 12, 14, 20, 24, 25, 36, 44); series: Custom Area, opt. Mesh Area]

Figure 10. NoC Area Estimates

Thus, the two line charts in Figures 9 and 10 clearly show a reduction in the power and area estimates of the custom NoC compared with mesh and optimized mesh topologies. Mesh topologies were explained in Section 2. Optimized mesh topologies are formed by eliminating router ports and links that are not used. The power reduction averages 83.43 percent and 50 percent compared to mesh and optimized mesh topologies, respectively. The area reduction averages 70.95 percent compared to optimized mesh topologies.

6. CONCLUSION AND FUTURE WORK

Research to date has largely been carried out in the context of regular topologies such as mesh and torus. This work presented an idea for building a customized network-on-chip with better flow partitioning, and also considered power and area reduction compared with the regular topologies presented previously. We proposed a formulation of the custom NoC synthesis problem based on the decomposition of the problem into the inter-related steps of deriving a good physical network topology and providing a comparison in terms of area and power with the well-established regular topologies. We used an algorithm called CLUSTER for systematically examining different possible set partitionings of flows, and we proposed the use of RST algorithms for constructing good physical network topologies. Our solution framework decouples the evaluation cost function from the exploration process, thereby enabling different user objectives and constraints to be considered. Although we use Steiner trees to generate a physical network topology for each group in the set partition, the final NoC architecture synthesized is not necessarily limited to trees, as the Steiner tree implementations of different groups may be connected to each other to form non-tree structures.

This work does not differentiate the routers/switches (communication modules) from the operating modules present in the chip. In the near future, identifying the best placement of routers, minimizing the number of routers, and evaluating the effectiveness of the customized Network-on-Chip in terms of other parameters such as throughput, latency, link utilization, and buffer utilization can be taken into account.


REFERENCES

[1] Shan Yan, Bill Lin, “Custom Networks-on-Chip Architectures With Multicast Routing,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 3, March 2009.

[2] K. Srinivasan, K. S. Chatha, and G. Konjevod, “Linear-programming based techniques for synthesis of network-on-chip architectures,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 4, pp. 407–420, Apr. 2006.

[3] K. Srinivasan, K. S. Chatha, and G. Konjevod, “Application specific network-on-chip design with guaranteed quality approximation algorithms,” in Proc. ASPDAC, 2007, pp. 184–190.

[4] S. Murali, P. Meloni, F. Angiolini, D. Atienza, S. Carta, L. Benini, G. De Micheli, and L. Raffo, “Designing application-specific networks on chips with floor plan information,” in Proc. ICCAD, 2006, pp. 355–362.

[5] L. Zhang, H. Chen, H. Chen, B. Yao, K. Hamilton, and C.-K. Cheng, “Repeated on-chip interconnect analysis and evaluation of delay, power, and bandwidth metrics under different design goals,” in Proc. ISQED, 2007, pp. 251–256.

[6] R. Mullins, “Minimizing dynamic power consumption in on-chip networks,” in Proc. Int. Symp. Syst.-on-Chip, 2006, pp. 1–4.

[7] C.-W. Lin, S.-Y. Chen, C.-F. Li, Y.-W. Chang, and C.-L. Yang, “Efficient obstacle-avoiding rectilinear Steiner tree construction,” in Proc. Int. Symp. Phys. Des., 2007, pp. 127–134.

[8] D. Greenfield, A. Banerjee, J.-G. Lee, and S. Moore, “Implications of Rent’s rule for NoC design and its fault-tolerance,” in Proc. NOCS, May 2007, pp. 283–294.

[9] S. Yan and B. Lin, “Application-specific network-on-chip architecture synthesis based on set partitions and Steiner trees,” in Proc. ASPDAC, 2008, pp. 277–282.

[10] Xilinx, San Jose, CA, “UMC delivers leading-edge 65 nm FPGAs to Xilinx,” Des. Reuse, Nov. 8, 2006 [Online]. Available: http://www.design-reuse.com/news/14644/umc-edge-65nm-

[11] P. Gratz, K. Sankaralingam, H. Hanson, P. Shivakumar, R. McDonald, S. W. Keckler, and D. Burger, “Implementation and evaluation of a dynamically routed processor operand network,” in Proc. NOCS, May 2007, pp. 7–17.

[12] N. Enright-Jerger, M. Lipasti, and L.-S. Peh, “Circuit-switched coherence,” IEEE Comput. Arch. Lett., vol. 6, no. 1, pp. 193–202, Mar. 2007.

[13] Shan Yan, Student Member, IEEE, and Bill Lin, Senior Member, IEEE, “Custom Networks-on-Chip Architectures With Multicast Routing,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 3, pp. 342–355, March

Ezhumalai Periyathambi received the B.E. degree in Computer Science and Engineering from Madras University, Chennai, India, in 1992 and the Master of Technology (M.Tech.) degree in Computer Science and Engineering from J N T University, Hyderabad, India, in 2006. He is currently


working towards the Ph.D. degree in the Department of Information and Communication, Anna University, Chennai, India. He is working as a Professor in the Department of Computer Science and Engineering, Rajalakshmi Engineering College, Chennai, Tamilnadu, India. His research interests include reconfigurable architecture, multi-core technology, CAD algorithms for VLSI architecture, theoretical computer science, and mobile computing.
