Low Power and Area Consumption Custom Networks-On-Chip Architectures Using RST Algorithms

(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No. 6, September 2010
Low Power and Area Consumption Custom Networks-On-Chip
Architectures Using RST Algorithms
P. Ezhumalai (1), Dr. C. Arun (2)
(1) Professor, Dept. of Computer Science Engineering
(2) Asst. Professor, Dept. of Electronics and Communication
Rajalakshmi Engineering College, Thandalam-602 105, Chennai, India
(1) carunece@gmail.com, (2) ezhu.pubs@gmail.com
Abstract: Network-on-Chip (NoC) architectures with optimized topologies
have been shown to be superior to regular architectures (such as mesh)
for application-specific multiprocessor System-on-Chip (MPSoC) devices.
The application-specific NoC design problem takes as input the
system-level floorplan of the computation architecture. The objective is
to generate an area- and power-optimized NoC topology. In this work, we
consider the problem of synthesizing custom networks-on-chip (NoC)
architectures that are optimized for a given application. Both the
physical links and the routers determine the power consumption of the
NoC architecture. Our problem formulation is based on the decomposition
of the problem into the inter-related steps of finding good flow
partitions and providing an optimized network implementation for the
derived topologies. We used Rectilinear Steiner Tree (RST)-based
algorithms for generating efficient and optimized network topologies.
Experimental results on a variety of NoC benchmarks showed that our
synthesis results achieve reductions in power consumption and average
hop count over different mesh implementations. We analyze the quality of
the results and the solution times of the proposed techniques by
extensive experimentation with realistic benchmarks and comparisons with
regular mesh-based NoC architectures.

Index Terms: Multicast routing, network-on-chip (NoC), synthesis,
system-on-chip (SoC), topology.

1. Introduction

Network-on-Chip (NoC) is an emerging paradigm for communications within
large VLSI systems implemented on a single silicon chip. The
layered-stack approach to the design of the on-chip intercore
communications is the Network-on-Chip (NoC) methodology. In a NoC
system, modules such as processor cores, memories and specialized IP
blocks exchange data using a network as a "public transportation"
sub-system for the information traffic. A NoC is constructed from
multiple point-to-point data links interconnected by switches (a.k.a.
routers), such that messages can be relayed from any source module to
any destination module over several links, by making routing decisions
at the switches.

A NoC is similar to a modern telecommunications network, using digital
bit-packet switching over multiplexed links. Although packet switching
is sometimes claimed to be a necessity for a NoC, there are several NoC
proposals utilizing circuit-switching techniques. This router-based
definition is usually interpreted to mean that a single shared bus, a
single crossbar switch, or a point-to-point network is not a NoC, but
practically all other topologies are. This is somewhat confusing, since
all of the above are networks (they enable communication between two or
more devices) but are not considered networks-on-chip. Note that some
erroneously use NoC as a synonym for mesh topology, although the NoC
paradigm does not dictate the topology. Likewise, regularity of topology
is sometimes considered a requirement, which is obviously not the case
in research concentrating on "application-specific NoC topology
synthesis".
107 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
Figure 1. Topological illustration of a 4-by-4 grid structured NoC.

The wires in the links of the NoC are shared by many signals. A high
level of parallelism is achieved, because all links in the NoC can
operate simultaneously on different data packets. Therefore, as the
complexity of integrated systems keeps growing, a NoC provides enhanced
performance (such as throughput) and scalability in comparison with
previous communication architectures (e.g., dedicated point-to-point
signal wires, shared buses, or segmented buses with bridges). Of course,
the algorithms must be designed in such a way that they offer large
parallelism and can hence utilize the potential of the NoC.

Traditionally, ICs have been designed with dedicated point-to-point
connections, with one wire dedicated to each signal. For large designs,
in particular, this has several limitations from a physical design
viewpoint. The wires occupy much of the area of the chip, and in
nanometer CMOS technology, interconnects dominate both performance and
dynamic power dissipation, as signal propagation in wires across the
chip requires multiple clock cycles. NoC links can reduce the complexity
of designing wires for predictable speed, power, noise, reliability,
etc., because of their regular, well-controlled structure. From a system
design viewpoint, with the advent of multi-core processor systems, a
network is a natural architectural choice. A NoC can provide separation
between computation and communication, support modularity and IP reuse
via standard interfaces, handle synchronization issues, serve as a
platform for system test, and, hence, increase engineering productivity.

Although NoCs can borrow concepts and techniques from the
well-established domain of computer networking, it is impractical to
blindly reuse features of "classical" computer networks and symmetric
multiprocessors. In particular, NoC switches should be small,
energy-efficient, and fast. Neglecting these aspects, as well as proper
quantitative comparison, was typical of early NoC research, but nowadays
they are considered in more detail. The routing algorithms should be
implemented by simple logic, and the number of data buffers should be
minimal. Network topology and properties may be application-specific.
Research on NoC is now expanding very rapidly, and several companies and
universities are involved. Figure 1 shows how a NoC, in comparison with
shared buses, could be occupied with various components as resources.

2. EXISTING RELATED WORKS

So far, the communication problems faced by systems-on-chip have been
tackled by making use of regular network-on-chip architectures. The
following is a list of popular regular NoC architectures:

Mesh Architecture.
Torus Architecture.
Butterfly Fat Tree Architecture.
Extended Butterfly Fat Tree Architecture.

The NoC design problem has received considerable attention in the
literature. Towles and Dally [1] and Benini and De Micheli [2] motivated
the NoC paradigm. Several existing NoC solutions have addressed the
mapping problem to a regular mesh-based NoC architecture [3], [4]. Hu
and Marculescu [3] proposed a branch-and-bound algorithm for the mapping
of computation cores onto mesh-based NoC architectures. Murali et al.
[4] described a fast algorithm for mesh-based NoC architectures that
considers different routing functions, delay constraints, and bandwidth
requirements. On the problem of designing custom NoC architectures
without assuming an existing network architecture, a number of
techniques have been proposed [5]–[10]. Pinto et al. [7] presented
techniques for the constraint-driven communication architecture
synthesis of point-to-point links by using heuristic-based -way merging.
Their technique is limited to topologies with specific structures that
have only two routers between each source and sink pair. Ogras et al.
[5], [6] proposed graph decomposition and long link insertion techniques
for application-specific NoC architectures. Srinivasan et al. [8], [9]
presented NoC synthesis algorithms that consider system-level
floorplanning, but they only considered solutions based on a slicing
floorplan where router locations are restricted to corners of cores and
links run around cores. Murali et al. [10] presented an innovative
deadlock-free NoC synthesis flow with detailed backend integration that
also considers the floorplanning process. The proposed approach is based
on the min-cut partitioning of cores to routers. This work presents a
synthesis approach based on a set partitioning formulation that
considers multicast traffic. Although different in topology and some
other aspects, all the above papers essentially advocate the advantages
of using NoCs and regularity as effective means to design high
performance SoCs. While these papers mostly focus on the concept of
regular NoC architecture (discussing the overall advantages and
challenges), to the best of our knowledge, our work improves on previous
custom NoC synthesis formulations and offers an efficient way to solve
the problem.

3. PROPOSED SYSTEM

3.1 PROBLEM DEFINITION

• We consider the problem of synthesizing custom networks-on-chip (NoC)
  architectures that are optimized for a given application.
• We divide the problem statement into the following interrelated steps:

    Physical Topology Construction.
    Power and Area Comparisons.

3.2 SYSTEM ARCHITECTURE

Figure 2. Proposed System Architecture

Our NoC synthesis design flow is depicted in Figure 2. The major
elements in the design flow are elaborated as follows.

Input Specification: The input specification to our design flow consists
of a list of modules. As observed in recent trends, many modern SoC
designs combine both hard and soft modules as well as both packet-based
network communications and conventional
wiring. Modules can correspond to a variety of different types of
intellectual property (IP) cores such as embedded microprocessors, large
embedded memories, digital signal processors, graphics and multimedia
processors, and security encryption engines, as well as custom hardware
modules. These modules can come in a variety of sizes and can be either
hard or soft macros, possibly as just black boxes with area and power
estimates and constraints on aspect ratios. To facilitate modularity and
interoperability of IP cores, packet-based communication with standard
network interfaces is rapidly gaining adoption. Custom NoC architectures
are being advocated as a scalable solution to packet-based
communication. In general, a mixture of network-based communications and
conventional wiring may be utilized as appropriate, and not all
inter-module communications are necessarily over the on-chip network.
For example, an embedded microprocessor may have dedicated connections
to its instruction and data cache modules. Our design flow and input
specification allow for both interconnection models. Below is an example
of a communication demand graph (CDG):

Figure 3. Sample Input Specification

NoC Synthesis: Given the input specification information, the NoC
synthesis step then proceeds to synthesize a NoC architecture that is
optimized for the given specification. Consider the above diagram, which
depicts a small illustrative example. It only shows the portion of the
input specification that corresponds to the network-attached modules and
their traffic flows. The nodes represent modules, edges represent
traffic flows, and edge labels represent the length between the two
connected vertices. The NoC synthesis step generates topologies based on
the communication demand graph and, by comparing parameters such as
power consumption and area usage, chooses the best architecture. Below
is an example of two architectures generated based on the given CDG.

Figure 4. Sample Topologies Generated

NoC Power and Area Estimation: To evaluate the power and area of the
synthesized NoC architecture, we use a state-of-the-art NoC
power-performance simulator called Orion that can provide detailed power
characteristics for different power components of a router for different
input/output port configurations. It considers leakage power as well as
dynamic switching power, which is important since it is well known that
leakage power is becoming increasingly dominant. Orion also provides
area estimates based on a state-of-the-art router microarchitecture.

4. MODULE DESCRIPTION

Figure 5. Formulation of Synthesis Problem

4.1 FLOW PARTITIONING

Flow partitioning is performed in the outer loop of our synthesis
formulation to explore different partitionings of flows into separate
subnetworks. We make use of the following algorithm to implement flow
partitioning:

Figure 6. Flow Partitioning Algorithm

4.2 STEINER TREE BASED TOPOLOGY CONSTRUCTION

For each flow partition considered, physical network topologies must be
decided. In current process technologies, layout rules for implementing
wires dictate physical topologies where the network links run
horizontally or vertically. Thus, the problem is similar to the
Rectilinear Steiner Tree (RST) problem that has been extensively studied
for the conventional VLSI routing problem. Given a set of nodes, the RST
problem is to find a network with the shortest total edge length, using
horizontal and vertical edges, such that all nodes are interconnected.
The RST problem is well studied, with very fast implementations
available. We use an RST solver in the inner loop of flow partitioning
to generate topologies for the set partitions considered.

5. IMPLEMENTATION RESULTS

5.1. EXPERIMENTAL SETUP

We have implemented our proposed algorithm in C. In our implementation,
we have designed a Rectilinear Steiner Tree solver to generate the
physical network topologies in the inner loop of the algorithm. The
ORION 2.0 simulator provides the power and area estimates. The results
obtained are shown in line charts for comparison. A snapshot of all the
results is shown later in this section. All experimental results were
obtained on a 3.06-GHz Intel P4 processor machine with 512 MB of memory
running Linux.
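
As an illustration of the two nested loops described in Sections 4.1 and
4.2, the following Python sketch enumerates every set partition of a
small list of flows (outer loop) and scores each group with a rectilinear
spanning tree (inner loop). It is a minimal sketch under stated
assumptions, not the implementation used in this work: the exhaustive
enumeration stands in for the flow partitioning algorithm of Figure 6,
the Manhattan-distance minimum spanning tree stands in for a true RST
solver (it never adds Steiner points, so it only gives an upper bound on
the RST length), and total wire length stands in for the power and area
estimates. The names set_partitions, rectilinear_mst_length and
synthesize, as well as the module coordinates, are hypothetical.

```python
def set_partitions(items):
    """Yield every partition of `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing group in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new group of its own.
        yield [[first]] + partition


def manhattan(p, q):
    """Rectilinear distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def rectilinear_mst_length(points):
    """Length of a minimum spanning tree under Manhattan distance (Prim).

    Assumption: used here as a simple stand-in for an RST solver.
    """
    points = list(points)
    if len(points) < 2:
        return 0
    # best[p] = cheapest known connection from p to the growing tree
    best = {p: manhattan(points[0], p) for p in points[1:]}
    total = 0
    while best:
        nxt = min(best, key=best.get)
        total += best.pop(nxt)
        for p in best:
            best[p] = min(best[p], manhattan(nxt, p))
    return total


def synthesize(flows, position):
    """Outer loop: flow partitions. Inner loop: one tree per group."""
    best_cost, best_partition = None, None
    for partition in set_partitions(flows):
        cost = 0
        for group in partition:
            # Terminals of a group are the endpoints of all its flows.
            terminals = {position[m] for flow in group for m in flow}
            cost += rectilinear_mst_length(terminals)
        if best_cost is None or cost < best_cost:
            best_cost, best_partition = cost, partition
    return best_cost, best_partition


if __name__ == "__main__":
    # Hypothetical module placement and traffic flows (source, sink).
    position = {"A": (0, 0), "B": (0, 1), "C": (10, 10), "D": (10, 11)}
    flows = [("A", "B"), ("C", "D")]
    print(synthesize(flows, position))
```

The number of set partitions grows with the Bell number of the flow
count, so exhaustive enumeration in this form is only practical for a
handful of flows.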
5.2. EXPERIMENTAL RESULTS

ALL FSTs: 64 Points
Figure 7. Snapshot of All the FSTs Generated

Steiner Minimal Tree: 64 Points, length = 56729
Figure 8. Steiner Minimal Tree Generated

Method of Evaluation: In all our experiments, we aim to evaluate the
performance of the proposed algorithms on all benchmarks, with the
objective of minimizing the total area as well as the power consumption
of the synthesized NoC architectures. The total area as well as the
power consumption includes all network components. We applied the design
parameters of 1 GHz clock frequency, 4-flit buffers, and 128-bit flits.
For evaluation, fair direct comparison with previously published NoC
synthesis results is difficult, in part because of vast differences in
the parameters assumed. To evaluate the effectiveness of our algorithms,
we took the full mesh implementation for each benchmark from previously
published papers for comparison. These comparisons are intended to show
the benefits of custom NoC architectures.

Table 1. NoC Power Comparisons

Figure 9. NoC Power Comparisons

The area results, power results, execution times, and the area as well
as power improvements of the algorithm are reported. The results show
that the algorithm can efficiently synthesize NoC architectures that
minimize power and area consumption as compared with regular topologies
such as mesh and optimized mesh topologies.
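
To make the two baselines concrete, the following Python sketch builds
the link set of a regular mesh, routes a few flows over it, and keeps
only the links that are actually used, which is how this paper describes
an optimized mesh. It is a hedged illustration only: dimension-ordered
(XY) routing, the 4-by-4 size, the example flows, and the function names
mesh_links, xy_route, optimized_mesh and average_hops are assumptions
made for the example, and hop count is taken here as the number of links
traversed.

```python
def mesh_links(rows, cols):
    """Undirected links of a rows x cols mesh; routers are (row, col)."""
    links = set()
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                links.add(((r, c), (r, c + 1)))   # horizontal link
            if r + 1 < rows:
                links.add(((r, c), (r + 1, c)))   # vertical link
    return links


def xy_route(src, dst):
    """Links crossed by XY routing: move along the row first, then the column."""
    path = []
    r, c = src
    while c != dst[1]:
        nc = c + (1 if dst[1] > c else -1)
        path.append(tuple(sorted([(r, c), (r, nc)])))
        c = nc
    while r != dst[0]:
        nr = r + (1 if dst[0] > r else -1)
        path.append(tuple(sorted([(r, c), (nr, c)])))
        r = nr
    return path


def optimized_mesh(flows):
    """Links that remain after dropping every link no flow uses."""
    used = set()
    for src, dst in flows:
        used.update(xy_route(src, dst))
    return used


def average_hops(flows):
    """Mean number of links traversed per flow."""
    return sum(len(xy_route(s, d)) for s, d in flows) / len(flows)


if __name__ == "__main__":
    # Hypothetical flows between routers of a 4-by-4 mesh.
    flows = [((0, 0), (0, 3)), ((0, 0), (2, 0)), ((3, 3), (3, 2))]
    full = mesh_links(4, 4)
    kept = optimized_mesh(flows)
    print(len(full), len(kept), average_hops(flows))
```

Counting the remaining links (and, by extension, the router ports they
attach to) gives a rough feel for why an optimized mesh costs less than
a full mesh, although actual power and area figures would still have to
come from a simulator such as Orion.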
Table 2. NoC Area Comparisons

[Line chart "NoC Area Comparisons". Series: Custom Area, Opt. Mesh Area.
Y-axis: Area (sq mm), 0.0000 to 2.5000. X-axis: Vertices (6, 7, 8, 12,
14, 20, 24, 25, 36, 44).]

Figure 10. NoC Area Estimates

Thus, the two line charts in Figures 9 and 10 clearly show a reduction
in the power and area estimates of the custom NoC compared with mesh and
optimized mesh topologies. Mesh topologies were explained in Section 2.
Optimized mesh topologies are formed by eliminating router ports and
links that are not used. The power reduction averages 83.43 percent and
50 percent compared to mesh and optimized mesh topologies, respectively.
The area reduction averages 70.95 percent compared to optimized mesh
topologies.

6. CONCLUSION AND FUTURE WORK

Earlier research has largely been carried out in the context of regular
topologies such as mesh and torus. This work presented an approach to
building a customized network-on-chip with better flow partitioning, and
considered the power and area reduction compared with these regular
topologies. We proposed a formulation of the custom NoC synthesis
problem based on the decomposition of the problem into the inter-related
steps of deriving a good physical network topology and providing a
comparison in terms of area and power with the well-established regular
topologies. We used the algorithm called CLUSTER for systematically
examining different possible set partitionings of flows, and we proposed
the use of RST algorithms for constructing good physical network
topologies. Our solution framework enables the decoupling of the
evaluation cost function from the exploration process, thereby enabling
different user objectives and constraints to be considered. Although we
use Steiner trees to generate a physical network topology for each group
in the set partition, the final NoC architecture synthesized is not
necessarily limited to just trees, as Steiner tree implementations of
different groups may be connected to each other to form non-tree
structures.

This work does not differentiate the routers/switches (communication
modules) from the operating modules present on the chip. In future work,
identifying the best placement of routers, minimizing the number of
routers, and evaluating the effectiveness of the customized
network-on-chip in terms of other parameters such as throughput,
latency, link utilization, and buffer utilization can be taken into
account.
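
The decoupling of the evaluation cost function from the exploration
process, mentioned in the conclusion, can be pictured with a small
Python sketch in which the exploration step accepts any cost callable.
This is only an illustration of the idea, not the framework implemented
in this work: the function names and the candidate topologies with their
power and area figures are hypothetical values invented for the example.

```python
import math


def explore(candidates, cost):
    """Exploration step: return the candidate that minimizes `cost`.

    `cost` is any callable, so objectives can change without touching
    the exploration code.
    """
    return min(candidates, key=cost)


def power_only(topology):
    """Objective 1: minimize power."""
    return topology["power_mw"]


def power_under_area_limit(limit_mm2):
    """Objective 2: minimize power, rejecting designs over an area budget."""
    def cost(topology):
        if topology["area_mm2"] > limit_mm2:
            return math.inf
        return topology["power_mw"]
    return cost


def weighted(power_weight, area_weight):
    """Objective 3: weighted sum of power and area."""
    def cost(topology):
        return (power_weight * topology["power_mw"]
                + area_weight * topology["area_mm2"])
    return cost


# Hypothetical candidate topologies (values invented for illustration).
CANDIDATES = [
    {"name": "A", "power_mw": 120.0, "area_mm2": 0.8},
    {"name": "B", "power_mw": 100.0, "area_mm2": 1.5},
    {"name": "C", "power_mw": 140.0, "area_mm2": 0.5},
]

if __name__ == "__main__":
    for objective in (power_only,
                      power_under_area_limit(1.0),
                      weighted(1.0, 100.0)):
        print(explore(CANDIDATES, objective)["name"])
```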
REFERENCES

[1] S. Yan and B. Lin, “Custom networks-on-chip architectures with
multicast routing,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst.,
vol. 17, no. 3, Mar. 2009.

[2] K. Srinivasan, K. S. Chatha, and G. Konjevod, “Linear-programming
based techniques for synthesis of network-on-chip architectures,” IEEE
Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 4, pp.
407–420, Apr. 2006.

[3] K. Srinivasan, K. S. Chatha, and G. Konjevod, “Application specific
network-on-chip design with guaranteed quality approximation
algorithms,” in Proc. ASPDAC, 2007, pp. 184–190.

[4] S. Murali, P. Meloni, F. Angiolini, D. Atienza, S. Carta, L. Benini,
G. De Micheli, and L. Raffo, “Designing application-specific networks on
chips with floor plan information,” in Proc. ICCAD, 2006, pp. 355–362.

[5] L. Zhang, H. Chen, H. Chen, B. Yao, K. Hamilton, and C.-K. Cheng,
“Repeated on-chip interconnect analysis and evaluation of delay, power,
and bandwidth metrics under different design goals,” in Proc. ISQED,
2007, pp. 251–256.

[6] R. Mullins, “Minimizing dynamic power consumption in on-chip
networks,” in Proc. Int. Symp. Syst.-on-Chip, 2006, pp. 1–4.

[7] C.-W. Lin, S.-Y. Chen, C.-F. Li, Y.-W. Chang, and C.-L. Yang,
“Efficient obstacle-avoiding rectilinear Steiner tree construction,” in
Proc. Int. Symp. Phys. Des., 2007, pp. 127–134.

[8] D. Greenfield, A. Banerjee, J.-G. Lee, and S. Moore, “Implications
of Rent’s rule for NoC design and its fault-tolerance,” in Proc. NOCS,
May 2007, pp. 283–294.

[9] S. Yan and B. Lin, “Application-specific network-on-chip
architecture synthesis based on set partitions and Steiner trees,” in
Proc. ASPDAC, 2008, pp. 277–282.

[10] Xilinx, San Jose, CA, “UMC delivers leading-edge 65 nm FPGAs to
Xilinx,” Des. Reuse, Nov. 8, 2006 [Online]. Available:
http://www.design-reuse.com/news/14644/umc-edge-65nm-fpgas-xilinx.html

[11] P. Gratz, K. Sankaralingam, H. Hanson, P. Shivakumar, R. McDonald,
S. W. Keckler, and D. Burger, “Implementation and evaluation of a
dynamically routed processor operand network,” in Proc. NOCS, May 2007,
pp. 7–17.

[12] N. Enright-Jerger, M. Lipasti, and L.-S. Peh, “Circuit-switched
coherence,” IEEE Comput. Archit. Lett., vol. 6, no. 1, pp. 193–202, Mar.
2007.

[13] S. Yan and B. Lin, “Custom networks-on-chip architectures with
multicast routing,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst.,
vol. 17, no. 3, pp. 342–355, Mar. 2009.

Ezhumalai Periyathambi received the B.E. degree in Computer Science and
Engineering from Madras University, Chennai, India, in 1992 and the
Master of Technology (M.Tech.) in Computer Science and Engineering from
J N T University, Hyderabad, India, in 2006. He is currently
working towards the Ph.D. degree in the Department of Information and
Communication, Anna University, Chennai, India. He is working as a
Professor in the Department of Computer Science and Engineering,
Rajalakshmi Engineering College, Chennai, Tamilnadu, India. His research
interests include reconfigurable architecture, multi-core technology,
CAD algorithms for VLSI architecture, theoretical computer science, and
mobile computing.