UNICAST WORMHOLE MESSAGE ROUTING IN IRREGULAR COMPUTER NETWORKS

SHARAD JAISWAL (1), LEV ZAKREVSKI (2), MEHMET MUSTAFA (1), MARK KARPOVSKY (1)

(1) ({sjaiswal, mmustafa, markkar}@bu.edu) ECE Dept., Boston University, 8 St. Mary's Street, Boston, MA 02215
(2) (zakr@adm.njit.edu) ECE Dept., New Jersey Institute of Technology, University Heights, Newark, NJ 07102



ABSTRACT

In this paper we consider the problem of deadlock-free unicast wormhole routing in computer networks with irregular topologies, such as Networks of Workstations (NOWs). The proposed algorithm consists of two stages. At the first stage, we minimize the set of turns in the network graph that must be prohibited to prevent deadlocks by breaking all cycles in the channel dependency graph. The proposed approach guarantees that the constructed set of prohibited turns is irreducible and that the fraction of prohibited turns does not exceed 1/3 for any network topology. At the second stage, routing tables that minimize average message path lengths and delivery times are constructed, in a decentralized manner, from the set of prohibited turns. Given N, the number of nodes, and d, the maximal number of ports in routers, the complexity of the proposed algorithm does not exceed $O(N^2 d)$. The algorithm is invoked when a change in the topology of the network is detected, which we assume is infrequent. We also present results of simulation experiments on average latency for uniform, transpose and local traffic patterns, saturation points and scalability. These results illustrate the performance and advantages of the proposed approach as compared with the existing up/down techniques [1,2].

KEYWORDS: routing algorithms, wormhole routing, turn model, deadlock prevention

1. INTRODUCTION AND FORMULATION OF THE PROBLEM

Recently, Networks of Workstations (NOWs) have emerged as an inexpensive alternative to massively parallel multiprocessors [1,3,4]. NOWs consist of a collection of routers or switches, communication links and workstations interconnected in an irregular topology. In order to minimize network latency and achieve high-bandwidth communication, recent experimental and commercial switches for NOWs implement wormhole routing [2,4,5]. However, wormhole routing is susceptible to deadlocks [6-13] because packets are allowed to hold many resources while requesting others. The design of efficient deadlock-free routing algorithms for irregular topologies introduces new challenges, which we address in this paper.

Existing routing strategies can be divided into adaptive techniques [7,9,12-14], which take existing queue sizes into account, and non-adaptive techniques [10,13,15,16]. In this paper we consider non-adaptive methods, which can however be easily extended to adaptive routing. Several routing methods currently exist for regular topologies, such as 2-D meshes, tori and hypercubes [7,8,10,12,15,17,18]. The more difficult problem of routing in the presence of faults [2,3,9,15,17-20] has also been studied. If the number of faults is large, the fault-tolerant routing problem for networks with regular topologies becomes almost equivalent to routing in an arbitrary topology [19-21].

For an irregular topology, most of the existing routing strategies are based on spanning trees and up/down routing [1,2]. According to this strategy, once a spanning tree is constructed, any two nodes can communicate with each other along the tree without any deadlocks. The drawbacks of this approach are the long message paths and the high load on the edges near the root node [1]. The method can be improved by allowing shortcuts over edges that do not belong to the spanning tree [1], but this could result in deadlocks due to the formation of cycles in the channel dependency graph.

To measure the efficiency of a routing strategy, the average message delivery time can be used [10,13,14,16] as a parameter for comparison. The average message delivery time is a function of the message generation rate. At a message generation rate known as the saturation point, delivery time increases exponentially. Any good routing strategy aims to increase the maximal sustainable throughput and to decrease the delivery time for generation rates near the saturation point.

We assume that the given network consists of N nodes connected by E edges constituting a connected network graph G. In general, the network graph can be considered to be a multigraph with several edges between the same two nodes [1,8,10]. In particular, if k virtual
channels are used, where each physical channel is split into k logical channels using time-multiplexing techniques [10,11], every pair of nodes is connected by either 0 or k virtual edges.

Every routing algorithm prohibits some of the turns in the network graph G. A turn (a,b,c) is a three-tuple of nodes such that (a,b) and (b,c) are edges in G. In order to correctly model existing switch-based networks such as Myrinet [4], we assume that G is symmetric, i.e. if (a,b) is an edge in G, then (b,a) is also an edge. These channels can be used simultaneously without contention; thus, if (a,b,c) is prohibited, then (c,b,a) is also prohibited, and we consider these two turns as one. The total number of turns is $T = \sum_i d_i(d_i-1)/2$, where $d_i$ is the degree of node i. For up/down routing [1,2,20], a spanning tree for G is first constructed and nodes are labeled preserving the partial order defined by the tree, where the root has label 1; a turn (a,b,c) is prohibited if b>a and b>c.

It was shown in [12] for meshes and tori, and in [3,20] for irregular topologies, that a reduction in the number of prohibited turns results in a decrease of average message path lengths and in a reduction of average delivery time, accompanied by an increase in throughput. In Section 4 we present simulation results for random topologies with N=256 nodes. Fig. 3 shows a strong correlation between the fraction of prohibited turns and the saturation point: reducing the fraction of prohibited turns from 30% to 20% results in about a 100% increase in maximal sustainable throughput.

For the up/down approach [1,2,20], the fraction of turns that are prohibited depends on the selection of the spanning tree for a given network topology and could be close to 1. The problem of constructing an optimal spanning tree is NP-hard. In [3] a method for minimizing the fraction z of turns to be prohibited was presented. With this method the fraction of prohibited turns does not exceed 1/3 for any topology, but the approach does not guarantee an irreducible set of prohibited turns. A set of prohibited turns is irreducible if deletion of any turn from the set results in cycles in the channel dependency graph and hence possible deadlocks in the system. In the next section, we describe the TPBR (Turn Prohibition Based Routing) algorithm for deadlock prevention, which produces an irreducible set of prohibited turns while maintaining the upper bound of 1/3 on the fraction of prohibited turns for any network topology. In Section 2 we also present experimental results for random topologies with 256 nodes, showing that the proposed TPBR-algorithm considerably reduces the number of prohibited turns as compared to the up/down approach. We note that a set of prohibited turns for deadlock prevention does not completely specify the routing strategy, i.e. several routing strategies can satisfy the same set of restrictions on turns in the network graph. In Section 3, a decentralized construction of routing tables based on the selected set of prohibited turns is discussed; it minimizes average message path lengths and thus average delivery time.

The complexities of the algorithms described in Sections 2 and 3 do not exceed $O(N^2 d)$, where N is the number of nodes and d is the maximal number of ports in routers. These algorithms are invoked only when there is a change in the topology of the original network. Results of simulation experiments on average latency for uniform, local and transpose traffic patterns, saturation points and scalability for the TPBR-algorithm are presented in Section 4. These results show the advantages of the proposed approach as compared with the up/down approach.

2. TPBR-ALGORITHM FOR CONSTRUCTING A SET OF PROHIBITED TURNS FOR DEADLOCK PREVENTION

In this section we describe the TPBR-algorithm for creating a set of prohibited turns Z(G) for a given graph G with N(G) nodes. The TPBR-algorithm is a recursive algorithm in which, at each step, one node is selected and all turns through the selected node are either permitted or prohibited. For example, if after deleting a node a with degree $d_a$ and all edges incident on it the remaining graph G-a is still connected, then we prohibit all $d_a(d_a-1)/2$ turns (c,a,b) and permit all turns (a,b,c). The algorithm is invoked recursively as long as there are edges in the remaining graph.

The following properties can be proven for the set Z(G) of prohibited turns generated by the TPBR-algorithm:

    1. Any cycle in G contains at least one turn included in Z(G).
    2. $|Z(G)| \le \frac{1}{3}|T(G)|$, where T(G) is the set of all turns in graph G.
    3. The TPBR-algorithm maintains the graph's connectivity. For any two connected nodes a and b in the original graph, there exists at least one path between a and b without any turns from Z(G).
    4. The set Z(G) is irreducible. Deletion of any turn from Z(G) creates a cycle in G containing no turns from Z(G).

We note that the TPBR-algorithm has a complexity of the order of $O(N^2 d)$, where N is the number of nodes in G and d is the maximal node degree. The complete information about Z(G) can be represented in 3 arrays of length N. The first array shows the order in which nodes are selected, the second shows the order in which components of connectivity are indexed, and the third stores the information about special edges.
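Properties 1 and 3 can be checked mechanically for any candidate set of prohibited turns. The following Python sketch is an illustration only and is not taken from the paper. The function names (`turn`, `all_turns`, `is_deadlock_free`, `is_connected_under`) are hypothetical, the graph is assumed to be a simple symmetric graph given as an adjacency dictionary, and reversals of the form (a,b,a) are assumed never to be routed, in line with the turn count $d_i(d_i-1)/2$. Deadlock freedom is tested as acyclicity of the dependency graph whose vertices are the directed channels (u,v).

```python
from itertools import combinations


def turn(a, b, c):
    """Canonical form of a turn: (a,b,c) and (c,b,a) are treated as one."""
    return (min(a, c), b, max(a, c))


def all_turns(adj):
    """T(G): every unordered pair of distinct neighbors around each node."""
    return {turn(a, b, c) for b in adj for a, c in combinations(sorted(adj[b]), 2)}


def is_deadlock_free(adj, prohibited):
    """True if the channel dependency graph has no directed cycle."""
    channels = [(u, v) for u in adj for v in adj[u]]
    # Channel (u,v) depends on (v,w) when the turn (u,v,w) is permitted.
    succ = {
        (u, v): [(v, w) for w in adj[v]
                 if w != u and turn(u, v, w) not in prohibited]
        for (u, v) in channels
    }
    indeg = {ch: 0 for ch in channels}
    for ch in channels:
        for nxt in succ[ch]:
            indeg[nxt] += 1
    # Kahn's algorithm: the graph is acyclic iff every channel can be removed.
    ready = [ch for ch in channels if indeg[ch] == 0]
    removed = 0
    while ready:
        ch = ready.pop()
        removed += 1
        for nxt in succ[ch]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    return removed == len(channels)


def is_connected_under(adj, prohibited):
    """True if every node reaches every other node using permitted turns only."""
    for s in adj:
        seen = {(s, v) for v in adj[s]}
        stack = list(seen)
        reached = {s}
        while stack:
            u, v = stack.pop()
            reached.add(v)
            for w in adj[v]:
                nxt = (v, w)
                if w != u and turn(u, v, w) not in prohibited and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if reached != set(adj):
            return False
    return True


if __name__ == "__main__":
    ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    z = {turn(3, 0, 1)}  # prohibit the single turn through node 0
    print(len(z), "of", len(all_turns(ring)), "turns prohibited")
    print(is_deadlock_free(ring, z), is_connected_under(ring, z))
```

On the 4-node ring in the usage example, prohibiting the one turn through node 0 breaks both the clockwise and the counter-clockwise dependency cycle while leaving all nodes mutually reachable, which is the situation properties 1 and 3 describe.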
  Description of TPBR(G):

  0) Initialize: $Z(G) := \emptyset$, all nodes and edges are marked as non-special, HALF_LOOP := 0.
  1) if N(G) < 2, no turns are prohibited, then return
  2) a non-special node a is selected in G, such that $(d_a^2 - 2)/\sum_i (d_i - 1)$ is minimal, where the summation is over nodes i sharing an edge with node a.
  3) Components of connectivity $G_1,\dots,G_k$ are constructed in graph G - a such that any edge between nodes in different components should include node a. The following rules apply:
     - if a special node exists in G then it should be in $G_1$, the first connectivity component.
     - else, a component $G_i$ connected to a in G by fewer edges should have a larger index i.
  4) for i=2,3,…,k, one edge connecting component $G_i$ to a is marked as a special edge.
  5) all turns (b,a,c) are included in Z(G), except turns for which (a,b) is a special edge, $b \in G_i$, $c \in G_j$ and i>j, and all turns (a,b,c) are permitted.
  6) TPBR ($G_1$)
  7) for i=2,3,…,k {
        if (HALF_LOOP = 0) AND (in G exists a sequence of nodes $a, x_1,\dots,x_j, a$, such that $x_1,\dots,x_j \in G_{i-1}$),
           then HALF_LOOP := 1.
        if (HALF_LOOP = 1)
           then the node in $G_i$ connected to a in G by the special edge is declared and marked as special.
        TPBR ($G_i$)
     }
  8) return

  Fig. 1. Fraction of prohibited turns, z(G)=|Z(G)| / |T(G)|, as a function of average node degree for the TPBR-algorithm and the up/down approach, for randomly configured irregular topologies with 256 nodes.

  Fig. 2. Example illustrating the construction of prohibited turns by the TPBR-algorithm. Special nodes and edges are shown in black; nodes are labeled in the order of their selection by the TPBR-algorithm.
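The simple case described at the start of this section, where the selected node can be deleted without disconnecting the rest of the graph, is already enough to build a valid (though not optimized) set of prohibited turns, because every connected graph with at least two nodes contains such a node. The Python sketch below illustrates only that simplified case and is not the TPBR-algorithm: it skips steps 3 to 7 (connectivity components, special edges, HALF_LOOP), it selects nodes by lowest remaining degree rather than by the criterion of step 2, and it therefore carries neither the 1/3 bound nor the irreducibility guarantee. The names `prohibit_turns_simple`, `is_connected` and `turn` are hypothetical, and the input is assumed to be a connected, simple, symmetric graph with comparable node labels.

```python
from itertools import combinations


def turn(a, b, c):
    """Canonical form of a turn: (a,b,c) and (c,b,a) are treated as one."""
    return (min(a, c), b, max(a, c))


def is_connected(adj, nodes):
    """True if the subgraph induced on `nodes` is connected."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def prohibit_turns_simple(adj):
    """Delete nodes one by one, always keeping the remainder connected,
    and prohibit every turn through a node at the moment it is deleted."""
    remaining = set(adj)
    prohibited, order = set(), []
    while len(remaining) > 1:
        # Nodes whose removal keeps the remaining graph connected.
        candidates = [a for a in remaining
                      if is_connected(adj, remaining - {a})]
        # Assumed heuristic: the fewest live neighbors means the fewest
        # turns prohibited at this step; ties go to the smaller label.
        a = min(candidates,
                key=lambda n: (sum(v in remaining for v in adj[n]), n))
        live = sorted(v for v in adj[a] if v in remaining)
        prohibited.update(turn(b, a, c) for b, c in combinations(live, 2))
        order.append(a)
        remaining.remove(a)
    order.extend(remaining)
    return prohibited, order


if __name__ == "__main__":
    k4 = {u: [v for v in range(4) if v != u] for u in range(4)}
    z, order = prohibit_turns_simple(k4)
    print(sorted(z), order)
```

Every cycle contains a prohibited turn at whichever of its nodes is deleted first, since both cycle neighbors are still present at that moment. On the complete graph with 4 nodes the sketch prohibits 4 of the 12 turns.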

To illustrate the efficiency of the TPBR-algorithm compared with the up/down approach, we show in Fig. 1 the percentages of prohibited turns generated by these two algorithms for random networks with 256 nodes, as a function of average node degree. One can see from Fig. 1 that moving from the up/down curve to the TPBR-algorithm curve results in a 15% to 50% reduction of the fraction of prohibited turns.

In Fig. 2, the TPBR-algorithm is illustrated for a graph with N = 14 nodes. Nodes are labeled in the order in which they are selected by the TPBR-algorithm. Prohibited turns, e.g. (3,2,12), special nodes, e.g. 13 and 14, and special edges, e.g. (13,1), are shown in dark, heavy lines. For this example, |Z(G)|=12 out of |T(G)|=50 turns are prohibited. We note that if node 3 or 12 were selected by the TPBR-algorithm instead of node 2, the total number of prohibited turns would have gone down to 11. This example shows that although the TPBR-algorithm constructs an irreducible set of prohibited turns, this set is not necessarily minimal.

3. CONSTRUCTION OF ROUTING TABLES

In this section we describe a decentralized algorithm for the construction of local routing tables for a given set of prohibited turns Z(G), minimizing average path length and average latency. Our goal is, for any source s and destination d, to select the shortest routing path $a_1,\dots,a_m$ ($a_1=s$, $a_m=d$) among all paths satisfying the routing restrictions imposed by the set Z(G). We reiterate that these computations are infrequent, as they are performed only when network topology changes are detected, which we assume do not occur often.

For any intermediate node i, the algorithm estimates the length of the shortest permitted path (under the restrictions on turns generated by the TPBR-algorithm) between the adjacent nodes of i and the destination, and routes the message to the neighbor j with the shortest estimated path length, provided that the corresponding turns at nodes i and j are permitted. This algorithm can be implemented in either hardware or software, depending on the speed/memory parameters of the routers. We assume that every node has up
to d neighbors. The term "node" here refers to the router component of the processor-router pair. Hence, it would have up to d+1 channel buffers, including the buffers for the consumption channel to the processor.

Initially, every router knows its set of permitted and prohibited turns. This can be represented by a $(d+1)\times(d+1)$ matrix P, such that P(i,j)=1 if the turn from input buffer i to output buffer j is permitted, with $i,j \in \{0,1,\dots,d\}$, where i, j = 0 correspond to the consumption channel. It follows from the TPBR-algorithm that P is symmetric and that P(0,i) = 1.

For every node, two matrices R(i,k) and D(i,k) are constructed, where $i \in \{0,1,\dots,d\}$, $k \in \{1,2,\dots,N\}$, and N is the number of nodes. A message coming in from input port i and destined for node k should be routed to output port R(i,k). Elements of R take values from 0 to d. The distance matrix entry D(i,k) is the length, in hops, of the path from input buffer i to destination node k. The distance matrix D is used at the first (table construction) stage only, while the routing matrix R is used for on-line routing during the second stage. The total memory required to store these matrices is of the order of 2(d+1)N. For d=4, N=1,000 it is around 10K words, so a hardware implementation of this algorithm is feasible.

We now describe a decentralized procedure for constructing the matrices R and D at every node:

 - $R_a$ and $D_a$ for node a are initialized as follows: $R_a(i,j) := X$, $R_a(i,a) := 0$, $D_a(i,j) := X$, $D_a(i,a) := 0$, where X is interpreted as undetermined.
 - At each step, elements of $R_a$ and $D_a$ are recalculated using the matrices $R_1,\dots,R_d$, $D_1,\dots,D_d$ of neighbors 1,…,d of node a. After t steps all paths of length up to t hops will be determined.
 - The rule for step t (initially, t=1) is the following:
       if ($R_a(i,j) = X$) then for all m, neighbors of a such that P(i,m)=1
            if ($D_m(y,j) = t-1$) then
                   { $R_a(i,j) := m$ ; $D_a(i,j) := t$ }
   where y is the input port of node m incident on the neighboring node a.

For the hardware realization, at each step t every node m should transmit to its neighbors messages with the numbers i such that $D_m(y,i) = t-1$. During the entire pre-routing procedure, up to N such numbers can be sent over every link. The algorithm terminates after L steps, where L is the maximal possible length of the minimal permitted path between two nodes, or when at some step t no changes have been made in any of the matrices. We note that the proposed algorithm can be used to construct a set of shortest paths for any given set of prohibited turns. This property has been used in our simulations to compare the efficiency of the up/down approach with the TPBR-algorithm.

4. SIMULATION EXPERIMENTS

We have implemented an event-driven simulator to evaluate the performance of the TPBR-algorithm for wormhole-routed networks. In all experiments the TPBR-algorithm and the up/down routing algorithm are compared. Our experiments were performed on randomly generated connected graphs ranging in size from 32 to 256 nodes. The experiments were also performed for different node degrees. A typical simulation was averaged over 100 random graphs, in each of which 10,000 messages were exchanged.

The following assumptions for our experiments are similar to those used in [16,18]. All network channels are bi-directional and symmetric. The message length was constant (200 flits) and the input/output buffers in the wormhole routers were 1 flit deep. The message queues at each node are of infinite length. Output channel/buffer contention is resolved using the FIFO queuing policy, with each incoming flit being time-stamped on its arrival at the router input buffer. In our simulations we mostly used the uniform traffic pattern, where each node can send a message to any other node with equal probability. Message generation at the nodes is independent and identically distributed according to a Poisson process with generation rate p (messages/cycle/node, i.e. the probability of message generation in any cycle at any node). A separate experiment investigated the impact of different traffic patterns on the latency and on the saturation point. Generally, the performance of routing algorithms is measured in terms of the average message latency and the saturation point (throughput), which is defined as the highest sustainable message generation rate.

The first set of experiments provides additional evidence that supports choosing the fraction of prohibited turns as the criterion for high-efficiency routing. This can be shown by comparing the message generation rates at which the network reaches saturation for different fractions z(G) of prohibited turns. In Fig. 3, it can be seen that networks with a larger fraction z(G)=|Z(G)| / |T(G)| reached saturation at lower generation rates. For random graphs with N=256 nodes of degree d=4, the reduction in z(G) from 30% to 20% results in a 100% increase in the saturation point.

In the next set of simulation experiments, the performance of the TPBR-algorithm and of the up/down algorithm is studied. Fig. 4 shows average message latency in simulation cycles versus message generation probability for both algorithms. At the saturation point the average latency graph has an almost infinite slope. It is
observed that, on average, the TPBR-algorithm provides an improvement of about 15% in maximum message generation rate over the up/down approach.

Fig. 3 Average saturation point versus fraction of prohibited turns for random graphs with N=256 nodes of degree = 4

Fig. 4 Average message latency versus generation rate for the up/down and TPBR-algorithms.

The third set of experiments deals with scalability issues for the TPBR-algorithm. We measure maximal sustainable throughput for networks of different sizes. It is observed that the algorithm scales well, offering better performance for larger graphs. The persistent superiority of the TPBR-algorithm over the up/down approach is clearly visible in Fig. 5.

Lastly, in Fig. 6 we see the general behavior for the three traffic patterns that we studied for the TPBR-algorithm: local, transpose and uniform. In the transpose traffic pattern, source and destination pairs are chosen so that the path length is greater than the mean path length. In the local traffic pattern, the path length is less than the mean path length. For the uniform traffic pattern there is no such restriction. With the local traffic pattern the saturation point is the largest, and with the transpose traffic pattern it is the smallest. This is a further experimental verification that messages that in general travel farther, i.e. have longer path lengths as in the case of transpose traffic, have longer average latencies.

Fig. 5 Scalability of the TPBR-algorithm compared with the Up/Down routing algorithm.

Fig. 6 Performance of the TPBR-algorithm under different traffic patterns.

5. CONCLUSIONS

In this paper we proposed a new algorithm for wormhole routing in irregular topologies. We have shown that the fraction of turns that must be prohibited to break all cycles in a channel dependency graph can be used as an efficient criterion for performance evaluation of routing strategies. We developed an algorithm which generates an irreducible set of prohibited turns containing
not more than 1/3 of the total number of turns for any topology. As far as we know, this is the first published non-trivial upper bound on the fraction of turns to be prohibited to prevent deadlocks in networks. We also developed a decentralized algorithm for the construction of routing tables, based on selected sets of prohibited turns and minimizing average delivery time. The complexity of the proposed algorithms does not exceed $O(N^2 d)$, where N is the number of nodes and d is the maximal number of ports in routers. The results of computer simulations illustrate the advantages of the proposed approach as compared with the existing up/down approach. The methods can be easily extended to the cases of several virtual networks, adaptive routing and multicasting. (For the latter case, additional channel dependencies due to consumption channels have to be taken into account.)

6. ACKNOWLEDGMENTS

The authors would like to thank Prof. L. B. Levitin from Boston University for many useful suggestions. This work was supported by the NSF under Grant MIP 9630096.

REFERENCES

[1] R. Liebeskind-Hadas, D. Mazzoni and R. Rajagopalan, "Tree-Based Multicast Routing in the Mesh with No Virtual Channels," Proc. of the First Merged Int. Parallel Processing Symp. and Symp. on Parallel and Distributed Processing, pp. 244-249, 1998.
[2] M. Schroeder et al., "Autonet: A High-Speed Self-Configuring Local Area Network Using Point-to-Point Links," Technical Report 59, DEC SRC, April 1990.
[3] L. Zakrevski, S. Jaiswal, L. Levitin and M. Karpovsky, "A New Method for Deadlock Elimination in Computer Networks With Irregular Topologies," Proc. of the IASTED Conf. PDCS-99, 1999.
[4] N. Boden et al., "Myrinet: A Gigabit per second Local Area Network," IEEE Micro, pp. 29-35, 1995.
[5] R. W. Horst, "ServerNet™ Deadlock Avoidance and Fractahedral Topologies," Proc. of IEEE Int. Parallel Processing Symp., pp. 274-280, 1996.
[6] L. A. Zakrevski, "Fault-Tolerant Wormhole Message Routing in Computer Communication Networks," Ph.D. Thesis, Boston University, College of Engineering, pp. 55-102, 2000.
[7] W. Dally and H. Aoki, "Deadlock-Free Adaptive Routing in Multiprocessor Networks Using Virtual Channels," IEEE Trans. on Parallel and Distributed Systems, vol. 8, pp. 466-475, 1997.
[8] W. Dally and C. L. Seitz, "Deadlock-Free Message Routing in Multiprocessor Interconnection Networks," IEEE Trans. on Comput., vol. 36, pp. 547-553, 1987.
[9] J. Duato, "A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks," IEEE Trans. on Parallel and Distributed Systems, vol. 4, pp. 1320-1331, 1993.
[10] J. Duato, S. Yalamanchili and L. M. Ni, Interconnection Networks: An Engineering Approach, Los Alamitos, IEEE CS Press, 1997.
[11] E. Fleury and P. Fraigniaud, "A General Theory for Deadlock Avoidance in Wormhole-Routed Networks," IEEE Trans. on Parallel and Distributed Systems, vol. 9, pp. 626-638, 1998.
[12] C. Glass and L. Ni, "The Turn Model for Adaptive Routing," Journal of ACM, vol. 5, pp. 874-902, 1994.
[13] L. M. Ni and P. K. McKinley, "A Survey of Wormhole Routing Techniques in Directed Networks," Computer, vol. 26, pp. 62-76, 1993.
[14] R. Boppana and S. Chalasani, "A Comparison of Adaptive Wormhole Routing Algorithms," Computer Architecture News, vol. 21, no. 2, pp. 351-360, 1993.
[15] R. V. Boppana and S. Chalasani, "Fault-Tolerant Wormhole Routing Algorithms in Mesh Networks," IEEE Trans. on Comput., vol. 44, pp. 848-864, 1995.
[16] B. Ciciani, M. Colajanni and C. Paolucci, "Performance Evaluation of Deterministic Routing in k-ary n-cubes," Parallel Computing, no. 24, pp. 2053-2075, 1998.
[17] Y. M. Boura and C. R. Das, "Fault-Tolerant Routing in Mesh Networks," Proc. of Int. Conf. on Parallel Processing, vol. O, pp. 106-109, 1995.
[18] C. Glass and L. Ni, "Fault-Tolerant Wormhole Routing in Meshes," Proc. of Int. Symp. on Fault-Tolerant Computing, 1993.
[19] L. Zakrevski and M. G. Karpovsky, "Fault-Tolerant Message Routing for Multiprocessors," Parallel and Distributed Processing, pp. 714-731, 1998.
[20] L. Zakrevski and M. G. Karpovsky, "Fault-Tolerant Message Routing in Computer Networks," Proc. of Int. Conf. on PDPA-99, pp. 2279-2287, 1999.
[21] L. Zakrevski and M. Karpovsky, "Unicast Message Routing in Communication Networks with Irregular Topologies," Proc. of CAD-99, 1999.

				