MPLS – Carrier Grade IP Networks

Carrier Grade IP Networks (May 2003)

Colenso van Wyk, Pr.Eng.
   Abstract—The supply of profitable but more demanding
services, such as packetised voice and mission-critical virtual
private networks (VPNs), has drawn the attention of service
providers to the reliability and availability aspects of the IP
network. They require their IP networks to comply with the
same reliability standards as current legacy networks in order
to support real-time services such as voice and video. This
paper looks at the various aspects affecting the reliability of
network elements and services, and shows that in order to
achieve the availability levels of the current legacy networks,
these new IP cores need to be built on a cornerstone
philosophy of reliability and availability rather than having
reliability and availability as a feature.

   Index Terms—Virtual Private Network (VPN), Quality of
Service (QoS), Public Switched Telephone Network (PSTN),
Label Distribution Protocol (LDP), Forwarding Equivalence
Class (FEC), Multi Protocol Label Switching (MPLS), Interior
Gateway Protocol (IGP), Label Switched Path (LSP), Route
Processor (RP), Routing Information Base (RIB).

                      I. INTRODUCTION

   The stringent "five nines", or 99.999%, availability figure
which applies to voice telephony will soon be required of IP
networks. It corresponds to an accumulated outage of 5
minutes per year [19]. The reliability of the current internet
has been studied across different ISPs over several months,
and this reported an average network unavailability
equivalent to 471 minutes per year [7], which equates to
99.910% availability. On occasion, connectivity in the
internet is lost for long periods while routers reconfigure
their routing tables and converge on a new topology. It was
also shown that the internet recovers slowly, with an average
BGP convergence time of 3 minutes, frequently taking up to
15 minutes. By contrast, SDH rings can restore service
through protection switching within 50 ms, and such a glitch
will hardly be noticeable to the user. A study by the
University of Michigan quantified the reasons for network
outages during a year-long study of a particular ISP and
found that 33% of the outages could be directly attributed to
router reliability or maintenance issues [6].

   With these availability figures in mind we can evaluate an
article published in the IEEE Communications magazine,
which states: "All critical elements now exist for
implementing a QoS-enabled IP network". This statement is
true if we look at the recent developments in MPLS
switching and the associated QoS and traffic engineering
mechanisms, but it does not imply any improved reliability
or availability of these services on the QoS-enabled IP
networks.

   High availability is a must if IP networks are to become
the universal infrastructure for high-value applications.
Voice and private line services are the high-revenue,
profitable services for telcos. Trusting these services to
unreliable, unpredictable IP networks would be an
unnecessary risk, which is why, despite predictions to the
contrary, carriers have not done so.

                 II. DEFINING RELIABILITY

   High dependability means several things: robustness and
stability, traffic isolation, traffic engineering, fault isolation,
manageability and the ability to provide a guaranteed QoS.
   A systematic view of the concepts of dependability
consists of three parts: the threats to, the attributes of, and
the means by which dependability is attained [1]. This
relationship is shown in Table 1.

DEPENDABILITY    THREATS       FAULTS
                               ERRORS
                               FAILURES
                 ATTRIBUTES    AVAILABILITY
                               RELIABILITY
                               SAFETY
                               CONFIDENTIALITY
                               INTEGRITY
                               MAINTAINABILITY
                 MEANS         FAULT PREVENTION
                               FAULT TOLERANCE
                               FAULT REMOVAL
                               FAULT FORECASTING
Table 1

   Threats can be defined in terms of faults, errors and
failures. There are several categories of faults which need to
be protected against: design faults, physical faults and
interaction faults. A fault originally causes an error within
one or more components. An error is detected if its presence
is indicated by an error message or error signal. Errors that
are present but not detected are latent errors. A fault is
active when it produces an error.
   Dependability is an integrative concept that encompasses
the basic attributes shown in Table 1. These attributes can be
described using the following definitions:

 Availability: readiness for correct service
 Reliability: continuity of correct service
 Safety: absence of catastrophic consequences on the user
 and the environment
 Confidentiality: absence of unauthorized disclosure of
 information
 Integrity: absence of improper system state alterations
 Maintainability: ability to undergo repairs and
 modifications

   Besides the attributes shown in Table 1, other secondary
attributes can also be defined. An example is robustness,
which indicates the dependability of the system with respect
to external faults. The notion of secondary attributes is
especially relevant for security, where we can distinguish
between accountability, authenticity and non-repudiability.
Dependability can also be referred to as survivability.
   Finally, there are means through which dependability can
be achieved. The development of a dependable system calls
for the combined utilization of a set of four techniques, as
shown in Table 1.

 Fault prevention: how to prevent the occurrence or
 introduction of faults
 Fault tolerance: how to deliver correct service in the
 presence of faults
 Fault removal: how to reduce the number or severity of
 faults
 Fault forecasting: how to estimate the present number,
 future incidence, and likely consequence of faults

   The concept outlined in this section is central to
understanding and mastering the various threats that may
affect a system. These are the elements that we need to use
as building blocks in our philosophy when designing next
generation network elements and services.

            III. VOICE NETWORK DEPENDABILITY

   The PSTN is a widely used, highly reliable, fault tolerant
system. The PSTN's dependability stems from a design that
successfully exploits the loose coupling of system
components. The PSTN has many similarities with other
types of distributed systems, and therefore distributed system
survivability concepts are applicable to the design of the
PSTN.
   The outages which occur in the PSTN can be classified
into seven categories [20]:
    1) Human Company
   Failures caused by humans who are in any way involved
with the telephone company, including company contractors
and company vendors who are in contact with the network
equipment. Some of these errors are caused by procedural
errors, accidental cable cuts and erroneous hardware
replacement.
    2) Human External
   Errors made by humans who are in no way affiliated with
the telephone company. Errors in this category primarily
include cable cuts due to unreported digging.
    3) Software
   Faults directly related to software.
    4) Hardware
   Failure of network components, as well as power outages
and cable corrosion.
    5) Acts of Nature
   Damage caused by rain, snow, lightning, fire, wind and
flooding.
    6) Vandalism
   Intentional harm to the telephone network. This study was
done in the US, and the contribution of vandalism would
therefore be higher if the study were based on local
conditions.
    7) Overload/Congestion
   Failures caused by exceeding the network capacity.

   The contribution of these categories to the availability of
a PSTN under study between 1992 and 1994 [19] is shown
in Table 2.

Category                    Weight
Overload/Congestion         44%
Human Company               14%
Human External              14%
Nature                      18%
Hardware                    7%
Software                    2%
Vandalism                   1%
Table 2

   Table 3 shows the same categories, but with the
information translated into customer minutes. This shows a
significant difference from the data in Table 2.

Category                    Weight
Human Company               25%
Human Others                24%
Hardware failures           19%
Software failures           14%
Acts of Nature              11%
Overload/Congestion         6%
Vandalism                   1%
Table 3

   Despite the enormous size of the PSTN, it averaged an
availability rate better than 99.999 percent in the period of
the study [19]. The question is why the world's largest
computerized distributed system is also amongst the most
reliable. The answer lies in the following:

  A. Reliable Software and Hardware
   Telephone switch manufacturers focus much of their
research on developing highly reliable systems. Designers
devote about half of the software in telephone switches to
error detection and correction. Their software development
process typically incorporates the most sophisticated
practices, supplemented by elaborate quality assurance.
   This is complemented by sophisticated fault-tolerant
hardware design in which parallel processes can detect and
bypass faults internally in the network element within
125 µs. This internal protection activity is transparent to the
network connected to the element.

  B. Dynamic Rerouting
   Intermittent failures are usually not catastrophic. A brief
failure in one network component will not affect national
PSTN availability figures significantly. However, in order for
the PSTN to reroute calls it must keep a good deal of
information globally, and maintaining consistent distributed
databases can require complex interactions amongst system
elements. Two factors can be identified as significant
contributors to a system's safety and dependability:
interactions and coupling.
   Systems with simple linear interactions have components
that affect only other components that are functionally
downstream. Loosely coupled systems have more flexibility
in time constraints, operation sequencing and assumptions
about the environment than do tightly coupled systems.

  C. Loose Coupling
   The PSTN can be classified as loosely coupled because it
can reroute calls dynamically along many paths.

  D. Human Intervention
   Operators monitor telephone switches 24 hours a day and
usually have the ability to modify switch databases on the
fly.

             IV. IP ROUTING CHARACTERISTICS

   It is widely believed that the Internet is a highly fault
tolerant, survivable network. In particular, the Internet is
credited with the ability to route packets around faults
quickly, in a matter of seconds. However, empirical data
gathered over a 10-month period from the experimental
injection and measurement of several hundred thousand
inter-domain routing faults shows that the time required for
Internet backbone routing protocols to re-route around
failures is actually several orders of magnitude longer,
sometimes taking more than 30 minutes [8].

  A. Route Availability and Failure
   Availability can be defined as the duration for which a
path to a network destination was present in a provider's
network. Route availability data analyzed in [8] indicates
that 65% of the internet routes from three providers show a
route availability an order of magnitude lower than that of
the PSTN, which is at 99.999% [19].
   The rate of failure and fail-over can also be analyzed for
inter-domain paths. Failure is defined as the loss of a
previously available routing table path to a given network,
while fail-over of a route represents a change in inter-
domain reachability for that route. If we examine data
gathered in a study [8], we see that 50% of routes from
three providers exhibit a mean time to failure of 15 days or
less. It also shows that 75% or more of the routes failed at
least once within a 30-day period.

  B. Fault Occurrence Characteristics
   Two historical incidents which directly or indirectly
affected the majority of internet backbone paths, causing
major Internet failures, show what the characteristics of
failures are.
   April 25, 1997 – A misconfigured router causes an
effective shutdown of major Internet backbones for up to
two hours.
   November 8, 1998 – A router interoperability issue causes
persistent, pathological oscillation and failure in the
communication between most Internet core backbone
routers [8].

  C. Fault Repair Characteristics
   While the Internet backbone routing protocol, BGP, is
believed to be resilient to faults and to converge on new
routes very quickly, measurements show that the time to
repair in the case of a fault is actually in the order of
minutes, sometimes taking up to 30 minutes. Several
findings have been made through simulation of 400 Internet
routing topologies [8], and the following observations were
made:
    1) The upper bound on delay when a route to a
   destination fails is linearly related to the length of the
   longest possible path between the source and the
   destination.
    2) On average, larger ISPs provide faster repair times
   than smaller ISPs for a given route.
    3) Misconfiguration is frequently the cause of errant
   paths; BGP is currently vulnerable to these and they
   could potentially cause major outages. Secure BGP is
   being investigated to address these vulnerabilities.

   Generally it can be said that the lack of fail-over due to
delayed BGP routing convergence will potentially become
one of the key factors holding back the deployment of IP
networks for carrying highly reliable services. The industry
is reacting to this shortfall and several approaches are being
used to address it. The details are presented in the following
sections.

V. NEXT GENERATION NETWORK ELEMENT REQUIREMENTS

   The high availability figure of 99.999%, which is a
condition for a carrier grade IP network, can only be
achieved by deploying truly fault-tolerant network elements.
   Therefore a new breed of network element needs to be
defined for the purpose of building next generation networks
based on IP cores. The reason for this re-definition is that
traditionally IP cores were primarily used for carrying non-
SLA related traffic. This is now changing, and we need the
core elements of tomorrow to adopt the resilience
characteristics of yesterday's network elements. Relating the
old and the new technologies is not straightforward because
the features are different, but metrics can be defined for
setting realistic element requirements which will ease the
operator's selection process when choosing a suitable
equipment vendor as partner. British Telecom has done
some groundwork in this regard [5] and found the key
concerns prioritized by carriers today to be as shown in
Table 4.

 1   Equipment reliability and stability
 2   Scalability
 3   Performance
 4   Feature support
 5   Management
 6   Total cost of ownership
 7   Environmental considerations
Table 4

   In addition to these, there are also a number of future
requirements identified by carriers. These are shown in
Table 5.

 1   Denial of service attack mitigation
 2   Wire rate performance of interfaces
 3   System access security
 4   Port density improvements
 5   Quality of service support
Table 5

   If we analyze the top three requirements in Table 4, we
can derive associated metrics describing them.

  A. Reliability and Stability
   "The reliability and stability of router hardware and
software are absolutely vital to global carriers as they seek to
deliver a reliable, resilient network capable of meeting
service availability targets. Significant improvements are
required in these areas, particularly in the area of software
reliability, if carriers are to realize their ambitious plans to
support a range of mission critical and real-time services for
business customers" [5]. The areas of interest are shown in
Table 6.

Feature        Requirement            Metric
System         To minimize            Availability to underwrite a
availability   downtime of            maximum of 2 hrs downtime of
               network services       overall network service in 40 yrs
                                      Full system reboot within 180 s
Hardware       1+1 route processor
redundancy     1:N power supply
               1:N fans
               1:N switch fabric
               1:N interface cards
Software       1+1 control plane
redundancy     Core base OS with      Demonstrable isolation of
               separate sub-          software failures
               components
Hitless        Minimize downtime      Full OS upgrade with no loss of
software                              peerings (e.g. BGP)
upgrades                              Sensible fallback
                                      Isolated sub-component upgrade
                                      with no packet loss
               Minimize time,         60% reduction in OS upgrade cost
               complexity and         No loss of device manageability
               impact on service      during software upgrades and no
                                      loss of traffic forwarding
Hitless        Minimize downtime      Hot-swappable line cards, route
hardware       and operational        processor, PSU and fans
upgrades       impact
Software       High MTBF              Comparable with existing PSTN
availability                          switches
Hardware       High MTBF              Comparable with existing PSTN
availability                          switches
Table 6

  B. Scalability
   A core router must be able to evolve to satisfy changing
network demands without requiring full platform
replacement. The hardware of a core router must be
expandable in service and the software must scale
proportionally [5]. Scalability should be possible over a
five-year lifetime without the need for chassis replacement,
and it should be possible to upgrade any equipment without
affecting traffic.

  C. Performance
   The essential features here are wire speed packet
forwarding and efficient protocol implementations. Metrics
could be defined to evaluate efficient protocol
implementations.
   In the case of BGP we could derive a route capacity and a
route flap profile associated with this capacity, both short
term and continuous. BGP convergence time could also be
specified. For IGPs we could set metrics for total route table
capacity, TE support and convergence time. Several metrics
could be derived for MPLS, including LDP capabilities, the
number of TE tunnels supported, the number of FECs
supported, as well as protection performance.
   The remaining concerns are no less important, but
defining the first three concerns in more detail gives an
impression of what a core router should look like.

          VI. MAKING IP ROUTING MORE RELIABLE

   State-of-the-art Border Gateway Protocol (BGP) routers,
after booting, initializing interfaces, establishing Interior
Gateway Protocol (IGP) relationships and synchronizing the
IGP databases, require anywhere from two to ten minutes, or
more, to retrieve all the BGP paths from neighboring BGP
routers. A single software upgrade or control card failure on
most of today's routers thus ruins a service provider's
chances of achieving the desired 'five 9s' network
availability. This high availability figure is still required in
MPLS cores, as network layer reachability needs to be
learned via some mechanism (BGP today). The problem
becomes more apparent, not less visible as some would
claim, when service providers begin to deploy MPLS-based
Virtual Private Networks (VPNs), since the network
reachability information must first be distributed and aligned
to update the routing tables before BGP can distribute
Virtual Local Area Network (VLAN) membership
information over the network.
   While there is wide consensus about the need for a highly
available internetwork, suppliers and standardization bodies
are formulating different solutions. Basically there are two
schools of thought: in-the-box solutions with non-stop
routing, and network solutions that rely on the participation
of the network, or at least of the immediate neighbors.
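The availability targets discussed above (the five nines requirement, and the Table 6 metric of at most 2 hours of downtime in 40 years) and the benefit of 1+1 route processor redundancy can be sanity-checked with a short calculation. This is an illustrative sketch; the MTBF and MTTR figures used below are hypothetical examples, not vendor data.

```python
# Illustrative availability arithmetic for the targets in Table 6.
# The MTBF/MTTR numbers below are hypothetical, not vendor data.

HOURS_PER_YEAR = 365 * 24  # 8760

def annual_downtime_minutes(availability: float) -> float:
    """Expected accumulated outage per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR * 60

def availability_single(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability of one repairable unit."""
    return mtbf_h / (mtbf_h + mttr_h)

def availability_1plus1(a: float) -> float:
    """1+1 redundancy: service fails only if both units are down
    (assumes independent failures and repairs)."""
    return 1.0 - (1.0 - a) ** 2

# Five nines corresponds to roughly 5 minutes of outage per year.
print(round(annual_downtime_minutes(0.99999), 1))

# Table 6 target: at most 2 hours of downtime in 40 years,
# i.e. an availability slightly tighter than five nines.
a_target = 1.0 - 2.0 / (40 * HOURS_PER_YEAR)
print(f"{a_target:.7f}")

# A single route processor with a hypothetical 20,000 h MTBF and
# 4 h MTTR reaches only about three nines; duplicating it (the 1+1
# route processor row of Table 6) takes the pair past five nines.
a_rp = availability_single(20_000, 4)
print(f"{a_rp:.6f}", f"{availability_1plus1(a_rp):.9f}")
```

This is why Table 6 pairs the availability target with 1+1 control plane redundancy: no realistic single processor reaches five nines on its own, but two independently repairable processors comfortably can.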
MPLS – Carrier Grade IP Networks

  A. Network solutions or graceful restart                       availability. Forwarding continues even during flapping of
   The industry has adopted various names for the network        routes caused by network instabilities. In the public network
based solutions . Amongst these are graceful restart             environment, which is sensitive to instability and subject to
mechanisms, restart signaling mechanisms or hitless restart      denial-of-service attacks, the lack of a clear separation
mechanisms. A selection of these mechanisms is applicable        between these functions has frequently proved disastrous for
to the Inter-Domain Routing (IDR) protocol (i.e. BGP) as         operators, who experience a negative press and an exodus of
well as to the IGPs (e.g. OSPF and IS-IS). These network         customers after being hit by a network outage. Implementing
solutions rely on the ability of routers to keep their data      fully duplicated processing within the control plane and
plane operational while restarting their control plane. The      redundant communication towards the interfaces of the
forwarding engines continue to route packets according to        router is a first step towards non-stop routing but is not
the network topology learned up to the beginning of the          sufficient. Both Route Processors (RP) in the control plane
restart process. This blind operation poses a significant risk   communicate with the protection switched interfaces of the
of routing loops and black holes in the network if routing       router via a switch fabric consisting of several redundant
information changes before the restarting router completes       planes. The next step towards non-stop routing is achieved
it’s routing table updates and convergence.                      by both RPs operating in symbiosis. One is selected as the
   Topology changes which occur during the restart are           active RP while the other one is maintained in a hot standby
ignored and introduced later when the control plane recovers     mode of operation. Each routing protocol on the active
from its outage. Obviously, the neighboring routers are          control card reconciles its data and protocol state with the
informed about the preservation of the forwarding state of a     related protocols on the inactive card. This reconciliation is
router during restart. This is done through a “I will be back”   done sequentially, protocol by protocol, incrementally
indication to the neighbour. This prevents them from             keeping track of changes during reconciliation. In the case of
resetting their adjacencies (i.e. routing information to the     TCP, which is used by BGP, the inactive TCP/IP stack is
neighboring router or routers), and ensures that packets are     kept synchronized to the active. All the other transport
still routed through the restarting router. Consequently, the    protocols are stateless, so synchronization is unnecessary.
network solution will only work if all neighboring routers       The RIB of the inactive RP is updated by the routing
provide support for the restart procedure. The ability to        protocols running on the inactive RP. Once both RPs are
remain on the forwarding path during restart is also known       fully aligned, an activity switch can be triggered at any time
as “non-stop forwarding” or “headless routing”; it is not to     either by a failure detection mechanism or by the operator.
be confused with non-stop routing. Network solutions only
minimize the negative effects on routing resulting from
planned control plane restarts; they do not solve the
underlying problem.
   Another approach used by IP network providers is to
duplicate unreliable routing elements in order to provide a
more dependable service. This approach has limitations in
terms of implementation complexity and increases in both
CAPEX and OPEX.

  B. Non-Stop Routing
   The principle of non-stop routing is similar to that
implemented by PSTN switch manufacturers. It enables the
element to perform internal error detection, protection and
restoration transparently to the network. This mechanism
allows the operator to overcome one of the major stumbling
blocks in achieving high availability on IP networks, namely
the re-convergence time of BGP and of the IGP in the case
of MPLS.
                                                                 System to Intermediate System (IS-IS) intra-domain routing
    1) Principles of non-stop routing
   In any given router architecture, one can distinguish
between the "muscles" of a router (the forwarding engine in
the data plane) and the "brains" of a router (the routing
engine in the control plane). The forwarding engine moves
incoming packets through the destination lookup process at
wire speed, maximizing the throughput of the router. The
routing engine calculates the most efficient route to any
Internet destination from the control information received in
routing protocol updates, and downloads that information to
the forwarding engine. Separating the forwarding and
routing engines dramatically improves overall system
availability.
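The control/data plane split described above can be sketched in a few lines of Python. This is purely illustrative; the class and field names are hypothetical and not drawn from any vendor implementation. The routing engine digests protocol updates into a routing table and downloads a forwarding table to the forwarding engine, which only performs longest-prefix-match lookups:

```python
import ipaddress

class ForwardingEngine:
    """Data plane: longest-prefix-match lookup only, no protocol logic."""
    def __init__(self):
        self.fib = {}  # prefix -> next hop

    def install(self, fib):
        self.fib = dict(fib)  # swap in the table downloaded by the control plane

    def lookup(self, dst):
        addr = ipaddress.ip_address(dst)
        best = None
        for prefix, nh in self.fib.items():
            net = ipaddress.ip_network(prefix)
            # Longest matching prefix wins.
            if addr in net and (best is None or net.prefixlen > best[0]):
                best = (net.prefixlen, nh)
        return best[1] if best else None

class RoutingEngine:
    """Control plane: digests routing updates, then pushes a FIB down."""
    def __init__(self, fe):
        self.rib = {}  # prefix -> (preference, next hop)
        self.fe = fe

    def update(self, prefix, next_hop, preference):
        current = self.rib.get(prefix)
        # Lower preference wins, like an administrative distance.
        if current is None or preference < current[0]:
            self.rib[prefix] = (preference, next_hop)
            self.fe.install({p: nh for p, (_, nh) in self.rib.items()})

fe = ForwardingEngine()
engine = RoutingEngine(fe)
engine.update("10.0.0.0/8", "192.0.2.1", preference=20)
engine.update("10.1.0.0/16", "192.0.2.2", preference=20)
print(fe.lookup("10.1.2.3"))  # -> 192.0.2.2 (longest match wins)
```

The point of the separation is visible in the code: the forwarding path never touches protocol state, so the control plane can be restarted or replaced without stopping packet forwarding.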
    2) Benefits of non-stop routing
   The implementation of non-stop routing increases network
stability and availability during both planned and unplanned
switching between control cards. Before this functionality
was available, a router would temporarily disappear from the
network topology, causing route flapping and recomputation
of shortest paths by its peers. When the router returned to
life, an intensive exchange of routing information would take
place, again raising the risk of routing storms and route
flapping (as neighboring routing tables were updated), which
could result in transient routing loops and packet loss in the
network. With non-stop routing, the neighboring routers are
totally unaware of control card activity switches resulting
from failures or software upgrades. The routing topology
and reachability are unaffected. Not only is the inter-domain
routing protocol BGP made non-stop, but so are the interior
gateway protocols, such as Open Shortest Path First (OSPF)
and the Open Systems Interconnection (OSI) Intermediate
System to Intermediate System (IS-IS) intra-domain routing
exchange protocol. As BGP runs over a Transmission
Control Protocol (TCP) session, the TCP state is also
preserved to achieve a true non-stop control plane.
   Non-stop routing enables hitless software upgrades.
Software can be upgraded on the inactive processor; activity
is then switched over from the active processor and the new
software run, without causing a single route to flap in the
network. Using non-stop routing, the router and the network
are also protected against hardware failures. If the active
control card hardware fails for any reason, the inactive card
takes over, again without causing a single route to flap in the
network. Similarly, non-stop routing protects against
software failures. If a software fault causes the active control
card to crash, the inactive card takes over and, if it was the
routing code that caused the crash, immediately tears down
the peering session that caused the failure, without flapping
any other routes in the network.
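One way to picture how an activity switch can be invisible to peers is state checkpointing: every routing and TCP state change is mirrored to the standby card before the active card acts on it, so the standby never lags behind. The sketch below is a simplified illustration under that assumption; all class and field names are hypothetical:

```python
import copy

class ControlCard:
    """Holds the replicated control plane state of one processor card."""
    def __init__(self):
        self.rib = {}        # prefix -> next hop
        self.tcp_state = {}  # BGP peer -> last TCP sequence number seen

    def apply(self, prefix, next_hop, peer, seq):
        self.rib[prefix] = next_hop
        self.tcp_state[peer] = seq

class NonStopRouter:
    """Mirror every state change to the standby card before acting on it."""
    def __init__(self):
        self.active = ControlCard()
        self.standby = ControlCard()

    def receive_update(self, prefix, next_hop, peer, seq):
        # Checkpoint first, so the standby is never behind the active card.
        self.standby.apply(prefix, next_hop, peer, seq)
        self.active.apply(prefix, next_hop, peer, seq)

    def activity_switch(self):
        # Failover: the standby becomes active with identical RIB and TCP
        # state, so peers see no session reset and no route flap.
        self.active, self.standby = self.standby, copy.deepcopy(self.standby)

r = NonStopRouter()
r.receive_update("10.0.0.0/8", "192.0.2.1", peer="203.0.113.5", seq=1001)
before = dict(r.active.rib)
r.activity_switch()
assert r.active.rib == before  # state preserved across the switch
```

Because the TCP sequence state survives the switch, the surviving card can continue the BGP session mid-stream instead of re-establishing it, which is what keeps the event invisible to neighbors.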
    3) Non-Stop Routing Performance
   Implementations of non-stop routing have been tested by
independent labs, which found "the software to be a state of
the art resiliency technology providing the switch with the
ability to recover from both hardware and software failures
with zero impact on both forwarding and routing protocols"
[21]. Tests also found that neither software upgrades,
hardware failures nor software failures caused any loss of
BGP peering, TCP retransmissions or packet loss in the data
plane [22].
  C. Path protection and restoration
   Another method introduced with MPLS is the use of fast
reroute, a node protection mechanism deployed to minimize
packet loss during LSP failure by rerouting traffic onto
backup LSPs. The restoration time for services using these
protection mechanisms was recently tested in an
interoperability event amongst multiple vendors, and the
restoration time for services across the LSPs was measured
at between 14 ms and 34 ms [11].
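The local-repair idea behind fast reroute — the backup LSP is signalled in advance, so the node detecting the failure can switch traffic immediately instead of waiting for control plane reconvergence — can be sketched as follows (illustrative only; the hop names and class structure are hypothetical):

```python
class LabelSwitchedPath:
    """An LSP with a presignalled backup path for fast reroute."""
    def __init__(self, primary, backup):
        self.primary = primary  # list of hops for the working path
        self.backup = backup    # detour computed and signalled in advance
        self.active = primary

    def link_failed(self, a, b):
        # Point of local repair: if the failed link is on the active path,
        # switch to the presignalled backup at once, with no reconvergence.
        hops = list(zip(self.active, self.active[1:]))
        if (a, b) in hops:
            self.active = self.backup

lsp = LabelSwitchedPath(primary=["A", "B", "C", "D"],
                        backup=["A", "E", "F", "D"])
lsp.link_failed("B", "C")
print(lsp.active)  # -> ['A', 'E', 'F', 'D']: traffic follows the detour
```

The tens-of-milliseconds restoration times cited above are possible precisely because the detour already exists; the only work at failure time is the switchover itself.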
  D. Operation, Administration and Maintenance (OAM)
   The primary goal of OAM is to ensure that the customer is
receiving the expected service. The MPLS Forum is currently
developing OAM functionality to be used in the MPLS
plane. The functions to be performed include connectivity
verification, forward defect indication and reverse defect
indication. Other OAM mechanisms include the definition of
LSP Ping, which will offer a diagnostic ability for service
verification across MPLS networks.
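A common pattern behind such connectivity verification is to declare a defect only after several consecutive probes go unanswered, so that a single lost packet does not raise a false alarm. The following is a minimal sketch of that pattern; the threshold value and names are assumptions for illustration, not taken from the MPLS Forum drafts:

```python
class OamMonitor:
    """Declare a connectivity defect after `threshold` consecutive lost probes."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.missed = 0
        self.defect = False

    def probe_result(self, reply_received):
        if reply_received:
            # Any successful probe clears the defect state.
            self.missed = 0
            self.defect = False
        else:
            self.missed += 1
            if self.missed >= self.threshold:
                self.defect = True  # e.g. trigger a forward defect indication

mon = OamMonitor()
for reply in (True, False, False, False):
    mon.probe_result(reply)
print(mon.defect)  # -> True: three consecutive probes lost
```

In a real deployment the defect event would drive the forward and reverse defect indications mentioned above, alerting both ends of the LSP that the service is impaired.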
                       VII. CONCLUSION
   When we plan to build a new MPLS-based infrastructure
for hosting future IP-based telecommunications services, we
need to ensure that the reliability of this network is equal to
or better than that provided by current solutions. The most
important objective is to achieve at least 99.999%
availability on this new network. In order to achieve this we
will have to evaluate more than the element's feature set and
take a holistic view of the overall architecture, the network
design philosophy and methodology, and the track record of
the equipment vendor in designing, building and deploying
highly resilient service platforms.

                       VIII. REFERENCES
[1]   Fundamental concepts of dependability, A. Avizienis et al.,
      research report, University of Newcastle
[2]   MPLS: adding value to networking, Rudy Hoebeke et al.,
      Alcatel technology whitepaper
[3]   Building reliable IP networks with non-stop routing, Alcatel
      technology whitepaper
[4]   Carrier grade IP, Daniel Hoefkens, Alcatel technology paper
[5]   Carrier requirements of core IP routers 2002, BT Exact
[6]   Reliable routing and forwarding, BT Exact Technologies
[7]   Is IP going to take over the world of communications?, Pablo
      Molinero-Fernandez et al., Stanford University
[8]   Resilience characteristics of the Internet backbone routing
      infrastructure, Craig Labovitz et al., Microsoft Research
[9]   Next generation core routing switches, Alcatel technical paper
[10]  Fast restoration of real-time communication service from
      component failures in multi-hop networks, Seungjae Han and
      Kang G. Shin, The University of Michigan
[11]  MPLS: resilient and scalable, MPLS Forum and EANTC test
      report, MPLS World Congress 2003
[12]  MPLS: architectural considerations for OAM, presentation at
      MPLS World Congress 2003
[13]  T1A1.2/98-001R6, Reliability and survivability aspects of
      interactions between the Internet and the PSTN
[14]  T1A1.2/2003-006, draft standard, Reliability of next
      generation network elements
[15]  Performance and QoS testing, Sebastien Maillet, Next
      Generation Networks
[16]  7670 RSP IP performance and QoS test results, BT Exact
[17]  T1.TR.78-2003, technical report, Access availability of
      routers in IP networks
[18]  T1.TR.70-2001, technical report, Reliability framework for
      IP-based services
[19]  Sources of failure in the PSTN, D. Richard Kuhn, National
      Institute of Standards and Technology
[20]  Lessons from the PSTN for dependable computing, P.
      Enriquez et al., a study of FCC disruption reports
[21]  Alcatel 7770 summary report, BT Exact Technologies test
[22]  Alcatel 7670 RSP non-stop routing and routing protocol test,
      BT Exact Technologies

                       IX. BIOGRAPHY
   Colenso van Wyk is a Professional Engineer working as
Principal Systems Consultant in the Alcatel Fixed Network
Division. Based in South Africa, he has been working in the
telecommunications industry as end user, channel, consultant
and equipment manufacturer since 1992. During this ten-year
period he has successfully completed projects ranging from
software design and implementation to circuit, packet, cell
and MPLS switching network design.

      e-mail :
      Tel : +27 11 542 3000
      Fax : +27 11 542 3284
      P.O. Box 8443
      Halfway House, 1685
