APS

Automatic Protection Switching
  Yaakov (J) Stein
  CTO
  RAD Data Communications




                        Mar 2012
Course Outline
• General protection switching principles

• Examples of protection mechanisms

    • SONET/SDH
    • Ethernet linear protection
    • Ethernet ring protection
    • MPLS fast reroute
    • MPLS-TP APS




                                            Y(J)S APS Slide 2
General principles

      Definition
      References
      Traffic types
      Network topologies
      Triggers
      Protection classes
      Entities
      Protection types
      Signaling



                           Y(J)S APS Slide 3
                                       Definition
Automatic Protection Switching (APS)
   is a functionality of carrier-grade transport networks
   is often called resilience
          since it enables service to quickly recover from failures
   is required to ensure high reliability and availability

APS includes :
   detection of failures (signal fail or signal degrade) on a working channel
   switching traffic transmission to a protection channel
   selecting traffic reception from the protection channel
   (optionally) reverting back to the working channel once failure is repaired

Automatic means that APS uses (at most) control plane protocols
   – no management layer or manual operations needed

                                                                                  Y(J)S APS Slide 4
                       Some useful references
G.808.1 – generic linear protection
G.808.2 – generic ring protection (not yet written)
G.841 and G.842 – SDH
G.774.3/4/9/10 – SDH protection management
G.870 and G.873.1 – OTN
G.8031 – Ethernet linear protection
G.8032 – Ethernet ring protection
G.8131 – T-MPLS APS
Y.1720 – MPLS
I.630 – ATM
M.495 – analog signal protection
G.781 – clock selection (can be used to protect synchronization)
RFC 4090 – MPLS Fast ReRoute
RFC 6372 – MPLS-TP Survivability Framework
RFC 6378 – MPLS-TP Linear Protection


                                                                   Y(J)S APS Slide 5
                                   Traffic types
In a network with APS capabilities, there are three types of traffic :

   protected traffic
    – traffic that may be rapidly switched to protection channel
    –   at any time it may be on the working channel or protection channel

   Nonpreemptible Unprotected Traffic (NUT)
    – noncritical traffic that does not require protection mechanism
    – not affected by protection mechanism
    – somewhat less expensive to customer

   extra (preemptible) traffic
    – best effort background traffic that runs on protection channel
    – preempted (blocked) when protection channel is needed
    – very inexpensive to customer


                                                                             Y(J)S APS Slide 6
                         Network topologies
APS can be defined for any topology with redundant links
    e.g., for tree topologies no protection is possible
We will often discuss protection of individual links
However, there are two topologies that are of particular interest :

   rings
     – protection is natural for rings
           although there are other reasons for using rings as well

     – rings are so important that protection for other topologies
           is often called linear protection




   dense meshes
     – for this topology multiple local bypasses can be preconfigured
     – protection switching is similar to routing change, but faster
          often called “Fast ReRoute” (FRR)




                                                                        Y(J)S APS Slide 7
                                           Triggers

Protection switching is usually triggered by a failure
    although the operator may manually force a protection switch

A failure is declared when a fault condition
    persists long enough
         for the ability to perform the required function
         to be considered terminated
Failures are Signal Fail (SF) or Signal Degrade (SD) (of various types)
and may be :
   detected by physical layer
   indicated by signaling (e.g. AIS)
   detected by OAM mechanisms

When there is no SF or SD, the state is called No Request (NR)

                                                                          Y(J)S APS Slide 8
                                             Switching time (1)
SONET/SDH protection switching takes place in under 50 ms
Regarding multiplex section shared protection rings, G.841 states :
The following network objectives apply:
      1) Switch time – In a ring with no extra traffic, all nodes in the idle state (no detected failures,
      no active automatic or external commands, and receiving only Idle K-bytes), and with less
      than 1200 km of fibre, the switch (ring and span) completion time for a failure on a single
      span shall be less than 50 ms. On rings under all other conditions, the switch completion
      time can exceed 50 ms (the specific interval is under study) to allow time to remove extra
      traffic, or to negotiate and accommodate coexisting APS requests.

while for linear VC trail protection, it says :
The following network objectives apply:
      1) Switch time – The APS algorithm for LO/HO VC trail protection shall operate as fast as
      possible. A value of 50 ms has been proposed as a target time. Concerns have been
      expressed over this proposed target time when many VCs are involved. This is for further
      study. Protection switch completion time excludes the detection time necessary to initiate the
      protection switch, and the hold-off time.

There are similar statements in other clauses as well
                                                                                                             Y(J)S APS Slide 9
                                  Switching time (2)
This 50 ms time has become the gold standard
    and new protection schemes are expected to meet this objective
However, studying the literature that led up to SONET/SDH standards
    shows that the objective was to attain the minimum possible time
    for the sum of
    –   persistent (i.e. non-transient) failure detection
    –   speed of light propagation
    –   signaling protocol time
    –   regaining sync alignment
and 50 ms was the minimum that was considered practical !
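
The four contributions listed above can be added up in a rough budget. This is only a sketch: the 1200 km fibre length comes from the G.841 text quoted earlier, while the propagation delay of about 5 microseconds per km and the detection, signaling, and resync figures are assumed values chosen for illustration.

```python
# Rough protection-switch time budget.
# Only the 1200 km fibre length is taken from the G.841 quote;
# every other number here is an assumption for illustration.
FIBRE_KM = 1200
US_PER_KM = 5.0   # assumed propagation delay in fibre (microseconds per km)


def switch_time_ms(detect_ms, fibre_km, signaling_ms, resync_ms,
                   us_per_km=US_PER_KM):
    """Sum detection, propagation, signaling and resync times (ms)."""
    propagation_ms = fibre_km * us_per_km / 1000.0
    return detect_ms + propagation_ms + signaling_ms + resync_ms


# Hypothetical component values: 10 ms detection, 5 ms signaling, 5 ms resync.
budget = switch_time_ms(detect_ms=10.0, fibre_km=FIBRE_KM,
                        signaling_ms=5.0, resync_ms=5.0)
print(f"estimated switch time: {budget:.1f} ms")
```

With these assumed figures, propagation alone takes 6 ms of the budget, which suggests why ring length appears explicitly in the objective.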
Many modern standards have “built in” 50 ms
    and much marketing literature boasts “faster than 50 ms”
But there is really nothing special about 50 ms
   50 ms gaps in voiced speech are noticeable,
       but not fatal if infrequent
   50 ms of data at high rates cannot be stored and later forwarded
   timing circuits can withstand much more than 50 ms without clock
                                                                        Y(J)S APS Slide 10
                              Protection classes

It is useful to distinguish two different protection classes

   path protection (AKA trail protection, end-to-end protection)
    – when a failure is detected on the end-to-end path
        we switch to an alternative end-to-end path
    – the failure is usually detected by end-to-end OAM

   local protection (AKA local restoration, SNC protection, bypass, detour)
    – we protect individual network elements, links, or groups of same
    – when such an entity fails
        only that local entity is bypassed
    – the failure may be detected by link OAM or physical layer means


                                                                        Y(J)S APS Slide 11
                                APS entities (1)
The following entities are important in APS
   working channel – channel used when no failure exists
   protection channel – channel used when a failure exists
   head-end – entity transmitting data to working/protection channel
   tail-end – entity receiving data from the working/protection channel

Note:    we will usually consider traffic to be bidirectional
         so that the head-end for one direction
         is the tail-end for the opposite direction

                              working channel


                             protection channel
          head-end                                       tail-end



                                                                           Y(J)S APS Slide 12
                                  APS entities (2)
   Bridge – function at head-end that connects traffic (including extra traffic) to the
    working and protection channels
   Selector – function at tail-end that extracts traffic (perhaps extra traffic) from
    the working or protection channel
   APS signaling channel – channel used to communicate between head-
    end and tail-end for APS purposes
   Trail termination – function responsible for failure detection
    including injection and extraction of OAM

                                  working channel
        head-end                                                        tail-end
                                  protection channel
         (bridge)                                                      (selector)
                                  signaling channel


                                                                                    Y(J)S APS Slide 13
                          Revertive operation

Reversion means returning to use the working channel
   after the failure has been rectified
Protection mechanisms can be revertive or nonrevertive

Revertive mechanisms may be preferable
  when the working channel has better performance (free BW, BER, delay)
  when there are frequent switches (easier to manage)
  when there is extra traffic
but nonrevertive also has advantages
   only one service disruption due to protection switching
   may be simpler to implement
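
The difference between the two modes can be sketched as a small decision function. The function name and the idea of comparing elapsed time against a waiting period (the Wait To Restore timer that appears later in this deck) are illustrative assumptions, not text from any standard.

```python
def channel_after_clear(revertive, seconds_since_clear, wtr_seconds):
    """Return which channel carries traffic after the working-channel
    failure has cleared (hypothetical helper for illustration)."""
    if not revertive:
        # Nonrevertive: stay on protection, so no second disruption.
        return "protection"
    if seconds_since_clear < wtr_seconds:
        # Revertive, but the waiting period has not yet expired.
        return "protection"
    # Revertive and the waiting period is over: go back to working.
    return "working"


# Assumed waiting period of 300 s, purely as an example value.
print(channel_after_clear(revertive=True, seconds_since_clear=10, wtr_seconds=300))
print(channel_after_clear(revertive=True, seconds_since_clear=300, wtr_seconds=300))
print(channel_after_clear(revertive=False, seconds_since_clear=900, wtr_seconds=300))
```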



                                                                     Y(J)S APS Slide 14
                                 Uni/bi-directional
  We will usually consider bidirectional traffic
  but even then the failures can be uni- or bi- directional
  and for unidirectional failures there can be uni- or bi- directional switching

[Figure: two panels.
  Unidirectional failure with unidirectional protection:
     one direction uses the protection channel, the other stays on the working channel
  Bidirectional failure with bidirectional protection:
     both directions use the protection channel]
                                                                                      Y(J)S APS Slide 15
                   Uni- / bi- directional switching
Unidirectional switching may be advantageous
   for 1+1 - faster and no signaling channel is needed
   no unnecessary service disruption for direction without failure
   higher chance of protection under multiple failures
   easier to implement for local protection
   maintains extra traffic in direction without failure

But bidirectional may be preferable
   easier management since directions traverse same network elements
   does not disrupt delay balance between directions
   may simplify repair since failed spans are unused




                                                                      Y(J)S APS Slide 16
                             Protection types

We distinguish several different protection types
  1+1
  1:1
  1:n
  m:n
  (1:1)n

Each type has its applicability, advantages, and disadvantages
and there are trade-offs between
   simplicity
   BW consumption
   protection switch time
   signaling requirements


                                                                 Y(J)S APS Slide 17
                                  1+1 protection
Simplest and fastest form of protection
   but wasteful - only 50% of actual physical capacity is used
Head-end bridge always sends data on both channels
Tail-end selector chooses channel to use (based on BER, dLOS, etc.)
For unidirectional 1+1 switching there is no need for APS signaling
If non-revertive
     there is no distinction between working and protection channels
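
A tail-end selector for 1+1 might look roughly like the sketch below. The `ChannelStatus` fields, the BER threshold of 1e-6, and the rule of staying on the current channel unless it is bad are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChannelStatus:
    name: str
    los: bool     # loss-of-signal defect (dLOS) present
    ber: float    # measured bit error rate


def select_channel(current, other, ber_threshold=1e-6):
    """Stay on the current channel unless it has failed or degraded
    and the other channel is healthy (nonrevertive behaviour)."""
    current_bad = current.los or current.ber > ber_threshold
    other_ok = (not other.los) and other.ber <= ber_threshold
    return other if (current_bad and other_ok) else current


a = ChannelStatus("channel A", los=True, ber=0.0)    # hypothetical fibre cut
b = ChannelStatus("channel B", los=False, ber=1e-9)
print(select_channel(a, b).name)
```

Because the head-end bridges permanently, this decision is purely local to the tail-end, which is why no signaling is needed in the unidirectional case.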



                                    channel A




                                    channel B


                                                                       Y(J)S APS Slide 18
                                  1:1 protection
Head-end bridge usually sends data on working channel
When failure detected it starts sending data over protection channel
  and tail-end needs to select the protection channel
When not in use, protection channel can be used for extra traffic

However, since failure is detected by tail-end, APS signaling is needed

Protection channel should have OAM running to ensure its functionality

                              working channel


                                extra traffic
                              protection channel

                               APS signaling
                                                                          Y(J)S APS Slide 19
                                   1:n protection
One protection channel is allocated for n working channels
Can protect only one working channel at a time
but improbable that more than 1 working channel will simultaneously fail
Only 1/(n+1) of total capacity is reserved for protection




                                 working channels

                                  protection channel

                                                                       Y(J)S APS Slide 20
                                 m:n protection
To enable protection of more than 1 channel
m protection channels are allocated for n working channels (m < n)
m simultaneous failures can be protected
Less protection capacity dedicated than for n times 1:1
When failure detected,
  1 of the m protection channels needs to be assigned and signaled
High complexity but conserves resources
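
The capacity trade-off between the types can be made concrete with a short calculation. This sketch assumes all channels have equal capacity and ignores any extra traffic carried on idle protection channels; the function name is hypothetical.

```python
from fractions import Fraction


def protection_fraction(m, n):
    """Fraction of total capacity reserved for protection when
    m protection channels guard n working channels (equal sizes assumed)."""
    return Fraction(m, m + n)


for label, m, n in [("1+1 or 1:1", 1, 1), ("1:4", 1, 4),
                    ("2:8", 2, 8), ("8 x 1:1", 8, 8)]:
    print(f"{label:>10}: {protection_fraction(m, n)} of capacity is protection")
```

For 1:n this reproduces the 1/(n+1) figure, and 2:8 reserves far less than eight separate 1:1 pairs would.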




                                working channels



                                protection channels
                                                                     Y(J)S APS Slide 21
                                (1:1)n protection
This is like n times 1:1 but the n protection channels share bandwidth
Only 1 failed working channel can be protected
This is different from 1:n since
 n protection channels are preconfigured
 n working channels need not be of the same type

Protection bandwidth must be at least that of the largest working channel
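
As a small worked example of the bandwidth rule, the sketch below compares the shared protection bandwidth of (1:1)n with what n independent 1:1 pairs would reserve. The channel rates are made-up values in Mb/s.

```python
def shared_protection_bw(working_bw):
    """(1:1)n: the shared protection bandwidth must cover the largest
    working channel, since only one failure is protected at a time."""
    return max(working_bw)


def dedicated_protection_bw(working_bw):
    """n times 1:1: every working channel has its own protection channel."""
    return sum(working_bw)


rates = [155, 622, 155, 2488]   # hypothetical working-channel rates (Mb/s)
print("shared (1:1)n :", shared_protection_bw(rates))
print("n times 1:1   :", dedicated_protection_bw(rates))
```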




                                                                         Y(J)S APS Slide 22
                                  APS algorithm

We have seen that protection switching is a tricky business
So it is not surprising that network elements that support APS
    run an APS algorithm

This algorithm inputs :
   configuration (protection type, revertive?, available channels, …)
   failure indications (NR, SF, SD)
   operator commands
   APS signaling (more on that soon)
and makes switching decisions

The algorithm maintains state information for head-end and tail-end

APS algorithms are detailed in standards documents

                                                                         Y(J)S APS Slide 23
                                        Priority

Not every failure event / operator command results in a protection switch

For example
    in 1:n protection the protection channel may already be in use !

Conflicts are resolved by assigning priorities to events/commands

When an event is detected or a command received
    the APS algorithm will not act
    if an event/command of equal or higher priority is already in effect

True failure conditions usually have higher priority than manual commands
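
The priority rule can be sketched as a comparison between the new request and the one already in effect. The numeric ordering below is an assumption for illustration only; each standard defines its own full priority table.

```python
from enum import IntEnum


class Request(IntEnum):
    # Assumed ordering, lowest to highest; not taken from any standard.
    NR = 0              # No Request
    MANUAL_SWITCH = 1   # operator command
    SD = 2              # Signal Degrade
    SF = 3              # Signal Fail


def should_act(new, in_effect):
    """Act only if the new request strictly outranks the one in effect."""
    return new > in_effect


print(should_act(Request.SF, Request.NR))             # failure on an idle system
print(should_act(Request.SF, Request.SF))             # protection already in use
print(should_act(Request.MANUAL_SWITCH, Request.SD))  # command loses to a failure
```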




                                                                           Y(J)S APS Slide 24
                                          Timers
Even failure events with priority are not acted upon immediately
   to do so would cause unnecessary switches after transient defects
The APS algorithm may maintain several timers, such as
   Holdoff timers
    – the time between detection of a SF or SD event
       and the APS algorithm acting upon this event
    – the algorithm usually used is called “peek twice”
       i.e., the condition is checked again after the timer expires
   Wait To Restore timer
    – for revertive switching, the time between detection of the failure being
       cleared and the APS algorithm acting upon this event
    –   also used in SDH optimized bidirectional 1+1 (nonrevertive)
   Guard timer
    – for rings – blockout time during which APS messages are ignored (since
        they may be old and outdated)
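
The "peek twice" holdoff behaviour can be sketched as follows. The defect is modelled as a function of time in milliseconds; that modelling choice, the function names, and the 50 ms holdoff value are assumptions for illustration.

```python
def peek_twice(defect_at, detected_ms, holdoff_ms):
    """Return True if a defect seen at detected_ms is still present
    when the holdoff timer expires (the second peek)."""
    if not defect_at(detected_ms):
        return False                      # first peek: nothing to hold off
    return defect_at(detected_ms + holdoff_ms)


# Hypothetical defect profiles (times in ms).
def transient(t_ms):
    return 100 <= t_ms < 120              # clears after 20 ms


def persistent(t_ms):
    return t_ms >= 100                    # never clears


print(peek_twice(transient, detected_ms=100, holdoff_ms=50))
print(peek_twice(persistent, detected_ms=100, holdoff_ms=50))
```

The transient defect is gone by the second peek, so no switch is triggered, while the persistent one is acted upon.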

                                                                       Y(J)S APS Slide 25
                                     APS signaling

In all types except unidirectional 1+1, some APS signaling is needed
APS signaling is used to synchronize between head-end and tail-end
It is critical that head-end and tail-end always be in the same state

Example messages include :
   No Request (NR)
   by tail-end to inform head-end of Signal Failure (SF)
   by head-end to confirm the event’s priority
   by head-end to report the particular protection channel
   by head-end to inform tail-end of Reverse (bidirectional) Request (RR)
   by tail-end after failure cleared to Wait To Restore (WTR)
   by tail-end after failure cleared to Do Not Revert (DNR) for nonrevertive


                                                                         Y(J)S APS Slide 26
                            APS signaling phases

When APS signaling is used, it needs to be as rapid as possible
Depending on the scenario it may be
   1-phase tailhead (fastest)
    –   tail-end informs head-end of failure
    –   both ends uniquely know the protection channel to be used
    –   only for 1+1 and unidirectional-(1:1)n    (including 1:1)
   2-phase    1) tailhead 2) headtail
    –   tail-end informs head-end of failure
    –   head-end signals that it has switched to protection channel
    –   not for bidirectional-1:n or m:n
   3-phase 1) tailhead 2) headtail 3) tailhead (slowest)
    –   works for all protection types (including m:n)

                                                                      Y(J)S APS Slide 27
                           Examples of 1-phase

Example of when 1-phase signaling is possible is 1:1 or (1:1)n
1. upon detection of failure the tail-end sends SF to the head-end
    and immediately changes its selector (blind switch)
    upon receipt the head-end changes the bridge setting
    (no priority is checked)


1-phase can also be used for bidirectional 1:1
1. upon detection of failure the tail-end sends SF to the head-end
    and immediately changes both its selector and bridge
    upon receipt the head-end changes its bridge and selector




                                                                     Y(J)S APS Slide 28
                            Example of 2-phase

2-phase is useful for unidirectional 1:n with priority checking
1. upon detection of failure the tail-end sends SF to the head-end
    but does not change its selector
2. the head-end checks priority
    sends confirmation to tail-end (with identity of working channel)
    the bridge setting is changed
3. the tail-end changes its selector
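
The three steps above can be sketched as a toy exchange between two objects. The class and method names are hypothetical, messages are modelled as direct method calls, and the priority check is simplified to "the protection channel is granted only if it is free".

```python
class HeadEnd:
    def __init__(self):
        self.bridged_channel = None   # working channel bridged to protection

    def on_signal_fail(self, channel):
        """Receive SF (phase 1), check priority, bridge, and confirm."""
        if self.bridged_channel is not None:
            return None               # protection busy: request not granted
        self.bridged_channel = channel
        return channel                # phase 2: confirmation with channel identity


class TailEnd:
    def __init__(self):
        self.selected = {}            # working channel -> channel in use

    def detect_failure(self, channel, head):
        # Step 1: send SF; the selector is NOT changed yet.
        confirmed = head.on_signal_fail(channel)
        # Step 3: change the selector only after confirmation arrives.
        if confirmed == channel:
            self.selected[channel] = "protection"
        else:
            self.selected[channel] = "working"
        return self.selected[channel]


head, tail = HeadEnd(), TailEnd()
print(tail.detect_failure(3, head))   # first failure gets the protection channel
print(tail.detect_failure(5, head))   # second failure finds it already in use
```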




                                                                        Y(J)S APS Slide 29
                            Example of 3-phase

3-phase signaling is imperative for bidirectional 1:n
1. upon detection of failure the tail-end sends SF to the head-end
    but does not change its selector
2. the head-end checks priority, and sends confirmation to tail-end
    head-end changes its bridge setting
    and also sends a reverse request
3. the tail-end changes selector
    checks priority and sends confirmation to head-end
    tail-end changes its bridge setting (as head-end of opposite direction)
    head-end receives confirmation and changes its selector




                                                                        Y(J)S APS Slide 30
                                      For G.805 buffs
to add 1+1 trail protection to a trail - expand a trail termination function
we use a special transport processing function - the protection switch


       unprotected
          trail                                           protected trail




  the unprotected TTs report status
  to the protection switch



                                                                               Y(J)S APS Slide 31
SONET/SDH APS




                Y(J)S APS Slide 32
                             SONET protection ?
SONET/SDH networks need to be highly reliable (five nines)
Down-time should be minimal (less than 50 msec)
So systems must repair themselves (no time for manual intervention)
Upon detection of a failure (dLOS, dLOF, high BER)
   the network must reroute traffic (protection switching)
   from working channel to protection channel
SDH APS is unidirectional
SDH APS may be revertive


                             working channel


                            protection channel
           head-end NE                              tail-end NE



                                                                      Y(J)S APS Slide 33
                                         SONET/SDH layers

                           ADM                    regenerator               ADM
     Path                    Line                    Section                  Line                    Path
  Termination             Termination              Termination             Termination             Termination


                                                      path
                 line                             line (MS section)                       line
                section                 section                  section                 section


Between regenerators there are sections (regenerator sections)
Between ADMs there are lines (multiplex sections)
Between path terminations there are paths
Protection can be at OC-n level (different physical fibers)
    or at STM/VC level
    or end-to-end path (trail protection)

                                                                                                     Y(J)S APS Slide 34
                                                         Line APS
[Figure: frame of 9 rows by 90 columns, with the TOH (3 rows over 6 rows)
 alongside the Synchronous Payload Envelope]

TOH bytes (9 rows by 3 columns)
   A1   A2   J0
   B1   E1   F1
   D1   D2   D3
   H1   H2   H3
   B2   K1   K2
   D4   D5   D6
   D7   D8   D9
   DA   DB   DC
   S1   M0   E2

TOH consists of
   3 rows of section overhead - frame sync, trace, EOC, …
   6 rows of line overhead - pointers, SSM, FEBE, and APS
Line APS signaling uses bytes K1 and K2

                                                                                            Y(J)S APS Slide 35
                                      HO Path APS
POH bytes (one column, top to bottom): J1 B3 C2 G1 F2 H4 F3 K3 N1


POH is responsible for type, status, path performance monitoring, VCAT, trace
HO Path APS signaling uses 4 MSBs of byte K3
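
Extracting the 4 most significant bits of an overhead byte is a simple shift. The helper name is hypothetical, and the meaning of the resulting 4-bit value is left uninterpreted here.

```python
def aps_nibble(overhead_byte):
    """Return the 4 most significant bits of an overhead byte (0..15)."""
    if not 0 <= overhead_byte <= 0xFF:
        raise ValueError("expected a single byte (0..255)")
    return overhead_byte >> 4


k3 = 0b1010_0110          # made-up K3 value for illustration
print(f"APS bits: {aps_nibble(k3):04b}")
```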

                                                                                Y(J)S APS Slide 36
                                LO Path APS
[Figure: frame columns 1, 30, 59, 87, showing pointer bytes V1 V2 V3 V4
 and VC OH bytes V5 J2 N2 K4]

VC OH is responsible for
    Timing, PM, REI, …
LO Path APS signaling is
   4 MSBs of byte K4
                                               Y(J)S APS Slide 37
                              How does it work?

Head-end and tail-end NEs have bridges (muxes)
Head-end and tail-end NEs maintain bidirectional signaling channel
Signaling is contained in K bytes of protection channel
For line APS
 K1 – tail-end status and requests
 K2 – head-end status


            head-end bridge                         tail-end bridge
                               working channel




                    protection channel   signaling channel


                                                                      Y(J)S APS Slide 38
                           Linear 1+1 protection
Can be at OC-n level (different physical fibers)
    or at STM/VC level (SubNetwork Connection Protection)
    or end-to-end path (called trail protection)


Head-end bridge always sends data on both channels
Tail-end chooses channel to use based on BER, dLOS, etc.
No need for signaling
If non-revertive
     there is no distinction between working and protection channels

                            working channel


                            protection channel
            head-end NE                            tail-end NE
                                                                       Y(J)S APS Slide 39
                            Linear 1:1 protection
Head-end bridge usually sends data on working channel
When tail-end detects failure it signals (using K1) to head-end
Head-end then starts sending data over protection channel
When not in use
   protection channel can be used for (discounted) extra traffic
    (pre-emptible unprotected traffic)

May be at any layer (but only OC-n level protects against fiber cuts)


                                  working channel


                                    extra traffic
                                  protection channel

                                                                        Y(J)S APS Slide 40
                             Linear 1:N protection

In order to save BW
     we allocate 1 protection channel for every N working channels
N is limited to 14
     since the channel number is carried in 4 bits of the K1 byte (tail-end to head-end)
    – 0 protection channel
    – 1-14 working channels
    – 15 extra traffic channel
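
The channel numbering above can be decoded with a few lines. This sketch assumes the channel number occupies the 4 least significant bits of K1; that bit position is not stated on the slide, and the function name is hypothetical.

```python
def channel_role(k1):
    """Map the 4-bit channel number in K1 to its role.
    Assumption: the channel number is in the 4 least significant bits."""
    channel = k1 & 0x0F
    if channel == 0:
        return "protection channel"
    if channel == 15:
        return "extra traffic channel"
    return f"working channel {channel}"


for k1 in (0xC0, 0xC3, 0xCE, 0x0F):   # made-up K1 values
    print(f"K1=0x{k1:02X}: {channel_role(k1)}")
```

With 16 possible values and two reserved, 14 remain for working channels, which is where the limit on N comes from.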




                                 working channels

                                 protection channel

                                                                     Y(J)S APS Slide 41
                  Two fiber vs. Four-fiber rings
Ring based protection is popular in North America (100K+ rings)
Full protection against physical fiber cuts
Simpler and less expensive than mesh topologies
Protection at line (multiplex section) or path layer
Four-fiber rings
     fully redundant at OC level
     can support bidirectional routing at line layer
Two-fiber rings
     support unidirectional routing at line layer




                      2 fibers in opposite directions
                                                                  Y(J)S APS Slide 42
               Unidirectional vs. bidirectional
Unidirectional routing
    working channel B-A same direction (e.g. clockwise) as A-B
    management simplicity: A-B and B-A can occupy same timeslots
    Inefficient: waste in ring BW and excessive delay in one direction
Bidirectional routing
     A-B and B-A are opposite in direction
     both using shortest route
     spatial reuse: timeslots can be reused in other sections
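
The delay penalty of unidirectional routing can be illustrated by counting spans. This sketch assumes nodes are numbered 0..N-1 clockwise around the ring; the function names and the 8-node ring are hypothetical.

```python
def clockwise_hops(src, dst, ring_size):
    """Number of spans from src to dst travelling clockwise."""
    return (dst - src) % ring_size


def unidirectional_hops(src, dst, ring_size):
    """Unidirectional routing: both directions travel clockwise."""
    return clockwise_hops(src, dst, ring_size)


def bidirectional_hops(src, dst, ring_size):
    """Bidirectional routing: each direction takes the shorter way round."""
    cw = clockwise_hops(src, dst, ring_size)
    return min(cw, ring_size - cw)


N, A, B = 8, 0, 1   # assumed 8-node ring with A and B adjacent
print("unidirectional A-B, B-A:", unidirectional_hops(A, B, N), unidirectional_hops(B, A, N))
print("bidirectional  A-B, B-A:", bidirectional_hops(A, B, N), bidirectional_hops(B, A, N))
```

With adjacent nodes, unidirectional routing sends B-A the long way around (7 spans here), while bidirectional routing uses a single span each way and leaves the other spans free for reuse.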


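[Figure: two rings with nodes A, B, C.
  Left (unidirectional routing): A-B and B-A travel the same way around the ring
  Right (bidirectional routing): A-B / B-A and B-C / C-B run in opposite directions]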

                                                                         Y(J)S APS Slide 43
                         UPSR vs. BLSR (MS-SPRing)
     UPSR        Unidirectional          Path switching          Two-fiber
     BLSR        Bidirectional           Line switching          Four-fiber



Of all the possible combinations, only a few are in use
Unidirectional (routing) Path Switched Rings
    protects tributaries
    extension of 1+1 to ring topology
Bidirectional (routing) Line Switched Rings (two-fiber and four-fiber versions)
     called Multiplex Section Shared Protection Ring in SDH
     simultaneously protects all tributaries in STM
     extension of 1:1 to ring topology



                                                                         Y(J)S APS Slide 44
                                            UPSR
Working channel is in one direction
   protection channel in the opposite direction
All path traffic is “added” in both directions (1+1)
     decision as to which to use is made at drop point (no signaling)
Normally non-revertive, so effectively two diversity paths

Good match for access networks
   1 access resilient ring
       less expensive than fiber pair per customer
Inefficient for core networks
     no spatial reuse
          every signal in every span
               in both directions
     node needs to continuously monitor
          every tributary to be “dropped”

[Figure: 2 rings connecting SONET ADMs]
                                                                        Y(J)S APS Slide 45
                                               BLSR

Switch at line level – less monitoring
When failure detected tail-end NE signals head-end NE
Works for unidirectional/bidirectional fiber cuts, and NE failures
Two-fiber version
   half of OC-N capacity devoted to protection
   only half capacity available for traffic
Four-fiber version
    full redundant OC-N devoted to protection
    twice as many NEs as compared to two-fiber

[Figure: 2 rings with wrap-around. Example: recovery from unidirectional fiber cut]
                                                                                  Y(J)S APS Slide 46
Ethernet linear APS


         STP
         LAG
         G.8031




                      Y(J)S APS Slide 47
                                          STP
The original Spanning Tree Protocol automatically removed loops
   from arbitrary networks (with loops)
However, its convergence was very slow (about a minute)
STP cannot be used as a protection mechanism
   since its reconvergence time is very long
   due to a cumbersome protocol
   and long holdoff timer settings
An evolutionary update called Rapid STP 802.1w
   was incorporated into 802.1D-2004 clause 17
   that converges in about the same time as STP
   but can reconverge after a topology change in less than 1 second
RSTP can be used to detect failures and reconverge
   and thus can be used as a primitive protection mechanism
However, the switching time will be many tens of ms to 100s of ms

                                                                      Y(J)S APS   Slide 48
                                         Use of LAG
Ethernet “link aggregation” (AKA bonding, Ethernet trunk, inverse mux, NIC teaming)
   enables bonding several ports together as single uplink
Defined by 802.3ad task force and folded into 802.3-2000 as clause 43
Binding of ports to Link Aggregation Groups (LAGs) distributed via
   Link Aggregation Control Protocol (LACP)

LACP uses slow protocol frames (up to 5 per second)
Links may be dynamically added/removed from LAG
   and LACP continuously monitors to detect if changes needed
Upon link failure LAG delivers traffic at a reduced rate

Thus LAG can be used as a primitive protection mechanism

When used this way it is called worker/standby or N+N mode

The restoration time will be on the order of 1 second
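
The behavior "upon link failure LAG delivers traffic at a reduced rate" can be illustrated with a minimal sketch. It assumes flows are spread over member links by hashing a flow identifier, which is a common implementation choice rather than something stated on the slide. The port names, flow names, and the pick_link helper are hypothetical.

```python
import zlib

def pick_link(flow_id: str, active_links: list) -> str:
    """Map a flow to one active member link by hashing its identifier."""
    if not active_links:
        raise RuntimeError("no active links left in the LAG")
    return active_links[zlib.crc32(flow_id.encode()) % len(active_links)]

links = ["port1", "port2", "port3"]          # hypothetical LAG members
flows = [f"flow-{i}" for i in range(12)]     # hypothetical flows

# Normal operation: flows are spread over all three member links.
before = {f: pick_link(f, links) for f in flows}

# port2 fails: it is removed from the group and the same flows
# are redistributed over the surviving links (reduced total rate).
survivors = [link for link in links if link != "port2"]
after = {f: pick_link(f, survivors) for f in flows}

capacity_fraction = len(survivors) / len(links)   # 2 of 3 links remain
```

No flow is left on the failed port, but the group as a whole carries only the capacity of the surviving links.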
                                                                                 Y(J)S APS   Slide 49
                                           G.8031
Q9 of SG15 in the ITU-T is responsible for protection switching
In 2006 it produced G.8031 Linear Ethernet Protection Switching
G.8031 uses standard Ethernet formats, but is incompatible with STP
The standard addresses
 point-to-point VLAN connections
 SNC (local) protection class
 1+1 and 1:1 protection types
 unidirectional and bidirectional switching for 1+1
 bidirectional switching for 1:1
 revertive and nonrevertive modes
 1-phase signaling protocol

G.8031 uses Y.1731 OAM CCM messages in order to detect failures
G.8031 defines a new OAM opcode (39) for APS signaling messages
Switching times should be under 50 ms (only holdoff timers when groups)
                                                                          Y(J)S APS   Slide 50
                                        G.8031 signaling
The APS signaling message looks like this :

     field            size    value
     MEL              3b
     VER              5b      0
     OPCODE           1B      39
     FLAGS            1B      0
     OFFSET           1B      4
     req/state        4b
     prot. type       4b
     requested sig    1B
     bridged sig      1B
     reserved         1B
     END              1B      0

     –   regular APS messages are sent 1 per 5 seconds
     –   after change 3 messages are sent at max rate (300 per sec)
where
    req/state identifies the message (NR, SF, WTR, SD, forced switch, etc)
    prot. type identifies the protection type (1+1, 1:1, uni/bidirectional, etc.)
    requested and bridged signal identify incoming / outgoing traffic
     since only 1+1 and 1:1 are supported, they are either null or traffic (all other values reserved)
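
The field layout above can be packed into bytes as a rough sketch. The field order and sizes follow the slide. The specific request/state code (0b1011 for SF) and the protection-type bits used in the example are assumptions that should be checked against G.8031 itself, and the function names are hypothetical.

```python
import struct

APS_OPCODE = 39   # from the slide
TLV_OFFSET = 4    # from the slide: 4 bytes of APS-specific data

def build_aps_pdu(mel, request_state, prot_type,
                  requested_sig, bridged_sig, version=0):
    """Pack the 9-byte APS PDU in the field order shown on the slide."""
    if not 0 <= mel <= 7:
        raise ValueError("MEL is a 3-bit field")
    if not (0 <= request_state <= 15 and 0 <= prot_type <= 15):
        raise ValueError("req/state and prot. type are 4-bit fields")
    first = (mel << 5) | (version & 0x1F)        # MEL (3b) + VER (5b)
    req_prot = (request_state << 4) | prot_type  # req/state (4b) + type (4b)
    return struct.pack("!9B", first, APS_OPCODE, 0, TLV_OFFSET,
                       req_prot, requested_sig, bridged_sig,
                       0,   # reserved
                       0)   # END

def parse_aps_pdu(pdu):
    """Inverse of build_aps_pdu; returns the fields as a dict."""
    first, opcode, flags, offset, req_prot, req, brd, _res, end = \
        struct.unpack("!9B", pdu)
    return {"mel": first >> 5, "version": first & 0x1F, "opcode": opcode,
            "request_state": req_prot >> 4, "prot_type": req_prot & 0x0F,
            "requested_sig": req, "bridged_sig": brd}

# Example values are illustrative assumptions, not taken from the slide.
pdu = build_aps_pdu(mel=7, request_state=0b1011, prot_type=0b1111,
                    requested_sig=1, bridged_sig=1)
fields = parse_aps_pdu(pdu)
```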
                                                                                     Y(J)S APS   Slide 51
                      G.8031 1:1 revertive operation
In the normal (NR) state :
   head-end and tail-end exchange CCM (at 300 per second rate)
    on both working and protection channels
   head-end and tail-end exchange NR APS messages
    on the protection channel (every 5 seconds)
When a failure appears in the working channel
   tail-end misses 3 consecutive CCM messages on working channel
   tail-end enters SF state
   tail-end sends 3 SF messages at 300 per second on the APS channel
   tail-end switches selector (for bidirectional, also bridge) to the protection channel
   head-end (receiving SF) switches bridge (for bidirectional, also selector) to protection channel
   tail-end continues sending SF messages every 5 seconds
   head-end sends NR messages but with bridged=normal
When the failure is cleared
   tail-end leaves SF state and enters WTR state (typically 5 minutes, 5..12 min)
   tail-end sends WTR message to head-end (in nonrevertive - DNR message)
   tail-end sends WTR every 5 seconds
   when WTR expires both sides enter NR state
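
The tail-end behavior described above can be sketched as a small state machine. This is a simplified model under stated assumptions: only the NR, SF and WTR states are modeled, time is advanced by explicit calls, and the class and method names are hypothetical. The constants (300 CCMs per second, 3 missed CCMs, 5-minute WTR) come from the slides.

```python
from dataclasses import dataclass, field

CCM_RATE_HZ = 300        # CCM rate from the slide
MISSED_CCMS = 3          # missed CCMs that trigger SF
WTR_SECONDS = 5 * 60     # typical WTR value from the slide

# Rough detection time: 3 CCM intervals at 300 per second.
detection_time_s = MISSED_CCMS / CCM_RATE_HZ

@dataclass
class TailEnd:
    state: str = "NR"
    selector: str = "working"
    missed: int = 0
    wtr_left: float = 0.0
    sent: list = field(default_factory=list)   # APS messages sent, in order

    def ccm_interval(self, received: bool) -> None:
        """Call once per CCM interval on the working channel."""
        if received:
            self.missed = 0
            if self.state == "SF":             # failure cleared
                self.state, self.wtr_left = "WTR", WTR_SECONDS
                self.sent.append("WTR")
        else:
            self.missed += 1
            if self.missed >= MISSED_CCMS and self.state != "SF":
                self.state, self.selector = "SF", "protection"
                self.sent += ["SF"] * 3        # 3 messages at max rate

    def tick(self, seconds: float) -> None:
        """Advance the WTR timer; revert to working when it expires."""
        if self.state == "WTR":
            self.wtr_left -= seconds
            if self.wtr_left <= 0:
                self.state, self.selector = "NR", "working"
                self.sent.append("NR")

te = TailEnd()
for _ in range(3):
    te.ccm_interval(received=False)   # failure on working channel
state_after_failure = (te.state, te.selector)
te.ccm_interval(received=True)        # failure cleared
state_after_clear = te.state
te.tick(WTR_SECONDS)                  # WTR expires
state_after_wtr = (te.state, te.selector)
```

The periodic 5-second retransmissions and the head-end side are left out to keep the sketch short.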
                                                                                        Y(J)S APS   Slide 52
Ethernet ring APS

        G.8032
        RPR
        CLEER




                    Y(J)S APS Slide 53
                              Ethernet rings ?
Ethernet has become carrier grade :
   deterministic connection-oriented forwarding
   OAM
   synchronization
The only thing missing to completely replace SDH is ring protection
However, Ethernet and ring architectures don’t go together
  Ethernet has no TTL, so looped traffic will loop forever
  STP builds trees out of any architecture – no loops allowed
There are two ways to make an Ethernet ring
  open loop
   – cut the ring by blocking some link
   – when protection is required - block the failed link
   closed loop
    – disable STP (but avoid infinite loops in some way !)
    – when protection is required - steer and/or wrap traffic
                                                                      Y(J)S APS Slide 54
                     Ethernet ring protocols
Open loop methods
  G.8032 (ERPS)
  rSTP (ex 802.1w)
  RFER (RAD)
  ERP (NSN)
  RRST (based on RSTP)
  REP (Cisco)
  RRSTP (Alcatel)
  RRPP (Huawei)
  EAPS (Extreme, RFC 3619)
  EPSR (Allied Telesis)
  PSR (Overture)
Closed loop methods
   RPR (IEEE 802.17)
   CLEER and NERT (RAD)

                                               Y(J)S APS Slide 55
                                       G.8032
Q9 of SG15 produced G.8032 between 2006 and 2008

G.8032 is similar to G.8031
   strives for 50 ms protection (< 1200 km, < 16 nodes)
   – but here this number is deceiving as MAC table is flushed
   standard Ethernet format but incompatible with STP
   uses Y.1731 CCM for failure detection
   employs Y.1731 extension for R-APS signaling (opcode=40)
   R-APS message format similar to APS of G.8031
    (but between every 2 nodes and to MAC address 01-19-A7-00-00-01)
   revertive and nonrevertive operation defined

However, G.8032 is more complex due to
  requirement to avoid loop creation under any circumstances
  need to localize failures
  need to maintain consistency between all nodes on ring
  existence of a special node (RPL owner)
                                                                       Y(J)S APS Slide 56
                                                     RPL
G.8032v1 defines the Ring Protection Link (RPL)
    as the link to be blocked (to avoid closing the loop) in NR state
One of the 2 nodes connected to the RPL
   is designated the RPL owner
Unlike RFER
    there is only one RPL owner
    the RPL and owner are designated before setup
    operation is usually revertive

All ring nodes are simultaneously in 1 of 2 modes – idle or protecting
    in idle mode the RPL is blocked
    in protecting mode the failed link is blocked and RPL is unblocked
    in revertive operation
     once the failure is cleared the blocked link is unblocked
     and the RPL is blocked again

                                                                          Y(J)S APS Slide 57
                        G.8032 revertive operation
In the idle state :
   adjacent nodes exchange CCM at 300 per second rate (including over RPL)
   exchange NR RB (RPL Blocked) messages in dedicated VLAN every 5 seconds (but not over RPL)
   R-APS messages are never forwarded
When a failure appears between 2 nodes
   node(s) missing CCM messages check twice, separated by the holdoff time
   node(s) block failed link and flush MAC table
   node(s) send SF message (3 times @ max rate, then every 5 sec)
   node receiving SF message will check priority and unblock any blocked link
   node receiving SF message will send SF message to its other neighbor
   in stable protecting state SF messages are sent over every unblocked link
When the failure is cleared
   node(s) detect CCM and start guard timer (blocks acting on R-APS messages)
   node(s) send NR messages to neighbors (3 times @ max rate, then every 5 sec)
   RPL owner receiving NR starts WTR timer
   when WTR expires RPL owner blocks RPL, flushes table, and sends NR RB
   node receiving NR RB flushes table, unblocks any blocked ports, sends NR RB
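
The invariant behind these steps is that exactly one ring link is blocked at any time: the RPL when idle, the failed link when protecting. A minimal sketch can check that the resulting active topology stays loop-free and connected. The ring size, node numbering, and function names are hypothetical, and the R-APS message exchange itself is not modeled.

```python
def ring_links(n):
    """Links of an n-node ring, node i connected to node (i + 1) mod n."""
    return [(i, (i + 1) % n) for i in range(n)]

def active_links(n, rpl, failed=None):
    """Idle: block the RPL. Protecting: block the failed link, unblock the RPL."""
    blocked = rpl if failed is None else failed
    return [link for link in ring_links(n) if link != blocked]

def is_loop_free_and_connected(n, links):
    """Union-find check that the links form a spanning tree of the n nodes."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False          # this link would close a loop
        parent[ra] = rb
    return len({find(i) for i in range(n)}) == 1

N = 6
RPL = (5, 0)                      # hypothetical choice of RPL

idle = active_links(N, RPL)
protecting = active_links(N, RPL, failed=(2, 3))
```

With no link blocked at all, the same check reports a loop, which is why the RPL must be re-blocked before the repaired link is put back into service.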
                                                                                   Y(J)S APS   Slide 58
                                        G.8032-2010
After coming out with G.8032 in 2008 (G.8032v1)
   the ITU came out with G.8032-2010 (G.8032v2) in 2010
This new version is not backwards-compatible with v1
   but a v2 node must support v1 as well (but then operation is according to v1)
[figure: ring with RPL, RPL owner, RPL neighbor, RPL next neighbor]

Major differences :
   2 designated nodes – RPL owner node and RPL neighbor node
                           and for optional flush-optimization “next neighbor node”
   significant changes to
      – state machine
      – priority logic
      – commands (forced/manual/clear) and protocol
   new Wait To Block timer
   supports more general topologies (sub-rings)
      – ladders (For Further Study in v1)
      – multi-ring
   ring topology discovery
   virtual channel based on VLAN or MAC address

[figure: ladder topology; ring with subrings]
                                                                                       Y(J)S APS Slide 59
                                   RPR – 802.17
Resilient Packet Rings
 are compatible with standard Ethernet, but different frame format
 are robust (lossless, <50ms protection, OAM)
 are fair (based on client throttling)
 support QoS (3 classes – A, B, C)
 are efficient (full spatial reuse)
 are plug and play (automatic station autodiscovery)
 extend use of existing fiber rings

counter-rotating add/drop ringlets (ringlet0 and ringlet1), running
   SONET/SDH (any rate, PoS, GFP or LAPS) or
   “packetPHY” (1 or 10 Gb/s ETH PHY)

developed by 802.17 WG
     based on Cisco’s Spatial Reuse Protocol (RFC 2892)

[figure: ringlet0, ringlet1, ringlet selection]


                                                                      Y(J)S APS   Slide 60
                                  Basic RPR queuing

traffic going around ring
   placed into internal buffer
   in dual-transit queue mode
     placed into 1 of 2 buffers (PTQ / STQ : Primary/Secondary Transit Queue)
     according to service class
   sent according to fairness

traffic for local sink
   placed in output buffer
   according to service class

traffic from local source
   sent according to fairness
   first sent to ringlet selection

[figure: RPR station queuing, showing PTQ, STQ, fairness block, and class A / B / C buffers]


                                                                                                          Y(J)S APS   Slide 61
                          RPR service classes
RPR defines 3 main classes
 class A : real time (low latency/FDV)
 class B : near real time (bounded predictable latency/FDV)
 class C : best effort




          class   use       info rate                D/FDV       FE

          A0      RT        reserved                 low         No
          A1      RT        allocated, reclaimable   low         No
          B-CIR   near RT   allocated, reclaimable   bounded     No
          B-EIR   near RT   opportunistic            unbounded   Yes
          C       BE        opportunistic            unbounded   Yes

          (D/FDV = delay / frame delay variation, FE = fairness eligible)
                                                               Y(J)S APS   Slide 62
                                       RPR Class use
A0 ring BW is reserved – not reclaimed even if no traffic
in dual-transit queue mode:
 class A frames from the ring are queued in PTQ
 class B, C in STQ


priority for egress
   frames in PTQ
   local class A frames
   local class B (when no frames in PTQ)
   frames in STQ
   local class C (when no PTQ, STQ, local A or B)

Notes:
class A frames have minimal delay
class B frames have higher priority than STQ transit frames, so bounded delay/FDV
classes B and C share STQ, so once in ring have similar delay
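
The egress priority order listed above can be sketched as a strict-priority selection over five queues. This is a simplification: fairness throttling and STQ fill thresholds are not modeled, and the function name and sample frames are hypothetical.

```python
from collections import deque

def next_frame(ptq, local_a, local_b, stq, local_c):
    """Pick the next frame to send, following the slide's priority order.

    Returns (source, frame), or None when every queue is empty.
    """
    order = (("PTQ", ptq), ("local A", local_a), ("local B", local_b),
             ("STQ", stq), ("local C", local_c))
    for name, queue in order:
        if queue:
            return name, queue.popleft()
    return None

# Hypothetical queue contents: one transit frame in STQ, none in PTQ.
ptq = deque()
local_a = deque(["a1"])
local_b = deque(["b1"])
stq = deque(["t1"])
local_c = deque(["c1"])

sent = []
while (choice := next_frame(ptq, local_a, local_b, stq, local_c)) is not None:
    sent.append(choice)
```

Local class B is served ahead of STQ transit frames, while local class C waits until everything else is empty, matching the notes above.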



                                                                             Y(J)S APS   Slide 63
                               RPR - protection

rings give inherent protection against single point of failure
RPR specifies 2 mechanisms
 steering
 wrapping (optional)


(implementations may also do wrapping then steering)




[figure: steering info and wrap]
                                                                 Y(J)S APS   Slide 64
                                  NERT and CLEER
New Ethernet Ring Technology / Closed Loop Encapsulated Ethernet Ring
Similar to RPR but uses real Ethernet format
NERT and CLEER distinguish between
 ring nodes
 switches connected to ring nodes

Traffic in ring is MAC-in-MAC encapsulated
 External MACs are of ring node
 Internal MACs are original

Unexpected external MACs discarded
External MACs learned as in 802.1ah
Ring nodes forward according to table
NERT floods, CLEER never floods
Protection switch only involves changing table
   so service restoration is fast

[figure: ring nodes with attached switches]
                                                                    Y(J)S APS   Slide 65
MPLS fast reroute

         IP FRR
         RFC 4090




                    Y(J)S APS Slide 66
                                          IP FRR
True protection mechanisms do not exist for connectionless IP
In practice, routing protocols discover breaks and recalculate routes
    but this usually takes a long time
Link-state IGPs detect link-down state using hellos
    for OSPF - typically every 10 sec, and detection after 40 sec
    and then Dijkstra algorithm avoids the failed link
BFD can be used to speed up the detection
However,
  the information still has to be propagated further (seconds?)
  and FIBs updated (100s of ms)
Various IP Fast ReRoute (IP FRR) mechanisms have been proposed
    but true protection is best done at the MPLS level
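
The timing argument above can be made concrete with simple arithmetic. The OSPF values (10 s hello, detection after 40 s) come from the slide. The BFD interval and multiplier, and the propagation and FIB update figures, are illustrative assumptions chosen only to match the orders of magnitude mentioned.

```python
def detection_time_s(hello_interval_s: float, missed_hellos: int) -> float:
    """Failure is declared after a number of consecutive missed hellos."""
    return hello_interval_s * missed_hellos

# OSPF defaults from the slide: hello every 10 s, detection after 40 s.
ospf_detect = detection_time_s(10.0, 4)

# BFD: assumed 10 ms interval and 3 missed packets (illustrative values).
bfd_detect = detection_time_s(0.010, 3)

# Assumed remaining steps, orders of magnitude taken from the slide.
PROPAGATION_S = 1.0    # "seconds?"
FIB_UPDATE_S = 0.3     # "100s of ms"

ospf_total = ospf_detect + PROPAGATION_S + FIB_UPDATE_S
bfd_total = bfd_detect + PROPAGATION_S + FIB_UPDATE_S
```

Even with fast detection, the assumed propagation and FIB update times keep the total well above a 50 ms target, which is the motivation for local repair.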


                                                                        Y(J)S APS Slide 67
                                  MPLS fast reroute
RSVP-TE enables MPLS traffic engineering by fine control over placement
   specifies explicit path using information gathered from IGP
   resources may be reserved at LSRs along the way
RFC 4090 defines extensions to RSVP-TE – Fast ReRoute (FRR)
LSRs along the path preconfigure local bypasses (detours)
Upon detection of failure (by mechanisms not discussed in RFC 4090) such as
   BFD (specified in microseconds, typically 10s of ms) or
   RSVP hellos (RFC default is 5 ms) or
   RESV / PATH messages (driven by IGP)
upstream LSR simply enables the detour
Since this is a local action, it should be fast
RFC 4090 only discusses adding FRR to RSVP-TE network
    but its use with LDP is possible if there is a single label generator


                                                                            Y(J)S APS Slide 68
                                   PLRs and MPs

The fundamental entities in MPLS FRR are
   Point of Local Repair (PLR)
   Merge Point (MP)
A PLR is the LSR before the failed element (link or node)
All LSRs except the egress LER can be PLRs
The PLR is solely responsible for the FRR (no explicit APS signaling)
During path setup, potential PLRs create detours towards the egress LER
An MP is the LSR where the detour rejoins the LSP
All LSRs except the ingress LER can be MPs
[figure: LSP from ingress LER via PLR and MP to egress LER]




                                                                        Y(J)S APS Slide 69
                                     Methods
RFC 4090 defines two different protection methods
Usually one or the other is employed in a given network

One-to-one backup
  each LSP protected separately
  detour LSP created for each LSP at each potential PLR
  no labels pushed




Facility backup
   backup tunnel for multiple LSPs
   bypass tunnel created at each potential PLR
   uses label stacking
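
The difference between the two methods at the PLR can be sketched in terms of label stack operations. The stack is modeled as a list with the top label first. All label values and function names are hypothetical.

```python
def one_to_one_backup(stack, detour_label):
    """Swap the top label for the detour LSP's label; nothing is pushed."""
    return [detour_label] + stack[1:]

def facility_backup(stack, label_expected_by_mp, bypass_label):
    """Swap the top label to the one the MP expects, then push the
    bypass tunnel's label on top (label stacking)."""
    return [bypass_label, label_expected_by_mp] + stack[1:]

incoming = [100]                      # hypothetical label of the protected LSP

detour = one_to_one_backup(incoming, detour_label=900)
bypass = facility_backup(incoming, label_expected_by_mp=200, bypass_label=500)
```

Because the bypass label is simply pushed on top, one bypass tunnel can carry many protected LSPs, whereas a one-to-one detour needs its own label per LSP.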




                                                                Y(J)S APS Slide 70
                             NHOP and NNHOP

MPLS FRR can bypass a failed link or a failed node
In order to bypass a single failed link
    we need an alternative path to the next hop (NHOP)




In order to bypass a single failed node, we need an alternative path to the
    next next hop (NNHOP)
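
Finding NHOP and NNHOP bypass paths can be illustrated with a breadth-first search that avoids either the failed link or the failed node. The topology, router names, and the shortest_path helper are hypothetical. A real network would compute such paths with constrained routing rather than plain BFS.

```python
from collections import deque

def shortest_path(graph, src, dst, avoid_link=None, avoid_node=None):
    """BFS shortest path from src to dst that avoids one link or one node."""
    banned = {avoid_link, avoid_link[::-1]} if avoid_link else set()
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        here = path[-1]
        if here == dst:
            return path
        for nxt in graph[here]:
            if nxt in seen or nxt == avoid_node or (here, nxt) in banned:
                continue
            seen.add(nxt)
            queue.append(path + [nxt])
    return None

# Hypothetical topology: protected LSP runs R1-R2-R3-R4, PLR is R2.
graph = {
    "R1": ["R2"],
    "R2": ["R1", "R3", "R5"],
    "R3": ["R2", "R4", "R5"],
    "R4": ["R3", "R6"],
    "R5": ["R2", "R3", "R6"],
    "R6": ["R5", "R4"],
}

# Link protection: reach the next hop R3 without using link R2-R3.
nhop_bypass = shortest_path(graph, "R2", "R3", avoid_link=("R2", "R3"))

# Node protection: reach the next next hop R4 without passing through R3.
nnhop_bypass = shortest_path(graph, "R2", "R4", avoid_node="R3")
```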





                                                                      Y(J)S APS Slide 71
    MPLS-TP APS

RFC 6372 (MPLS-TP Survivability Framework)
RFC 6378 (MPLS-TP Linear Protection)
draft-ietf-mpls-tp-ring-protection




                                             Y(J)S APS Slide 72
     MPLS-TP resilience


Since it strives to be a carrier-grade transport network
  TP has strong protection switching requirements
APS has been almost as contentious an issue as OAM
 and indeed the arguments are inter-related
RFC 6372 gives a general framework
 and differentiates between
  – linear
  – shared-mesh and
  – ring protection



                                                           Y(J)S APS Slide 73
          Linear protection
from RFC 6378 (ex draft-ietf-mpls-tp-linear-protection)
• 1+1, 1:1, 1:n and uni/bidi are supported
• APS signaling protocol (for all modes except 1+1 uni)
        is single-phase
        and called the Protection State Coordination protocol
• PSC messages are sent over the protection channel
• APS messages are sent over the GACh with a single channel type
  message functions identified by a request field
• 6 states: normal, protecting due to failure, admin protecting,
           WTR, protection path unavailable, DNR
• when revertive, a WTR timer is used



                                                          Y(J)S APS   Slide 74
      PSC message format
  layer   fields
  GAL     GAL Label (13) | TC | S=1 | TTL
  GACh    0001 | VER | 00000000 | PSC channel type
  PSC     Ver | Request | PT | R | Res | FPath | Path
          TLV Length | Res
          Optional TLVs

Request : NR, SF, SD, manual switch, forced switch, lockout, WTR, DNR
PT = Protection Type : uni 1+1, bidi 1+1, bidi 1:1/1:n
R = Revertive
FPath = which path has fault
Path = which data path is on protection channel
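
The PSC-specific part of the message can be packed as a rough sketch. The slide gives only the field order. The bit widths used here (Ver 2, Request 4, PT 2, R 1, Res 7, FPath 8, Path 8, then a 16-bit TLV Length and 16 reserved bits), the version value 1, and the example codes are assumptions recalled from RFC 6378 and should be verified against it. The function names are hypothetical.

```python
import struct

def pack_psc(request, pt, revertive, fpath, path, version=1):
    """Pack the PSC fields (without GAL / GACh headers or optional TLVs)."""
    if not (0 <= request <= 15 and 0 <= pt <= 3 and 0 <= version <= 3):
        raise ValueError("request is 4 bits, pt and version are 2 bits")
    if not (0 <= fpath <= 255 and 0 <= path <= 255):
        raise ValueError("FPath and Path are 8 bits each")
    word = ((version << 30) | (request << 26) | (pt << 24)
            | (int(revertive) << 23)      # R bit, followed by 7 reserved bits
            | (fpath << 8) | path)
    return struct.pack("!IHH", word, 0, 0)   # TLV Length = 0, Res = 0

def unpack_psc(data):
    """Inverse of pack_psc for a message without optional TLVs."""
    word, tlv_length, _res = struct.unpack("!IHH", data)
    return {"version": word >> 30, "request": (word >> 26) & 0xF,
            "pt": (word >> 24) & 0x3, "revertive": bool((word >> 23) & 1),
            "fpath": (word >> 8) & 0xFF, "path": word & 0xFF,
            "tlv_length": tlv_length}

# Example codes are assumptions: 0b1010 taken as SF, 0b10 as bidi 1:1.
msg = pack_psc(request=0b1010, pt=0b10, revertive=True, fpath=1, path=1)
decoded = unpack_psc(msg)
```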
                                                                               Y(J)S APS Slide 75
         PSC control logic states

Normal state - no trigger events reported
Unavailable state - protection path is unavailable
Protecting failure state –
  traffic is being transported on the protection path
Protecting administrative state –
  operator issued command switching traffic to protection path
Wait-to-Restore state - recovering from working path SF/SD
                          WTR timer has not yet expired
Do-not-Revert state - recovered from a protecting state
                        but operator has configured DNR



                                                           Y(J)S APS Slide 76
          PSC local requests
In order from highest to lowest priority :

1. Clear (operator command)
2. Lockout of protection (operator command)
3. Forced Switch (operator command)
4. Signal Fail on protection (OAM / control-plane / server indication)
5. Signal Fail on working (OAM / control-plane / server indication)
6. Signal Degrade on working (OAM / control-plane / server indication)
7. Clear Signal Fail/Degrade (OAM / control-plane / server indication)
8. Manual Switch (operator command)
9. WTR Expires (WTR timer)
10. No Request (default)
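
Given the ordering above, selecting the request that drives the control logic amounts to picking the highest-priority entry among those currently active. A minimal sketch, with a hypothetical function name:

```python
# Local requests in the order listed on the slide, highest priority first.
PRIORITY = [
    "Clear",
    "Lockout of protection",
    "Forced Switch",
    "Signal Fail on protection",
    "Signal Fail on working",
    "Signal Degrade on working",
    "Clear Signal Fail/Degrade",
    "Manual Switch",
    "WTR Expires",
    "No Request",
]

def top_request(active):
    """Return the highest-priority request among the active ones."""
    for request in PRIORITY:
        if request in active:
            return request
    return "No Request"   # default when nothing is active

# A manual switch is overridden by a signal fail on the working path.
winner = top_request({"Manual Switch", "Signal Fail on working"})
```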




                                                                 Y(J)S APS Slide 77
               Linear protection – ITU style
from draft-zulr-mpls-tp-linear-protection-switching
Similar to previous, but uses Y.1731/G.8031 format (no surprise!)


  layer    fields
  GAL      GAL Label (13) | TC | S=1 | TTL
  GACh     0001 | VER | 00000000 | allocated channel type
  G.8031   MEL | VER | OPCODE=39 | FLAGS=0 | OFFSET=4
           req/state | prot type | requested sig | bridged sig | reserved
           END=0

                                                                  Y(J)S APS   Slide 78
             Ring protection
once again there were two drafts, both supporting
p2p and p2mp, wrapping and steering, link/node failures

draft-ietf-mpls-tp-ring-protection (not yet RFC)
  Between any 2 LSRs one can define a Sub-Path Maintenance Entity (SPME)
  So between 2 LSRs on a ring there are 2 SPMEs –
    we define 1 as the working channel and 1 as the protection channel
  Now we re-use the linear protection mechanisms, including the PSC protocol

draft-helvoort-mpls-tp-ring-protection-switching
  Both counter-rotating rings carry working and protection traffic
  The bandwidth on each ring is divided
    X BW is dedicated to working traffic and Y dedicated to protection traffic
  The protection bandwidth of one ring is used to protect the other ring
  Each node should have information about the sequence of ring nodes
  MPLS-TP Ring Protection Switching is G.8032-like, but forwards non-NR msgs

                                                                       Y(J)S APS   Slide 79

				