         Rethinking Network Control & Management
            The Case for a New 4D Architecture

                                       David A. Maltz
            Carnegie Mellon University/Microsoft Research
                                          Joint work with
                      Albert Greenberg, Gisli Hjalmtysson
              Andy Myers, Jennifer Rexford, Geoffrey Xie,
                         Hong Yan, Jibin Zhan, Hui Zhang



1
    The Role of Network Control and Management

       Many different network environments
           Access, backbone networks
           Data-center networks, enterprise/campus
           Sizes: 10-10,000 routers/switches
       Many different technologies
           Longest-prefix routing (IP), fixed-width routing (Ethernet),
            label switching (MPLS, ATM), circuit switching (optical, TDM)
       Many different policies
           Routing, reachability, transit, traffic engineering, robustness
    The control plane software binds these elements
      together and defines the network



2
            We Can Change the Control Plane!


       Pre-existing industry trend towards
        separating router hardware from software
           IETF: FORCES, GSMP, GMPLS
           SoftRouter [Lakshman, HotNets’04]
       Incremental deployment path exists
           Individual networks can upgrade their control
            planes and gain benefits
           Small enterprise networks have most to gain
           No changes to end-systems required

3
                         A Clean-slate Design


       What are the fundamental causes of network problems?
       How to secure the network and protect the infrastructure?
       How to provide flexibility in defining management logic?
       What functionality needs to be distributed – what can be
        centralized?
           How to reduce/simplify the software in networks?
           What would a “RISC” router look like?
       How to leverage technology trends?
           CPU and link-speed growing faster than # of switches



4
                           Three Principles for
                      Network Control & Management

    Network-level Objectives:
       Express goals explicitly
           Security policies, QoS, egress point selection
       Do not bury goals in box-specific configuration

                                           Reachability matrix
                                           Traffic engineering rules



                                         Management
                                            Logic


5
                           Three Principles for
                      Network Control & Management

    Network-wide Views:
       Design network to provide timely, accurate info
           Topology, traffic, resource limitations
       Give logic the inputs it needs

                                            Reachability matrix
                                            Traffic engineering rules



                                          Management
                                             Logic

                       Read state info
6
                            Three Principles for
                       Network Control & Management

    Direct Control:
       Allow logic to directly set forwarding state
           FIB entries, packet filters, queuing parameters
       The logic computes the desired network state; let it implement that state directly

                                            Reachability matrix
                                            Traffic engineering rules

                        Write state
                                          Management
                                             Logic

                        Read state info
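A minimal sketch of direct control (illustrative, not the prototype's actual code): the decision logic reads the network-wide view, computes shortest-path next hops, and writes the resulting FIB entries straight to each router, with no per-box configuration in between. Router names and link costs are invented for the example; links are assumed symmetric.

```python
from heapq import heappush, heappop

def shortest_path_next_hops(topology, dest):
    """Dijkstra over a symmetric link-cost map; returns {router: next_hop} toward dest."""
    dist, next_hop = {dest: 0}, {}
    heap = [(0, dest, None)]          # search backward from the destination
    while heap:
        d, node, downstream = heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                   # stale entry
        if downstream is not None:
            next_hop[node] = downstream
        for nbr, cost in topology[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heappush(heap, (nd, nbr, node))
    return next_hop

# Toy network-wide view: symmetric link costs between four routers.
view = {
    "R1": {"R2": 1, "R3": 5},
    "R2": {"R1": 1, "R3": 1, "R4": 3},
    "R3": {"R1": 5, "R2": 1, "R4": 1},
    "R4": {"R2": 3, "R3": 1},
}

# The decision logic computes state, then "writes" it into each router's FIB.
fibs = {r: {} for r in view}
for dest in view:
    for router, nh in shortest_path_next_hops(view, dest).items():
        fibs[router][dest] = nh
```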
7
              Overview of the 4D Architecture

                                                  Network-level
                                                  objectives
                            Decision

    Network-wide           Dissemination       Direct
    views              Discovery               control

                               Data

     Decision Plane:
        All management logic implemented on centralized
         servers making all decisions
        Decision Elements use views to compute data plane
         state that meets objectives, then directly write this
         state to routers

8
              Overview of the 4D Architecture

                                                 Network-level
                                                 objectives
                            Decision

    Network-wide          Dissemination       Direct
    views              Discovery              control

                              Data

     Dissemination Plane:
        Provides a robust communication channel to each
         router – and robustness is the only goal!
        May run over same links as user data, but logically
         separate and independently controlled


9
               Overview of the 4D Architecture

                                                   Network-level
                                                   objectives
                             Decision

     Network-wide           Dissemination       Direct
     views              Discovery               control

                               Data

      Discovery Plane:
         Each router discovers its own resources and its local
          environment
         E.g., the identity of its immediate neighbors



10
               Overview of the 4D Architecture

                                                   Network-level
                                                   objectives
                             Decision

     Network-wide          Dissemination       Direct
     views              Discovery              control

                               Data

      Data Plane:
         Spatially distributed routers/switches
         Can deploy with today’s technology
         Looking at ways to unify forwarding paradigms
          across technologies

11
                   Concerns and Challenges

        Distributed Systems issues
            How will communication between routers and DEs
             survive failures in the network?
            Latency means DE’s view of network is behind
             reality. Will the control loop be stable?
            What is the overhead to/from the DEs?
            What happens in a network partition?
        Networking issues
            Does the 4D simplify control and management?
            Can we create logic to meet multiple objectives?

12
              The Feasibility of the 4D Architecture

     We designed and built a prototype of the 4D Architecture
        4D Architecture permits many designs – prototype is a
         single, simple design point
        Decision plane
            Contains logic to simultaneously compute routes and enforce
             reachability matrix
            Multiple Decision Elements per network, using simple election
             protocol to pick master
        Dissemination plane
            Uses source routes to direct control messages
            Extremely simple, but can route around failed data links
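A hypothetical sketch of the kind of simple election the slide describes (the prototype's actual protocol may differ): each DE floods a periodic heartbeat over the dissemination plane, and every DE picks the lowest-numbered element it has recently heard from as master. The timeout value and DE names are assumptions for illustration.

```python
import time

def elect_master(heartbeats, now, timeout=3.0):
    """Pick the lowest-ID Decision Element with a fresh heartbeat."""
    alive = [de for de, last in heartbeats.items() if now - last <= timeout]
    return min(alive) if alive else None

now = time.time()
heartbeats = {"DE1": now - 10.0,   # DE1 crashed: its heartbeat is stale
              "DE2": now - 0.5,
              "DE3": now - 1.0}
master = elect_master(heartbeats, now)   # DE2 takes over from DE1
```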



13
                   Evaluation of the 4D Prototype

        Evaluated using Emulab (www.emulab.net)
            Linux PCs used as routers (650 – 800MHz)
            Tested on 9 enterprise network
             topologies (10-100 routers each)




             Example network with
             49 switches and 5 DEs




14
               Performance of the 4D Prototype


     Trivial prototype has performance comparable to
        well-tuned production networks
        Recovers from single link failure in < 300 ms
            < 1 s response considered “excellent”
            Faster forwarding reconvergence possible
        Survives failure of master Decision Element
            New DE takes control within 1 s
            No disruption unless second fault occurs
        Gracefully handles complete network partitions
            Less than 1.5 s of outage



15
         Fundamental Problem: Wrong Abstractions

     Management Plane (shell scripts, traffic engineering and
     planning tools, databases, configs, SNMP, netflow, modems)
     • Figure out what is happening in the network
     • Decide how to change it

     Control Plane (OSPF, BGP, link metrics, routing policies)
     • Multiple routing processes on each router
     • Each router with different configuration program
     • Huge number of control knobs: metrics, ACLs, policy

     Data Plane (FIB, packet filters)
     • Distributed routers
     • Forwarding, filtering, queueing
     • Based on FIB or labels
16
           Good Abstractions Reduce Complexity

     Today: Management Plane → configs → Control Plane →
       FIBs, ACLs → Data Plane
     4D: Decision Plane → FIBs, ACLs → Dissemination Plane →
       Data Plane

     All decision-making logic lifted out of control plane
     • Eliminates duplicate logic in management plane
     • Dissemination plane provides robust communication
       to/from data plane switches
17
         Today: Simple Things are Hard to Do

     [Figure: example topology with destination D, access
     networks, and inter-POP links]
18
          Fundamental Problem: Configurations Allow Too
                   Many Degrees of Freedom
         Computing configuration files that cause control plane to
          compute desired forwarding states is intractable
             NP-hard in many cases
             Requires predictive model of control plane behavior
          Configuration files form a program that defines a set of
           forwarding states
             Very hard to create program that permits only desired states, and
              doesn’t transit through bad ones




      [Figure: the set of forwarding states allowed by configs;
      auto-adaptation leads to/through bad states, while direct
      control avoids bad states]
19
     Fundamental Problem: Conflation of Issues


        Ideal case: all routing information flooded to
         all routers inside network
            Robustness achieved via flooding
        Reality: routing information filtered and
         aggregated extensively
            Route filtering used to implement security and
             resource policies
            Route aggregation used to achieve scalability



20
     4D Separates Distributed Computing Issues from
                   Networking Issues

          Distributed computing issues → protocols and network
           architecture
             Overhead
             Resiliency
             Scalability
          Networking issues → management logic
             Traffic engineering and service provisioning
             Egress point selection
             Reachability control (VPNs)
             Precomputation of backup paths



21
                              Future Work


        Scalability
            Evaluate over 1-10K switches, 10-100K routes
            Networks with backbone-like propagation delays
        Structuring decision logic
            Arbitrate among multiple, potentially competing objectives
            Unify control when some logic takes longer than others
        Protocol improvements
            Better dissemination and discovery planes
        Deployment in today’s networks
            Data center, enterprise, campus, backbone (RCP)


22
                              Future Work


        Experiment with network appliances
            Traffic shapers, traffic scrubbers
        Expand relationships with security
            Using 4D as mechanism for monitoring/quarantine
        Formulate models that establish bounds of 4D
            Scale, latency, stability, failure models, objectives
        Generate evidence to support/refute principles



23
     Questions?




24
         Direct Control Provides Complete Control


        Zero device-specific configuration
        Supports many models for “pushing” routes
            Trivial push – convergence requires time for all
             updates to be received and applied – same as today
            Synchronized update – updates propagated, but
             not applied till agreed time in the future – clock
             skew defines convergence time
            Controlled state trajectory – DE serializes updates
             to avoid all incorrect transient states
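One way the controlled-trajectory model could serialize updates, sketched under simplifying assumptions (single destination, loop-free old and new paths): install the new next hops starting at the router nearest the destination and work backward along each new path, so at every intermediate step a packet follows only already-updated or not-yet-updated state and no transient loop forms. Names are illustrative.

```python
def update_order(new_next_hop, dest):
    """Order routers so each is updated only after its new next hop.

    Assumes every next hop is either another entry in the map or dest,
    and that the new paths are loop-free.
    """
    order, placed = [], {dest}

    def place(router):
        if router in placed:
            return
        place(new_next_hop[router])   # update the downstream router first
        placed.add(router)
        order.append(router)

    for r in new_next_hop:
        place(r)
    return order

# New routes toward D: R1 -> R2 -> R3 -> D, and R4 -> R3 -> D.
new_next_hop = {"R1": "R2", "R2": "R3", "R3": "D", "R4": "R3"}
seq = update_order(new_next_hop, "D")   # updates R3 before R2 before R1
```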



25
       Fundamental Problem: Wrong Abstractions

     interface Ethernet0
      ip address 6.2.5.14 255.255.255.128
     interface Serial1/0.5 point-to-point
      ip address 6.2.2.85 255.255.255.252
      ip access-group 143 in
      frame-relay interface-dlci 28

     access-list 143 deny 1.1.0.0/16
     access-list 143 permit any
     route-map 8aTzlvBrbaW deny 10
      match ip address 4
     route-map 8aTzlvBrbaW permit 20
      match ip address 7
     ip route 10.2.2.1/16 10.2.1.7

     router ospf 64
      redistribute connected subnets
      redistribute bgp 64780 metric 1 subnets
      network 66.251.75.128 0.0.0.127 area 0
     router bgp 64780
      redistribute ospf 64 match route-map 8aTzlvBrbaW
      neighbor 66.253.160.68 remote-as 12762
      neighbor 66.253.160.68 distribute-list 4 in
26
       Fundamental Problem: Wrong Abstractions

     [Figure: size of configuration files in a single enterprise
     network (881 routers); lines per config file range from 0 to
     ~2000, with routers sorted by file size]
27
     Fundamental Problem: Conflating Distributed
       Systems Issues with Networking Issues

     [Figure: a graph of routing processes with multiple paths
     between them carrying routes toward destination D]

        Distributed Systems Concern: resiliency to link failures
            Solution: multiple paths through routing process graph
30
     Fundamental Problem: Conflating Distributed
       Systems Issues with Networking Issues

     [Figure: the same routing process graph, with routes toward
     D filtered on some paths]

        Networking Concern: implement resource or security
         policy
            Solution: restrict flow of routing information, filter
             routes, summarize/aggregate routes
32
     4D Supports Network Evolution & Expansion


        Decision logic can be upgraded as needed
            No need for update of distributed protocols
             implemented in software distributed on every
             switch
        Decision Elements can be upgraded as
         needed
            Network expansion requires upgrades only to
             DEs, not every switch




33
                        Reachability Example



                      R1     Chicago (chi)             R2

Data Center                  New York (nyc)                    Front Office
                                                       R5



                      R3                               R4



        Two locations, each with data center & front office
        All routers exchange routes over all links

34
              Reachability Example



              R1   Chicago (chi)     R2

Data Center        New York (nyc)         Front Office
                                     R5



              R3                     R4

               chi-DC
                chi-FO
               nyc-DC
               nyc-FO

35
              Reachability Example



              R1   Packet filter:     R2
                   Drop nyc-FO -> *         chi
                   Permit *
Data Center                                Front Office
                   Packet filter:     R5
                   Drop chi-FO -> *         nyc
                   Permit *

              R3                      R4

               chi-DC
                chi-FO
               nyc-DC
               nyc-FO

36
                    Reachability Example



                   R1   Packet filter:      R2
                        Drop nyc-FO -> *          chi
                        Permit *
Data Center                                      Front Office
                        Packet filter:      R5
                        Drop chi-FO -> *          nyc
                        Permit *

                   R3                       R4

        A new short-cut link added between data centers
        Intended for backup traffic between centers

37
                       Reachability Example



                      R1   Packet filter:            R2
                           Drop nyc-FO -> *                chi
                           Permit *
Data Center                                               Front Office
                           Packet filter:            R5
                           Drop chi-FO -> *                nyc
                           Permit *

                      R3                             R4

        Oops – the new link lets packets violate the security policy!
        Routing changed, but packet filters don't update automatically
38
     Prohibiting Packets from chi-FO to nyc-DC




39
                      Reachability Example



                           Packet filter:          R2
                     R1    Drop nyc-FO -> *              chi
                           Permit *
Data Center                                             Front Office
                           Packet filter:          R5
                           Drop chi-FO -> *              nyc
                           Permit *

                     R3                            R4

        Typical response – add more packet filters to plug the
         holes in security policy



40
                       Reachability Example



                           Drop nyc-FO -> *      R2
                      R1
                                                        chi
Data Center                                            Front Office
                                                 R5     nyc
                           Drop chi-FO -> *

                      R3                         R4

        Packet filters have surprising consequences
        Consider a link failure
        chi-FO and nyc-FO still connected

41
                      Reachability Example



                           Drop nyc-FO -> *          R2
                     R1
                                                           chi
Data Center                                               Front Office
                                                     R5    nyc
                          Drop chi-FO -> *

                     R3                              R4
        Network has less survivability than topology suggests
        chi-FO and nyc-FO still connected
        But packet filter means no data can flow!
        Probing the network won’t predict this problem
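The failure above is exactly what a network-wide static reachability check can catch and probing cannot. A minimal sketch (router names, the topology, and the filter model are illustrative simplifications of the slides' example): search the topology while honoring packet filters, and ask whether the front offices can still talk after a link failure.

```python
from collections import deque

# Toy version of the example: R1 = chi data center, R2 = chi front office,
# R3 = nyc data center, R4 = nyc front office (names illustrative).
LINKS = {"R1": {"R2", "R3"}, "R2": {"R1", "R4"},
         "R3": {"R1", "R4"}, "R4": {"R2", "R3"}}

# Packet filters keyed by directed link: the sources whose packets are dropped.
FILTERS = {("R1", "R3"): {"R2"},   # drop chi-FO traffic entering nyc via the DC shortcut
           ("R3", "R1"): {"R4"}}   # drop nyc-FO traffic entering chi

def reachable(links, filters, src, dst):
    """BFS: can packets from src reach dst, given topology and filters?"""
    frontier, seen = deque([src]), {src}
    while frontier:
        node = frontier.popleft()
        if node == dst:
            return True
        for nbr in links[node]:
            if nbr in seen or src in filters.get((node, nbr), set()):
                continue
            seen.add(nbr)
            frontier.append(nbr)
    return False

def without_link(links, a, b):
    """The topology with the a-b link failed."""
    return {n: nbrs - {b if n == a else a} if n in (a, b) else nbrs
            for n, nbrs in links.items()}

ok_before = reachable(LINKS, FILTERS, "R2", "R4")
ok_after = reachable(without_link(LINKS, "R2", "R4"), FILTERS, "R2", "R4")
```

With all links up the front offices communicate directly; after the R2-R4 failure they remain physically connected through the data centers, but the filter on the shortcut drops the traffic, so the check reports unreachable.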
42
     Allowing Packets from chi-FO to nyc-FO




43
            Multiple Interacting Routing Processes

     [Figure: a client and server connected across the Internet;
     multiple OSPF and BGP routing processes, glued together by
     redistribution policies (Policy1, Policy2), each feed a
     router's FIB]
44
     The Routing Instance Graph of an
          881-Router Network




45
     Reconvergence Time Under
         Single Link Failure




46
     Reconvergence Time When
        Master DE Crashes




47
     Reconvergence Time When
        Network Partitions




48
     Reconvergence Time When
        Network Partitions




49
                Many Implementations Possible

        Single redundant decision engine
        Multiple decision engines
        • Hot stand-by
        • Divide network & load share
        Distributed decision engines
        • Up to one per router

     Choice can be based on reliability requirements
     • Dissemination plane can be in-band, or leverage OOB links
     Less need for distributed solutions (harder to reason about)
     • More focus on network issues, less on distributed protocols
50
          Direct Expression Enables New Algorithms

     [Figure: OSPF computes a single shortest path toward D]

     OSPF normally calculates a single path to each destination D
     • OSPF allows load-balancing only for equal-cost paths, to avoid loops
     • Using ECMP requires careful engineering of link weights

     [Figure: the decision plane installs multiple diverse paths toward D]

     A decision plane with a network-wide view can compute multiple paths
     • “Backup paths” installed for free!
     • Bounded stretch, bounded fan-in
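A sketch of how a decision element holding the whole topology could add such backup paths (illustrative, not the prototype's algorithm): compute the shortest path, recompute with the primary's links removed, and accept the backup only if its stretch stays within a bound. Link weights and the stretch bound are assumptions.

```python
from heapq import heappush, heappop

def shortest_path(links, src, dst, banned=frozenset()):
    """Dijkstra returning (cost, path); `banned` holds excluded links."""
    heap, seen = [(0, src, [src])], set()
    while heap:
        cost, node, path = heappop(heap)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, w in links[node].items():
            if frozenset((node, nbr)) not in banned:
                heappush(heap, (cost + w, nbr, path + [nbr]))
    return float("inf"), None

def primary_and_backup(links, src, dst, max_stretch=2.0):
    """Primary shortest path plus a link-disjoint backup with bounded stretch."""
    cost, primary = shortest_path(links, src, dst)
    used = {frozenset(pair) for pair in zip(primary, primary[1:])}
    bcost, backup = shortest_path(links, src, dst, banned=used)
    if backup is None or bcost > max_stretch * cost:
        return primary, None           # no acceptable link-disjoint backup
    return primary, backup

links = {"A": {"B": 1, "C": 2}, "B": {"A": 1, "D": 1},
         "C": {"A": 2, "D": 1}, "D": {"B": 1, "C": 1}}
primary, backup = primary_and_backup(links, "A", "D")
```

Here the primary A-B-D (cost 2) gets a disjoint backup A-C-D (cost 3), within the 2x stretch bound, so the backup can be installed alongside it.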
51
                            Systems of Systems


        Systems are designed as components to be used in larger
         systems, in different contexts, for different purposes,
         interacting with different components
            Example: OSPF and BGP are complex systems in their
             own right; they are components in a network's routing
             system, interacting with each other, with packet
             filters, and with management tools …
        Complex configuration to enable flexibility
            The glue has tremendous impact on network performance
            State of the art: multiple interacting distributed
             programs written in assembly language
        Lack of an intellectual framework to understand global
         behavior
52
                 Supporting Network Evolution


        Logic for controlling the network needs to change
         over time
            Traffic engineering rules
            Interactions with other networks
            Service characteristics
        Upgrades to field-deployed network equipment must
         be avoided
            Very high cost
            Software upgrades often require hardware upgrades (more
             CPU or memory)




53
                 Supporting Network Evolution
                            Today
        Today’s “Solution”
            Vendors stuff their routers with software implementing all
             possible “features”
              – Multiple routing protocols
              – Multiple signaling protocols (RSVP, CR-LDP)
              – Each feature controlled by parameters set at configuration time to
                achieve late binding
            Feature-creep creates configuration nightmare
              – Tremendous complexity for syntax & semantics
              – Mis-interactions between features is common

     Our Goal: Separate decision making logic from the field-
       deployed devices


54
               Supporting Network Expansion


        Networks are constantly growing
            New routers/switches/links added
            Old equipment rarely removed
        Adding a new switch can cause old
         equipment to become overloaded
            CPU/Memory demands on each device should not
             scale up with network size




55
                Supporting Network Expansion
                            Today
        Routers run a link-state routing protocol
            Size of link-state database scales with # of routers
            Expanding network can exceed memory limits of old routers
        Today’s “Solution”
            Monitor resources on all routers
            Predict approach of exhaustion and then:
              – Global upgrade
              – Rearchitecture of routing design to add summarization, route
                aggregation, information hiding

     Our Goal: make demands scale with hardware (e.g., # of
       interfaces)


56
                  Supporting Remote Devices


        Maintaining communication with all network
         devices is critical for network management
            Diagnosis of problems
            Monitoring status and network health
            Updating configuration or software
        “the chicken or the egg….”
            Cannot send device configuration/management
             information until it can communicate
            Device cannot communicate until it is correctly
             configured

57
                  Supporting Remote Devices
                            Today

        Today’s “Solution”
            Use PSTN as management network of last resort
            Connect console of remote routers to phone modem
            Can’t be used for customer premise equipment (CPE):
             DSL/cable modems, integrated access devices (IADs)
            In a converged network, PSTN is decommissioned
     Our Goal: Preserve management communication to any
       device that is not physically partitioned, regardless of
       configuration state




58
                             Recent Publications


        G. Xie, J. Zhan, D. A. Maltz, H. Zhang, A. Greenberg, G. Hjalmtysson, J. Rexford, “On
         Static Reachability Analysis of IP Networks,” IEEE INFOCOM 2005, Orlando, FL,
         March 2005.
        J. Rexford, A. Greenberg, G. Hjalmtysson, D. A. Maltz, A. Myers, G. Xie, J. Zhan, H.
         Zhang, “Network-Wide Decision Making: Toward a Wafer-Thin Control Plane,”
         Proceedings of ACM HotNets-III, San Diego, CA, November 2004.
        D. A. Maltz, J. Zhan, G. Xie, G. Hjalmtysson, A. Greenberg, H. Zhang, “Routing
         Design in Operational Networks: A Look from the Inside,” Proceedings of the 2004
         Conference on Applications, Technologies, Architectures, and Protocols for Computer
         Communications (ACM SIGCOMM 2004), Portland, Oregon, 2004.
        D. A. Maltz, J. Zhan, G. Xie, H. Zhang, G. Hjalmtysson, A. Greenberg, J. Rexford,
         “Structure Preserving Anonymization of Router Configuration Data,” Proceedings
         of ACM/Usenix Internet Measurement Conference (IMC 2004), Sicily, Italy, 2004.




59
