
									Knowledge Transfer: GIG and AFRL




                          Ken Birman and
                          Shankar Sastry




       June 26-28, 2005           All Hands Meeting
Context

   The US government is in the midst of a
    massive shift towards much broader use of
    networks
    –   Main acronym: Global Information Grid, or GIG
    –   Others: Network Centric Enterprise Systems
        (NCES), Service Oriented Architecture (SOA)
   Key technologies:
    –   Windows platforms, some use of Linux
    –   Web Services and CORBA

                  TRUST All Hands Meeting      June 26-28, 2005
Advantage: Information


  “The natural formation of the country is the soldier's best ally;
    but a power of estimating the adversary, of controlling the
    forces of victory, and of shrewdly calculating difficulties,
    dangers and distances, constitutes the test of a great
    general. He who knows these things, and in fighting puts his
    knowledge into practice, will win his battles.”

              - General Sun-Tzu Wu, 512 BC




Disadvantage: Complexity

   While the concept of interconnected systems
    is extremely appealing… the reality is often
    quite a different matter
   So-called “exponential state space explosion”
    can occur: suddenly we have to worry about
    all the possible simultaneous states reachable
    by these complex systems
   Costs of building, deploying, and operating
    integrated systems are skyrocketing!

The proposed architecture?




Note: HAIPEs shown support VPNs; additional HAIPEs needed for redundancy




   Activity focuses on this proposal: the architecture adds additional
    routing options, effectively increases the degree of interconnection
    between components of Air Force systems, and also improves
    connectivity to non-Air Force networks
   Also creates new connectivity to the public Internet through firewalls
   Key to understanding this is “virtual private networks” (VPNs)
    –   These use secret keys to encrypt end-to-end traffic, which can
        then travel over an insecure network
   Later (how much later?) solution is simplified, still using VPN
    technologies. Understand this as a simplified network hosting a
    plethora of “virtual” network overlays protected by VPN security
    keys, but interconnected selectively by “tunnels” that allow desired
    forms of enterprise-to-enterprise traffic to pass from network to
    network
Assured Information Sharing

   [Figure: assured information sharing diagram. Labels: Discovery,
    Post, Retrieve, Meta Data Catalog, Verification; Consumers'
    Characteristics (Entity, User Profile, Environmental Factors);
    Access Enforcement; GIG, COI, Partner Interaction Policies;
    Deconflicted Policies.]
JBI Prometheus Study

   Idea here is to work with AFRL to help them
    drive the work that might yield much better
    options
   Scope? A nine-month study
   So, we’re doing two studies!
    –   One is for the “operational” Air Force (the CIO’s
        office)
    –   And the other is for AFRL (the JBI team)


Forms of guidance anticipated?

   Hope is that study will identify and categorize three
    classes of challenges
    –   Near term, low risk: Opportunities that can be pursued at
        low risk with existing technologies
    –   Mid term, medium risk: Demands research investigation but
        path forward is fairly clear
    –   Long term, higher risk: Significant unknowns, major
        challenges, yet without progress JBI functionality could be
        seriously impacted




What makes the problem hard?

   Enterprise integration is creating systems of systems
    –   Talking about quality of service properties is hard simply
        because the components have ill-defined properties and may
        be large, balky legacies with very indirect connectivity and
        limited APIs
    –   Layers can and often do interact with one another
    –   Software engineering looms as a large unknown, particularly
        with COTS products and components




Drill down: Scalability

   We want predictable, stable performance, reliability,
    security
   … despite
    –   Large numbers of users
    –   Large physical extent of network
    –   Increasing rates of infrastructure disruption (purely because
        of growing span of network)
    –   Wide range of performance profiles
    –   Growth in actual volume of work applications are being asked
        to do




Scalable Publish Subscribe

   A popular paradigm; we’ll use it to illustrate our points
   Used to link large numbers of information sources in
    commercial or military settings to even larger
    numbers of consumers
    –   Track down the right servers
    –   Updates in real-time as data changes
   Happens to be a top military priority, so one could
    imagine the government tackling it…




Current presumed approach?

   Basically, client-server
    –   Clients publish and subscribe by talking to a server
    –   Server sees all the data
   Does this “scale”?
    –   Perhaps, if we can manage to partition the
        workload, but in practice that may be very hard
   Research question: can we design a scalable
    solution that replaces the server with a “farm”
    of servers?
   [Figure: publishers and subscribers connected to a partitioned
    server cluster with a log.]
   Publisher offers new events to a proxy server. Subjects are
    partitioned among the server sets. In this example there are four
    partitions: blue, green, yellow and red. Server set and partition
    function can adjust dynamically
   Subscriber must identify the best servers.
   Subjects are partitioned among servers, hence one subscriber may
    need multiple connections
   Like the subscribers, each publisher connects to the “best” proxy
    (or proxies) given its own location in the network. The one
    selected must belong to the partition handling the subject of the
    event.
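
The subject partitioning described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say how the partition function is computed, so a static hash is used here (the slides note that the real function and server sets can adjust dynamically), and the server names in `SERVER_SETS` are hypothetical.

```python
import hashlib

# The four partitions named on the slide.
PARTITIONS = ["blue", "green", "yellow", "red"]

# Hypothetical server sets, one per partition (names are placeholders).
SERVER_SETS = {
    "blue": ["blue-1", "blue-2"],
    "green": ["green-1", "green-2"],
    "yellow": ["yellow-1", "yellow-2"],
    "red": ["red-1", "red-2"],
}


def partition_for(subject: str) -> str:
    """Map a subject to a partition with a stable hash (an assumption)."""
    digest = hashlib.sha256(subject.encode("utf-8")).digest()
    return PARTITIONS[int.from_bytes(digest[:8], "big") % len(PARTITIONS)]


def servers_for(subject: str) -> list:
    """Servers a publisher may contact for an event on this subject."""
    return SERVER_SETS[partition_for(subject)]


def connections_needed(subjects) -> set:
    """Partitions a subscriber must connect to for its subjects."""
    return {partition_for(s) for s in subjects}
```

A subscriber interested in several subjects calls `connections_needed` and may get back more than one partition, which is the "multiple connections" cost noted on the slide.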
Poor Scalability

   Long “rumored” for distributed computing
    technologies and tools
   Famous study by Jim Gray points to scalability issues
    in distributed databases
   Things that scale well:
    –   Tend to be stateless or based on soft state
    –   Have weak reliability semantics
    –   Are loosely coupled




Do current technologies scale?

Category                        Typical large use               Limits?

Client-Server and object-       LAN system, perhaps 250         Server capacity limits scale.
oriented environments           simultaneous clients


Web-like architectures          Internet, hundreds of clients   No reliability guarantees


Publish-subscribe               About 50 receivers, 500 in      Throughput becomes unstable
Group multicast                 hierarchies                     with scale. Multicast storms

Many-Many DSM                   Rarely seen except in small     Update costs grow with cluster
                                clusters                        size

Shared database                 Farm: 50-100; RACS: 100’s,      Few successes with rapidly
                                RAPS: 10’s                      changing real-time data




Swiss Stock Exchange Problem: Reliable multicast
too “fragile” in large deployments!




                                         Most members are
                                         healthy….


                                         … but one is slow




With 32 processes….


   [Figure: Virtually synchronous Ensemble multicast protocols.
    Average throughput on nonperturbed members (axis 0 to 250) versus
    perturb rate (axis 0 to 0.9), with an “ideal” curve and an
    “actual” curve.]

The problem got worse as the system scaled up

   [Figure: Virtually synchronous Ensemble multicast protocols.
    Average throughput on nonperturbed members (axis 0 to 250) versus
    perturb rate (axis 0 to 0.9), one curve each for group sizes 32,
    64 and 96.]

Not confined to stock exchanges

   Navy Cooperative Engagement Capability:
    –   Links radars, weapons systems within group of warships
    –   … successfully demonstrated in lab settings
   But once in the field…
    –   Struggled with erratic communication links
    –   Performance dropped with each additional warship
            Naturally, the load presented to the system tends to rise
            Hence CEC was basically not successful
   Tale shows that scalability can determine success or
    failure of an exciting capability



Why doesn’t anything scale?

   Actually, some systems do scale (like NYSE)
   But many don’t. With weak semantics…
    –   Faulty behavior may occur more often as system size
        increases (think “the Internet”)
   With strong semantics…
    –   Encounter a system-wide cost (e.g. membership
        reconfiguration, congestion control)
    –   That can be triggered more often as a function of scale (more
        failures, or more network “events”, or bigger latencies)
   Gray’s O(n²) database degradation reflects very
    similar issues… a new law of nature?
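
A toy model (not Gray's analysis) shows how a quadratic term can arise from the "strong semantics" bullets above: if the number of triggering events grows with the number of members, and each event imposes a cost on every member, total overhead grows with the square of system size. The function name and both rate parameters are illustrative assumptions.

```python
def system_wide_overhead(n, events_per_member=0.01, cost_per_member=1.0):
    """Toy estimate of overhead per unit time in an n-member system.

    Assumes (for illustration only) that each member independently
    triggers system-wide events at a fixed rate, and that every such
    event costs a fixed amount of work at every member.
    """
    trigger_rate = n * events_per_member    # more members, more events
    cost_per_event = n * cost_per_member    # each event touches everyone
    return trigger_rate * cost_per_event    # grows as n squared
```

Under these assumptions, doubling the group size quadruples the overhead.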



Fight fire with fire!

   Turn to randomized protocols…
   … with probabilistic reliability goals
   This overcomes the scalability problems just
    seen
   Then think about how to “present” mechanism
    to user
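
A minimal push-gossip sketch illustrates the style of randomized protocol meant here. It is a generic epidemic dissemination loop, not the Bimodal Multicast protocol itself; the function name and the `fanout`, `loss` and `max_rounds` parameters are illustrative assumptions.

```python
import random


def push_gossip(n, fanout=2, loss=0.0, rng=None, max_rounds=1000):
    """Simulate push gossip among n nodes, starting from node 0.

    Each round, every node that already holds the message sends it to
    `fanout` peers chosen uniformly at random; each send is dropped
    with probability `loss`. Returns (rounds_run, nodes_reached).
    """
    rng = rng or random.Random()
    reached = {0}
    rounds = 0
    while len(reached) < n and rounds < max_rounds:
        newly_reached = set()
        for _sender in reached:
            for _ in range(fanout):
                if rng.random() < loss:
                    continue  # message lost; no retransmission here
                newly_reached.add(rng.randrange(n))
        reached |= newly_reached
        rounds += 1
    return rounds, len(reached)
```

No node tracks who has the message and no single slow or lossy receiver stalls the others, which is why delivery is a probabilistic goal rather than a guarantee.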




Tools in our toolkit

   Traditional deterministic tools:
    –   Virtual synchrony: Only in small groups
    –   Paxos: So-called state machine replication. Ditto
    –   Transactions: One-copy serializability. Even DB
        vendors don’t use this approach
   New-age probabilistically reliable options:
    –   Bimodal multicast
    –   Astrolabe
    –   DHTs


   [Figure: the same publisher, subscriber and server cluster diagram,
    with three added annotations.]
   We can use Bimodal Multicast here
   This replication problem looks like an instance of virtual synchrony
   Perhaps this client can use Astrolabe to pick a server
   [Figure: server cluster with log, Bimodal Multicast, and Astrolabe
    tables.]
   Publisher uses Astrolabe to identify the correct set of receivers
   Subscriber must identify the best servers.
   Astrolabe manages configuration and connection parameters, tracks
    system membership and state.

   Virtual “summary” table (rows derived by SQL query from the regional
   tables)

     Name     Avg Load   WL contact     SMTP contact
     SF       2.6        123.45.61.3    123.45.61.17
     NJ       1.8        127.16.77.6    127.16.77.11
     Paris    3.1        14.66.71.8     14.66.71.12

   San Francisco

     Name       Load   Weblogic?   SMTP?   Word Version   …
     swift      2.0    0           1       6.2
     falcon     1.5    1           0       4.1
     cardinal   4.5    1           0       6.0

   New Jersey

     Name       Load   Weblogic?   SMTP?   Word Version   …
     gazelle    1.7    0           0       4.5
     zebra      3.2    0           1       6.2
     gnu        .5     1           0       6.2

  The combined technologies solve the initial problem!
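
The aggregation from regional tables to the summary table can be sketched as below, using the San Francisco and New Jersey rows from the slide. The slide labels the aggregation only as an "SQL query"; plain Python stands in for it here. Choosing the least-loaded host that runs each service as the contact is an assumption, and the sketch returns host names because the regional tables on the slide do not list addresses.

```python
from statistics import mean

# Regional tables from the slide:
# (name, load, runs_weblogic, runs_smtp, word_version)
ZONES = {
    "SF": [
        ("swift", 2.0, 0, 1, "6.2"),
        ("falcon", 1.5, 1, 0, "4.1"),
        ("cardinal", 4.5, 1, 0, "6.0"),
    ],
    "NJ": [
        ("gazelle", 1.7, 0, 0, "4.5"),
        ("zebra", 3.2, 0, 1, "6.2"),
        ("gnu", 0.5, 1, 0, "6.2"),
    ],
}


def summarize(rows):
    """Reduce one regional table to a single summary row."""

    def least_loaded(flag_index):
        # Assumed policy: pick the least-loaded host offering the service.
        candidates = [r for r in rows if r[flag_index] == 1]
        return min(candidates, key=lambda r: r[1])[0] if candidates else None

    return {
        "avg_load": mean(r[1] for r in rows),
        "wl_contact": least_loaded(2),
        "smtp_contact": least_loaded(3),
    }


summary = {zone: summarize(rows) for zone, rows in ZONES.items()}
```

Each region contributes one small row regardless of how many machines it contains, so the state a client must inspect to pick a server stays bounded as regions grow.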
Back to AF studies

   Our basic idea right now is to use examples
    such as JBI scalability to get AF “on board” to
    explore the space as a whole
    –   From this would come a broader AFRL research
        agenda
    –   And TRUST team members could then make
        proposals to tackle the associated problems
   In effect: AF/CIO office asks AFRL to explore
    questions that we would help them study

Taxonomy of research topics


   [Figure: taxonomy diagram. Top row: Dark Core (key mgt, assured
    routing, firewall policy admin.) and NCES/SOA (discovery, QoS,
    info. architecture). Below: application containment (virtual
    enterprise), managed secure tunnels, self-* (autonomic management),
    better tools for developing solutions, scalability, time critical,
    high availability.]




NCES Technology Story in a Nutshell
   NCES Services                State of the Art                                        Example of Open Issue

      Discovery         Web Services (UDDI/WSDL)               Completely unclear how to support “policies” and “preferences”.



      Mediation         XML translation using semantic web     The entire topic has the feel of a research experiment. Little is known about how
                                                               to do automated “translation”. Much work is needed here. Over time, will need a
                                                               GIG “information architecture”


    Collaboration       Currently, this is mostly a white      Guarantees of reliability and rapid response are particularly weak. Clearly a
                        paper vision. Initial systems will     tremendous amount of work is needed to apply these in sensitive military
                        resemble existing chat and             contexts. There will be thousands of early projects and all are likely to encounter
                        collaboration tools.                   major technology gaps.


     Messaging          Based on publish-subscribe and         XML encoding is bulky and content filtering is very slow. Scalability of these
                        message queuing middleware             products is fair to terrible and quality of service guarantees are minimal.
                                                               Research is needed on scalable, stable solutions that work well even under
                                                               stress


 Enterprise Services    Focus here is on adapting tools like   Scale of the military network management, application management and security
                        Tivoli to GIG. Not a bad place to      management problem dwarfs prior experience and existing tools don’t scale well
                        start…                                 enough. Existing solutions require far too much hand-configuration and manual
                                                               supervision.


 Application Service    Offers transactional technologies      These are valuable and relatively mature. The primary issue that arises concerns
                                                               systems in which enormous databases are acquired at various “locations” and
                                                               are too large to combine on one site.


   Storage Service      File systems and databases             Overall, a mature and well-understood area, yet understanding of how to manage
                                                               replicated storage requires far more attention.

 User Assist. Service   Online help                            Industry coverage of this topic is adequate.


   Info. Assurance      Based on VPNs (single dark core)       Unclear how key material and firewall policies can be managed on massive scale.
                                                               Potential for flash virus outbreaks a very serious threat. Research needed on
                                                               “virtual enterprises” that might combine VMM with VPN with information
                                                               assurance tools…
Summary?

   Shankar and Birman see AF as a major
    community with which TRUST may want to
    work closely
   We’re starting with studies
    –   Understanding them is key to helping them!
    –   Technology thrown over the wall won’t help here
   But as these mature, they may yield major
    opportunities for the TRUST research team


								