A First Look at Inter-Data Center Traffic Characteristics via Yahoo! Datasets
INFOCOM 2011
UNIVERSITY OF MINNESOTA-TWIN CITIES

Outline
  Overview of the Yahoo datasets
  Identifying Yahoo prefixes
  Traffic characteristics
  Discussion and conclusion


Overview of Yahoo database
  [figure]

Five major Yahoo data centers
  Dallas (DAX), Washington DC (DCP), Palo Alto (PAO)
    Provide most of the core services
    The largest Yahoo data centers in terms of the amount of traffic
  Hong Kong (HK), United Kingdom (UK)
  Yahoo's border routers connect to several other ISPs to reach its clients and other data centers
  These data centers are directly connected to each other through a private network service
Sampled flows
  Use the anonymized NetFlow datasets collected at the border routers of the five major Yahoo data centers
  Each NetFlow record contains
    timestamp, source and destination IP addresses, transport-layer port numbers, source and destination interfaces on the router, IP protocol, and the number of bytes and packets exchanged
  IP addresses are permuted to hide the identities of Yahoo users
    Prefix-preserving scheme: if a.b.c.d is permuted to w.x.y.z, then a.b.c.d' is permuted to w.x.y.z'

Classification of flows
  D2C traffic
    The traffic exchanged between Yahoo servers and clients
  D2D traffic
    The traffic exchanged between Yahoo servers at different locations
  Client
    A non-Yahoo host that connects to a Yahoo server
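The prefix-preserving property described above can be illustrated with a toy scheme in the spirit of Crypto-PAn. The slides do not say which scheme was used to anonymize the Yahoo traces; the keyed HMAC-SHA256 function, the key, and the function names below are assumptions for illustration only. Bit i of the output is bit i of the input flipped by a pseudo-random bit that depends only on the first i input bits, so two addresses sharing a k-bit prefix map to outputs sharing exactly a k-bit prefix.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Toy prefix-preserving permutation of an IPv4 address (assumed scheme)."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = []
    for i in range(32):
        prefix = bits[:i].encode()  # the first i bits of the original address
        digest = hmac.new(key, prefix, hashlib.sha256).digest()
        flip = digest[0] & 1  # one pseudo-random bit per distinct prefix
        out.append(str(int(bits[i]) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses share."""
    x = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - x.bit_length()


if __name__ == "__main__":
    key = b"example-key"  # hypothetical secret key
    a, b = "192.0.2.10", "192.0.2.200"
    print(anonymize_ipv4(a, key), anonymize_ipv4(b, key))
    print(common_prefix_len(a, b) == common_prefix_len(anonymize_ipv4(a, key), anonymize_ipv4(b, key)))
```

Because each flip depends only on the prefix before it, the first differing bit of two inputs is also the first differing bit of their outputs, which is exactly why shared /24 (or any other) prefixes survive anonymization.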

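The D2C/D2D definitions above reduce to a check on which endpoints of a flow fall inside the inferred Yahoo prefixes. This is a minimal sketch, assuming the Yahoo prefix set is already known and that any flow at a border router between two Yahoo prefixes crossed data center locations; the prefixes in the example are documentation ranges, and the "other" label is a hypothetical catch-all.

```python
import ipaddress
from typing import Iterable


def classify_flow(src: str, dst: str, yahoo_prefixes: Iterable[str]) -> str:
    """Label a flow as D2D, D2C, or other, given a set of inferred Yahoo prefixes."""
    nets = [ipaddress.ip_network(p) for p in yahoo_prefixes]

    def is_yahoo(ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        return any(addr in n for n in nets)

    src_y, dst_y = is_yahoo(src), is_yahoo(dst)
    if src_y and dst_y:
        return "D2D"  # Yahoo server <-> Yahoo server
    if src_y or dst_y:
        return "D2C"  # Yahoo server <-> non-Yahoo client
    return "other"  # neither endpoint is a known Yahoo prefix


if __name__ == "__main__":
    yahoo = ["192.0.2.0/24", "198.51.100.0/24"]  # hypothetical inferred prefixes
    print(classify_flow("192.0.2.5", "198.51.100.7", yahoo))
    print(classify_flow("203.0.113.9", "192.0.2.5", yahoo))
```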
Two-step process
  Separate Yahoo IP addresses from non-Yahoo IP addresses in the D2C traffic
  Extract the D2D IP addresses

Identify the D2C prefixes
  Based on degree and port
    A prefix is a Yahoo D2C prefix if it talks to a large number of other prefixes and a large fraction of its traffic uses popular service TCP ports
  Choose the top α prefixes by degree that carry at least a β fraction of their traffic on those ports
  β = 0.5; α is stable beyond 400 for DAX
  Identify the IP addresses (prefixes) that belong to Yahoo
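The degree-and-port heuristic can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the set of "popular service" ports is hypothetical (the slides do not list it), flows are assumed to be already aggregated to prefixes, and both endpoints of each flow are treated as candidates, each judged by the port on its own side.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical "popular service" TCP ports; the slides do not give the actual list.
POPULAR_PORTS = {25, 80, 110, 143, 443, 993, 995}

# (src_prefix, src_port, dst_prefix, dst_port, bytes) from D2C traffic
Flow = Tuple[str, int, str, int, int]


def select_d2c_prefixes(flows: List[Flow], alpha: int, beta: float) -> List[str]:
    peers: Dict[str, Set[str]] = defaultdict(set)
    total: Dict[str, int] = defaultdict(int)
    service: Dict[str, int] = defaultdict(int)
    for src, sport, dst, dport, nbytes in flows:
        # Each endpoint is a candidate; use the port on that endpoint's side.
        for prefix, port, other in ((src, sport, dst), (dst, dport, src)):
            peers[prefix].add(other)
            total[prefix] += nbytes
            if port in POPULAR_PORTS:
                service[prefix] += nbytes
    # Rank by degree (number of distinct peer prefixes) and keep the top alpha.
    ranked = sorted(peers, key=lambda p: len(peers[p]), reverse=True)[:alpha]
    # Keep prefixes whose service-port byte fraction is at least beta.
    return [p for p in ranked if total[p] > 0 and service[p] / total[p] >= beta]
```

A server prefix answering many clients on port 80 has a high degree and a service fraction near 1, while each client prefix talks to few peers from ephemeral ports, so only the server side survives both filters. The slides use β = 0.5 and report α stable beyond 400 for DAX.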
Choose α and β
  [figure]

Localizing inferred D2C prefixes
  Observe the traffic direction
  A prefix belongs to a location if it appears in both the incoming and the outgoing D2C traffic seen at that location
  Assign the appropriate location to each prefix
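The localization rule can be sketched as a set intersection per location. The input layout (per-location lists of direction and prefix) is a hypothetical simplification, and the tie-breaking policy is an assumption: the slides do not say what happens when a prefix qualifies at several locations, so this sketch leaves such prefixes unassigned.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical input: for each location, D2C observations as (direction, yahoo_prefix),
# where direction is "in" (client -> Yahoo) or "out" (Yahoo -> client).


def localize_prefixes(
    observations: Dict[str, Iterable[Tuple[str, str]]]
) -> Dict[str, Optional[str]]:
    candidates: Dict[str, Set[str]] = {}
    for location, records in observations.items():
        seen_in: Set[str] = set()
        seen_out: Set[str] = set()
        for direction, prefix in records:
            (seen_in if direction == "in" else seen_out).add(prefix)
        # A prefix is a candidate for this location only if seen in both directions here.
        for prefix in seen_in & seen_out:
            candidates.setdefault(prefix, set()).add(location)
    # Assumption: assign a location only when exactly one location qualifies.
    return {p: next(iter(locs)) if len(locs) == 1 else None for p, locs in candidates.items()}
```

Prefixes seen in only one direction at every location never become candidates, which filters out asymmetric routing where requests and responses cross different data centers.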

Identify Yahoo D2D prefixes
  D2D traffic is mostly symmetric
  It is carried in Yahoo's private network
  Two types of interfaces on each border router
    Foreign interfaces
    Local interfaces

Interface result and validation
  [figure]
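The two observations above (D2D traffic is carried on the private network and is mostly symmetric) suggest a simple filter. This is a loosely hedged sketch: which router interfaces face the private network is taken as a given hypothetical input, and the symmetry threshold of 0.5 is an assumed value, not one from the slides.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (src_prefix, dst_prefix, input_interface, output_interface, bytes)
Record = Tuple[str, str, int, int, int]


def d2d_prefixes(records: List[Record], private_ifaces: Set[int],
                 min_symmetry: float = 0.5) -> Set[str]:
    volume: Dict[Tuple[str, str], int] = defaultdict(int)
    for src, dst, in_if, out_if, nbytes in records:
        # Keep only traffic entering or leaving through a private-network interface.
        if in_if in private_ifaces or out_if in private_ifaces:
            volume[(src, dst)] += nbytes
    found: Set[str] = set()
    for (a, b), fwd in volume.items():
        rev = volume.get((b, a), 0)
        # Symmetry ratio in [0, 1]; 1 means equal byte volume in both directions.
        if rev and min(fwd, rev) / max(fwd, rev) >= min_symmetry:
            found.update((a, b))
    return found
```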
Total prefixes extracted
  95% (DAX)
  95% (DCP)
  75% (PAO)
  100% (UK)
  75% (HK)
  Less than 5% of non-Yahoo prefixes were classified as Yahoo prefixes
  Around 5% of Yahoo prefixes were assigned incorrect locations

Traffic statistics at DAX
  [figure]

D2C service classification
  Services are classified using the transport-layer ports in the traffic
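Port-based service classification amounts to a lookup from the server-side port to a service label. The mapping below uses standard IANA well-known ports but is a hypothetical stand-in: the slides do not publish Yahoo's actual port-to-service table, and unmatched ports fall into an assumed "other" bucket.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical port-to-service map built from well-known ports.
SERVICE_BY_PORT: Dict[int, str] = {25: "smtp", 80: "http", 110: "pop3", 143: "imap", 443: "https"}


def bytes_per_service(flows: Iterable[Tuple[int, int]]) -> Counter:
    """Sum bytes per service from (server_port, bytes) pairs of D2C flows."""
    totals: Counter = Counter()
    for port, nbytes in flows:
        totals[SERVICE_BY_PORT.get(port, "other")] += nbytes
    return totals
```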
  The number of IPs providing each D2C service and the number of overlapping IPs between each pair of services
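Per-service IP counts and pairwise overlaps follow directly from set sizes and intersections. This sketch assumes the server IPs for each service have already been collected (for example with a port lookup); the service names in the test are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def ips_per_service(ips_by_service: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of distinct server IPs providing each D2C service."""
    return {s: len(ips) for s, ips in ips_by_service.items()}


def service_overlap(ips_by_service: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Number of server IPs shared by each unordered pair of D2C services."""
    return {
        (a, b): len(ips_by_service[a] & ips_by_service[b])
        for a, b in combinations(sorted(ips_by_service), 2)
    }
```

A large overlap between two services suggests that the same servers (or front ends) host both.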

D2D traffic
  Identifying D2D ports
    A port p is considered a D2D port if it meets two constraints
      p is frequently used in D2D traffic
      The entropy of the distribution of the other ports it talks to is close to 1
  Found 37 D2D ports
    They cover more than 95% of the overall D2D traffic
    The most frequently used ports include 80, 25, 1971, 14011, 5017, 5019, 14020, and 14030

D2D Communication Patterns
  Degree of each prefix
    The number of unique IP prefixes that it talks to
    Follows a power-law distribution
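The two D2D-port constraints can be sketched with a usage share and a normalized Shannon entropy. The slides only say the entropy should be "close to 1"; normalizing by log(k) for k distinct peer ports is an assumption, and the thresholds `min_share` and `min_entropy` are hypothetical values. A service port talks to many different ephemeral ports roughly evenly, so its normalized entropy approaches 1.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def normalized_entropy(counts: Counter) -> float:
    """Shannon entropy divided by its maximum, log(k), for k distinct values."""
    total = sum(counts.values())
    k = len(counts)
    if k < 2:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(k)


def find_d2d_ports(flows: Iterable[Tuple[int, int]], min_share: float = 0.01,
                   min_entropy: float = 0.9) -> List[int]:
    """flows: (port_a, port_b) pairs from D2D traffic; thresholds are hypothetical."""
    peers: Dict[int, Counter] = defaultdict(Counter)
    uses: Counter = Counter()
    n = 0
    for a, b in flows:
        n += 1
        for port, other in ((a, b), (b, a)):
            uses[port] += 1
            peers[port][other] += 1
    # Constraint 1: frequent use; constraint 2: high entropy over peer ports.
    return sorted(
        p for p in uses
        if uses[p] / n >= min_share and normalized_entropy(peers[p]) >= min_entropy
    )
```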
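Prefix degree and a rough power-law check can be computed from the D2D prefix pairs. The slides do not say how the power-law fit was made; the estimator below is the standard approximate discrete maximum-likelihood estimate from Clauset, Shalizi and Newman, alpha ≈ 1 + n / Σ ln(x_i / (x_min - 1/2)), used here as an assumed stand-in rather than the authors' method.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def prefix_degrees(pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Degree = number of unique prefixes each prefix exchanges D2D traffic with."""
    peers: Dict[str, Set[str]] = defaultdict(set)
    for a, b in pairs:
        if a != b:  # ignore traffic within the same prefix
            peers[a].add(b)
            peers[b].add(a)
    return {p: len(s) for p, s in peers.items()}


def powerlaw_alpha(degrees: List[int], xmin: int = 1) -> float:
    """Approximate discrete MLE of a power-law exponent for degrees >= xmin."""
    tail = [d for d in degrees if d >= xmin]
    return 1.0 + len(tail) / sum(math.log(d / (xmin - 0.5)) for d in tail)
```

A fitted exponent alone does not prove a power law; a proper check would also compare the fit against alternatives such as a log-normal.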

Cross-Correlation between D2C & D2D Traffic
  The HK and UK data centers act like "satellite" data centers
  The data centers in the US seem to act more like "backbone" data centers

Two major types of D2D traffic
  D2C-triggered D2D traffic
    Local D2C-triggered D2D traffic
      Request traffic generated by a local host toward a remote host
    Foreign D2C-triggered D2D traffic
      Requested by a Yahoo server in another data center
  Background D2D traffic
    Regular traffic exchanged among the back-end servers
    Traffic incurred by other network events
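Correlating D2C and D2D traffic can be illustrated with a lagged Pearson correlation between two volume time series. The slides do not specify the correlation measure or the time granularity; the series, the lag search, and the function names below are hypothetical. A strong peak at a positive lag would suggest D2D traffic that follows D2C traffic, consistent with the D2C-triggered category.

```python
import math
from typing import Sequence


def cross_correlation(x: Sequence[float], y: Sequence[float], lag: int) -> float:
    """Pearson correlation between x[t] and y[t + lag] over the overlapping window."""
    if lag >= 0:
        a, b = x[: len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[: len(y) + lag]
    n = min(len(a), len(b))
    if n < 2:
        return 0.0
    a, b = a[:n], b[:n]
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = math.sqrt(sum((u - ma) ** 2 for u in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (sa * sb) if sa and sb else 0.0


def best_lag(x: Sequence[float], y: Sequence[float], max_lag: int) -> int:
    """Lag in [-max_lag, max_lag] with the highest correlation of x[t] and y[t + lag]."""
    return max(range(-max_lag, max_lag + 1), key=lambda k: cross_correlation(x, y, k))
```

Here x could be hourly D2C bytes at one data center and y hourly D2D bytes leaving it; both are assumed inputs.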
Comparing three types of D2D traffic
  [figure]

Discussion
  Data inference
    Develop some simple and intuitive heuristics
  Flow classification
    Provide an initial estimate of the traffic and its characteristics
  Traffic correlation
    Help in developing better strategies to deploy various services across data centers
    Optimize network performance

Conclusion
  Develop novel heuristics to infer Yahoo IP addresses and their locations
  Study D2D and client traffic characteristics
  Yahoo deploys its data centers in a hierarchical way
  The approach can be applied to anonymized NetFlow traces of other providers as well
