Applications Scope Quantify IP dynamics

  Yinglian Xie, Fang Yu, Kannan Achan, Moises Goldszmidt, Ted Wobber
  Microsoft Research Silicon Valley

  Eliot Gillum
  Microsoft MSN Hotmail




                                                   MSR-SVC             1




Applications
q Malicious host identification
q Network forensic analysis
q IP blacklisting

Scope
q Subset of DHCP addresses

Quantify IP dynamics
q Is the set of dynamic IP addresses a small fraction?
q Can we identify dynamic IP addresses automatically?
q How can we compute IP dynamics?

[Figure: host 24.2.110.27 and a web server]

Spammers harvest zombie hosts from the “dynamic” portions of the Internet
q   96% of these email servers sent only spam to Hotmail
q   They account for 42% of all spam sent to Hotmail

IP blacklisting is not effective for dynamic IPs




    }   Spammer identification based on the server IP address type
        q   Dedicated servers with statically configured IP addresses
        q   Servers with highly dynamic IP addresses,
            e.g., dialup, DSL, or wireless hosts
    }   Agnostic to email content
    }   Complementary to existing methods
    Question: how to identify dynamic IP ranges?

  }   Perfect classification is extremely hard!
      q Requires global IP configuration knowledge
      q Requires sharing proprietary information
      q Configurations change over time

  }   Existing approaches
      q Reverse DNS lookups, public databases (e.g., Dynablock)
      q Enormous manual effort
      q Incomplete information
      q Lack of fine-grained IP dynamics information





  }   Hotmail user-login log
      ◦ <IP, user id, timestamp>
  }   Hotmail SMTP email server log
      ◦ <IP, date, #emails sent, #spam emails>

  }   How can we explore our datasets?

      [Figure: User-login log -> Identify dynamic IP addresses -> Dynamic IP addresses;
       Dynamic IP addresses + SMTP email server log -> Use dynamic IP addresses to filter spam -> Spam detection results]

}   High level approach
    q   Infer IP properties from their usage patterns
}   Input
    q   Hotmail user-login trace: <IP, user id> mappings
}   Challenge
    q   Establishing IP dynamics with only user-IP mappings
}   Two observations
    q   Dynamic IPs are allocated from a continuous address range,
        belonging to the same routing table entry
    q   Users using a dynamic IP are likely to use other IPs within the
        same range

    [Figure: mapping between the IP address space (0 to 2^32), hosts, and the user id space]

   }       IP address block 148.202/16 (65,536 IPs)
       q Belongs to Universidad de Guadalajara in Mexico
       q 136 mail servers in this block sent email to Hotmail during Jun-Sep 2006
        § 75 of these sent only spam
   }       UDmap identified the following range as dynamic IPs:
               [148.202.33.71, 148.202.33.220]
        § 73 out of the 75 spam servers were from this range

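The arithmetic behind this example can be checked with a few lines of Python. This is only a sketch using the standard `ipaddress` module; the helper name `in_dynamic_range` is made up for illustration, and the slide's shorthand `148.202/16` is written out as `148.202.0.0/16`.

```python
import ipaddress

# The /16 block and the dynamic range quoted on the slide.
block = ipaddress.ip_network("148.202.0.0/16")
lo = ipaddress.ip_address("148.202.33.71")
hi = ipaddress.ip_address("148.202.33.220")

def in_dynamic_range(ip):
    """Return True if ip falls inside the inclusive range [lo, hi]."""
    return lo <= ipaddress.ip_address(ip) <= hi

# Number of addresses in the inclusive range, and its share of the block.
range_size = int(hi) - int(lo) + 1
share_of_block = range_size / block.num_addresses

print(block.num_addresses, range_size, round(100 * share_of_block, 2))
```

Under these numbers the identified range covers 150 of the 65,536 addresses in the block, yet it contains 73 of the 75 spam-only servers.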




   [Figure: processing pipeline]
   User login trace -> IP-user mappings
   Routing tables   -> IP prefix table
   IP-user mappings + IP prefix table
     -> Multi-user IP block selection    -> Candidate IP blocks
     -> IP usage-entropy computation     -> IP entropies
     -> Dynamic IP block identification  -> Dynamic IP blocks
     -> Dynamics analysis and pruning    -> Adjusted blocks and dynamics statistics

   IP-user mappings (example):

   User ID            IP addresses
       1              65.79.162.22
       2              65.79.162.23
       3              65.79.162.24
       ……                 ……

   IP prefix table (example):

          IP Prefix          Origin AS
       8.0.0.0/8               3356
    4.23.114.0/24              19908
   212.213.0.0/16              5515
               ……               ……

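The first stage joins the two inputs: each login IP is matched to a routing prefix, and prefixes used by more than one user become candidates. The sketch below illustrates that idea only. The slides do not give the selection criterion, so the `min_users` threshold, the function names, the `65.79.162.0/24` prefix, and the login from `8.8.8.8` are all assumptions made for the example.

```python
import ipaddress
from collections import defaultdict

# Prefixes from the example table, plus one hypothetical entry so that
# the sample login IPs match something.
PREFIXES = [ipaddress.ip_network(p) for p in (
    "8.0.0.0/8",
    "4.23.114.0/24",
    "212.213.0.0/16",
    "65.79.162.0/24",   # hypothetical, not from the slides
)]

def longest_prefix(ip):
    """Return the most specific prefix containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in PREFIXES if addr in net]
    return max(matches, key=lambda net: net.prefixlen) if matches else None

def multi_user_blocks(mappings, min_users=2):
    """Group (user, ip) pairs by prefix; keep prefixes with enough users."""
    users_by_prefix = defaultdict(set)
    for user, ip in mappings:
        net = longest_prefix(ip)
        if net is not None:
            users_by_prefix[net].add(user)
    return {str(net): sorted(users)
            for net, users in users_by_prefix.items()
            if len(users) >= min_users}

mappings = [
    (1, "65.79.162.22"),
    (2, "65.79.162.23"),
    (3, "65.79.162.24"),
    (4, "8.8.8.8"),      # hypothetical single-user login
]
blocks = multi_user_blocks(mappings)
print(blocks)
```

A linear scan over the prefix list is fine for a toy table; a full routing table would call for a prefix trie or a sorted interval lookup.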
   [Figure: selected block, plotted over IP address]





    }     Matrix representation A (m x n)
          ◦ m IPs, n users:
    }     For IP j, normalized usage-entropy

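The formula for the normalized usage-entropy is not reproduced in this text. As a rough illustration, the sketch below assumes one common definition: row j of A counts logins per user at IP j, the entropy is the Shannon entropy of that row's distribution, and it is normalized by log2(n). The function name, the meaning of the matrix entries, and the normalization are assumptions, not the authors' definition.

```python
import math

def normalized_usage_entropy(row):
    """Shannon entropy of one IP's per-user login counts, divided by log2(n).

    row[k] is assumed to be the number of logins by user k from this IP.
    Returns a value in [0, 1]: 0 for a single user, 1 for a uniform spread.
    """
    n = len(row)
    total = sum(row)
    if total == 0 or n < 2:
        return 0.0
    entropy = 0.0
    for count in row:
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy / math.log2(n)

# Toy matrix: 3 IPs (rows) by 4 users (columns).
A = [
    [5, 0, 0, 0],   # one user only: static-looking
    [1, 1, 1, 1],   # evenly shared: dynamic-looking
    [2, 2, 0, 0],   # shared by two of four users
]
entropies = [normalized_usage_entropy(row) for row in A]
print(entropies)
```

Under this assumed definition, an IP that is always used by the same account scores 0, while an IP handed out to many accounts in turn scores close to 1.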




    }     Median filter to smooth the entropy signal
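A median filter replaces each value with the median of a small window around it, which removes isolated spikes and dips without blurring the edges between blocks. The sketch below is a minimal version; the window size, the edge handling (repeating the first and last values), and the sample signal are assumptions chosen for illustration.

```python
from statistics import median

def median_filter(signal, window=3):
    """Smooth a sequence with a sliding median; edges repeat the end values."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    last = len(signal) - 1
    smoothed = []
    for i in range(len(signal)):
        # Clamp indices so the window always holds `window` values.
        neighbours = [signal[min(max(k, 0), last)]
                      for k in range(i - half, i + half + 1)]
        smoothed.append(median(neighbours))
    return smoothed

# Hypothetical per-IP entropies along a block: high on the left, low on
# the right, with one isolated dip (index 2) and one isolated spike (index 7).
entropy = [0.9, 0.8, 0.0, 0.9, 0.85, 0.1, 0.05, 0.9, 0.1, 0.0]
smoothed = median_filter(entropy, window=3)
print(smoothed)
```

After smoothing, the dip and the spike disappear while the transition between the high-entropy and low-entropy regions stays at the same position.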








    Two metrics for each IP
    }     Number of users
    }     Inter-user switch time
          ◦ Interval between two consecutive users

    Remove hyper-dynamic proxy clusters
          ◦ E.g. > 1000 users,
                 < 5 min inter-user switch time
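Both metrics can be derived from `<IP, user id, timestamp>` records. The sketch below is one possible reading: it takes the switch time as the gap between consecutive logins by different users and summarizes it per IP with the median. The function names, the use of the median, and the choice to require both thresholds at once are assumptions; the slide only gives the example thresholds.

```python
from collections import defaultdict
from statistics import median

def ip_metrics(events):
    """events: iterable of (ip, user_id, timestamp_in_seconds).

    Returns {ip: (number_of_users, median_switch_time_or_None)}.
    """
    logins_by_ip = defaultdict(list)
    for ip, user, ts in events:
        logins_by_ip[ip].append((ts, user))

    metrics = {}
    for ip, logins in logins_by_ip.items():
        logins.sort()
        users = {user for _, user in logins}
        # Gaps between consecutive logins made by different users.
        gaps = [t2 - t1
                for (t1, u1), (t2, u2) in zip(logins, logins[1:])
                if u1 != u2]
        metrics[ip] = (len(users), median(gaps) if gaps else None)
    return metrics

def is_proxy_like(n_users, switch_time, max_users=1000, min_switch=300):
    """Flag IPs with very many users and very short switch times."""
    return (n_users > max_users
            and switch_time is not None
            and switch_time < min_switch)

# Hypothetical login events for a single IP.
events = [
    ("10.0.0.1", "a", 0),
    ("10.0.0.1", "a", 100),
    ("10.0.0.1", "b", 700),
    ("10.0.0.1", "c", 4300),
    ("10.0.0.1", "a", 4400),
]
metrics = ip_metrics(events)
print(metrics)
```

In this toy trace the IP has 3 users and switch gaps of 600, 3600, and 100 seconds, so it is nowhere near the proxy thresholds.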




    }   Input data: Hotmail user-login trace for 08/2006
        q 155 million IPs, 20,167 Autonomous Systems (ASes)

    }   Output: dynamic IP address blocks
        q 102.9 million IPs, 958,822 blocks, 5,891 ASes

        IPs (millions)   Seen from user-login   Unseen
        Input            155
        Output           95.2                   7.7





}       Two methods/data sources
    q Compare with public databases (Dynablock) - 49.81%
         q Manually maintained, with 193 million dynamic IPs
    q Reverse DNS lookup for the rest - 50.19%
         q Random sampling (1%)

    [Figure: breakdown of the identified IPs.
     Overlap with Dynablock: 49.81%.
     rDNS lookup: 50.19%, split into:
       with dynamic keyword (e.g., dialup1-16.kvvi.net, ppp79-73.dsl-chn.eth.net)
       static (e.g., 61.17.37-29-bb.static.vsnl.eth.net)
       unknown, no inference (e.g., 190-50-156-163.speedy.com.ar)
       no record
     Region labels: "Verified dynamic", "Likely dynamic"]

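The reverse-DNS check amounts to looking for telltale keywords in a host name. The sketch below illustrates the idea on the example names from the slide. The keyword lists are assumptions ("dialup", "ppp", and "dsl" are suggested by the examples; "dhcp" and "dynamic" are guesses), and the category labels are simplified.

```python
# Assumed keyword lists; the slides do not enumerate the actual keywords.
DYNAMIC_KEYWORDS = ("dialup", "ppp", "dsl", "dhcp", "dynamic")
STATIC_KEYWORDS = ("static",)

def classify_rdns(hostname):
    """Classify a reverse-DNS name by keyword; None or '' means no record."""
    if not hostname:
        return "no record"
    name = hostname.lower()
    if any(keyword in name for keyword in STATIC_KEYWORDS):
        return "static"
    if any(keyword in name for keyword in DYNAMIC_KEYWORDS):
        return "dynamic keyword"
    return "unknown"

examples = [
    "dialup1-16.kvvi.net",
    "ppp79-73.dsl-chn.eth.net",
    "61.17.37-29-bb.static.vsnl.eth.net",
    "190-50-156-163.speedy.com.ar",
    None,
]
labels = [classify_rdns(name) for name in examples]
print(labels)
```

Plain substring matching is crude and can misfire on names that merely contain a keyword, so a real check would likely split the name on dots and hyphens first. The host names themselves would come from PTR lookups, for example via `socket.gethostbyaddr`.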
}    Top 10 ASes with most dynamic IPs

    AS #     # IP (Million)   AS Name                      Country
    7132     5.378            SBC Internet services        USA
    3320     4.809            Deutsche Telekom AG          Germany
    3215     4.679            France Telecom               France
    4134     4.538            Chinanet-backbone            China
    19262    4.081            Verizon Internet services    USA
    3352     3.435            Telefonica-Data-Espana       Spain
    209      2.431            Qwest                        USA
    3356     2.098            Level3 Communications        USA
    2856     1.942            BTnet UK Reg. network        UK
    8151     1.913            Uninet S.A. de C.V.          Mexico

    [Figure: world map locating these ASes by country; legend marker: 1 million IPs]





      }    Inter-user switch time
           q Exhibits a large variation
           q Over 30% are between 1 and 3 days

           [Figure: histogram of inter-user switch time; y-axis: percentage of IPs (%), 0 to 35;
            x-axis bins: < 5 min, 5-60 min, 1-12 hour, 12-24 hour, 1-3 day, 3-7 day, > 7 day]

 }    Data: Incoming Hotmail server log (MSBL), Jun-Sep 2006
     q Sessions: mail messages per IP per day

                              Total IPs            IPs in MSBL   % of sessions sent   % of all   % of user-
                                                                 purely spam          spam       reported spam

     UDmap IP                 102,941,051          24,115,951    96.2%                42.2%      40.3%
     Dynablock IP             193,808,955          15,773,646    95.6%                30.4%      29.3%
     UDmap and Dynablock IP   242,248,012          27,163,219    95.7%                50.7%      49.3%
     Likely static IP         2^32 - 242,248,012   13,445,940    31.4%                49.3%      50.7%





          UDmap
q Automatic approach
q Generally applicable
q No need for cooperation among ISPs
q Provides fine-grained dynamics info

          State of the art
q Manually maintained
    q Reverse-DNS records
    q Dynablock database
q Incomplete/obsolete information without ISPs' cooperation

    }   Implemented using DryadLINQ
        ◦ DryadLINQ (MSR-SVC): a new data processing
          environment for large PC clusters.

        Data Set                  Aug 2006      Apr-Jun 2007
        Data size                 85 GB         530 GB *
        IPs in data               155M          217M
        Running time              2 hours       14 hours
        UDmap IPs                 102M          175M (87.5M new)
        Inbound spam detected     42.2%         50.0%

        * The Apr-Jun trace contains more detailed timestamp information




}   The fraction of dynamic IP addresses is non-trivial!
}   IP dynamics is a discriminating feature for spam
    detection

}   Other applications:
    q   Selective web crawling
    q   Phishing site detection
    q   Botnet membership detection
    q   Targeted online advertisement

}   Questions?





				