									    Spam

 Sagar Vemuri

    slides courtesy:
Anirudh Ramachandran
    Nick Feamster
Agenda
• Understanding Spam
   –   What is Spam?
   –   Statistics
   –   Types of Spam
   –   Spamming Methods
   –   Spam Mitigation Methods
• Understanding the Network-level behavior of spammers
   –   Data Collection Methods
   –   Statistics
   –   BGP Spectrum Agility, Botnets, Harvesting
   –   Drawbacks


What is Spam?
 • Unsolicited commercial message
 • “Spam is e-mail that is both unsolicited by the
   recipient and sent in substantively identical form
   to many recipients”
 • As of the last quarter of 2005, estimates indicate
   that about 80-85% of all email is spam
 • Microsoft founder Bill Gates receives four million
   e-mails per year, most of them being spam



Some statistics
 • 1978 - An e-mail spam is sent to 600 addresses.
 • 1994 - First large-scale spam sent to 6000
   newsgroups, reaching millions of people
 • 2005 - (June) 30 billion spam messages per day
 • 2006 - (June) 55 billion spam messages per day
 • 2006 - (December) 85 billion spam messages per day
 • 2007 - (February) 90 billion spam messages per day



Products advertised
 •   Porn site subscriptions
 •   Prescription drugs
 •   Printer ink cartridges
 •   Counterfeit software
 •   Mortgage offers
 •   Fake diplomas from non-existent or
     non-accredited universities



Types of Spam
 • Email spam
 • IM spam
   – Also called ‘Spim’
   – 1.2 billion spam IM messages in 2004
 • SMS spam
   – Also called ‘m-spam’
 • Image spam
   – Text of a message is stored as a GIF or JPEG image and
     displayed in the email
   – Prevents text-based spam filters from detecting it


Spamming Methods
 • Direct spamming
   – By purchasing upstream connectivity from
     “spam-friendly ISPs”
 • Open relays and proxies
   – Mail servers that allow unauthenticated Internet hosts
     to connect and relay mail through them
 • Botnets
   – Collection of machines acting under one centralized
     controller, e.g., Bobax
 • BGP Spectrum Agility
   – IP hijacking techniques

Spam Mitigation
 • Filtering
    – Based on content
    – Use features in an email’s headers and body
    – E.g., SpamAssassin
 • Blacklisting:
    – IP addresses of known spam sources are used to
      classify email
    – More than 30 widely used blacklists available today




Content-based Filtering
 • Content-based properties are malleable
    – Low cost of evasion: Spammers can easily alter features of an
      email’s content
    – Customization: Customized emails are easy to generate
    – High cost to filter maintainers: Filters must be continually
      updated as content-changing techniques become more
      sophisticated


 • Content-based filters are applied at the destination
    – Too little, too late: Wasted network bandwidth, storage, etc.
      Many users receive (and store) the same spam content
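
The malleability of content can be illustrated with a toy keyword filter. This is only a sketch: the keyword weights and threshold are invented for illustration, and real filters such as SpamAssassin use far richer rules.

```python
import re

# Hypothetical keyword weights, chosen only for this illustration.
KEYWORD_WEIGHTS = {"viagra": 3.0, "mortgage": 2.0, "diploma": 2.0}
THRESHOLD = 3.0  # assumed cutoff for calling a message spam


def spam_score(body: str) -> float:
    """Sum the weights of known keywords found in the message body."""
    words = re.findall(r"[a-z0-9]+", body.lower())
    return sum(KEYWORD_WEIGHTS.get(word, 0.0) for word in words)


def is_spam(body: str) -> bool:
    """Classify a message by comparing its keyword score to the cutoff."""
    return spam_score(body) >= THRESHOLD


original = "Cheap viagra and low mortgage rates"
evasive = "Cheap v1agra and low m0rtgage rates"

print(is_spam(original))  # the keywords match
print(is_spam(evasive))   # trivial character swaps avoid every keyword
```

A single character substitution per keyword is enough to evade this filter, which is why the maintainer has to keep updating the rules.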



DNS Blacklisting
 • Aggressive filters have many false positives
 • One list might not have all the information about
   spamming IPs
 • Need to consult multiple lists
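
A DNSBL lookup is an ordinary DNS query: the IPv4 octets are reversed and the blacklist zone is appended. The sketch below shows how several lists might be consulted. The zone names are placeholders, not real blacklists, and treating any successful resolution as "listed" is a simplification.

```python
import ipaddress
import socket

# Placeholder zones; substitute the DNSBLs actually in use.
ZONES = ["dnsbl-a.example.org", "dnsbl-b.example.org"]


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the blacklist zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the query name resolves, i.e. the zone lists the IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:  # resolution failure is treated as "not listed"
        return False


def lists_containing(ip: str, zones: list[str]) -> list[str]:
    """Consult every zone, since one list may not know about the IP."""
    return [zone for zone in zones if is_listed(ip, zone)]


print(dnsbl_query_name("192.0.2.99", ZONES[0]))
```

Only `dnsbl_query_name` runs without network access; `lists_containing` performs one DNS query per zone.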




Network-level Spam Filtering
 • Network-level properties are harder to change
   than content
 • Network-level properties
   – IP addresses and IP address ranges (prevalence)
   – Change of addresses over time (persistence)
   – Distribution according to operating system, country
     and AS
   – Characteristics of botnets and short-lived route
     announcements
 • Help develop better spam filters

Spamming Patterns
 Network-level properties of spam arrival
    – From where?
       • What IP address space?
       • ASes?
       • What OSes?

    – What techniques?
       • Botnets
       • Short-lived route announcements
       • Shady ISPs

    – Capabilities and limitations?
       • Bandwidth
       • Size of botnet army


Understanding the Network-
Level Behavior of Spammers
       Anirudh Ramachandran
            Nick Feamster
           (Georgia Tech)
Data Collection
• Primary dataset: Actual spam email messages
  collected at a large spam sinkhole
• Corpus of email logs from a large email provider
• Command and Control traffic from a Bobax botnet
• BGP route advertisements from an upstream
  border router in the same network
• Also capturing traceroutes, DNSBL results, passive
  TCP host fingerprinting simultaneous with spam
  arrival


Data Collection Setup
 [Figure: data collection setup (labeled component: Exchange 1)]
Data collected when the spam is
received
 • IP address of the relay that established the
   SMTP connection to the sinkhole
 • Traceroute to that IP address, to help us
   estimate the network location of the mail relay
 • Passive “p0f” TCP fingerprint, to determine the
   OS of the mail relay
 • Result of DNS blacklist (DNSBL) lookups for that
   mail relay at eight different DNSBLs
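
The four items logged per message could be held in a record like the one below. This is a hypothetical layout for illustration; the field names are assumptions and do not come from the study's tooling.

```python
from dataclasses import dataclass, field


@dataclass
class SpamRecord:
    """One received spam message and the measurements taken on arrival."""
    relay_ip: str                # relay that opened the SMTP connection
    traceroute_hops: list[str]   # hop addresses on the path to the relay
    p0f_os: str                  # OS guess from passive TCP fingerprinting
    # Result of each DNSBL lookup: zone name -> whether the relay was listed.
    dnsbl_listed: dict[str, bool] = field(default_factory=dict)

    def listing_count(self) -> int:
        """Number of consulted DNSBLs that listed the relay."""
        return sum(self.dnsbl_listed.values())


record = SpamRecord(
    relay_ip="192.0.2.10",
    traceroute_hops=["198.51.100.1", "192.0.2.1"],
    p0f_os="Windows",
    dnsbl_listed={"a.example": True, "b.example": False, "c.example": True},
)
print(record.listing_count())
```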



MailAvenger
 • Highly configurable SMTP server that collects many
   useful statistics




Spam per Day
 • Both the amount of spam and the number of
   distinct IP addresses increase over time




IP Address Distribution
 • The majority of spam is sent from a relatively
   small fraction of IP address space
 • The distribution is the same for legitimate mail




AS distribution
 • Large fraction of spam received from just a
   handful of ASes
 • 12% of all received spam originates in just two
   ASes (from Korea and China)
 • Top 20 ASes are responsible for sending nearly
   37% of all spam
 • Spam filtering efforts might be more effective if focused
   on identifying high-volume, persistent groups of
   spammers by AS number rather than on
   blacklisting individual IP addresses
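
A figure such as "the top 20 ASes send 37% of spam" is a share over per-AS message counts. A minimal sketch of that calculation follows, assuming each received message has already been mapped to its origin AS; the AS numbers in the sample are from the range reserved for documentation, not measured data.

```python
from collections import Counter


def top_as_share(spam_asns: list[int], k: int) -> float:
    """Fraction of messages that originate in the k highest-volume ASes."""
    if not spam_asns:
        return 0.0
    counts = Counter(spam_asns)
    top_total = sum(n for _, n in counts.most_common(k))
    return top_total / len(spam_asns)


# Made-up sample: one origin AS number per received message.
sample = [64500] * 5 + [64501] * 3 + [64502] + [64503]
print(top_as_share(sample, 2))
```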

Distribution across ASes
 • About 40% of spam still comes from the U.S.
Distribution Across Operating Systems
 • About 4% of known hosts are non-Windows
 • These hosts are responsible for about 8% of received spam
Persistence
 • More than half of the client IPs appear fewer than twice
 • 85% of the client IP addresses sent fewer than 10 emails to the
   sinkhole
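
Both persistence figures can be derived from a plain log of sender IPs, one entry per received message. The sketch below assumes such a log; the addresses are documentation addresses, not measured data.

```python
from collections import Counter


def persistence_stats(sender_ips: list[str], threshold: int = 10) -> tuple[float, float]:
    """Return (fraction of IPs seen once, fraction sending fewer than threshold)."""
    per_ip = Counter(sender_ips)
    total = len(per_ip)
    seen_once = sum(1 for n in per_ip.values() if n == 1)
    below = sum(1 for n in per_ip.values() if n < threshold)
    return seen_once / total, below / total


# Made-up log: four distinct senders with 12, 1, 3, and 1 messages.
log = ["192.0.2.1"] * 12 + ["192.0.2.2"] + ["192.0.2.3"] * 3 + ["192.0.2.4"]
print(persistence_stats(log))
```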




Effectiveness of Blacklists
 • Nearly 80% of all spam was received from mail
   relays that appear in at least one of eight blacklists
 • More than 50% of spam came from relays listed in two or
   more blacklists
 • For spam sent using BGP spectrum agility, about
   50% of the IP addresses do not appear in any
   blacklist
 • About 30% of them appear in more than one blacklist



Spam From Botnets




Most Bot IP addresses do not return
 [Figure: percentage of bots vs. lifetime (seconds)]
 • 65% of bots only send mail to a domain once over 18 months
 • Collaborative spam filtering seems to be helping track bot IP addresses
Most Bots Send Low Volumes of Spam
 [Figure: amount of spam vs. lifetime (seconds)]
 • Most bot IP addresses send very little spam, regardless
   of how long they have been spamming
BGP Spectrum Agility
 • Log IP addresses of SMTP relays
 • Correlate BGP route advertisements seen at the network
   where the spam trap is co-located
 • A small club of persistent players appears to be using
   this technique
 • Common short-lived prefixes and ASes (announcements last
   ~10 minutes)
    – 61.0.0.0/8, AS 4678
    – 66.0.0.0/8, AS 21562
    – 82.0.0.0/8, AS 8717
 • Somewhere between 1% and 10% of all spam (some clearly
   intentional, others might be flapping)
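
The correlation step can be sketched as follows: pair each announcement with its withdrawal to form an epoch, keep the short epochs, and check which spam arrivals fall inside one. This is an illustrative sketch, not the study's code. The `Epoch` type, the 600-second cutoff, and the sample timestamps are assumptions.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Epoch:
    """One announce/withdraw interval for a prefix (times in seconds)."""
    prefix: str
    announced: float
    withdrawn: float

    @property
    def duration(self) -> float:
        return self.withdrawn - self.announced


def spam_in_short_epochs(epochs, spam_events, max_duration=600.0):
    """Return (ip, time) spam events that arrived while a short-lived
    announcement covering the sender's IP was active."""
    short = [(ipaddress.ip_network(e.prefix), e)
             for e in epochs if e.duration <= max_duration]
    hits = []
    for ip, when in spam_events:
        addr = ipaddress.ip_address(ip)
        for net, epoch in short:
            if addr in net and epoch.announced <= when <= epoch.withdrawn:
                hits.append((ip, when))
                break
    return hits


epochs = [Epoch("61.0.0.0/8", 1000, 1500), Epoch("82.0.0.0/8", 0, 90000)]
spam = [("61.1.2.3", 1200), ("61.1.2.3", 2000), ("82.5.6.7", 100)]
print(spam_in_short_epochs(epochs, spam))
```

Only the first event matches: the second arrives after the withdrawal, and the third falls in a long-lived announcement.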
Why Such Big Prefixes?
 • Flexibility: Client IPs can be scattered
   throughout dark space within a large /8
    – Same sender usually returns with different IP
      addresses


 • Visibility: Route typically won’t be filtered (nice
   and short)




Characteristics of IP-Agile Senders

 • IP addresses are widely distributed across the /8
   space
 • IP addresses typically appear only once at the
   sinkhole
 • Depending on which /8, 60-80% of these IP
   addresses were not reachable by traceroute when
   spot-checked
 • Some IP addresses were in allocated, albeit
   unannounced space
 • Some AS paths associated with the routes
   contained reserved AS numbers
Length of short-lived BGP epochs




The Effectiveness of Blacklisting
 [Figure: fraction of all spam received vs. number of DNSBLs listing this spammer]
 • ~95% of bots listed in one or more blacklists
 • ~80% listed on average
 • Only about half of the IPs spamming from short-lived
   BGP routes are listed in any blacklist
 • Spam from IP-agile senders tends to be listed in fewer blacklists
Harvesting
 • Tracking Web-based harvesting
   – Register domain, set up MX record
   – Post, link to page with randomly generated email
     addresses
   – Log requests
   – Wait for spam
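
One way to connect later spam to the request that harvested the address is to give every page request its own random address and log the pairing. The sketch below assumes that approach. The trap domain and the in-memory log are hypothetical stand-ins.

```python
import secrets

TRAP_DOMAIN = "trap.example.com"  # hypothetical domain with an MX record

# Log of issued addresses: address -> (requesting client IP, request time).
issued: dict[str, tuple[str, float]] = {}


def issue_address(client_ip: str, requested_at: float) -> str:
    """Generate a fresh random address for one page request and log it."""
    address = f"{secrets.token_hex(8)}@{TRAP_DOMAIN}"
    issued[address] = (client_ip, requested_at)
    return address


def trace_spam(recipient: str):
    """Look up which request harvested the address that later received spam."""
    return issued.get(recipient)


addr = issue_address("198.51.100.7", 1000.0)
print(addr, trace_spam(addr))
```

Because each address is served exactly once, spam arriving at it identifies both the harvester's IP and the harvesting time.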




Harvesting
 • Domain was registered on November 19, 2005
 • SMTP server was set up on December 6, 2005
 • Email harvesting occurred on January 16, 2006
 • First spam came on January 20, 2006 (phishing
   attack)
 • The harvester and the spammers were not in the
   same AS
 • Attack was coordinated between two machines
     – One machine sent to half of the addresses listed
       alphabetically, the other machine to the other half

Spam Mitigation
 • Spam filtering requires a better notion of host identity
    – An IP address is not enough to identify a host
 • IP address range based filtering is more effective than
   single IP address based filtering
    – Some IP address ranges send more spam than others
 • Securing Internet routing is necessary for bolstering
   identity and traceability of email senders
    – Otherwise, the BGP spectrum agility method can be used more widely
 • Network-level properties can make current spam filters
   more effective
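
The difference between single-address and range-based filtering can be shown with the standard `ipaddress` module. This is a sketch only: the listed address and range are documentation addresses chosen for illustration.

```python
import ipaddress

SEEN_SPAM_IPS = {"192.0.2.10"}                            # per-IP blacklist
SPAMMY_RANGES = [ipaddress.ip_network("192.0.2.0/24")]    # range blacklist


def blocked_by_ip(ip: str) -> bool:
    """Match only addresses that have already been seen sending spam."""
    return ip in SEEN_SPAM_IPS


def blocked_by_range(ip: str) -> bool:
    """Match any address inside a range known to send spam."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in SPAMMY_RANGES)


new_sender = "192.0.2.77"  # same range, never seen before
print(blocked_by_ip(new_sender), blocked_by_range(new_sender))
```

A sender that returns with a new address in the same range escapes the per-IP list but not the range list.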



Conclusion
 •   A detailed study examining network-level properties of spam
 •   Reveals botnet characteristics in sending spam
 •   Shows the existence of the BGP spectrum agility method
 •   Datasets are substantial, but not comprehensive
     – Comparison between spam and legitimate mail is questionable
     – Comparing spam and legitimate mail within a single domain, and
       repeating this across several domains, might be better
     – Analysis of IP addresses and address ranges fails to draw
       important conclusions
 •   Does not analyze other types of spam, apart from email spam
 •   Data analysis is from a single vantage point
