Anti-Spam Solutions and Security


      Directed by Dr. Ravi Mukkamala
         Presented By Ming-Chin Chen
  93% of users: spam is ANNOYING!

  20 billion US dollars each year in lost productivity

  Today more than 50% of all e-mail worldwide is spam.

  The definition of spam: the proliferation of unsolicited
  commercial e-mail (UCE), including
  1. Commercial advertisements
  2. Viruses
  3. E-mails containing hostile programs or links
 Security issues:
 1. Identity theft: Phishing and scams are distributed as spam,
 directly leading to identity theft and fraud
 2. Combining exploits and spam
 3. Combining viruses and spam
Anti-Spam solutions
  Filter: Relies on black-lists, white-lists and handcrafted rules that
  search for particular keywords, phrases, or suspicious patterns in
  the headers.
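
  The rule-based filter described above can be sketched roughly as follows.
  This is only an illustration: the addresses, the keyword patterns and the
  function name classify are assumptions made up for the example, and it
  searches the subject and body rather than the full header set.

```python
import re

# Hypothetical example data; a real filter would load these from configuration.
WHITELIST = {"friend@example.org"}
BLACKLIST = {"bulk@spam.example"}
KEYWORD_RULES = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"\bfree money\b", r"\bclick here\b")
]

def classify(sender, subject, body):
    """Return 'ham' or 'spam' using white-list, black-list, then keyword rules."""
    sender = sender.lower()
    if sender in WHITELIST:      # trusted senders always pass
        return "ham"
    if sender in BLACKLIST:      # known bad senders are always rejected
        return "spam"
    text = subject + "\n" + body
    if any(rule.search(text) for rule in KEYWORD_RULES):
        return "spam"
    return "ham"
```

  Checking the white-list first means a trusted sender is never blocked by a
  keyword rule, at the cost of trusting a "From" address that may be forged.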
Anti-Spam Solutions
  Reverse lookup: Nearly all spam uses forged sender ("From")
  addresses; very few spam emails use the sender's true email
  address. Furthermore, most forged email addresses appear to
  come from trusted domains.

  In an effort to limit the ability to forge sender addresses, a
  number of proposed systems have surfaced for validating a
  sender's email. These systems include:
  Reverse Mail Exchanger (RMX)
  Sender Permitted From (SPF)
  Designated Mailers Protocol (DMP)
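
  The common idea behind these systems can be sketched as below: a domain
  publishes which hosts may send its mail, and the receiver checks the
  connecting IP against that list. The table PUBLISHED_SENDERS is a
  hypothetical in-memory stand-in for DNS records, and the function name is
  an assumption; none of the real record formats are parsed here.

```python
import ipaddress

# Hypothetical stand-in for DNS: domain -> networks allowed to send its mail.
PUBLISHED_SENDERS = {
    "example.org": ["192.0.2.0/24", "198.51.100.25/32"],
}

def sender_is_authorized(from_address, client_ip):
    """Return True if client_ip is listed for the domain in from_address."""
    domain = from_address.rsplit("@", 1)[-1].lower()
    networks = PUBLISHED_SENDERS.get(domain, [])  # unknown domain -> nothing listed
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(net) for net in networks)
```

  In this sketch a domain with no published record is treated as
  unauthorized; a real receiver might instead treat that case as "unknown".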
Anti-Spam Solutions
  Challenges: Spam senders use automated bulk-mailing
  programs to generate millions of emails per day. Challenges
  attempt to impede bulk-senders by slowing the bulk-mailing
  process.

  There are two main types of challenges: challenge-response (CR) and
  proposed computational challenges (CC).

  Filters are used by a recipient system to identify and organize
  spam. There are many different types of filter systems including:
      Word lists.
      Black lists and white lists.
      Hash tables.
      Artificial Intelligence and Probabilistic systems.
         Bayesian filtering technology. How does it work?

     1. Bypassing filters
     2. False-positive
     3. Filter reviewing
How does Bayesian filtering work?
  Simple filters are getting less useful. Statistical analysis of spam
  revealed surprising 'signatures' in spam, e.g. 'ff0000' (red in
  HTML hexadecimal color coding).

  Bayesian Decision Theory: Make a decision based on previous
  information / 'training' ('a priori' in the world of maths).

 Say we see the word 'click'; we classify the email as spam if
 probability(spam | 'click') > probability(non-spam | 'click')
 Manually classify some spam and non-spam emails to build up the
 database of words likely to indicate spam, or likely to indicate
 non-spam.

 Test a newly arrived email against the spam word database, using
 Bayesian decision theory maths.

 If the automatic classification is correct, we add this latest email
 to the database (stronger database).

 If the automatic classification is incorrect, a human needs to
 intervene (or the database gets weaker).
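
 The train-then-test loop above can be sketched as a small naive-Bayes-style
 classifier. This is an illustration under stated assumptions, not the
 method of any particular product: the class name BayesFilter, the labels
 "spam" and "ham", whitespace tokenization, add-one smoothing and the
 assumption that words are independent are all choices made for the example.

```python
import math
from collections import Counter

class BayesFilter:
    def __init__(self):
        # Per label: number of training emails that contained each word.
        self.words = {"spam": Counter(), "ham": Counter()}
        # Per label: number of training emails seen.
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Add one manually classified email to the word database."""
        self.docs[label] += 1
        self.words[label].update(set(text.lower().split()))

    def score(self, text, label):
        """log P(label) + sum of log P(word | label), with add-one smoothing."""
        total = sum(self.docs.values())
        logp = math.log((self.docs[label] + 1) / (total + 2))
        for word in set(text.lower().split()):
            logp += math.log((self.words[label][word] + 1) / (self.docs[label] + 2))
        return logp

    def classify(self, text):
        """Pick spam when its score beats the non-spam score."""
        return "spam" if self.score(text, "spam") > self.score(text, "ham") else "ham"

f = BayesFilter()
f.train("click here for free money", "spam")
f.train("click now to win", "spam")
f.train("meeting agenda for monday", "ham")
f.train("lunch on monday", "ham")
print(f.classify("click to win money"))
```

 Feeding a correctly classified email back through train() corresponds to
 the "stronger database" step; a misclassified one has to be relabeled by a
 person before it is added.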
Reverse lookup
More complicated reverse lookup:
  1. DKIM (DomainKeys Identified Mail): Derived from Yahoo
     DomainKeys and Cisco Identified Internet Mail
     DKIM = Message header authentication
           = DNS identifiers + Public Keys in DNS

  2. SenderID: Domain administrators publish Sender Policy
     Framework records in the Domain Name System which
     identify authorized outbound email servers. Receiving email
     systems verify whether messages originate from properly
     authorized outbound email servers.

 3. FairUCE: Stands for Fair Use of Unsolicited Commercial
    Email. Find a relationship between the envelope sender's
    domain and the IP address of the client delivering the mail,
    using a series of cached DNS look-ups.
    Relation not found -> Send a user-customizable
While these solutions are viable in certain situations, they share
  some significant limitations:
  1. Host-less and vanity domains
  2. Mobile computing
1. Challenge-Response(CR): The belief is that spam senders using
     fake sender email addresses will never receive the challenge,
     and spam senders using real email addresses will not be able
     to reply to all of the challenges.
         a. CR deadlock
         b. Automated systems
         c. Interpretation challenges
2. Computational Challenge: Most CC systems use complex
     algorithms that are intended to take time. For a single user,
     the time is unlikely to be noticed. But for a bulk mailer such
     as a spam sender, the small delays add up, making it take too
     long to send millions of emails.
        a. Unequal taxation
        b. Mailing lists
        c. Robot armies
        d. Legal robot armies
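
A computational challenge can be sketched as a small proof-of-work puzzle:
the sender must search for a value that is expensive to find but cheap for
the recipient to check. The scheme below (SHA-256 with a number of leading
zero bits) is an assumption chosen for illustration, not the algorithm of
any specific CC system; solve, verify and the bits parameter are
hypothetical names.

```python
import hashlib
from itertools import count

def _digest_value(message, nonce):
    """Hash message and nonce together and return the digest as an integer."""
    digest = hashlib.sha256(f"{message}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big")

def solve(message, bits=12):
    """Sender side: search for a nonce giving `bits` leading zero bits."""
    target = 1 << (256 - bits)
    for nonce in count():
        if _digest_value(message, nonce) < target:
            return nonce

def verify(message, nonce, bits=12):
    """Recipient side: a single hash confirms the work was done."""
    return _digest_value(message, nonce) < (1 << (256 - bits))

nonce = solve("to:bob@example.org", bits=12)
print(verify("to:bob@example.org", nonce, bits=12))
```

Each extra bit roughly doubles the sender's expected work while the
recipient's check stays at one hash, which is where the "unequal taxation"
concern comes from: slow machines pay far more than fast ones.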
A few solutions have been proposed that use cryptography to
   validate the sender. Essentially, these systems use
   certificates to perform the authentication. Without a proper
   certificate, a forged email can be readily identified. Some
   proposed cryptographic solutions include:
      1. AMTP
      2. MTP
      3. S/MIME
The existing mail protocol (SMTP) has no explicit support for
   cryptographic authentication. Some of these proposed solutions
   extend SMTP (e.g., S/MIME, PGP/MIME, and AMTP), while
   others aim to replace the existing mail infrastructure (e.g., MTP).
Cryptographic solutions do not validate that the email address is real;
   they only validate that the sender had the correct keys for the
   email. This creates a few issues:
     1. Automated abuse
     2. Usability issues
 1. Using hybrid strategies.
 2. Legislate Anti-Spam Regulations.

  Viable in limited circumstances with significant limitations.
  Impede regular users or spammers?
  A good solution today might not be a good solution tomorrow.
  Dr. Neal Krawetz, Anti-Spam Solutions and Security
  Better Bayesian Filtering
  Anti-Phishing Working Group
