Anti-Spam Solutions and
Security
Directed by Dr. Ravi Mukkamala
Presented By Ming-Chin Chen
10/19/2005
Introduction
93% of users find spam ANNOYING!
20 billion US dollars each year in lost productivity
Today more than 50% of all e-mail worldwide is spam.
The definition of spam: the proliferation of unsolicited
commercial e-mail (UCE), including
1. Commercial advertisements
2. Viruses
3. E-mails containing hostile programs or links
Continued…
Security issues:
1. Identity theft: Phishing and scams are distributed as spam,
directly leading to identity theft and fraud
2. Combining exploits and spam
3. Combining viruses and spam
Anti-Spam Solutions
Filter: Relies on black-lists, white-lists and handcrafted rules that
search for particular keywords, phrases, or suspicious patterns in
the headers.
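A minimal sketch of such a filter, assuming lists are checked first and handcrafted rules second. The addresses, keyword patterns, header rule and the classify function are hypothetical names chosen only for illustration:

```python
import re

# Hypothetical lists and handcrafted rules, for illustration only.
WHITELIST = {"colleague@example.org"}
BLACKLIST = {"bulk@spam.example"}
KEYWORD_RULES = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"\bfree money\b", r"\bclick here\b")
]
HEADER_RULES = [
    re.compile(r"^X-Mailer:.*bulk", re.IGNORECASE | re.MULTILINE),
]


def classify(sender, headers, body):
    """Return 'spam' or 'non-spam' using lists first, then rules."""
    sender = sender.lower()
    if sender in WHITELIST:
        return "non-spam"
    if sender in BLACKLIST:
        return "spam"
    # Handcrafted rules: suspicious header patterns, then body keywords.
    if any(rule.search(headers) for rule in HEADER_RULES):
        return "spam"
    if any(rule.search(body) for rule in KEYWORD_RULES):
        return "spam"
    return "non-spam"
```

Checking the white-list first means a trusted sender is never blocked by a keyword rule, which is one common way to limit false positives.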
Anti-Spam Solutions
Reverse lookup: Nearly all spam uses forged sender (“From”)
addresses; very few spam emails use the sender's true email
address. Furthermore, most forged email addresses appear to
come from trusted domains.
In an effort to limit the ability to forge sender addresses, a
number of proposed systems have surfaced for validating a
sender's email address. These systems include:
Reverse Mail Exchanger (RMX)
Sender Permitted From (SPF)
Designated Mailers Protocol (DMP)
Anti-Spam Solutions
Challenges: Spam senders use automated bulk-mailing
programs to generate millions of emails per day. Challenges
attempt to impede bulk-senders by slowing the bulk-mailing
process.
There are two main types of challenges: challenge-response and
proposed computational challenges:
Challenge-Response
Computational Challenge
Cryptography
Filters
Filters are used by a recipient system to identify and organize
spam. There are many different types of filter systems including:
Word lists.
Black-white lists.
Hash tables.
Artificial Intelligence and Probabilistic systems.
Bayesian filtering technology. How does it work?
Disadvantages:
1. Bypassing filters
2. False positives
3. Filter reviewing
How does Bayesian filtering work?
Simple filters are becoming less useful. Statistical analysis of spam
revealed surprising 'signatures' in spam, e.g. 'ff0000' (red in
HTML hexadecimal color coding).
Bayesian Decision Theory: Make a decision based on previous
information / 'training' ('a priori' in the world of maths).
Say we see the word 'click'; we classify the email as spam if
probability(spam | 'click') > probability(non-spam | 'click')
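The decision rule above can be worked through with Bayes' theorem. The sketch below uses made-up counts (they are assumptions, not measurements), and posterior_spam is a hypothetical helper name:

```python
def posterior_spam(word_in_spam, n_spam, word_in_nonspam, n_nonspam):
    """probability(spam | word) from message counts, via Bayes' theorem."""
    p_spam = n_spam / (n_spam + n_nonspam)       # prior for spam
    p_nonspam = 1.0 - p_spam                     # prior for non-spam
    p_word_given_spam = word_in_spam / n_spam
    p_word_given_nonspam = word_in_nonspam / n_nonspam
    # Total probability of seeing the word at all; assumed to be non-zero.
    evidence = p_word_given_spam * p_spam + p_word_given_nonspam * p_nonspam
    return p_word_given_spam * p_spam / evidence


# Assumed counts: 'click' appears in 80 of 200 spam emails
# and in 5 of 300 non-spam emails.
p = posterior_spam(80, 200, 5, 300)
classify_as_spam = p > 1.0 - p   # probability(spam|word) > probability(non-spam|word)
```

With these assumed counts the posterior is 80/85, about 0.94, so an email containing 'click' would be classified as spam.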
Continued…
Manually classify some spam and non-spam emails to build up a
database of words likely to indicate spam, or likely to indicate
non-spam.
Test each newly arrived email against the spam word database, using
Bayesian decision theory maths.
If the automatic classification is correct, we add this latest email
to the database (stronger database).
If the automatic classification is incorrect, a human needs to
intervene (or the database gets weaker).
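The training and feedback loop above might be sketched as a small naive Bayes word database. This is a simplified illustration, not the method of any particular product. The class name, the whitespace tokenizer and the add-one smoothing are all assumptions:

```python
import math
from collections import Counter


class WordDatabase:
    """Toy word database for spam / non-spam classification."""

    LABELS = ("spam", "non-spam")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokens(text):
        # Assumed tokenizer: lowercase, split on whitespace, one vote per word.
        return set(text.lower().split())

    def train(self, text, label):
        """Add one manually (or automatically) classified email."""
        self.word_counts[label].update(self.tokens(text))
        self.message_counts[label] += 1

    def log_score(self, text, label):
        # log prior + sum of log P(word | label), with add-one smoothing
        # so unseen words do not produce log(0).
        total = sum(self.message_counts.values())
        n = self.message_counts[label]
        score = math.log((n + 1) / (total + 2))
        for word in self.tokens(text):
            score += math.log((self.word_counts[label][word] + 1) / (n + 2))
        return score

    def classify(self, text):
        return max(self.LABELS, key=lambda label: self.log_score(text, label))


db = WordDatabase()
db.train("click here to win money", "spam")
db.train("click now for cheap pills", "spam")
db.train("meeting notes attached", "non-spam")
db.train("lunch at noon tomorrow", "non-spam")

new_email = "click to win"
label = db.classify(new_email)
# Feedback step: once the label is confirmed (or corrected by a human),
# the email is added to the database.
db.train(new_email, label)
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow on long emails.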
Reverse lookup
More complicated reverse lookup:
1. DKIM (DomainKeys Identified Mail): Derived from Yahoo
DomainKeys and Cisco Identified Internet Mail
DKIM = Message header authentication
= DNS identifiers + Public Keys in DNS
2. SenderID: Domain administrators publish Sender Policy
Framework (SPF) records in the Domain Name System which
identify authorized outbound email servers. Receiving email
systems verify whether messages originate from properly
authorized outbound email servers.
Continued…
3. FairUCE: Stands for Fair use of Unsolicited Commercial
Email. It tries to find a relationship between the envelope sender's
domain and the IP address of the client delivering the mail,
using a series of cached DNS look-ups.
If no relationship is found, it sends a user-customizable
challenge/response.
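The common idea behind these reverse-lookup schemes is checking whether the delivering client's IP address is authorized for the sender's domain. A rough sketch follows, with a plain dictionary standing in for the DNS records. The domains, address ranges and function names are hypothetical, and real systems perform DNS queries and parse published records instead:

```python
import ipaddress

# Stand-in for records a domain would publish in DNS (hypothetical values,
# using address ranges reserved for documentation).
AUTHORIZED_NETWORKS = {
    "example.org": ["192.0.2.0/24"],
    "example.net": ["198.51.100.25/32"],
}


def sender_is_authorized(envelope_sender, client_ip):
    """True if client_ip falls in a network listed for the sender's domain."""
    domain = envelope_sender.rsplit("@", 1)[-1].lower()
    ip = ipaddress.ip_address(client_ip)
    return any(
        ip in ipaddress.ip_network(network)
        for network in AUTHORIZED_NETWORKS.get(domain, [])
    )


def handle_delivery(envelope_sender, client_ip):
    """Accept when a relationship is found, otherwise fall back to a challenge."""
    if sender_is_authorized(envelope_sender, client_ip):
        return "accept"
    return "challenge"
```

A domain with no published record ends up in the "challenge" branch here, which mirrors the host-less and vanity domain limitation these schemes share.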
Continued…
While these solutions are viable in certain situations, they share
some significant limitations:
1. Host-less and vanity domains
2. Mobile computing
Challenges
1. Challenge-Response (CR): The belief is that spam senders using
fake sender email addresses will never receive the challenge,
and spam senders using real email addresses will not be able
to reply to all of the challenges.
Limitations:
a. CR deadlock
b. Automated systems
c. Interpretation challenges
Continued…
2. Computational Challenge (CC): Most CC systems use complex
algorithms that are intended to take time. For a single user,
the time is unlikely to be noticed. But for a bulk mailer such
as a spam sender, the small delays add up, making it take too
long to send millions of emails.
Limitations:
a. Unequal taxation
b. Mailing lists
c. Robot armies
d. Legal robot armies
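A computational challenge can be illustrated with a hashcash-style proof of work: the sender must find a nonce whose hash has a required number of leading zero bits, while the recipient verifies it with a single hash. The slides do not name a specific algorithm, so the use of SHA-256, the message format and the function names below are assumptions:

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Number of zero bits at the start of a byte string."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def stamp_digest(message, nonce):
    # Assumed stamp format: "<message>:<nonce>" hashed with SHA-256.
    return hashlib.sha256(f"{message}:{nonce}".encode()).digest()


def solve(message, difficulty_bits):
    """Sender side: brute-force search, about 2**difficulty_bits hashes on average."""
    for nonce in count():
        if leading_zero_bits(stamp_digest(message, nonce)) >= difficulty_bits:
            return nonce


def verify(message, nonce, difficulty_bits):
    """Recipient side: a single hash is enough to check the work."""
    return leading_zero_bits(stamp_digest(message, nonce)) >= difficulty_bits


# Low difficulty so the example finishes quickly.
message = "to:bob@example.org"
nonce = solve(message, 12)
accepted = verify(message, nonce, 12)
```

Because the cost depends on the sender's hardware, the same difficulty burdens a slow machine far more than a fast one, which is the "unequal taxation" limitation listed above.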
Cryptography
A few solutions have been proposed that use cryptography to
validate the sender. Essentially, these systems use
certificates to perform the authentication. Without a proper
certificate, a forged email can be readily identified. Some
proposed cryptographic solutions include:
1. AMTP
2. MTP
3. S/MIME
The existing mail protocol (SMTP) has no explicit support for
cryptographic authentication. Some of these proposed solutions
extend SMTP (e.g., S/MIME, PGP/MIME, and AMTP), while
others aim to replace the existing mail infrastructure (e.g., MTP).
Continued…
Cryptographic solutions do not validate that the email address is
real; they only validate that the sender had the correct keys for the
email. This creates a few issues:
1. Automated abuse
2. Usability issues
Conclusion
1. Use hybrid strategies.
2. Legislate anti-spam regulations.
Doubts:
Viable in limited circumstances with significant limitations.
Impede regular users or spammers?
A good solution today might not be a good solution tomorrow.
References
Dr. Neal Krawetz, Anti-Spam Solutions and Security
Paul Graham, Better Bayesian Filtering, http://www.paulgraham.com/better.html
Anti-Phishing Working Group, http://www.antiphishing.org/
Yahoo DomainKeys, http://antispam.yahoo.com/domainkeys
Microsoft Sender ID, http://www.microsoft.com/senderid
IBM alphaWorks FairUCE, http://www.alphaworks.ibm.com/tech/fairuce
Sendmail dk-milter, http://sendmail.net/dk-milter/