Anti-Spam Solutions and Security
Directed by Dr. Ravi Mukkamala
Presented By Ming-Chin Chen
93% of users: spam is ANNOYING!
20 billion US dollars each year in lost productivity
Today, more than 50% of all email worldwide is spam.
The definition of spam: unsolicited commercial e-mail (UCE),
including:
1. Commercial advertisements.
3. E-mails containing hostile programs or links.
1. Identity theft: phishing and scams are distributed as spam,
leading directly to identity theft and fraud.
2. Combining exploits and spam.
3. Combining viruses and spam.
Filters: Rely on blacklists, whitelists, and handcrafted rules that
search for particular keywords, phrases, or suspicious patterns in the message.
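A rule-based filter of this kind can be sketched as below. It is a minimal illustration, not any product's implementation: the list contents, the example rules, and the function name `classify` are all hypothetical. Lists are consulted first, so a whitelisted sender bypasses the rules entirely; this is also why filters of this type are easy to bypass once a spammer learns the rules.

```python
import re

# Hypothetical example lists; real deployments maintain far larger ones.
BLACKLIST = {"spammer@example.net"}
WHITELIST = {"friend@example.org"}

# Hypothetical handcrafted rules: keywords, phrases, suspicious patterns.
RULES = [
    re.compile(r"\bfree money\b", re.IGNORECASE),  # keyword phrase
    re.compile(r"\bclick here\b", re.IGNORECASE),  # keyword phrase
    re.compile(r"!{3,}"),                          # pattern: "!!!" or longer
]


def classify(sender: str, body: str) -> str:
    """Return 'ham' or 'spam': list lookups first, then the rules."""
    sender = sender.lower()
    if sender in WHITELIST:
        return "ham"   # trusted sender: no rules are applied
    if sender in BLACKLIST:
        return "spam"  # known bad sender
    if any(rule.search(body) for rule in RULES):
        return "spam"  # body matched at least one handcrafted rule
    return "ham"
```

For example, `classify("someone@example.com", "Get FREE MONEY now")` returns `"spam"`, while the same body from `friend@example.org` returns `"ham"`.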
Reverse lookup: Nearly all spam uses forged sender (“From”)
addresses; very few spam emails use the sender's true email
address. Furthermore, most forged email addresses appear to
come from trusted domains.
In an effort to limit the ability to forge sender addresses, a
number of proposed systems have surfaced for validating a
sender's email address. These systems include:
Reverse Mail Exchanger (RMX)
Sender Permitted From (SPF)
Designated Mailers Protocol (DMP)
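RMX, SPF, and DMP differ in record format, but they share one idea: a domain publishes which hosts may send mail on its behalf, and the receiver compares the connecting client's IP address against that list. The sketch below assumes a hypothetical in-memory table (`PUBLISHED_SENDERS`) in place of the real DNS query, and the helper names are illustrative only.

```python
import ipaddress

# Hypothetical stand-in for DNS: each domain "publishes" the networks
# that are allowed to send mail claiming to be from that domain.
PUBLISHED_SENDERS = {
    "example.com": ["192.0.2.0/24"],
    "example.org": ["198.51.100.25/32"],
}


def sender_domain(address: str) -> str:
    """Extract the domain part of a sender address."""
    return address.rsplit("@", 1)[-1].lower()


def validate_sender(mail_from: str, client_ip: str) -> str:
    """Return 'pass', 'fail', or 'none' (the domain publishes nothing)."""
    networks = PUBLISHED_SENDERS.get(sender_domain(mail_from))
    if networks is None:
        return "none"  # no policy: the check cannot say anything
    ip = ipaddress.ip_address(client_ip)
    if any(ip in ipaddress.ip_network(net) for net in networks):
        return "pass"  # client is a designated sender for the domain
    return "fail"      # claims the domain but is not on its list
```

A forged `alice@example.com` sent from `203.0.113.5` yields `"fail"`; note that a domain that publishes nothing yields `"none"`, which is one reason these systems only help when widely deployed.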
Challenges: Spam senders use automated bulk-mailing
programs to generate millions of emails per day. Challenges
attempt to impede bulk senders by slowing down the bulk-mailing process.
There are two main types of challenges: challenge-response and
the proposed computational challenges.
Filters are used by a recipient system to identify and organize
spam. There are many different types of filter systems, including
artificial intelligence and probabilistic systems.
Bayesian filtering technology. How does it work?
1. Bypassing filters
3. Filter reviewing
How does Bayesian filtering work?
Simple filters are becoming less useful. Statistical analysis of spam
revealed surprising 'signatures' in spam, e.g. 'ff0000' (red in
HTML hexadecimal color notation).
Bayesian Decision Theory: Make a decision based on previous
information, i.e. 'training' ('a priori' in the world of maths).
Say we see the word 'click'. We classify the email as spam if
P(spam | 'click') > P(non-spam | 'click')
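The comparison can be expanded with Bayes' rule. Both posteriors share the denominator P('click'), so it cancels, and the filter only needs per-class word frequencies and class priors, which is exactly what the training database supplies. The numbers in the last line are hypothetical, chosen only to illustrate the arithmetic (say 'click' appears in 40% of training spam and 5% of training non-spam, with equal priors).

```latex
P(\text{spam} \mid \text{click})
  = \frac{P(\text{click} \mid \text{spam})\,P(\text{spam})}{P(\text{click})}

\text{classify as spam if }
  P(\text{click} \mid \text{spam})\,P(\text{spam})
  > P(\text{click} \mid \text{non-spam})\,P(\text{non-spam})

0.40 \times 0.5 = 0.20 \;>\; 0.05 \times 0.5 = 0.025
  \;\Rightarrow\;
  P(\text{spam} \mid \text{click}) = \frac{0.20}{0.20 + 0.025} \approx 0.89
```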
Manually classify some spam and non-spam emails to build up the
database of words likely to indicate spam, or likely to indicate
non-spam.
Test each newly arrived email against the word database using
Bayesian decision theory.
If the automatic classification is correct, we add this latest email
to the database (a stronger database).
If the automatic classification is incorrect, a human needs to
intervene (or the database gets weaker).
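The train / test / feed-back loop above can be sketched as a small multinomial naive Bayes classifier with add-one (Laplace) smoothing and log probabilities. This is one common variant, assumed here; the slides do not say which variant they have in mind. The class name `BayesFilter`, its methods, and the simple tokenizer are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; real filters use richer tokenizers."""
    return re.findall(r"[a-z0-9']+", text.lower())


class BayesFilter:
    """Minimal naive Bayes spam filter (illustrative sketch).

    Both classes must have at least one training email before
    classify() is called, otherwise the prior would be log(0).
    """

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Add one labeled email to the word database."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, words: list[str], label: str) -> float:
        # log P(label) + sum of log P(word | label), add-one smoothed
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        for w in words:
            count = self.word_counts[label][w]
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def classify(self, text: str) -> str:
        """Pick the class with the larger (log) posterior."""
        words = tokenize(text)
        spam = self._log_score(words, "spam")
        ham = self._log_score(words, "ham")
        return "spam" if spam > ham else "ham"

    def process(self, text: str, human_label: str | None = None) -> str:
        """Classify a new email, then feed it back into the database.

        A human-supplied label (a correction) overrides the prediction;
        otherwise the prediction is assumed correct and reinforces the data.
        """
        predicted = self.classify(text)
        self.train(text, human_label or predicted)
        return predicted
```

Usage: call `train()` on the manually classified emails, then `process()` on each new arrival, passing `human_label` only when a person corrects a mistake. Working in log space avoids floating-point underflow when many small word probabilities are multiplied.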
More sophisticated reverse-lookup systems:
1. DKIM (DomainKeys Identified Mail): Derived from Yahoo!
DomainKeys and Cisco Identified Internet Mail (IIM).
DKIM = message header authentication
= DNS identifiers + public keys in DNS
2. Sender ID: Domain administrators publish Sender Policy
Framework (SPF) records in the Domain Name System that
identify authorized outbound email servers. Receiving email
systems verify whether messages originate from properly
authorized outbound email servers.
3. FairUCE: Stands for Fair Use of Unsolicited Commercial
Email. Finds a relationship between the envelope sender's
domain and the IP address of the client delivering the mail,
using a series of cached DNS lookups.
Relation not found -> send a user-customizable challenge.
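For DKIM (item 1), the sketch below shows only how a receiver locates the signer's public key: the `d=` (signing domain) and `s=` (selector) tags of the DKIM-Signature header name a DNS TXT record at `<selector>._domainkey.<domain>`, and that record holds the key in its `p=` tag. The actual verification (canonicalizing the signed headers and checking the signature in `b=`) is omitted. The helper names and the selector value `mail2024` are hypothetical.

```python
from __future__ import annotations


def parse_tags(tag_list: str) -> dict[str, str]:
    """Parse a DKIM tag-list such as 'v=1; d=example.com; s=sel1'."""
    tags = {}
    for part in tag_list.split(";"):
        if "=" in part:
            name, value = part.split("=", 1)
            # Remove whitespace inside values (e.g. folded base64 in b=).
            tags[name.strip()] = "".join(value.split())
    return tags


def key_query_name(dkim_signature: str) -> str:
    """DNS name whose TXT record holds the signer's public key."""
    tags = parse_tags(dkim_signature)
    return f"{tags['s']}._domainkey.{tags['d']}"
```

For a header value `v=1; a=rsa-sha256; d=example.com; s=mail2024; ...` the receiver would query the TXT record of `mail2024._domainkey.example.com`, then parse that record (e.g. `v=DKIM1; k=rsa; p=...`) with the same `parse_tags` helper.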
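For Sender ID (item 2), which builds on the SPF record format, the sketch below evaluates a simplified `v=spf1` record. It handles only `ip4`/`ip6` mechanisms, the terminal `all`, and the four qualifiers (`+` pass, `-` fail, `~` softfail, `?` neutral); real SPF evaluation also resolves `a`, `mx`, `include`, `redirect`, and more, which this sketch simply skips. The function name `check_spf` is hypothetical.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record: str, client_ip: str) -> str:
    """Evaluate ip4/ip6 and 'all' terms of an SPF-style record."""
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        return "none"  # not an SPF record
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"  # default qualifier when none is written
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        if term == "all":
            return QUALIFIERS[qualifier]  # catch-all at the end
        mech, _, value = term.partition(":")
        if mech in ("ip4", "ip6"):
            net = ipaddress.ip_network(value, strict=False)
            if ip.version == net.version and ip in net:
                return QUALIFIERS[qualifier]
        # Other mechanisms (a, mx, include, ...) are skipped in this sketch.
    return "neutral"  # nothing matched and no 'all' term
```

With `v=spf1 ip4:192.0.2.0/24 -all`, a message from `192.0.2.55` passes and one from `203.0.113.9` fails; replacing `-all` with `~all` turns that failure into a softer `softfail`.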
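For FairUCE (item 3), the slides do not list the exact lookup sequence, so the two relationship tests below are assumptions made for illustration: the client IP's reverse-DNS name lies under the sender's domain, or the IP is one of the domain's MX host addresses. The DNS tables are hypothetical in-memory stand-ins, and `functools.lru_cache` plays the role of the cached DNS lookups.

```python
from functools import lru_cache

# Hypothetical DNS data standing in for real PTR / MX / A lookups.
PTR = {"192.0.2.10": "mail.example.com"}
MX = {"example.com": ("mail.example.com",), "example.org": ("mx.example.org",)}
A = {"mail.example.com": ("192.0.2.10",), "mx.example.org": ("198.51.100.20",)}


@lru_cache(maxsize=1024)
def reverse_name(ip: str) -> str:
    """Cached reverse (PTR) lookup; '' if the IP has no name."""
    return PTR.get(ip, "")


@lru_cache(maxsize=1024)
def mx_addresses(domain: str) -> frozenset:
    """Cached addresses of all MX hosts for a domain."""
    return frozenset(
        addr for host in MX.get(domain, ()) for addr in A.get(host, ())
    )


def related(domain: str, client_ip: str) -> bool:
    """True if the client IP can be tied to the envelope sender's domain."""
    name = reverse_name(client_ip)
    if name == domain or name.endswith("." + domain):
        return True
    return client_ip in mx_addresses(domain)


def handle(envelope_sender: str, client_ip: str) -> str:
    """Accept related mail; otherwise fall back to a challenge."""
    domain = envelope_sender.rsplit("@", 1)[-1].lower()
    return "accept" if related(domain, client_ip) else "send challenge"
```

Caching matters here because a busy server sees the same sending hosts repeatedly; each relationship is resolved once and then answered from memory.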
While these solutions are viable in certain situations, they share
some significant limitations:
1. Host-less and vanity domains
2. Mobile computing
1. Challenge-Response (CR): The assumption is that spam senders using
fake sender email addresses will never receive the challenge,
and spam senders using real email addresses will not be able
to reply to all of the challenges.
a. CR deadlock
b. Automated systems
c. Interpretation challenges
2. Computational Challenge (CC): Most CC systems use complex
algorithms that are intended to take time. For a single user,
the delay is unlikely to be noticed. But for a bulk mailer such
as a spam sender, the small delays add up, making it take too
long to send millions of emails.
a. Unequal taxation
b. Mailing lists
c. Robot armies
d. Legal robot armies
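The challenge-response idea (item 1) can be sketched as a mailbox that delivers mail from known senders and holds everything else until the sender answers a challenge. The class and method names are hypothetical, and the "challenge" is only a status string here rather than a real outgoing email.

```python
class ChallengeResponse:
    """Hold mail from unknown senders until they answer a challenge."""

    def __init__(self, whitelist=()) -> None:
        self.whitelist = set(whitelist)
        self.pending = {}  # sender -> list of held messages

    def receive(self, sender: str, message: str) -> str:
        """Deliver whitelisted mail; hold the rest and challenge."""
        if sender in self.whitelist:
            return "delivered"
        self.pending.setdefault(sender, []).append(message)
        return "challenge sent"  # in practice: an email asking for a reply

    def answer(self, sender: str) -> list:
        """Sender replied correctly: whitelist them and release held mail."""
        self.whitelist.add(sender)
        return self.pending.pop(sender, [])
```

Mail from a forged address stays in `pending` forever, because the forged owner never answers. The same sketch also shows the CR deadlock: if two such systems challenge each other, each challenge is itself mail from an unknown sender and gets held.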
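For computational challenges (item 2), one well-known form is a hashcash-style proof of work: the sender searches for a nonce whose hash has a required number of leading zero bits, while the recipient verifies with a single hash. This is an assumed example of the technique, not necessarily what any particular CC proposal uses; the function names and the choice of SHA-256 are illustrative.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count leading zero bits in a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(resource: str, difficulty: int) -> str:
    """Sender's work: try nonces until the hash is 'hard' enough.

    Expected cost grows as roughly 2**difficulty hash computations.
    """
    for nonce in count():
        stamp = f"{resource}:{nonce}"
        digest = hashlib.sha256(stamp.encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return stamp


def verify(stamp: str, resource: str, difficulty: int) -> bool:
    """Recipient's check: one hash, so verification stays cheap."""
    if stamp.rsplit(":", 1)[0] != resource:
        return False  # stamp was minted for a different recipient
    digest = hashlib.sha256(stamp.encode()).digest()
    return leading_zero_bits(digest) >= difficulty
```

Binding the stamp to the recipient address (the `resource`) forces a bulk mailer to mint a fresh stamp per recipient, which is where the cost adds up. It also illustrates the "unequal taxation" and "robot armies" limitations: the same difficulty costs a slow machine far more than a fast one, and a botnet spreads the cost over hijacked computers.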
A few solutions have been proposed that use cryptography to
validate the sender. Essentially, these systems use
certificates to perform the authentication. Without a proper
certificate, a forged email can be readily identified. Some
proposed cryptographic solutions include:
The existing mail protocol (SMTP) has no explicit support for
cryptographic authentication. Some of these proposed solutions
extend SMTP (e.g., S/MIME, PGP/MIME, and AMTP), while
others aim to replace the existing mail infrastructure (e.g., MTP).
Cryptography does not validate that the email address is real;
it only validates that the sender had the correct keys for the
email. This creates a few issues:
1. Automated abuse
2. Usability issues
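The "automated abuse" issue can be made concrete: a signature check answers "did the holder of this key send it?", not "is this wanted mail?". The sketch below uses HMAC from the Python standard library as a symmetric stand-in for the certificate-based public-key signatures the proposals describe; the key registry and function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key registry. HMAC is a symmetric stand-in here for the
# certificate-based public-key signatures the proposals actually use.
KEYS = {"spammer@example.net": b"key-the-spammer-legitimately-holds"}


def sign(sender: str, message: str) -> str:
    """Tag a message with the sender's key."""
    return hmac.new(KEYS[sender], message.encode(), hashlib.sha256).hexdigest()


def authenticated(sender: str, message: str, tag: str) -> bool:
    """True if the tag matches the message under the sender's key."""
    key = KEYS.get(sender)
    if key is None:
        return False  # no key on record: cannot authenticate
    expected = hmac.new(key, message.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)  # constant-time comparison
```

A spammer who obtains valid keys for a throwaway address passes `authenticated()` for every message it signs: the check rejects forgery and tampering, but says nothing about whether the content is spam.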
1. Use hybrid strategies.
2. Legislate anti-spam regulations.
Viable in limited circumstances with significant limitations.
Impede regular users or spammers?
A good solution today might not be a good solution tomorrow.
Dr. Neal Krawetz, "Anti-Spam Solutions and Security"
Paul Graham, "Better Bayesian Filtering"
Anti-Phishing Working Group, http://www.antiphishing.org/