Spam 3.0
The Zombie Image Generation
Once a mere nuisance, spam has exploded into one of the biggest threats to email. The
immense spam burden is clogging networks and choking the flow of legitimate messages.
Spam filters are widely available and implemented, so why are so many still so sorely losing
the battle? This document describes the technical prowess of the third generation of spam
and the problems it creates for the anti-spam industry.
Spam 1.0
Spam originally started as nothing more than mass-distributed advertisements, much like
the junk-mail sent by regular post. The email messages themselves were simple text and
perhaps html or graphic design. All spammers of yesteryear had to do was compile as many
email addresses as possible and send out their unwanted advertisements for everything from
mortgages to knock-off pharmaceuticals. The nuisance was minimal and most users just
deleted spam from their inbox. Rudimentary methods like honeypots and content filtering
were developed to catch this primitive form of spam.
Spam 2.0
As the early anti-spam solutions started cutting into spammers' profits, spammers developed
tricks to bypass spam filters and reach inboxes. Spam content began appearing with slight
misspellings or use of mixed character sets. Fooling content filters was as easy as changing a
letter to form ‘V1agra’. The anti-spam solutions responded to the new challenge by
expanding content-filter rules to include misspellings and by adding methods of scoring
message elements, such as Bayesian analysis, to estimate the likelihood that an email is spam.
Since spammers typically sent their mail from a single IP address or series of IP addresses,
blacklists (known as RBLs) were able to block some spam simply by blocking known
spam-sending IP addresses. International spam joined the fray, so some anti-spam solutions
incorporated foreign-language dictionaries and recruited teams of spam analysts around the
world. It is easy to see how such a cocktail approach becomes difficult to scale as spam
complexity increases.
Spam percentages climbed 30% in 2006, becoming 87% of all email messages by the
beginning of 2007.
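The content-filter arms race described above can be sketched in a few lines. This is a minimal illustration, not any vendor's filter: the word list `BLOCKED_WORDS`, the substitution table `LOOKALIKES`, and both function names are assumptions made up for the example.

```python
import re

# Hypothetical blocklist; real filters use far larger, curated rule sets.
BLOCKED_WORDS = {"viagra", "mortgage"}

# Assumed look-alike substitutions a spammer might use (illustrative only).
LOOKALIKES = str.maketrans({"1": "i", "0": "o", "3": "e", "@": "a", "$": "s"})


def naive_filter(text: str) -> bool:
    """Return True if a blocked word appears verbatim (case-insensitive)."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in BLOCKED_WORDS for word in words)


def normalizing_filter(text: str) -> bool:
    """Map look-alike characters back to letters before matching."""
    normalized = text.lower().translate(LOOKALIKES)
    words = re.findall(r"[a-z]+", normalized)
    return any(word in BLOCKED_WORDS for word in words)


if __name__ == "__main__":
    message = "Cheap V1agra here"
    print(naive_filter(message))        # the misspelling slips through
    print(normalizing_filter(message))  # the expanded rule catches it
```

Every new substitution the spammer invents needs another entry in the table, which is one way to see why this approach becomes hard to scale.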
Spam 3.0
Today’s spam makes use of two key elements that enable it to bypass many anti-spam
filters. The first relates to content: rather than rely on spelling or html tricks to hide the
spammers’ message, they started embedding graphics into the email messages, a technique
known as “image-based spam.” The second element is the distribution method: rather than
send spam from “spammy” IP addresses that can easily be blocked by blacklists, spammers
have started leveraging armies of zombies or bots. These bots are typically home PCs that
have been infected by malware and as a result can be instructed remotely to send whatever
the spammers desire. The new generation of spam, with endless hijacked computational
power available due to vast botnets, is able to employ so many mathematically sophisticated
tricks that blacklists and rudimentary automated solutions are no longer able to defend
against the onslaught of spam.
Spam 3.0 www.commtouch.com
Image Based Spam
Image-based spam was first developed to defeat the most basic type of anti-spam technology:
content filters. Content filters attempt to detect spam by searching email messages for
certain words in the text. Plain text messages blatantly touting ‘Viagra’ and ‘cheap meds’
are blocked by content filters, but spammers quickly found a way to take the text out of
spam: images. The use of images blinded content filters. Slightly more sophisticated heuristic
rule-based anti-spam engines are better at detecting a few types of image-based spam. Rules
can be implemented to block rare image formats or images of a certain size; however, these
generic rules block only some image-based spam and typically lead to a large number of
false positives or classification errors.
Image spam accounts for 35% of messages and 70% of bandwidth taken by spam.
Image-spam often employs a myriad of advanced randomization techniques to muddle anti-
spam engines.
Advanced Image Spam Techniques
• Random pixels in the background of the image (they appear to be ‘dirt’ in the background)
• Changes to border color, background color, font color
• Broken puzzle pieces with multiple images rendered together to appear as a single image of text
• Animated GIF images
• Use of rare image formats, e.g. PNG
• “Snowflake” patterns spread throughout the image
• Wavy fonts
• Fonts with multiple colors
• Connected letters
• Addition of text beneath the image
Standard Image Spam Blocking Techniques
• Filters that block based on checksum: this technique calculates the unique ‘checksum’ of
each spam message and blocks other messages that carry the same checksum. It is easily
bypassed by techniques that randomize elements such as pixels, border or background color,
or even animation.
• Filters that block based on image size: these are also bypassed easily by randomizing
techniques, and are prone to high false positive rates.
• Optical Character Recognition (OCR): this technique attempts to translate the image of
text into actual text and then run it through a content filter. Most images now incorporate
tricks that easily fool OCR, such as connected letters, wavy lines of text, and multi-colored
fonts. OCR is also notoriously resource-intensive, and with the huge amounts of image spam
being sent daily, it is an unrealistic blocking method at best.
• Advanced mathematical algorithms: these techniques are too numerous to outline here;
however, they are often fooled by the “patchwork” and “snowflake” backgrounds built into
many image-based spam campaigns.
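To make the checksum weakness concrete, the sketch below models an image as a flat run of grayscale bytes and sprinkles a few barely visible ‘dirt’ pixels into it. The image model, the helper names and the choice of SHA-256 are all assumptions for illustration; the source does not say which checksum real filters use.

```python
import hashlib
import random


def make_image(width: int = 64, height: int = 64) -> bytearray:
    """Build a stand-in grayscale image: one byte per pixel, all white."""
    return bytearray([255] * (width * height))


def add_random_dirt(pixels: bytearray, count: int, seed: int) -> bytearray:
    """Return a copy with `count` randomly chosen pixels darkened slightly."""
    rng = random.Random(seed)
    noisy = bytearray(pixels)
    for index in rng.sample(range(len(noisy)), count):
        noisy[index] = 250  # barely visible against a white background
    return noisy


def checksum(pixels: bytes) -> str:
    """Exact-match fingerprint of the pixel data (SHA-256 chosen arbitrarily)."""
    return hashlib.sha256(pixels).hexdigest()


if __name__ == "__main__":
    original = make_image()
    variant = add_random_dirt(original, count=5, seed=1)
    changed = sum(1 for a, b in zip(original, variant) if a != b)
    print(f"{changed} of {len(original)} pixels changed")
    print(checksum(original) == checksum(variant))  # an exact-match filter misses the variant
```

Because each message can carry a differently seeded variant, a filter that stores checksums of known spam images never sees the same value twice.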
Zombies Expand Spam Distribution and Variety
In the second key Spam 3.0 technique, spammers use huge, globally distributed botnets,
some containing as many as 200,000 compromised zombie computers, to distribute spam.
Botnets work by taking over large numbers of PCs and using them as bots to launch
massive, short-wave attacks. Since the distributed attacks are made from millions of
otherwise innocent computers, no specific static IP address can be identified and blocked.
Zombies are usually activated for brief outbreaks lasting an average of 2-3 hours, and then
they are deactivated by their controllers until the next attack.
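A rough way to see why short outbreaks defeat reactive blacklists is to ask what share of an outbreak is still in progress once a zombie's IP address has finally been listed. The model below is a deliberate simplification and not a measurement: it assumes a constant sending rate and a single listing delay, and the function name and the delay values are hypothetical.

```python
def fraction_blocked(outbreak_hours: float, listing_delay_hours: float) -> float:
    """Share of an outbreak's messages stopped by a reactive IP blacklist.

    Assumes a constant sending rate and that nothing is blocked until the
    sending IP address has been listed.
    """
    if outbreak_hours <= 0:
        raise ValueError("outbreak_hours must be positive")
    blocked_hours = max(0.0, outbreak_hours - listing_delay_hours)
    return blocked_hours / outbreak_hours


if __name__ == "__main__":
    # 2.5 hours is the midpoint of the 2-3 hour outbreak length quoted above.
    for delay in (0.5, 1.0, 2.0, 4.0):  # hypothetical listing delays, in hours
        share = fraction_blocked(2.5, delay)
        print(f"listing delay {delay} h -> {share:.0%} of the outbreak blocked")
```

Under these assumptions, any listing delay longer than the outbreak itself blocks nothing, and the zombie may reappear under a new dynamic IP address for the next attack.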
Zombie Facts
Number of zombie IP addresses active at any one time: 6 – 8 million
Number of new zombies per day (new bots, dynamically changed IP addresses): 500,000
Length of average zombie-driven spam or malware outbreak: 2 – 3 hours
Number of zombies used per outbreak: 10,000 – 200,000
Number of email messages a typical botnet sends: 160 million messages in just 2 hours
Number of email messages a group of botnets can send, when working in concert: 1 billion
messages in just a few hours
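The figures above imply a modest per-machine workload, which helps explain why an infected PC's owner may notice nothing. The arithmetic below pairs the 160 million message figure with both ends of the zombies-per-outbreak range; that pairing is an assumption, since the source does not say what size of botnet produced the figure.

```python
MESSAGES_PER_OUTBREAK = 160_000_000  # from the table: a typical botnet
OUTBREAK_HOURS = 2                   # from the table: "in just 2 hours"
ZOMBIES_LOW = 10_000                 # from the table: zombies per outbreak
ZOMBIES_HIGH = 200_000


def per_zombie_rate(messages: int, zombies: int, hours: float) -> float:
    """Average messages sent per zombie per hour."""
    return messages / zombies / hours


if __name__ == "__main__":
    large = per_zombie_rate(MESSAGES_PER_OUTBREAK, ZOMBIES_HIGH, OUTBREAK_HOURS)
    small = per_zombie_rate(MESSAGES_PER_OUTBREAK, ZOMBIES_LOW, OUTBREAK_HOURS)
    print(large)  # 400.0 messages per zombie per hour
    print(small)  # 8000.0 messages per zombie per hour
```

With 200,000 zombies, each machine sends fewer than seven messages per minute on average.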
Spam and malware join forces
The dual vices of spam and malware have joined forces to get the upper hand against the
technology solutions designed to defend against them. Malware known as Trojan downloaders
infects PCs and turns them into spam-sending zombies, as outlined in the diagram below. The
Trojan downloaders are sent out as the payload in spam messages, utilizing the same types
of botnets as regular spam. Only an anti-spam or anti-virus solution that can break this cycle
will be successful in preventing new zombies from being recruited to the spammers’ side.
The Vicious Cycle of Spam & Malware
Because spam and malware have become so intertwined, it is easy to understand the viral
spread of zombie botnets, shown in the diagram below. Once a single computer is
compromised, it can be used to distribute spam and malware. The malware it distributes is
most often a Trojan program that installs malicious code on the unsuspecting user’s
machine, turning it into a new spam zombie that can send more spam or malware.
Botnets Enable Sophisticated Spam Tricks
85% of global spam is sent by zombies.
Spam is winning because it has grown increasingly sophisticated
in order to circumvent the very technologies designed to defend against it. Today’s
spammers are masters at penetrating traditional defenses. Image-based spam and
botnets are two of the most sophisticated spammer techniques and the primary reasons so
much spam is easily bypassing traditional anti-spam solutions. The ever-growing zombie
botnets allow spammers to distribute outbreaks and maneuver past static blacklists. Zombies
also put virtually unlimited computing power at the fingertips of spammers, giving them the
CPU required to randomly generate millions of unique spam images, not to mention the
bandwidth to send them. The spammers use sophisticated randomization and other
techniques to generate massive amounts of unique images so complex that they are
undetectable by many anti-spam solutions.
The Propagation of Zombies & Spam
Real-time Network-based Solution
Commtouch Recurrent Pattern Detection (RPD) Technology has remained robust against
emerging spam techniques throughout Spam 1.0, 2.0 and 3.0, and is expected to remain at
the forefront of blocking the as-yet-unseen Spam 4.0. Rather than
attempting to evaluate the content of messages, the Commtouch Detection Center analyzes
vast volumes of Internet traffic in real-time. Commtouch Recurrent Pattern Detection
(RPD™) identifies and blocks new spam and malware outbreaks based on their most
fundamental characteristic: their mass distribution across the Internet. RPD is not burdened
by analyzing the content of each and every message, nor is it fooled by clever spam
techniques.
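The general idea of detecting an outbreak by its mass distribution, rather than by its content, can be illustrated with a toy counter. This is not Commtouch's RPD algorithm, whose internals this document does not describe. The class name, the `pattern_id` abstraction and the threshold are all hypothetical, and the sketch says nothing about how a recurring pattern would be derived from real traffic.

```python
from collections import defaultdict


class RecurrenceCounter:
    """Toy outbreak detector: flags a pattern seen from many distinct sources."""

    def __init__(self, source_threshold: int) -> None:
        self.source_threshold = source_threshold
        # Maps each pattern identifier to the set of source IPs that sent it.
        self._sources_by_pattern = defaultdict(set)

    def observe(self, pattern_id: str, source_ip: str) -> bool:
        """Record one sighting; return True once the pattern looks like an outbreak."""
        sources = self._sources_by_pattern[pattern_id]
        sources.add(source_ip)
        return len(sources) >= self.source_threshold


if __name__ == "__main__":
    counter = RecurrenceCounter(source_threshold=3)
    # Addresses from the 192.0.2.0/24 documentation range.
    for ip in ("192.0.2.1", "192.0.2.1", "192.0.2.2", "192.0.2.3"):
        print(ip, counter.observe("pattern-A", ip))
```

A realistic system would also need a time window and expiry so that old sightings do not accumulate forever; both are omitted here to keep the sketch short.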
Commtouch’s real-time analysis of over 95% of global email traffic:
• Identifies 500,000 new zombies daily
• Tracks traffic from over 100 million IP addresses
• Classifies billions of messages per week, in real-time
The war against spam has reached a new level, and only the fittest real-time, network-based
outbreak defense solution will thrive.
About Commtouch
Commtouch Software Ltd. (NASDAQ: CTCH) is dedicated to protecting and preserving the
integrity of the world's most important communications tool: e-mail. Commtouch has over
16 years of experience developing messaging software and is a global developer and
provider of proprietary anti-spam, Zero-Hour virus protection and IP Reputation solutions.
Using core technologies including RPD (Recurrent Pattern Detection™), the Commtouch
Detection Center analyzes billions of email messages per week to identify new spam and
malware outbreaks within minutes of their introduction into the Internet. Integrated by more
than 50 OEM partners, Commtouch technology protects thousands of organizations, with
hundreds of millions of users in over 100 countries. Commtouch is headquartered in
Netanya, Israel, and has a subsidiary in Mountain View, Calif. For more information, see:
www.commtouch.com. The site includes the Commtouch online lab detailing spam statistics
and charts.
Visit us at: www.commtouch.com
Contact us at: bizdev@commtouch.com
Call US: (650) 864-2273
Call Int’l: +972-9-863-8818