One of the greatest threats to users of the internet today is spam. Everyone who has ever used
an email account is aware of just how big the problem is. Personally, my spam filter catches
between 25 and 75 spam messages every day. AOL reports that it blocks roughly 1.5 billion email
messages a day to its users and that in 2003 it blocked no fewer than 556 billion messages. It
is impossible to estimate the number of spam messages sent worldwide, but many reports agree that
spam makes up a majority of all email sent. Ferris Research has reported the cost to American
businesses to be in the realm of 8.9 billion dollars a year, mostly in man-hours lost dealing with
spam, both by end users and in the IT costs of handling the unnecessary volume of email that spam
creates.
The problem is that spam is incredibly inexpensive to generate, costing the spammers
substantially less than the end users who are forced to deal with it. Some of the most effective methods
for combating spam that have been developed for use by email servers focus on increasing the
amount of computational time required of the spam senders for each message that they send.
This works because legitimate users only need to send a few messages at a time, so a little extra
computational time is not a problem, but for a spammer sending out millions of messages a day it can be
a huge problem. However, there is more to spamming than just sending out messages. Spammers need to
have addresses to send them to, and this is the part of the problem on which our product focuses.
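The asymmetry described above can be made concrete with a little arithmetic. The following is only an illustrative sketch: the one-second cost per message and the message counts are hypothetical figures chosen for the example, not measurements from any mail server.

```javascript
// Total CPU time if every message costs a fixed amount of computation.
function cpuSecondsNeeded(messageCount, secondsPerMessage) {
  return messageCount * secondsPerMessage;
}

// Hypothetical cost imposed on the sender for each message.
const secondsPerMessage = 1;

// A legitimate user sending a handful of messages (assumed: 20).
const legitimate = cpuSecondsNeeded(20, secondsPerMessage);

// A spammer sending a large batch (assumed: one million).
const spammer = cpuSecondsNeeded(1000000, secondsPerMessage);

console.log(legitimate);                    // 20 (seconds)
console.log((spammer / 86400).toFixed(1));  // 11.6 (days of CPU time)
```

Under these assumed numbers the legitimate user barely notices the delay, while the spammer needs more than eleven days of CPU time for a single batch.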
In order to harvest the email addresses of their victims, spammers operate what are called “Spam
Bots.” A spam bot is a piece of software that searches the internet and spiders (follows every link on)
every page it finds, looking for valid email addresses to harvest. There are many methods that people use
to protect the confidentiality of their email addresses. A popular one is to alter the email
address so that it is not machine readable but a human can still figure it out. For instance,
firstname.lastname@example.org becomes iatucker At You See davis dot edu. This is likely to fool a spam bot but
also reduces the usability or, as we like to call it in security, the availability of the web page. In the first
form a user could just click on the email address and send an email, but in the second form, in the
best case, a user would have to type it out and, in the worst case, the user would be confused and unable
to communicate with iatucker. Our solution addresses this problem in a novel manner.
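The manual "spell it out" approach described above can be sketched in a few lines. The function name spellOutAddress, the sample address, and the harvester pattern are all assumptions made for illustration; real spam bots may use different and smarter patterns.

```javascript
// Hypothetical helper: rewrite an address so that a person can read it
// but a naive pattern-matching harvester will not recognise it.
function spellOutAddress(address) {
  return address
    .replace(/@/g, " at ")
    .replace(/\./g, " dot ");
}

// A naive harvester pattern, assumed here only for illustration.
const naiveEmailPattern = /[\w.+-]+@[\w-]+(\.[\w-]+)+/;

const original = "someone@example.org";
const spelledOut = spellOutAddress(original);

console.log(spelledOut);                          // someone at example dot org
console.log(naiveEmailPattern.test(original));    // true
console.log(naiveEmailPattern.test(spelledOut));  // false
```

The spelled-out form escapes the pattern, but it is also no longer a clickable link, which is the loss of availability discussed above.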
Our solution involves altering the text of the web page that is sent out by the HTTP server,
obfuscating the email addresses to protect the confidentiality of the users from the spam
bots. Then, so that the email addresses are not also hidden from the actual users, we include with the page
a piece of JavaScript that runs after the page has been downloaded to the browser and restores
the original content. Users, who unlike spam bots are likely to be running JavaScript,
can still use the page as they expect to, but spam bots are unable to find any usable
addresses. If the spammers so desired, they could enable JavaScript for their bots, but just as with the
mail servers, a legitimate user only needs to visit a few pages, so the computational time needed to run
the JavaScript is negligible, whereas a spam bot that needs to view millions of pages would grind
to a halt. Thus both confidentiality and availability are protected for the legitimate users.
What our product actually does is take the content of a web page and replace the parts that we
want protected with an obfuscated version of that content. In this case the important elements include the
values of “href” attributes, especially “mailto:” links, although anything the user chooses can be protected.
In this example a simple conversion to ASCII values is performed, but the modularity of the design allows
for actual encryption schemes to be used in the future if necessary.
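A minimal sketch of this obfuscation step might look like the following. It is not the project's actual code: the function names, the comma-separated encoding, and the data-protected attribute used to carry the encoded value are all assumptions made for the example.

```javascript
// Encode text as a comma-separated list of character codes,
// e.g. "ab" -> "97,98".
function encodeAsCodes(text) {
  return text
    .split("")
    .map((ch) => ch.charCodeAt(0))
    .join(",");
}

// Replace the value of every double-quoted mailto: href with its encoded form.
// The data-protected attribute name is an assumption made for this sketch.
function protectMailtoLinks(html) {
  return html.replace(
    /href="(mailto:[^"]*)"/g,
    (match, target) => `href="#" data-protected="${encodeAsCodes(target)}"`
  );
}

const page = '<a href="mailto:someone@example.org">Email me</a>';
const protectedPage = protectMailtoLinks(page);

console.log(protectedPage);
console.log(protectedPage.includes("@")); // false
```

After this step the page source no longer contains the "@" sign or the "mailto:" prefix, so a bot scanning the raw text has nothing obvious to match.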
When the HTTP client downloads the page, the protected data is readable by neither humans nor spam bots.
The second part is a JavaScript program, downloaded and run by the HTTP client, that goes through the page
and restores the content to its original readable and clickable form. Thus spam bots that do not run
JavaScript are unable to glean any usable data from the page, and bots heavy enough to run it pay the
cost on every page. In effect we are using the
inefficiency of JavaScript as a feature rather than a bug. Once again, due to the modularity of the
system, if just running JavaScript proves to be too little work to be effective, the client could be forced to
perform some useless computation or some sort of very inefficient decryption as part of the process.
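The restoring script could be sketched roughly as follows. This is an illustration under stated assumptions rather than the project's actual script: it assumes each protected link carries its original target as comma-separated character codes in a hypothetical data-protected attribute, and the function names are invented for the example.

```javascript
// Decode a comma-separated list of character codes back into text.
function decodeCodes(encoded) {
  return String.fromCharCode(...encoded.split(",").map(Number));
}

// Restore every link that carries the (assumed) data-protected attribute.
// Takes a document-like object so that, in a browser, it can be called
// as restoreProtectedLinks(document) once the page has loaded.
function restoreProtectedLinks(doc) {
  for (const link of doc.querySelectorAll("a[data-protected]")) {
    link.setAttribute("href", decodeCodes(link.getAttribute("data-protected")));
    link.removeAttribute("data-protected");
  }
}

// In a browser, run after the page has been parsed. The guard lets the
// sketch load outside a browser without errors.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => restoreProtectedLinks(document));
}
```

Because the decoding only happens when the script runs, a client that merely downloads the page text never sees the restored addresses.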
From a pure security standpoint this is by no means a strong way of protecting the
confidentiality of the users. Anyone could just view the page normally and get the email addresses.
Even if an actual encryption method were used, the key to decrypt the content would also have to be sent to
the end user. However, this product would be very likely to discourage most spammers from trying to
attack protected pages and lead them to move on to greener pastures instead.
For The Future:
There are many ways in which this product could be improved, and it has been made as
modular as possible to facilitate such improvements. Currently the publisher must identify the sections
to be protected. All of this could very easily be done on the server side by some sort of preprocessor such as PHP; however,
for the scope of this project we were unsure what the grader would have available to use, and so we
decided to keep everything within the browser. Also, the code that obfuscates the protected content
does a very simple replacement with the ASCII values of the text. This could be replaced by some
sort of simple encryption scheme. On the decryption side, the client could be forced to do a lot
more work than it currently does. Since this solution works on the principle of making the viewing of the
page not impossible but merely computationally expensive, it would make sense to have the client do some sort
of useless computation in order to ensure that only legitimate users will want to spend the CPU time
necessary to view the page. The two improvements could easily be made simultaneously by using a
very processor-intensive decryption scheme.
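One way such a processor-intensive scheme might look is sketched below. This is a toy illustration, not real cryptography and not part of the current product: the mixing function, the seed, the round count, and the function names are all assumptions chosen for the example. The idea is that the key can only be obtained by repeating a cheap step many times, so every page view costs the client a tunable amount of work.

```javascript
// Deliberately slow key derivation: mix a 32-bit state `rounds` times.
// This is a toy mixing function chosen for the sketch, not real cryptography.
function deriveKey(seed, rounds) {
  let state = seed >>> 0;
  for (let i = 0; i < rounds; i++) {
    state = (Math.imul(state ^ (state >>> 15), 0x2c1b3c6d) + i) >>> 0;
  }
  return state;
}

// XOR each character code with one byte of the derived key.
// XOR is its own inverse, so the same function encodes and decodes.
function xorWithKey(text, key) {
  return text
    .split("")
    .map((ch, i) =>
      String.fromCharCode(ch.charCodeAt(0) ^ ((key >>> (8 * (i % 4))) & 0xff))
    )
    .join("");
}

const seed = 12345;      // hypothetical value, sent along with the page
const rounds = 1000000;  // tunes how much work each page view costs

// Publisher side: derive the key once and hide the address.
const hidden = xorWithKey("mailto:someone@example.org", deriveKey(seed, rounds));

// Client side: must repeat the same work before the address is usable.
const restored = xorWithKey(hidden, deriveKey(seed, rounds));

console.log(restored === "mailto:someone@example.org"); // true
```

Raising the round count makes each page more expensive to read. As noted earlier, the seed still has to travel with the page, so this raises the cost for a bot rather than providing real secrecy.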
Spam, while difficult if not impossible to stop, can be made much less profitable for the
spammers, which is good for the rest of us. Our method, used in conjunction with similar methods on the
server side, could reduce the amount of spam we all have to deal with.