spam.doc - CSIF

The Problem:

       One of the greatest threats to users of the Internet today is spam. Everyone who has ever
used an email account is aware of just how big the problem is. Personally, my spam filter catches
between 25 and 75 spam messages every day. AOL reports that it blocks roughly 1.5 billion email
messages a day to its users and that in 2003 it blocked no fewer than 556 billion messages. It is
impossible to estimate the number of spam messages sent worldwide, but many reports seem to agree
that spam makes up a majority of all email sent. Ferris Research has reported the cost to American
businesses to be in the realm of 8.9 billion dollars a year, mostly in man-hours lost dealing with
spam, both by end users and in the IT costs of handling the unnecessary volume of email that spam
generates.


       The problem is that spam is incredibly inexpensive to generate, costing the spammers
substantially less than the end users who are forced to deal with it. Some of the most effective
methods for combating spam that have been developed for use by email servers focus on increasing
the amount of computational time required of the sender for each message sent. This works because
legitimate users only need to send a few messages at a time, so a little extra computational time
is not a problem, but for a spammer sending out millions of messages a day it can be a huge
problem. However, there is more to spamming than just sending out messages. Spammers need
addresses to send them to, and this is the part of the problem on which our product focuses.

Our Solution:

       In order to harvest the email addresses of their victims, spammers operate what are called
“spam bots.” A spam bot is a piece of software that searches the Internet and spiders (follows
every link on) every page it finds, looking to harvest valid email addresses. There are many
methods that people use to protect the confidentiality of their email addresses. A popular one is
to alter the email address so that it is not machine readable but a human can figure it out. For
instance, iatucker@ucdavis.edu becomes iatucker At You See davis dot edu. This is likely to fool a
spam bot but also reduces the usability, or as we like to call it in security, the availability, of
the web page. In the first example a user could just click on the email address and send an email,
but in the second example, in the best case a user would have to type it out, and in the worst case
the user would be confused and unable to communicate with iatucker. Our solution addresses this
problem in a novel manner.

       Our solution involves altering the text of the web page that is sent out by the HTTP server,
obfuscating the email addresses to protect the confidentiality of the users from the spam bots.
Then, so that the email addresses are not also hidden from the actual users, we include with the
page a piece of JavaScript that runs after the page has been downloaded to the browser and restores
the original content. Users, who unlike spam bots are likely to be running JavaScript, can still
use the page as they expect to, but spam bots are unable to find any usable addresses. If the
spammers so desired they could enable JavaScript for their bots, but just as with the mail servers,
a legitimate user only needs to visit a few pages, so the computational time needed to run the
JavaScript is negligible, whereas a spam bot that needs to view millions of pages would grind to a
halt. Thus both confidentiality and availability are protected for the legitimate users.

The Details:

       What our product actually does is take the content of a web page and replace the parts that
we want protected with an obfuscated version of that content. In this case the important elements
include the content of “href” attributes, especially “mailto:” links, although anything the user
chooses can be protected. In this example a simple conversion to ASCII character codes is
performed, but the modularity of the design allows for actual encryption schemes to be used in the
future if necessary.
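
The obfuscation step described above might look roughly like the following sketch. It is not the
project's actual code: the function names, the "protected" class, and the data-href and data-text
attributes are all assumptions made for illustration, and the regular expression only handles the
simplest form of a mailto: link.

```javascript
// Hypothetical helper: turn a string into comma-separated character codes,
// e.g. "ab" becomes "97,98".
function obfuscate(text) {
  return Array.from(text, (ch) => ch.codePointAt(0)).join(",");
}

// Hypothetical helper: replace each simple mailto: link in an HTML string
// with an empty placeholder that carries only the encoded link and label.
// The placeholder format is an assumption, not taken from the project.
function protectMailtoLinks(html) {
  const mailtoLink = /<a\s+href="mailto:([^"]+)"\s*>([^<]*)<\/a>/gi;
  return html.replace(mailtoLink, (match, address, label) =>
    '<span class="protected" data-href="' + obfuscate("mailto:" + address) +
    '" data-text="' + obfuscate(label) + '"></span>');
}

// Example: the address no longer appears in the page source.
const protectedPage = protectMailtoLinks(
  '<p>Contact: <a href="mailto:user@example.com">user@example.com</a></p>');
console.log(protectedPage);
```

A bot that scans the resulting source for "@" or "mailto:" finds nothing, which is the only
property this simple encoding is meant to provide.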

       When the HTTP client downloads the page, the protected data is readable by neither humans
nor spam bots. The second part is a JavaScript program, downloaded and run by the HTTP client, that
goes through the page and restores the content to its original readable and clickable form. Thus
the heavyweight, JavaScript-enabled legitimate clients are able to view the page in its unprotected
form, but the lightweight, non-JavaScript-enabled spam bots are unable to glean any usable data
from the page. In effect we are using the inefficiency of JavaScript as a feature rather than a
bug. Once again, due to the modularity of the system, if just running JavaScript proves to be too
little work to be effective, the client could be forced to perform some useless computation or some
sort of very inefficient decryption as part of the process.
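
As a rough illustration of the restoring step, the sketch below decodes comma-separated character
codes back into text and rebuilds the link. The placeholder format (a span with class "protected"
and data-href and data-text attributes) and the function names are assumptions for this example,
not the project's actual markup. It works on an HTML string so that it can be run outside a
browser.

```javascript
// Hypothetical helper: turn "104,105" back into "hi".
function deobfuscate(codes) {
  if (codes === "") return "";
  return String.fromCodePoint(...codes.split(",").map(Number));
}

// Hypothetical helper: replace each assumed placeholder with a normal,
// clickable link built from the decoded values.
function restoreProtected(html) {
  const placeholder =
    /<span class="protected" data-href="([\d,]*)" data-text="([\d,]*)"><\/span>/g;
  return html.replace(placeholder, (match, href, text) =>
    '<a href="' + deobfuscate(href) + '">' + deobfuscate(text) + '</a>');
}

// Example: "mailto:a@b.c" and "hi" encoded as character codes.
const restoredPage = restoreProtected(
  '<span class="protected" data-href="109,97,105,108,116,111,58,97,64,98,46,99"' +
  ' data-text="104,105"></span>');
console.log(restoredPage);
```

In a browser the same decoding would more naturally be applied to the live page once it has
loaded, for example by looping over document.querySelectorAll("span.protected") and replacing each
element with a link.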

       From a pure security standpoint this is by no means a good way of protecting the
confidentiality of the users. Anyone could just view the page normally and get the email addresses.
Even if an actual encryption method were used, the key to decrypt the content would also have to be
sent to the end user. However, this product would be very likely to discourage most spammers from
trying to attack protected pages and lead them instead to move on to greener pastures.

For The Future:

       There are many ways in which this product could be improved, and it has been made as modular
as possible to facilitate such improvements. Currently the publisher must identify the sections of
the page to protect and use a JavaScript program to generate the obfuscated content. Both parts of
this could very easily be done on the server side by some sort of preprocessor such as PHP;
however, for the scope of this project we were unsure what the grader would have available to use,
and so we decided to keep everything within the browser. Also, the code that obfuscates the
protected content does a very simple replacement with the ASCII values of the text. This could be
replaced by some sort of simple encryption scheme. On the decryption side, the client could also be
forced to do a lot more work than it currently does. Since this solution works on the principle of
making the viewing of the page not impossible, just computationally expensive, it would make sense
to have the client do some sort of useless computation in order to ensure that only legitimate
users will want to spend the CPU time necessary to view the page. The two improvements could easily
be made simultaneously by using a very processor-intensive decryption scheme.
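
One way such a processor-intensive scheme could be sketched is shown below. This is purely
illustrative and not part of the project: the function names, the seed and rounds parameters, and
the mixing constants are all assumptions, and the scheme is not cryptographically secure. The key
is derived by repeating a cheap mixing step many times, so the client cannot decode the text
without spending that CPU time, and the page has to ship the seed and round count along with the
encoded data.

```javascript
// Hypothetical key derivation: repeat a cheap 32-bit mixing step `rounds`
// times. The cost of decoding grows linearly with `rounds`.
function stretchKey(seed, rounds) {
  let state = seed >>> 0;
  for (let i = 0; i < rounds; i++) {
    state ^= state >>> 16;
    state = Math.imul(state, 0x85ebca6b); // arbitrary odd multiplier
    state ^= state >>> 13;
    state = Math.imul(state, 0xc2b2ae35); // arbitrary odd multiplier
    state ^= state >>> 16;
    state = (state + i) >>> 0; // fold in the round counter
  }
  return state;
}

// Encode: XOR every UTF-16 code unit with the low 16 bits of the key.
function encodeExpensive(text, seed, rounds) {
  const key = stretchKey(seed, rounds) & 0xffff;
  return text.split("").map((ch) => ch.charCodeAt(0) ^ key);
}

// Decode: the client must redo the same number of rounds to get the key.
function decodeExpensive(codes, seed, rounds) {
  const key = stretchKey(seed, rounds) & 0xffff;
  return String.fromCharCode(...codes.map((code) => code ^ key));
}

// Example round trip with an assumed seed and work factor.
const encodedAddress = encodeExpensive("mailto:user@example.com", 42, 200000);
console.log(decodeExpensive(encodedAddress, 42, 200000));
```

Raising the round count makes each page view more expensive for every client, so in a scheme like
this the publisher would have to tune it so that a single visitor barely notices the delay while a
bot crawling millions of pages pays for it on every one.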

       Spam, while difficult if not impossible to stop, can be made much less profitable for the
spammers, which is good for the rest of us. Our method, used in conjunction with similar methods on
the server side, could reduce the amount of spam we all have to deal with.
