									     Enhancing Email Addresses Privacy on Anti-SPAM
                                         Ying Chen, Dou Wang
                                       School of Computer Science
                                         University of Windsor

Abstract:

SPAM has become one of the greatest challenges in the world of electronic
mail exchange. Unsolicited bulk email consumes recipients’ time and effort,
because recipients must distinguish it from legitimate email messages. Bulk
email also consumes the resources of Mail Transfer Agents (MTAs). SPAM can
be delivered with viruses, spyware and/or adware that cause the information
on a computer to be damaged, leaked or made unmanageable. SPAM can also
contain phishing content that violates users’ privacy. Many anti-spam
solutions exist to protect users and email systems from the impact of SPAM.
Most of those solutions concentrate on “how to fight the SPAM” and take on
the challenge of “False-Positive” and “False-Negative” results. Each
solution has its own characteristics and benefits, either for a specific
email environment or for general public email systems. In this paper, we
analyze the root cause of why spammers are able to send SPAM successfully
despite anti-spam solutions. We then propose a possible solution that
secures email addresses against address book seekers and thereby reduces
the chance that the email addresses in an organization are scanned. We also
propose a way to encrypt the recipients’ and senders’ email addresses
during delivery, to reduce the chance of the addresses being hijacked while
the message is routed.

Keywords

SPAM, anti-spam, directory, encoding, graphical pictures, SMTP socket,
MessageID

I. Introduction

Since the 1990s, electronic mail has become an increasingly popular method
of information exchange. Hundreds of millions of people use email as the
primary communication method in their work and life. Each email user may
have more than one email address, used to communicate with different groups
of contacts. However, as email has become widely used, some advertisers
have adopted the email platform as a medium for advertising. Email that the
recipient does not expect to read is defined as “junk mail” or SPAM. A
spammer is defined as a person who sends SPAM and forces recipients to read
it by using no-cost Internet resources. Once the SPAM message reaches the
recipient’s mailbox, the spammer has succeeded in sending unsolicited bulk
email, because the recipient has to spend time reading the message and,
furthermore, recognizing that it is spam. From this point of view, the
recipient has been forced to read the advertisement. To block spam
messages, anti-spam solutions have to stop the spam before it reaches the
recipient’s mailbox.

Jupiter Research estimates the average e-mail user will receive more than
3,900 spam mails per year by 2007, up from just 40 in 1999, and Ferris
Research estimates that spam cost U.S. companies $10 billion in 2003 and
that a user spends on average 4 seconds to process a SPAM mail [1].

The most popular solution in the real world is filtering. Filtering
mechanisms include origin-based filtering, content-based filtering and
traffic-based filtering [2]. Because the characteristics of spam change
rapidly, the major challenge for a filtering engine is to update its
knowledge base, patterns or algorithms so as to reduce “False Positive” and
“False Negative” results. There are also other mechanisms to fight spam,
such as using tokens to enforce email address privacy and the Completely
Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA)
[3].

In this paper, we present a method that combines rendering email addresses
as images in the directory,
examining spoofed email addresses, and encoding addresses in email messages
during delivery. Rendering email addresses as images in the directory
prevents email harvesters from scanning the addresses out of public
postings on the Internet. Encoding email addresses in messages during
delivery keeps the addresses of both the sender and the recipient secure
and prevents hijacking during transfer and delivery. These approaches stop
most bulk spam before it consumes network resources and give the best
protection to the privacy of email address owners.

This paper is organized as follows. Section II classifies some of the
related work and briefly discusses the advantages and disadvantages of the
classified solutions. Section III illustrates our proposal. Section IV
explains the advantages of our method. The conclusion is in Section V.

II. Related Works

Currently, there are three major logical elements in which anti-spam
mechanisms can be implemented to block spam effectively: the Mail Transfer
Agent (MTA), the Mail Delivery Agent (MDA) and the Mail User Agent (MUA).
Existing anti-spam solutions can be categorized as A. Filtering; B.
Policy-control; C. Human-interactive; D. Address-hiding.

    A. Filtering
       Filtering is the most commonly used anti-spam solution. Filtering
       mainly focuses on patterns in the headers of spam email, the meaning
       of the email body content and the characteristics of spam email
       traffic.

       An origin-based filter checks the sender information for certain
       keywords and string styles and compares it with the recipients’
       whitelist and blacklist.

       A content-based filter analyzes the body content of the email
       message with complex algorithms and maintains a knowledge base to
       achieve self-learning.

       A traffic-based filter examines the network traffic on the email
       server and gathers the server logging information to determine the
       spam probability. Spam usually consists of a large number of
       individual email messages with the same content, sent to bulk
       recipients from different relay hosts. A traffic-based filter rates
       a message by analyzing the network traffic related to it.

       A filter engine generally uses all three methods to rate the message
       more accurately and reduce “False Positive” and “False Negative”
       results.

       Depending on the rated spam probability, further action is then
       taken on each message, such as deleting it, quarantining it, or
       marking it as legitimate email and routing it to the recipient’s
       mailbox.

       The most popular algorithms are Bayesian filtering and SpamAssassin.
       Figure 1 is one example of an implemented spam filter.

       Figure 1: A user interface of a sample SOPHOS spam filter showing
       quarantine, whitelist and blacklist.

    B. Policy-Control
       Non-technical policy restriction. More and more governments have
       defined regulations and acts to restrict spammers from spreading
       spam. Certain punishments by law will be given to spammers.

       Technical policy restriction. SMTP is quite an open protocol for
       email transfer on the Internet. Changing the protocol rules can
       restrict spam message delivery for the technology that spammers
       currently use.

    C. Human-Interactive
       Most commonly, spammers do not have enough time to send bulk email
       manually (for example, more than 500,000 spam email messages), nor
       can they manually spoof parts of the header information to have
       those messages pass the anti-spam filters. Furthermore, to protect
       themselves and make their spam spread more successfully, they use
       thousands of relay hosts to deliver more than a hundred thousand
       individual email messages, which makes it hard for anti-spam
       solutions to stop or trace the messages. Instead, the spammers send
       spam messages through computer programs or scheduled agents that
       reside on relay hosts with weak security protection. Based on this
       characteristic, some researchers designed another method to reduce
       the spam being transferred. The particular solution is the
       Completely Automated Public Turing test to tell Computers and Humans
       Apart (CAPTCHA) [3]. In this method, the recipient MTA sends a
       verification string back to the sender to verify that the sender is
       a real human. The main idea is to increase the cost of sending spam
       so that distributing unsolicited bulk email becomes impractical.
       Figure 2 provides some examples of CAPTCHA.

       Figure 2: Samples of CAPTCHA string

    D. Address-hiding
       Address-hiding is a method that focuses on protecting privacy from
       spam harvesters. Some solutions hide the unique character in the
       email address, the symbol @. Under the RFC 821 and RFC 2821
       standards, the SMTP protocol only understands an email address of
       the form userid@domain; furthermore, the domain name includes at
       least one dot to represent the organization domain. Consequently,
       the at symbol and the dot symbol are the distinguishing characters
       of an email address. One implemented solution transforms the email
       address into a form like “userid at domain dot com” in public
       postings (such as searchable directories and Internet web pages). In
       this format, a scanning program has difficulty distinguishing the
       email address from natural human language, but people who need the
       email address can easily read it aloud and understand the
       transformation.

All of the above methods try to prevent spam from being delivered to
recipients’ mailboxes. However, none of those methods stops spam precisely.
In other words, “False-Positive” and “False-Negative” results are always
the challenge for those solutions. Some of those methods play a “game” with
spammers, passively updating their rules as the spam changes. To defeat
Address-Hiding, spammers can easily change their search criteria to
recognize the email addresses.

III. Our Proposed Method

Our proposal concentrates on keeping email addresses private to their
owners and away from spam harvesters.

Based on this goal, we develop two methods: 1) make email addresses on the
Internet unsearchable by scanning programs; 2) encrypt both sender and
recipient email addresses to prevent hijacking during transfer.

    1) Make email addresses on the Internet unsearchable by scanning
       programs.
       After researching how spammers obtain bulk email addresses, we found
       that spammers usually A. buy millions of email addresses from some
       organizations; B. scan Internet web pages in search of email
       addresses. This method focuses on countering the scanning of web
       pages by email address collectors. Many organizations allow people
       to perform directory searches to find employee information, in order
       to increase their business opportunities or quality of service. The
       email addresses posted on company web pages or personal web pages
       are clear text in HTML files. Thus a scanning program can easily
       find the special characters that are unique to an email address.
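To illustrate how little effort such scanning takes, the following is a
minimal Python sketch of an address harvester. It is an assumption made for
illustration only, not a program taken from any real collector: the names
PLAIN, SPELLED and harvest are hypothetical, and the patterns are
deliberately simplified. The first pattern looks for the at symbol and dot
in clear text; the second shows how the search criteria might be adjusted
to recover the “userid at domain dot com” form.

```python
import re

# Simplified pattern for addresses published as clear text (userid@domain.tld).
PLAIN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# Adjusted pattern for the spelled-out form "userid at domain dot com".
SPELLED = re.compile(
    r"([A-Za-z0-9._%+-]+) at ([A-Za-z0-9-]+(?: dot [A-Za-z0-9-]+)+)"
)


def harvest(text: str) -> list[str]:
    """Return every address-like string found in a page's text."""
    found = PLAIN.findall(text)
    for user, domain in SPELLED.findall(text):
        # Rebuild the normal form from the spelled-out symbols.
        found.append(f"{user}@{domain.replace(' dot ', '.')}")
    return found


if __name__ == "__main__":
    print(harvest("<div>userid@domain.com</div>"))
    print(harvest("Contact: userid at domain dot com"))
```

Both calls recover userid@domain.com, which suggests why spelling out the
symbols offers only limited protection once the harvester's patterns are
updated. The second pattern will also produce false matches on ordinary
sentences containing “at” and “dot”; a real harvester would presumably
filter those further.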
       An address such as username@company.com, for a user in a commercial
       organization, is a typical example. The at symbol and the dot symbol
       then become the keywords for the search engine, and a scanning
       program can easily find millions of email addresses. To prevent
       this, an organization web page or personal web page can use
       graphical pictures to present email addresses, which can be
       understood only by humans and not by scanning programs. Scanning
       programs can hardly dig into the binary files to scan for the email
       addresses, because the graphical pictures in the binary files appear
       in the HTML documents only as file links. For example, a clear-text
       address is a string of characters in an HTML file, as in this
       fragment of source code:

       <br><div>
       username@company.ca</div></br>

       Instead, we make a graphical picture that displays the email
       address. The HTML source code for such a picture is:

       <img
       src="/imgs/emailaddress/username.gif"
       />

       Obviously, there is no at symbol and no associated dot symbol in
       this piece of HTML code.

       Therefore, if all company web sites and personal web pages use this
       method to display email addresses, scanning programs are unable to
       collect them. We thereby cut off this channel through which email
       address collectors obtain email addresses.

    2) Encrypt both sender and recipient email addresses to prevent
       hijacking during transfer.
       Spammers can also hijack email addresses while the email is being
       transferred between MTAs, since the email header information, which
       contains both sender and recipient information, is clear text that
       is readable by SMTP and therefore by spammers as well.

       To secure the email addresses in the email header during delivery,
       the best practice is to encrypt the email addresses before the
       message leaves the sender’s MTA and to have the recipient’s MTA
       decrypt them so that it can understand them. To realize this
       function, we propose an architecture of encryption and decryption on
       MTAs.

       Each email message has a MessageID on each relay host. The MTA
       randomly generates a MessageID and assigns it to each email that
       arrives at the MTA. That means that once the email message arrives
       at a new relay host, the relay host assigns a new MessageID to the
       email message.

       Before sending the email message to the destination MTA, the sender
       relay host sends an SMTP socket with the MessageID to the recipient
       MTA to ask it to return a key generated from the MessageID on the
       recipient server. The sender MTA then uses this key to encrypt all
       the email addresses in the message (SendTo, CopyTo, From, etc.),
       generating encrypted code for the part of each email address before
       the “@” symbol. For example, after encoding, an email address may be
       transformed into something similar to 1Qerg4mF7@gmail.com.

       After the message arrives at the recipient MTA, the host uses the
       original MessageID to decrypt the email addresses in the email
       message and assigns a new MessageID for delivery.

       Figure 3 illustrates the principle of the method.
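The key exchange and address encoding described for this method can be
sketched in Python as follows. This is a minimal illustration under stated
assumptions, not the paper's implementation: the paper does not specify how
the key is derived from the MessageID or which cipher is used, so the
HMAC-based key derivation, the XOR keystream (a toy stand-in for a real
cipher, not secure for production use), the base32 encoding, and the names
RECIPIENT_SECRET, issue_key, encode_address and decode_address are all
hypothetical. The socket exchange itself is reduced to a function call.

```python
import base64
import hashlib
import hmac

# Assumption: a secret known only to the recipient MTA.
RECIPIENT_SECRET = b"recipient-mta-secret"


def issue_key(message_id: str) -> bytes:
    """Recipient MTA: derive a per-message key from the sender's MessageID."""
    return hmac.new(RECIPIENT_SECRET, message_id.encode(), hashlib.sha256).digest()


def _keystream(key: bytes, length: int) -> bytes:
    """Expand the key into `length` bytes by hashing key + counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def encode_address(address: str, key: bytes) -> str:
    """Sender MTA: encode only the part of the address before the '@'."""
    local, domain = address.rsplit("@", 1)
    data = local.encode()
    masked = bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))
    token = base64.b32encode(masked).decode().rstrip("=").lower()
    return f"{token}@{domain}"


def decode_address(address: str, key: bytes) -> str:
    """Recipient MTA: recover the original address with the same key."""
    token, domain = address.rsplit("@", 1)
    padded = token.upper() + "=" * (-len(token) % 8)  # restore base32 padding
    masked = base64.b32decode(padded)
    local = bytes(a ^ b for a, b in zip(masked, _keystream(key, len(masked))))
    return f"{local.decode()}@{domain}"


if __name__ == "__main__":
    key = issue_key("<msg-001@sender.example>")  # returned over the socket
    hidden = encode_address("alice@example.com", key)
    print(hidden)
    print(decode_address(hidden, key))
```

The domain part is left readable so that the message can still be routed,
which matches the paper's example of an encoded address ending in
@gmail.com. In this sketch the recipient re-derives the key from the
original MessageID, so it does not need to store the key between the socket
exchange and the arrival of the message.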
Figure 3: Diagram of encoding email addresses

Because the two methods above are independent, both can be implemented
together to protect email users’ privacy. These methods narrow the channels
through which spam is generated.

IV. Advantages

    1.   Reduces SPAM at the root. These approaches reduce the chance that
         SPAM is generated. Without bulk email addresses, spammers lose the
         targets of their SPAM and find it harder to compose it.

    2.   Compatibility and maintaining the standard. These approaches do
         not change existing protocols and keep all SMTP and ESMTP
         functionality for email transfer. They are compatible with all
         existing spam filters.

    3.   Low cost in network traffic. The first approach does not add any
         network traffic to email delivery; instead, it increases the cost
         for spammers’ scanning programs. The second approach uses a socket
         for the pre-communication needed for the encryption; since the
         socket carries only a MessageID and session information, the cost
         in network traffic is still much lower than that of delivering the
         whole spam message.

    4.   Easy to implement. The first approach is obviously easy to
         implement: organizations need to convert their published email
         addresses to graphical pictures and post them on their Internet
         web sites. The second approach is also comparatively easy to
         implement, because existing encryption/decryption algorithms can
         readily be used for the encryption and decryption. The socket
         technique is a mature technology that is easy to program.

    5.   Gain the initiative in the anti-spam combat. These approaches lead
         the direction of anti-spam regardless of how spam changes. SPAM
         keeps changing its characteristics rapidly to avoid being filtered
         out by anti-spam algorithms, whereas our solution does not depend
         on the changes in SPAM.

V. Conclusion

In this paper, we present two new approaches to enhance email address
privacy for anti-spam. Combining these two approaches gives an effective
and practical anti-spam solution. The first approach converts posted email
addresses to graphical pictures rather than exposing the character strings,
which prevents spammers from using scanning programs to search them out.
The second approach uses an encryption method to secure the email addresses
and avoid hijacking during email transfer. By combining both approaches, we
can narrow the window through which spammers obtain email addresses.

VI. References

[1] Ming-Wei Wu, Yennun Huang, Shyue-Kung Lu, Ing-Yi Chen, Sy-Yen Kuo, “A
Multi-faceted approach towards spam-resistible mail”, Dependable Computing,
2005. Proceedings, 11th Pacific Rim International Symposium, 9 pp., Dec.
2005.
[2] Yanhui Guo, Yaolong Zhang, Jianyi Liu, Cong Wang, “Research on the
Comprehensive Anti-Spam Filter”, Industrial Informatics, 2006 IEEE
International Conference, pp. 1069-1074, Aug. 2006.
[3] Sajad Shirali-Shahreza, Ali Movaghar, “A New Anti-Spam Protocol Using
CAPTCHA”, Networking, Sensing and Control, 2007 IEEE International
Conference, pp. 234-238, April 2007.
