Luís Pedro

Manuel Cabral

Seminar task of Internet Search Techniques and Business Intelligence, October 2007

Abstract
Search engines play a major role on the Internet today. In fact, many companies optimize their websites in order to rank highly in them. Here we present some of the techniques used to achieve this and examples of websites that used them. However, some of these techniques are forbidden by search engines, and websites found to be using them may be punished. The same techniques may be used to create a Google bomb, boosting a website’s ranking in search engines. Finally, we present the search engines’ reaction to this.

Introduction
Search engine optimization (SEO) is the process of improving the volume and quality of traffic to a website by increasing how high it ranks in searches made in search engines. There are several techniques that may be used to achieve this, which can be roughly separated into two categories: white hat, which are based on improving the website’s content and making it easy for search engines to find it; and black hat, which try to exploit the search engines’ algorithms.

Google bombing is an attempt to improve the ranking of a website in a search engine such as Google. The methods used may be white hat (such as getting bloggers to place a certain link on their websites), black hat (such as using link farms) or a combination of both. In the particular case of search engine optimization contests, several webmasters compete to get their websites to rank as high as possible for a certain search string, usually using both white and black hat techniques.

It is important for companies to do search engine optimization, since ranking higher in a certain search can make a big difference in the number of clients that reach their website before those of their competitors. This has led some companies to use black hat optimization techniques, which are very high-risk: they are forbidden by search engines, and websites detected using them are usually punished by being placed lower in the ranking.

Search Engine Optimization Techniques
The techniques used for SEO can be broadly classified into two categories: those that are in accordance with search engines’ guidelines for good design (White Hat SEO) and those that search engines do not approve of (Black Hat SEO).

White hat techniques focus on providing good content for the users and making sure that this content can be accessed by the search engines’ crawlers. Black hat SEO, on the other hand, tries to take advantage of flaws in the search engines’ algorithms in order to get a higher rank in searches, or deceives the search engine into believing that a page’s content is different from what it actually is.

However, some techniques fall into a “gray area” where it is hard to tell whether they are approved by search engines or not, especially since search engines sometimes do not have a set of clear rules for webmasters to follow, but rather guidelines. It can be said that, in order for a technique to be considered completely white hat, it must not only follow the search engines’ guidelines but also ensure that the content provided to the search engine is identical to the content provided to the user.

White hat
Links are the most important part of search engine optimization. If a website has a lot of good links pointing to it, it will rank high in searches. “Good links” means links from reputable and popular websites, preferably in the same area as the linked website, with link text containing keywords related to the linked page.

There are several strategies for obtaining links to a webpage, some of them ethically questionable and some simply black hat. White hat techniques for this include listing your website in web directories such as the Open Directory; asking your suppliers and customers for links to your website; writing articles for other websites in exchange for a link; or asking your employees to link to you on their blogs or personal homepages, although the latter may be considered less ethical.

It is also important for the website to have a good internal linking strategy: links inside the website should contain relevant keywords in their anchor text.

But the most obvious way to get links is to provide good, unique content, since that makes other people want to link to you. The content itself may be “optimized” by creating pages on topics and keywords that are searched for a lot but for which there is little competition from pages on other popular websites. For example, if a website about cars discovers, using a keyword analysis tool, that there have been many searches for “Audi BMW comparison” and that few other popular websites have anything on that topic, it may ask one of its writers to write an article about it in order to rank high for those searches.

It is also important to have good page titles and a good layout, which ensure that the search engines’ crawlers can easily discover what the pages are about and index them correctly.

Black Hat
Some black hat SEO techniques are described below.

Keyword Spamming – one technique used by some websites is to have pages, or sections of pages, with almost no content, consisting only of keywords. This is done to raise the keyword count and density and get a higher ranking in searches for those keywords. However, search engines can nowadays detect this most of the time, so it is less effective.

Invisible page elements – there are several ways to add content to a page that is invisible to visitors but not to search engines. Some examples are: text in the same or a similar color as the background; text contained in “noframes” sections of the HTML code; transparent and/or very small images; and using CSS to place an image or some other element over the element to be hidden. All these techniques may deceive the search engine into believing that the page contains different content than it actually does, and they may also be used for keyword spamming. Another purpose may be to place links, which visitors don’t see, to other webpages in order to increase their popularity.

Doorway and redirect pages – these are low-quality web pages that contain very little content but are instead stuffed with very similar keywords and phrases. They are designed to rank highly in search results but serve no purpose to visitors looking for information. A doorway page will generally have “click here to enter” in the middle of it, while a redirect page will automatically redirect the user.

Cloaking – serving different pages to normal visitors and to search engine crawlers, based on the IP address or user-agent of the visitor. More advanced techniques analyze the behavior of the visitor to decide which page to present. This may be used to deliver an optimized page to search engines, or to deceive them into believing that the page has different content than it does. It has been used by some legitimate companies in order to rank higher in searches, but also by webmasters who want their page to be indexed with the wrong description of its content, to persuade search engine users to visit the

“Bait and Switch” – this technique consists of creating a page highly optimized for search engines and changing it after it is indexed. This only works if the search engine crawler takes a long time between visits to the page, since the new version will be indexed as soon as the next visit occurs.

Several Entrances – creating several domains that lead to the same online store, each one optimized for certain content and all interconnected to create valuable links from related websites. However, there is the danger of creating an “island” that might be detected by search engines. Also, the websites must not be copies of each other, or else they will easily be detected by search engines.

Link Farming – just as there are white hat ways to generate links to your pages, there are also black hat techniques to do this. One of them consists of posting links to a website in blogs, forums and wikis, the latter being especially vulnerable because the changes may not be noticed for a long time. This is especially problematic because there is software that can automatically crawl the web and do this sort of spamming. Another method is creating other webpages with little or no content for the sole purpose of linking to your page. Again, there is software that can automatically generate such pages. These websites frequently use cloaking to avoid being detected by search engines.

Example of an automatically generated page for the keywords “Travel Insurance UK”

Buying expired domains – by monitoring DNS records for domains that will expire soon, it is possible to buy them when they expire and put pages there with links to one’s website, in order to increase its ranking.

Destructive methods – another way of improving your rank in a search is to decrease the ranking of the pages ranked above yours, which can be done by tricking the search engine into punishing those pages. This can be achieved, for example, by posting a comment with a lot of keywords in a blog, which the search engine might detect as an attempt by the blog owner to do keyword spamming.

Another method takes advantage of the fact that search engines such as Google penalize websites with duplicate content, and since the “thief” is usually the website with the lower page rank, that is the website that gets penalized. An attacker can therefore make a copy of some website and use techniques like cloaking to increase the copy’s page rank to a point where it is higher than that of the original page, so that the original page is the one that gets penalized.

Search Engine Optimization contests
In SEO contests a search string is selected (usually with two or more words) and contestants must get their pages to rank as high as possible for that search string in a certain search engine. Besides being an opportunity to showcase new SEO techniques, one of the goals of these contests is to better understand the inner workings of search engines. While some webmasters use white hat techniques, like providing good content, many others resort to spamming and other black hat techniques, which makes these contests a good way to find out how vulnerable search engines are to them.

Usually the search string consists of a nonexistent phrase of silly words. This prevents existing websites from getting a “head start” and keeps normal Internet users from being bombarded with irrelevant results in their “normal” searches.

Nigritude Ultramarine Contest
The contest was announced on May 7th, 2004, and two prizes were awarded for the top position in a Google search for the string “nigritude ultramarine”: one at 9am GMT on June 7th, 2004, and another at the close of the contest, at 9am GMT on July 7th, 2004. A search in Google for this string before the contest showed no results.

The rules of the contest were very liberal, which encouraged many of the over 200 contestants to use black hat and aggressive techniques to boost their pages’ ranking. A lot of weblogs and public wikis were hit by the contest as contestants created links to their websites from them.

June 7th Results
The table below summarizes the top 10 results on June 7th. The meaning of the columns is as follows:

PR – the page’s Google PageRank, rounded to the nearest integer.
ODP? – whether or not the page has a link from the Open Directory Project, a large directory that lists a lot of websites.
Backlinks – the number of pages linking to this website, as displayed by a Google “link:” search. Google only displays a subset of all backlinks in this search.
All links – the total number of pages linking to this website, obtained by a Google “@:” search.
Pages – the number of pages on the website.
#   Contestant   PR  ODP?  Backlinks  All links  Pages   Techniques used
1   merkey       6   Yes   11         2,870      2,770   Google bombing, keyword spamming, high
                                                         page count. Website has real content mixed
                                                         with other, unrelated content.
2   Phillip      6   Yes   663        10,100     5,050   Google bombing, wiki sandbox spamming,
                                                         high page count. This website was a real blog
                                                         with real content mostly unrelated to the
                                                         search phrase.
3   T.J.         5   No    96         1,590      114     Keyword spamming and page count. No real
                                                         content.
4   merkey       0   No    40         211        23      Keyword spamming (also acts as link farm
                                                         supporting website #1). No real content.
5   JohnScott    0   No    1,660      176,000    82,500  Google bombing, link farms, more Google
                                                         bombing, and page count. Real content: a
                                                         discussion forum about the contest.
6   rubenxela    0   No    28         285        7       Google bombing. No real content.
7   NC           0   No    0          344        198     -
8   NC           0   No    2          1,210      260     -
9   mrunderhill  0   No    2          112        3       Google bombing, keyword spamming. No real
                                                         content.
10  NC           0   No    0          76         6       -
NC: websites that were not contestants, since they did not display the required contest banner.

Almost none of the top 10 websites on June 7th displayed any real content related to the contest; instead, they were just packed with keywords. Their high ranking can be explained by their use of black hat techniques like link farming, keyword spamming and wiki spamming.

The first two positions belonged to two of the three websites which had a link in the Open Directory, showing how important “good” links can be. The Open Directory refused all requests to list websites related to the contest, but those two websites were listed there before the contest for other reasons.

A lot of the websites show a large number of inbound links, which contribute to their high ranking. Many also have a high number of pages, since Google shows some bias towards large websites, and so contestants tried to create websites containing a large number of pages.
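Why inbound links, and link farms in particular, pay off can be illustrated with a simplified PageRank computation. The sketch below is the textbook power-iteration form of PageRank with a damping factor of 0.85, the value commonly used in descriptions of the algorithm; it is not Google’s production ranking, which uses many other signals. The page names and the small web graph are hypothetical. It compares a target page with a competitor before and after adding a farm of ten empty pages whose only purpose is to link to the target.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank computed by power iteration.

    links maps each page to the list of pages it links to. A page with
    no outgoing links spreads its rank evenly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the same small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            recipients = targets if targets else pages
            # A page passes its rank, split evenly, to the pages it links to.
            share = damping * rank[page] / len(recipients)
            for target in recipients:
                new_rank[target] += share
        rank = new_rank
    return rank


# A tiny hypothetical web with two pages competing for the same search.
web = {
    "portal": ["target", "competitor"],
    "blog": ["competitor"],
    "forum": ["competitor", "portal"],
    "target": ["portal"],
    "competitor": ["portal"],
}
before = pagerank(web)

# The same web plus a hypothetical link farm: ten empty pages linking to "target".
farmed = dict(web)
for i in range(10):
    farmed[f"farm{i}"] = ["target"]
after = pagerank(farmed)

print(before["target"] < before["competitor"])  # True: target starts behind
print(after["target"] > after["competitor"])    # True: the farm pushes it ahead
```

Each farm page only has the minimum “random jump” rank, since nothing links to it, but because every one of them passes all of that rank to the target, together they outweigh the competitor’s links from ordinary sites.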

July 7th Results
#   Contestant   PR  ODP?  Backlinks  All links  Pages   Techniques used
1   anildash     3   Yes   693        2,620      2,570   Blog-based Google bombing, page count
                                                         inflation. Website is a real blog with content
                                                         mostly unrelated to the contest.
2   T.J.         4   Two   2          52         1       Mirror for T.J.’s website. No real
                                                         content, just a meta-tag redirect. High
                                                         placement is probably due to multiple ODP
                                                         links.
3   T.J.         6   No    538        304        124     Google bombing, keyword spamming and
                                                         mirroring. No real content; mostly meaningless
                                                         text interspersed with keywords.
4   philipp      6   Yes   1,320      1,720      5,030   Google bombing, wiki sandbox spamming, page
                                                         count inflation. Website is a real blog with real
                                                         content mostly unrelated to the search phrase.
5   JohnScott    7   No    7,930      120,000    83,700  Google bombing, link farms, attempted
                                                         mirroring via redirects, and page count
                                                         inflation. Website is a real discussion forum
                                                         with real content about the contest.
6   srainwater   6   No    68         175        6       Content as SEO. Real content consisting of
                                                         frequently asked questions about the contest.
7   rubenxela    6   No    299        380        12      Google bombing. No real content.
8   merkey       6   Yes   246        4,290      2,770   Google bombing, keyword spamming, high
                                                         page count. Website appears to be a message
                                                         forum consisting of real content about the
                                                         contest mixed with other, unrelated content.
9   T2DMan       6   No    701        1,360      409     Google bombing. Website is a real blog with
                                                         content related to the contest. This website was
                                                         a victim of a black hat SEO cloaking attack
                                                         during the contest.
10  n.u.seo      7   No    7,080      9,960      12,800  Google bombing, keyword spamming. No real
                                                         content.
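Several entries above rely on redirects; T.J.’s second entry, for instance, had no real content, just a meta-tag redirect, which is the same idea as the redirect pages described in the black hat section. A minimal sketch of how such a page could be flagged, using only Python’s standard `html.parser` module, is shown below. The class and function names and the zero-second default threshold are hypothetical choices for illustration, not how any search engine actually detects redirects, and the sketch only looks at meta refresh tags: JavaScript redirects, like the one used by BMW, would need a different check.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects (delay, url) pairs from <meta http-equiv="refresh"> tags."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").strip().lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/".
        delay_text, _, rest = attributes.get("content", "").partition(";")
        try:
            delay = float(delay_text.strip())
        except ValueError:
            return  # malformed delay: ignore this tag
        rest = rest.strip()
        url = rest[4:].strip().strip("'\"") if rest.lower().startswith("url=") else ""
        self.refreshes.append((delay, url))


def instant_redirect_target(html, max_delay=0.0):
    """Return the URL of a meta refresh firing within max_delay seconds, or None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    for delay, url in finder.refreshes:
        if delay <= max_delay and url:
            return url
    return None


# A hypothetical mirror page of the kind described in the table.
mirror_page = """<html><head>
<meta http-equiv="Refresh" content="0; URL=http://www.example.com/">
<title>nigritude ultramarine</title>
</head><body></body></html>"""

print(instant_redirect_target(mirror_page))  # http://www.example.com/
```

A page whose refresh fires after a long delay is not flagged by default, since such a page can still show its own content to the visitor first; `max_delay` makes that cut-off explicit.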

These results show some improvement over those of June 7th, since more of the pages now have real content. This hints that it takes some time for the Google algorithm to work; probably two or three more months would be required for the results to be optimal.

The first website seems to have won with simple blog-based Google bombing, that is, having a lot of people with blogs link to it, and no black hat techniques. Its Open Directory link must also have helped it reach #1. The same goes for website #2, which does not seem to have any reason to be placed there except for its two Open Directory links.

It is known that Google generally tries to find and penalize websites that use black hat techniques, but the company refused to comment on whether it had treated this contest in any special way. In addition to black hat optimization techniques being used by many competitors, many white hat websites were hit by attacks, as black hat competitors filed false spam reports with Google against them. The #9 competitor was also the victim of a destructive method: a black hat contestant copied his website and used other black hat methods to improve the copy’s PageRank, thus causing the original website to be punished. This made the website drop from #3 to #20, showing the damage that can be done by these kinds of attacks.

Google Bombing
Google bombing may be considered one of the most effective ways to influence opinion on the Internet nowadays. One of the reasons is that it is not difficult to start a Google bomb, since it only uses basic SEO techniques. We can consider Google bombing more a social phenomenon that started as a “prank” and evolved into a weapon for making political and religious statements. It is wrong to think that Google bombing is used only for this kind of subject, since many companies also use this technique to manipulate public opinion about their company and to try to get their business’s website to the top of the search engines’ results.

Usually a bomb starts when someone creates a link whose text is not related to the page being linked to, and other people decide to put that same link on their webpages, influencing the page ranking algorithms of the search engines. There is no need to be an SEO expert, since the only technique used is to spread as many links as possible across the web. People generally put the links on their webpages because the link text is funny and amuses Internet users.

The practice is called Google bombing because Google is the most influential search engine, but this doesn’t mean that only Google is affected by these page rank manipulations; in fact, most Google bombs also affect Yahoo and MSN searches, because they also use page ranking algorithms that are in some aspects similar to the one used by Google.

The first known Google bomb is the one that linked to Microsoft’s site with the text “more evil than Satan himself”. This is clearly a good example of what may be considered a “just for fun” Google bomb on a subject that is very popular among Internet users. The most famous Google bomb is the one that involved the President of the USA: users linked the biography of George W. Bush with the words “miserable failure”, and as a result the top result for a search with this string was his biography. Since this bomb, the world has been paying more attention to this kind of phenomenon, and similar bombs have appeared, such as the one that linked the word “liar” to Tony Blair’s biography. These bombs clearly lie on the frontier between fun and political opinion, and were only used to express a widely held opinion about the behavior of those politicians. Another kind of Google bombing is associated with religious organizations such as Scientology, where links tied the search term “scientology” to a website containing information that is mostly critical of this religion.

Bombing is also used to gain a commercial advantage over competitors. One good example involves the company Quixtar, which has been accused by its critics of using a large network of websites to push websites critical of the company to lower positions in the search rankings. As a result, a blogger exposed this case on his own blog and encouraged users to help Google bomb the company.

One important thing to notice is that these kinds of bombs generally only work with keywords that are not very common, which makes them more likely to succeed. Most of the examples given don’t work nowadays, because Google has been paying more attention to this kind of practice and its algorithm has evolved to try to prevent these types of artificial page rank manipulation.

Most companies don’t do their own SEO and rely on external companies to do it. However, many of these companies use questionable techniques which sometimes go against search engines’ guidelines.

Sometimes, Google and other search engines may “blacklist” websites that they find to be infringing their guidelines, causing them to move down in the ranking or even be removed entirely from the listings. So, when an SEO company is discovered to be using black hat methods, many of its clients’ webpages are heavily punished by the search engines.

BMW
One of the most famous examples is BMW, the German vehicle company, whose site was blacklisted and removed from the ranking for a short time for using black hat techniques. The site was included in the listings again after the pages were changed.

BMW’s site used cloaking to deceive the search engines’ crawlers, providing them with a page full of relevant keywords, in other words keyword spamming. On the other hand, when the website was visited by a user’s browser, a JavaScript redirect would be triggered immediately and a more user-friendly page, which didn’t contain as many keywords as the previous one, would be displayed.
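The BMW case combines two techniques described earlier: cloaking and keyword spamming. One simple way to reason about it is to compare the keyword density of the text served to a crawler with that of the text a visitor actually sees (for BMW, the page reached after the JavaScript redirect). The sketch below assumes both versions have already been fetched and reduced to plain text; the sample texts, the keyword, the function names and the 3x ratio threshold are all hypothetical, and this is not the method used by any search engine.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_density(text, keyword):
    """Occurrences of keyword times its length in words, divided by the word count."""
    words = tokenize(text)
    key = tokenize(keyword)
    if not words or not key:
        return 0.0
    k = len(key)
    hits = sum(1 for i in range(len(words) - k + 1) if words[i:i + k] == key)
    return hits * k / len(words)


def looks_cloaked(crawler_text, visitor_text, keyword, ratio=3.0):
    """Flag a page whose crawler copy is far more keyword-dense than the visitor copy."""
    crawler = keyword_density(crawler_text, keyword)
    visitor = keyword_density(visitor_text, keyword)
    if visitor == 0.0:
        return crawler > 0.0
    return crawler / visitor >= ratio


# Hypothetical texts: what a crawler is served vs. what a visitor ends up seeing.
crawler_copy = "used cars used cars BMW used cars best used cars used cars deals"
visitor_copy = "Discover the new BMW range and find approved used cars at your local dealer"

print(round(keyword_density(crawler_copy, "used cars"), 3))  # 0.769
print(round(keyword_density(visitor_copy, "used cars"), 3))  # 0.143
print(looks_cloaked(crawler_copy, visitor_copy, "used cars"))  # True
```

Obtaining the two versions is the hard part in practice: as described in the cloaking section, a cloaking site may decide what to serve based on the visitor’s IP address or behavior rather than only its user-agent, so simply requesting the page with a different user-agent string may not reveal the crawler version.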
Yahoo Autos                                               Google’s first reaction to Google Bombing when the
problem was exposed was this public declaration on
their official blog:

“We don't condone the practice of googlebombing, or
any other action that seeks to affect the integrity of
our search results, but we're also reluctant to alter
our results by hand in order to prevent such items
from showing up. Pranks like this may be distracting
to some, but they don't affect the overall quality of our
search service, whose objectivity, as always, remains
the core of our mission.”

However, some people complained that this policy
suggested Google shared the view of the targeted
topics that the bombs expressed. Google then felt it
necessary to publicly announce changes to its
algorithm:

“By improving our analysis of the link structure of the
web, Google has begun minimizing the impact of
many Googlebombs. Now we will typically return
commentary, discussions, and articles about the
Googlebombs instead.”

There is no clear technical information about what
changes were made or how they minimize the effects
of Google bombing. However, a post on the Google
Answers web forum states that, to stop Google
bombing, Google now seems to compare a link's
text with the text of the linked website. If the link's
text doesn't appear on the linked site, the link is
ignored or degraded.
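The heuristic reported on the Google Answers forum can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Google's actual algorithm. The function names `words` and `link_weight`, the word-level subset test, and the degradation factor of 0.1 are all hypothetical choices made only for this example.

```python
import re

# Hypothetical weight for a link whose anchor text is not supported by the
# target page (an assumption for illustration, not a published Google value).
DEGRADED_WEIGHT = 0.1


def words(text: str) -> set[str]:
    """Lower-case the text and return the set of alphanumeric words in it."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def link_weight(anchor_text: str, target_page_text: str) -> float:
    """Return how much a single link counts toward the target page's ranking.

    Sketch of the reported heuristic: if the words of the link's text do not
    all appear on the linked page, the link is degraded instead of being
    counted in full.
    """
    anchor = words(anchor_text)
    if not anchor:
        return 1.0  # no anchor text to check, so count the link normally
    if anchor <= words(target_page_text):
        return 1.0  # every anchor word occurs on the target page
    return DEGRADED_WEIGHT


# A Google-bomb style link: the anchor phrase never appears on the target.
print(link_weight("worst service ever", "Welcome to Example Corp. We sell widgets."))
# An ordinary descriptive link whose words do appear on the target.
print(link_weight("cheap widgets", "Example Corp sells cheap widgets"))
```

A plain word-level subset test is deliberately crude. It shows why a bomb built from a phrase absent from the target page would lose its effect. It also shows a weakness: a target page that happens to contain the bombing phrase would still receive full credit.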

Yahoo Auto also used cloaking to present slightly
different content to users and crawlers. The content
provided to the crawlers was more keyword-rich than
the content for normal visitors. However, this page
was quickly fixed and was not blacklisted. It is
interesting to note that Yahoo also owns a search
engine that forbids these techniques.


Google’s Reactions
Google bombing may seem like an inoffensive
practice that doesn’t play a big role in the Internet
world. In fact, it costs Google (and other companies)
money and skews their search results. This affects
the overall quality of their search engine and, to
some extent, the general opinion that people have
about the company.

Regarding SEO, Google has very strict policies about
which techniques are “fair” and which are “dirty”. It
does not hesitate to delete from its index a page that
tries to manipulate its PageRank by using black hat
techniques (as in the BMW example). Besides the
guidelines in its webmaster help center, Google also
publishes an explicit list of practices that will lead to
exclusion from the Google index. These are some of
the general guidelines that Google recommends
following:

    •    Make pages for users, not for search engines.
         Don't deceive your users or present different
         content to search engines than you display to
         users, which is commonly referred to as
         "cloaking."
    •   Avoid tricks intended to improve search
        engine rankings. A good rule of thumb is
        whether you'd feel comfortable explaining
        what you've done to a website that competes
        with you. Another useful test is to ask, "Does
        this help my users? Would I do this if search
        engines didn't exist?"
    •   Don't participate in link schemes designed to
        increase your site's ranking or PageRank. In
        particular, avoid links to web spammers or
        "bad neighborhoods" on the web, as your own
        ranking may be affected adversely by those
        links.
    •   Don't use unauthorized computer programs
        to submit pages, check rankings, etc. Such
        programs consume computing resources and
        violate our Terms of Service. Google does not
        recommend the use of products such as
        WebPosition Gold™ that send automatic or
        programmatic queries to Google.

These are the specific techniques that sites must
avoid in order not to be excluded:

    •   Avoid hidden text or hidden links.
    •   Don't use cloaking or sneaky redirects.
    •   Don't send automated queries to Google.
    •   Don't load pages with irrelevant keywords.
    •   Don't create multiple pages, subdomains, or
        domains with substantially duplicate content.
    •   Don't create pages that install viruses,
        trojans, or other badware.
    •   Avoid "doorway" pages created just for
        search engines, or other "cookie cutter"
        approaches such as affiliate programs with
        little or no original content.
    •   If your site participates in an affiliate
        program, make sure that your site adds
        value. Provide unique and relevant content
        that gives users a reason to visit your site
        first.

Google also encourages users to report webpages
that don’t follow these criteria, and provides a
specific site for this purpose. If a webpage has been
removed from the index for not following the
guidelines, its webmaster can rebuild the page
correctly. The webmaster can then fill in and submit
a Reconsideration Request to get the page back into
the Google index.

Conclusions
SEO is a very important business nowadays, and the
competition for certain keywords and search strings
is fierce. Because of that, many companies resort to
black hat techniques. These are not approved by
search engines and, one could argue, give an unfair
advantage over competitors who do not use them.
However, using them is dangerous, because it can
result in a webpage being completely removed from
the search engines’ rankings.

Google bombing is an effective way of manipulating
both the search results on the Internet and the
general opinion about a subject. Proof that the world
has been paying attention to this subject is the fact
that the term Google bombing now appears in the
Oxford dictionary. One can use this technique to
make pranks, but the most visible cases are related
to hot topics like politics and religion. Google
bombing can also be used to obtain a commercial
advantage, by getting people to believe that your
competitors provide a worse service than your
company. This practice influences not only the
opinion of Internet users but also the overall quality
of a search engine, and Google has been forced to
change its algorithms due to external pressure.
However, Google rarely changes rankings manually
in order to “deactivate” a Google bomb, probably
because the techniques used are mostly white hat.
One of the most widely used techniques is placing
links in blogs and encouraging others to do the
same. This technique has proven very effective: the
winner of the “Nigritude Ultramarine” SEO contest
relied almost exclusively on it to reach the #1
position.

It is debatable whether Google bombs are desirable,
but the fact is that search engines are still not very
good at analyzing a page to find out whether it has
relevant content. Instead, they mostly rely on what
other people say about it, which is almost the same
as saying “the links that point to that page”. Because
they do not deal with Google bombs manually, search
engines seem not to care much about whether a page
actually has relevant content. Instead, they try to find
out whether what people say about it is true, or
whether someone is trying to manipulate the results
by pretending to be many people through black hat
techniques (such as link farming). However, attempts
to deceive the search engine, such as presenting
different pages to the crawlers and to the visitors,
are also punished.
