					      SEARCH ENGINE OPTIMIZATION AND MARKETING

                                    by

                             Binoy Varghese

                A thesis submitted in partial fulfillment of
                   the requirements for the degree of


                 Master of Science in Computer Science


                    California State University, Chico


                                 Fall 2005



Approved by ___________________________________________________
                      Chairperson of Supervisory Committee

             __________________________________________________
             __________________________________________________
             __________________________________________________

Program Authorized
to Offer Degree _________________________________________________


Date __________________________________________________________
               CALIFORNIA STATE UNIVERSITY, CHICO

                                       ABSTRACT

          SEARCH ENGINE OPTIMIZATION AND MARKETING

                                  by Binoy Varghese

       Chairperson of the Supervisory Committee: Professor Anne Keuneke
                        Department of Computer Science


   Search engines have become the most important tool powering internet
marketing. The freedom that search engines offer users to find vast amounts
of information at the click of a button has made them the connecting point for
internet surfers. The visibility of websites is directly related to their rankings in
search engine results: higher rankings bring highly targeted visitors, resulting in
increased sales. Search engine optimization and search engine marketing are
powerful strategies that are constantly evolving with new developments in search
engine technology. They promise improved visibility in search engine results.

   Search engine optimization (SEO) is concerned with the construction of
websites: the technical aspects of a website that search engines favor. This is
generally a one-time process incorporated during the development phase of a
website. SEO can improve ranking within organic search results over a period
of time.

   Search engine marketing (SEM), on the other hand, is an ongoing process.
Inorganic contextual search result rankings can be purchased from leading search
engine companies. There are recurring costs involved with SEM and no
long-term benefits.

   Both SEO and SEM are key strategies for any business to consumer (B2C)
website to optimize profits and gain new leads.
                                            TABLE OF CONTENTS



List of Figures ..................................................................................................................... iii
List of Tables....................................................................................................................... iv
Chapter I: Introduction ...................................................................................................... 1
    Significance of search engines ................................................................................... 1
    Statement of Problem ................................................................................................. 2
    Purpose of Study .......................................................................................................... 2
    Research Conducted .................................................................................................... 3
    Limitations of Study .................................................................................................... 4
    Internet Marketing Terms .......................................................................................... 4
    Internet advertising pricing models .......................................................................... 5
    Online advertisement media formats ....................................................................... 9
    Roadmap to forthcoming chapters ......................................................................... 11
Chapter II: Search Engines.............................................................................................. 12
    Crawler based search engines .................................................................................. 12
    Web directories ........................................................................................................... 13
    Search engine relationship chart .............................................................................. 13
    Organic and inorganic searches ............................................................................... 15
    Search engine user trends ......................................................................................... 16
    Page Rank and Trust Rank Algorithms ................................................................. 17
    Overlap analysis .......................................................................................................... 20
Chapter III: Search Engine Optimization .................................................................... 23
    Search engine optimization ....................................................................................... 23
    Factors affecting SEO ............................................................................................... 23
    Key terms..................................................................................................................... 25
    Title tag ........................................................................................................................ 26
    Meta tags ...................................................................................................................... 27
    Body text ...................................................................................................................... 28
    Menu bar ...................................................................................................................... 29
    Keyword density analysis .......................................................................................... 30
    HTML code validation.............................................................................................. 30
    Absolute vs relative URL.......................................................................................... 30
    Tables in HTML code ............................................................................................... 31
    Sitemap ......................................................................................................................... 31
    Inbound links .............................................................................................................. 33
    Outbound links........................................................................................................... 35
    Reciprocal linking and link building ....................................................................... 35
    Search engine friendly URL ..................................................................................... 37
    Domain name ............................................................................................................. 39


   404 Error page ............................................................................................................ 42
   301 Redirection........................................................................................................... 42
   Robots.txt meta file.................................................................................................... 43
   Website submission ................................................................................................... 43
   Visitor analysis ............................................................................................................ 45
   Frameset....................................................................................................................... 45
   SEO Roadmap............................................................................................................ 46
Chapter IV: Search Engine Spam .................................................................................. 47
   Search engine spam.................................................................................................... 47
   Consequences of spamming .................................................................................... 47
   Spamming techniques ............................................................................................... 48
   Hidden text .................................................................................................................. 48
   IP Cloaking .................................................................................................................. 49
   Doorway pages ........................................................................................................... 49
   Pagejacking .................................................................................................................. 50
   Domain duplication ................................................................................................... 50
   Excessive popups ....................................................................................................... 50
   Inflating link popularity............................................................................................. 50
   ALT stuffing ............................................................................................................... 51
   Link farming ................................................................................................................ 51
   FFA ............................................................................................................................... 51
   Mousetrapping ............................................................................................................ 51
Chapter V: Search Engine Marketing ............................................................................ 53
   Search engine marketing ........................................................................................... 53
   Cost per visitor model ............................................................................................... 53
   Malpractices in SEM Industry ................................................................................. 53
   Google and Yahoo Search Marketing .................................................................... 55
Chapter VI: Summary, Conclusions and Recommendations ................................... 57
   Introduction ................................................................................................................ 57
   Summary ...................................................................................................................... 57
   Conclusions ................................................................................................................. 58
   Recommendations ..................................................................................................... 60
   Suggestions and Future Research............................................................................ 60
References........................................................................................................................... 62




                                               LIST OF FIGURES



Number                                                                                                                        Page


Figure 1: Internet advertising pricing models .............................................................. 6
Figure 2: State diagram - Internet Marketing .............................................................. 7
Figure 3: Media formats ................................................................................................... 9
Figure 4: Search engine relationship chart [4] ............................................................ 14
Figure 5: Organic and Inorganic search results - Google snapshot....................... 15
Figure 6: Percent share of searches conducted by U.S. surfers in July
        2005 [5] ............................................................................................................... 16
Figure 7: Percent share of searches – Trend [5] ........................................................ 17
Figure 8: Overlap analysis of imageblowout.com - Googlerankings.com
        snapshot .............................................................................................................. 21
Figure 9: Relationship between search engine results and Title tag &
        Description meta tag ........................................................................................ 26
Figure 10: Search results for keyword - betterbody - Google snapshot ............... 39
Figure 11: Keyword density analysis of betterbody.de............................................. 41
Figure 12: Search engine optimization roadmap ....................................................... 46




                                          LIST OF TABLES



Number                                                                                                      Page


Table 1: Rates of various pricing models ...................................................................... 8
Table 2: In-Page advertisement formats ..................................................................... 11




                                  Chapter 1


                               INTRODUCTION


Significance of search engines

   The significance of search engines on the internet is analogous to that of
operating systems for computers. Search engine technology began with the
creation of ARCHIE [1] by Alan Emtage in 1990. Today, search engines have
evolved into the most crucial business application on the internet due to their
enormous potential to provide highly targeted consumer traffic to business to
consumer (B2C) websites. From a marketing perspective, it is more effective for
an online business to be listed in the first three pages of search engine results
than to use any other form of online marketing.

   A few observations that highlight the relevance of search engines:

    1. 84.8% of internet users find websites through search engines [2]

    2. 81.7% of internet users read only the first three pages of search results [3]

    3. 87.2% of internet users use their favorite search engines to launch their
        queries [3]

   These observations are based on surveys conducted by the Graphics,
Visualization and Usability Center at Georgia Institute of Technology
(www.gvu.gatech.edu), iProspect (www.iprospect.com), WebSurveyor
(www.websurveyor.com), Stratagem Research (www.strategeminc.com) and
Survey Sampling International (www.surveysampling.com). They emphasize key
aspects of the behavior of internet users with regard to search engines:



    1. Dependence on search engines

    2. Trust in search engine results

    3. Loyalty to search engines

   Search engines have thus become a gateway to gain targeted visitors so much
so that search engine optimization and search engine marketing have become the
focal point of internet marketing.

Statement of Problem

   As more web pages are appended to the internet, there is a constant need
for any B2C merchant to stand out among competitors in order to attract more
consumers. Since an online merchant faces increasing competition every day, the
most effective strategy to maximize exposure is to achieve top rankings in search
engine results. Search engines have a psychological advantage over any other
channel of online advertising: the search engine user is actively seeking desirable
information, and there is no better time to present a product or service to a
search engine user than when one is looking for it. This psychological factor
gives search engine marketing a high return on investment (ROI). This study
attempts to identify various techniques to improve the ranking of a website in
search engine results.

Purpose of Study

   Search engine relevancy algorithms are proprietary. Because these algorithms
are not public, the process of achieving higher rankings in organic search engine
results cannot be determined using a mathematical or mechanical model.
Different search engines use different factors to determine the relevancy of a
webpage with respect to a search query. Search engine technology is still
evolving; hence search engine companies are constantly reclassifying various
techniques that improve the ranking of a website in search engine results as
spam. Though major search engine companies do specify what they consider
spam, many of the minute technical details cannot be ascertained by reviewing
these specifications. A website identified as spamming is penalized: it may be
assigned a lower ranking or even removed from the index. The purpose of this
study is to identify techniques which may improve the ranking of a website in
organic search engine results, to specify spamming strategies which should be
avoided, and to introduce search engine marketing, paying close attention to
publisher and competitor malpractices.

Research Conducted

   Search engine optimization strategies are based on assumptions which are
verified by trial and error. No literature on the subject can claim that following
a set model will result in a certain ranking in the results of a specific search
engine. The research work for this study was conducted by reviewing relevant
literature and applying these techniques to derive a refined process which can
be incorporated into the software development stages of a website project.

   The research conducted enabled the author to gain more exposure to the
relevancy   algorithms     of   Google       (www.google.com),     Yahoo!     Search
(search.yahoo.com) and MSN Search (search.msn.com) [Chapter II: Search
Engines]. A search engine optimization process has been outlined by the author
in Chapter III. This process classifies different optimization schemes into
techniques that may be applied to individual web pages and to the entire website.
Search engine spamming strategies are listed in Chapter IV. These strategies are
identified by major search engines as spam and should be avoided to prevent
penalties. Chapter V focuses on bid jamming and click fraud, which have become
prevalent in the search engine marketing industry, to increase awareness of these
malpractices.

Limitations of Study

   This study serves as a guideline to search engine optimization and search
engine marketing. The inferences derived in Chapters II through VI are based on
independent research and data gathering. Due to the proprietary nature of search
engine relevancy algorithms, the process outlined may not incorporate all possible
optimization and spamming techniques. This manuscript specifies guidelines
pertaining to these techniques which are valid as of October 2005. As search
engine technology evolves, many of the specified optimization techniques may be
reclassified as spam by search engine companies. The study should be used to
enhance knowledge of the subject so that the reader can identify and concentrate
on specific optimization techniques.

Internet Marketing Terms

   Before one considers the potential of search engines as a marketing tool, one
should become acquainted with internet marketing. Internet marketing, like any
other form of marketing, is based on one basic principle: make a sale to the
consumer.

   The key terms associated with internet marketing are:

   1. Visitor: Internet users who visit a website

   2. Targeted visitor: Visitors to a website who are interested in what the site
       has to offer



   3. Publisher: Any individual/organization that offers advertising space on
        its website

   4. Advertiser: Any individual/organization interested in buying advertising
       space on websites

   5. Affiliate: A website that provides internet traffic to another website in
       return for a commission in sales

   6. URL: Uniform resource locator is the global address of documents and
       other resources located on the internet

   7. CTR: Click through rate is the number of clicks divided by the total
       number of impressions of the advertisement over a period of time

   8. Above the fold: The portion of the webpage that is viewable in a browser
       without scrolling

   9. Affinity Marketing: Marketing strategies based on established buying
       patterns

   10. Click Tracking: The process of tracking and auditing visitors referred by
        the publisher's website to the advertiser's website. This is done by setting
        a cookie on the visitor's browser that records the publisher, the link and
        the payment rates
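The CTR definition above (term 7) can be illustrated with a short, self-contained Python sketch. The click and impression figures below are hypothetical, chosen only to show the calculation.

```python
def click_through_rate(clicks, impressions):
    """CTR: clicks divided by total ad impressions over a period of time."""
    if impressions == 0:
        return 0.0  # no impressions served, so no measurable CTR
    return clicks / impressions

# Hypothetical figures: 112 clicks on 22,400 ad impressions
ctr = click_through_rate(112, 22400)
print(f"CTR: {ctr:.2%}")  # prints "CTR: 0.50%"
```

A CTR of around 0.5% is in line with the banner-style campaigns discussed later in this chapter.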

Internet advertising pricing models

   At the lowest level, internet marketing can be classified into three pricing
models: CPM (cost per mille), CPC (cost per click), and CPA (cost per
acquisition). These models are priced according to behaviors exhibited by
visitors on the publisher's website.

    1. CPM or cost per mille is the cost per 1000 impressions of the
        advertisement displayed on the publisher's website

    2. CPC or cost per click is the cost paid by the advertiser when a visitor to
        the publisher's website clicks on the advertisement and arrives at the
        advertiser's website, irrespective of the impressions displayed. The
        rationale is that the visitor is interested in visiting the advertiser's
        website, hence the act of clicking on the advertisement. These visitors
        are targeted visitors




                       Figure 1: Internet advertising pricing models




3. CPA or cost per acquisition is generally a commission paid by the
   advertiser when a targeted visitor performs an action the advertiser
   desires on the advertiser's website. This is a direct marketing
   model which takes different forms, the most popular of which are:

   i.   CPL or cost per lead pays a flat fee for obtaining a consumer lead,
   such as signing up for a newsletter program

   ii. CPS or cost per sale pays a commission based on a transaction made
   by the consumer, such as a purchase




                Figure 2: State diagram - Internet Marketing




   A comparison between the three pricing models from the advertiser's
viewpoint leads to the conclusion that CPM is the least effective and CPA is the
most effective. The reason is that the occurrence of a successful transaction on
the advertiser's website is independent of the number of times a URL is displayed
on the publisher's website. To a publisher, CPM may seem to offer the highest
return on advertising space. On the contrary, CPM pays less than both CPC and
CPA over a significant period of time.




                     Table 1: Rates of various pricing models



   Table 1 illustrates the rates paid by various advertisers and real-time statistics
of imageblowout.com. In May 2005, imageblowout.com displayed 22,522
impressions of Google AdSense content (www.google.com/adsense). With a
payout rate of US$ 0.09 per 1000 impressions, the earnings under CPM would
have been (22,522 / 1000) * 0.09 ≈ US$ 2.03. Instead, imageblowout.com made
US$ 7.82 in CPC earnings with a click through rate of 0.5% (112 clicks).
Assuming that imageblowout.com participated in the allposters.com CPS
program with a 30% commission and referred 5 successful transactions out of
112 referrals, each of US$ 8.99, the CPS earnings would have been
(8.99 * 5) * 0.3 ≈ US$ 13.49. The assumed referral rate is (5 / 22,522) * 100 ≈
0.02%. These statistics make a practical comparison between CPM, CPC and
CPA programs and indicate that CPA has a tendency to generate higher profits
for the publisher.
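The arithmetic above can be reproduced with a short Python sketch. The impression, click and price figures mirror those quoted in the text; the 30% commission rate and the allposters.com participation are assumptions carried over from the original example.

```python
def cpm_earnings(impressions, rate_per_1000):
    """CPM earnings: flat rate per 1000 ad impressions displayed."""
    return impressions / 1000 * rate_per_1000

def cps_earnings(sales, sale_price, commission):
    """CPS earnings: commission on each completed sale."""
    return sales * sale_price * commission

impressions = 22522                         # May 2005 figure from the text
cpm = cpm_earnings(impressions, 0.09)       # ≈ US$ 2.03
cpc = 7.82                                  # actual CPC earnings reported for May 2005
cps = cps_earnings(5, 8.99, 0.30)           # ≈ US$ 13.49, assuming a 30% commission
print(f"CPM: ${cpm:.2f}  CPC: ${cpc:.2f}  CPS: ${cps:.2f}")
```

Running the comparison confirms the ordering claimed in the text: for the same traffic, CPM pays least and CPA (here, its CPS form) pays most.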

Online advertisement media formats




                     Figure 3: Media formats



   All pricing models use similar media formats for advertisements. The cost
paid by the advertiser depends on the following factors:

    1. File size

    2. Dimensions of the media



   3. Location on the publisher's webpage; e.g. above the fold

   4. Nature; e.g. Static, Dynamic, Movie, Talking media

  The different types of media formats used in internet marketing are discussed
below:

Window Formats

   1. Pop-Ups are generally 720x300 px windows which automatically open on
         top of the primary browser window. This can be annoying to the visitor if
          the content is not relevant to the visitor's interest.

   2. Pop-Unders are generally 720x300 px windows which automatically open
         under the primary browser window without distracting the focus of the
         internet user from the primary window.

   3. Interstitials are 728x600 px web pages launched between two pages which
         the visitor is navigating. This page opens within the primary browser
         window, thus capturing the full attention of the user.

   4. InVues (250x250 px) slide into the center of the primary browser window
         after the main webpage loads completely. This is a modified version of
         the pop-up window but less intrusive.

In-Page formats




                    Table 2: In-Page advertisement formats



Roadmap to forthcoming chapters

   Chapter II: Search Engines provides background information related to search
engines, directories, relationship between search engines, search engine user
trends, PageRank algorithm, TrustRank algorithm and overlap analysis of popular
search engine results. Chapter III: Search engine optimization focuses on
improving web page rankings in search results by fine tuning contents of the web
page. Chapter IV: Search engine spam points out factors to avoid while
improving ranking in search engine results. Chapter V: Search engine marketing
gives a brief overview of pay-per-click search engine marketing. Chapter VI:
Summary, Conclusions and Recommendations summarizes the guidelines
discussed in the study.




                                  Chapter 2


                              SEARCH ENGINES


Crawler based search engines

   Search engines are software programs that provide users with URLs of web
pages relevant to the keyword used to perform the search.

   A crawler based search engine consists of:

    1. The spider (or crawler) visits a web page, stores a mirror image of all the
        information gathered from the web page along with the visit date and
        time, and follows URLs to other web pages within the site and to other
        websites. The mirror copy is called the cached page. The spider returns
        to all web pages previously crawled to maintain up-to-date information
        about these pages.

    2. The indexer is a catalog consisting of copies of all web pages crawled by
        the spider, each with a date-time stamp. There is generally a delay
        between spidering a web page and adding it to the index. Search engine
        results are derived from the index and hence may not reflect newly
        spidered web pages until the index is updated.

    3. The search software searches through all pages recorded in the index in
        response to a query and returns the URLs of related web pages, ranked
        in an order determined by the search engine relevancy algorithm.
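The three components above can be sketched as a toy pipeline in Python. This is a minimal illustration under simplifying assumptions: the "crawled" pages are a hard-coded dictionary rather than fetched from the web, the index is a simple inverted index, and relevancy ranking is omitted.

```python
from collections import defaultdict

# Toy corpus standing in for crawled pages: URL -> cached page text (hypothetical)
pages = {
    "http://example.com/a": "search engine optimization improves ranking",
    "http://example.com/b": "search engine marketing has recurring costs",
    "http://example.com/c": "directories seed the crawl",
}

def build_index(pages):
    """Indexer: catalog each crawled page under every word it contains."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Search software: return URLs whose cached text contains every query word."""
    words = query.lower().split()
    results = [index.get(w, set()) for w in words]
    return set.intersection(*results) if results else set()

index = build_index(pages)
print(search(index, "search engine"))  # the two pages mentioning both words
```

A real engine adds the pieces this sketch leaves out: the spider that fetches and re-fetches pages, and a relevancy algorithm that orders the result set.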

   A relevant search engine result may be defined as the set of URLs displayed in
response to a user query of which the user clicks one or more URLs. Relevancy
of a search engine result is relative to the user, as two users querying the same
search engine with the exact same keyword may be searching for very different
information.

Web directories

   Most search engine spiders use web directories as the seed or starting point for
their crawl. A web directory is a human-compiled listing of URLs to thousands of
websites categorized into different groups. The most well-known directories are
DMOZ (www.dmoz.org) and Yahoo! Directory (dir.yahoo.com).

   DMOZ is the largest and most comprehensive human-edited directory on the
internet. Listing a website in DMOZ is free. It takes around 3 weeks to 6 months
for a listing to be approved. If the submission is improper, there is a good chance
that the listing will be denied. Yahoo! Directory is a paid directory with an annual
recurring cost of US$ 299 for commercial sites and US$ 600 for sites with adult
content. Yahoo! also provides a free listing feature, but there is no guarantee that
the listing will be accepted. Search engines view directory listings as a vote of
confidence in websites [9]. Being listed in either of these directories is crucial:
most popular search engines spider these directories, so a website linked from
them can be certain of being spidered by the search engines.

Search engine relationship chart

   Figure 4 illustrates the relationship between major search engines and
directories. One can conclude that:

    1. DMOZ acts as the seed for Lycos (www.lycos.com), HotBot
        (www.hotbot.com),          AOL     Search     (search.aol.com),     Teoma
        (www.teoma.com), Google, iWon (www.iwon.com) and Netscape Search
        (search.netscape.com)


2. Yahoo! directory acts as the seed for Yahoo! Search and AltaVista
   (www.altavista.com)




               Figure 4: Search engine relationship chart [4]




   3. Google Adwords (adwords.google.com) provides paid search results to
       Google    Search,     HotBot,       AOL       Search,     Lycos,   Ask   Jeeves
       (www.askjeeves.com), Teoma, iWon, Netscape Search

   4. Yahoo! Search marketing (searchmarketing.yahoo.com) provides paid
       search results to Yahoo! Search, AllTheWeb (www.alltheweb.com),
       AltaVista and MSN Search

Organic and inorganic searches

  Key terms that one may encounter in the study of search engine results are:

    1. Organic search results: Non-sponsored results returned by a search
       engine in response to a user query. The ranking of the results is
       determined by the relevancy algorithm of the search engine.




                    Figure 5: Organic and Inorganic search results -
                    Google snapshot




    2. Inorganic search results: Sponsored results returned by a search engine,
        where the ranking of results is determined by the cost paid by the
        advertiser to the advertising network.

Search engine user trends

   Figures 6 and 7 are compiled from data collected by the comScore Media
Metrix (www.comscore.com/metrix/) qSearch service, which monitors the web
activities of 1.5 million English-speaking internet surfers worldwide. Both figures
highlight the significance of




                     Figure 6: Percent share of searches conducted by
                     U.S. surfers in July 2005 [5]




                                           16
Google, Yahoo! Search and MSN Search as search engines. Figure 7 – “percent
share of searches trend” clearly indicates that: Google is the most popular search
engine and its popularity has increased between Jan 2005 to Jul 2005. The
popularity of Google makes it necessary to understand their proprietary search
algorithms.




                     Figure 7: Percent share of searches – Trend [5]



PageRank and TrustRank Algorithms

   Google determines the ranking of its search result listings using the
PageRank and TrustRank algorithms. It is important to understand these
algorithms since the higher one's website ranks in search engine results, the
higher its potential to gain targeted visitors.




PageRank [6]: The rank of a web page in Google's organic search results is
determined by PageRank.

PR(A) = (1-d) + d[PR(T1)/C(T1) + … + PR(Tn)/C(Tn)]

where

   PR(A) is the PageRank of web page A

   T1…Tn are the web pages that point to page A

   d is a damping factor which can be set between 0 and 1; it is usually set to 0.85

   C(A) is the number of links going out from web page A

PR(A) is based on the concept of a random surfer who, given a web page A,
keeps clicking on links at random until he gets bored. The surfer never hits the
back button. On getting bored, the random surfer requests a random web page.
The probability that the surfer visits page A is PR(A). The damping factor d is
the probability that at each page the surfer gets bored and requests another
random web page. A variation on the PageRank calculation is that different
damping factors may be assigned to the different pages T1…Tn that link to
page A.

   One can conclude from the PageRank equation that:

    1. The more inbound links a web page has, the higher the PageRank

    2. An inbound link from a web page with high PageRank and few
        outbound links is worth more than one from a web page with high
        PageRank and many outbound links.




        e.g.    PR(X) = 4 and C(X) = 5,   then d[PR(X)/C(X)] = 0.8d
                PR(Y) = 8 and C(Y) = 100, then d[PR(Y)/C(Y)] = 0.08d

PageRank forms a probability distribution over web pages, so the sum of all
web pages' PageRanks will be 1. PR(A) can be calculated using an iterative
algorithm, and corresponds to the principal eigenvector of the normalized link
matrix of the web [6].

PR(A1) + PR(A2) + PR(A3) + … + PR(An) = 1

PR(A) = (1-d)      if web page A has no inbound links.
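
   The iterative calculation described above can be sketched in a few lines of
Python. This is only an illustration, not Google's implementation: the link
graph is made up, and the (1-d) term is divided by the number of pages (the
normalized variant) so that the PageRank values form a probability
distribution summing to 1.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iteratively compute PageRank over a small link graph.

    links maps each page to the list of pages it links out to.
    """
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}  # start from a uniform guess
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum PR(T)/C(T) over every page T that links to this page.
            incoming = sum(pr[t] / len(links[t])
                           for t in pages if page in links[t])
            new_pr[page] = (1 - d) / n + d * incoming
        pr = new_pr
    return pr
```

In a three-page graph where both B and C link to A, the values settle within a
few dozen iterations and A, with the most inbound weight, ends up with the
highest PageRank, matching conclusion 1 above.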

Hundreds of web pages are added to the World Wide Web every moment.
Since the sum of the PageRank of all web pages is a constant (1), as more
pages are added the PageRank of each existing page is constantly updated to
accommodate the PageRank of the new pages. If a web page has no inbound
links, its PageRank (1-d) is assumed to be approximately 0. Since inbound
links increase the PageRank of a web page, one can conclude that outbound
links decrease the PageRank of a web page. This decrease in the PageRank of
a web page due to outbound links is called PageRank leak.

   To ensure a high PageRank it is necessary that:

    1. A web page should have a high number of inbound links

    2. A web page should have a low number of outbound links

   The PageRank algorithm determines the importance of a web page by
counting its inbound links. This can be manipulated by artificially inflating the
number of inbound links to a web page. PageRank also does not incorporate
the quality of a web page in its calculations. Hence Google is developing the
TrustRank algorithm and registered a trademark for TrustRank on March 16,
2005.

TrustRank [7]: According to Gyongyi, Garcia-Molina and Pedersen, the
proposed TrustRank algorithms rely on the PageRank algorithm. TrustRank
takes into account not only the inbound links to a web page but also the
quality of the page. To determine the quality of a web page, a panel of human
experts identifies a set of reputable web pages that acts as the seed for the
spider. The algorithm is based on the empirical observation that good pages
seldom point to bad ones.

   One can conclude that a web page can achieve higher TrustRank if:

    1. Reputable (good) web pages link to the web page

    2. The web page does not link to any bad web pages

    3. The web page does not mislead the search engine or employ search
        engine spam
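
   The seed-based propagation described above can be sketched as a small
variation of the PageRank iteration: trust is injected only at the human-vetted
seed pages and then flows along outbound links. The graph, seed set and
damping value below are hypothetical, and this is a simplification of the
published algorithm, not a faithful reproduction.

```python
def trustrank(links, seeds, d=0.85, iterations=50):
    """Propagate trust from a hand-picked seed set along outbound links.

    links maps each page to the pages it links out to; seeds is the
    human-vetted list of reputable pages.
    """
    pages = list(links)
    # Trust is injected only at the seed pages, not uniformly.
    inject = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(inject)
    for _ in range(iterations):
        new_trust = {}
        for page in pages:
            incoming = sum(trust[t] / len(links[t])
                           for t in pages if page in links[t])
            new_trust[page] = (1 - d) * inject[page] + d * incoming
        trust = new_trust
    return trust
```

Pages reachable from the seed inherit progressively less trust the farther they
sit from it, while pages that no trusted page links to receive none, reflecting
the observation that good pages seldom point to bad ones.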

Overlap analysis

   A study conducted by Dogpile.com in collaboration with the University of
Pittsburgh and Pennsylvania State University in April 2005 and July 2005
reveals that only 1.1% of 485,460 first-page search results were the same
across Google, Yahoo!, MSN Search and Ask Jeeves [8]. The study of search
engine results for a given keyword across different search engines at the same
time is termed overlap analysis and forms the basis of meta search engines like
Dogpile.com. Meta search engines send search queries to popular search
engines and display their results together on a single page. Since Google,
Yahoo! Search and MSN Search are significant in terms of percent share of
search queries answered, it is important to optimize web pages to achieve top
rankings in all three search engines.

   Figure 8 is a snapshot of an overlap analysis of Google, Yahoo! Search and
MSN Search conducted on August 25, 2005 at 18:50 EST for the keyword
"free image library" and the URL pattern "imageblowout.com".




                     Figure 8: Overlap analysis of imageblowout.com -
                     Googlerankings.com snapshot




   1. Yahoo! Search displayed "imageblowout.com" on page 1 of search
          results

   2. MSN Search displayed "imageblowout.com" on page 2 of search
          results

   3. Google displayed "imageblowout.com" on page 55 of search results

   The substantial difference between the Google ranking and the Yahoo!
Search and MSN Search rankings is due to the proprietary relevancy
algorithms used by these search engines. Yahoo! Search and MSN Search use
content-based relevancy algorithms. The title tag of imageblowout.com is
"Imageblowout – Free Image Library for Commercial Use", an exact match to
the keyword used in the search query, hence the higher rankings in Yahoo!
Search and MSN Search.




                                  Chapter 3


                      SEARCH ENGINE OPTIMIZATION


Search engine optimization

   Search engine optimization can be defined as the process of fine-tuning a
web page so that it achieves a higher ranking in search engine results. There is
a very thin line separating search engine optimization from search engine
spam. A web page should not be optimized to a level where it qualifies as
search engine abuse. If a web page is detected as search engine spam, it may
be penalized or even removed from the search engine index. The website will
then neither show up in search results nor be indexed by a search engine
spider until it is added back to the search engine index.

Factors affecting SEO

   Search engine optimization of a website can be broken down into two distinct
groups.

       1. Web page optimization

       2. Web site optimization

   Both categories are interrelated, and all factors within each category should
receive equal attention to achieve proper optimization of the site.

   Factors which tend to improve the ranking of a web page in search engine
results are:

       1. Key Terms



    2. Title Tag

    3. Meta Tags

    4. Body Text

         i.  Alternative Text in Img Tag

        ii.  H1-H6 tags

    5. Menu bar

    6. Keyword density analysis

    7. HTML code validation

    8. Absolute vs Relative URL

    9. Tables in HTML code

   Factors which play a role in improving the ranking of the entire website
are:

    1. Sitemap

    2. Inbound links

    3. Outbound links

    4. Reciprocal linking and link building

    5. Search engine friendly URL

    6. Domain name



    7. 404 Error page

    8. 301 Redirection

    9. Robots.txt meta file

    10. Search engine submission

    11. Visitor analysis

   Factors that must be avoided during webpage construction:

    1. Frameset

   These factors are not discussed in any specific order. Each one plays a
significant role in improving the ranking of a website in search engine results.

Key terms

   Key terms are the queries search engine users submit to find the
information they are looking for. Research should be conducted to identify
the most and least used search terms relevant to the web page. Once the key
terms are identified, they should be incorporated into the web page in a
manner that does not constitute abusing the search engine: artificially inflating
the density of key terms misleads search engines and counts as abuse.

   A higher density of key terms in a web page may lead to higher search
engine rankings. It is also advisable to purchase a domain name that is
identical to a key term, since the domain name of a website is a primary factor
used by search engine relevancy algorithms to rank it.




   A good source for identifying key terms is www.wordtracker.com.
Wordtracker gives suggestions based on over 300 million key terms used by
Metacrawler.com and Dogpile.com users in the past 120 days, along with the
actual and predicted frequency of each key term.

Title tag

   The title tag is a very important component used by search engine
relevancy algorithms to determine ranking, and it is also displayed by search
engines in the search result listing. The title should incorporate a
high-frequency key term while still conveying the overall information available
on the web page, so that the user is enticed to click it. It is advisable to give
each web page of a website a different title that reflects that page's content.




                     Figure 9: Relationship between search engine
                     results and Title tag & Description meta tag




Meta tags

   Meta tags provide information about relevant keywords and the purpose of
the web page. The meta keywords tag was used as a determinant of relevancy
by early search engines, but currently very few search engines consider it in
their relevancy algorithms. With reference to Figure 9, one can see that the
description shown in search results is the information in the DESCRIPTION
meta tag. This information should be precise enough to entice a search engine
user to click on the link.

   A few meta tags that deserve close attention while optimizing web pages
are:

    1. <meta name="robots" content="…" />
        The robots meta tag instructs a search engine spider whether it may
        index the page and whether it may follow the links on the page

    2. <meta name="keywords" content="…" />
        Keywords meta tag is used to indicate keywords relevant to the web page

    3. <meta name="description" content="…" />
        Description meta tag provides information regarding the intended
        purpose of the web page

   It is worthwhile to spend extra time providing content-related keywords
and descriptions on every page rather than specifying identical keywords and a
description for the entire website.




Body text

   Search engines tend to like unadulterated HTML code. The term
adulteration is used here in the context of embedded JavaScript code, Flash
movies and image files: search engines do not attempt to read this content
even though it might contain a significant density of keywords. An example is
the logo image on a website, which in most cases states the domain name and
a caption. The caption may be a keyword-oriented phrase much like the title
of the web page, but since this information cannot be read by search engines,
it cannot be counted toward the relevancy of the web page. Simply put, "what
a search engine cannot see does not exist on the web page". This may or may
not be true for human visitors, but it is certainly a rule adhered to by the
search engine spider. Flash text can only be read by FAST (Alltheweb.com);
none of the other search engines can read Flash text or follow Flash links [9].
Similar to Flash files is JavaScript code embedded within HTML files: most
search engines ignore JavaScript code and the links within it [10]. Another
factor to consider is the keywords appearing in the "above the fold" region of
a web page. The higher the keyword density in this region, the more relevant
the web page is considered for a given keyword.

   With these factors in mind, one can adapt the strategies outlined below to
optimize body text:

    1. Include JavaScript code as a separate file. This can be done using the
        following HTML tag.
        <SCRIPT LANGUAGE="JavaScript" SRC="myJavaScript.js"></SCRIPT>


    2. Minimize usage of Flash movies




    3. Always use ALT attribute in IMG tags. The HTML IMG tag is
         <img src="myImage.gif" alt="My Image" />. This is the common usage
         of the tag. For optimization purposes, it might be better to use the tag as
         <img alt="My Image" src="myImage.gif" />. The objective is to bring
         the keyword phrase as close to the beginning of the HTML file so that
         the web page can increase the density of the keywords in the “Above the
         Fold” region.

   Style sheets are incorporated in almost every web page to enhance its visual
appearance. This may entice the visitor, since the page looks more appealing
than plain HTML, but to a search engine spider embedded style rules are
unrelated text in the "above the fold" region. The page content can be
optimized by including the style sheet as a separate file with the LINK HTML
tag rather than embedding the style sheet code:
<LINK href="myStyleSheet.css" rel="stylesheet" type="text/css">

   Heading tags also play a very important role in the content of a web page.
It is advisable to embed keywords within H1 to H6 tags, with preference
given to the H1 tag over the H6 tag, as 1 through 6 indicates the importance
of the heading. Font styles like bold, italic and underline indicate the relative
importance of text and should be used wherever applicable in conjunction
with key phrases. A typical web page should have keyword-rich content of at
least 200-250 words of text [11].

Menu bar

   The menu bar of a web page links to the most important pages on the site.
Since almost every page on the site contains the menu, those pages effectively
vote for the pages the menu bar links to, which increases the link popularity
of these pages within the website. These pages should have good targeted
content and adhere to the linking guidelines discussed in the Sitemap section
below. This may result in higher rankings for these pages.

Keyword density analysis

   Every search engine has a different keyword density calculation. Some
search engines permit a heavier keyword density on a web page; others, like
Google, have stricter allowable density levels. The placement of keywords in
different locations of the web page also has varying effects. A keyword density
above the permissible limit will be considered spam by the search engine and
will cause the website to be penalized. Google allows a maximum of 2% of
the web page text to be keywords; Yahoo! and MSN Search allow a keyword
density of 5% [12].

   A free tool to check keyword density of a webpage is available at
www.searchengineworld.com/cgi-bin/kwda.cgi
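
   A rough version of such a check can be written with Python's standard
html.parser: strip the markup (ignoring script and style content, which search
engines skip), then divide keyword occurrences by total words. Real engines
use proprietary, undisclosed formulas, so this is only an approximation.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping script and style elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip:
            self.parts.append(data)


def keyword_density(html, keyword):
    """Fraction of visible words equal to the keyword (case-insensitive)."""
    extractor = TextExtractor()
    extractor.feed(html)
    words = ' '.join(extractor.parts).lower().split()
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)
```

For a page whose visible text is "Free image library Browse the image
gallery.", the keyword "image" accounts for 2 of 7 words, a density of about
29%, far above the 2% to 5% limits quoted above.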

HTML code validation

   It is highly advisable to validate HTML code before submitting it to search
engines. Even though a web page may look visually correct, it may contain
syntax errors that are ignored by forgiving browsers like Internet Explorer. A
free validation service provided by the W3C is available at validator.w3.org; it
checks for W3C XHTML 1.0 compliance and gives a detailed report. W3C
cascading style sheet validation is available at jigsaw.w3.org/css-validator/

Absolute vs relative URL

   Search engine spiders prefer absolute URLs over relative URLs, and they
may miss indexing some web pages when relative URLs are used. Absolute
URLs, however, significantly reduce the portability of the website in the event
of a domain name change. This can be overcome by using a global variable
that contains the domain name of the website and using it to generate
absolute URLs within web pages.
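
   A minimal sketch of that approach in Python, with a hypothetical
BASE_URL constant standing in for the site's single configurable domain
setting:

```python
from urllib.parse import urljoin

# The single place the domain is defined; change it here and every
# generated absolute URL follows.
BASE_URL = "http://www.mysite.com/"


def absolute_url(relative_path):
    """Build an absolute URL for a page from the site-wide base."""
    return urljoin(BASE_URL, relative_path)
```

Every anchor on the site can then be emitted through absolute_url(), so
moving to a new domain means editing one constant rather than every page.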

Tables in HTML code

   Tables are used in web page construction to make the layout more
organized. Some web developers nest tables within tables to simplify the page
structure for maintenance purposes. This adds a lot of irrelevant text,
decreasing the keyword density in the "above the fold" region of the web
page. Most web pages have a menu bar on the left-hand side or at the top of
the page; positioning the menu bar in this way may likewise decrease the
keyword density in the "above the fold" region.

   A few alternatives to these issues are:

   1. Position the menu bar on the right side of the web page and
       keyword-sensitive content on the left side

   2. Use CSS stylesheet to define individual tag specifics. This CSS code must
       be placed in a separate file
       e.g. <td id="centerLcolumn">My Text</td>
       instead of
       <td width="100" height="400" bgcolor="#000000"
       bordercolor="#CCCCCC"> My Text</td>

Sitemap

   A sitemap is a web page with links to every web page within the website,
and it carries high importance within the site. Once the sitemap gets spidered
by a search engine, one can be sure that every page on the website will be
indexed. When designing the sitemap of a website, key points to remember
are:

   1. The sitemap should contain HTML anchor tags

   2. The link text should consist of keywords relevant to the destination
       webpage. The link text may contain identical phrase as the TITLE tag of
       the destination webpage. The link text is significant since it states what
       the content of the destination page may be. Link text is taken into
       consideration by the relevancy algorithm of search engines.

   3. The sitemap should be visible to the search engine. This means that
       there must be a link from every page of the website (typically in the
       footer) to the sitemap, and spiders must be permitted to index the
       sitemap

   A typical link on a sitemap may be modeled on the following example:
<a href="http://mysite.com/gallery.htm">Gallery</a>

   Avoid the following:

   1. JavaScript handlers in the anchor tag
       <a href="#" onclick="gotoURL('gallery')">Gallery</a>

   2. Flash movie for the sitemap

   3. Images instead of link text
       <a href="http://mysite.com/gallery.htm"><img alt="Gallery"
       src="gallery.gif" /></a>

   4. Imagemap

   5. Irrelevant link text
       <a href="http://mysite.com/gallery.htm">Check this out</a>

   If the sitemap has more than 100 links, split the sitemap into multiple pages. A
guide to creating sitemaps is provided by Google and is available at
http://www.google.com/webmasters/sitemaps/docs/en/about.html. It is
advisable to read these guidelines and follow them while creating the sitemap.
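
   The guidelines above (plain anchor tags, keyword-bearing link text, at most
100 links per page) can be sketched as a small generator; the page list below is
illustrative.

```python
def sitemap_pages(pages, per_page=100):
    """Render sitemap pages as blocks of plain anchor tags.

    pages is a list of (url, link_text) tuples; the output is split so
    no sitemap page holds more than per_page links.
    """
    chunks = [pages[i:i + per_page] for i in range(0, len(pages), per_page)]
    return ['\n'.join(f'<a href="{url}">{text}</a>' for url, text in chunk)
            for chunk in chunks]
```

A site with 150 pages would yield two sitemap pages, each containing plain
anchors such as <a href="http://mysite.com/gallery.htm">Gallery</a>.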

Inbound links

   For Google, inbound links help determine the PageRank of a website.
Without any inbound links, a website is practically invisible to the search
engine. One way for a search engine spider to find a website is by following
inbound links from another indexed website. The alternative is to manually
submit the website to the search engine spider's crawling list. Though manual
submission is encouraged, there is no guarantee that the website will be
indexed, whereas a website with inbound links from other sites is far more
likely to be indexed.

   Inbound links from the following sources help in improving the ranking of
a web page [10]:

    1. All major and local directories: Yahoo!, DMOZ, LookSmart, and trade,
        business and industry related directories

    2. Suppliers, happy customers, sister companies and partners

    3. Websites which provide accompanying services
        e.g. Inbound links from web hosting companies for a site selling website
        templates




   4. Related websites but not competing websites
       e.g. Websites that provide tutorials about web design and modification of
       website templates

   5. Competing websites

   Not all inbound links carry the same weight. Links from authoritative
industry sources count more towards improving PageRank than links from a
small private website. Some inbound links may even have a negative effect on
PageRank. These are:

   1. Links from FFA (Free for all) link pages

   2. Link farms
       Link farming is the organized exchange of unrelated links between
       websites.

   3. Links from doorway pages
       Doorway pages are web pages created with the intent of inflating the
       inbound links of a website. These pages are created with the sole purpose
       of serving search engine spiders with optimized content which may boost
       the ranking of the webpage.

   4. Links from discussion forums
       Discussion forums can be maliciously used to inflate the inbound links
       to a website. Given a good but unmoderated message board, spammers
       may include links to their spam pages as part of seemingly innocent
       messages they post [7]. In a moderated message board, spammers post
       valid messages with links to their websites in their signatures.




   Most search engines penalize websites which employ malicious techniques to
inflate link popularity to the extent of removing the website from the index.

Outbound links

   Outbound links may improve the ranking of a website as long as the
website is citing good websites [10]. Good websites are those recognized as
authorities in the industry relevant to the website. Outbound links may cause
PageRank leak as discussed in the preceding chapter. In a reciprocal linking
program, PageRank leak can be minimized by masking the destination URL of
outbound links using JavaScript code or by using the NOINDEX,
NOFOLLOW properties in the robots meta tag. This is not an ethical
practice, but some websites follow it. An ethical solution is to maintain
outbound links only to a few authoritative and related websites. Avoid linking
to websites that mask URLs in a reciprocal linking program, as this would
cause a PageRank leak with no worthwhile benefit.

Reciprocal linking and link building

   Reciprocal linking is a strategy to gain inbound links from websites that
share the same theme as one's website by providing an outbound link in
exchange. This strategy improves the link popularity of a website, which can
be defined as the number and quality of inbound links to it. Reciprocal linking
is done by searching for websites that share the same theme and requesting an
inbound link in exchange for an outbound link. These websites should be rich
in the keywords and phrases that are emphasized on one's website. Before
starting a reciprocal linking strategy, one should have the following web pages
in place:




    1. A webpage which contains outbound links to websites (Link directory).
        This webpage should be linked from the homepage so that it gets indexed
        by the search engine spider.

    2. A "Link to Us" page which gives cut-and-paste HTML code to link to
        one's website.

   Once these pages are in place, one should send emails to webmasters of
shortlisted sites expressing interest in reciprocal linking. Key points to ask in
this email are whether they would be willing to link to one's website and, if so,
whether they expect a specific format (text link, image, Flash movie, etc.) and
what HTML code should be used for the link. A text link is most effective as
an inbound link; pay close attention to the link text in the anchor tag.

   Zeus by cyber-robotic.com is a highly effective reciprocal link building
software (cost US$195). Zeus is a robot/spider that crawls the internet to find
websites with themes similar to one's website. Once the list of sites is
compiled, Zeus can be used to send personalized email messages to the
webmasters of these websites and to track and maintain the details of each
site. It dynamically generates keyword-tuned link directory pages which can be
uploaded to one's website.

   CPA affiliate programs provided by third parties like Commission Junction
may bring qualified leads to one's website, but if a website hosts its own CPA
program, it serves a dual purpose: an affiliate program not only brings
qualified leads but also indirectly builds inbound links to the website.
iDevAffiliate v4.0 Gold Edition (cost US$149) by idevdirect.com is a popular
software package used by many merchants to host their own affiliate
programs. One can promote an affiliate program by submitting it to affiliate
program directories, specifying its terms.


Search engine friendly URL

   Many websites have dynamically generated content, produced in most cases
by passing parameters in the URL. The URL of a dynamic web page
resembles http://www.mysite.com/index.php?pageid=70 and will be indexed
by a search engine. However, there is often more than one parameter attached
to the URL, such as a sort order or navigation setting, so different URLs end
up pointing to the same web page.

http://www.mysite.com/album/viewcat.php?pageid=70&orderby=hitsD (Hits Descending)
http://www.mysite.com/album/viewcat.php?pageid=70&orderby=hitsA (Hits Ascending)
http://www.mysite.com/album/viewcat.php?pageid=70&orderby=titleD (Title Descending)
http://www.mysite.com/album/viewcat.php?pageid=70&orderby=titleA (Title Ascending)


   There is no way for a search engine to determine which parameter
identifies a new page and which is merely a setting that does not justify
indexing the URL as a new page. Hence spiders have been programmed to
detect and ignore dynamic pages. This can be resolved by making the URL
search engine friendly: replacing the query-string characters (such as ? & =)
with equivalent search engine friendly terms or characters. The above four
URLs can be made search engine friendly as follows:

http://www.mysite.com/album/viewcat.php/pageid.70/orderby.hitsD
http://www.mysite.com/album/viewcat.php/pageid.70/orderby.hitsA
http://www.mysite.com/album/viewcat.php/pageid.70/orderby.titleD
http://www.mysite.com/album/viewcat.php/pageid.70/orderby.titleA


   The web page is indexed because the spider is led to believe that, since the
URL does not contain query-string characters, it is not a dynamic web page.
This is an intermediate solution until spiders can index dynamic web pages
directly, since the problem of isolating unique pages from their clones is not
solved by generating search engine friendly URLs. This conversion between
dynamic URLs and search engine friendly URLs (and vice versa) can be
achieved on almost all types of servers, either through proper configuration or
by installing third-party software. One should ask one's hosting service
provider about the software or server configuration available for generating
search engine friendly URLs. The website code may have to be modified so
that each anchor tag on the website contains a search engine friendly URL.

   The mod_rewrite module of the Apache server can be used to make a URL
search engine friendly. A URL request for
http://www.mysite.com/album/viewcat.php/pageid.70/orderby.hitsD may be
translated by mod_rewrite to
http://www.mysite.com/album/viewcat.php?pageid=70&orderby=hitsD
depending on the regular expression specified.
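
   A sketch of such a rule in .htaccess, assuming mod_rewrite is enabled and
this site's particular pageid/orderby naming; the exact pattern would depend
on the parameters the script accepts:

```apache
RewriteEngine On
# Map /album/viewcat.php/pageid.70/orderby.hitsD (and similar URLs)
# back onto the query-string form the script actually expects.
RewriteRule ^album/viewcat\.php/pageid\.([0-9]+)/orderby\.([A-Za-z]+)$ album/viewcat.php?pageid=$1&orderby=$2 [L]
```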

   The web programmer will have to modify the script to generate URLs of
the type http://www.mysite.com/album/viewcat.php/pageid.70/orderby.hitsD
instead of
http://www.mysite.com/album/viewcat.php?pageid=70&orderby=hitsD
within the web pages so that all URLs on the website are search engine
friendly.




Domain name




              Figure 10: Search results for keyword - betterbody -
              Google snapshot




   The significance of the domain name in the ranking of websites in search
results cannot be overlooked. Figure 10 is a listing of a search performed in
Google for the keyword betterbody. The inbound links for the listings have
been computed using the "Who links to you?" feature in Google search for
the exact URL that came up in the search results.

   Examine listing A4 in Figure 10. www.betterbody.de has a ranking of 4 out
of 2,780 search results for the term betterbody. The page has 18 inbound
links, more than the number of inbound links for A1, A2 or A3. There is no
occurrence of the term betterbody in the title tag, meta keywords tag, meta
description tag or the content of the web page (refer to Figure 11, the
keyword density analysis of www.betterbody.de for the term betterbody). The
only occurrence of the term betterbody is in the domain name of the website;
in fact, the keyword and the domain name are an exact match. This web page
would not have shown up in the search results for this keyword had it not
been for its domain name, yet it not only showed up but appeared on the first
page with a ranking of 4. This illustrates the significance of the domain name
in SEO.

   Research should be conducted to determine popular keywords relevant to
the site; keyword research has been discussed in the key terms section of this
chapter. Domain names are synonymous with brand names, so changing the
domain name after the website has been launched and gained popularity is
highly discouraged. URLs can also be optimized independently of the domain
name by incorporating keywords in individual web page URLs, like
http://www.mysite.com/betterbody/mainpage.htm
http://www.mysite.com/better-body/mainpage.htm


                     Figure 11: Keyword density analysis of betterbody.de




Using more than 2 keywords in the webpage URL may be treated as search
engine spam.

404 Error page

   A 404 error page states that the requested page cannot be found. The
spider receives this page from the server in response to a request for a URL
that does not exist. That page, along with all its rankings, will be dropped
from the search engine index. Moreover, the spider makes no attempt to crawl
the rest of the website on receiving this page. Customize the 404 error page,
typically with a sitemap, to ensure successful crawling of all other web
pages by the spider.
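On an Apache-hosted site, for example, a custom error page can be configured
with a single .htaccess directive; the file name below is a placeholder:

```apache
# Serve a custom page (containing a sitemap) whenever a URL is not found
ErrorDocument 404 /notfound.htm
```

The custom page should link to the site's main sections so that the spider
can continue crawling from it.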

301 Redirection

   301 Redirection is a spider/visitor friendly strategy to redirect one webpage to
another for websites hosted on Apache servers. 301 Redirection is implemented
by specifying the source and destination URLs in the .htaccess file. 301
Redirection is interpreted as “moved permanently”. This is required to ensure
stability of PageRank for the site. Google interprets http://www.mysite.com and
http://mysite.com as two different URLs. As a result, Google assigns
different PageRank values to the same web pages, depending on whether they
have www in the
domain name. This causes the PageRank for mysite.com to be distributed
between http://mysite.com and http://www.mysite.com. Implementing a 301
redirect from http://mysite.com to http://www.mysite.com will ensure that all
pages will be indexed as http://www.mysite.com/myexample.htm. One should
pay close attention to ensure that all link building strategies use www in the URL
like:

    1. “Link to Us” page

    2. Search engine and directory submissions


    3. Reciprocal linking code

    4. Absolute URLs within the site
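The non-www to www redirect described above can be sketched in .htaccess
using Apache's mod_rewrite; mysite.com is the document's placeholder domain:

```apache
# Permanently redirect http://mysite.com/... to http://www.mysite.com/...
RewriteEngine On
RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]
```

The R=301 flag makes Apache send the "moved permanently" status, so spiders
consolidate PageRank on the www version of each URL.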

Robots.txt file

   Robots.txt (Robots Exclusion Standard) is a file with specific
instructions telling the spider which web pages to crawl or ignore.
Robots.txt must be located in the root directory of the website. The same
effect can be achieved with the Robots meta tag; the difference is that the
robots.txt file is a centralized location for these instructions, which may
reduce maintenance. The robots.txt file allows blocking specific directories
from being indexed, which is helpful for a website with member-access web
pages. A free tool is available at www.searchengineworld.com/cgi-
bin/robotcheck.cgi to validate the robots.txt file. Validation is important,
as an error in the file can unintentionally block web pages from spiders or
leave pages exposed that were meant to be hidden.
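A minimal robots.txt for the member-access case mentioned above might look
like the following; the /members/ directory name is a placeholder:

```
# Applies to all spiders: crawl everything except the member area
User-agent: *
Disallow: /members/
```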

Website submission

   Website submission checklist

    1. Website is completed and optimized

    2. HTML code is validated

    3. Incoming links have been established

    4. Description of the website in less than 25 words with at least 2 to 3 key
        terms

    5. Keyword list




   6. Email address, preferably with the same domain name as the website to
       respond to submission notifications. e.g. submit@mysite.com

   There is no need to submit each and every webpage on the site. Most search
engines prefer only the top level page in submissions. Manual submission is
preferred over automated submissions. Most search engines and directories have
guidelines for proper submission. One should read these carefully before
submitting the site. Frequently submitting one's website to search engines is
considered spamming and can cause the website to be penalized. Hence it is
advisable to submit the website only once to each search engine. After
submission, one should regularly check the submission email address, since
there might be responses from search engines and directories about improper
submissions and corrections that need to be made. Also, some search engines
and directories require validation of the email address for each submission.

   Important search engines and directories [17] [18] [19] that may be considered
for website submission are:
  www.google.com               www.yahoo.com                www.askjeeves.com
   www.alltheweb.com           www.aol.com                  www.hotbot.com
   www.altavista.com           www.qango.com                www.gigablast.com
   www.looksmart.com           www.lycos.com                www.msn.com
   www.netscape.com            www.about.com                www.excite.com
   www.pepesearch.com          www.iwon.com                 www.dmoz.org
   www.webcrawler.com          www.webwombat.com            www.aeiwi.com
   www.links2go.com            www.searchking.com           www.joeant.com
   www.zeal.com                www.wondir.com               www.illumirate.com
   www.jayde.com               www.vlib.org                 www.goguides.org
   dir.yahoo.com               www.business.com




   It is advisable not to redesign the website or change the webpage content after
the site has been submitted and indexed since this can cause variations in website
rankings in search results.

   Submit the website sitemap to Google Sitemaps. Doing so may provide the
site with better crawl coverage and fresher search results.

Visitor analysis

   Visitor analysis is an important part of website maintenance.
www.statcounter.com (US$ 29 per month) is a paid service that maintains
website statistics. Website statistics give in-depth information about the
geographical location of visitors, search terms used to reach the website,
referring websites, popular web pages, operating system, monitor resolution,
browser information, time spent on the website by each visitor and peak
traffic hours during each day. This information can be utilized to cater to
different types of visitors with their individual needs, which adds value to
the time the visitor spends on the site. An example would be the monitor
resolution of the visitor: this information can be used to tune the site so
that minimum scrolling is needed. Another benefit would be to keep a check
that the server never goes down during peak traffic hours.

Frameset

   Search engines tend to dislike websites with frames. Frames have inherent
problems, such as bookmarking: a visitor who wants to bookmark a specific
page on a website using frames is unable to do so. Search engines view pages
using frames as different web pages even though they might visually appear as
a single page. Hence the search engine may misunderstand the content of the
webpage even though it makes perfect sense to the visitor. Though there are
solutions that make a website using frames display similar content to the
visitor as well as to the search engine, it is better to avoid frames
altogether.
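One such workaround, if frames cannot be avoided, is the HTML <noframes>
element, which gives spiders and non-frame browsers indexable content; the
file names below are placeholders:

```html
<frameset cols="20%,80%">
  <frame src="menu.htm" name="menu">
  <frame src="content.htm" name="content">
  <!-- Indexable fallback for spiders and browsers without frame support -->
  <noframes>
    <body>
      <p>Site overview with links: <a href="content.htm">main content</a></p>
    </body>
  </noframes>
</frameset>
```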

SEO Roadmap

   Figure 12 below gives a brief overview of the search engine optimization
process categorizing different factors into development and maintenance tasks.




                      Figure 12: Search engine optimization roadmap



   Search engine optimization is a constantly evolving area. Website owners are
constantly trying to discover better techniques to improve ranking. Unfortunately,
search engine algorithms are proprietary, which adds mystery to the subject.
As search engines improve their relevancy algorithms, SEO tactics will change
with them.




                                   Chapter 4


                             SEARCH ENGINE SPAM


Search engine spam

      Manipulation of web pages to improve rankings in search engine results
is defined as search engine spam. Practices that are considered search engine
abuse have been outlined by the industry-leading search engines. Their
guidelines are available at:

      Google            www.google.com/webmasters/guidelines.html
      Yahoo! Search     help.yahoo.com/help/us/ysearch/basics/basics-18.html
      MSN Search        search.msn.com/docs/siteowner.aspx

Consequences of spamming

      Spammers are constantly reinventing techniques to evade the spam
controls set forth by search engines. In response, search engines continually
upgrade their spam policies and modify their algorithms. Since the algorithms
are proprietary, there is no definite way of knowing what a search engine
considers spam. On detecting a website as an offender, the search engine may
penalize the website or even remove the site from the index. Once blacklisted
as a spammer, the website will not be crawled by the spider. One needs to
communicate with search engine staff to get the website back into the
crawling index. This process of communication between the website owner and
the search engine staff is time consuming, thus costing the owner valuable
traffic and new clients.




Spamming techniques

   Below are the different types of spamming methods that have been used to
improve rankings.

    1. Hidden text

    2. IP Cloaking

    3. Doorway pages

    4. Pagejacking

    5. Domain duplication

    6. Excessive popups

    7. Inflating link popularity

    8. ALT stuffing

    9. Link farming

    10. FFA

    11. Mousetrapping

Hidden text

   Hidden text or keyword stuffing is the practice of overloading a webpage with
keywords and key phrases. These are invisible to the visitor but are present in the
body of the webpage. Since search engines read the HTML source code of web
pages, this text is visible to the spider. The spider is manipulated into
believing that, due to the high occurrence of the keyword in the content of
the web page, the page is highly relevant to the keyword, and it therefore
assigns a higher ranking to this webpage. Various techniques can be employed
to inflate the density of keywords. Most prominent among these are:

    1. Hidden input tag
        <input type="hidden" name="keyword1" value="list of keywords">

    2. Invisible text
        This is done by setting the font color to match the background color
        of the web page so that the characters are invisible to the naked eye.

IP Cloaking

   IP Cloaking is the practice of creating specialized web pages with the intention
of serving search engine spiders. These web pages are invisible to normal visitors.
The pages are programmed to detect whether the URL request is coming from a
regular browser or a search engine spider and serve each request with different
page content. The end result is that the spider sees a highly optimized web page
with a heavy keyword density while the visitor is served with the regular page.
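The detection step at the heart of cloaking can be sketched as follows. The
spider signatures are real 2005-era crawler User-Agent names; the
page-selection logic is a simplified illustration of the technique, not an
endorsement of it:

```python
# Cloaking illustration: choose page content by User-Agent header.
# Real cloakers also check known spider IP ranges.
SPIDER_SIGNATURES = ("googlebot", "slurp", "msnbot")

def page_for(user_agent: str) -> str:
    """Return which page a cloaking script would serve for this request."""
    ua = user_agent.lower()
    if any(sig in ua for sig in SPIDER_SIGNATURES):
        return "keyword-stuffed page"   # served only to spiders
    return "regular page"               # served to human visitors

print(page_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # keyword-stuffed page
print(page_for("Mozilla/4.0 (Windows; U) Gecko"))           # regular page
```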

Doorway pages

   Doorway pages serve as a bridge for the spider. Doorway pages are created
for the same purpose as cloaking, except that they are served to all incoming
requests. The doorway page has a meta refresh tag which redirects the visitor
to the appropriate page, or a link that the visitor has to click to reach the
destination. Doorway pages are also used to inflate link popularity.
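The meta refresh used by a doorway page typically takes this form (the URL is
a placeholder), shown only to illustrate the technique that search engines
penalize:

```html
<!-- Redirects the visitor immediately to the real destination page -->
<meta http-equiv="refresh" content="0;url=http://www.mysite.com/destination.htm">
```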




Pagejacking

   Pagejacking, or content duplication, is the practice of copying content
(HTML source code) from another site and creating duplicate copies of its web
pages on one's own site. These illegitimate web pages are indexed by spiders
and show up in search engine results. The spammer uses these pages to attract
visitors, who are tricked into thinking that the illegitimate site is the
site they are looking for. Once on the site, the visitors may become victims
of mousetrapping.

Domain duplication

   The practice of creating identical websites that differ only in their
domain names is termed domain duplication. This enables the websites to
occupy multiple listings on the same page of the search engine results. Since
these web pages are identical, their rankings will be more or less the same.
The visitor is thus tricked into visiting the same content from the search
engine results, since adjoining listings point to the same content.

Excessive popups

   Yahoo specifies that it considers excessive popups spam. This is related
to mousetrapping. Hence a website should have a maximum of one or two popups
per page.

Inflating link popularity

   Internal link popularity can be inflated by creating a practically
unlimited number of dynamically generated web pages, with content of little
use, that point to popular web pages within the site, thereby inflating the
internal inbound links of those pages. This tends to increase the PageRank of
the intended web pages.




ALT stuffing

   This is a special case of keyword stuffing. Like the hidden input tag, the
ALT attribute is almost invisible to the visitor, who sees its content only
when the mouse hovers over the image. This attribute can be manipulated to
hold a very long string of keywords that have no relevance to the image or
the webpage, thereby increasing the keyword density of the web page.

Link farming

   Link farming is the process of artificially inflating the inbound links to the
website by organized exchange of links. The reciprocal linking program can be
abused by exchanging links with other websites which are not related to the
content or the theme of the website.

FFA

   Free-for-all (FFA) web pages are pages with hardly any content except
links to other websites. FFA is a malicious technique to inflate link
popularity.

Mousetrapping

   Mousetrapping uses JavaScript handlers to open up new windows with
content that is of no interest to the visitor. The visitor is prevented from leaving
the site. Whenever the visitor tries to close the window another window opens.
Sometimes, mousetrapping is programmed to end after a finite number of new
browser windows. Otherwise, the visitor will have to close the browser program
using the Task manager, thus losing all other open windows.

   Search engine spam is directly related to the evolution of search engine
algorithms. Spammers come up with new strategies every day to adapt to
restrictions imposed by search engines. Search engines try to isolate these
strategies and penalize websites participating in spam. It is best not to use
any spamming methods to increase the popularity of one's website.




                                  Chapter 5


                      SEARCH ENGINE MARKETING


Search engine marketing

   Search Engine Marketing (SEM) is a marketing strategy offered by popular
search engines which allows website owners to buy high rankings in search
engine results (inorganic search results). These listings are contextual,
meaning that they are relevant to the search query executed. Most search
engines offer a CPC (cost per click) model. Google Adwords has recently
incorporated the CPM (cost per thousand impressions) model into its highly
popular SEM service. Google Adwords and Yahoo! Search Marketing are the most
prominent players in the SEM industry. Recently, smaller companies with less
popular search engines have begun offering SEM services as well.

Cost per visitor model

   Zango.com, operated by metricsdirect.com, implements the CPV (cost per
visitor) model. This model is a hybrid between popup windows and search
engine marketing. Zango.com allows users free access to games, downloads and
entertainment in exchange for installing its software on the user's computer.
When the user performs a search in a search engine, the software pops up a
window with a contextually related website. The number of popups is limited
to 20 per day. This is very similar to the SEM services run by popular search
engines. The difference is in the activity of the user: when a search engine
displays paid search results, the user gets to select which link to click
after reading the description. This liberty is not available in the CPV
model.

Malpractices in SEM Industry

   Malicious practices prevalent in the SEM industry are:


    1. Bid Jamming
        Bid jamming is an attack on PPC SEM campaigns whereby competitors
        are forced to pay their maximum bid amount for each click. Most SEM
        models allow an advertiser to bid a maximum allowable amount for
        each click, but charge one penny (US$ 0.01) over the bid amount of
        the listing underneath. Bid jammers exploit this rule by bidding an
        amount one penny less than a competitor's maximum bid, which costs
        the competitor the maximum allowable CPC on every click.

    2. Click Fraud
        Click fraud is a technique employed by content publishers and
        competitors to exhaust an advertiser's SEM funds.
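The penny-over-the-next-bid pricing rule that bid jamming exploits can be
sketched as follows; the bid amounts are illustrative:

```python
def charged_cpc(max_bid: float, next_lower_bid: float) -> float:
    """CPC actually charged: one cent over the next lower bid,
    never more than the advertiser's own maximum bid."""
    return round(min(max_bid, next_lower_bid + 0.01), 2)

# Normal auction: the bid below ours is US$0.40, so we pay US$0.41
print(charged_cpc(1.00, 0.40))  # 0.41

# Bid jamming: the jammer bids one cent under our US$1.00 maximum,
# forcing us to pay the full maximum on every click
print(charged_cpc(1.00, 0.99))  # 1.0
```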

How does the content publisher benefit by click fraud?

Most SEM models have an affiliate program so that they can generate more
revenue. Google Adwords is one such SEM service. Google allows website
publishers to subscribe to their Adsense service. This allows Google to publish
contextual CPC/CPM ads on the publisher‟s website. The publisher gets paid by
Google for every click made from the website. Google charges a commission for
providing the infrastructure. Unethical publishers use this opportunity to
click ads on their own websites to increase their revenue. Though Google has
software programs to detect this kind of behavior, it cannot be completely
avoided. Assume that an ad is published on 100 web pages, each belonging to a
different website. If 50 publishers each click on this ad once at a CPC of
US$ 0.25, the advertiser incurs a loss of US$ 12.50. It is difficult for
Google to determine whether a single click is valid or fraudulent.
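The arithmetic in the example above can be checked directly:

```python
# Worked example from the paragraph above: 50 fraudulent clicks
# at a CPC of US$0.25 each.
fraud_clicks = 50
cpc_usd = 0.25
loss = fraud_clicks * cpc_usd
print(f"Advertiser loss: US$ {loss:.2f}")  # Advertiser loss: US$ 12.50
```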

How does the competitor benefit by click fraud?



The competitor tries to deplete the advertiser's funds by resorting to click
fraud and bid jamming. The competitor benefits by gaining a higher rank in
the paid listings for a lower CPC.

Google and Yahoo Search Marketing

   Google supplies paid listings to Google search, Lycos, HotBot, AOL Search,
Netscape Search, iWon, Teoma, AskJeeves and publishers who have subscribed
to Adsense program. Yahoo Search Marketing supplies paid listings to Yahoo!
Search, AllTheWeb, AltaVista, MSN Search and their affiliate publisher network.
Google Adwords and Yahoo Search Marketing span almost all popular search
engines. It is advisable for advertisers to use the Google and Yahoo Search
Marketing services over smaller companies providing similar services, since
these SEM services are less prone to click fraud: they have software to
detect and identify this behavior.

   All SEM services are based on the principle of maximizing the company's
profit. Ranking of an ad is decided by the total amount paid by the
advertiser. Hence the ranking of a paid listing is directly related to
(number of clicks x CPC).

   Google Adwords provides all the necessary tools for automated campaign
management. Main features include:

    1. Keyword tool similar to wordtracker.com.

    2. Targeted campaign control. One can specify that the ad should be served
        to visitors from certain geographical locations speaking specific languages

    3. Variable CPC for different keywords




    4. Ad monitor. Google staff will monitor that every ad served follows
        acceptable guidelines

    5. Forecast tools to estimate budget

    6. Multiple ad campaigns can be setup each serving multiple ads. Each ad
        can have its independent keyword list.

    7. Reports to monitor ad campaigns

    8. Tool to select publishers where the ads should be published

More information about Google Adwords is available at
https://adwords.google.com/select/main?cmd=Login. Yahoo! Search Marketing
offers sponsored search very similar to Google Adwords. More information
about Yahoo! Search Marketing is available at
http://searchmarketing.yahoo.com/srch/index.php. Yahoo! Search Marketing
also provides a suite of other marketing services.




                                Chapter 6


       SUMMARY, CONCLUSIONS AND RECOMMENDATIONS


Introduction

   Search engine optimization is a constantly evolving topic. As long as
search engines exist, new optimization strategies will be discovered, and new
spam techniques will be used to mislead search engines. Search engine
optimization specifies techniques to achieve higher rankings in organic
search engine results; search engine marketing helps in achieving higher
rankings in inorganic search engine results. SEO is an ongoing process with
substantial long-term benefits. SEM, on the other hand, produces instant
results with no long-term benefits.

Summary

   Search engine marketing demands that every B2C website owner plan and work
out an effective SEO and SEM strategy. Typically, a search engine friendly
website will achieve higher rankings over a period of time; a significant
improvement may be noticed only after at least three months. This is not
promising to a website owner in terms of return on investment. On the
contrary, a good SEM plan can bring new leads to the website instantly and
provide a higher return on investment, at least until the website achieves a
significant ranking in organic search engine results. Achieving the perfect
harmony between SEO and SEM will produce optimum ROI for the website owner.

   Due to evolving nature of search technology, many articles available on the
subject may be outdated. Some articles may be misleading which might flag a
website as search engine spam. This manuscript provides a systematic approach
to the subject. Chapter 2 has highlighted the significance of improving
ranking not only in the most popular search engine but also in other major
search engines. Chapter 3 discussed search engine optimization strategies.
Following the
guidelines in this chapter will ensure a good ranking in search engine results.
These guidelines have to be followed from the inception of the website
through regular maintenance tasks. Chapter 4 focused on major search engine
spamming techniques that have been used to mislead search engines. Most of
these techniques have been identified by popular search engines as spam.
Sites which are penalized should expect to lose at least one month of search
engine traffic, though a one-month period is a highly optimistic estimate.
Chapters 1 through 5 served as a guideline for new and experienced
webmasters. Experienced webmasters can use this information to improve
rankings. New webmasters may have to recruit a search engine optimization
firm to help improve rankings, but will be equipped to ask the "how and why"
of the trade. Many SEO firms incorporate spamming strategies to boost
rankings; these produce short-term results and will consequently get one's
website penalized. Beware of such SEO firms. Last but not least, SEO results
take time to show effect. Rankings tend to increase as the website grows in
content-rich pages and collects relevant inbound links. Chapter 5 also served
as an introduction to SEM. Search engine marketing is a fast and effective
way to gain targeted visitors. Caution must be exercised about bid jamming
and click fraud.

Conclusions

   No two search engine relevancy algorithms are the same. Every search engine
company tries to achieve a distinct identity by providing different results to the
same search query. Only 1.1% of the first page results of popular search engines
are identical (Chapter 2; Section – Overlap Analysis). Though Google is currently
deemed as the most popular search engine, there is no assurance that Google will
maintain its popularity in the future. Hence, it is advisable to focus on
achieving higher rankings not only in Google but also in other prominent
search engines.
Additionally, 87.2% of search engine users exhibit loyalty to their favorite search
engine (Chapter 1; Section – Significance of search engines). Achieving top
rankings in popular search engines will provide exposure to unique users. These
statistics are applicable to both SEO and SEM. An effective SEO strategy will be
to focus on optimization taking into consideration the algorithms of at least
a few popular search engines. Yahoo! Search and MSN Search are considered to
adopt a content-based relevancy algorithm, whereas Google has a
popularity-based approach. Figure 8 (Chapter 2; Section – Overlap analysis)
points out this difference between the relevancy algorithms. Achieving higher
rankings in Yahoo! Search and MSN Search may be governed by higher quality
content, a factor controlled by the website owner. On the other hand, Google
determines ranking by link popularity, over which the website owner has
limited control. As a consequence, the website owner may notice an
improvement in Yahoo! and MSN rankings in a relatively shorter period than in
Google.

   Figure 4 (Chapter 2; Section – Search engine relationship chart) demonstrates
the reach of Google Adsense and Yahoo! Search Marketing. Google Adsense and
Yahoo! Search Marketing provide paid listings not only to many independent
publishers but also to almost every popular search engine. It is necessary to
build an SEM campaign devoting equal attention to both Google Adsense and
Yahoo! Search Marketing, so as to reach search users whichever popular search
engine they are loyal to. SEM funds should be divided between Google Adsense
and Yahoo! Search Marketing. Keyword research is of utmost importance:
identifying new keywords and shortlisting existing keywords can be an
empowering factor in gaining new leads.

   There is a psychological difference between organic and inorganic search
results. The user knows that inorganic search results are based on the CPC
rate paid by the advertiser and hence may not lead to what the user is
searching for. This psychological difference may trigger the search user to
give higher priority to organic search results, which are unbiased, over
inorganic search results. This factor signifies the relative importance of
SEO.

Recommendations

   Search engine algorithms are constantly updated by search engine companies
to provide better search results to users and to punish spammers. This
constant evolution makes SEO a very volatile topic. The only alternative is
to keep oneself abreast of the latest information available about popular
search engines. Unlike most other research topics, this one also imposes time
restrictions: identifying optimization strategies and implementing them in
the shortest possible time to improve rankings is as important as keeping on
top of the spam policies specified by popular search engines. Any
optimization technique that improves rankings at present may be classified as
spam in the future. The website owner should be on guard to clean up the
website before the site is indexed again in such a scenario. Most search
engine spiders index websites every 15 to 30 days.

   www.searchenginewatch.com is an industry authority on search engine
marketing. Keeping up with its discussions and regularly published articles
will help one improve knowledge and incorporate new strategies in search
engine optimization and search engine marketing.

Suggestions and Future Research

   Internet marketing is a blossoming field of study. Future research in this area
should not only focus on search marketing but also on other marketing
techniques. Email marketing is a tried and successful approach to draw a vast
number of targeted visitors in a very short time. Laws on email spamming are
stringent and impose heavy penalties on the spammer; email spamming, unlike
search engine spamming, is governed by the Federal Trade Commission
(www.ftc.gov). Continuous research is being carried out to identify effective
advertising media formats. Advertising media formats started as static images
during the infancy of the internet. Presently, online advertising media
formats have achieved speech and motion capabilities. Banner advertising,
which was very popular at one time, is now regarded as one of the least
effective means of online advertising. This is due to a growing tendency of
internet users to unconsciously ignore banner advertisements, referred to as
banner blindness. For a new website, a single email with an effective image
or movie forwarded by the website owner to acquaintances can have the same
effect as a nuclear chain reaction. Only in this case, the victims are
targeted visitors who are excited by the contents of the email and curious to
know more about the website. Interesting emails may be forwarded by
recipients to their acquaintances. This strategy can bring instant visitors
to the website in vast numbers at absolutely no cost. This model is
constantly abused in chat rooms, generally by owners of pornographic and
dating websites, because chat rooms are not under supervision as strict as
that of email. Moreover, the abuser uses a deceptive identity that might
psychologically prompt a random user to initiate communication with the
abuser, who then responds with the intended material.

   In conclusion, the basis of marketing is to develop new and effective means of
captivating the target audience. There are no boundaries to the research that can
be performed on this subject. This not only involves developing a stunning
communication tool but also understanding the thought process of the user.




                                REFERENCES



[1] Lee Underwood, "A Brief History of Search Engines";
www.webreference.com/authoring/search_history/

[2] GVU's 10th WWW User Survey Graphs, "How Users Find Out About WWW
Pages", 1998;
www.gvu.gatech.edu/user_surveys/survey-1998-10/graphs/use/q52.htm

[3] iProspect, "iProspect Search Engine User Attitudes", May 2004;
www.iprospect.com/premiumPDFs/iProspectSurveyComplete.pdf

[4] Bruce Clay, Inc., "Search Engine Relationship Chart", 2005;
www.bruceclay.com/searchenginerelationshipchart.htm

[5] Danny Sullivan, "comScore Media Metrix Search Engine Ratings", Aug 2005;
www.searchenginewatch.com/reports/article.php/2156431

[6] Sergey Brin and Lawrence Page, "The Anatomy of a Large-Scale Hypertextual
Web Search Engine", Proceedings of the 7th World-Wide Web Conference, 1998

[7] Zoltan Gyongyi, Hector Garcia-Molina and Jan Pedersen, "Combating Web
Spam with TrustRank", Proceedings of the 30th VLDB Conference, 2004

[8] Dogpile.com, University of Pittsburgh and Pennsylvania State University,
"Different Engines, Different Results", Aug 2005

[9] Kevin Curran, "Tips for achieving high positioning in the results pages
of the major search engines", 2004

[10] Insite by Lycos, "Search Engine Marketing Guide";
insite.lycos.com/tutorial.asp

[11] Searchenginewatch.com, "Ten Tips to the Top of Google", Apr 2003;
www.searchenginewatch.com/searchday/article.php/2198931

[12] Wayne Hulbert, "Keyword Density: SEO Considerations", May 2005;
www.webpronews.com/news/ebusinessnews/wpn-
4520050501KeywordDensitySEOconsiderations.html

[13] Chris Sherman, "131 (Legitimate) Link Building Strategies", Jul 2002;
www.searchenginewatch.com/searchday/article.php/2160301

[16] Alexa, "Top Sites";
www.alexa.com/site/ds/top_500

[17] Danny Sullivan, "Major Search Engines and Directories", Apr 2004;
www.searchenginewatch.com/links/article.php/2156221

[18] Danny Sullivan, "Other Global Search Engines", Oct 2001;
www.searchenginewatch.com/links/article.php/2156281

[19] Danny Sullivan, "Community-Based Search Engines", Dec 2004;
www.searchenginewatch.com/links/article.php/2156101



