SEARCH ENGINE OPTIMIZATION AND MARKETING



by



Binoy Varghese



A thesis submitted in partial fulfillment of

the requirements for the degree of





Master of Science in Computer Science





California State University, Chico





Fall 2005







Approved by ___________________________________________________

Chairperson of Supervisory Committee



__________________________________________________

__________________________________________________

__________________________________________________



Program Authorized

to Offer Degree _________________________________________________





Date __________________________________________________________

CALIFORNIA STATE UNIVERSITY, CHICO



ABSTRACT



SEARCH ENGINE OPTIMIZATION AND MARKETING



by Binoy Varghese



Chairperson of the Supervisory Committee: Professor Anne Keuneke

Department of Computer Science





Search engines have become the most important tool powering Internet

Marketing. The liberty that search engines offer to the user to find vast amounts

of information at the click of a button has made this software the connecting

point for internet surfers. The visibility of websites is directly related to their

rankings in search engine results. Higher rankings in search engine results ensure

highly targeted visitors resulting in maximum sales. Search engine optimization

and search engine marketing are powerful strategies that are constantly evolving

with new developments in search engine technology. They promise improved

visibility in search engine results.



Search engine optimization is concerned with the construction of websites, the

technical aspects of a website that search engines tend to like. This is generally a

one-time process incorporated during the development phase of a website. Search

engine optimization (SEO) can improve ranking within organic search results

over a period of time.



Search engine marketing, on the other hand, is an ongoing process. Inorganic

contextual search result rankings can be purchased from leading search engine

companies. There are recurring costs involved with search engine marketing

(SEM) and no long term benefits.

Both SEO and SEM are key strategies for any business to consumer (B2C)

website to optimize profits and gain new leads.

TABLE OF CONTENTS







List of Figures ..................................................................................................................... iii

List of Tables....................................................................................................................... iv

Chapter I: Introduction ...................................................................................................... 1

Significance of search engines ................................................................................... 1

Statement of Problem ................................................................................................. 2

Purpose of Study .......................................................................................................... 2

Research Conducted .................................................................................................... 3

Limitations of Study .................................................................................................... 4

Internet Marketing Terms .......................................................................................... 4

Internet advertising pricing models .......................................................................... 5

Online advertisement media formats ....................................................................... 9

Roadmap to forthcoming chapters ......................................................................... 11

Chapter II: Search Engines.............................................................................................. 12

Crawler based search engines .................................................................................. 12

Web directories ........................................................................................................... 13

Search engine relationship chart .............................................................................. 13

Organic and inorganic searches ............................................................................... 15

Search engine user trends ......................................................................................... 16

Page Rank and Trust Rank Algorithms ................................................................. 17

Overlap analysis .......................................................................................................... 20

Chapter III: Search Engine Optimization .................................................................... 23

Search engine optimization ...................................................................................... 23

Factors affecting SEO ............................................................................................... 23

Key terms..................................................................................................................... 25

Title tag ........................................................................................................................ 26

Meta tags ...................................................................................................................... 27

Body text ...................................................................................................................... 28

Menu bar ...................................................................................................................... 29

Keyword density analysis .......................................................................................... 30

HTML code validation.............................................................................................. 30

Absolute vs relative URL.......................................................................................... 30

Tables in HTML code ............................................................................................... 31

Sitemap ......................................................................................................................... 31

Inbound links .............................................................................................................. 33

Outbound links........................................................................................................... 35

Reciprocal linking and link building ....................................................................... 35

Search engine friendly URL ..................................................................................... 37

Domain name ............................................................................................................. 39






404 Error page ............................................................................................................ 42

301 Redirection........................................................................................................... 42

Robots.txt meta file.................................................................................................... 43

Website submission ................................................................................................... 43

Visitor analysis ............................................................................................................ 45

Frameset....................................................................................................................... 45

SEO Roadmap............................................................................................................ 46

Chapter IV: Search Engine Spam .................................................................................. 47

Search engine spam.................................................................................................... 47

Consequences of spamming .................................................................................... 47

Spamming techniques ............................................................................................... 48

Hidden text .................................................................................................................. 48

IP Cloaking .................................................................................................................. 49

Doorway pages ........................................................................................................... 49

Pagejacking .................................................................................................................. 50

Domain duplication ................................................................................................... 50

Excessive popups ....................................................................................................... 50

Inflating link popularity............................................................................................. 50

ALT stuffing ............................................................................................................... 51

Link farming ................................................................................................................ 51

FFA ............................................................................................................................... 51

Mousetrapping ............................................................................................................ 51

Chapter V: Search Engine Marketing ............................................................................ 53

Search engine marketing ........................................................................................... 53

Cost per visitor model ............................................................................................... 53

Malpractices in SEM Industry ................................................................................. 53

Google and Yahoo Search Marketing .................................................................... 55

Chapter VI: Summary, Conclusions and Recommendations ................................... 57

Introduction ................................................................................................................ 57

Summary ...................................................................................................................... 57

Conclusions ................................................................................................................. 58

Recommendations ..................................................................................................... 60

Suggestions and Future Research............................................................................ 60

References........................................................................................................................... 62










LIST OF FIGURES







Number Page





Figure 1: Internet advertising pricing models .............................................................. 6

Figure 2: State diagram - Internet Marketing .............................................................. 7

Figure 3: Media formats ................................................................................................... 9

Figure 4: Search engine relationship chart [4] ............................................................ 14

Figure 5: Organic and Inorganic search results - Google snapshot....................... 15

Figure 6: Percent share of searches conducted by U.S. surfers in July

2005 [5] ............................................................................................................... 16

Figure 7: Percent share of searches – Trend [5] ........................................................ 17

Figure 8: Overlap analysis of imageblowout.com - Googlerankings.com

snapshot .............................................................................................................. 21

Figure 9: Relationship between search engine results and Title tag &

Description meta tag ........................................................................................ 26

Figure 10: Search results for keyword - betterbody - Google snapshot ............... 39

Figure 11: Keyword density analysis of betterbody.de............................................. 41

Figure 12: Search engine optimization roadmap ....................................................... 46










LIST OF TABLES







Number Page





Table 1: Rates of various pricing models ...................................................................... 8

Table 2: In-Page advertisement formats ..................................................................... 11










Chapter 1





INTRODUCTION





Significance of search engines



The significance of search engines on the internet is analogous to that of

operating systems for computers. The inception of search engine technology

began with the creation of ARCHIE [1] by Alan Emtage in 1990. Today, search

engines have evolved as the most crucial business application on the internet due

to their enormous potential to provide highly targeted consumer traffic to business

to consumer (B2C) websites. From a marketing perspective it is more effective

for an online business to be listed in the first three pages of search engine results

than any other form of online marketing.



A few observations that highlight the relevance of search engines are:



1. 84.8% of internet users find websites through search engines [2]



2. 81.7% of internet users read only the first three pages of search results [3]



3. 87.2% of internet users use their favorite search engines to launch their

queries [3]



These observations are based on surveys conducted by the Graphics, Visualization

and Usability Center at Georgia Institute of Technology (www.gvu.gatech.edu),

iProspect (www.iprospect.com), webSurveyor (www.websurveyor.com),

Stratagem Research (www.strategeminc.com) and Survey Sampling International

(www.surveysampling.com). They emphasize key aspects in the behavior of

internet users with regard to search engines:








1. Dependence on search engines



2. Trust in search engine results



3. Loyalty to search engines



Search engines have thus become a gateway to gain targeted visitors so much

so that search engine optimization and search engine marketing have become the

focal point of internet marketing.



Statement of Problem



As more and more web pages are appended to the Internet, there is a constant

need for any B2C merchant to stand out among competitors in order to attract

more and more consumers. Since an online merchant has to face increasing

competition every day, the most effective strategy to maximize exposure is to

achieve top rankings in search engine results. Search engines have a psychological

advantage over any other channel of online advertising: the search engine user

is actively looking for information. There is no better time to present a product

or service to a user than when that user is searching for it. This

psychological factor gives search engine listings a high return on investment

(ROI). This study attempts to identify various techniques to improve the ranking of a

website in search engine results.



Purpose of Study



Search engine relevancy algorithms are proprietary. Due to the mysterious

nature of search engine relevancy algorithms, the process of achieving higher

rankings in organic search engine results cannot be determined using a

mathematical or mechanical model. Different search engines use different factors

to determine relevancy of a webpage with respect to a search query. Search








engine technology is in its evolving phase; hence search engine companies are

constantly classifying various techniques that improve ranking of a web site in

search engine results as spam. Though major search engine companies do specify

what they consider as spam, many of the minute technical details cannot be

ascertained by reviewing these specifications. A website is penalized if identified

as a spamming website. As a consequence, the website may be assigned a lower

ranking or even removed from the index. The purpose of this study is to identify

techniques which may improve the ranking of a website in organic search engine

results, specify spamming strategies which should be avoided and introduce

search engine marketing paying close attention to publisher and competitor

malpractices.



Research Conducted



Search engine optimization strategies are based on assumptions which are

verified using trial and error. There is no literature on the subject which can claim

that following a set model can result in a certain ranking in the results of a

specific search engine. The research work for this study was conducted by

reviewing relevant literature and applying these techniques to derive a refined

process which can be incorporated into the software development stages of a

website project.



The research conducted enabled the author to gain more exposure to the

relevancy algorithms of Google (www.google.com), Yahoo! Search

(search.yahoo.com) and MSN Search (search.msn.com) [Chapter II: Search

Engines]. A search engine optimization process has been outlined by the author

in Chapter III. This process classifies different optimization schemes into

techniques that may be applied to individual web pages and the entire website.

Search engine spamming strategies have been listed in Chapter IV. These

strategies are identified by major search engines as spam and should be avoided

to prevent penalties. Chapter V focuses on bid jamming and click fraud, which have

become prevalent in the search engine marketing industry, and raises awareness

of these malpractices.



Limitations of Study



This study serves as a guideline to search engine optimization and search

engine marketing. The inferences derived in Chapter II, Chapter III, Chapter IV,

Chapter V and Chapter VI are based on independent research and data gathering.

Due to the proprietary nature of search engine relevancy algorithms, the process

outlined may not incorporate all possible optimization and spamming techniques.

This manuscript specifies guidelines pertaining to these techniques which are

valid as of October 2005. As search engine technology evolves, many of the

specified optimization techniques may be reclassified as spam by search engine

companies. The study should be used to enhance knowledge relative to the

subject so that the reader can identify and concentrate on specific optimization

techniques.



Internet Marketing Terms



Before one considers the potential of search engines as a marketing tool, one

should become acquainted with internet marketing. Internet marketing like any

other marketing is based on the basic principle of marketing – Make a sale to the

consumer.



The key terms associated with internet marketing are:



1. Visitor: Internet users who visit a website



2. Targeted visitor: Visitors to a website who are interested in what the site

has to offer








3. Publisher: Any individual/organization that offers advertising space on

its site



4. Advertiser: Any individual/organization interested in buying advertising

space on websites



5. Affiliate: A website that provides internet traffic to another website in

return for a commission on sales



6. URL: Uniform resource locator is the global address of documents and

other resources located on the internet



7. CTR: Click through rate is the number of clicks divided by the total

number of impressions of the advertisement over a period of time



8. Above the fold: The portion of the webpage that is viewable in a browser

without scrolling



9. Affinity Marketing: Marketing strategies based on established buying

patterns



10. Click Tracking: The process of tracking and auditing visitors referred by

the publisher's website to the advertiser's website. This is done by setting

a cookie on the visitor's browser that records the publisher, the link and

the payment rates



Internet advertising pricing models



At the lowest level, internet marketing can be classified into three pricing

models. They are CPM – Cost per mille, CPC – Cost per click, and CPA – Cost

per acquisition. These models correspond to behaviors exhibited by visitors on the

publisher's website.



1. CPM or Cost per mille is the cost per 1000 impressions of an advertisement

displayed on the publisher's website



2. CPC or Cost per click is the cost paid by the advertiser when a visitor to

the publisher's website clicks on the advertisement and arrives at the

advertiser's website, irrespective of the number of impressions displayed. The

assumption is that a visitor who clicks on the advertisement is interested in

visiting the advertiser's website. These visitors are targeted visitors









Figure 1: Internet advertising pricing models










3. CPA or Cost per acquisition is generally a commission paid by the

advertiser for behavior on the advertiser's website by a targeted visitor

resulting in an action desired by the advertiser. This is a direct marketing

model and takes different forms, the most popular of which are:



i. CPL or Cost per lead pays a flat fee for obtaining a consumer lead,

such as signing up for a newsletter program



ii. CPS or Cost per sale pays a commission based on a transaction made

by the consumer, such as a purchase









Figure 2: State diagram - Internet Marketing










A comparison between the three pricing models from the advertiser's

viewpoint leads to the conclusion that CPM is least effective and CPL is most

effective. The reason is that the occurrence of a successful transaction on the

advertiser's website is independent of the number of times a URL is displayed on

the publisher's website. For a publisher, it may seem that CPM offers the highest

return on advertising space. On the contrary, CPM pays less than both CPC

and CPL over a significant period of time.









Table 1: Rates of various pricing models







Table 1 illustrates the rates paid by various advertisers and real time statistics

of imageblowout.com. In May 2005, imageblowout.com displayed 22,522

impressions of Google AdSense content (www.google.com/adsense). With a

payout rate of US$ 0.09 per 1000 impressions, the earnings for CPM would have

been (22,522 / 1000) * 0.09 = US$ 2.03. Instead, imageblowout.com made US$

7.82 in CPC earnings with a click through rate of 0.5% (112 clicks). Assuming

that imageblowout.com participated in the allposters.com CPS program and referred

5 successful transactions out of 112 referrals, each of US$ 8.99, at a commission

rate of 30%, then CPS earnings would have been (8.99 * 5) * 0.3 = US$ 13.48. The

assumed referral rate is (5 / 22,522) * 100 = 0.02%. The above statistics make a

practical comparison between CPM, CPC and CPA programs and indicate that CPA

has a tendency to generate higher profits for the publisher.
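
The arithmetic in this comparison can be reproduced with a short script. The following is a minimal sketch in Python, not part of the original study: the function names are illustrative assumptions, and the factor 0.3 in the CPS formula is treated here as a commission rate.

```python
# Illustrative helpers (hypothetical names) for the pricing-model arithmetic.

def cpm_earnings(impressions, rate_per_thousand):
    """Earnings under CPM: payout per 1000 impressions."""
    return impressions / 1000 * rate_per_thousand

def click_through_rate(clicks, impressions):
    """CTR as a percentage: clicks divided by impressions."""
    return clicks / impressions * 100

def cps_earnings(sales, price, commission):
    """Earnings under CPS: commission on each referred sale."""
    return sales * price * commission

# Figures quoted in the text for imageblowout.com, May 2005.
impressions, clicks = 22_522, 112

print(f"CPM earnings: US$ {cpm_earnings(impressions, 0.09):.4f}")
print(f"CTR: {click_through_rate(clicks, impressions):.2f}%")
print(f"CPS earnings: US$ {cps_earnings(5, 8.99, 0.3):.3f}")
print(f"Referral rate: {5 / impressions * 100:.3f}%")
```

The values are printed with extra decimal places so that the rounding used in the text can be checked against the unrounded results.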



Online advertisement media formats









Figure 3: Media formats







All pricing models use similar media formats for advertisements. The cost

paid by the advertiser depends on the following factors:



1. File size



2. Dimensions of the media








3. Location on the publisher's webpage; e.g. Above the fold



4. Nature; e.g. Static, Dynamic, Movie, Talking media



The different types of media formats used in internet marketing are discussed

below:



Window Formats



1. Pop-Ups are generally 720x300 px windows which automatically open on

top of the primary browser window. This can be annoying to the visitor if

the content is not relevant to the visitor's interest.



2. Pop-Unders are generally 720x300 px windows which automatically open

under the primary browser window without distracting the focus of the

internet user from the primary window.



3. Interstitials are 728x600 px web pages launched between two pages which

the visitor is navigating. This page opens within the primary browser

window, thus capturing the full attention of the user.



4. InVues (250x250 px) slide into the center of the primary browser window

after the main webpage loads completely. This is a modified version of

the pop-up window but less intrusive.



In-Page formats










Table 2: In-Page advertisement formats







Roadmap to forthcoming chapters



Chapter II: Search Engines provides background information related to search

engines, directories, relationship between search engines, search engine user

trends, PageRank algorithm, TrustRank algorithm and overlap analysis of popular

search engine results. Chapter III: Search engine optimization focuses on

improving web page rankings in search results by fine tuning contents of the web

page. Chapter IV: Search engine spam points out factors to avoid while

improving ranking in search engine results. Chapter V: Search engine marketing

gives a brief overview of Pay per click search engine marketing. Chapter VI:

Conclusion and Recommendation summarizes the guidelines discussed in the

study.










Chapter 2





SEARCH ENGINES





Crawler based search engines



Search engines are software programs that provide users with the URLs of web

pages relevant to the keyword used to perform the search.



A crawler based search engine consists of:



1. Spider/Crawler: visits a web page, stores a mirror image of all the

information gathered from the web page along with the visit date and time, and

follows URLs to other web pages within the site and on other web sites. The

mirror copy is called the cached page. The spider returns to all web pages

previously crawled to maintain up-to-date information about these pages.



2. Indexer: a catalog which consists of copies of all web pages crawled by

the spider with a date and time stamp. There is generally a delay between

spidering a web page and adding it to the index. The search engine results

are derived from the index and hence may not reflect the spidered web

pages until the index is updated.



3. Search software: searches through all pages recorded in the index in

response to a query and returns the URLs of related web pages ranked in an

order determined by the search engine relevancy algorithm.
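
The three components listed above can be illustrated with a toy example. The following is a minimal sketch in Python under simplifying assumptions: the "web" is a hypothetical in-memory dictionary (no network access), the URLs and page texts are invented, and the search step returns unranked matches, since real relevancy algorithms are proprietary.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.example/": ("search engine basics", ["a.example/seo", "b.example/"]),
    "a.example/seo": ("search engine optimization tips", []),
    "b.example/": ("cooking recipes", ["a.example/"]),
}

def crawl(seed):
    """Spider: visit pages breadth-first from the seed and cache their text."""
    cache, queue = {}, deque([seed])
    while queue:
        url = queue.popleft()
        if url in cache or url not in WEB:
            continue  # already crawled, or the link leads nowhere
        text, links = WEB[url]
        cache[url] = text      # the "cached page"
        queue.extend(links)    # follow URLs found on the page
    return cache

def build_index(cache):
    """Indexer: map each word to the set of URLs whose cached text contains it."""
    index = {}
    for url, text in cache.items():
        for word in text.split():
            index.setdefault(word, set()).add(url)
    return index

def search(index, query):
    """Search software: URLs containing every query word, sorted alphabetically."""
    sets = [index.get(word, set()) for word in query.split()]
    return sorted(set.intersection(*sets)) if sets else []

cache = crawl("a.example/")
index = build_index(cache)
print(search(index, "search engine"))
```

Because the index is built from the cache rather than from the live pages, a change to `WEB` is not visible in search results until `crawl` and `build_index` are run again, which mirrors the indexing delay described above.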



A relevant search engine result may be defined as the set of URLs displayed in

response to a user query of which the user clicks one or more URLs. Relevancy

of search engine result is relative to the user as two users querying the same








search engine with the exact same keyword may be searching for very different

information.



Web directories



Most search engine spiders use web directories as the seed or starting point for

their crawl. A web directory is a human compiled listing of URLs to thousands of

websites categorized into different groups. The best-known directories are

DMOZ (www.dmoz.org) and Yahoo! Directory (dir.yahoo.com).



DMOZ is the largest and most comprehensive human-edited directory on the

internet. Listing a website in DMOZ is free. It takes around 3 weeks to 6 months

for a listing to be approved. If the submission is improper there is a good chance

that the listing will be denied. Yahoo! Directory is a paid directory with an annual

recurring cost of US$ 299 for commercial sites and US$ 600 for sites with adult

content. Yahoo! also provides a free listing feature, but there is no guarantee

that the listing will be accepted. Search engines view directory

listings as a vote of confidence in web sites [9]. Being listed in either of these

directories is crucial since most popular search engines spider these directories,

so a website is very likely to be spidered by the search engine if

these directories link to it.



Search engine relationship chart



Figure 4 illustrates the relationship between major search engines and

directories. One can conclude that:



1. DMOZ acts as the seed for Lycos (www.lycos.com), HotBot

(www.hotbot.com), AOL Search (search.aol.com), Teoma

(www.teoma.com), Google, iWon (www.iwon.com) and Netscape Search

(search.netscape.com)






2. Yahoo! directory acts as the seed for Yahoo! Search and AltaVista

(www.altavista.com)









Figure 4: Search engine relationship chart [4]










3. Google Adwords (adwords.google.com) provides paid search results to

Google Search, HotBot, AOL Search, Lycos, Ask Jeeves

(www.askjeeves.com), Teoma, iWon and Netscape Search



4. Yahoo! Search marketing (searchmarketing.yahoo.com) provides paid

search results to Yahoo! Search, AllTheWeb (www.alltheweb.com),

AltaVista and MSN Search



Organic and inorganic searches



Key terms that one may encounter in the study of search engine results are:



1. Organic search results: Non sponsored results returned by a search

engine in response to a user query. The ranking of the results is

determined by the relevancy algorithm of the search engine.









Figure 5: Organic and Inorganic search results -

Google snapshot










2. Inorganic search results: Results returned by a search engine where

ranking of results is determined by the cost paid by the advertiser to the

advertising network.



Search engine user trends



Figures 6 and 7 are compiled from data collected by the comScore Media Metrix

(www.comscore.com/metrix/) qSearch service, which monitors the web activities of

1.5 million English-speaking internet surfers worldwide. Both figures highlight the

significance of Google, Yahoo! Search and MSN Search as search engines. Figure 7

(percent share of searches, trend) indicates that Google is the most popular search

engine and that its popularity increased from January 2005 to July 2005. The

popularity of Google makes it necessary to understand its proprietary search

algorithms.



Figure 6: Percent share of searches conducted by U.S. surfers in July 2005 [5]









Figure 7: Percent share of searches – Trend [5]







Page Rank and Trust Rank Algorithms



Google determines rankings of its search result listings using the PageRank and

TrustRank algorithms. It is important to understand these algorithms since the

higher one's website ranks in search engine results, the higher the potential to gain

more targeted visitors.










PageRank [6]: The rank of a webpage in organic search results of Google is

determined by PageRank.



PR(A)=(1-d) + d[PR(T1)/C(T1) + … + PR(Tn)/ C(Tn)]



where



PR(A) is Page Rank of web page A



T1…Tn are web pages that point to page A



d is a damping factor which can be set between 0 and 1. It is usually set to 0.85



C(A) is the number of links going out from web page A



PR(A) is based on the concept that a random surfer who is given a web page A

keeps clicking on links at random until he gets bored. The surfer never hits the

back button. On getting bored, the random surfer requests a random web page.

The probability that a surfer visits a page A is PR(A). The damping factor d is the

probability that, at each page, the surfer keeps following links; with probability

(1-d) the surfer gets bored and requests another random web page. A variation on

the PageRank calculation is that different damping factors may be assigned to the

different pages T1…Tn which link to page A.



One can conclude from the PageRank equation that:



1. The more inbound links a web page has, the higher the PageRank



2. It is better to have inbound links from a web page that has a high

PageRank and few out links than from a web page with a high PageRank and

many out links,

e.g. PR(X) = 4 and C(X) = 5 then d[PR(X)/C(X)] = 0.8d

PR(Y) = 8 and C(Y) = 100 then d[PR(Y)/C(Y)] = 0.08d



PageRank forms a probability distribution over web pages, so the sum of all web
pages' PageRanks will be 1. PR(A) can be calculated using an iterative algorithm,
and corresponds to the principal eigenvector of the normalized link matrix of the
web [6].



PR(A1) + PR(A2) + PR(A3) + … + PR(An) = 1



PR(A) = (1-d) if web page A has no inbound links.
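The iterative calculation can be illustrated with a minimal Python sketch. It applies the PageRank equation exactly as written above to a small hypothetical link graph; the function name, the graph and the stopping tolerance are assumptions made for illustration and do not describe Google's implementation. With the equation in this form, the values add up to the number of pages whenever every page has at least one outbound link, so the sketch also divides by the total to obtain shares that sum to 1.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) until the values settle.

    links maps each page to the list of pages it links to (hypothetical data).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {page: 1.0 for page in pages}            # arbitrary starting guess
    for _ in range(max_iter):
        new_pr = {page: 1.0 - d for page in pages}
        for source, targets in links.items():
            if not targets:
                continue                          # page without outbound links
            share = d * pr[source] / len(targets)  # d * PR(T) / C(T)
            for target in targets:
                new_pr[target] += share
        change = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if change < tol:
            break
    return pr


# Hypothetical three-page web: A links to B and C; B and C link to each other.
example = {"A": ["B", "C"], "B": ["C"], "C": ["B"]}
ranks = pagerank(example)
total = sum(ranks.values())
shares = {page: value / total for page, value in ranks.items()}
```

In this example page A has no inbound links, so its value settles at (1-d), while B and C collect the rank passed on by their inbound links.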



Hundreds of web pages are added to the World Wide Web every moment. Since the
sum of the PageRank of all web pages on the WWW is a constant, i.e. 1, the
PageRank of each web page is constantly updated as more pages are added, to
accommodate the PageRank of the new web pages. Assume that the PageRank of a
web page with no inbound links, (1-d), is approximately 0. As inbound links
increase the PageRank of a web page, one can conclude that outbound links
decrease the PageRank of a web page. This decrease in the PageRank of a web page
due to outbound links is called PageRank Leak.



To ensure a high PageRank it is necessary that:



1. A web page should have a high number of inbound links

2. A web page should have a low number of outbound links



The PageRank algorithm determines the importance of a web site by counting
the number of inbound links. This concept can be manipulated by artificially
inflating the number of inbound links to a web page. PageRank also does not
incorporate the quality of the web page in its calculations. Hence Google is
developing the TrustRank algorithm and registered the trademark for
TrustRank on March 16, 2005.



TrustRank [7]: According to Gyongyi, Garcia-Molina and Pedersen, the
proposed algorithms for TrustRank rely on the PageRank algorithm. TrustRank
takes into account not only the inbound links to a web page but also the quality
of the web page. To determine the quality of web pages, a panel of human experts
identifies a set of reputable web pages that acts as the seed for the spider. The
algorithm is based on the empirical observation that good pages seldom point to
bad ones.



One can conclude that a web page can achieve higher TrustRank if:



1. Reputable (good) web pages link to the web page



2. The web page does not link to any bad web pages



3. The web page does not mislead the search engine or employ search

engine spam
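The idea of propagating trust from a hand-picked seed set can be sketched in Python as a PageRank-style iteration in which the (1-d) share is given only to the seed pages. This is a simplified illustration of the approach described in [7], not Google's implementation; the function name, the graph and the seed set are hypothetical.

```python
def trust_rank(links, seeds, d=0.85, iterations=50):
    """Propagate trust from seed pages along outbound links.

    links maps each page to the pages it links to; seeds is the set of
    pages judged reputable by human experts (hypothetical data here).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    seed_share = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_share)
    for _ in range(iterations):
        new_trust = {p: (1.0 - d) * seed_share[p] for p in pages}
        for source, targets in links.items():
            if not targets:
                continue
            share = d * trust[source] / len(targets)
            for target in targets:
                new_trust[target] += share
        trust = new_trust
    return trust


# Hypothetical web: a trusted seed links to a good page, while two spam
# pages only link to each other and receive no links from trusted pages.
web = {"seed": ["good"], "good": [], "spam": ["spam2"], "spam2": ["spam"]}
trust = trust_rank(web, seeds={"seed"})
```

Because no reputable page links to the two spam pages, they receive no trust at all, however many links they exchange with each other.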



Overlap analysis



A study conducted by Dogpile.com in collaboration with the University of
Pittsburgh and Pennsylvania State University in April 2005 and July 2005 reveals
that only 1.1% of 485,460 first page search results were the same across Google,
Yahoo!, MSN Search and Ask Jeeves [8]. The study of search engine results for a
given keyword across different search engines at the same time is termed
overlap analysis and forms the basis of meta search engines like Dogpile.com.
Meta search engines send search queries to popular search engines and display
their results together on a single page. Since Google, Yahoo! Search and
MSN Search are significant in terms of percent share of search queries answered,
it is important to optimize web pages to achieve top rankings in all three
search engines.
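The overlap measure itself is simple to compute. The following Python sketch assumes that overlap is measured as the share of unique first page results returned by every engine; the function name and the result lists are hypothetical and are not taken from the study [8].

```python
def overlap_share(results_by_engine):
    """Return the fraction of unique result URLs that every engine returned."""
    url_sets = [set(urls) for urls in results_by_engine.values()]
    all_urls = set().union(*url_sets)
    if not all_urls:
        return 0.0
    shared = set.intersection(*url_sets)
    return len(shared) / len(all_urls)


# Hypothetical first page results for one keyword on three engines.
first_page = {
    "Google": ["a.com", "b.com", "c.com"],
    "Yahoo! Search": ["a.com", "d.com", "e.com"],
    "MSN Search": ["a.com", "c.com", "f.com"],
}
share = overlap_share(first_page)
```

Here six distinct URLs are returned in total and only one of them appears on all three engines, so the overlap is one sixth.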



Figure 8 is a snapshot of an overlap analysis of Google, Yahoo! Search and
MSN Search conducted on August 25, 2005 at 18:50 EST for the keyword
“free image library” and URL pattern “imageblowout.com”.









Figure 8: Overlap analysis of imageblowout.com -

Googlerankings.com snapshot










1. Yahoo! Search displayed “imageblowout.com” on page 1 of search results

2. MSN Search displayed “imageblowout.com” on page 2 of search results

3. Google displayed “imageblowout.com” on page 55 of search results



The substantial difference in ranking between Google search results and
Yahoo! Search and MSN Search results is due to the proprietary relevancy
algorithms used by these search engines. Yahoo! Search and MSN Search use
content based relevancy algorithms. The title tag of imageblowout.com is
“Imageblowout – Free Image Library for Commercial Use”. This contains an exact
match for the keyword used to perform the search query, hence the higher
rankings in Yahoo! Search and MSN Search.










Chapter 3





SEARCH ENGINE OPTIMIZATION





Search engine optimization



Search engine optimization can be defined as the process of fine tuning a web
page so that it achieves higher ranking in search engine results. There is a very
thin line separating search engine optimization and search engine spam. A web
page should not be optimized to a level where it may qualify as search engine
abuse. If a web page is detected as search engine spam, it may be penalized or
even removed from the search engine index. The website will then neither show up
in search results nor be indexed by a search engine spider until it is added back
to the search engine index.



Factors affecting SEO



Search engine optimization of a website can be broken down into two distinct

groups.



1. Web page optimization



2. Web site optimization



Both categories are inter-related. All factors within each category deserve
equal attention to achieve proper optimization of the site.



Factors which tend to improve ranking of a webpage in search engine results

are:



1. Key Terms








2. Title Tag



3. Meta Tags



4. Body Text



i. Alternative Text in Img Tag



ii. H1-H6 tags



5. Menu bar



6. Keyword density analysis



7. HTML code validation



8. Absolute vs Relative URL



9. Tables in HTML code



Factors which play a role in improving the ranking of the entire website are:



1. Sitemap



2. Inbound links



3. Outbound links



4. Reciprocal linking and link building



5. Search engine friendly URL



6. Domain name








7. 404 Error page



8. 301 Redirection



9. Robots.txt meta file



10. Search engine submission



11. Visitor analysis



Factors that must be avoided during webpage construction:



1. Frameset



These factors are not discussed in any specific order. Each factor is

significant and plays an important role in improving the ranking of a website

in search engine results.



Key terms



Key terms are the queries entered by search engine users to find the information
that they are looking for. Research should be conducted to identify the most and
least used search terms relevant to the web page. Once the key terms are
identified, they should be incorporated into the web page in a manner that does
not constitute abuse of the search engine. Artificially inflating the density of
key terms misleads search engines.



A higher density of key terms in a web page may lead to higher search engine
rankings. It is advisable to purchase a domain name that is identical to a key
term. The domain name of a website is a primary factor used by search engine
relevancy algorithms to rank a website.










A good source to identify key terms is www.wordtracker.com. Wordtracker
gives suggestions based on over 300 million key terms used by users of
Metacrawler.com and Dogpile.com in the past 120 days, with the actual frequency
of each key term and its predicted frequency.



Title tag



The title tag is a very important component used by search engine relevancy

algorithms to determine ranking. The title tag is also used by search engines in the

search result listing. The title tag should incorporate a high frequency key term.

At the same time the title should convey the overall information available on the

web page, so that the user is enticed to click it. It is advisable to have different

titles on different web pages of the same website. The title of a web page should

reflect the overall content on the web page.









Figure 9: Relationship between search engine

results and Title tag & Description meta tag










Meta tags



Meta tags are used to provide information about relevant keywords and the
purpose of the web page. The meta keywords tag was used as a determinant of the
relevancy of a web page by early search engines. Currently very few search
engines consider the meta keywords tag in their relevancy algorithms. With
reference to Figure 9, one can see that the description provided in search results
is the information in the DESCRIPTION meta tag. This information should be
precise to entice a search engine user to click on the link.



A few meta tags that should be paid close attention to while optimizing web
pages are:

1. Robots meta tag specifies instructions to a search engine spider on whether
the owner of the website allows or disallows the spider to index other web
pages that are linked from this page

2. Keywords meta tag is used to indicate keywords relevant to the web page

3. Description meta tag provides information regarding the intended
purpose of the web page



It is worthwhile to spend extra time to have content related keywords and

description on every page rather than specifying identical keywords and

description for the entire website.










Body text



Search engines tend to like unadulterated HTML code. The term adulteration
is used here in the context of embedded JavaScript code, Flash movies and image
files. Search engines do not attempt to read these contents even though they
might contain a significant density of keywords. An example is the logo image on
a website, which in most cases states the domain name and a caption. The caption
may be a keyword oriented phrase similar to the title of the web page, but since
this information cannot be read by search engines, it cannot be used to determine
the relevancy of the web page. Simply put, “What a search engine cannot see does
not exist on the web page”. This may or may not be true for human visitors but is
certainly a rule adhered to by the search engine spider. Flash text can only be
read by FAST Alltheweb.com. None of the other search engines can read Flash text
or follow Flash links [9]. JavaScript code embedded within HTML files is treated
similarly: most search engines ignore JavaScript code and the links within it
[10]. Another factor to consider is the keywords appearing in the “Above the fold”
region of a webpage. The higher the keyword density in this region, the more
relevant the web page is for a given keyword.



With these factors in mind, one can adopt the strategies outlined below to
optimize body text:

1. Include JavaScript code as a separate file. This can be done by referencing
the file in the src attribute of the HTML SCRIPT tag.







2. Minimize usage of Flash movies










3. Always use ALT attribute in IMG tags. The HTML IMG tag is

. This is the common usage

of the tag. For optimization purposes, it might be better to use the tag as

. The objective is to bring

the keyword phrase as close to the beginning of the HTML file so that

the web page can increase the density of the keywords in the “Above the

Fold” region.



Style sheets are incorporated in almost every web page to enhance its visual
appearance. This might entice the visitor, who will find the web page more
appealing than plain HTML, but for a search engine spider, style sheets are
unrelated text in the “Above the Fold” region. The web page content can be
optimized by keeping the style sheet in a separate file and including it with the
LINK HTML tag rather than embedding the style sheet code in the page.





Heading tags also play a very important role in the content of a web page. It is
advisable to embed keywords in bold face within H1 to H6 tags, with preference
given to the H1 tag over the H6 tag, since the numbers 1 through 6 indicate
decreasing importance of the heading. Font styles like bold, italic and underline
indicate the relative importance of text and should be used wherever applicable
in conjunction with key phrases. A typical webpage may have keyword rich content
with at least 200-250 words of text [11].



Menu bar



The menu bar in a webpage links to the most important pages on the site. Since
almost every page on the site contains the menu bar, every page votes for the
pages linked from it. This increases the link popularity of these pages within
the website. These pages should have good targeted content and adhere to the
linking guidelines discussed in the Sitemap section below. This may result in
higher ranking for these pages.



Keyword density analysis



Every search engine has a different keyword density calculation. Some search

engines permit heavier keyword density on a webpage. Others like Google have

stricter allowable density levels. The placement of keywords in different locations

of the webpage has varying effects. A high density of keywords above the

permissible limit will be considered as spam by the search engine and will cause

the website to be penalized. Google allows a maximum of 2% of the webpage

text to be keywords. Yahoo and MSN Search allow a keyword density of 5% [12].
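Keyword density can be estimated with a few lines of Python. The sketch below assumes the common definition of density as the percentage of words on the page that belong to occurrences of the key phrase; search engines do not publish their exact calculation, so the definition, the function names and the sample text are assumptions. The limits are the percentages quoted above from [12].

```python
import re

# Percentages quoted in the text [12]; treated here as illustrative limits.
DENSITY_LIMITS = {"Google": 2.0, "Yahoo": 5.0, "MSN Search": 5.0}


def keyword_density(text, phrase):
    """Return the percentage of words in text that belong to the key phrase."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return 100.0 * hits * n / len(words)


def engines_exceeded(density):
    """List the engines whose quoted limit the given density exceeds."""
    return [engine for engine, limit in DENSITY_LIMITS.items() if density > limit]


sample = "Free image library with free images for every image project"
single = keyword_density(sample, "image")
phrase = keyword_density(sample, "free image library")
```

The sample sentence is far too short to be realistic; on a full page the same function gives the density that can be compared against the limits of each search engine.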



A free tool to check keyword density of a webpage is available at

www.searchengineworld.com/cgi-bin/kwda.cgi



HTML code validation



It is highly advisable to validate HTML code before submitting to search

engines. Even though the webpage may look visually correct, it may have syntax

errors which may be ignored by forgiving browsers like Internet Explorer. A free

validation service is provided by W3C. This service is available at

validator.w3.org. This software checks for W3C XHTML 1.0 compliance and

gives a detailed report. W3C cascading style sheet validation is available at

jigsaw.w3.org/css-validator/



Absolute vs relative URL



Search engine spiders prefer absolute URLs over relative URLs. Spiders may miss
indexing some web pages when relative URLs are used. Absolute URLs, however,
significantly reduce the portability of the website in the event of a domain name
change. This can be overcome by using a global variable which contains the domain
name of the website. This variable can be used to generate absolute URLs within
web pages.
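The global variable approach can be sketched in Python as follows. The variable name, the helper function and the example paths are hypothetical; the point is only that the domain name appears in a single place.

```python
from urllib.parse import urljoin

# Single place where the domain name is defined (hypothetical value).
SITE_ROOT = "http://www.mysite.com/"


def absolute_url(path):
    """Build an absolute URL from a site-relative path."""
    return urljoin(SITE_ROOT, path.lstrip("/"))


gallery_link = absolute_url("gallery/index.htm")
```

If the domain name changes, only SITE_ROOT has to be updated and every generated link follows.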



Tables in HTML code



Tables are used in webpage construction to make the layout more organized.
Some web developers nest tables within tables to simplify the webpage structure
for maintenance purposes. This adds a lot of irrelevant text, decreasing the
keyword density in the “Above the Fold” region of the webpage. Most web pages
have the menu bar on the left hand side or at the top of the web page. Positioning
the menu bar in this way may also decrease the density of keywords in the
“Above the Fold” region.



A few alternatives to these issues are:

1. Position the menu bar on the right side of the web page and keyword
sensitive content on the left side



2. Use CSS stylesheet to define individual tag specifics. This CSS code must

be placed in a separate file

e.g. My Text

instead of

My Text



Sitemap



A site map is a web page with links to every webpage within the web site. This

web page has high importance in the website. Once the sitemap gets spidered by










a search engine, one can be sure that every page on the website has been indexed.

When designing the sitemap of a website, key points to remember are:



1. The sitemap should contain HTML anchor tags



2. The link text should consist of keywords relevant to the destination

webpage. The link text may contain identical phrase as the TITLE tag of

the destination webpage. The link text is significant since it states what

the content of the destination page may be. Link text is taken into

consideration by the relevancy algorithm of search engines.



3. The sitemap should be visible to the search engine. This means that there
must be a link from every page of the website (typically in the footer)
to the sitemap, and spiders must be permitted to index the sitemap



A typical link on a sitemap may be modeled on the following example

Gallery



Avoid the following:



1. JavaScript handlers in anchor tag

Gallery



2. Flash movie for sitemap



3. Images instead of link text





4. Imagemap










5. Irrelevant link text

Check this out



If the sitemap has more than 100 links, split the sitemap into multiple pages. A

guide to creating sitemaps is provided by Google and is available at

http://www.google.com/webmasters/sitemaps/docs/en/about.html. It is

advisable to read these guidelines and follow them while creating the sitemap.
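Splitting a large sitemap into pages of at most 100 links can be automated. The Python sketch below assumes the links are available as (URL, link text) pairs and emits plain HTML anchor tags with descriptive link text; the function name and the sample data are hypothetical.

```python
from html import escape


def sitemap_pages(links, per_page=100):
    """Split (url, link text) pairs into sitemap pages of plain anchor tags."""
    pages = []
    for start in range(0, len(links), per_page):
        chunk = links[start:start + per_page]
        anchors = [
            f'<a href="{escape(url, quote=True)}">{escape(text)}</a>'
            for url, text in chunk
        ]
        pages.append("\n".join(anchors))
    return pages


# Hypothetical site with 250 pages, which yields three sitemap pages.
site_links = [(f"http://www.mysite.com/page{i}.htm", f"Page {i}") for i in range(250)]
pages = sitemap_pages(site_links)
```

Each generated page would still need a link from the other sitemap pages so that the spider can reach all of them.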



Inbound links



For Google, inbound links help determine the PageRank of a website.
Without any inbound links, a website is practically invisible to the search engine.
One way for the search engine spider to index a website is by following inbound
links from another indexed website. The alternative is to manually submit the
website to the search engine spider's crawling list. Though manual submission is
encouraged, there is never a guarantee that the website will be indexed. If, on
the other hand, there are inbound links from other sites, it is more likely that
the website will be indexed.



Inbound links from the following sources help in improving ranking of a web

page [10]:



1. All major and local directories; Yahoo, DMOZ, LookSmart, trade,

business and industry related directories



2. Suppliers, happy customers, sister companies and Partners



3. Websites which provide accompanying services

e.g. Inbound links from web hosting companies for a site selling website

templates










4. Related websites but not competing websites

e.g. Websites that provide tutorials about web design and modification of

website templates



5. Competing websites



Not all inbound links carry the same weight. Links from authoritative
industry sources count more towards improving PageRank than links from a
small private website. Some inbound links may have a negative effect on the
PageRank. These are:



1. Links from FFA (Free for all) link pages



2. Link farms

Link farming is the process of organized exchanging of unrelated links

between websites.



3. Links from doorway pages

Doorway pages are web pages created with the intent of inflating the

inbound links of a website. These pages are created with the sole purpose

of serving search engine spiders with optimized content which may boost

the ranking of the webpage.



4. Links from discussion forums
Discussion forums can be maliciously used to inflate the inbound links to
a website. Given a good but unmoderated message board, spammers may
include links to their spam pages as part of seemingly innocent
messages they post [7]. In a moderated message board, spammers post
valid messages with links to their websites in their signatures.










Most search engines penalize websites which employ malicious techniques to

inflate link popularity to the extent of removing the website from the index.



Outbound links



Outbound links may improve the ranking of a website as long as the website is
citing good websites [10]. Good websites are the ones which have been
recognized as authorities in the industry relevant to the website. Outbound links
may cause PageRank Leak as discussed in the preceding chapter. If one is
following a reciprocal linking program, PageRank Leak can be minimized by
masking the destination URL of outbound links using JavaScript code or by using
the NOINDEX, NOFOLLOW values in the Robots meta tag. This is not an
ethical practice but is followed by some websites. An ethical solution would be to
maintain outbound links to a few authoritative and related websites. Avoid linking
to websites which follow the practice of masking URLs in a reciprocal linking
program, as this would cause a PageRank Leak with no worthwhile benefit.



Reciprocal linking and link building



Reciprocal linking is a strategy to gain inbound links from websites that share
the same idea as one's website by providing an outbound link in exchange. This
strategy improves the link popularity of a website. Link popularity can be defined
as the number and quality of inbound links to a website. Reciprocal linking is
done by searching for websites that share the same idea and requesting an inbound
link in exchange for an outbound link. These websites should be rich in the
keywords and phrases that are emphasized on one's website. Before starting a
reciprocal linking strategy, one should have the following web pages in place:










1. A webpage which contains outbound links to websites (Link directory).

This webpage should be linked from the homepage so that it gets indexed

by the search engine spider.



2. A “Link to Us” page which gives cut and paste HTML code to link to
one's website.



Once these pages are in place, one should send emails to the webmasters of
shortlisted sites expressing interest in reciprocal linking. Key points to ask in
this email are whether they would allow placing an outbound link on one's website
and, if so, whether they expect a specific format (text link, image, flash movie,
etc.), and which HTML code should be used for an inbound link. A text link will
be most effective as an inbound link. Pay close attention to the link text in the
anchor tag.



Zeus by cyber-robotic.com is a highly effective reciprocal link building
tool (Cost US$ 195). Zeus is a robot/spider which crawls the internet to find
websites that have similar themes to one's website. Once the list of sites is
compiled, Zeus can be used to send personalized email messages to the webmasters
of these websites and to track and maintain the details of each site. It
dynamically generates keyword tuned link directory pages which can be uploaded
to one's website.



CPA affiliate programs provided by third parties like Commission Junction
may bring qualified leads to one's website. If a website hosts its own CPA
program, however, the program serves a dual purpose: it not only brings
qualified leads but also indirectly builds inbound links to the website.
iDevAffiliate v4.0 Gold Edition (Cost US$ 149) by idevdirect.com is a popular
software package used by many merchants to host their own affiliate programs.
One can promote an affiliate program by submitting it to affiliate program
directories, specifying the terms of the program.






Search engine friendly URL



Many websites have dynamically generated content. Content is dynamically
generated in most cases by passing parameters in the URL. The URL of a
dynamic webpage resembles http://www.mysite.com/index.php?pageid=70.
This URL will be indexed by a search engine. However, there is often more than
one parameter attached to the URL, such as sort order or navigation settings.
Hence different URLs end up pointing to the same webpage.



http://www.mysite.com/album/viewcat.php?pageid=70&orderby=hitsD (Hits Descending)

http://www.mysite.com/album/viewcat.php?pageid=70&orderby=hitsA (Hits Ascending)

http://www.mysite.com/album/viewcat.php?pageid=70&orderby=titleD (Title Descending)

http://www.mysite.com/album/viewcat.php?pageid=70&orderby=titleA (Title Ascending)





There is no way for the search engine to determine which parameter identifies a
new page and which parameter is a setting that does not justify indexing the URL
as a new page. Hence spiders have been programmed to detect and ignore
dynamic pages. This can be resolved by making the URL search engine friendly,
replacing the database characters (#&*!%) with equivalent search engine
friendly terms or characters. The above four URLs can be made search engine
friendly as follows:



http://www.mysite.com/album/viewcat.php/pageid.70/orderby.hitsD

http://www.mysite.com/album/viewcat.php/pageid.70/orderby.hitsA

http://www.mysite.com/album/viewcat.php/pageid.70/orderby.titleD

http://www.mysite.com/album/viewcat.php/pageid.70/orderby.titleA





The webpage is indexed since the spider is fooled into believing that, because
the URL does not contain a database character, it is not a dynamic webpage. This
might be an intermediate solution adopted by search engine spiders until there is
a technique that allows spiders to index dynamic web pages, since the problem
of isolating unique pages from their clones is not resolved by generating search
engine friendly URLs. This type of conversion from dynamic URL to search
engine friendly URL and vice-versa can be achieved on almost all types of servers,
either by proper configuration or by installing third party software. One should
communicate with one's hosting service provider to learn more about the
software/server configuration available to generate search engine friendly URLs.
The website code may have to be modified to generate a search engine friendly
URL in each anchor tag that is parsed by the API.



The mod_rewrite module of the Apache server can be used to make URLs search
engine friendly. A URL request for
“http://www.mysite.com/album/viewcat.php/pageid.70/orderby.hitsD” may be
translated by mod_rewrite to
“http://www.mysite.com/album/viewcat.php?pageid=70&orderby=hitsD”,
depending on the regular expression specified.



The web programmer will have to modify the script to generate URLs of the
type “http://www.mysite.com/album/viewcat.php/pageid.70/orderby.hitsD”
instead of
“http://www.mysite.com/album/viewcat.php?pageid=70&orderby=hitsD”
within the web pages so that all URLs on the website are search engine friendly.
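Both directions of the conversion can be illustrated with a short Python sketch. It is not mod_rewrite configuration; it only mirrors the kind of regular expression mapping such a rule would perform, using the URL pattern from the examples above. The function names and the regular expression are assumptions made for illustration.

```python
import re
from urllib.parse import parse_qsl, urlsplit


def to_friendly(url):
    """Rewrite script.php?key=value&... as script.php/key.value/... for page links."""
    parts = urlsplit(url)
    segments = "".join(f"/{key}.{value}" for key, value in parse_qsl(parts.query))
    return f"{parts.scheme}://{parts.netloc}{parts.path}{segments}"


# Matches a .php script followed by one or more /key.value segments.
FRIENDLY = re.compile(r"^(?P<script>.*?\.php)(?P<params>(?:/[^/.]+\.[^/]+)+)$")


def to_dynamic(url):
    """Translate a friendly URL back to the dynamic form the script expects."""
    match = FRIENDLY.match(url)
    if match is None:
        return url                       # not a friendly URL; leave unchanged
    pairs = [seg.split(".", 1) for seg in match.group("params").strip("/").split("/")]
    query = "&".join(f"{key}={value}" for key, value in pairs)
    return f"{match.group('script')}?{query}"


dynamic = "http://www.mysite.com/album/viewcat.php?pageid=70&orderby=hitsD"
friendly = to_friendly(dynamic)
```

The first function corresponds to the change the web programmer makes when generating links, and the second to the translation performed on the server for each incoming request.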










Domain name









Figure 10: Search results for keyword - betterbody -

Google snapshot










The significance of domain name in the ranking of websites in search results

cannot be overlooked. Figure 10 is a listing of the search performed in Google

for the keyword betterbody. The inbound links for the listings have been

computed using the “Who links to you?” feature in Google search for the exact

URL that came up in the search results.



Examine Listing A4 in Figure 10. www.betterbody.de has a ranking of 4 in
search results out of 2780 for the term betterbody. This page has 18 inbound
links. This is higher than the number of inbound links for A1, A2 or A3. There is
no occurrence of the term betterbody in the title tag, meta keywords tag, meta
description tag or the content of the webpage (refer to Figure 11, keyword
density analysis of www.betterbody.de for the term betterbody). The only
occurrence of the term betterbody is in the domain name of the website. In fact,
the keyword and the domain name are an exact match. Had it not been for the
domain name, this webpage would not have shown up in search results for this
keyword. Not only did the webpage come up in the search results, it came up on
the first page with a ranking of 4. The above discussion illustrates the
significance of the domain name in SEO.



Research should be conducted to determine popular keywords relevant to the
site. Keyword research has been discussed in the key terms section of this chapter.
Domain names are synonymous with brand names. Changing the domain name after
the website has been launched and gained popularity is highly discouraged. URLs
can be optimized independently of the domain name by incorporating keywords in
individual web page URLs like

http://www.mysite.com/betterbody/mainpage.htm

http://www.mysite.com/better-body/mainpage.htm






Figure 11: Keyword density analysis of

betterbody.de










Using more than 2 keywords in the webpage URL may be treated as search

engine spam.



404 Error page



A 404 error page states that the requested page cannot be found. The spider
receives this page from the server in response to a request for a URL that does
not exist on the server. This page, along with all its rankings, will be dropped
from the search engine index. Moreover, the spider makes no attempt to crawl the
rest of the website on receiving this page. Customize the 404 error page,
typically with a sitemap, to ensure successful crawling of all other web pages by
the spider.



301 Redirection



301 Redirection is a spider/visitor friendly strategy to redirect one webpage to
another. For websites hosted on Apache servers, 301 Redirection is implemented
by specifying the source and destination URLs in the .htaccess file. A 301
status is interpreted as “moved permanently”. This is required to ensure
stability of PageRank for the site. Google interprets http://www.mysite.com and
http://mysite.com as two different URLs. As a result, Google assigns different
PageRank to the same web pages depending on whether they have www in the
domain name. This causes the PageRank for mysite.com to be distributed
between http://mysite.com and http://www.mysite.com. Implementing a 301
redirect from http://mysite.com to http://www.mysite.com will ensure that all
pages are indexed as http://www.mysite.com/myexample.htm. One should
pay close attention to ensure that all link building strategies use www in the
URL, for example in:



1. “Link to Us” page



2. Search engine and directory submissions






3. Reciprocal linking code



4. Absolute URLs within the site
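The decision behind the www redirect described in this section can be sketched in Python. On an Apache server the redirect itself would be configured in the .htaccess file; the sketch below only shows the logic of mapping a request for the bare domain to its www form with status 301. The host names and the function name are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.mysite.com"   # form under which pages should be indexed
BARE_HOST = "mysite.com"            # form that should be redirected


def canonical_redirect(url):
    """Return (301, new URL) if the request uses the bare domain, else None."""
    parts = urlsplit(url)
    if parts.netloc.lower() == BARE_HOST:
        target = urlunsplit(
            (parts.scheme, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
        )
        return 301, target
    return None


redirect = canonical_redirect("http://mysite.com/myexample.htm")
```

Requests that already use the www form return None and are served normally, so every page ends up with a single indexed URL.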



Robots.txt meta file



Robots.txt (Robots Exclusion Standard) is a file with instructions that tell
the spider which web pages to crawl and which to ignore. Robots.txt must be
located in the root directory of the website. The same effect can be achieved
with the Robots meta tag. The difference is that the Robots.txt file is a
centralized location for these instructions, which may reduce maintenance. The
Robots.txt file allows blocking specific directories from being indexed. This is
helpful for a website with member access web pages. A free tool is available at
www.searchengineworld.com/cgi-bin/robotcheck.cgi to validate the robots.txt
file. Validating robots.txt is important as errors in it can cause web pages to
be unintentionally indexed or blocked from spiders.
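The effect of a robots.txt file can also be checked locally with the robots.txt parser in Python's standard library. The sketch below assumes a hypothetical file that blocks a members directory for all spiders; the directory name and URLs are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content blocking the member access pages.
robots_lines = [
    "User-agent: *",
    "Disallow: /members/",
]

parser = RobotFileParser()
parser.parse(robots_lines)

members_allowed = parser.can_fetch("*", "http://www.mysite.com/members/profile.htm")
home_allowed = parser.can_fetch("*", "http://www.mysite.com/index.htm")
```

Running such a check against the pages that must stay indexed is a quick way to catch a rule that blocks more than intended.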



Website submission



Website submission checklist



1. Website is completed and optimized



2. HTML code is validated



3. Incoming links have been established



4. Description of the website in less than 25 words with at least 2 to 3 key

terms



5. Keyword list










6. Email address, preferably with the same domain name as the website to

respond to submission notifications. e.g. submit@mysite.com



There is no need to submit each and every webpage on the site. Most search
engines prefer only the top level page in submissions. Manual submission is
preferred over automated submission. Most search engines and directories have
guidelines for proper submission. One should read these carefully before
submitting the site. Frequently submitting one's website to search engines is
considered spamming. This can cause the website to be penalized. Hence it is
advisable to submit the website only once to each of the search engines. After
submission, one should regularly check the submission email address, since there
might be responses from search engines and directories about improper
submissions and corrections that need to be made. Also, some search engines and
directories require validation of the email address for each submission.



Important search engines and directories [17] [18] [19] that may be considered

for website submission are:

www.google.com www.yahoo.com www.askjeeves.com

www.alltheweb.com www.aol.com www.hotbot.com

www.altavista.com www.qango.com www.gigablast.com

www.looksmart.com www.lycos.com www.msn.com

www.netscape.com www.about.com www.exite.com

www.pepesearch.com www.iwon.com www.dmoz.org

www.webcrawler.com www.webwombat.com www.aeiwi.com

www.links2go.com www.searchking.com www.joeant.com

www.zeal.com www.wondir.com www.illumirate.com

www.jayde.com www.vlib.org www.goguides.org

dir.yahoo.com www.business.com










It is advisable not to redesign the website or change the webpage content after

the site has been submitted and indexed since this can cause variations in website

rankings in search results.



Submit the website's sitemap to Google Sitemaps. Submitting the sitemap

may provide the site with better crawl coverage and fresher

search results.
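A sitemap is an XML file that lists the URLs of a site. The sketch below generates one with the Python standard library. It assumes the namespace of the current sitemaps.org protocol, which may differ from the schema that Google Sitemaps accepted when this text was written; the function name build_sitemap and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified or None) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # Date in YYYY-MM-DD form
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical example pages
print(build_sitemap([
    ("http://www.mysite.com/", "2005-11-01"),
    ("http://www.mysite.com/products.html", None),
]))
```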



Visitor analysis



Visitor analysis is an important part of website maintenance.

www.statcounter.com (US$ 29 per month) is a paid service that maintains website

statistics. Website statistics give in-depth information about the geographical

location of visitors, search terms used to reach the website, referring websites,

popular web pages, operating system, monitor resolution, browser information,

time spent on the website by each visitor and peak traffic hours during each

day. This information can be used to cater to the individual needs of different

types of visitors, which adds value to the time each visitor spends on

the site. One example is the monitor resolution of the visitor: this

information can be used to tune the site so that minimal scrolling is needed.

Another benefit is being able to verify that the server never goes down during

peak traffic hours.
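Peak traffic hours can also be derived from raw visit timestamps, for example those found in a server log. The following sketch assumes timestamps in the form "YYYY-MM-DD HH:MM:SS"; the function name peak_hours and the sample data are hypothetical and do not reflect how any statistics service computes its reports.

```python
from collections import Counter
from datetime import datetime


def peak_hours(timestamps, top=3):
    """Return the busiest hours of the day as (hour, visit_count) pairs."""
    hours = Counter(
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour for ts in timestamps
    )
    return hours.most_common(top)


# Hypothetical visit timestamps
visits = [
    "2005-11-01 09:15:00", "2005-11-01 14:02:10", "2005-11-01 14:45:00",
    "2005-11-02 14:30:00", "2005-11-02 09:05:00", "2005-11-02 20:00:00",
]
print(peak_hours(visits, top=2))  # prints [(14, 3), (9, 2)]
```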



Frameset



Search engines tend to dislike websites with frames. Frames have inherent

problems such as bookmarking: a visitor who wants to bookmark a

specific page on a website using frames is unable to do so. Search engines view

the pages within a frameset as separate web pages even though they may visually appear as

a single page. Hence the search engine may misunderstand the content of the

webpage even though it makes perfect sense to the visitor. Though there are

ways to make a website using frames display similar content to the visitor

and to the search engine, it is better to avoid using frames altogether.



SEO Roadmap



Figure 12 below gives a brief overview of the search engine optimization

process categorizing different factors into development and maintenance tasks.









Figure 12: Search engine optimization roadmap







Search engine optimization is a constantly evolving area. Website owners are

constantly trying to discover better techniques to improve ranking. Unfortunately,

search engine algorithms are proprietary, which adds mystery to the subject. As

search engines improve their relevancy algorithms, SEO tactics will change with them.










Chapter 4





SEARCH ENGINE SPAM





Search engine spam



Manipulation of web pages to improve rankings in search engine results is

defined as search engine spam. Practices that are considered search engine

abuse are outlined in guidelines published by the leading search engines. They are available

at:



Google www.google.com/webmasters/guidelines.html

Yahoo! Search help.yahoo.com/help/us/ysearch/basics/basics-18.html

MSN Search search.msn.com/docs/siteowner.aspx



Consequences of spamming



Spammers are constantly reinventing techniques to outdo spam control set

forth by search engines. Nevertheless, search engines constantly upgrade their

spam policies with constant modifications to their algorithms. Since the

algorithms are proprietary, there is no definite way of knowing what a search

engine considers spam. On detection of a website as an offender/abuser, the

search engines may penalize the website or even remove the site from the index.

Once blacklisted as a spammer, the website will not be crawled by the spider.

One needs to communicate with search engine staff to get the website back into

the crawling index. This process of communication between the website owner

and the search engine staff is time consuming thus costing the owner valuable

traffic and new clients.










Spamming techniques



Below are the different types of spamming methods that have been used to

improve rankings.



1. Hidden text



2. IP Cloaking



3. Doorway pages



4. Pagejacking



5. Domain duplication



6. Excessive popups



7. Inflating link popularity



8. ALT stuffing



9. Link farming



10. FFA



11. Mousetrapping



Hidden text



Hidden text or keyword stuffing is the practice of overloading a webpage with

keywords and key phrases. These are invisible to the visitor but are present in the

body of the webpage. Since search engines read the HTML source code of web

pages, this text is visible to the spider. The spider is manipulated into believing that,

due to the high occurrence of the keyword in the content of the web page, the web

page is highly relevant to the keyword, and hence it assigns a higher ranking to this

webpage. Various techniques can be employed to inflate the density of keywords.

Most prominent among these are:



1. Hidden input tag





2. Invisible text

This is done by rendering the color of the font with the background color

of the web page so that these characters are invisible to the naked eye
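Keyword density, the quantity that these techniques try to inflate, can be taken as the share of words on a page that match the keyword. The sketch below uses that simple definition, which is an assumption; search engines do not publish their formulas. The function name keyword_density is hypothetical and it handles single-word keywords only.

```python
import re


def keyword_density(text, keyword):
    """Fraction of the words in text that equal keyword (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


# Hypothetical stuffed text: 3 of 8 words are "cheap"
stuffed = "cheap flights cheap hotels cheap cars book now"
print(keyword_density(stuffed, "cheap"))  # prints 0.375
```

An unusually high value for a single term is one signal that a page may be stuffed with keywords.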



IP Cloaking



IP Cloaking is the practice of creating specialized web pages with the intention

of serving search engine spiders. These web pages are invisible to normal visitors.

The pages are programmed to detect whether the URL request is coming from a

regular browser or a search engine spider and serve each request with different

page content. The end result is that the spider sees a highly optimized web page

with a heavy keyword density while the visitor is served with the regular page.



Doorway pages



Doorway pages serve as a bridge for the spider. The doorway pages are

created for the same purpose as cloaking only that they are served to all incoming

requests. The doorway page has a meta refresh tag which will redirect the visitor

to the appropriate page or a link that the visitor has to click to reach the

destination. Doorway pages are also used to inflate link popularity.
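The meta refresh tag mentioned above is easy to find in a page's HTML source. The following is a minimal detection sketch using the Python standard library; the class and function names are hypothetical, and real spam detection by search engines is certainly more involved.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects the redirect targets of <meta http-equiv="refresh"> tags."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://..."
        _, _, rest = attr.get("content", "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            self.targets.append(rest[4:].strip())


def find_meta_refresh(html):
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.targets


# Hypothetical doorway page
page = ('<html><head><meta http-equiv="refresh" '
        'content="0; url=http://www.mysite.com/home.html"></head>'
        '<body>keyword keyword keyword</body></html>')
print(find_meta_refresh(page))
```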










Pagejacking



Pagejacking or content duplication is the practice of copying content (HTML

source code) from another site and creating duplicate copies of web pages on

one's site. These illegitimate web pages are indexed by spiders and show up in

search engine results. The spammer uses these pages to attract visitors. The

visitors are tricked into thinking that the illegal site is the site they are looking for.

Once on the site, the visitors may become victims of mousetrapping.



Domain duplication



The practice of creating identical websites with the only difference that they

have different domain names is termed as domain duplication. This would enable

the websites to occupy multiple listings in the search engine results on the same

page. Since these web pages are identical, their rankings will more or less be the

same. The visitor is thus tricked into visiting the same content from search engine

results since adjoining listings point to the same content.
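Identical content served under different domain names can be detected by comparing a fingerprint of each page's text. The sketch below is one simple way to do this, under the assumptions that tags can be stripped with a crude regular expression and that only exact duplicates (after normalizing case and whitespace) matter; the function names and example domains are hypothetical.

```python
import hashlib
import re


def content_fingerprint(html):
    """Hash of the page text with tags removed and whitespace normalized."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping
    text = " ".join(text.lower().split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_duplicates(pages):
    """Group URLs whose pages have the same fingerprint.

    pages maps URL to HTML source. Only groups of two or more are returned.
    """
    groups = {}
    for url, html in pages.items():
        groups.setdefault(content_fingerprint(html), []).append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical pages on three domains
pages = {
    "http://www.site-a.example/": "<h1>Cheap Widgets</h1><p>Buy now</p>",
    "http://www.site-b.example/": "<h1>cheap   widgets</h1> <p>Buy now</p>",
    "http://www.other.example/": "<p>Something else</p>",
}
print(find_duplicates(pages))
```

The same comparison applies to pagejacking, where the duplicated content belongs to someone else's site.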



Excessive popups



Yahoo! specifies that it considers excessive popups to be spam. This is related to

mousetrapping. Hence a website should have a maximum of 1 to 2 popups per

page.



Inflating link popularity



Internal link popularity can be inflated by creating a virtually unlimited number of

dynamically generated web pages with content of little use that point to popular web

pages within the site, thereby inflating the internal inbound links of the web

pages. This tends to increase the PageRank of the intended web pages.










ALT stuffing



This is a special case of keyword stuffing. Like the hidden input tag, the ALT attribute

is almost invisible to the visitor. The visitor sees the content of the ALT

attribute only when the mouse is over the image. This attribute can be

manipulated to have a very long string of keywords which have no relevance to

the image or the webpage. This increases the keyword density of the web page.
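Overlong ALT attributes can be flagged by scanning the image tags of a page. The sketch below uses the Python standard library; the threshold of 10 words is an arbitrary assumption chosen for illustration, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class AltCollector(HTMLParser):
    """Collects the ALT text of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.alts.append(dict(attrs).get("alt") or "")


def suspicious_alts(html, max_words=10):
    """Return ALT texts longer than max_words (an assumed threshold)."""
    collector = AltCollector()
    collector.feed(html)
    return [alt for alt in collector.alts if len(alt.split()) > max_words]


# Hypothetical page with one normal and one stuffed ALT attribute
page = ('<img src="logo.gif" alt="Company logo">'
        '<img src="x.gif" alt="cheap flights cheap hotels cheap cars '
        'cheap tickets cheap travel cheap deals">')
print(suspicious_alts(page))
```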



Link farming



Link farming is the process of artificially inflating the inbound links to the

website by organized exchange of links. The reciprocal linking program can be

abused by exchanging links with other websites which are not related to the

content or the theme of the website.
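Reciprocal links between unrelated sites can be found in a simple model of the link graph. In the sketch below the graph is a dictionary mapping each site to the set of sites it links to, and each site carries a single theme label; both the data model and the function names are assumptions made for illustration.

```python
def reciprocal_pairs(links):
    """Return sorted pairs of sites that link to each other."""
    pairs = set()
    for site, targets in links.items():
        for target in targets:
            if site != target and site in links.get(target, set()):
                pairs.add(tuple(sorted((site, target))))
    return sorted(pairs)


def unrelated_reciprocal_pairs(links, themes):
    """Reciprocal pairs whose sites have different theme labels."""
    return [
        (a, b) for a, b in reciprocal_pairs(links)
        if themes.get(a) != themes.get(b)
    ]


# Hypothetical link graph and themes
links = {
    "a.example": {"b.example", "c.example"},
    "b.example": {"a.example"},
    "c.example": {"a.example", "d.example"},
}
themes = {"a.example": "travel", "b.example": "casino", "c.example": "travel"}
print(unrelated_reciprocal_pairs(links, themes))
```

Here only the exchange between a.example and b.example is reported, since a.example and c.example share a theme.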



FFA



Free-for-all (FFA) web pages are usually pages which have hardly any content except

links to other websites. FFA is a malicious technique to inflate link popularity.



Mousetrapping



Mousetrapping uses JavaScript handlers to open up new windows with

content that is of no interest to the visitor. The visitor is prevented from leaving

the site. Whenever the visitor tries to close the window another window opens.

Sometimes, mousetrapping is programmed to end after a finite number of new

browser windows. Otherwise, the visitor will have to close the browser program

using the Task manager, thus losing all other open windows.



Search engine spam is directly related to the evolution of search engine

algorithms. Spammers come up with new strategies every day to adapt to

restrictions imposed by search engines. Search engines try to isolate these

strategies and penalize websites participating in spam. It is best not to use any

spamming methods to increase the popularity of one's website.










Chapter 5





SEARCH ENGINE MARKETING





Search engine marketing



Search Engine Marketing (SEM) is a marketing strategy offered by popular

search engines which allows website owners to buy high rankings in search engine

results (inorganic search results). These listings are contextual, meaning that they

are relevant to the search query executed. Most search engines offer a CPC

model. Google Adwords has recently incorporated the CPM model into its

highly popular SEM service. Google Adwords and Yahoo! Search Marketing are

the most prominent players in the SEM industry. Recently, smaller

companies with less popular search engines have also begun to offer SEM services.



Cost per visitor model



Zango.com operated by metricsdirect.com implements the CPV (Cost per

Visitor) model. This model is a hybrid between popup windows and search

engine marketing. Zango.com allows users free access to games, downloads and

entertainment in exchange for installing its software on the user's computer.

When a user performs a search in a search engine, the software pops up a

window with a contextual website. The number of popups is limited to 20 per

day. This is very similar to the SEM services run by popular search engines. The

difference is in the activity of the user. When a search engine displays paid search

results, the user gets to select which link to click after reading the description.

This liberty is not available in the CPV model.



Malpractices in SEM Industry



Malicious practices prevalent in the SEM industry are:






1. Bid Jamming

Bid jamming is an approach to PPC SEM campaigns whereby

competitors are forced to pay their maximum bid amount for each click.

Most SEM models allow the user to bid a maximum allowable amount

for each click, but charge only 1 penny (US$ 0.01) more than the bid

of the listing ranked immediately below. This rule is exploited by

users who bid an amount which is 1 penny less than their competitor's maximum bid.

The competitor is then charged the maximum allowable CPC for every click.



2. Click Fraud

Click fraud is a technique employed by content publishers

and competitors to exhaust the advertiser's SEM funds.
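The pricing rule behind bid jamming (item 1 above) can be worked through with a small model. The sketch below follows the rule as described in the text: each listing is charged 1 penny more than the maximum bid of the listing ranked directly below it. Amounts are in cents. The minimum charge for the lowest listing, the function name and the advertiser names are assumptions made for illustration.

```python
MIN_CHARGE = 5  # cents; assumed minimum price paid by the lowest listing


def charges_per_click(max_bids):
    """Map each advertiser to the amount (in cents) charged per click.

    Each listing pays 1 cent more than the maximum bid of the listing
    ranked directly below it, and never more than its own maximum bid.
    """
    ranked = sorted(max_bids.items(), key=lambda item: item[1], reverse=True)
    charges = {}
    for position, (name, bid) in enumerate(ranked):
        if position + 1 < len(ranked):
            bid_below = ranked[position + 1][1]
            charges[name] = min(bid, bid_below + 1)
        else:
            charges[name] = min(bid, MIN_CHARGE)
    return charges


# Without jamming: the top advertiser pays 41 cents per click
print(charges_per_click({"victim": 100, "rival": 40, "other": 30}))
# With jamming: the rival bids 99 and the top advertiser pays the full 100
print(charges_per_click({"victim": 100, "rival": 99, "other": 30}))
```

In both cases the rival is charged 31 cents per click, so raising its bid to 99 cents costs the rival nothing extra while the victim's cost per click rises from 41 to 100 cents.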



How does the content publisher benefit by click fraud?



Most SEM models have an affiliate program so that they can generate more

revenue. Google Adwords is one such SEM service. Google allows website

publishers to subscribe to their Adsense service. This allows Google to publish

contextual CPC/CPM ads on the publisher's website. The publisher gets paid by

Google for every click made from the website. Google charges a commission for

providing the infrastructure. Unethical publishers use this opportunity to click ads

on their website to increase their revenue. Though Google has software programs

to detect this kind of behavior, it cannot be completely avoided. Assume that an

ad is published on 100 web pages, each belonging to a different website. If 50

publishers click on this ad with a CPC of US$ 0.25, the advertiser incurs a loss of

US$ 12.50. It is difficult for Google to determine whether any single click is valid or

fraudulent.
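The loss in the example above is simply the number of fraudulent clicks multiplied by the CPC. A short sketch of the arithmetic follows; the function name click_fraud_loss is hypothetical, and Decimal is used so that currency amounts are not affected by floating point rounding.

```python
from decimal import Decimal


def click_fraud_loss(fraudulent_clicks, cpc):
    """Loss to the advertiser in US$, given the CPC as a decimal string."""
    return fraudulent_clicks * Decimal(cpc)


# 50 publishers each click once on an ad with a CPC of US$ 0.25
print(f"US$ {click_fraud_loss(50, '0.25')}")  # prints "US$ 12.50"
```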



How does the competitor benefit by click fraud?








The competitor tries to deplete the advertiser's funds by resorting to click fraud

and bid jamming. The competitor benefits by gaining a higher rank in the paid

listing for a lower CPC.



Google and Yahoo Search Marketing



Google supplies paid listings to Google search, Lycos, HotBot, AOL Search,

Netscape Search, iWon, Teoma, AskJeeves and publishers who have subscribed

to Adsense program. Yahoo Search Marketing supplies paid listings to Yahoo!

Search, AllTheWeb, AltaVista, MSN Search and their affiliate publisher network.

Google Adwords and Yahoo Search Marketing span almost all popular search

engines. It is advisable for advertisers to use the Google and Yahoo Search Marketing

services over smaller companies providing similar services, because these

SEM services are less prone to click fraud: they have software to detect and

identify this behavior.



All SEM services are based on the principle of maximizing the company's

profit. Ranking of an ad is decided by the total amount paid by the advertiser.

Hence ranking of a paid listing is directly related to (# of clicks x CPC).
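Under the relation stated above, a listing with a low CPC can outrank one with a high CPC if it attracts enough clicks. The sketch below orders listings by (# of clicks x CPC) to illustrate this. It models only the relation given in the text, not the actual ranking formula of any SEM service; the function name and sample figures are hypothetical.

```python
def rank_listings(listings):
    """Order (name, clicks, cpc_in_cents) tuples by clicks x CPC, highest first."""
    return sorted(listings, key=lambda item: item[1] * item[2], reverse=True)


# Hypothetical listings: name, number of clicks, CPC in cents
listings = [("A", 100, 25), ("B", 400, 10), ("C", 50, 30)]
for name, clicks, cpc in rank_listings(listings):
    print(name, clicks * cpc)
```

Listing B has the lowest CPC but generates the most revenue (4000 cents), so it is ranked first, ahead of A (2500) and C (1500).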



Google Adwords provides all necessary tools for automated campaign

management. Main features include:



1. Keyword tool similar to wordtracker.com.



2. Targeted campaign control. One can specify that the ad should be served

to visitors from certain geographical locations speaking specific languages



3. Variable CPC for different keywords










4. Ad monitor. Google staff will monitor that every ad served follows

acceptable guidelines



5. Forecast tools to estimate budget



6. Multiple ad campaigns can be setup each serving multiple ads. Each ad

can have its independent keyword list.



7. Reports to monitor ad campaigns



8. Tool to select publishers where the ads should be published



More information about Google Adwords is available at

https://adwords.google.com/select/main?cmd=Login. Yahoo! Search marketing

offers sponsored search very similar to Google Adwords. More information

about Yahoo! Search marketing is available at

http://searchmarketing.yahoo.com/srch/index.php. Yahoo! Search marketing

also provides a suite of other marketing services.










Chapter 6





SUMMARY, CONCLUSIONS AND RECOMMENDATIONS





Introduction



Search engine optimization is forever an evolving topic. As long as search

engines exist, new optimization strategies will be discovered. New spam

techniques will be used to mislead search engines. Search engine optimization

specifies techniques to achieve higher rankings in organic search engine results.

Search engine marketing helps in achieving higher rankings in inorganic search

engine results. SEO is an ongoing process with substantial long term benefits.

SEM, on the other hand, produces instant results with no long term benefits.



Summary



Search engine marketing requires every B2C website owner to plan and

work out an effective SEO and SEM strategy. Typically, a search engine friendly

website will achieve higher rankings over a period of time. A significant

improvement may be noticed only after at least 3 months. This is not promising to a

website owner in terms of return on investment. In contrast, a good SEM

plan can bring new leads to the website instantly and provide a higher return on

investment at least until the website achieves a significant ranking in organic

search engine results. Achieving the perfect harmony between SEO and SEM will

produce optimum ROI for the website owner.



Due to the evolving nature of search technology, many articles available on the

subject may be outdated. Some articles may be misleading, and following them might get a

website flagged as search engine spam. This manuscript provides a systematic approach

to the subject. Chapter 2 has highlighted the significance of improving ranking

not only in the most popular search engine but also in other major search

engines. Chapter 3 discussed search engine optimization strategies. Following the

guidelines in this chapter will ensure a good ranking in search engine results.

These guidelines have to be followed from the inception of the website through

regular maintenance tasks. Chapter 4 focused on major search engine spamming

techniques that have been used to mislead search engines. Most of these

techniques have been identified by popular search engines as spam. Sites which

are penalized should expect to lose at least one month of search engine traffic,

though a 1 month period is a highly optimistic estimate. Chapters 1 through 5

served as a guideline for new and experienced webmasters. Experienced

webmasters can use this information to improve rankings. New webmasters may

have to recruit a search engine optimization firm to help improve rankings but

will be equipped to ask the “How and Why” of the trade. Many SEO firms

incorporate spamming strategies to boost rankings. These produce only short term results

and will eventually get one's website penalized. Beware of such SEO firms.

Last but not least, SEO results take time to show effect. Rankings tend to

increase as the website grows in content rich pages and collects relevant inbound

links. Chapter 5 served as an introduction to SEM. Search engine marketing is a

fast and effective way to gain targeted visitors. Caution must be exercised about

bid jamming and click fraud.



Conclusions



No two search engine relevancy algorithms are the same. Every search engine

company tries to achieve a distinct identity by providing different results to the

same search query. Only 1.1% of the first page results of popular search engines

are identical (Chapter 2; Section – Overlap Analysis). Though Google is currently

deemed as the most popular search engine, there is no assurance that Google will

maintain its popularity in the future. Hence, it is advisable to focus on achieving

higher rankings not only in Google but also other prominent search engines.

Additionally, 87.2% of search engine users exhibit loyalty to their favorite search

engine (Chapter 1; Section – Significance of search engines). Achieving top

rankings in popular search engines will provide exposure to unique users. These

statistics are applicable to both SEO and SEM. An effective SEO strategy will be

to focus on optimization taking into consideration the algorithms of at least a few

popular search engines. Yahoo! Search and MSN Search are considered to adopt a

content based relevancy algorithm whereas Google has a popularity based

approach. Figure 8 (Chapter 2; Section – Overlap analysis) points out this

difference between the relevancy algorithms. Achieving higher rankings in Yahoo!

Search and MSN Search may be governed by higher quality content. This factor

is controlled by the website owner. On the other hand, Google determines

ranking by link popularity, over which the website owner has limited control. As a

consequence, the website owner may notice an improvement in Yahoo! and

MSN rankings sooner than in Google rankings.



Figure 4 (Chapter 2; Section – Search engine relationship chart) demonstrates

the reach of Google Adwords and Yahoo! Search Marketing. Google Adwords and

Yahoo! Search Marketing provide paid listings not only to many independent

publishers but also to almost every popular search engine. It is necessary to build

an SEM campaign devoting equal attention to both Google Adwords and Yahoo!

Search Marketing to gain search users who are loyal to almost every popular

search engine. SEM funds should be divided between Google Adwords and

Yahoo! Search Marketing. Keyword research is of utmost importance. Identifying

new keywords and shortlisting existing keywords can be an empowering factor in

gaining new leads.



There is a psychological difference between organic and inorganic search

results. The user knows that inorganic search results are based on the CPC rate

paid by the advertiser and hence may not lead to what the user is searching for. This

psychological difference may lead the search user to give a higher priority to

organic search results, which are unbiased, over inorganic search results. This

factor underlines the relative importance of SEO.



Recommendations



Search engine algorithms are constantly updated by search engine companies

to provide better search results to users and punish spammers. This constant

evolution makes SEO a very volatile topic. The only alternative is to keep oneself

abreast of the latest information available about popular search engines. In

comparison to other research, this topic also imposes time restrictions.

Identifying optimization strategies and implementing them in the shortest

possible time to improve rankings is important, and staying on top of the spam policies

specified by popular search engines is equally important. Any optimization

technique that improves rankings at present may be classified as spam in the

future. The website owner should be on guard to clean up the website before the

site is next indexed in such a scenario. Most search engine spiders index websites every

15 to 30 days.



www.searchenginewatch.com is an industry authority in search engine

marketing. Acquaintance with discussions and regularly published articles will

help to improve knowledge and incorporate new strategies in areas of search

engine optimization and search engine marketing.



Suggestions and Future Research



Internet marketing is a blossoming field of study. Future research in this area

should not only focus on search marketing but also on other marketing

techniques. Email marketing is a tried and successful approach to draw a vast

number of targeted visitors in a very short time. Laws against email spamming are

stringent and impose heavy penalties on the spammer. Email spamming, unlike

search engine spamming, is governed by the Federal Trade Commission

(www.ftc.gov). Continuous research is being carried out to identify effective

advertising media formats. Advertising media formats started as static

images during the infancy of the internet. Presently, online advertising media formats

have achieved speech and motion capabilities. Banner advertising which was very

popular at one time is now regarded as one of the least effective means of online

advertising. This is due to a growing tendency among internet users to

unconsciously ignore banner advertisements, referred to as banner blindness. For

a new website, a single email with an effective image or movie forwarded by the

website owner to acquaintances can have the same effect as a nuclear chain

reaction. Only in this case, the victims are targeted visitors who are excited by the

contents of the email and curious to know more about the website. Interesting

emails may be forwarded by recipients to their acquaintances. This strategy can

bring instant visitors to the website in vast numbers at absolutely no cost. This

model is constantly abused in chat rooms, generally by owners of pornographic

and dating websites, because chat rooms are not supervised as strictly

as email. Moreover, the abuser uses a deceptive identity which might

be psychologically favorable to prompt a random user to initiate communication

with the abuser. The abuser responds with the intended material.



In conclusion, the basis of marketing is to develop new and effective means of

captivating the target audience. There are no boundaries to the research that can

be performed on this subject. This not only involves developing a stunning

communication tool but also understanding the thought process of the user.










REFERENCES







[1] Lee Underwood, “A Brief History of Search Engines”;

www.webreference.com/authoring/search_history/



[2] GVU's 10th WWW User Survey graphs, “How Users Find out About WWW

Pages”, 1998;

www.gvu.gatech.edu/user_surveys/survey-1998-10/graphs/use/q52.htm



[3] iProspect, “iProspect Search Engine User Attitudes”, May 2004;

www.iprospect.com/premiumPDFs/iProspectSurveyComplete.pdf



[4] Bruce Clay, Inc, “Search Engine Relationship Chart”, 2005;

www.bruceclay.com/searchenginerelationshipchart.htm



[5] Danny Sullivan, “comScore Media Metrix Search Engine Ratings”, Aug. 2005;

www.searchenginewatch.com/reports/article.php/2156431



[6] Sergey Brin and Lawrence Page, “The Anatomy of a Large-Scale Hypertextual

Web Search Engine”, Proceedings of the 7th World-Wide Web Conference, 1998



[7] Zoltan Gyongyi, Hector Garcia-Molina and Jan Pedersen, “Combating Web

Spam with TrustRank”, Proceedings of the 30th VLDB Conference, 2004



[8] Dogpile.com, University of Pittsburgh and Pennsylvania State University,

“Different Engines, Different Results”, Aug. 2005



[9] Kevin Curran, “Tips for achieving high positioning in the results pages of the

major search engines”, 2004



[10] Insite by Lycos, “Search engine marketing guide”;

insite.lycos.com/tutorial.asp



[11] Searchenginewatch.com, “Ten tips to the top of Google”, Apr. 2003;

www.searchenginewatch.com/searchday/article.php/2198931



[12] Wayne Hulbert, “Keyword Density: SEO Considerations”, May 2005;

www.webpronews.com/news/ebusinessnews/wpn-4520050501KeywordDensitySEOconsiderations.html



[13] Chris Sherman, “131 (Legitimate) Link Building Strategies”, Jul. 2002;

www.searchenginewatch.com/searchday/article.php/2160301



[16] Alexa, “Top Sites”;

www.alexa.com/site/ds/top_500



[17] Danny Sullivan, “Major Search Engines and Directories”, Apr. 2004;

www.searchenginewatch.com/links/article.php/2156221



[18] Danny Sullivan, “Other Global Search Engines”, Oct. 2001;

www.searchenginewatch.com/links/article.php/2156281



[19] Danny Sullivan, “Community-Based Search Engines”, Dec. 2004;

www.searchenginewatch.com/links/article.php/2156101








