Search Engines


There are currently over a billion pages of information on the Internet, covering every topic imaginable. The
question is: how can you possibly find what you want? Computer algorithms can be written to search the
Internet, but most are impractical because they must sacrifice precision for coverage. However, a few
engines have found interesting ways of providing high-quality information quickly. Page value ranking,
topic-specific searches, and meta search engines are three of the most popular approaches because they work
smarter, not harder.

No commercial search engine will make its algorithm public, because a published algorithm would invite a
thousand imitation sites and leave little or no profit for the developers. The basic structure can still be
inferred by testing the results. The most primitive search is the sequential search, which goes
through every item in a list one at a time. The sheer size of the web immediately rules out this
possibility. While a sequential search might return the best results, you would most likely never see them
because of the web’s explosive growth rate. Even the fastest computers would take a long time, and
in that time, all kinds of new pages would have been created.
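
To make the cost concrete, here is a minimal sketch of a sequential search over a tiny, made-up collection of pages. The function name, URLs, and page text are assumptions for illustration only. The point is that every query visits every page, so the work grows in step with the size of the collection.

```python
def sequential_search(pages, query):
    """Return the URLs of every page whose text contains all query words."""
    words = query.lower().split()
    matches = []
    for url, text in pages.items():          # visit every page, one at a time
        text_words = set(text.lower().split())
        if all(w in text_words for w in words):
            matches.append(url)
    return matches

# Hypothetical three-page "web"; the URLs and text are invented for this example.
PAGES = {
    "a.example/cars": "new cars and used cars for sale",
    "b.example/jaguar": "the jaguar is a large cat",
    "c.example/autos": "jaguar cars for sale",
}

print(sequential_search(PAGES, "jaguar cars"))   # ['c.example/autos']
```

With three pages the loop is instant. With a billion pages, the same loop has to run a billion times for each query, which is why real engines build an index ahead of time instead.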

Some of the older ‘spiders’, like Alta Vista, are designed to literally roam randomly through the web using
links to other pages. This is accomplished with high-speed servers that keep 300 connections open at one
time. These web ‘spiders’ are content-based, which means they actually read and categorize the HTML
on every page. One flaw of this approach is the verbal-disagreement problem, where a particular word
can describe two different concepts. Type a few words into the query and you will be lucky to
find anything that relates to what you are looking for. The query words can be anywhere in a page, and they
are likely to be taken out of context.
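
The idea of a link-following, content-based spider can be sketched in a few lines. This is not how Alta Vista or any real engine is implemented. It is a toy under stated assumptions: the "web" is a hypothetical in-memory dictionary, the page names and text are invented, and the crawl uses a simple breadth-first order rather than a random walk. It also shows the word-ambiguity problem, since a query for "jaguar" matches both the animal page and the car page.

```python
from collections import deque

# Hypothetical in-memory web: page name -> (page text, outgoing links).
WEB = {
    "home": ("welcome to the directory", ["zoo", "dealer"]),
    "zoo": ("the jaguar is a big cat", ["home"]),
    "dealer": ("test drive a jaguar today", ["zoo", "parts"]),
    "parts": ("brake pads and filters", []),
    "orphan": ("no page links to this one", []),
}

def crawl(start):
    """Follow links from start and build a word -> set-of-pages index."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        for word in text.split():            # read and record the page content
            index.setdefault(word, set()).add(url)
        for link in links:                   # queue pages not visited yet
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index, seen

index, seen = crawl("home")
print(sorted(index["jaguar"]))   # ['dealer', 'zoo']
print("orphan" in seen)          # False
```

Two things stand out. The index cannot tell the cat from the car, because it records words, not meanings. And a page that nothing links to is never found, because a spider only reaches what links lead it to.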

Content-based searches can also be easily manipulated. Some tactics are very deceptive, for example
“…some automobile web sites have stooped to writing ‘Buy This Car’ dozens of times in hidden fonts…a
subliminal version of listing AAAA Autos in the Yellow Pages” (1). The truth is that you would never know
if a site was doing this unless you looked at the code, and most consumers do not look at the code. A less
subtle tactic is to pay to get to the top.
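
The hidden-text trick works because a purely content-based ranking counts words without asking whether a visitor can see them. The sketch below assumes a deliberately naive scoring rule (count the occurrences of the query phrase). No real engine is claimed to rank this way, and the site names and page text are hypothetical.

```python
def score(page_text, query):
    """Count how many times the query phrase occurs in the page text."""
    return page_text.lower().count(query.lower())

def rank(pages, query):
    """Order page names by descending score; ties keep their original order."""
    return sorted(pages, key=lambda url: score(pages[url], query), reverse=True)

honest = "Buy this car: a reliable sedan with low mileage."
# The repeated phrase stands in for text written in a hidden font.
stuffed = "Cheap autos. " + "Buy This Car " * 30

SITES = {"honest.example": honest, "stuffed.example": stuffed}
print(rank(SITES, "buy this car"))   # ['stuffed.example', 'honest.example']
```

The stuffed page scores 30 against the honest page's 1, so it rises to the top even though a visitor would never see the repeated phrase.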
