The Players

The Majors
Dead Search Engines
International Search Engines
Metasearch Engines
Google
   Developed as BackRub by Stanford University students
    Larry Page and Sergey Brin
   Became a private company and changed its name to
    Google in 1998
   One of the largest databases: more than 8 billion pages
    (this count includes pages its robots have crawled, even
    if its indexing program hasn’t fully indexed them)
   Indexes 3 billion pages every 28 days; 3 million every
   Makes money by powering search for over 130 portals and
    corporate Web sites, and through AdWords
 Google Spidering
 Uses its own ‘bots to spider the Web
 Generally ignores meta keywords and
  description tags.
   Google Indexing
   Descriptions (snippets) are formed automatically
    by extracting the most relevant portions of pages
   Finds the first instance of the search term on a
    page, then includes the words that appear
    around this term
   Indexes only the first 100 KB or so of each page
   Some pages don’t have a description: Google
    will include a “botted” (crawled) page even if it has
    not been “indexed”
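The snippet behavior described above (find the first instance of the search term, then show the words around it) can be illustrated with a short Python sketch. This is not Google's actual code; the function name `make_snippet` and the five-word window are hypothetical choices made for illustration.

```python
def make_snippet(page_text: str, term: str, window: int = 5) -> str:
    """Return the first occurrence of `term` plus up to `window` words on each side."""
    words = page_text.split()
    target = term.lower()
    for i, word in enumerate(words):
        # Ignore case and surrounding punctuation, so "Search," matches "search".
        if word.strip(".,;:!?\"'()").lower() == target:
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            # Mark trimmed text with "..." on whichever side was cut.
            prefix = "... " if start > 0 else ""
            suffix = " ..." if end < len(words) else ""
            return prefix + " ".join(words[start:end]) + suffix
    return ""  # Term not found on the page: no snippet.
```

For example, `make_snippet(text, "engine")` returns the words surrounding the first "engine" in `text`, with "..." marking text trimmed on either side.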
   Web - Indexed Web pages and other file types
   Ads - Paid advertisements appear on the right side or above search
    results under a "Sponsored Links" heading
   Images - 880 million+ images searched
   Groups - 845 million+ Usenet messages searched
   News
   Directory - A ranked version of the Open Directory using Google's
   Froogle - Shopping and product search
   Catalog Search - Scanned, searchable retail catalogs
 Web index subsets:
 Government sites
 Military sites
 University sites
 Linux sites
 Apple/Macintosh sites
 Microsoft sites
 “Google teams with the libraries of Harvard, Stanford, the University
  of Michigan, the University of Oxford, and The New York Public
  Library to digitally scan books from their collections so that users
  worldwide can search them in Google…Users searching with
  Google will see links in their search results page when there are
  books relevant to their query. Clicking on a title delivers a Google
  Print page where users can browse the full text of public domain
  works and brief excerpts and/or bibliographic data of copyrighted
  material. Library content will be displayed in keeping with copyright
Yahoo! Search
 Originally just a subject directory
 Search engine launched Feb. 2004
 Indexes first 500 KB of a Web page
 Includes some pay-for-inclusion sites
Teoma
 Founded in 2000 by a team of scientists
  from Rutgers University
 Teoma means "expert" in Gaelic
 Acquired by Ask Jeeves, Inc. in
  September 2001.
   More than 2 billion English-only web documents
   Spam, duplicates and pornographic results
    removed from index
   Indexes whole page; no stop words
   Considers meta-tag descriptions
   Aims to re-index every month (freshness)
   Sponsored links from Google AdWords
  Establishing authority and relevancy:
 Refine - organizes sites into naturally occurring
  communities that are about the subject of each
  search query
 Results - analyzes the relationship of sites
  within a community, ranking a site based on the
  number of same-subject pages that reference it
  (Subject-Specific Popularity)
 Resources - identifies expert resources about a
  particular subject
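Teoma's actual ranking method was proprietary; the sketch below only illustrates the Subject-Specific Popularity idea stated above by counting links that come from same-subject pages. The `links` pairs and the `community` set are hypothetical inputs; a real engine would first have to discover the communities (the Refine step).

```python
from collections import Counter


def subject_specific_popularity(links, community):
    """Rank sites by how many pages in the same subject community link to them.

    links: iterable of (source_page, target_site) pairs
    community: set of pages judged to be about the query's subject
    """
    votes = Counter()
    for source, target in links:
        # Only links from same-subject pages count; off-topic links are ignored.
        if source in community and source != target:
            votes[target] += 1
    # Most same-subject references first.
    return [site for site, _ in votes.most_common()]
```

Note how a site with many off-topic links can still rank below a site referenced by fewer, but on-topic, pages.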
Gigablast
   Founded in 2000
   Built and operated by sole proprietor Matt Wells
   Created to index up to 200 billion pages with the least
    amount of hardware possible
   Currently indexes 650 million pages
   Provides "Gigabits” to help searchers refine their search
    based upon related topics from search results
   Makes money by selling search services to private
WiseNut
   Newer database (launched ~2001)
   850 million pages indexed
   1.5 billion – identified not crawled/indexed
   Few advanced search features
   Spider capable of fetching more than 100 million pages a day
   Often months out of date
   Smart/Relevant: considers all words on a page, the text of
    referring links and the words around them, and the
    significance and content of the pages with the links
   Generates automatic semantic searches called
    WiseGuide categories
MSN Search
 New, improved
 ~4.2 billion pages searched/indexed?
 Formerly used Inktomi, now has
  proprietary robots, indexer, and retrieval
Dead Search Engines
    Whatever happened to…?
   Direct Hit - defunct, redirecting to Teoma
   Infoseek – defunct, redirecting to Go
   Magellan - dead, redirects to WebCrawler
   Northern Light - defunct
   Openfind - Under "reconstruction" as of 2003
   WebTop - Dead
Dead Search Engines
    The search engine formerly known as…
   AlltheWeb - uses Yahoo! database
   AltaVista - uses Yahoo! database
   Excite - uses an InfoSpace meta search
   Go - took over Infoseek, but now just uses Overture
   iWon – now uses Google "sponsored" ads, web, and image
   Looksmart - uses Wisenut search engine
   Lycos - uses Yahoo!/Inktomi database and LookSmart directory
   NBCi (formerly Snap) - uses metasearch engine Dogpile
   WebCrawler - uses an InfoSpace meta search
International Search Engines
  There are hundreds of search engines all over
  the world. We will not be investigating any of
  these very closely, but you can use the
  resources below to locate and master
  international search engines:
 All Search Engines: foreign search engines
 Search Engines Worldwide
 Search Engine Colossus
 Country-specific Search Engines
Metasearch Engines
 A search engine that queries other search
  engines and then combines the results
  received from all of them
 Lets the user search not just one search
  engine but a combination of many search
  engines at once to optimize Web searching
Metasearch Engines
 The difference among them:
 Engines covered (many pay-for-placement)
 # of engines that can be searched at once
 Sophistication of search query
 # of records from each search engine
 Length of time it will search each search engine
 Whether duplicates are deleted (de-duping)
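Several of the differences listed above (which engines are covered, how many records are taken from each, whether duplicates are removed) can be seen in a simple merge step. This Python sketch assumes each engine has already returned a ranked list of URLs; the round-robin interleaving and the crude URL normalization in `normalize` are illustrative assumptions, not how any particular metasearch engine works.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Crude URL key so trivially different URLs for one page de-dupe together.

    Lower-cases the host, drops a leading "www." and a trailing "/",
    and ignores query strings (which may wrongly merge distinct pages).
    """
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/") or "/"
    return host + path


def metasearch(results_by_engine: dict[str, list[str]], per_engine: int = 10) -> list[str]:
    """Merge ranked lists round-robin, keeping only the first copy of each page."""
    # "Number of records from each search engine": keep only the top `per_engine`.
    trimmed = [urls[:per_engine] for urls in results_by_engine.values()]
    seen: set[str] = set()
    merged: list[str] = []
    for rank in range(per_engine):
        for urls in trimmed:
            if rank < len(urls):
                key = normalize(urls[rank])
                if key not in seen:  # de-duping
                    seen.add(key)
                    merged.append(urls[rank])
    return merged
```

Round-robin keeps each engine's top results near the top of the merged list; a real metasearch engine might instead weight engines or re-rank by votes.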
Metasearch Engines
   Dogpile
   Metacrawler
   Mamma
   KartOO
   Clusty
   Surfwax
   Ixquick
   Fazzle
   InfoGrid
   Gimenei
Metasearch Engines
  Good for getting the lay of the land:
 What is out there?
 Is there anything out there?
 Who covers a topic best?
 Learning the names of new or emerging
  search engines
Metasearch Engines
  Otherwise, usually better off searching
  multiple SE’s individually:
 Syntax varies among search engines, and
  metasearch engines may not let you make
  full use of each search engine
 May not translate your query well into
  different SE’s
Metasearch Engines

    Check out some cool, value-adding
 features emerging in metasearch engines
 Clusty (using Vivisimo clustering engine):
 Clustering: uses algorithm to put search
  results together based on textual and
  linguistic similarity. Groups further refined
  using heuristics (i.e., human knowledge)
  designed to show what users wish to see
  when they examine clustered documents.
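Vivísimo's clustering engine is proprietary, and the human-knowledge heuristics mentioned above are not modeled here. As a rough illustration of grouping results by textual similarity only, this sketch uses Jaccard word overlap with single-pass "leader" clustering; the 0.3 threshold and all function names are hypothetical.

```python
def tokens(text: str) -> set[str]:
    """Lower-cased words with surrounding punctuation removed."""
    return {w.strip(".,;:!?").lower() for w in text.split()} - {""}


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words two snippets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(snippets: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Single-pass 'leader' clustering: each snippet joins the first cluster
    whose leader (first member) is similar enough, else starts a new cluster."""
    clusters: list[tuple[set[str], list[str]]] = []
    for snippet in snippets:
        words = tokens(snippet)
        for leader_words, members in clusters:
            if jaccard(words, leader_words) >= threshold:
                members.append(snippet)
                break
        else:
            clusters.append((words, [snippet]))
    return [members for _, members in clusters]
```

For an ambiguous query such as "jaguar", results about the car and results about the animal share few words, so they tend to fall into separate groups.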
    “Vivísimo's Clustering Engine lets you see deeper and
    farther--with less effort--into a large number of search
    results to:
   Get a quick overview of the main themes that relate to
    the query.
   See similar results grouped together for faster access.
   Find results that are buried in the ranked list and would
    otherwise be missed.
   Discover unexpected results and relationships between
   Considers each listing duplicated in more than
    one SE as a “vote” for that page.
    Uses votes to rank pages per the "Condorcet
   One of the big advantages of this ranking
    method is the elimination of search engine
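The voting idea above can be illustrated with a Condorcet-style pairwise comparison: page A beats page B if more engines rank A above B. Because a single Condorcet winner does not always exist, this sketch orders pages by their number of pairwise wins (a Copeland score), which is a stand-in, not necessarily the exact method the engine described here used. Treating a page an engine did not return as ranked below everything that engine did return is also an assumption.

```python
from itertools import combinations


def condorcet_rank(rankings: list[list[str]]) -> list[str]:
    """Order pages by pairwise wins (Copeland score) over engines' ranked lists.

    Assumption: a page missing from an engine's list counts as ranked
    below every page that engine did return.
    """
    pages = sorted({p for r in rankings for p in r})
    positions = [{p: i for i, p in enumerate(r)} for r in rankings]
    wins = {p: 0 for p in pages}
    for a, b in combinations(pages, 2):
        a_votes = b_votes = 0
        for pos in positions:
            ra = pos.get(a, len(pos))
            rb = pos.get(b, len(pos))
            if ra < rb:
                a_votes += 1
            elif rb < ra:
                b_votes += 1
        # The page preferred by more engines wins this pairwise contest.
        if a_votes > b_votes:
            wins[a] += 1
        elif b_votes > a_votes:
            wins[b] += 1
    # Most pairwise victories first; ties broken alphabetically for determinism.
    return sorted(pages, key=lambda p: (-wins[p], p))
```

Pages returned by several engines collect more pairwise wins, which mirrors the idea of a duplicated listing counting as a "vote".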
   Interactive Mapping display for results
   Uses proprietary algorithm to sort pages
   Relevance of results is displayed as different-sized
    When you move the pointer over these pages, the
    relevant keywords are illuminated and a brief description
    of the site appears on the left side of the screen

   Click keywords to refine the search
   Refined or further results also displayed on a map
  Targeted multi-source searching
 Searches only sources from specific domains or
  topics determined as relevant
 SurfWax can spider deeper into any public
  site, including pages or parts that are invisible to
  traditional search engines
 Uses a site's existing search syntax to uncover
  “deeper” content
 Understands and translates, when
  possible, complex syntax
 Complete Boolean searching
 Truncation/wildcard searching
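Wildcard conventions differ between engines, which is why a tool that translates query syntax is useful. The sketch below assumes one common convention (`*` for any run of letters, `?` for exactly one letter) and translates such a pattern into a Python regular expression; these operators and the function names are assumptions, not SurfWax's actual syntax.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate '*' (any run of letters) and '?' (one letter) into a whole-word regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")  # truncation: comput* -> computer, computing, ...
        elif ch == "?":
            parts.append(r"\w")  # single-character wildcard: wom?n -> woman, women
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


def matching_words(pattern: str, text: str) -> list[str]:
    """All words in `text` that the wildcard pattern matches."""
    return wildcard_to_regex(pattern).findall(text)
```

The same user pattern could be rewritten into each target engine's own syntax in a similar way, falling back to plain keywords when an engine has no equivalent operator.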
  Meta-searches SE’s, plus unique searches
  in news and other invisible web resources
 Ranks everything together
 Delivers timely resources from news
 Delivers dynamic content missing from
  other metasearch engines
