Search Engine



           Sanjeev Kumar Mishra
           Roll No: 21
Components Of A Search Engine
• Spider: A robotic, browser-like program that downloads webpages.
• Crawler: A wandering spider that automatically follows links found on pages.
• Indexer: A blender-like program that dissects webpages downloaded by spiders.
• The Database: A warehouse of the pages downloaded and processed.
• Search Engine Results Engine: It digs search results out of the database.
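The way these components fit together can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the "web" is a hypothetical in-memory dictionary (PAGES), the host names are made up, and the function names crawl, build_index, and search are invented for this example rather than taken from any real search engine.

```python
from collections import defaultdict
import re

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example": ("search engines index pages", ["b.example"]),
    "b.example": ("spiders download pages", ["a.example", "c.example"]),
    "c.example": ("the database stores pages", []),
}

def crawl(start):
    """Spider + crawler: 'download' a page, then follow its links."""
    seen, queue, downloaded = set(), [start], {}
    while queue:
        url = queue.pop(0)
        if url in seen or url not in PAGES:
            continue
        seen.add(url)
        text, links = PAGES[url]
        downloaded[url] = text   # the downloaded pages act as the database
        queue.extend(links)      # follow links found on the page
    return downloaded

def build_index(downloaded):
    """Indexer: map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in downloaded.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Results engine: return the URLs that contain every query word."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

index = build_index(crawl("a.example"))
print(search(index, "pages"))
print(search(index, "spiders pages"))
```

A real spider fetches pages over the network and a real database lives on disk; both are replaced here by dictionaries to keep the sketch self-contained.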
Types of Search Engine :

• Free Text Search Engines

• Index search engines

• Multi-search engines

• Natural Language search engines

• Site/Subject specific search engines
How a Search Engine Works
    Search engines may match results to searches
    using factors such as the following:
•   Title
•   Domain/URL
•   Style: Bold, Italic
•   Keyword density
•   Meta information
•   Meta keywords and meta descriptions
•   Outbound Links
•   Inbound Links
•   Insite Links

Simple Format Of HTML File
    <html>
    <head>
        <title> … </title>
        <meta … >
    </head>
    <body>
        …
    </body>
    </html>
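As a rough illustration of how an indexer might pull some of these signals out of a page, the sketch below uses Python's standard html.parser module. The class name SignalParser, the sample page, and the rule that links starting with "http" count as outbound are all assumptions made for this example. Inbound links cannot be read from the page itself, so they are not covered.

```python
from html.parser import HTMLParser

class SignalParser(HTMLParser):
    """Collects the title, named meta tags, bold/italic text, and links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.emphasized = []
        self.links = []
        self._stack = []  # currently open <title>, <b>, or <i> tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag in ("title", "b", "i"):
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if not self._stack:
            return
        if self._stack[-1] == "title":
            self.title += data
        else:
            self.emphasized.append(data.strip())

# Hypothetical sample page.
SAMPLE = """<html><head><title>Search Engine Basics</title>
<meta name="keywords" content="search, spider, index">
<meta name="description" content="How search engines work.">
</head><body><b>Spiders</b> download pages.
<a href="http://other.example/">out</a> <a href="/about.html">in</a>
</body></html>"""

parser = SignalParser()
parser.feed(SAMPLE)
# Assumption: absolute links are outbound, relative links are insite.
outbound = [h for h in parser.links if h.startswith("http")]
insite = [h for h in parser.links if not h.startswith("http")]
print(parser.title, parser.meta, parser.emphasized, outbound, insite)
```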
Google Architecture Overview
               •   URL Server
               •   Crawling The Web
               •   Store Server
               •   Repository
               •   Document Index
               •   Anchor
               •   Indexing Documents into
               •   Sorting
               •   Page Rank
               •   Searcher
Page Ranking:

Page A has pages T1, T2, …, Tn which point to it.
d - damping factor (generally 0.85).
C(A) - number of links going out of page A.

PR(A) = (1 - d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
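The formula defines each page's rank in terms of the ranks of the pages pointing to it, so it is usually evaluated by repeated substitution. The sketch below does that for a small hypothetical three-page graph. The function name pagerank, the starting value of 1.0, and the fixed iteration count are choices made for this example, and it assumes every page that is linked from has its links listed without duplicates.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: 1.0 for p in pages}  # assumed starting value
    for _ in range(iterations):
        new = {}
        for a in pages:
            # Sum over every page t that points to a; len(links[t]) is C(t).
            total = sum(pr[t] / len(links[t])
                        for t in pages if a in links.get(t, []))
            new[a] = (1 - d) + d * total
        pr = new
    return pr

# Hypothetical graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph C is pointed to by both A and B, so it ends up with the highest rank, while B, which receives only half of A's vote, ends up with the lowest.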
AltaVista :
• Type of search:          Keyword
• Search options:          Simple or Advanced
•   Relevance ranking: Ranks according to how
    many of our search terms a page contains.
•   Results presented as: First several lines of
•   Good points: Fast searches, capitalization and
    proper nouns recognized, largest database; finds
    things others don't.
•   Bad points: Multiple pages from the same site
    show up too frequently.
Excite :
• Type of search: Both concept and keyword
• Results returned in: Summaries; will also sort them
  by site. By clicking on an icon beside each summary, we
  will get a cross-reference of similar sites.

• User interface: Generally good, nothing exciting.

• Relevance ranking in: Confidence percentile provided
  on all searches, derivation unclear.

• Good points: Large index. Excellent summaries, which
  they admit are actually highlights

• Bad points: Does not specify the format or the size in
  megabytes of the hits it returns, nor does it tell us
  up front exactly how many hits there are.
Infoseek :
• Type of search: Keyword
• Results returned in: First 30-100 words of
    the page.
•   User interface: Good, easy to use, clear.
•   Relevance ranking: Gives numerical scores
    based on frequency and comparison to words
    already in their database.
•   Good points: Fast, flexible, reliable searching.
    Good output, which gives the URL, the size of
    the document and the relevance score.
Lycos :
• Type of search: Keyword and Yahoo-
  like subject index.
• Search options: Basic or advanced.
• Relevance ranking: Not provided.
• Results presented as: First 100 or so
  words in simple search.
• Good points: Large database.
Webcrawler :
• Type of search: Keyword
• Search options: Simple refined.
• Relevance ranking: Frequency
• Results presented as: Lists of
  hyperlinks or summaries, as the user
• Good points: Easy to use. It belongs to
• Bad points: Very slow.
HotBot :
 • Type of search: Keyword
 • Results returned in: Relevancy score
   and URL.
 • User interface: Very good.
 • Relevance ranking: Search terms in the
   title are ranked higher than search terms
   in the text. Frequency also counts.
 • Good points: Fast because it uses
   parallel processing.
 • Bad Points: Help files are not good.
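A ranking rule of this kind (title matches count for more than text matches, and frequency counts too) can be sketched as a weighted term count. This is not HotBot's actual algorithm: the function name score and the title_weight value of 3.0 are assumptions chosen only to illustrate the idea.

```python
def score(query, title, body, title_weight=3.0):
    """Weighted count of query terms; title hits weigh more than body hits."""
    terms = query.lower().split()
    title_words = title.lower().split()
    body_words = body.lower().split()
    return sum(title_weight * title_words.count(t) + body_words.count(t)
               for t in terms)

# Hypothetical pages: one mentions the term in its title, one only in its text.
in_title = score("spider", "Spider basics", "a short page")
in_body = score("spider", "Basics", "spider spider")
print(in_title, in_body)
```

With these weights a single title match outranks two matches in the text.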
Yahoo :
• Type of search: Keyword
• Results returned in: Yahoo tells us the
    category where a hit is found, then gives us a
    two-line description of the site.
•   User interface: Excellent and easy to use.
•   Relevance ranking: It will never return more
    than 100 pages.
•   Good points: If we know what we want to
    find, Yahoo should be very fast and relevant.
•   Bad Points: Only a small portion of the Web
    has actually been catalogued by Yahoo.
      Simple Tips for More Exact Searches :
•   Searches are case insensitive.
•    To find prose but not poetry, try "+prose -poetry".
•    Try wish* to find wish, wishes, or wishful.
•   "heart" AND "attack"
•   acute OR chronic
•   c and language NOT vitamin.
•   NEAR
•   FOLLOWED BY means that one term must directly
    follow the other.
•   Phrases: "rise to the occasion"
•   Capitalization: Bill, bill, Gates, gates, Oracle, oracle
    --the list is endless.
• link:address : Use to find all
    pages linking to Microsoft sites.
•   text:text : Finds pages that contain the
    specified text in the body of the document.
•   title:text : Finds pages that contain the
    specified word or phrase in the page title
•   url:text : Use url:altavista to find all pages on
    all servers that have the word altavista in the
    host name, path, or filename
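The "+", "-", and "*" operators above can be illustrated with a small matcher. This is a sketch, not any engine's real query parser: the function name matches is invented, terms without a prefix are treated as required (an assumption made for simplicity), and only a trailing "*" wildcard is handled.

```python
import re

def matches(query, text):
    """Return True if text satisfies the +term, -term, and term* rules."""
    words = re.findall(r"[a-z0-9]+", text.lower())  # case-insensitive

    def present(term):
        if term.endswith("*"):
            # Trailing wildcard: any word starting with the stem.
            return any(w.startswith(term[:-1]) for w in words)
        return term in words

    for token in query.lower().split():
        if token.startswith("-"):
            if present(token[1:]):
                return False      # excluded term was found
        elif not present(token.lstrip("+")):
            return False          # required term is missing
    return True

print(matches("+prose -poetry", "An essay in prose"))
print(matches("+prose -poetry", "Prose and poetry"))
print(matches("wish*", "Best wishes to all"))
```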
General tips for good ranking

• Create a good site with good content.
• Pick keywords that visitors will actually use
    in a search engine query.
•   Include keywords in the TITLE tag.
•   Use keywords in the META Keywords and Description tags.
•   Use the keywords throughout your page.
•   Have a good keyword density on your page.
•    Continually work on improving your link
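Keyword density is commonly taken to mean the share of a page's words that are the keyword. A minimal sketch of that calculation follows, assuming a single-word keyword and a simple word split. The function name keyword_density is made up for this example, and what counts as a "good" value is not specified here.

```python
import re

def keyword_density(text, keyword):
    """Fraction of the words in text that equal keyword (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

# Hypothetical page text: 2 of its 5 words are "search".
density = keyword_density("Search engines search the web", "search")
print(f"{density:.0%}")
```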
Some Facts :

• 85 % new sites are found through search

• Search engines are free to users.

• I’m feeling lucky.
