Shared by: neerajdalal126
Posted: 9/14/2012
Language: English
Pages: 23

Search Engine
Sanjeev Kumar Mishra
MCA-III
Roll No: 21
Components Of A Search Engine
• Spider : A robotic, browser-like program that
downloads web pages.
• Crawler : A wandering spider that automatically
follows links found on pages.
• Indexer : A blender-like program that dissects
web pages downloaded by spiders.
• The Database : A warehouse of the pages
downloaded and processed.
• Search Engine Results Engine : It digs search
results out of the database.
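The five components above can be sketched as a toy pipeline. This is a minimal illustration only, not how any real engine is built: the in-memory WEB dictionary, the example URLs, and the function names (spider, crawl, build_index, search) are all assumptions made for the sketch, and no real network access happens.

```python
from collections import defaultdict

# Hypothetical in-memory "web": URL -> (page text, outbound links).
WEB = {
    "a.example": ("search engines crawl the web", ["b.example", "c.example"]),
    "b.example": ("a spider downloads pages", ["c.example"]),
    "c.example": ("the indexer dissects pages", ["a.example"]),
}

def spider(url):
    """Spider: download one page (here, look it up in the toy WEB)."""
    return WEB[url]

def crawl(seed):
    """Crawler: follow links from the seed, fetching each page once."""
    database, queue = {}, [seed]
    while queue:
        url = queue.pop(0)
        if url in database or url not in WEB:
            continue
        text, links = spider(url)
        database[url] = text      # the "database" of downloaded pages
        queue.extend(links)
    return database

def build_index(database):
    """Indexer: map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in database.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Results engine: URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

database = crawl("a.example")
index = build_index(database)
print(search(index, "pages"))
```

A real spider would fetch pages over HTTP and respect the Robots Exclusion Protocol; the dictionary lookup stands in for that step.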
Types of Search Engines :
• Free Text Search Engines
• Index search engines
• Multi-search engines
• Natural Language search engines
• Site/Subject specific search engines
How a Search Engine Works
Search engines may match results to
searches using factors similar to the following:
• Title
• Domain/URL
• Style: Bold, Italic
• Density
• Meta information
• Meta keywords and meta descriptions
• Outbound links
• Inbound links
• Insite links
Simple Format of an HTML File
<html>
<head>
<title> …. </title>
<meta …>
</head>
<body>
….
….
</body>
</html>
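Several of the factors listed above (title, meta keywords, bold or italic text, links) can be read straight out of a page's HTML. The sketch below uses Python's standard html.parser module to collect them; the PageFeatures class name, its attribute names, and the SAMPLE page are assumptions for illustration, not part of any search engine.

```python
from html.parser import HTMLParser

class PageFeatures(HTMLParser):
    """Collect a few on-page signals: title, meta tags, emphasized text, links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}        # e.g. {"keywords": "...", "description": "..."}
        self.emphasized = []  # text inside <b>, <i>, <strong>, <em>
        self.links = []       # href values of <a> tags
        self._stack = []      # currently open tags we care about

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "meta" and name:
            self.meta[name.lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag in ("title", "b", "i", "strong", "em"):
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if not self._stack or not data.strip():
            return
        if self._stack[-1] == "title":
            self.title += data.strip()
        else:
            self.emphasized.append(data.strip())

# Hypothetical sample page used only for this illustration.
SAMPLE = """<html><head><title>Search Engine Basics</title>
<meta name="keywords" content="search, spider, index">
</head><body><p>A <b>spider</b> downloads pages.
<a href="http://www.yahoo.com/">Yahoo!</a></p></body></html>"""

page = PageFeatures()
page.feed(SAMPLE)
print(page.title, page.meta, page.emphasized, page.links)
```

Separating outbound links from insite links would additionally require comparing each href with the page's own domain, which this sketch leaves out.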
Google Architecture Overview
• URL Server
• Crawling The Web
• Store Server
• Repository
• Document Index
• Anchor
• Indexing Documents into
Barrels
• Sorting
• Page Rank
• Searcher
Page Ranking:
Page A has pages T1, T2, …, Tn which point to it.
d - damping factor (generally 0.85).
C(A) - number of links going out of page A.
Then
PR(A) = (1 - d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
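The formula above can be applied repeatedly until the values settle. The following is a minimal sketch under stated assumptions: the pagerank function name, the fixed iteration count of 50, the starting value of 1.0 for every page, and the three-page example graph are all illustrative choices, and each page is assumed to list a given target at most once.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T)/C(T)) over all pages.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {page: 1.0 for page in pages}  # assumed starting value
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum PR(T)/C(T) over every page T that links to this page.
            incoming = sum(
                pr[t] / len(targets)
                for t, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr

# Hypothetical graph: A links to B and C, B links to C, C links to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this example C ends up highest because both A and B point to it, and B ends up lowest because it receives only half of A's vote.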
AltaVista :
• Type of search: Keyword
• Search options: Simple or Advanced
search.
• Relevance ranking: Ranks according to how
many of our search terms a page contains.
• Results presented as: First several lines of
document.
• Good points: Fast searches, capitalization and
proper nouns recognized, largest database; finds
things others don't.
• Bad points: Multiple pages from the same site
show up too frequently.
Excite :
• Type of search: Both concept and keyword
• Results returned in: Summaries; will also sort them
by site. By clicking on an icon beside each summary, we
will get a cross-reference of similar sites.
• User interface: Generally good, nothing exciting.
• Relevance ranking in: Confidence percentile provided
on all searches, derivation unclear.
• Good points: Large index. Excellent summaries, which
they admit are actually highlights.
• Bad points: Does not specify the format or the size in
megabytes of the hits it returns, nor does it tell us
up front exactly how many hits there are.
Infoseek :
• Type of search: Keyword
• Results returned in: First 30-100 words of
the page.
• User interface: Good, easy to use, clear.
• Relevance ranking: Gives numerical scores
based on frequency and comparison to words
already in their database.
• Good points: Fast, flexible, reliable searching.
Good output, which gives the URL, the size of
the document and the relevance score.
Lycos :
• Type of search: Keyword and Yahoo-like
subject index.
• Search options: Basic or advanced.
• Relevance ranking: Not provided.
• Results presented as: First 100 or so
words in simple search.
• Good points: Large database.
Webcrawler :
• Type of search: Keyword
• Search options: Simple or refined.
• Relevance ranking: Frequency
calculated.
• Results presented as: Lists of
hyperlinks or summaries, as the user
chooses.
• Good points: Easy to use. It belongs to
AOL.
• Bad points: Very slow.
HotBot :
• Type of search: Keyword
• Results returned in: Relevancy score
and URL.
• User interface: Very good.
• Relevance ranking: Search terms in the
title will be ranked higher than search terms in
the text. Frequency also counts.
• Good points: Fast because it uses
parallel processing.
• Bad Points: Help files are not good.
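A ranking rule like the one described for HotBot (title matches count more than body matches, and frequency also counts) can be sketched in a few lines. The relevance function and the title weight of 3.0 are assumptions chosen for illustration; the actual weights used by HotBot are not given in these slides.

```python
def relevance(query, title, text, title_weight=3.0):
    """Score a page: each title hit counts `title_weight`, each body hit counts 1."""
    title_words = title.lower().split()
    text_words = text.lower().split()
    score = 0.0
    for term in query.lower().split():
        score += title_weight * title_words.count(term)  # title matches weigh more
        score += text_words.count(term)                  # frequency in the body
    return score

print(relevance("search engine", "Search Engine Guide", "a search tool"))
```

Sorting candidate pages by this score in descending order would give a simple relevance ranking.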
Yahoo :
• Type of search: Keyword
• Results returned in: Yahoo tells us the
category where a hit is found, then gives us a
two-line description of the site.
• User interface: Excellent and easy to use.
• Relevance ranking: It will never return more
than 100 pages.
• Good points: If we know what we want to
find, Yahoo should be very fast and relevant.
• Bad Points: Only a small portion of the Web
has actually been catalogued by Yahoo.
Simple Tips for More Exact Searches :
• Searches are case insensitive.
• To find prose but not poetry, try "+prose -poetry".
• Try wish* to find wish, wishes, or wishful.
• "heart" AND "attack"
• acute OR chronic
• c AND language NOT vitamin
• NEAR means that the two terms must appear close
to each other.
• FOLLOWED BY means that one term must directly
follow the other.
• Phrases: "rise to the occasion"
• Capitalization: Bill, bill, Gates, gates, Oracle, oracle;
the list is endless.
• link:address : Use link:microsoft.com to find all
pages linking to Microsoft sites.
• text:text : Finds pages that contain the
specified text in the body of the document.
• title:text : Finds pages that contain the
specified word or phrase in the page title.
• url:text : Use url:altavista to find all pages on
all servers that have the word altavista in the
host name, path, or filename.
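To make the +, - and * operators from the tips above concrete, here is a small matcher. It is a sketch under assumptions: the matches function is hypothetical, every term without a leading "-" is treated as required, and wildcard matching is delegated to Python's fnmatch module rather than to any real engine's query parser.

```python
import fnmatch
import re

def matches(query, text):
    """Return True if `text` satisfies +required, -excluded and wild* terms."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    for term in query.lower().split():
        excluded = term.startswith("-")
        pattern = term.lstrip("+-")
        if not pattern:
            continue  # ignore a bare "+" or "-"
        found = any(fnmatch.fnmatchcase(word, pattern) for word in words)
        if excluded and found:
            return False      # an excluded term is present
        if not excluded and not found:
            return False      # a required term is missing
    return True

print(matches("+prose -poetry", "Essays and prose"))
print(matches("wish*", "Best wishes to you"))
```

Boolean operators such as AND, OR and NEAR, and the field prefixes link:, text:, title: and url:, would need a fuller parser and are not covered here.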
General tips for good ranking
• Create a good site with good content.
• Pick keywords that visitors will actually use
in a search engine query.
• Include keywords in the TITLE tag.
• Use keywords in META Keyword and Description
tags.
• Use the keywords throughout your page.
• Have a good keyword density on your page.
• Continually work on improving your link
popularity.
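Keyword density is not defined in these slides; one common reading is the share of a page's words that are the keyword. The sketch below computes that ratio. The keyword_density function and the sample sentence are assumptions for illustration, and no particular target value is implied.

```python
import re

def keyword_density(text, keyword):
    """Fraction of the words in `text` that equal `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

print(keyword_density("Search engines search the web", "search"))
```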
Some Facts :
• 85% of new sites are found through search
engines.
• Search engines are free to users.
• I’m feeling lucky.
REFERENCES:
• Google Search Engine: http://google.stanford.edu/
• Search Engine: http://www.searchenginewatch.com/
• Robots Exclusion Protocol: http://www.info.webcrawler.com/mak/projects/robots/exclusion.htm
• Yahoo!: http://www.yahoo.com/
• http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html
• How a Spider Works: http://www.marketingtops.com
• Search Engines: INTERNET 101, Wendy G. Lehnert
THANK YOU