Searching the Internet

How Search Engines and Web Directories Work
Search Engines and Web Directories
 Comparable to a digital card catalog
 Organization of information isn’t a
  precise system
 No librarians inserting new sites in
  alphabetical order
Search Engines
   Process involves digital robots or
    humans scouring millions of web
    pages
Search Engines

   Attempt to categorize information
    by topic, then use software with a
    complex set of algorithms to
    determine the most relevant sites
    for the keywords the user enters
“The current state of search engines
  can be compared to a phone book that
  is updated irregularly, is biased
  toward listing more popular
  information, and has most of the
  pages ripped out.”
Steve Lawrence, author of
  “Accessibility of Information on the
  Web”
Search Engines
   Consist of three elements
    Automatic site searcher: bot,
     spider, or crawler
    Index
    Software that searches the index
     and presents the search results
     to the user
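
As a rough illustration of how the three elements fit together, the Python sketch below runs them over a tiny in-memory "web". The page data, the example URLs, and the function names crawl, build_index, and search are all hypothetical and chosen only for this example; real engines are far more elaborate.

```python
from collections import deque

# Hypothetical miniature "web": URL -> (page text, outgoing links).
WEB = {
    "a.example": ("black and white photography tips", ["b.example", "c.example"]),
    "b.example": ("darkroom film developer guide", ["c.example"]),
    "c.example": ("color photography gallery", []),
}

def crawl(start):
    """Element 1: the bot follows links breadth-first and collects page text."""
    seen, queue, pages = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Element 2: the index maps each word to the set of URLs containing it."""
    index = {}
    for url, text in pages.items():
        for word in text.split():
            index.setdefault(word, set()).add(url)
    return index

def search(index, keyword):
    """Element 3: the search software looks up a keyword and returns sorted URLs."""
    return sorted(index.get(keyword, set()))

index = build_index(crawl("a.example"))
print(search(index, "photography"))
```

Keeping the three steps as separate functions mirrors the slide: the bot, the index, and the search software can each be changed without touching the others.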
What are some differences
 between search engines and web
 directories?
Web directories
 Do not feature automatic data
  gathering
 Humans compile lists of URLs and
  page data based on sites that have
  been formally submitted for review
Web directories
 Information is organized in tiers to
  guide users to the desired
  information
 Ability to perform keyword searches
  on contents
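
One way to picture the tiered organization and the keyword search described above is a nested dictionary. This is only a sketch: the DIRECTORY contents, the placement of the example URL under Photography, and the helper names browse and keyword_search are assumptions made for illustration, not how any real directory stores its data.

```python
# Hypothetical two-tier directory: category -> subcategory -> list of URLs.
DIRECTORY = {
    "Arts & Humanities": {
        "Photography": ["http://www.photogs.com/bwworld/"],  # placement assumed
        "Literature": [],
    },
    "Recreation & Sports": {
        "Travel": [],
        "Outdoors": [],
    },
}

def browse(directory, *path):
    """Walk down the tiers one category name at a time."""
    node = directory
    for name in path:
        node = node[name]
    return node

def keyword_search(directory, keyword, path=()):
    """Return the full path of every category whose name contains the keyword."""
    keyword = keyword.lower()
    hits = []
    for name, child in directory.items():
        here = path + (name,)
        if keyword in name.lower():
            hits.append(" > ".join(here))
        if isinstance(child, dict):
            hits.extend(keyword_search(child, keyword, here))
    return hits

print(browse(DIRECTORY, "Arts & Humanities", "Photography"))
print(keyword_search(DIRECTORY, "photo"))
```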

https://ecom.yahoo.com/dir/express/terms
Directories
 www.yahoo.com
 www.looksmart.com
 www.excite.com
 www.dmoz.com
Yahoo directory
 Arts & Humanities: Literature, Photography...
 Business & Economy: B2B, Finance, Shopping, Jobs...
 Computers & Internet: Internet, WWW, Software, Games...
 Education: College and University, K-12...
 Entertainment: Cool Links, Movies, Humor, Music...
 Government: Elections, Military, Law, Taxes...
 Health: Medicine, Diseases, Drugs, Fitness...
 News & Media: Full Coverage, Newspapers, TV...
 Recreation & Sports: Sports, Travel, Autos, Outdoors...
 Reference: Libraries, Dictionaries, Quotations...
 Regional: Countries, Regions, US States...
 Science: Animals, Astronomy, Engineering...
 Social Science: Archaeology, Economics, Languages...
 Society & Culture: People, Environment, Religion
Information gathering

 Bots – do the work of sifting through
  millions of web pages to collect data
 “Crawl” from server to server
  gathering URLs and other
  information, e.g. page titles and text
Link popularity
   Bots can be influenced by link
    popularity, or the number of sites
    that link to a given page
   Information is
    indexed and sorted
   Directories use
    humans
   Search engines use
    software
   Search software uses algorithms to
    find keyword matches among data
    stored in databases and to present it
    in some semblance of order
 Each engine or directory handles its
  data differently
 Some look through all the words on
  each page
 Others only search through URLs or
  titles
Relevance determination
   How information is sorted matters
    to users, because the sort order
    decides how relevant the search
    results appear
Location, Location, Location
 One of the main rules in a ranking
  algorithm involves the location and
  frequency of keywords on a web page
<TITLE>Black &amp; White World: A Celebration of
Photography</TITLE>

   Pages with the
    search terms
    appearing in the
    HTML title tag are
    often assumed to
    be more relevant
    than others to the
    topic
   Search engines will
    also check to see if
    the search
    keywords appear
    near the top of a
    web page
   They assume that any page
    relevant to the topic will
    mention those words right from
    the beginning
 A search engine will
  analyze how often
  keywords appear in
  relation to other
  words in a web page
 Pages with a higher
  keyword frequency are
  often deemed more
  relevant than other pages
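
The location and frequency rules above can be combined into a single score. The sketch below is one possible reading, not any engine's actual formula: the function name location_frequency_score, the weights 2.0 and 1.0, and the 20-word cutoff for "near the top" are all assumptions picked for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def location_frequency_score(title, body, keyword, top_words=20):
    """Score a page for one keyword using title, position, and frequency."""
    keyword = keyword.lower()
    body_words = tokenize(body)
    score = 0.0
    if keyword in tokenize(title):
        score += 2.0  # assumed weight: keyword appears in the title tag
    if keyword in body_words[:top_words]:
        score += 1.0  # assumed weight: keyword appears near the top of the page
    if body_words:
        # Frequency: share of the page's words that are the keyword.
        score += body_words.count(keyword) / len(body_words)
    return score

print(location_frequency_score(
    "Black & White World: A Celebration of Photography",
    "Photography tips for the darkroom",
    "photography",
))
```

A page that never mentions the keyword scores 0.0, so it sorts below every page that does.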
 All major search engines follow the
  location/frequency method to some
  degree, in the same way a cook may
  follow a standard recipe
 But cooks like to add their own secret
  ingredients
 Search engines add extras to how
  they use location/frequency
 Some engines look through every
  single word on every single page,
  counting the number of keyword
  occurrences
 Some give priority to sites that were
  accessed frequently in past results
 Some search engines
  index more web
  pages than others
 Some search engines
  index web pages
  more often than
  others
 No search engine
  has the exact
  same collection
  of web pages to
  search through
 Others, such as Google, base
 results on the number of links to
 a page
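
A plain count of inbound links is the simplest version of this idea. The sketch below ranks pages by how many other pages link to them; the LINKS graph and the function names are hypothetical, and a raw count is a deliberate simplification that should not be read as Google's actual ranking method.

```python
from collections import Counter

# Hypothetical link graph: each page -> the pages it links to.
LINKS = {
    "a.example": ["c.example"],
    "b.example": ["c.example", "a.example"],
    "c.example": [],
    "d.example": ["c.example", "a.example"],
}

def inbound_link_counts(links):
    """Count how many distinct pages link to each page."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):   # count each linking page only once
            if target != source:      # ignore links from a page to itself
                counts[target] += 1
    return counts

def rank_by_links(links):
    """Most-linked pages first; ties are broken alphabetically."""
    counts = inbound_link_counts(links)
    return sorted(counts, key=lambda page: (-counts[page], page))

print(rank_by_links(LINKS))
```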
   Search engines
    may also penalize
    pages or exclude
    them from the
    index, if they
    detect search
    engine
    “spamming”
 Search engines may limit the number
  of keywords they count on a page
 They may also program their software
  to move offending sites to the bottom
  of the search results
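
Moving offending sites to the bottom of the results could look roughly like the sketch below. It treats a page as "spamming" when a single keyword makes up too large a share of its words; the 0.3 threshold, the function names, and the sample pages are assumptions for illustration, and real spam detection uses many more signals.

```python
def keyword_density(text, keyword):
    """Fraction of the page's words that are exactly the keyword."""
    words = text.lower().split()
    return words.count(keyword.lower()) / len(words) if words else 0.0

def demote_spam(ranked_pages, keyword, max_density=0.3):
    """Keep the original order, but move over-stuffed pages to the bottom.

    ranked_pages is a list of (url, text) pairs, best result first.
    """
    clean, spam = [], []
    for url, text in ranked_pages:
        if keyword_density(text, keyword) > max_density:
            spam.append((url, text))
        else:
            clean.append((url, text))
    return clean + spam

pages = [
    ("spam.example", "photo photo photo photo cheap"),
    ("ok.example", "a photo of my darkroom"),
]
print([url for url, _ in demote_spam(pages, "photo")])
```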
Web games
 Some webmasters have tried to place
  keywords on a web page disguised in
  the same color as the background
 Will be invisible to viewer but
  detectable to search engine
 Others put keyword text at the
  bottom of each page
 Some webmasters go to great lengths
  to “reverse engineer” the location/
  frequency systems used by a
  particular search engine
 Because of this, many search engines
  use “off the page” ranking criteria
   One factor is link
    analysis
   By analyzing how
    pages link to one
    another, a search
    engine can judge
    what a page is about
    and whether it is
    deemed important
 Another factor is clickthrough
  measurement
 A search engine may watch what
  results someone selects for a
  particular search, then eventually
  drop high-ranking pages that
  aren’t attracting clicks
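
Clickthrough measurement can be sketched as a filter over an existing ranking. The function name drop_unclicked, the thresholds of 100 impressions and a 1% click rate, and the sample numbers are all assumptions; the slide does not say how any engine actually sets these limits.

```python
def drop_unclicked(ranked, impressions, clicks, min_impressions=100, min_ctr=0.01):
    """Remove results that were shown often but almost never clicked.

    ranked:      URLs in their current order, best first
    impressions: URL -> number of times the result was shown
    clicks:      URL -> number of times the result was selected
    """
    kept = []
    for url in ranked:
        shown = impressions.get(url, 0)
        ctr = clicks.get(url, 0) / shown if shown else 0.0
        # Only drop a page once there is enough data to judge it.
        if shown >= min_impressions and ctr < min_ctr:
            continue
        kept.append(url)
    return kept

ranked = ["a.example", "b.example", "c.example"]
impressions = {"a.example": 1000, "b.example": 1000, "c.example": 10}
clicks = {"a.example": 2, "b.example": 150, "c.example": 0}
print(drop_unclicked(ranked, impressions, clicks))
```

The minimum-impressions check keeps a new page such as c.example from being dropped before it has had a fair chance to attract clicks.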
   www.google.com
   www.northernlight.com
   www.aj.com
   www.altavista.com

What sites would be listed if we look
for the subject “black & white
photography”?
Meta tags
 Used in determining relevancy and
  ranking
 Keywords placed inside meta tags are
  not visible on web page but contain
  words relating to page’s content
     http://www.photogs.com/bwworld/




<META NAME="Keywords" CONTENT="Photography,
photographic, photographer, darkroom, black and white,
cameras, photographers, photo competition, photo
discussion group, digital photography, digital imaging,
photographic paper, film, 35mm, medium format, large
format, film developer, paper developer, fixer, stop
bath, hypo, custom photoprocessing, exposure, zone
system, archival process.">
Description tags
 Used by webmaster to dictate what a
  web page’s description will say in the
  search result list
 Some engines factor description tags
  into relevancy ranking
     http://www.photogs.com/bwworld/


<html>
<head>
<TITLE>Black &amp; White World: A Celebration of
Photography</TITLE>
<X-SAS-WINDOW TOP=41 BOTTOM=615 LEFT=73
RIGHT=790>
<META NAME="Description" CONTENT="Ezine dedicated
to Black and White Photography.
 Monthly photo competition, links to the best black and
white photography
 on the web, how-to resources, discussion forums, more.">
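
To show how software might read the keywords and description tags, the sketch below uses Python's standard html.parser module. The PAGE string is an abbreviated copy of the tags above (shortened for the example), and the class and function names MetaTagParser and extract_meta are made up for illustration.

```python
from html.parser import HTMLParser

class MetaTagParser(HTMLParser):
    """Collect the NAME/CONTENT pairs of every META tag in a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # Collapse line breaks and repeated spaces inside the value.
            self.meta[name.lower()] = " ".join(content.split())

def extract_meta(html):
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta

# Abbreviated version of the example page's tags.
PAGE = """<html>
<head>
<TITLE>Black &amp; White World: A Celebration of
Photography</TITLE>
<META NAME="Keywords" CONTENT="Photography,
photographic, photographer, darkroom, black and white">
<META NAME="Description" CONTENT="Ezine dedicated
to Black and White Photography.">
</head>
</html>"""

meta = extract_meta(PAGE)
keywords = [word.strip() for word in meta["keywords"].split(",")]
print(keywords)
print(meta["description"])
```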
Search Engine Math
Boolean Operators
 AND, OR, NOT
 Let the searcher refine a search by
  including or excluding keywords
 Putting AND between two keywords
  tells engine to look for documents
  that contain both keywords
Boolean Operators
 OR tells engine to look for documents
  containing either word
 NOT instructs engine to find
  documents that contain first word
  but not the second
 Not all search engines support
  Boolean operators
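
The three operators map directly onto set operations. In the sketch below, INDEX is a hypothetical inverted index (word to the set of document numbers containing it), and the function names are invented for the example.

```python
# Hypothetical inverted index: word -> set of document IDs containing it.
INDEX = {
    "black": {1, 2, 4},
    "white": {1, 3, 4},
    "photography": {1, 2, 3},
}

def docs(word):
    """Documents containing the word (empty set if the word is unknown)."""
    return INDEX.get(word, set())

def search_and(a, b):
    """Documents that contain both keywords."""
    return docs(a) & docs(b)

def search_or(a, b):
    """Documents that contain either keyword."""
    return docs(a) | docs(b)

def search_not(a, b):
    """Documents that contain the first keyword but not the second."""
    return docs(a) - docs(b)

print(search_and("black", "white"))
print(search_or("black", "white"))
print(search_not("black", "white"))
```

AND can only shrink the result set and OR can only grow it, which is why AND and NOT are the operators used to narrow a search.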
   The more specific
    your search, the
    more likely you will
    find what you want
   Sometimes, you
    want to find pages
    that have all of the
    words you enter,
    not just some of
    them
   The “+” symbol
    lets you do this
   Sometimes, you
    want to find pages
    that have one word
    but not another
    word
   The “-” symbol
    lets you do this
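
A small sketch of how the plus and minus symbols could be applied to one page's text. The function name matches is hypothetical, the minus sign is taken to be the plain keyboard hyphen, and terms without a prefix are simply ignored here; actual engines differ in how they treat unprefixed terms.

```python
def matches(text, query):
    """Check a page's text against a query made of +word and -word terms."""
    words = set(text.lower().split())
    for term in query.split():
        word = term[1:].lower()
        if term.startswith("+") and word not in words:
            return False  # a required word is missing
        if term.startswith("-") and word in words:
            return False  # an excluded word is present
    return True  # terms without a + or - prefix are ignored in this sketch

print(matches("black and white photography", "+black +white"))
print(matches("color photography", "+photography -color"))
```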
 A “phrase search” will give you pages
  where the terms appear in exactly
  the order you specify
 You do this by putting quotation
  marks around the phrase you want



“black & white photography”
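
A phrase search can be sketched as looking for the phrase's words as one consecutive run in the page. The function name phrase_match is made up for this example, and splitting on whitespace is a simplifying assumption; real engines also handle punctuation and word forms.

```python
def phrase_match(text, phrase):
    """True if the words of the phrase appear consecutively, in order."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    if n == 0:
        return False
    # Slide a window of n words across the text and compare each one.
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

print(phrase_match("Links to black & white photography sites",
                   "black & white photography"))
print(phrase_match("white and black photography",
                   "black & white photography"))
```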
Meta-search engines
 List results from several different
  search engines
 www.dogpile.com
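
Combining lists from several engines needs some rule for merging and removing duplicates. The sketch below scores each URL by the sum of 1/position across the lists; this scoring rule, the function name merge_results, and the sample lists are assumptions for illustration and are not a description of how Dogpile or any other meta-search engine works.

```python
def merge_results(result_lists):
    """Merge ranked lists from several engines into one de-duplicated list.

    Each URL earns 1/position from every list it appears in, so pages that
    rank high in several engines rise to the top.
    """
    scores = {}
    for results in result_lists:
        for position, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / position
    return sorted(scores, key=lambda url: (-scores[url], url))

engine_a = ["x.example", "y.example"]
engine_b = ["y.example", "z.example"]
print(merge_results([engine_a, engine_b]))
```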

				
DOCUMENT INFO
posted:8/25/2012
language:English
pages:44