Internet and Web Search Engines

					    Effective Web Searching


           T.B. Rajashekar
National Centre for Science Information
      Indian Institute of Science
         Bangalore - 560 012
    (E-Mail: raja@ncsi.iisc.ernet.in)
                  Effective Web Searching
         How do we use libraries and IR systems?
         Organization of the web
         Accessing web-based information: key problems
         Tools for Information retrieval on the web
           – Directories/ guides
           – Search engines
           – Meta search tools
           – People finding tools
         Strategies for web searching
         Guides to search tools
         Keeping current


T.B. Rajashekar              November 2000                2
                  How Do We Use Libraries and IR
                          Systems?
         Libraries:
           – How the documents are organised – document types,
             classification system used
           – Access tools – catalogues, indexes, automated catalogues,
             access points
           – Our information need (search topic): translated into the terms
             of the organization scheme employed by the library
         Information Retrieval systems (e.g. bibliographic
            databases)
              – How the database is organised, record content, fields, search
                elements
              – Indexing and query language, thesaurus, Boolean logic,
                truncation, etc.
              – Our information need: formulated as a search expression using
                the query language
                  Organization of the Web
          Adopt the same strategy when searching the Web
            – Understand web information architecture
            – Understand the information access tools and the information
              access mechanisms they provide
            – Represent our query in terms of mechanisms supported by
              these tools and search the web
          Web sites:
           – How the content is organised (document types, structuring
             and navigation)
           – Searchable/indexable and non-searchable/non-indexable content
           – Structure of web pages
           – Meta tags, page attributes (properties)
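
The page structure and meta tags listed above are what an indexing tool can read from a page. A minimal sketch, assuming a made-up sample page and using only Python's standard html.parser module; the class and function names are illustrative, not taken from any search tool:

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collect the <title> text and the name/content pairs of <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html_text):
    """Return (title, meta-tag dictionary) for one HTML document."""
    parser = PageSummary()
    parser.feed(html_text)
    return parser.title.strip(), parser.meta


# Hypothetical page, made up for illustration only.
SAMPLE_PAGE = """<html><head>
<title>Activated Sludge Modelling</title>
<meta name="keywords" content="activated sludge, simulation, modelling">
<meta name="description" content="Notes on simulating the activated sludge process.">
</head><body><h1>Overview</h1></body></html>"""

title, meta = summarize(SAMPLE_PAGE)
print(title)
print(meta["keywords"])
```

Pages without keyword or description meta tags give an empty dictionary here, which is one reason indexing tools also rely on titles, headings and body text.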


                      Organization of the Web...
           The Web is the totality of web pages stored on web servers
           Spectacular growth in web-based information sources
              and services:
                  –   Education and research
                  –   Entertainment
                  –   Business and commerce
                  –   Personal home pages
           Estimated to contain over 1 billion indexable web pages
           Doubling each year
           Over 80 million web sites




                     Accessing Web-based
                  Information: Key Problems
          Identification of sources (documents)
          No central “card catalog”
          Most web pages are not indexed in standard
           vocabulary, unlike library catalogues or journal
           article indexes
          Impossible to reach all related pages/ sites directly
          Need to use intermediate, resource finding tools




                  Information Retrieval on the
                             Web
           How to find relevant documents on the Web?
            – Informal:
                 Browsing (and bookmarking for later use)
                 Friends
                 Print sources
                 Discussion forums (mailing lists)
                 Current awareness services (e.g. Scout Report)
                 Guessing web site addresses!
            – Formal (using information finding tools)
                 Web directories/ guides
                 Web search engines
                 Meta-search tools
                 Specialty search engines


                   Web Directories/ Guides

           Also called ‘virtual libraries’ and ‘Internet resource
              catalogues’
             Organised collection of descriptions and links to
              Internet sources
             Organisation: by subject categories (hierarchical); by
              resource type (patents, e-journals, institutes, etc.)
             Most use human experts for source selection, indexing
              and classification
             Some include reviews/ ratings of listed sites



                  Web Directories/ Guides...
         Examples of general web directories:
           – Librarians’ Index to the Internet (www.lii.org)
           – Britannica’s “Web’s best sites” (www.britannica.com)
           – Infomine (infomine.ucr.edu)
           – Scout Report Signpost (www.signpost.org)
           – BUBL link (bubl.ac.uk/link)
           – Yahoo (www.yahoo.com)
           – Magellan (www.mckinley.com)
           – Galaxy (www.galaxy.com)
           – Looksmart (www.looksmart.com)
           – Snap (www.snap.com)



                  Web Directories/ Guides...
           Guides to directories:
             – WWW Virtual Library (www.vlib.org)
             – Argus Clearinghouse (www.clearinghouse.net)
             – Gogettem (www.gogettem.com/)
           Subject-specific guides (subject gateways):
             – Edinburgh Engineering Virtual Library (www.eevl.ac.uk)
             – Social Science Information Gateway (sosig.ac.uk)
             – The Internet Pilot To Physics (physicsweb.org/TIPTOP)
             – Chemcenter (www.acs.com)
             – Programmers Heaven (www.programmersheaven.com)
           Resource type guides:
             – Patents (www.european-patent-office.org)
             – Electronic journals (www.publist.com)

                  Web Directories/ Guides...
          Most web directories support searching within
           categories and descriptions, in addition to browsing
          Advantages:
              – Access to high quality sources
              – Do not contain redundant links
              – Faster access to sources
          Disadvantages:
            – One needs to be aware of such directories/ guides
            – May not be up-to-date
            – May not be exhaustive
            – Categories (subject hierarchy) vary across directories


                     Web Directories/ Guides...
           When to use web directories/ guides?
                  – For broad/ general topics where keyword searching on search
                    engines retrieves too many irrelevant sites
                  – When you want a few highly relevant sites and do not intend
                    an exhaustive/ comprehensive search
           When not to use web directories/ guides?
                  – For concept/ keyword searches
                  – When search terms are distinctive
           Effective directory/ guide usage:
                  – Take advantage of the sub-search within categories, supported
                    by most directories/ guides
                  – Join their mailing lists for automatic updates on new sites

                  Web Directories/ Guides...

           Demonstration of directories/ guides:
             – Librarians’ Index to the Internet (www.lii.org)
             – Britannica’s “Web’s best sites” (www.britannica.com)
             – Scout Report Signpost (www.signpost.org)
             – BUBL link (bubl.ac.uk/link)
             – Yahoo (www.yahoo.com)
             – WWW Virtual Library (www.vlib.org)
             – Argus Clearinghouse (www.clearinghouse.net)




                       Web Search Engines
           Just as A&I journals index published literature, web
              search engines build a full-text index to web pages
              gathered from web sites and provide a keyword search
              interface to this index
             Spider programs periodically visit web sites and gather
              the web pages for indexing
             Also index web sites submitted by site developers
             A brief summary of the indexed web page is also
              prepared
             The index usually contains URLs, titles, headings, and
              other words from the HTML document
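
The full-text index described above can be pictured as an inverted index: a map from each word to the pages that contain it. A minimal sketch, assuming three made-up pages in place of what a spider would gather; the function names are illustrative and real engines are far more elaborate:

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a spider might have gathered.
PAGES = {
    "http://example.org/a": "Light Combat Aircraft programme in India",
    "http://example.org/b": "Combat aircraft of the world",
    "http://example.org/c": "Light rail projects in India",
}


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search_and(index, *terms):
    """Pages containing all of the terms."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, *terms):
    """Pages containing at least one of the terms."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def search_not(index, wanted, unwanted):
    """Pages containing `wanted` but not `unwanted`."""
    return index.get(wanted.lower(), set()) - index.get(unwanted.lower(), set())


INDEX = build_index(PAGES)
print(sorted(search_and(INDEX, "light", "india")))
print(sorted(search_not(INDEX, "india", "aircraft")))
```

Each Boolean operator becomes a set operation on the index, which is why keyword lookups stay fast even over a large collection.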

                         Web Search Engines...

           The search engines provide a forms-based search
            interface for entering the queries
           Support simple and advanced search interfaces
           Search results are returned in the form of a list of web
            sites matching the query
           Some key features supported:
                  – Phrase searching (“…” double quotes)
                  – Boolean searching (AND, OR, NOT)
                  – Implied Boolean: Term inclusion (+), term exclusion (-)




                   Web Search Engines…
          Key features…
           – Proximity searches (NEAR, ADJ, BEFORE, AFTER)
           – Use of parentheses to group search terms
           – Truncation searches (‘industr*’)
           – Field-specific searching (Title, URL, Text)
           – Natural language queries (‘Why is the sky blue?’)
           – Relevance ranking of search results
                Number of search terms
                Number of times each search term occurs
                Proximity of search terms
                Location of search terms (title, text)
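
The ranking factors above can be combined into a single score per page. A toy sketch, assuming made-up pages and arbitrary weights; it covers the number of terms matched, term frequency and title location, but not proximity, and it is not the formula of any actual search engine:

```python
import re


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query_terms, title, body, title_weight=3):
    """Toy relevance score; the weights are illustrative assumptions."""
    title_words = tokenize(title)
    body_words = tokenize(body)
    total = 0
    matched = 0
    for term in query_terms:
        term = term.lower()
        in_title = title_words.count(term)
        in_body = body_words.count(term)
        if in_title or in_body:
            matched += 1
        # A hit in the title counts more than a hit in the body text.
        total += title_weight * in_title + in_body
    # Pages matching more of the query terms are boosted.
    return total * matched


def rank(query_terms, pages):
    """Return URLs with a non-zero score, best first."""
    scored = [
        (score(query_terms, title, body), url)
        for url, (title, body) in pages.items()
    ]
    return [url for value, url in sorted(scored, reverse=True) if value > 0]


# Hypothetical pages: url -> (title, body text).
PAGES = {
    "http://example.org/a": (
        "Polymer companies",
        "A list of polymer suppliers and polymer companies.",
    ),
    "http://example.org/b": ("Chemistry notes", "Polymer chemistry basics."),
    "http://example.org/c": ("Gardening", "Roses and tulips."),
}

print(rank(["polymer", "companies"], PAGES))
```

Changing title_weight shows how much one design choice can reorder the results, which helps explain why different engines rank the same pages differently.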




                     Web Search Engines…

           Key features…
            – Sub-searching (searching within retrieved records)
            – Case sensitivity
            – Limit by language
            – Limit by age of documents
            – Limit by audio, video and image type
            – Translation of search results (title and description)
            – Limit by domain, host




                         Web Search Engines...
           Examples:
             – Fastsearch (alltheweb.com)
                  – Altavista (www.altavista.com)
                  – Google (www.google.com)
                  – Northernlight (www.northernlight.com)
                  – HotBot (www.hotbot.com)
                  – Excite (www.excite.com)
                  – Lycos (www.lycos.com)
                  – InfoSeek Guide (www.infoseek.com)
                  – WebCrawler (www.webcrawler.com)
                  – Worldwide Web Worm (www.goto.com)


                     Web Search Engines...
           Specialty search engines:
             – Country-specific search engines
                 www.khoj.com
                 www.123india.com
             – Subject-specific search engines
                 Chemfinder (www.chemfinder.com)
                 Engineering Resources Online (www.er-online.co.uk)
                 MathSearch
                  (www.maths.usyd.edu.au:8000/MathSearch.html)
                 Netpart: Company site locator
                  (www.websense.com/locator.cfm)
                 World Trade Locator (www.intl-tradenet.com)
             – Resource-specific search engines:
                 Patents (www.uspto.gov)
                 Journal articles (www.findarticles.com)


                        Web Search Engines...
        Advantages of search engines:
             – Best suited for complex keyword/ concept searches
             – Control over search: search terms can be combined as required
             – Searches can be limited to period of time, fields, source type, etc.
             – Currency of information, made possible by regular addition by web
               spiders
             – Exhaustive information can be retrieved (with lots of patience!)
        Disadvantages:
             – Time consuming
             – False positives
             – Search engines vary in terms of search techniques/ syntax
             – Dead links, redundant links (same document gets displayed)
             – Spamming (‘salting’ of pages)
             – Higher ranking of paying sites

                     Web Search Engines...
          Limitations of web search engines:
            – Poor retrieval effectiveness (relevance) as little vocabulary
              control is exercised by web site developers and the index
              engines
            – Different search engines return different search results due to
              the variation in indexing and search process (40% non-overlap)
            – None of the search engines comes close to indexing the entire
              web, much less the entire Internet. Content not indexed:
                 PDF documents
                 Content that requires login
                 Databases searched using CGI programs
                 Web content on intranets behind firewalls


                     Web Search Engines...

           Demonstration of search engines:
             – Fastsearch (www.alltheweb.com)
             – Altavista (www.altavista.com)
             – Google (www.google.com)
             – Northernlight (www.northernlight.com)




                          Meta Search Tools
          Exhaustive searches require use of more than one web search engine
             and familiarity with their search interface
          Meta search tools provide a common interface and conduct searches
             in many search engines simultaneously and return results in a
             uniform format
          Do not gather web pages, build indexes, accept URL additions,
             classify or review web sites
          Some features supported:
              – Duplicate hits removal
              – Rank results
              – Selection of search engine(s) to be used
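
The features above (engine selection, duplicate removal, ranking) can be sketched as a merge of ranked result lists. A minimal sketch, assuming the lists have already been fetched and using invented engine names and URLs; summing 1/rank is just one simple way to combine ranks, not how any particular meta search tool works:

```python
from collections import defaultdict


def merge_results(results_by_engine, engines=None, top_n=10):
    """Merge ranked URL lists from several engines into one ranked list.

    Only the top_n hits of each engine are used. A URL returned by
    several engines appears once, scored by summing 1/rank.
    """
    chosen = engines or list(results_by_engine)
    scores = defaultdict(float)
    for engine in chosen:
        hits = results_by_engine[engine][:top_n]
        for position, url in enumerate(hits, start=1):
            scores[url] += 1.0 / position
    # Highest combined score first; ties broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical result lists; real ones would come from the engines themselves.
RESULTS = {
    "engine_a": [
        "http://example.org/1",
        "http://example.org/2",
        "http://example.org/3",
    ],
    "engine_b": ["http://example.org/2", "http://example.org/4"],
}

print(merge_results(RESULTS))
print(merge_results(RESULTS, engines=["engine_b"]))
```

Because only the top_n hits of each engine are merged, the combined list is convenient but not exhaustive.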



                     Meta Search Tools...




         [Figure: searching with multiple search engines compared with
          searching through a meta search tool]



                          Meta Search Tools...
           Meta search tools (remote sites):
                  – MetaCrawler (www.metacrawler.com)
                  – Ixquick (www.ixquick.com)
                  – Dogpile (www.dogpile.com)
                  – ProFusion (www.profusion.com)
           Meta search tools (local, installable software):
                  – Copernic (www.copernic.com)
                  – SearchPad (www.searchpad.com)
                  – LexiBot (www.completeplanet.com)



                       Meta Search Tools...

           Advantages:
             – Query can be run across multiple search engines
             – User needs to learn only the search interface of the meta
               search tool
             – Better results: retrieves top-ranking pages from individual
               search engines
           Disadvantages:
             – Unique features of individual search engines are lost
             – Not exhaustive: use only top results returned by search
               engines



                       Meta Search Tools...

           When to use meta search tools?
            – Need to be used cautiously
            – Good for simple searches, particularly if search terms are
              distinctive or unique
            – Good for testing a few keywords to find which individual
              search engine returns good results
            – Good for ‘quick and dirty searching’ if you are in a hurry and
              want to find a few relevant sites quickly
            – For complex searches, involving many search terms,
              Boolean logic, etc., it is better to use individual search
              engines



                          Meta Search Tools...

           Demonstration:
                  – MetaCrawler (www.metacrawler.com)
                  – Ixquick (www.ixquick.com)
                  – Dogpile (www.dogpile.com)
                  – ProFusion (www.profusion.com)




                      People Finding Tools
           Register names and addresses and find e-mail addresses
           Examples:
             – Bigfoot (www.bigfoot.com)
             – Peoplesearch (www.peoplesearch.net)
             – Ahoy (ahoy.cs.washington.edu:6060/)
             – Four11 (www.four11.com)
             – Switchboard (www.switchboard.com)
             – Whowhere (www.whowhere.lycos.com/)
           Most search engines also support people searches (e.g.
              Altavista, Google, Yahoo!)



                      People Finding Tools
          Using people finding tools:
            – Person should have registered in the tool(s)
            – Searcher should know both surname and first name, else too
              many names will be retrieved
            – Biased towards U.S.-based people
            – Often, required e-mail cannot be retrieved through these tools
            – Alternatively, any search engine may be used (phrase search
              using person’s name)
            – If person’s affiliation is known, Yahoo! Directory may be used
              to locate the institution and e-mail




                         Web Search Strategies
           Search steps:
                  1. Analyze the search topic and identify the search terms (both
                     inclusion and exclusion), their synonyms (if any), phrases and
                     Boolean relations (if any)
                  2. Select the search tool(s) to be used (meta search engine,
                     directory, general search engine, specialty search engine)
                  3. Translate the search terms into search statements of the
                     selected search engine
                  4. Perform search
                  5. Refine the search based on results
                  6. Visit the actual site(s) and save the information (using
                     File-Save option of the browser)
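
Step 3 above, translating search terms into a search statement, can be sketched as a small query builder. A minimal sketch with illustrative function names; the operator spellings (AND, OR, AND NOT, +, -) are assumptions that differ between search engines, so the help pages of the chosen tool remain the authority:

```python
def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def boolean_query(concepts, exclude=()):
    """Build an explicit Boolean query.

    Each concept is a list of synonyms: synonyms are ORed together,
    concepts are ANDed, and excluded terms are appended with AND NOT.
    """
    groups = []
    for synonyms in concepts:
        joined = " OR ".join(quote(s) for s in synonyms)
        groups.append(f"({joined})" if len(synonyms) > 1 else joined)
    query = " AND ".join(groups)
    for term in exclude:
        query += f" AND NOT {quote(term)}"
    return query


def implied_boolean_query(include, exclude=()):
    """Build an implied Boolean query using + (include) and - (exclude)."""
    parts = [f"+{quote(t)}" for t in include]
    parts += [f"-{quote(t)}" for t in exclude]
    return " ".join(parts)


print(boolean_query([["simulat*", "model*"], ["activated sludge process"]]))
print(implied_boolean_query(["Light Combat Aircraft", "India"], ["resumes"]))
```

Keeping the analysed topic (concepts, synonyms, exclusions) separate from the query syntax makes it easier to repeat the same search on a different tool.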

                           Web Search Strategies
           Tips for effective web searching:
                  – Broad or general concept searches: start with directory-based
                    services (want a few highly relevant sites for a broad topic)
                  – Highly specific or topics with unique terms/ many concepts:
                    use the search tools
                  – Go through the ‘help’ pages of search tools carefully
                  – Gather sufficient information about the search topic before
                    searching
                         Spelling variations, synonyms, broader and narrower
                          terms
                  – Use specific keywords, rare/unusual words are better than
                    common ones
                    Web Search Strategies...
          Tips for effective web searching…
              – Prefer phrase and adjacency searching to Boolean (‘stuffed
                animal’ rather than ‘stuffed’ AND ‘animal’)
              – Use as many synonyms as possible - search engines use
                statistical retrieval methods and produce better results with
                more query words
              – Avoid use of very common words (e.g., ‘computer’)
              – Enter search terms in lower case. Use upper case to force
                exact match (e.g. ‘Light Combat Aircraft’, ‘LCA’)
              – Use the ‘More like this’ option, if supported by the search
                engine (e.g. Excite, Google)


                       Web Search Strategies...
           Tips for effective web searching…
                  – Repeat the search by varying search terms and their
                    combinations; try this on different search tools
                  – Enter most important terms first - some search tools are
                    sensitive to word order
                  – Use the NOT operator to exclude unwanted pages (e.g.:
                    bio-data, resumes, courses)
                  – Go through at least 5 pages of search results before giving
                    up the scan
                  – Select 2 or 3 search tools and master the search techniques



                    Sample Web Searches

           “Companies dealing with polymers”


           Do not use search engines (too many irrelevant hits)
           Use directory sources (e.g. www.yahoo.com)
             – Follow the categories:
                 Business and Economy
                 Business-to-Business
                 Chemicals
             – Do a sub-search on ‘Polymers’
           Use specialty search engines (e.g. www.bizweb.com)


                    Sample Web Searches...
           “Web pages related to Light Combat Aircraft”


           Keywords are unique
           Use Search Tools (e.g. www.altavista.com)
             – Search for “Light Combat Aircraft” (phrase search in simple
               search interface)
             – Use of double quotes will force the search engine to consider
               the set of keywords as a phrase
             – Search can be limited to specific dates
             – More refined search in advanced search interface: “Light
               Combat Aircraft” AND India



                    Sample Web Searches...

           “Web sources related to simulation or modeling of
              activated sludge process”

           This is a concept search - search tools are better
           Using Altavista, the query may be submitted as
             – (simulat* OR model*) AND “activated sludge process”
             – Note use of ‘*’ to cover word variations like simulated,
               simulate, models, etc.
             – Note use of phrase form for activated sludge process
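
What this query matches can be checked against sample text. A minimal sketch using Python's re module, assuming that ‘*’ means "any word beginning with the stem" and that a quoted phrase means adjacent words in the given order; this only approximates the behaviour described above and is not Altavista's own matching logic:

```python
import re


def has_truncated(text, stem):
    """True if any word in text starts with stem (the 'stem*' form)."""
    pattern = r"\b" + re.escape(stem) + r"\w*"
    return re.search(pattern, text, re.IGNORECASE) is not None


def has_phrase(text, phrase):
    """True if the words of phrase occur adjacent and in order."""
    words = [re.escape(word) for word in phrase.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def matches(text):
    """Evaluate (simulat* OR model*) AND "activated sludge process"."""
    concept = has_truncated(text, "simulat") or has_truncated(text, "model")
    return concept and has_phrase(text, "activated sludge process")


print(matches("Dynamic models of the activated sludge process"))
print(matches("Simulation of sludge in an activated process"))
```

The second text contains all the words but not the phrase, so it is rejected, which is the effect the double quotes are meant to have.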




                    Guides to Search Tools

           www.beaucoup.com (guide to 2,000+ search engines,
              indices and directories)
             www.searchpower.com (a very comprehensive search
              engine directory - claims over 16,000 search engine
              listings!)
             www.123go.com/drw/search/search.htm (Dr. Webster’s
              Big Page of Search Engines)
             www.finderseeker.com (The search engine of search
              engines)
             www.virtualfreesites.com (Over 1,000 specialised
              search engines)

                        Keeping Current

          AskScott (www.askscott.com): Provides a very
           comprehensive tutorial on search engines
          SearchEngineWatch (www.searchenginewatch.com):
           Offers information about new developments in
           search engines and provides reviews and tutorials
          Botspot (www.botspot.com): Collection of and guide to
           a variety of bots (intelligent agents)



