
									The Invisible Web
      Based on
  Gary Price, MLIS
 George Washington University

    Chris Sherman
       Associate Editor
     Search Engine Watch


           2001
How Search Engines Work

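[Diagram: a crawler follows links across the Web (URL1, URL2, URL3, URL4); an indexer adds the pages it finds to the search engine's database; a query from your browser ("Eggs?") is answered from that database with ranked results, e.g. "All About Eggs - 90%", "Eggo - 81%", "Ego - 40%".]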
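The crawler, indexer, and search steps in the diagram can be sketched roughly as follows. This is a toy illustration, not how any real search engine is implemented: the `WEB` dictionary, its page texts, and the count-based ranking are all assumptions made up for the example; only the URL1 to URL4 labels come from the slide.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
WEB = {
    "URL1": ("all about eggs", ["URL2", "URL3"]),
    "URL2": ("eggo waffles", ["URL4"]),
    "URL3": ("eggs and more eggs", []),
    "URL4": ("browser tips", ["URL1"]),
}

def crawl(seed):
    """Crawler: follow links breadth-first, starting from a seed URL."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        pages[url] = text
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Indexer: map each word to the URLs that contain it, with counts."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, term):
    """Search engine: answer from the database, most occurrences first."""
    hits = index.get(term.lower(), {})
    return sorted(hits, key=lambda u: (-hits[u], u))

pages = crawl("URL1")
index = build_index(pages)
results = search(index, "eggs")
```

Note that the query is answered entirely from the index: a page the crawler never reached, or never stored, cannot appear in `results`, which is the starting point for the slides that follow.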
  What is the Invisible Web?
• “Stuff” that search engine crawlers
  (spiders) cannot, or will not,
  add to their databases
• Estimated to be 2 to 50 times larger
  than the visible Web
• Resources are often of much higher
  quality than those on the visible Web
  What is the Invisible Web?
• Certain file formats (PDF, Flash,
  Office files, streaming media)
  – Why? They aren’t HTML text
• Most real-time data (stock quotes,
  weather, airline flight info)
  – Why? Ephemeral & storage intensive
  What is the Invisible Web?
• Dynamically generated pages
  (CGI, JavaScript, ASP, or most
  pages with “?” in the URL)
  – Why? Spider traps
• Web accessible databases
  – Why? Spiders can’t type
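The exclusion rules on the last two slides can be pictured as a filter a crawler might apply before fetching a URL. The sketch below is an assumption for illustration only: the function name `skip_reason`, the extension list, and the example URLs are hypothetical, and real crawlers use far more elaborate rules.

```python
import posixpath
from urllib.parse import urlparse

# Assumed, illustrative extensions for the formats named on the slides
# (PDF, Flash, Office files, streaming media).
NON_HTML_EXTENSIONS = {".pdf", ".swf", ".doc", ".xls", ".ppt", ".ram", ".mp3"}

def skip_reason(url):
    """Return why a cautious crawler might skip this URL, or None to crawl it."""
    parts = urlparse(url)
    extension = posixpath.splitext(parts.path)[1].lower()
    if extension in NON_HTML_EXTENSIONS:
        return "non-HTML file format"
    if parts.query:  # a "?" followed by parameters in the URL
        return "dynamic page (possible spider trap)"
    return None
```

For example, `skip_reason("http://example.com/search.asp?q=eggs")` reports a dynamic page, while a plain `http://example.com/index.html` returns `None` and would be crawled.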
       Hidden Web sites
• Opaque Web – material that can be,
  but is not, included in search engine
  results. Ex: new material that has been
  added but not yet picked up.
• Private Web – sites intentionally
  excluded from search engine results.
  Ex: password-protected sites
• Proprietary Web – sites that require
  user registration. Ex: eBay, New York
  Times
     Invisible Web Gateways
• Complete Planet
  – http://www.completeplanet.com/
• Librarians’ Index to the Internet
  – http://www.lii.org
• Digital Librarian
  – http://www.digital-librarian.com/
• Direct search (not up to date and not well organized)
  – http://www.freepint.com/gary/direct.htm
           The Invisible Web
            & The Librarian
         The Need For Knowledge!
• Awareness that the IW Exists
  Maybe the IW Holds the Content Your Users Can’t
  Find! What is the cost in both wasted time/effort and
  total frustration?
• Let Others Know About the IW
• Awareness of The Synonyms
   – Invisible Web
   – Deep Web
   – Hidden Web
          The Invisible Web
           & The Librarian
Why is the IW Useful to the Librarian
          and the End User?
• Quality of Content (Authority)
• Deep Content on Subject Area (Comprehensiveness)
• Focused Databases (Limited Scope)
  Smaller Universe of Documents to Search (Maximize
  Precision/Recall)
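Precision and recall, as used above, have standard definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The small sketch below computes both; the document IDs and the two result lists are invented purely to illustrate why a smaller, focused universe of documents can help.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search, given sets of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical data: four documents actually answer the user's question.
relevant = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "d2", "x1", "x2", "x3", "x4", "x5", "x6"]
focused_database = ["d1", "d2", "d3", "x1"]

general_scores = precision_recall(general_engine, relevant)    # (0.25, 0.5)
focused_scores = precision_recall(focused_database, relevant)  # (0.75, 0.75)
```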
           The Invisible Web
            & The Librarian
         Why is the IW Useful to the
         Librarian & the End User?
• Material Unavailable Elsewhere on the Web
  (Uniqueness)
• Many Options to Limit, Sort, Interact with the Data
  (Maximize Precision)
• Timeliness vs. Time Lag of General Search Tools
  (Currency)
            The Invisible Web
             & The Librarian
                    Challenges

•   It’s Not a Magic Bullet. It’s a Tool
•   We Still Need Traditional Online Databases
•   Learning Curve, Sorry!
•   Database Selection: When to Use the IW?
•   Numerous Interfaces, Syntax
•   A Non-Stop Flow of New Material
       The Invisible Web
        & The Librarian

  Types of IW Content in Librarian Terms


• Bibliographic
  - OPACs
  - Subject Bibs
• Non-Bibliographic
  - Full-Text
  - Numeric
  - Graphic
  - Directory
  - Real-Time
               Databases
• Information stored in tables (Access, Oracle,
  SQL Server, DB2) and accessible only by
  query.
• Examples:
   – Phone books, People finders
   – Patents, laws
   – Items for sale in a Web store or Web-based
     auctions
   – Digital exhibits
   – Multimedia and graphical files
   – Stock and bond prices
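Why table-backed content stays invisible can be shown with a small sketch using Python's built-in `sqlite3` module. The `phone_book` table, its rows, the form markup, and the function names are all hypothetical; the point is only that the records exist as a page solely in response to a typed query, while a link-following spider sees just the empty search form.

```python
import sqlite3

# Hypothetical phone-book database held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phone_book (name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO phone_book VALUES (?, ?)",
    [("Ada Example", "555-0100"), ("Bob Sample", "555-0101")],
)

def search_form_page():
    """The only static page a crawler can fetch: a form with no records."""
    return '<form action="/lookup"><input name="name"></form>'

def lookup(name):
    """Results built on demand from what a user types into the form."""
    return conn.execute(
        "SELECT name, phone FROM phone_book WHERE name LIKE ?",
        (f"%{name}%",),
    ).fetchall()

rows = lookup("Ada")  # a person can type "Ada"; a spider cannot
```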
         Invisible Web:
      Scholarly information
• Citeseer (computer science)
  – http://citeseer.ist.psu.edu/
• Google Scholar
  – http://scholar.google.com/
• Infomine: Scholarly Internet Resource
   – http://infomine.ucr.edu
• Scirus
  – http://www.scirus.com
         Invisible Web:
      Intellectual Property
• USPTO search
  – http://www.uspto.gov/patft/index.html
• ESP@CENET (European Patent
  Office) Patent Database
  – http://ep.espacenet.com/
          Invisible Web:
           Art & Artists
• ADAM (Art, Design, Architecture &
  Media Information Gateway)
  – http://www.intute.ac.uk/artsandhumanities/
• Artcyclopedia
  – http://www.artcyclopedia.com/
         Invisible Web:
     Real-Time Information
• Flight Tracker
  – http://flightaware.com/
• Stock prices
  – http://money.cnn.com/data/markets/
  – http://www.tase.co.il
• Weather
  – http://www.weather.com
• Currency exchange rates
  – http://www.oanda.com
       Invisible Web:
 Maps and Driving Directions
• www.mapquest.com
• Google Maps
• http://map.search.ch/
         Invisible Web:
        Health & Medicine
• Medline Plus – Medical encyclopedia
  – http://www.nlm.nih.gov/medlineplus/encyclopedia.html
• WebMD
  – http://www.webmd.com
• Economics of Tobacco Control Database
  – http://www1.worldbank.org/tobacco/database.asp
• WHO
  – http://www.who.int
       Invisible Web:
    News & Current Events
• Google News
• RSS feeds (Web 2.0)

								