					The Invisible Web

  Gary Price, MLIS
 George Washington University

    Chris Sherman
       Associate Editor
     Search Engine Watch
How Search Engines Work

[Diagram: a Crawler fetches pages (URL1, URL2, URL3, URL4) from The Web; an
Indexer adds them to the Search Engine Database; a query ("Eggs?") returns
a list of results ranked by relevance score (90%, 81%, 40%, 10%).]
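
The crawler, indexer, and database stages in the diagram can be sketched in a few lines. This is only a toy illustration under stated assumptions: the in-memory `TOY_WEB` dictionary, its page texts, and the function names are all hypothetical, and relevance ranking (the percentages in the diagram) is left out.

```python
from collections import deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
TOY_WEB = {
    "URL1": ("all about eggs", ["URL2", "URL3"]),
    "URL2": ("eggs by your browser", ["URL4"]),
    "URL3": ("eggo waffles", []),
    "URL4": ("ego and id", ["URL1"]),
}

def crawl(seed, web):
    """Crawler: follow links breadth-first from the seed and collect page text."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = web[url]
        pages[url] = text
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Indexer: map each word to the set of URLs whose text contains it."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index

def search(index, query):
    """Search engine: return the URLs that contain every query word, sorted."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

index = build_index(crawl("URL1", TOY_WEB))
print(search(index, "eggs"))  # ['URL1', 'URL2']
```

Note that the search only sees what the crawler reached by following links: anything not linked, or not stored as text, never enters the index.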
  What is the Invisible Web?
• “Stuff” that search engine crawlers
  (spiders) cannot -- or will not --
  add to their databases
• 2 to 50 times larger than the
  visible Web
• Resources often of much higher
  quality than the visible Web
  What is the Invisible Web?
• Certain file formats (PDF, Flash,
  Office files, streaming media)
  – Why? They aren’t HTML text
• Most real-time data (stock quotes,
  weather, airline flight info)
  – Why? Ephemeral & storage intensive
  What is the Invisible Web?
• Dynamically generated pages
  (CGI, JavaScript, ASP, or most
  pages with “?” in URL)
  – Why? Spider traps
• Web-accessible databases
  – Why? Spiders can’t type
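
The reasons given on these slides can be read as a crawler's skip rules. The sketch below is one hedged way to express them; the extension list, the path markers, and the function name `why_skipped` are assumptions made for illustration, not the rules of any real search engine.

```python
import posixpath
from urllib.parse import urlparse

# Assumed extensions for the non-HTML formats named on the slide
# (PDF, Flash, Office files, streaming media).
NON_HTML_EXTENSIONS = {".pdf", ".swf", ".doc", ".xls", ".ppt", ".ram", ".asf"}

# Assumed path markers for dynamically generated pages (CGI, ASP).
DYNAMIC_MARKERS = ("/cgi-bin/", ".cgi", ".asp")

def why_skipped(url):
    """Return the reason a cautious crawler would skip this URL, or None to crawl it."""
    parts = urlparse(url)
    path = parts.path.lower()
    if parts.query:
        return "dynamic page ('?' in URL): possible spider trap"
    if any(marker in path for marker in DYNAMIC_MARKERS):
        return "dynamic page (script-generated): possible spider trap"
    extension = posixpath.splitext(path)[1]
    if extension in NON_HTML_EXTENSIONS:
        return "non-HTML file format"
    return None

for url in (
    "http://example.com/index.html",
    "http://example.com/report.pdf",
    "http://example.com/search?q=eggs",
    "http://example.com/cgi-bin/lookup",
):
    print(url, "->", why_skipped(url) or "crawl")
```

Web-accessible databases are not covered by any URL rule: their content only appears after someone types a query into a form, which is the point of "spiders can’t type".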
    Invisible Web Gateways
• Intelliseek
  – http://www.invisibleweb.com
  – http://beta.profusion.com
• Complete Planet
  – http://www.completeplanet.com/
• Librarians’ Index to the Internet
  – http://www.lii.org
           The Invisible Web
            & The Librarian
         The Need For Knowledge!
• Awareness that the IW Exists
  Maybe the IW Holds the Content Your Users Can’t
  Find! What is the cost in both wasted time/effort and
  total frustration?
• Let Others Know About the IW
• Awareness of The Synonyms
   – Invisible Web
   – Deep Web
   – Hidden Web
• Let the Content be Your Calling Card
  Focus Less on the Amount of IW Data
          The Invisible Web
           & The Librarian
Why is the IW Useful to the Librarian
          and the End User?
• Quality of Content (Authority)
• Deep Content on Subject Area (Comprehensiveness)
• Focused Databases (Limited Scope)
  Smaller Universe of Documents to Search (Maximize
  Precision/Recall)
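
For readers who want the two measures spelled out: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The small sketch below uses made-up document IDs and result lists (all hypothetical) to show why a smaller, focused universe of documents can help both.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result list against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical data: four documents truly answer the question.
relevant = {"d1", "d2", "d3", "d4"}

# A general search tool returns ten results, only two of them relevant.
general = ["d1", "x1", "x2", "x3", "x4", "x5", "x6", "d2", "x7", "x8"]

# A focused database returns four results, three of them relevant.
focused = ["d1", "d2", "d3", "x1"]

print(precision_recall(general, relevant))  # (0.2, 0.5)
print(precision_recall(focused, relevant))  # (0.75, 0.75)
```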
           The Invisible Web
            & The Librarian
         Why is the IW Useful to the
         Librarian & the End User?
• Material Unavailable Elsewhere on the Web
  (Uniqueness)
• Many Options to Limit, Sort, Interact with the Data
  (Maximize Precision)
• Timeliness vs. Time Lag of General Search Tools
  (Currency)
           The Invisible Web
            & The Librarian
  The IW, The Librarian, The Future
• What Happens If/When the General Search Tools
  Crawl IW Material? Good News? Bad News?
• General Search Tools May NOT:
  Offer Many Interactive/Limiting Tools
  Be Updated/Refreshed as Frequently (time lag)
  Timeliness, making current info available, is one of
  the things the Net does well.
          The Invisible Web
           & The Librarian
  The IW, The Librarian, The Future
• The Search Engine Business: Will IW Material be a
  Priority?
• Just One Dialog or SilverPlatter Database?
  NO, in Terms of Content!!!
• Yes, Common Interface, Syntax
  Perhaps XML will Assist
            The Invisible Web
             & The Librarian
                    Challenges

•   It’s Not The Magic Bullet. It’s a Tool
•   We Still Need Traditional Online Databases
•   Learning Curve, Sorry!
•   Database Selection, When To Use the IW?
•   Numerous Interfaces, Syntax
•   A Non-Stop Flow of New Material
           The Invisible Web
            & The Librarian
                  Things To Do!

• Build Your Own Collections
  Internet Resource Collection Development
• Mine Entire Sites, Often the IW Material Gets Little or
  No Notice In Reviews
• Create Links When Possible DIRECT to the Interface.
• “Save the Time of the Web Researcher”
• Keep Current
       The Invisible Web
        & The Librarian

  Types of IW Content in Librarian Terms


• Bibliographic    • Non-Bibliographic
  - OPACs            - Full-Text
  - Subject Bibs     - Numeric
                     - Graphic
                     - Directory
                     - Real-Time
          Future Trends
• Killer apps will lead the way
  – Research Index (CiteSeer)
• Search engines will work harder to
  “find” Invisible Web content
  – Inktomi (Index Connect, Ultraseek)
  – WhizBang (“wrappers”)
• No matter what, there will always
  be a problem!
   Coming Soon




     Available: July 2001
CyberAge Books 0-910965-51-X
 http://www.invisible-web.net
        Invisible Web:
       Computer Science
• McAfee World Virus Map
  – http://www.mcafee.com
• ResearchIndex
  – http://www.researchindex.com
         Invisible Web:
       Company Research
• European High-Tech Industry
  Database
  – http://www.tornado-insider.com/radar/
• Kompass
  – http://www.kompass.com
         Invisible Web:
      Intellectual Property
• Delphion Intellectual Property
  Network
  – http://www.delphion.com/
• ESP@CENET (European Patent
  Office) Patent Database
  – http://ep.espacenet.com/
         Invisible Web:
   Dictionaries & Languages
• EuroDicAutom
  – http://eurodic.ip.lu
• Verbix
  – http://www.verbix.com/index.html
          Invisible Web:
           Art & Artists
• ADAM (Art, Design, Architecture &
  Media Information Gateway)
  – http://adam.ac.uk/
• Artcyclopedia
  – http://www.artcyclopedia.com/
         Invisible Web:
     Real-Time Information
• Flight Tracker
  – http://www.trip.com/ft/home/0,2096,1-1,00.shtml
• J-Track 3-D Satellite Locator
  – http://liftoff.msfc.nasa.gov/realtime/JTrack/Spacecraft.html
       Invisible Web:
 Maps and Driving Directions
• MapBlast
  – http://www.mapblast.com
• Streetmap.co.uk
  – http://www.streetmap.co.uk/
         Invisible Web:
        Government Info
• Parline Database
  – http://www.ipu.org
• United Nations Daily Press
  Briefings
  – http://www.un.org/News/
        Invisible Web:
       Health & Medicine
• Economics of Tobacco Control
  Database
  – http://www1.worldbank.org/tobacco/database.asp
• International Digest of Health
  Legislation
  – http://www.who.int
       Invisible Web:
    News & Current Events
• Cold North Wind Newspaper
  Archive Project
  – http://www.coldnorthwind.com
• Financial Times Global Archive
  – http://www.globalarchive.ft.com
          Invisible Web:
             Science
• Great Barrier Reef Online Image
  Catalogue
  – http://www.gbrmpa.gov.au/corp_site/info_services/library/index.html
• Nuclear Explosions Database
  – http://www.ausseis.gov.au/databases
          Invisible Web:
          Transportation
• Equasis (Merchant Ships)
  – http://www.equasis.org/
• World Aircraft Accident Summary
  (WAAS) Fatal Airline Accident
  Subset
  – http://www.waasinfo.net/

				