Exploring the Deep Web

         Peter L. Kraus
  J. Willard Marriott Library –
      University of Utah
     What is the Deep Web?
The deep Web is the hidden part of the
 Web, containing a huge volume of content
 that is inaccessible to conventional search
 engines, and consequently, to most users.
   How big is the Deep Web?
• 550 billion documents
• 500 times the content of the surface Web
• Google has identified 1.2 billion
• An Internet search typically covers about 0.03%
  (roughly 1/3000) of available content.
      What’s in the Deep Web?
•   Searchable databases
•   Downloadable files & spreadsheets
•   Image and multi-media files
•   Data sets
•   Various file formats such as .pdf
•   Lots of government information
     Why use the Deep Web?
• Higher quality sources
  – Selected and organized by subject experts
• Dynamic display
• Customized data sets
• Some data is visual rather than textual
• Regular search engines miss vast
  resources available in the Deep Web
     Why are we talking about
   Government Sites in the Deep Web?
• Governments have the mandate and the
  capacity to gather information that
  individuals don’t
• Most government information is copyright-free
• Government information is authoritative
• Governments have the financial and
  human resources to maintain Deep Web
              The Web Today
• Web sites from the federal government occupy
  only about 1% of the entire global web.
  However, they hold 85% of “The Deep Web”.
• The content of these web sites includes items
  in either .html or .pdf format (reports,
  records, data sets, etc.), a wide diversity of files
  with little standardization or uniformity. A common
  term for this content is “Grey Literature”.
  Definition of “Grey Literature”
• “That which is produced on all levels of
  government, academics, business and
  industry in print and electronic formats, but
  which is not controlled by commercial publishers.”
     Growth and Life of Federal
• On federal web sites the amount of
  information grew 13-fold between 1992-

• The average life expectancy of a federal
  web resource is 4 months (2003)
       What can libraries do?
• LOCKSS-DOCS project (BYU and UU are
  members) (Archival project)
• Cooperative efforts in specific subject
  areas (Western Waters Digital Library)
• Individual institutional initiatives, such as
  institutional repositories, reflecting the
  institutional productivity in research
  (Information often funded by federal
Finding Naked People - Forsyth, Fleck
(1996) (54 citations)

This paper demonstrates an automatic
system for telling whether there are naked
people present in an image. The approach
combines color and texture properties to
obtain a mask for skin regions, which is
shown to be effective for a wide range of
shades and colors of skin.

Graph showing number of citations to
“Finding Naked People”
Arches National Park : NASA Landsat 7 10/3/99
Searching for "University of Utah": displaying records 1 - 25 of a total of 27

Development and Evaluation of Stitched
Sandwich Panels
Larry E. Stanley; Daniel O. Adams
NASA Langley Research Center
NASA/CR-2001-211025 , June 2001; 20010702
….. test panels were produced initially at the
University of Utah and later at NASA Langley
Research Center……
Marriott Library, Salt Lake City, Utah,
United States 9/18/2003 (TerraServer)
Utah Seismic Hazards (National Atlas)
International Deep Web Resources
• International organizations collect an
  amazing amount of data
• Statistical data is often best organized in
  database and spreadsheet format
• Like the US Government, individual
  countries post data files and databases
• This information may not be available in
  print sources in schools and libraries
United Nations Official Documents
• http://documents.un.org/
         Why use the ODS?
• Full-text Official United Nations
  Documents (1993 -) online, free
• Retrospective digitization in process
• Highly relevant material for almost any
  international topic
• Timely and authoritative
         United Nations Statistical
• Value of the information:
  – Authoritative
  – Comparative
  – Time series
  – Compact
• Database topics include:
  – Commodity trade
  – Demographics
  – Disability statistics
  – Social indicators
  – Statistics on men and
   Individual Country Statistics
• http://www.census.gov/main/www/stat_int.html
 Why use this kind of information?
• Aggregate statistical sources are often not
  as up-to-date
• Individual countries are often more specific
  in their indicators than aggregate sources
• Information in databases, spreadsheets,
  and downloadable files is usually NOT
  searchable by web crawlers
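
Why a crawler misses database content can be illustrated with a small sketch. The toy site below is entirely hypothetical (the page paths, search terms, and record names are invented for illustration): a crawler that only follows links reaches the search page but never the records behind it, because those records appear only after someone submits a query.

```python
from collections import deque

# Hypothetical toy site: each static page maps to the links it contains.
STATIC_PAGES = {
    "/": ["/about", "/search"],
    "/about": ["/"],
    "/search": [],  # a search form: no outgoing links until a query is submitted
}

# Hypothetical database records, reachable only by submitting a query.
DATABASE = {
    "utah": ["/record/1", "/record/2"],
    "seismic": ["/record/3"],
}


def crawl(start):
    """Breadth-first, link-following crawl; returns the set of pages reached."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in STATIC_PAGES.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


def search(term):
    """What a person gets by typing a term into the form at /search."""
    return DATABASE.get(term.lower(), [])


crawled = crawl("/")
records = {r for hits in DATABASE.values() for r in hits}
hidden = records - crawled  # records the crawler never saw

print("Crawler reached:", sorted(crawled))
print("Hidden records:", sorted(hidden))
print("Search for 'Utah':", search("Utah"))
```

In this sketch every record stays hidden from the crawler, while a person using the search form retrieves them directly. Real crawlers and real sites are more complicated, but the basic gap is the same.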
For Further Information

• Marriott Library, University of Utah

