Sometimes Google Isn’t Enough
Finding Information on the Invisible Web

Shirley McDonald firstname.lastname@example.org
Hilda Donaldson email@example.com

First: a definition of the Visible (Surface) Web
  “It’s made up of HTML Web pages that the search engines have chosen to include in their indices. It’s no more complicated than that.” Sherman and Price.

Static Web pages
  Fixed, or static, pages do not change and can be linked to other pages.
  Ex: http://www.truthorfiction.com
      http://exploratorium.com

Dynamic Web Pages
  Dynamic - generated only by a specific query; does not exist after that query.
  Ex: www.mapquest.com
      http://www.aeroseek.com/webtrax/

The Invisible, Deep, or Hidden Web
  Web sites or information that Google or other popular search engines are not capable of indexing
  Websites specifically excluded by the search engine

Invisible (Deep or Hidden) Web
  Public info is 400 – 550 times larger than the surface web
  550 billion individual documents vs one billion on surface web
  Quality content is 1,000 to 2,000 times greater than surface web
  95% of Deep Web is accessible to public (no fees or subscription required)
  (Bergman)

Hidden Web sites
  Opaque Web – material that can be, but is not, included in search engine results. Ex: new material added and not yet picked up.
  Private Web – sites intentionally excluded from search engine results. Ex: password protected
  Proprietary Web – sites that require user registration. Ex: eBay, New York Times
  Pay per click – Ex: overture.com, FindWhat.com

Content of Databases
  Information stored in tables (Access, Oracle, SQL Server, DB2) and accessible only by query.
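A minimal sketch of why database content stays hidden: the records live in a table, and a results page is assembled only when someone submits a query. The table name, sample entries, and function names below are hypothetical, and Python's built-in sqlite3 module stands in for the larger database products named above.

```python
import sqlite3
from html import escape


def build_directory() -> sqlite3.Connection:
    """Create a tiny in-memory phone book (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE phone_book (name TEXT, phone TEXT)")
    conn.executemany(
        "INSERT INTO phone_book VALUES (?, ?)",
        [("Ada Example", "555-0100"), ("Ben Sample", "555-0101")],
    )
    return conn


def results_page(conn: sqlite3.Connection, surname: str) -> str:
    """Build an HTML page for one query; the page is never saved as a file."""
    rows = conn.execute(
        "SELECT name, phone FROM phone_book WHERE name LIKE ? ORDER BY name",
        (f"%{surname}%",),
    ).fetchall()
    items = "".join(f"<li>{escape(n)}: {escape(p)}</li>" for n, p in rows)
    return f"<html><body><ul>{items}</ul></body></html>"


conn = build_directory()
page = results_page(conn, "Sample")
print(page)
```

A crawler that only follows links never fills in the search form, so it never triggers the query and never sees a page like this one.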
  Examples:
    Phone books, People finders
    Patents, laws
    Items for sale in a Web store or Web-based auctions
    Digital exhibits
    Multimedia and graphical files
    Stock and bond prices

Examples of Hidden Sites
  Pages in searchable databases: medical (WebMD.com), patent, scientific, legal (Lexis and Westlaw), reference
  Pages requiring login or registration: Blackboard, New York Times
  Government publications or databases: ERIC
  Online databases: Gale Research
  PDF files, audio, video, any new format

More Examples
  Dictionaries and thesauri
  Sites that require forms to be filled out (ex: travel directions, job hunting)
  Product catalogs and library catalogs
  Newspaper and magazine archives
  Dynamic web pages (ex: airline flight checkers, mapquest)
  Interactive tools (ex: calculators)

How are pages excluded from search engines?
  Google’s PageRank™ puts pages at the top of the hit list by the number of times other pages link to them (popularity)
  Webmasters who have figured out how to manipulate PageRank™’s behavior are able to move their pages to the top of the hit list
  Faulty typing and/or judgment
  Search engine spiders and crawlers cannot see a site unless another site links to it
  Search engines can primarily see text pages in HTML form
    This will change in the future as search engines become more capable of retrieving the “hidden” web
  Use of blocking techniques by the webmaster or server
    Password protection
    HTML blocking in the web page
    A listing on the server of blocked pages

Searching the Invisible Web
  Use the following to get around, just like the visible web:
    Directories – subject guide compiled by human editors
    Search Engines
    Specialized Databases

Directories to search the Invisible Web
  Big Hub
    http://www.thebighub.com/
  Complete Planet: The Deep Web Directory
    70,000 searchable databases and specialty search engines
    http://www.completeplanet.com
  Digital Librarian: A Librarian’s Choice of the Best of the Web
    www.digital-librarian.com

More directories
  IncyWincy: The Invisible Web Search Engine
    Offers Web Search, Directory Search, Metasearch, News
    Note: Kids & Teens, Reference
    http://www.incywincy.com
  Invisible Web Directory
    http://www.invisible-web.net/
  Infomine: Scholarly Internet Resource
    http://infomine.ucr.edu
  Librarian’s Index to the Internet
    www.lii.org
  Open Directory Project (dmoz)
    http://www.dmoz.org (want to edit?)
  ProFusion: The Original Meta-Search Engine
    http://www.profusion.com/

Search Engines for the Invisible Web
  AlltheWeb: find it all
    http://www.alltheweb.com
  Bright Planet
    http://www.brightplanet.com/
  Direct Search: SearchCenter (59 pages!)
    Can get updates through emails - Resourceshelf
    http://www.freepint.com/gary/direct.htm
  IxQuick: the world’s most powerful metasearch engine
    http://ixquick.com/

More Search Engines
  Search-22
    http://www.search-22.com
  Search Adobe PDF Online
    http://searchpdf.adobe.com/
  Turbo10
    http://turbo10.com
  Vivisimo/Vivisimo Clustering
    http://www.vivisimo.com

Specialized Databases
  Library of Congress
    http://catalog.loc.gov
  LookSmart’s Find Articles (over 900 publications)
    http://www.findarticles.com
  National Science Digital Library
    http://www.nsdl.org
  Singing Fish – audio and video
    http://www.singingfish.com

Choosing the Best Search
  NoodleTools
    http://www.noodletools.com/debbie/literacies/information/5locate/adviceengine.html
    Great chart that connects the information need to the search strategy
  How to Choose a Search Engine or Directory
    http://library.albany.edu/internet/choose.html

Access to the Hidden Web is Constantly Improving
  “Google Scholar Offers Access to Academic Information.” written by Danny Sullivan, November 18, 2004
    http://searchenginewatch.com/searchday/article.php/3437471
  Google makes arrangements with publishers to get into password protected sites – sometimes shows only the abstract
  Includes libraries of Oxford, Stanford, Michigan, Harvard, NY Public
  http://scholar.google.com/

Issues
  “Let a Thousand Googles Bloom.” – by Lawrence Lessig
    http://www.latimes.com/news/opinion/commentary
    Questions the legality and copyright issues
  “Does Google move augur commercialization of libraries?” – Detroit Free Press
    http://www.freep.com/news/statewire/sw108716_20041214.htm

Alternative to Google Scholar
  “Internet Archive to Build Alternative to Google.” – by Mark Chillingworth
  “Ten major international libraries have agreed to combine their digitized book collections in a free text-based archive hosted online by the not-for-profit Internet Archive.”
  Open Access

Bibliography
Bergman, Michael K. “The Deep Web: Surfacing Hidden Value.” http://www.beta.brightplanet.com/deepcontent/tutorials/DeepWeb/index.asp (8 November 2004).
Cadwallader, Joy. “Searching the Invisible Web.” http://www.inf.aber.ac.uk/academicliaison/internet/invisible.asp (4 November 2004).
Chillingworth, Mark. “Internet archive to build alternative to Google.” Information World. http://www.iwr.co.uk/IWR/1160176 (30 December 2004).
Cohen, Laura. “How to Choose a Search Engine or Directory.” http://library.albany.edu/internet/choose.html (4 November 2004).
“Does Google move augur commercialization of libraries?” http://www.freep.com/news/statewire/sw108716_20041214.htm (15 December 2004).
Grimes, Brad. “Expand your Web search horizons: six tips for finding the info you want by searching hidden corners of the Web.” PC World. June, 2002.
“Invisible Web: What it is, Why it exists, How to find it, and Its inherent ambiguity.” http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html (4 November 2004).
Lessig, Lawrence. “Let a Thousand Googles Bloom.” http://www.latimes.com/news/opinion/commentary/la-oe-lesig12Jan12,1,1292618.story?ctrack=1 (13 January 2005).
McLaughlin, Laurianne. “Beyond Google: the web is so full of useful info that no search engine can find it all. But a multitude of specialty sites deliver shopping advice, reference databases, leisure-time ideas, and more – fast.” PC World. April, 2004.
Niederlander, Mary. “More on Searching: The Hidden Web or Invisible Web Resources.” http://www.librarysupportstaff.com/hiddenweb.html (4 November 2004).
O’Leary, Mick. “Invisible Web Discovers Hidden Treasures.” Information Today. January, 2000.
“Search Engines 101 – Search Engines Explained.” http://www.submittoday.com/search_engines_101.htm (4 November 2004).
“Searching the Hidden Web.” http://www2.canisius.edu/canhp/canlib/guides/hidden-web.html (4 November 2004).
Sherman, Chris and Gary Price. “The invisible web: uncovering sources search engines can’t see.” Library Trends. Fall, 2003.
Smith, C. Brian. “Invisible Web: Explore hidden troves of information.” http://www.libraryspot.com/features/invisibleweb.htm (4 November 2004).
Sullivan, Danny. “Google Scholar Offers Access to Academic Information.” http://searchenginewatch.com/searchday/article.php/3437471 (1 December 2004).
Vine, Rita. “Going beyond Google for faster and smarter web searching.” Teacher Librarian. October, 2004.