Search Engines
June 20, 2005
LIBS100
Linda Galloway

LIBS 100 Word of the Day
A search engine that queries other search engines and then combines the results.

What Is a Search Engine?
A program that searches documents for specified keywords and returns a list of the documents where the keywords were found.

How Search Engines Work
• Spider or crawler
  – Visits page
  – Follows links on page to other pages
  – Sends terms to the holding area
• Index
  – Sorts through holding area
  – Stores significant words with a link to pages that have those words
  – Ignores words like “the,” “and,” “of,” “to”
• Search engine software
  – Accepts your query term
  – Finds matching pages

Boolean Operators
• AND (+) locates records containing both terms.
• OR locates records containing either term.
• NOT (–) locates records containing the first term, but not the second.
• Most of the time, operators MUST be capitalized.

Review

Major Search Engines – Top Choices
Crawler Based
• Google www.google.com
• Yahoo www.yahoo.com
• Ask Jeeves www.askjeeves.com (results from Teoma)
Source: www.searchenginewatch.com

Major Search Engines – Good Choices
Crawler Based
• AlltheWeb www.alltheweb.com (editorial results from Yahoo)
• AOL Search http://aolsearch.aol.com (editorial results from Google)
• Hotbot www.hotbot.com (editorial results from Yahoo, Google, Teoma)
• Teoma www.teoma.com

Subject Directories
• Human-powered
• Humans review, select, categorize web sites
• Changes to a site will not affect its listing on a directory

How Subject Directories Work
• Humans decide on a set of categories
• Humans review web sites (sometimes based on suggestions from users)
• Humans assign a site to a category
• Sometimes humans write actual content

Subject Directories – Ranking
• No automated ranking algorithm
• Humans put categories in order
• Sites usually listed alphabetically
• Sponsored links

Yahoo Directory
• “Classic” Yahoo
  – Uses humans to organize web sites into categories
  – http://dir.yahoo.com
  – Yahoo Directory is the only directory-based search engine to get a top rating
• Librarians’ Index to the Internet
  – www.lii.org

Subject Directories – Pros and Cons
• Pros
  – Human review/intervention
  – Sites are organized by topic
  – Sites can’t artificially inflate their ranking
• Cons
  – Very limited content
  – Only updated when humans find time

Popular Subject Directories
• Yahoo Directory (http://dir.yahoo.com)
• About.com (http://www.about.com)
• Librarians’ Index to the Internet (http://lii.org)
• Google Directory (http://directory.google.com)
• Infomine (http://www.infomine.ucr.edu)
• LookSmart (http://www.looksmart.com)

So Which Do I Use?
• Search engine
  – You already have a very specific topic
  – You have a very new topic or need the very latest info
  – You need quick facts
• Subject directory
  – You have a broad topic and want to narrow it down
  – You aren’t sure how to get more specific

Metasearch Engines
• A search engine that queries other search engines and then combines the results received from all of them.
• The searcher uses a combination of search engines at one time.

Metasearch Engines – Disadvantages
• User cannot tailor the search to each search engine.
• Dependent on other search engines’ technology.

Good Metasearch Engines
• Dogpile www.dogpile.com
• Vivisimo www.vivisimo.com (www.clusty.com)
• Hotbot www.hotbot.com
• Kartoo www.kartoo.com
• Mamma www.mamma.com

Editorial Results (or Main Results)
Results that are gathered by crawling or indexing web sites.
Webmasters pay a lot of attention to how their sites are listed.
These are non-fee-based listings.

Paid Listings
Web sites pay a fee to be among the top hits for certain keywords.
With some search engines, it is difficult to tell the difference between editorial and paid listings.
Paid hits are probably not the most relevant.

Search Engines
• Crawler-based
• Directory
• Metasearch

Results Listing
• Paid
• Editorial

Resources
• Rothenberger, Michelle. “Search Engines.” 6 Feb 2005 <http://www.carpeindexum.com/libs100/srcheng/srchdir.ppt>.
• Staff. “Resources, INFS100.” Minneapolis Community and Technical College. 6 Feb 2005 <http://www.mctc.mnscu.edu/library/courses/infs1000/infs1000 pt2.htm#Resources>.
• Sullivan, Danny. “Search Features Chart.” Searchenginewatch.com 26 Oct 2001. 6 Feb 2005 <http://searchenginewatch.com/facts/article.php/2155981>.

Assignment 3
Due June 22, 2005
• Handed out in class on Wednesday, June 15th
• You will perform a focused search using two search engines on your research topic

Assignment 3
• Must Document Your Sources!!!
• Use MLA format
• Described on your Assignment
• Follows this format for web pages:
Author’s Last Name, Author’s First Name. “Title of Web Page.” Title of Complete Web Site, if Applicable. Date of Publication or last revision. Date accessed <Web Page Address (or URL)>.

Like This!
Sherman, Chris. “Metacrawlers and Metasearch Engines.” SearchEngineWatch.com 15 March 2004. 8 Feb 2005 <http://searchenginewatch.com/links/article.php/2156241>.
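
The web-page format above can also be assembled piece by piece. The following is a minimal sketch in Python, not part of the assignment: the function name mla_web_citation and its parameter names are hypothetical, the punctuation follows the Sherman example (site title and publication date share one sentence), and straight quotation marks are used for simplicity.

```python
def mla_web_citation(last, first, page_title, accessed, url,
                     site_title=None, published=None):
    """Assemble a web-page citation in the order shown on the slide.

    Hypothetical helper for illustration only; it does not cover
    every case in the MLA rules.
    """
    # Author (last name first), then the page title in quotation marks.
    parts = [f"{last}, {first}.", f'"{page_title}."']

    # Site title and publication date are optional ("if applicable").
    site_and_date = " ".join(p for p in (site_title, published) if p)
    if site_and_date:
        parts.append(site_and_date + ".")

    # The date accessed comes right before the URL in angle brackets.
    parts.append(f"{accessed} <{url}>.")
    return " ".join(parts)


citation = mla_web_citation(
    last="Sherman",
    first="Chris",
    page_title="Metacrawlers and Metasearch Engines",
    site_title="SearchEngineWatch.com",
    published="15 March 2004",
    accessed="8 Feb 2005",
    url="http://searchenginewatch.com/links/article.php/2156241",
)
print(citation)
```

Leaving out site_title and published drops that sentence entirely, which matches the "if Applicable" wording in the format.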