Googling
Welcome!
While you are waiting, please…
find in your packet:
“What Do You Want Google to Tell You?”
Exercise 6 - Questions for the Final Exercise
begin writing down your questions in three or more categories
Instructor: Joe Barker, jbarker@library.berkeley.edu
An Infopeople Workshop, 2005
This Workshop is Brought to You By the Infopeople Project
Infopeople is a federally funded grant project supported by the California State Library. It provides a wide variety of training to California libraries. Infopeople workshops are offered around the state and are open for registration on a first-come, first-served basis. For a complete list of workshops, and for other information about the Project, go to the Infopeople Web site at infopeople.org.
Introductions
Name
Library, position
How do you use Google?
Workshop Overview
Google’s way of “thinking”
Taking charge of the driving
Using limits to find the hard-to-get
Finding information on a subject
Special Google databases and tools
What to do when Google doesn’t work
Go to: bookmarks.infopeople.org
Click on extreme_googling_bk.htm
Make a bookmark of this page (Add to Favorites)
Exercise 1
How does Google “think” about your searches?
Please pause and wait for discussion when you reach a
A Close Look at Google Search Results
• Which Google database was used
• Approx. # of hits
• Terms actually searched on, as Dictionary links
• Excerpt of page with your terms
• Matched terms in bold
• URL, size, date last crawled
• Link to Cached copy
• Pages supposedly like this one
• 2nd page from same site
• All Google pages from this site
Don’t believe the number of results
They are approximate, changing, and not comprehensive
Default Matching on Search Terms
Default AND between terms
Google takes a FUZZY approach:
• may match only some of the words if a page is “important”
• words may occur only in pages that link to the page
• words may occur somewhere on the site a page belongs to
Cached reveals the page as Google found it
• may differ from the current page
• Cached exists if a page is full-text indexed
• about 1 billion pages in Google are not cached and are not fully searchable
• no Cached copy if a page owner requests not to be cached
How Can You Know Why Google Found a Page?
Click the Cached link toward the end of a result
• the top area often explains what was matched
Stemming
Google stems “when appropriate”:
• automatically detects a word’s stem or root and retrieves it with various endings
• kite flying retrieves kite, kites, kiting and fly, flying, flyers, flyer’s, flyers’
• to turn off: +kite +flying or "kite flying"
• single-word searches are not stemmed
Words Google Does Not Search
Common or “stop” words are ignored:
to be or not to be
• there is no list of “common” terms; Google tells you below the search box in results
• to turn off: +to +be +or not +to +be or "to be or not to be"
• single-word searches are possible on common words
Ranking of Results
Word order matters:
• favors phrases (words together)
• looks for phrases with something in place of stop words
• word repetition and proximity also count
PageRank combines many factors
Google ranking is a great mystery:
• popularity: links to a page and their importance
• “importance”: a value of 0 (low) to 10 (high)
• term placement: phrases, proximity, repetition
See Cheat Sheet #1
Google Preferences
• Interface language
• Selected languages for pages
• SafeSearch filtering (“moderate” is the default)
• Number of results returned (20 or 30 is best)
• Open new browser window for search results
Back of Cheat Sheet #1
The Google Toolbar
• Search any Google database
• Search within a site
• Pop-up blocker
• Search history list
• Set Google preferences quickly
• Customizable in Options
Download from toolbar.google.com
Toolbar for other browsers: download from googlebar.mozdev.org
Exercise 2
Installing the Google Toolbar
Customizing Preferences
Taking Charge of Driving Google OR Getting the Most from Google’s FUZZY Thinking
Improving Google’s “FUZZY” Default AND
Problems with AND default:
words can occur anywhere in results pages
may have different meanings or contexts
some pages may not contain all of your words
some may not contain any of your words
Use quotation marks to require words together
turns common words into unique search terms:
"working mothers" (145,000 results)
5% of working mothers (2,680,000 results)
"dry cells" (11,500 results)
1% of dry cells (1,010,000 results)
A hyphen makes a phrase and searches with and without the hyphen:
bite-sized retrieves bite-sized, bite sized, bitesized
Force “FUZZY” with OR Searches
Singulars and plurals not covered by stemming
parent OR parents
Equivalent or synonymous terms
parent OR guardian
Misspellings
libarian OR librarian
Apostrophes and their misuse
april's OR aprils OR april "fools day"
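The OR patterns on this slide can also be assembled with a few lines of code, for example when preparing a handout of test searches. The Python sketch below is a hypothetical helper, not a Google tool: `quote_term` and `or_group` are names invented here, and the code only builds query strings in the syntax shown above; it does not contact Google.

```python
from typing import List


def quote_term(term: str) -> str:
    """Wrap a multi-word term in double quotes so Google treats it as a phrase."""
    already_quoted = term.startswith('"') and term.endswith('"')
    if " " in term and not already_quoted:
        return f'"{term}"'
    return term


def or_group(terms: List[str]) -> str:
    """Join alternative terms with Google's uppercase OR operator."""
    return " OR ".join(quote_term(t) for t in terms)


if __name__ == "__main__":
    print(or_group(["parent", "parents"]))       # singular and plural
    print(or_group(["parent", "guardian"]))      # synonyms
    print(or_group(["libarian", "librarian"]))   # misspelling
    print(or_group(["april's", "aprils", "april"]) + " " + quote_term("fools day"))
```

The last line prints april's OR aprils OR april "fools day", the apostrophe example from this slide.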
Ask Google to be “FUZZY”
Synonym search
~ immediately before a word
sometimes “thinks” of very broad, related terms:
~food ~facts ~help
finds recipes, nutrition, cooking information, statistics, guide, tutorial, FAQ, manual
Often: terms appear in links pointing to a retrieved page
Take advantage of stemming
Let stemming handle variant endings (hike retrieves hike, hikers, hiking, hikes):
"wild flowers" OR wildflowers hike "point reyes" april OR may OR spring
Ask for “FUZZY” Number Ranges
Numrange search uses .. (two periods, no spaces)
babe ruth 1921..1935
• results have highlighted dates within this range
3..6 megapixels digital camera
• most numbers will be associated with megapixels
DVD player $250..
• can be open-ended: any number above the starting number
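Google has never documented exactly how numrange matching works, so treat this only as a toy model of the idea: it scans a page’s text for numbers and keeps those between the two ends of the range, treating a missing upper bound as open-ended. `numbers_in_range` is a hypothetical name and the sample sentences are made up.

```python
import re
from typing import List, Optional

# Integers or decimals, allowing thousands separators such as 1,250.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def numbers_in_range(text: str, low: float, high: Optional[float] = None) -> List[float]:
    """Return the numbers in text that fall inside low..high (inclusive).

    high=None models an open-ended range such as $250.. (any number >= low).
    """
    found = []
    for match in NUMBER.finditer(text):
        value = float(match.group().replace(",", ""))
        if value >= low and (high is None or value <= high):
            found.append(value)
    return found


if __name__ == "__main__":
    # babe ruth 1921..1935
    print(numbers_in_range("Ruth hit 59 home runs in 1921 and 60 in 1927.", 1921, 1935))  # [1921.0, 1927.0]
    # DVD player $250..
    print(numbers_in_range("Players at $199, $249 and $299.", 250))  # [299.0]
```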
The Whole-Word Wildcard: Allowing FUZZY within “ ”
Can’t remember the exact wording in a phrase?
Who wrote something like, “The stag at night drank his fill”?
Try searching:
“the stag * * * his fill” OR “the stag * * * * his fill”
ANSWER: “The stag at eve had drunk his fill” (in most sources), Sir Walter Scott, “The Lady of the Lake”
Construct proximity searches:
"george bush" "george * bush" "george * * bush" "bush george" "bush * george"
Or try GAPS: www.staggernation.com/cgi-bin/gaps.cgi
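Since you rarely know how many words are missing, it can help to generate one phrase per plausible gap size and join them with OR, as in the stag search. This Python sketch (the function name `wildcard_variants` is hypothetical) reproduces the stag query and the proximity phrases; unlike the slide, which lists the proximity phrases side by side, the helper joins them with OR into a single search.

```python
def wildcard_variants(before: str, after: str, gaps: range) -> str:
    """OR together quoted phrases with n whole-word * wildcards between two fragments."""
    phrases = []
    for n in gaps:
        words = [before] + ["*"] * n + [after]  # n = 0 gives the plain phrase
        phrases.append('"' + " ".join(words) + '"')
    return " OR ".join(phrases)


if __name__ == "__main__":
    # The half-remembered Scott quotation: try three and four missing words.
    print(wildcard_variants("the stag", "his fill", range(3, 5)))
    # Proximity: "george bush", "george * bush", "george * * bush"
    print(wildcard_variants("george", "bush", range(0, 3)))
    # Reversed order with up to one word between
    print(wildcard_variants("bush", "george", range(0, 2)))
```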
Excluding to Control “FUZZIness”
You want: medical info about a pancreatitis diet
Start with: pancreatitis diet (172,000 results)
Eliminate undesirable words in results:
pancreatitis diet -cat -dog (132,000)
pancreatitis -cat -dog -"support group" (128,000)
Select exclusions carefully
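A toy model makes the effect of the minus sign concrete: a page survives only if it contains every required word and none of the excluded ones. The pages below are invented, the matching is exact whole-word matching rather than Google’s fuzzier approach, and the helper names are hypothetical. Notice that an exclusion also drops any page that merely mentions the word, which is why exclusions need care.

```python
import re
from typing import Dict, List


def normalize(text: str) -> str:
    """Lowercase and keep only runs of letters, separated by single spaces."""
    return " ".join(re.findall(r"[a-z]+", text.lower()))


def contains(text: str, term: str) -> bool:
    """Whole-word or whole-phrase test, so 'cat' does not match 'education'."""
    return f" {normalize(term)} " in f" {normalize(text)} "


def keep(text: str, required: List[str], excluded: List[str]) -> bool:
    """A page survives if it has every required term and no excluded term."""
    return all(contains(text, t) for t in required) and not any(
        contains(text, t) for t in excluded
    )


# Invented sample pages, for illustration only.
PAGES: Dict[str, str] = {
    "vet": "Pancreatitis diet tips for your dog or cat",
    "clinic": "Pancreatitis diet: low-fat meals and patient education",
    "forum": "Pancreatitis support group: sharing diet tips",
}

if __name__ == "__main__":
    required = ["pancreatitis", "diet"]
    for excluded in (["cat", "dog"], ["cat", "dog", "support group"]):
        print(excluded, [name for name, text in PAGES.items() if keep(text, required, excluded)])
```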
Ask Google to be Very “FUZZY”: Related & Similar
Two commands for the same function:
• click Similar at the end of a result
• search related:www.infopeople.org
Based on links to and from the target page, plus major words in, and ranking of, related pages
Sometimes it is hard to see how pages are related
Possible uses:
• comparison shopping
• find more sites like a site: related:www.econsumer.gov
• evaluate a suspect page
Exercise 3
Taking Charge of Driving Google
Limiting to Find the Hard-to-Get
Limiting: Words in Titles
intitle:
• finds pages concentrated on your term
hybrid cars intitle:mileage (7,060 results)
hybrid cars mileage (296,000 results)
with quotes:
intitle:"cuban embargo" (581 results)
"cuban embargo" (28,000 results)
with OR:
intitle:"global warming" OR intitle:"greenhouse effect"
Use allintitle: to require all words in the title
allintitle: hybrid cars mileage (86 results)
• can be combined only with site:
allintitle: hybrid cars mileage -site:com (11 results)
Exploiting a Page’s URL
Limiting to a domain (edu, gov, etc.):
site:edu OR site:gov OR site:ca.us
Complete list at:
http://en.wikipedia.org/wiki/List_of_Internet_TLDs
Searching within a Site
site:memory.loc.gov lincoln "sheet music"
• site: works only on the top/first part of the URL
• omit http:// and the final /
• turns Google into a search engine for a site’s pages that are indexed in Google
inurl: is less specific
• the term may be anywhere in URLs
inurl:lincoln "sheet music"
• finds “lincoln” anywhere in any URL and “sheet music” somewhere in the pages
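The difference between site: and inurl: can be modeled in a few lines: site: looks only at the host name at the start of the URL, while inurl: accepts the term anywhere. This Python sketch is a simplification (it ignores site: values that include a path, and Google’s real matching is not published); the function names and the sample URL are hypothetical.

```python
from urllib.parse import urlsplit


def site_matches(url: str, site: str) -> bool:
    """Toy site: test: the host must equal the site or end with '.' + site."""
    host = (urlsplit(url).hostname or "").lower()
    site = site.lower().rstrip("/")
    return host == site or host.endswith("." + site)


def inurl_matches(url: str, term: str) -> bool:
    """Toy inurl: test: the term may appear anywhere in the URL."""
    return term.lower() in url.lower()


if __name__ == "__main__":
    url = "http://memory.loc.gov/example/lincoln/page.html"  # hypothetical URL
    print(site_matches(url, "memory.loc.gov"))  # True
    print(site_matches(url, "gov"))             # True
    print(site_matches(url, "lincoln"))         # False: not part of the host
    print(inurl_matches(url, "lincoln"))        # True
```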
Limiting to Types of Documents
filetype:
• use OR to find more than one
form 1040 filetype:pdf - finds forms
-filetype:
• excludes certain filetypes
form 1040 -filetype:pdf - finds help with forms
View as HTML link can be useful:
• avoids viruses a document might carry if opened
• allows viewing without the software or reader
Caveats for Limit Commands
Cannot always be combined:
• link: and related: must stand alone
• allintitle: allintext: allinanchor: allinurl: combine with site: only
You can usually mix all other limit commands:
inurl:ucla intitle:admissions statistics
intitle:"thyroid disease" site:edu OR site:com
Be careful not to ask for the impossible:
site:ucla.edu -inurl:edu
site:com site:edu site:gov
Some require understanding HTML hypertext links:
inanchor:links looks for text in link tags in the HTML code, e.g. <a href="...">Links</a>
See Cheat Sheet #3
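The combination rules above are mechanical enough to check before you run a search. The following Python sketch encodes only the caveats listed on this slide (it uses related:, the form shown on the Related & Similar slide). `check_query` is a hypothetical helper, and Google itself may silently ignore an unsupported combination rather than report an error.

```python
from typing import List

STANDALONE = ("link:", "related:")
ALLIN = ("allintitle:", "allintext:", "allinanchor:", "allinurl:")


def check_query(query: str) -> List[str]:
    """Return warnings for limit-command combinations described in the caveats."""
    tokens = query.split()
    ops = [t.lstrip("-") for t in tokens]  # treat -site:com like site:com
    warnings = []
    if len(tokens) > 1 and any(op.startswith(STANDALONE) for op in ops):
        warnings.append("link: and related: must stand alone")
    if any(op.startswith(ALLIN) for op in ops):
        allowed = ALLIN + ("site:",)
        if any(":" in op and not op.startswith(allowed) for op in ops):
            warnings.append("allin... commands combine only with site:")
    sites = [t for t in tokens if t.startswith("site:")]
    if len(sites) > 1 and "OR" not in tokens:
        warnings.append("several site: limits without OR cannot all be true")
    # site:ucla.edu -inurl:edu excludes the very pages site: asks for.
    for site in sites:
        domain = site[len("site:"):]
        for t in tokens:
            if t.startswith("-inurl:") and t[len("-inurl:"):] in domain:
                warnings.append(f"{site} conflicts with {t}")
    return warnings


if __name__ == "__main__":
    print(check_query("site:ucla.edu -inurl:edu"))
    print(check_query('intitle:"thyroid disease" site:edu OR site:com'))
```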
Advanced Web Search page
Restricted opportunities
Useful if you want to:
• limit to pages updated in the past 3 months, 6 months, or year
• change the language of results pages
• select from a list of filetype formats
• change content filtering (also in Preferences)
Not useful if you want to:
• construct complex searches
• use OR with phrases, or multiple phrases
• use OR for more than one limiter (site: filetype: inurl:)
• use intitle: or inurl: (only the allin... commands are in Advanced Search)
I almost never use it
Exercise 4
Limiting
Finding Info on a Subject
Finding Directories & Link Lists
EXAMPLE: looking for links or directories about “women’s history” “middle east”
Use words likely to occur in link-list or directory pages:
links OR "directory of" OR guide "women's history" "middle east"
"what's new" OR "what's cool" "women's history" "middle east"
Use field limits to focus on the pages you want:
intitle:links OR intitle:"directory of" OR intitle:"encyclopedia of" "women's history" "middle east"
intitle:"women's history" intitle:directory "middle east"
Are there agencies or organizations with links on this topic?
inanchor:links society OR association "middle east" "women's studies"
Be creative: substitute database for “directory” to find searchable databases
Google’s Directory
1.5+ million pages (compare with 8+ billion in web search)
DMOZ Open Directory
Google “importance” ranking within directory
EXAMPLE:
women's history middle east OR eastern
Click on useful subject categories for more:
Science > Social Sciences > Area Studies > Middle Eastern Studies
Society > People > Women > Women's Studies > By Topic
Society > Issues > Human Rights and Liberties > Regional > Middle East
Search Google for Weblogs
Current commentary, opinions, misc. musings
Google indexes “important” blogs more frequently than most web pages
blog OR weblog OR "web log" your subject words
inurl:blog OR inurl:weblog your subject words
Thorough search impossible
If you know the software a blog is using:
"powered by blogger" your subject words
site:blogspot.com your subject words
"powered by geeklog" your subject words
Try searching the Google Directory
Search Google Groups for Info
• Usenet newsgroups back to 1981
• archive of UNevaluated public thoughts, advice & opinions, some not found elsewhere
• select threads with more than one article for context
Search differences:
• search for a group by name
• search within a group
• + required for common words, even inside quotes
"hair loss" OR "loss +of hair" OR balding group:alt.support.thyroid
• use Advanced Search to limit by group or date posted
Create new mailing lists with registration
Google as Encyclopedic Glossary
Use the command define: (no space after the colon)
• Google finds and ranks Web pages with definitions
define:internet
define:due diligence
Or build searches for pages with definitions:
internet "what is"
"what is the internet"
"internet stands +for"
internet ~beginners
internet ~FAQ
Also many common facts available:
population of japan
currency in algeria
birthplace of hitler
Exercise 5
Finding Info on a Subject Brainstorming
How would you approach Google 7. Whatcanto find some goodfromand how and the 4. 2. Where the Ibirthplace of Teddy wide good of 1. I want to find websitesof the following range places I currency ofdirecting find blogs Nepal, debates, collections 3. 5. How iscan solve of California? a me to of links and 6. size each about California much of it could $100 headaches? Roosevelt? perspectives, about what constitutes for blogs in on migraineUS buy as of a informationproblems?particularly blogsnear-death use ofbird watching in Northern California. to keep in libraries, January 15, 2004? experience? I'm interested in proofs that what people touch with other librarians and libraries in the state report can be believed. and how they’re using blogs?
Special Google Databases and Tools
Shortcuts and Services
Shortcuts:
• dictionaries and other definitions
• phonebooks: white and yellow
• movie showtimes
• stocks with recent news
• maps, weather
• converters, math problem calculators, physical constants
• number searches: UPS, FedEx, USPS, VIN, UPC codes, area codes, airplane reg. #, patents, more
http://www.googleguide.com/shortcuts.html
Translate: click [Translate this page], or enter a URL or text at www.google.com/language_tools
Page Info: better to enter a URL at alexa.com
Many search engines offer useful shortcuts & similar tools: See Search Cheat Sheet #4 & Supplement
“Hacking” Google URLs
Structure of a Google search result URL
Your search is for: "web searching" tutorial
http://www.google.com/search?   Google URL; ? indicates a query
num=20&   number of results per page
hl=en&   interface language
lr=&   search language (blank = ALL)
safe=off&   SafeSearch off
q=%22web+searching%22+tutorial   query search terms (%22 means quote mark, + joins terms)
Will vary according to your Preferences settings
You can modify results by changing values
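Python’s standard urllib.parse module can take the example URL apart and put it back together, a quick way to confirm what each field means. This is a sketch: the parameter names are the ones on this 2005 slide, Google may handle them differently today, and `build_search_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

URL = ("http://www.google.com/search?num=20&hl=en&lr=&safe=off"
       "&q=%22web+searching%22+tutorial")


def build_search_url(terms: str, **settings: str) -> str:
    """Rebuild a search URL; urlencode turns quote marks into %22 and spaces into +."""
    query = {**settings, "q": terms}
    return "http://www.google.com/search?" + urlencode(query)


if __name__ == "__main__":
    # Decode the example URL field by field; keep_blank_values keeps lr=.
    for name, values in parse_qs(urlsplit(URL).query, keep_blank_values=True).items():
        print(name, "=", values[0])
    # Changing a value (here num=50) changes the results page you get back.
    print(build_search_url('"web searching" tutorial', num="50", hl="en", lr="", safe="off"))
```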
A “Hack” for Country Searches
Type the search: egypt history 1950..1970
http://www.google.com/search?num=20&hl=en&lr=&safe=off&q=egypt+history+1950..1970&restrict=countryEG
Append in the Address/URL box (no spaces): &restrict=countryEG
General format, with a capitalized country code: &restrict=countryXX
Complete country codes list:
http://en.wikipedia.org/wiki/List_of_Internet_TLDs
More countries and pages than in the Language Tools search page:
www.google.com/language_tools
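Rather than typing &restrict=countryEG by hand, you can add it with urllib.parse, which also avoids stray spaces in the URL. This sketch assumes the restrict=countryXX parameter still behaves as described on this 2005 slide, which it may not; `restrict_to_country` is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def restrict_to_country(url: str, country_code: str) -> str:
    """Return url with restrict=countryXX set, XX being the capitalized country code."""
    parts = urlsplit(url)
    # Drop any existing restrict= so the parameter is never duplicated.
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "restrict"]
    params.append(("restrict", "country" + country_code.upper()))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    url = ("http://www.google.com/search?num=20&hl=en&lr=&safe=off"
           "&q=egypt+history+1950..1970")
    print(restrict_to_country(url, "eg"))
```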
Google’s Other Proprietary Databases
Besides Web, Directory, and Groups
Images
• 1.3+ billion
• SafeSearch filter only works in English language
News
• 4,500 news sources
• use Advanced Search forms: useful, specific limit settings
• 30 days
• international versions: other news slants
Froogle for shopping
• shopping sites from Google (a subset) + merchant uploads of catalogs not on the web
• no fees, no pay for position
Catalogs (Google Labs still)
• scanned mail-order catalogs (not web), text searchable
• to navigate within a catalog, click an image and use the special catalogs navigation bar
Local Information
local.google.com
• “businesses & services” from Google web database + several yellow pages
• topic box, address/location box
• restrict to 1, 5, 15, 45 miles away
• geographic proximity, maps
EXAMPLE: vegetarian restaurants 100 Larkin St, San Francisco, CA
maps.google.com
• draggable images, satellite view, local (yellow pages), driving directions
earth.google.com
• requires download, 200 MB memory
• exotic toy or useful tool?
Google Labs
More upcoming Google services (beta)
• Sets: create and explore sequences of things
• Suggest: browse possible search terms
• video.google.com: some TV programs
• My search history: registration and privacy considerations
Print.google.com: search only in the Print database
• project to make full-text books available online
Scholar.google.com: special page to search from
• scholarly articles (mostly) on the web
• abstracts if full text not available
• integrated with OCLC for library holdings
• integrated with some college campuses
See Cheat Sheet #5
Exercise 6
Where would you look?
1.
2.
3.
Choose ONE or TWO questions to answer
Write down what you did & learned
It’s O.K. to talk, ask questions, and help each other as needed
When Google Doesn’t Work
Other Effective Search Engines
Yahoo Search (3+ billion)
• no 10-word limit
• accepts ( ) around Boolean OR:
("global warming" OR "greenhouse effect") (site:edu OR site:gov OR site:uk)
• pay-for-position sites not identified
Teoma (1+ billion)
• popularity within subjects
• sometimes finds link collections as Resources
Bookmarklets for Searching
JavaScript applications that reside in your Bookmarks or Favorites (Favlets)
Search engine tools:
• run a search in another search engine: @Teoma, @Yahoo!
• search highlighted text in a search engine
Information and more about them at searchengineshowdown.com/bmlets
Recommended Directories
By library people:
LII.ORG, Academic Info, Infomine
Complement to searching:
• when search engines do not seem to work
• when you know or have a hunch there is a site about your question
Thinking in Sync with Search Engines
Search engine balancing act:
Do we agree with Google’s “importance”?
• tyrannical or democratic?
• favors established more than new websites
• favors trendy, high-speed, consumer, vroom & zoom
Are Google’s secretiveness & fuzziness trustworthy?
Do we accept “good enough” quicker?
Have we given up “thorough” and “certain”?
Or have we brought in a new age of “whatever” thinking?
Have search engines changed us?
Will semantic & linguistic analysis help?
Exercise 7
Make your own Cheat Sheet
Write down up to seven things you want to remember to do or practice
Circle the ONE you like most
Workshop Evaluation
infopeople.org/WS/eval