Google Hacking - A Crash Course by alexkeller

					     Google Hacking: A Crash Course
Alex Keller, Network/Systems Administrator for BSS
                Computing @ SFSU
What is "Google"?
   Definition: Googol
    Pronunciation: 'gü-"gol
    Function: noun
    Google is a play on the word googol, which was coined by Milton Sirotta, nephew of
    American mathematician Edward Kasner, and was popularized in the book,
    "Mathematics and the Imagination" by Kasner and James Newman. It refers to the
    number represented by the numeral 1 followed by 100 zeros. Google's use of the
    term reflects the company's mission to organize the immense, seemingly infinite
    amount of information available on the web.

   Originally called "Backrub", the logic behind the Google search engine was developed
    by graduate students Larry Page and Sergey Brin at Stanford University in 1995.
    Their first place of business was literally a garage, chosen because it had a
    washer/dryer and a hot tub out back. At that point they were already serving
    10,000 searches a day.


                                              http://www.google.com/corporate/history.html
         How We Got Here....
   For the last 5 years, Google has been the
    undisputed leader in online search technology.
   Before Google, AltaVista, FAST, and Inktomi had
    the largest databases but suffered from poorer
    search algorithms.
   Google's profit is partially ad-driven, but
    sponsors do not earn higher rankings in
    search results.
             Searching and Beyond...

   Localization          Programming tools
   Language options      Intra-network searches
   Toolbar               Print searching
   Blogger               Desktop search
   Translation           Mobile Access
   Calculator            News
   Stock Quotes          Spell Checker
   Phonebook             Pricing
   Newsgroups
Search Engine Supremacy




        http://searchenginewatch.com/reports/article.php/2156481
How Big is Google?




      http://searchenginewatch.com/reports/article.php/2156481
     Searches Per Day in Millions
     [Pie chart. Legend: Google, Yahoo/Overture, Inktomi, Looksmart, Others.
      Slice values as extracted (slice-to-label mapping lost): 80, 250, 45, 80, 167]



            http://searchenginewatch.com/reports/article.php/2156461
    So How Does Google Work?
 Crawls and indexes web pages and other document types
 Stores copies of web pages and graphics
  on their caching servers
 Presents users with simple front end to
  query the database of cached pages
 Returns search results in an ordered
  fashion based upon relevancy
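
The four steps above can be sketched as a toy inverted index. This is a minimal illustration, not Google's implementation: the cached pages, the example.com URLs, and the function names are all made up for the example, and "relevancy" here is just a raw count of keyword hits.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of page URLs whose cached text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, pages, query):
    """Return URLs containing every query word, most keyword hits first."""
    words = tokenize(query)
    if not words:
        return []
    # AND semantics: a page must contain all of the query words.
    matches = set.intersection(*(index.get(w, set()) for w in words))

    def hits(url):
        tokens = tokenize(pages[url])
        return sum(tokens.count(w) for w in words)

    # Sort by descending hit count, then by URL for a stable order.
    return sorted(matches, key=lambda u: (-hits(u), u))

# Hypothetical "cache" of crawled pages (made-up URLs and text).
CACHE = {
    "http://example.com/a": "Google indexes web pages and caches web pages",
    "http://example.com/b": "Search engines crawl the web",
    "http://example.com/c": "A page about hot tubs",
}
INDEX = build_index(CACHE)
```

Calling search(INDEX, CACHE, "web") returns both pages that mention "web", with the page that mentions it more often listed first.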
                      Anatomy of a Search
      Server Side                                      Client Side




http://computer.howstuffworks.com/search-engine1.htm
       What Can Google Search?
   Adobe Portable Document Format (pdf)
   Adobe PostScript (ps)
   Lotus 1-2-3 (wk1, wk2, wk3, wk4, wk5, wki, wks, wku)
   Lotus WordPro (lwp)
   MacWrite (mw)
   Microsoft Excel (xls)
   Microsoft PowerPoint (ppt)
   Microsoft Word (doc)
   Microsoft Works (wks, wps, wdb)
   Microsoft Write (wri)
   Rich Text Format (rtf)
   Shockwave Flash (swf)
   Text (ans, txt)
  So What Determines Page
   Relevance and Rating?
    Exact Phrase: are your keywords found as an
     exact phrase in any pages?
    Adjacency: how close are your keywords to each
     other?
    Weighting: how many times do the keywords
     appear in the page?
    PageRank/Links: How many links point to the
     page? How many links are actually in the page?

Equation: (Exact Phrase Hit)+(AdjacencyFactor)+(Weight) * (PageRank/Links)


                                     From: Google 201, Advanced Googology - Patrick Crispen, CSU
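
The equation is a teaching simplification, not Google's actual ranking algorithm, and as written it is ambiguous about grouping. The sketch below evaluates it both ways: literally (multiplication binds only to Weight) and with the sum grouped before multiplying. The function names and input numbers are hypothetical, chosen only to show that the two readings differ.

```python
def relevance_literal(exact_phrase_hit, adjacency_factor, weight, pagerank_links):
    """Read the equation with normal operator precedence:
    only Weight is multiplied by PageRank/Links."""
    return exact_phrase_hit + adjacency_factor + weight * pagerank_links

def relevance_grouped(exact_phrase_hit, adjacency_factor, weight, pagerank_links):
    """Alternative reading (an assumption): the whole sum is scaled
    by PageRank/Links."""
    return (exact_phrase_hit + adjacency_factor + weight) * pagerank_links

# Made-up inputs: phrase hit = 1, adjacency = 2, weight = 3, PageRank/Links = 4.
literal = relevance_literal(1, 2, 3, 4)   # 1 + 2 + (3 * 4)
grouped = relevance_grouped(1, 2, 3, 4)   # (1 + 2 + 3) * 4
```

Under either reading, a page with no inbound links can still score on the keyword terms alone only in the literal version, which is one reason the grouping matters.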
     Enough BS, How Do I Get Results?

 Pick your keywords carefully & be specific
 Do NOT exceed 10 keywords
 Use Boolean modifiers
 Use advanced operators
 Google ignores some words*:
a, about, an, and, are, as, at, be, by, from, how, i, in, is, it, of,
on, or, that, the, this, to, we, what, when, where, which, with




                 *From: Google 201, Advanced Googology - Patrick Crispen, CSU
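
A quick way to sanity-check a query against these tips is to strip the ignored words and count what is left. This is a minimal sketch: the function name is hypothetical, and counting the 10-keyword limit after removing ignored words is an assumption (the slide does not say whether ignored words count toward it).

```python
# Words the slide lists as ignored by Google.
STOP_WORDS = {
    "a", "about", "an", "and", "are", "as", "at", "be", "by", "from",
    "how", "i", "in", "is", "it", "of", "on", "or", "that", "the",
    "this", "to", "we", "what", "when", "where", "which", "with",
}
MAX_KEYWORDS = 10

def prepare_keywords(query):
    """Drop ignored words; return the kept words and whether
    they fit within the 10-keyword limit."""
    kept = [word for word in query.split() if word.lower() not in STOP_WORDS]
    return kept, len(kept) <= MAX_KEYWORDS

kept, within_limit = prepare_keywords("how to search the web with Google")
```

Here kept holds only "search", "web", and "Google", which is what the search would effectively run on.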
       Google's Boolean Modifiers
   AND is always implied.
   OR: Escobar (Narcotics OR
    Cocaine)
   "-" = NOT: Escobar -Pablo
   "+" = MUST: Escobar +Roberto
   Use quotes for exact phrase
    matching:          "nobody puts baby in a corner"
                                      OR
    "there are known knowns; there are things we know we know. We also
    know there are known unknowns; that is to say we know there are
    some things we do not know. But there are also unknown unknowns,
    the ones we don't know we don't know."
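
The modifiers above can be assembled programmatically. The sketch below is one possible helper, not an official API: build_query and search_url are hypothetical names, and the http://www.google.com/search?q= URL form is an assumption not taken from these slides.

```python
from urllib.parse import urlencode

def build_query(terms=(), any_of=(), exclude=(), require=(), phrase=None):
    """Assemble a query string using the slide's Boolean modifiers."""
    parts = list(terms)                      # AND is implied between terms
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")
    parts += ["-" + term for term in exclude]   # "-" = NOT
    parts += ["+" + term for term in require]   # "+" = MUST
    if phrase:
        parts.append('"' + phrase + '"')     # quotes = exact phrase
    return " ".join(parts)

def search_url(query):
    """URL-encode the query (assumed endpoint, see lead-in)."""
    return "http://www.google.com/search?" + urlencode({"q": query})

or_query = build_query(terms=["Escobar"], any_of=["Narcotics", "Cocaine"])
not_query = build_query(terms=["Escobar"], exclude=["Pablo"])
```

Note that a literal "+" must be percent-encoded in the URL, because an unencoded "+" in a query string means a space; urlencode handles this.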
                  Wildcards
   Google supports word wildcards but NOT
    stemming.
     "It's the end of the * as we know it" works.
     but "American Psycho*" won't get you decent
      results on American Psychology or American
      Psychophysics.
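
The difference between a word wildcard and stemming can be illustrated locally with a regular expression. This is only a model of the behaviour described above, not how Google implements it; the function name is hypothetical, and treating a standalone * as exactly one word is a simplifying assumption.

```python
import re

def wildcard_to_regex(pattern):
    """Build a regex where a standalone * matches one whole word.
    A * glued to a word (e.g. Psycho*) stays a literal character,
    so it does NOT act as a stem."""
    parts = [r"\w+" if token == "*" else re.escape(token)
             for token in pattern.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

word_wildcard = wildcard_to_regex("It's the end of the * as we know it")
stem_attempt = wildcard_to_regex("American Psycho*")

found = word_wildcard.search("It's the end of the world as we know it")
missed = stem_attempt.search("American Psychology")
```

The first pattern matches because * stands in for the whole word "world"; the second finds nothing because "Psycho*" is not expanded to "Psychology".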
       Advanced Searching
Advanced Search Page:
 http://www.google.com/advanced_search
     Advanced Operators
   cache:                    filetype:
   define:                   numrange 1973..2005
   info:                     source:
   intext:                   phonebook:
   intitle:
   inurl:
   link:
   related:
   stocks:

   DEMO:
    on-2-13-1973..2004
    visa 4356000000000000..4356999999999999

    http://www.googleguide.com/advanced_operators.html
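
Advanced operators are plain text prefixes, so they are easy to compose in code. The helpers below are a hypothetical sketch (with_operator and number_range are made-up names); quoting values that contain spaces is an assumption based on the exact-phrase rule from the Boolean modifiers slide.

```python
def with_operator(operator, value):
    """Format an operator:value pair, quoting values that contain spaces."""
    if " " in value:
        value = '"' + value + '"'
    return operator + ":" + value

def number_range(low, high):
    """Format a numeric range in the low..high form shown on the slide."""
    if low > high:
        raise ValueError("low must not exceed high")
    return str(low) + ".." + str(high)

# Combine ordinary keywords with operators; AND is implied by the spaces.
query = " ".join([
    "crash course",
    with_operator("filetype", "pdf"),
    number_range(1973, 2005),
])
```

The resulting string can be pasted into the search box as-is.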
                            Extras...
   Translation and Language options - over 100 to choose from:
    http://www.google.com/language_tools
   Stock Quotes - enter stocks:, example: stocks:GOOG
   Newsgroups - http://groups.google.com
   Calculator - "1024 minus 768" or "12 to the 10 power"
   Froogle - http://froogle.google.com
   Images - http://images.google.com
   Spell Checking - just type it in: "convienence"
   Blogger - http://www.blogger.com/start



Extras can be found at http://www.google.com/help/features.html
   Google, doesn't make it right...
GOOD
   FAIR - Fairness and Accuracy in Reporting
http://www.fair.org/
   Federation of American Scientists:
http://www.fas.org/main/home.jsp
   OneWorld.net:
http://www.oneworld.net/

BAD
   Holocaust Never Happened?
http://www.air-photo.com/english
   School of the Americas:
http://carlisle-www.army.mil/usamhi/usarsa/main.htm

UGLY
   Pixyland!
http://www.pixyland.org/peterpan/photo_closeups_pp4.htm
             Bibliography and Further Research
Search Engine Watch:
http://searchenginewatch.com

Google Hacks: 100 Industrial-Strength Tips & Tools
by Tara Calishain, Rael Dornfest

Johnny I Hack Stuff:
http://johnny.ihackstuff.com

Google:
http://www.google.com

HowStuffWorks:
http://computer.howstuffworks.com/search-engine1.htm

				