Google Hacking - A Crash Course by alexkeller

            Google Hacking: A Crash Course
Alex Keller, Network/Systems Administrator for BSS
                Computing @ SFSU
What is "Google"?
   Definition: Googol
    Pronunciation: 'gü-"gol
    Function: noun
    Google is a play on the word googol, which was coined by Milton Sirotta, nephew of
    American mathematician Edward Kasner, and was popularized in the book,
    "Mathematics and the Imagination" by Kasner and James Newman. It refers to the
    number represented by the numeral 1 followed by 100 zeros. Google's use of the
    term reflects the company's mission to organize the immense, seemingly infinite
    amount of information available on the web.

   Originally called "Backrub", the logic behind the Google search engine was developed
    by graduate students Larry Page and Sergey Brin at Stanford University in 1995.
    Their first place of business was literally a garage. The garage location was chosen
    because it had a washer/dryer and a hot tub out back. They were already serving
    10,000 searches a day.

         How We Got Here....
   For the last 5 years, Google has been the
    undisputed leader in online search technology.
   Before Google, AltaVista, FAST, and Inktomi had
    the largest databases but suffered from poorer
    search algorithms.
   Google's profit is partially ad driven, but
    sponsors do not garner higher ratings in
             Searching and Beyond...

   Localization          Programming tools
   Language options      Intra-network searches
   Toolbar               Print searching
   Blogger               Desktop search
   Translation           Mobile Access
   Calculator            News
   Stock Quotes          Spell Checker
   Phonebook             Pricing
   Newsgroups
Search Engine Supremacy

How Big is Google?

     [Chart: Searches Per Day in Millions (surviving labels: 80, Yahoo/Overture)]

    So How Does Google Work?
 Crawls and indexes web pages et al.
 Stores copies of web pages and graphics
  on their caching servers
 Presents users with simple front end to
  query the database of cached pages
 Returns search results in an ordered
  fashion based upon relevancy
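
The four steps above can be sketched as a toy inverted index. This is only an illustration of the general idea, not Google's actual implementation: the PAGES dictionary stands in for crawled and cached pages, the example.com URLs are made up, and the score (a count of keyword occurrences) is a deliberately crude stand-in for relevancy.

```python
from collections import defaultdict

# Hypothetical stand-in for crawled, cached pages: URL -> page text.
PAGES = {
    "http://example.com/a": "google hacking crash course",
    "http://example.com/b": "search engine crash test",
    "http://example.com/c": "google search engine basics",
}


def build_index(pages):
    """Index step: map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, pages, query):
    """Query step: return URLs containing every query word, best score first."""
    words = query.lower().split()
    if not words:
        return []
    # AND is implied: a page must contain all of the keywords.
    hits = set.intersection(*(index.get(w, set()) for w in words))

    def score(url):
        # Toy relevancy: total number of keyword occurrences in the page.
        tokens = pages[url].lower().split()
        return sum(tokens.count(w) for w in words)

    # Highest score first; ties are broken alphabetically by URL.
    return sorted(hits, key=lambda u: (-score(u), u))


INDEX = build_index(PAGES)
RESULTS = search(INDEX, PAGES, "google search")
```

Only the third page contains both "google" and "search", so it is the only result for that query.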
                      Anatomy of a Search
      [Diagram: Server Side / Client Side]
       What Can Google Search?
   Adobe Portable Document Format (pdf)
   Adobe PostScript (ps)
   Lotus 1-2-3 (wk1, wk2, wk3, wk4, wk5, wki, wks, wku)
   Lotus WordPro (lwp)
   MacWrite (mw)
   Microsoft Excel (xls)
   Microsoft PowerPoint (ppt)
   Microsoft Word (doc)
   Microsoft Works (wks, wps, wdb)
   Microsoft Write (wri)
   Rich Text Format (rtf)
   Shockwave Flash (swf)
   Text (ans, txt)
  So What Determines Page
   Relevance and Rating?
    Exact Phrase: are your keywords found as an
     exact phrase in any pages?
    Adjacency: how close are your keywords to each
     other?
    Weighting: how many times do the keywords
     appear in the page?
    PageRank/Links: How many links point to the
     page? How many links are actually in the page?

Equation: (Exact Phrase Hit)+(AdjacencyFactor)+(Weight) * (PageRank/Links)

                                     From: Google 201, Advanced Googology - Patrick Crispen, CSU
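
A small worked example of the equation above, as a sketch only. The grouping on the slide is ambiguous: read literally, the multiplication applies to Weight alone, while the likely intent is that the whole sum is scaled by PageRank/Links. The function below supports both readings. The function name, the group_sum flag, and the input numbers are all made up for illustration.

```python
def relevance(exact_phrase_hit, adjacency_factor, weight, pagerank_links,
              group_sum=True):
    """Evaluate the slide's relevance equation under one of two readings.

    group_sum=True  -> (exact + adjacency + weight) * pagerank_links
                       (assumed intent: the whole sum is scaled)
    group_sum=False -> exact + adjacency + weight * pagerank_links
                       (literal operator precedence as printed)
    """
    if group_sum:
        return (exact_phrase_hit + adjacency_factor + weight) * pagerank_links
    return exact_phrase_hit + adjacency_factor + weight * pagerank_links


# Made-up inputs, purely to show the arithmetic.
GROUPED = relevance(1, 2, 3, 4)                   # (1 + 2 + 3) * 4 = 24
LITERAL = relevance(1, 2, 3, 4, group_sum=False)  # 1 + 2 + (3 * 4) = 15
```

Under either reading, a page with a high PageRank/Links factor outranks a page with the same keyword hits and a lower factor.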
     Enough BS, How Do I Get Results?

 Pick your keywords carefully & be specific
 Do NOT exceed 10 keywords
 Use Boolean modifiers
 Use advanced operators
 Google ignores some words*:
a, about, an, and, are, as, at, be, by, from, how, i, in, is, it, of,
on, or, that, the, this, to, we, what, when, where, which, with

                 *From: Google 201, Advanced Googology - Patrick Crispen, CSU
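
The keyword advice above can be checked mechanically before a query is sent. The sketch below drops the ignored words listed on the slide and splits off anything past the 10-keyword limit. It assumes a plain keyword query with no quoted phrases or operators (a capitalized OR would be dropped along with "or"), and the names STOP_WORDS, MAX_KEYWORDS, and clean_query are hypothetical.

```python
# The ignored words exactly as listed on the slide.
STOP_WORDS = set(
    "a about an and are as at be by from how i in is it of "
    "on or that the this to we what when where which with".split()
)
MAX_KEYWORDS = 10


def clean_query(query):
    """Return (kept, dropped): keywords that count, and any past the limit."""
    significant = [w for w in query.split() if w.lower() not in STOP_WORDS]
    return significant[:MAX_KEYWORDS], significant[MAX_KEYWORDS:]


KEPT, DROPPED = clean_query("how to pick the lock of a safe")
```

Here KEPT holds pick, lock, and safe; the other five words are on the ignore list, and nothing is over the limit.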
       Google's Boolean Modifiers
   AND is always implied.
   OR: Escobar (Narcotics OR
   "-" = NOT: Escobar -Pablo
   "+" = MUST: Escobar +Roberto
   Use quotes for exact phrase
    matching:          "nobody puts baby in a corner"
    "there are known knowns; there are things we know we know. We also
    know there are known unknowns; that is to say we know there are
    some things we do not know. But there are also unknown unknowns,
    the ones we don't know we don't know."
   Google supports word wildcards but NOT stemming:
     "It's the end of the * as we know it" works.
     but "American Psycho*" won't get you decent
      results on American Psychology or American
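
The modifiers on this slide compose into a single query string. The helper below is a minimal sketch of that composition and nothing more: build_query and its parameter names are hypothetical, and the "google (hacking OR hacks)" example uses made-up terms.

```python
def build_query(terms=(), phrase=None, any_of=(), must=(), exclude=()):
    """Assemble a query string from the slide's Boolean modifiers."""
    parts = list(terms)                      # plain terms: AND is implied
    if phrase:
        parts.append('"%s"' % phrase)        # quotes: exact phrase match
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")  # OR group
    parts.extend("+" + w for w in must)      # "+" = MUST
    parts.extend("-" + w for w in exclude)   # "-" = NOT
    return " ".join(parts)


Q_MUST_NOT = build_query(terms=["Escobar"], must=["Roberto"], exclude=["Pablo"])
Q_OR = build_query(terms=["google"], any_of=["hacking", "hacks"])
Q_PHRASE = build_query(phrase="nobody puts baby in a corner")
```

Q_MUST_NOT combines the slide's two Escobar examples into one query, Escobar +Roberto -Pablo.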
       Advanced Searching
Advanced Search Page:
     Advanced Operators
   cache:                                     filetype:
   define:                                    numrange 1973..2005
   info:                                      source:
   intext:                                    phonebook:
   intitle:
   inurl:
   link:                     DEMO:
   related:                  visa
   stocks:                   9999
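
The operators listed above share one shape: a name, a colon, and a value with no space in between. The sketch below builds such terms and rejects names that are not on the slide. It assumes single-word values, and the helper names operator_term and numrange are hypothetical.

```python
# Operator names taken from the slide (numrange is handled separately).
OPERATORS = {
    "cache", "define", "info", "intext", "intitle", "inurl",
    "link", "related", "stocks", "filetype", "source", "phonebook",
}


def operator_term(name, value):
    """Return 'name:value' for an operator that appears on the slide."""
    if name not in OPERATORS:
        raise ValueError("unknown operator: %s" % name)
    return "%s:%s" % (name, value)


def numrange(low, high):
    """Return a number range in the slide's low..high notation."""
    return "%d..%d" % (low, high)


STOCK_QUERY = operator_term("stocks", "GOOG")
PDF_QUERY = operator_term("filetype", "pdf")
YEARS = numrange(1973, 2005)
```

Terms built this way can be joined with spaces alongside ordinary keywords, since AND is implied.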
   Translation and Language options - over 100 to choose from:
   Stock Quotes - enter stocks:, example: stocks:GOOG
   Newsgroups -
   Calculator - "1024 minus 768" or "12 to the 10 power"
   Froogle -
   Images -
   Spell Checking - just type it in: "convienence"
   Blogger -

Extras can be found at
   Google, doesn't make it right...
   FAIR - Fairness and Accuracy in Reporting
   Federation of American Scientists:

   Holocaust Never Happened?
   School of the Americas:

   Pixyland!
             Bibliography and Further Research
Search Engine Watch:

Google Hacks: 100 Industrial-Strength Tips & Tools
by Tara Calishain, Rael Dornfest

Johnny I Hack Stuff:


