Google Hacking


University of Sunderland
       CSEM02
 Harry R Erwin, PhD
   Peter Dunne, PhD
                     Basics
•   Web Search
•   Newsgroups
•   Images
•   Preferences
•   Language Tools
             Google Queries
• Case-insensitive
• * in a query stands for a word
• “.” in a query is a single-character wildcard
• Automatic stemming
• Ten-word limit
• AND (+) is assumed, OR (|) and NOT (-) must be
  entered
• “” for a phrase
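
The query syntax above can be sketched as a small helper. This is only an illustration of how the pieces combine; the function name `build_query` and the sample terms are assumptions made for this example, not part of any Google tooling.

```python
def build_query(required=(), any_of=(), excluded=(), phrase=None):
    """Compose a query string using the syntax listed on the slide."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')        # quotes mark an exact phrase
    parts.extend(required)                 # AND is implied between plain terms
    if any_of:
        parts.append("|".join(any_of))     # | means OR
    parts.extend(f"-{term}" for term in excluded)  # a leading - means NOT
    return " ".join(parts)


# Hypothetical sample terms, chosen only to exercise each operator.
query = build_query(
    required=["merlot"],
    any_of=["review", "rating"],
    excluded=["advert"],
    phrase="seventh moon",
)
print(query)
```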
             More Queries
• You can control the language of the pages
  and the language of the reports
• You can restrict the search to specific
  countries
               Controlling Searches
•   Intitle, allintitle   •   Related
•   Inurl, allinurl       •   Phonebook
•   Filetype              •   Rphonebook
•   Allintext             •   Bphonebook
•   Site                  •   Author
•   Link                  •   Group
•   Inanchor              •   Msgid
•   Daterange             •   Insubject
•   Cache                 •   Stocks
•   Info                  •   Define
      Controlling Searches (II)
• These operators can be used to restrict
  searches.
• To restrict the search to the university:
  site:sunderland.ac.uk
• Or to search for Seventh Moon merlot in the
  UK: “seventh moon” merlot site:uk
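
A minimal sketch of how operator restrictions attach to a query, assuming the simple `operator:value` form used in the two examples above. The helper name `restrict` and the search term `timetable` are hypothetical.

```python
def restrict(terms, **operators):
    """Append operator:value pairs (site, filetype, intitle, ...) to a query."""
    parts = [terms] if terms else []
    for name, value in operators.items():
        # The operator and its value are joined by a colon with no space.
        parts.append(f"{name}:{value}")
    return " ".join(parts)


# The two examples from the slide.
uni = restrict("", site="sunderland.ac.uk")
wine = restrict('"seventh moon" merlot', site="uk")

# Operators can be stacked; "timetable" is a made-up search term.
docs = restrict("timetable", site="sunderland.ac.uk", filetype="pdf")
print(uni, wine, docs, sep="\n")
```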
          Typical Filetypes
•   Pdf
•   Ps
•   Xls
•   Ppt
•   Doc
•   Rtf
•   Txt
              Why Google
• You access Google, not the original
  website.
• Most crackers access any site, even Google,
  via a proxy server.
• Why? If you view Google's cached copy of a
  page that contains images, your browser still
  fetches the images from the original site,
  which can then log your address.
              Directory Listings
•   Search for intitle:index.of
•   Or intitle:index.of “parent directory”
•   Or intitle:index.of name size
•   Or intitle:index.of inurl:admin
•   Or intitle:index.of filename
•   This can then lead to a directory traversal
•   Look for filetype:bak, too, particularly if you want
    to expose SQL data generated on the fly
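
The searches above work because auto-generated listings share a recognisable shape: a title beginning "Index of" and a "Parent Directory" link. The sketch below applies the same test to HTML you have already fetched from your own server. The class and function names and the sample pages are assumptions for illustration, and the check is a heuristic, not a guarantee.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def looks_like_directory_listing(html):
    """Heuristic: title starts with 'Index of' and the page mentions 'Parent Directory'."""
    parser = TitleParser()
    parser.feed(html)
    title = parser.title.strip().lower()
    return title.startswith("index of") and "parent directory" in html.lower()


# Hypothetical sample pages.
sample_listing = (
    "<html><head><title>Index of /backup</title></head>"
    '<body><a href="/">Parent Directory</a></body></html>'
)
sample_page = "<html><head><title>Welcome</title></head><body>Hello</body></html>"

print(looks_like_directory_listing(sample_listing))
print(looks_like_directory_listing(sample_page))
```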
    Commonly Available Sensitive
           Information
•   HR files
•   Helpdesk files
•   Job listings
•   Company information
•   Employee names
•   Personal websites and blogs
•   E-mail and e-mail addresses
          Network Mapping
• Site:domain name
• Site crawling, particularly by using negative
  searches to exclude hosts you already know
• Lynx is convenient if you want lots of hits:
  – lynx -dump "http://www.google.com/search?q=site:name+-knownsite&num=100" > test.html
• Or use a Perl script with the Google API
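
As a rough sketch, the search URL used in the lynx command can be built programmatically. The code below only constructs the URL and sends nothing. It assumes that known hosts are excluded with `-site:host`, which is one reading of the `-knownsite` placeholder; the function name and the example.com hosts are hypothetical.

```python
from urllib.parse import urlencode


def mapping_url(domain, known_hosts, num=100):
    """Build a search URL for `domain` that excludes hosts already discovered."""
    terms = [f"site:{domain}"]
    terms += [f"-site:{host}" for host in known_hosts]  # negative searches
    params = {"q": " ".join(terms), "num": num}         # num=100 asks for 100 hits
    return "http://www.google.com/search?" + urlencode(params)


# Hypothetical target: each newly found host is added to the exclusion list
# before the next query, so later result pages surface unseen hosts.
url = mapping_url("example.com", ["www.example.com"])
print(url)
```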
              Link Mapping
• Explore the target site to see what it links
  to. The owners of the linked sites may be
  trusted and yet have weak security.
• The link operator supports this kind of
  search.
• Also check the newsgroups for questions
  from people at the organization.
 Web-Enabled Network Devices
• The Google webspider often encounters
  web-enabled devices. These allow an
  administrator to query their status or
  manage their configuration using a web
  browser.
• You may also be able to access network
  statistics this way.
     Searches to Worry About
• Site:
• Intitle:index.of
• Error|warning
• Login|logon
• Username|userid|employee.ID|“your username is”
• Password|passcode|“your password is”
• Admin|administrator
• -ext:html -ext:htm -ext:shtml -ext:asp -ext:php
• Inurl:temp|inurl:tmp|inurl:backup|inurl:bak
• Intranet|help.desk
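
One way to use this list is to pair each pattern with a `site:` restriction for a domain you are responsible for and review the results by hand. The following is a minimal sketch under that assumption. The names `WORRYING_PATTERNS` and `audit_queries` and the domain example.org are illustrative only.

```python
# Patterns transcribed from the slide, lower-cased; quoted phrases are kept as phrases.
WORRYING_PATTERNS = [
    "intitle:index.of",
    "error|warning",
    "login|logon",
    'username|userid|employee.ID|"your username is"',
    'password|passcode|"your password is"',
    "admin|administrator",
    "-ext:html -ext:htm -ext:shtml -ext:asp -ext:php",
    "inurl:temp|inurl:tmp|inurl:backup|inurl:bak",
    "intranet|help.desk",
]


def audit_queries(domain):
    """Return one site-restricted query per worrying pattern."""
    return [f"site:{domain} {pattern}" for pattern in WORRYING_PATTERNS]


# Hypothetical domain; substitute one you are authorised to test.
queries = audit_queries("example.org")
for q in queries:
    print(q)
```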
       Protecting Yourselves
• Solid security policy
• Public web servers are Public!
• Disable directory listings
• Block crawlers with robots.txt
• <META NAME="ROBOTS" CONTENT="NOARCHIVE">
• NOSNIPPET is similar.
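
To check that a robots.txt file blocks what you intend, the rules can be tested offline with Python's standard `urllib.robotparser` module. This is a sketch only: the robots.txt content and the paths are hypothetical, and the helper `is_blocked` is introduced just for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from memory, no network access


def is_blocked(path, agent="Googlebot"):
    """True if the parsed rules tell `agent` not to fetch `path`."""
    return not parser.can_fetch(agent, path)


print(is_blocked("/admin/users.html"))
print(is_blocked("/index.html"))
```

Note that robots.txt is itself publicly readable and is only a request to well-behaved crawlers, so it should not be the only protection for the paths it names.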
            More Protection
• Passwords
• Delete anything you don't need from the
  standard webserver configuration
• Keep your system patched.
• Hack yourself
• If sensitive data gets into Google, use the
  URL removal tools to delete it.