					Beyond Google Workshop

       Contents

       Overview

       Inside Google

       Searching versus Browsing

       10 tips for using Google effectively

       Links to Learning

       Assignment




Google: Overview & Uses

In a very short space of time the Internet has entered into the very fabric of our lives. For an

increasing number of people the information technology driving the Internet has become a 'utility'

comparable with electricity, water and gas in homes, workplaces and in leisure activities. We take it

for granted and often use it without thinking. This unthinking, uncritical use is at its most evident

when we search for information. We seem to find what we need and often what we want. Magically,

the information appears before us in thousands of 'results' from a search engine. Behind the

technologies running such search engines, however, are complex processes, organisations and

motivations which affect our lives through the 'hits' they give us. Controlling how we find what we

(think) we are looking for has become as important as the question of who controls the Internet itself.

The battle for this control seems to have been won, for the moment. Early search engines such as

AltaVista, Excite, Inktomi, LookSmart, Lycos, Alltheweb, and Yahoo are now largely unknown, partly

because of their desire to control the movement of Web users and keep them looking at the web from

their own services. But mostly the disappearance of these search engines is due to the growth and
dominance of Google.

Google:

      • controlled 86% of all searches in the UK in 2007 (Hitwise).

      • has a market capitalization of $207.6 billion.

      • controls 40% of all online advertising (HipMojo).

      • employed almost 16,000 people at the end of 2007, a 50% increase over 2006.

      • became the No. 1 brand in the world in 2007 (Millward Brown Brandz Top 100).



For something so integral to our information 'habits', the general lack of basic knowledge about how
Google works can be surprising. What is Google searching? How does it decide what to search? Who

decides which web page comes first in the results of a search? Are any pages excluded? And how is it

that Google can remember your previous searches and allow you to search those as well?

Essentially any search engine is a tool that lets you explore databases containing the text from

hundreds of millions of pages on the Internet. When the search engine software finds pages that

match your search request (often referred to as 'hits'), it presents them to you with brief descriptions

and clickable links to take you there. There are two obvious processes at work here:

      1. your input (what you search for)

      2. the output (what the search engine tells you has been found)

If we can understand a little more about the second process then we should be able to improve our use

of the first.



Inside Google

Looking inside the 'engine' of Google we find:

      1. automated programs (Googlebots) which crawl through the web looking for new information

      2. an 'index' which stores the information found, similar in kind to the index of a book

      3. software that gives the search interface (what you see when you look at the search page) and

          which ranks the list of the pages corresponding to the search request.

The first element, automated programs (the details of which are not important to us), reminds us of the

fact that the index created by Google is not, in the main, created by human hand. There are no human

web surfers working for Google and indexing vast amounts of new or updated information on the web.

The second alerts us to the fact that the index created by those 'Googlebots' is always a historical

snapshot and so always slightly outdated (the 'bots' may check popular web pages every few seconds

for updates but for less popular sites the check may be made only every few weeks). Because of the

speed at which new information is added even the best automated program can only locate and index

a fraction of the Web pages that are out there on the Internet. Currently, Google has the largest index
amongst popular search engines with 4.2 billion pages located and indexed. However, it is the third

component which has been at the heart of Google's success since it was launched in 1998 and which is

crucial to our effective use of the service.
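
As a rough illustration of the second and third components, the sketch below builds a tiny index in Python and looks words up in it. The page addresses and texts are invented for the example, and the names build_index and search are hypothetical. This is a toy under those assumptions, not a description of how Google's own index is built.

```python
from collections import defaultdict

# Invented pages standing in for what a crawler might have fetched.
PAGES = {
    "example.org/a": "student representative elections open today",
    "example.org/b": "student services and support",
    "example.org/c": "representative democracy explained",
}

def build_index(pages):
    """Map each word to the set of page addresses that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
print(sorted(search(index, "student representative")))
```

Like the index of a book, the lookup never reads the pages themselves at search time. It only consults the stored snapshot, which is why results can lag behind the live Web.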



When we search for information on Google, how does the search engine know that the results it gives

us are what we are looking for? And if there is more than one result, how does it order them into a

rank from most useful to least useful?

Larry Page and Sergey Brin, graduates of Stanford University in the US and originators of Google, explain their thinking behind the

answer to this question in a paper they wrote about PageRank:



          'The importance of a Web page is an inherently subjective matter, which depends on the

          readers interests, knowledge and attitudes. But there is still much that can be said

          objectively about the relative importance of Web pages. This paper describes PageRank,

          a method for rating Web pages objectively and mechanically, effectively measuring the

          human interest and attention devoted to them.' (from: The PageRank Citation Ranking:

          Bringing Order to the Web [http://dbpubs.stanford.edu:8090/pub/1999-66] by Page,

          Brin, Motwani, Winograd)


Page and Brin drew on citation analysis, a concept long established in academia. Citations have many functions in academic writing, one of

which is akin to 'reputation management'. If a paper I write is cited numerous times, it is understood to have had an impact and to have value: a

citation acts a little like a vote. Obviously some votes count more than others. My Mum quoting my paper in the local paper will not enhance my

professional reputation in the same way as a citation in the International Journal of Cultural Studies by Dr X at the University of Y. Being cited by

prestigious professors in authoritative academic journals is a lot like receiving a 5-star vote. On the web, links act as citations and count as votes: and

again, some votes count more than others. My own position in the ranking and the power of my vote for others depends upon my authority: how

many people link to me and how trustworthy those links are. So, I publish on the web. My Mum links to me from her personal blog. Yahoo links to

me from its directory pages. For Google it is clear which link will improve my reputation and so push me up the page ranking.

To summarise:

The value of a web page is determined by a combination of 1. the number of other pages that

link to it; and 2. the value of each of those links. So the potential value of information in your search

results rankings is determined by analysing links. If you want to know more about how Google works

have a look at this video lecture by Jeff Dean of Google:

http://www.researchchannel.org/asx/uw_cse05_google_1300k.asx.
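
The two-part summary above can be sketched in a few lines of Python. This is a simplified version of the iterative calculation described in the PageRank paper, not Google's production system. The link graph is invented to mirror the Mum's blog and Yahoo example, the function name pagerank is hypothetical, and the damping value of 0.85 and the handling of pages without outgoing links are assumptions made for the sketch.

```python
# Invented link graph: each page maps to the pages it links to.
LINKS = {
    "mums_blog": ["my_page"],
    "yahoo_directory": ["my_page", "other_page"],
    "site_a": ["yahoo_directory"],
    "site_b": ["yahoo_directory"],
    "site_c": ["yahoo_directory"],
    "my_page": [],
    "other_page": [],
}

def pagerank(links, damping=0.85, iterations=50):
    """Repeatedly pass each page's rank along its outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is shared among all pages.
        dangling = sum(rank[page] for page in pages if not links[page])
        new_rank = {
            page: (1 - damping) / n + damping * dangling / n for page in pages
        }
        for page, outgoing in links.items():
            for target in outgoing:
                # A page splits its vote equally between the pages it links to.
                new_rank[target] += damping * rank[page] / len(outgoing)
        rank = new_rank
    return rank

ranks = pagerank(LINKS)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:16s} {value:.3f}")
```

In this toy graph the directory page is itself linked to by three sites, so the half-vote it passes on carries more weight than the whole vote from the blog that nobody links to.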

Before we look at how to make effective searches of Google's index we need to examine one more element of exploring information on the Internet.



Searching versus Browsing

There is an important distinction between searching and (topic) browsing, and it makes a real

difference to finding useful information.

We have seen that Google uses automated programs to search, find and index pages on the Web.

However, there are more 'human' services which do some of this work for us. Topic directories such

as the one illustrated here are compiled by people

who select and classify Web sites based on content. Yahoo maintains its own staff of 'Yahoo! Surfers'

who visit selected sites and decide where they best belong in the directory. When we search the

directory we are searching an index that has been prepared by others.



Intute (www.intute.ac.uk) is another directory of resources on the Internet that are evaluated and

selected by a network of subject specialists to meet the needs of Higher and Further Education in the

UK. When you search the Intute directories you are likely to get far fewer 'hits' in your results, but you

can be confident that those results have been vetted for their quality by subject specialists. You may

reach more appropriate and relevant resources this way than by searching the 4.2 billion pages

indexed by Googlebots.


10 tips for using Google effectively

     1. You can enter up to 32 words in the search box (it used to be 10)

     2. The default operator is AND so student representative will retrieve pages with both student

         and representative (you don't need to type AND).

     3. Use OR to find related or synonymous terms

                • e.g. student services OR service OR representatives
     4. Use the plus (+) and minus (-) signs in front of words to force their inclusion and/or

          exclusion in searches.

          Example: +meat -potatoes

          (NO space between the sign and the keyword)

     5. Use double quotation marks (" ") around phrases to ensure they are searched exactly as is,

          with the words side by side in the same order.

          Example: "standing on the shoulders of giants" (Do NOT put quotation marks around a single

          word.)

     6. Put your most important keywords first in the string.

          Example: dog breed family pet choose

     7. Use wildcards (e.g., *) to look for variations in spelling and word form. Example: colo*r

          returns colour (British English spelling) and color (American English spelling); theor* will

          return theory, theories, theorist, theorists, theoretical, theoretically etc.

     8. Use the search options in searching for images, video, news, maps, and more (the list of

          specialist indexes is getting longer)

     9. Use Google Help Central to learn more tips to improve your search strategies

    10. Remember: a search retrieves results based on your terms and then ranks them according

          to:

      • word frequency (how many times query words appear in each document),

      • word order (first terms in query are given greater weight),

      • word proximity (how close terms are to each other),

      • word location (e.g. in title or heading)

      • PageRank (Google's link popularity algorithm, based on how many other pages link to this

          one, similar to cited reference searching), among other complex statistical algorithms.
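
To make tips 2 to 5 and tip 10 more concrete, here is a small Python sketch of how such operators and ranking signals could work in principle. It is an illustration only: the names parse, contains, matches and score are hypothetical, the title bonus of 3 is an arbitrary weight, and nothing here reflects how Google actually implements its query handling or ranking.

```python
import re

def parse(query):
    """Split a query into required groups (OR alternatives) and excluded terms."""
    tokens = re.findall(r'"[^"]+"|\S+', query)
    groups, excluded = [], []
    join_next = False
    for token in tokens:
        if token == "OR":
            join_next = True  # the next term joins the previous group
            continue
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].strip('"').lower())
            join_next = False
            continue
        term = token.lstrip("+").strip('"').lower()
        if join_next and groups:
            groups[-1].append(term)
        else:
            groups.append([term])
        join_next = False
    return groups, excluded

def contains(text, term):
    """True if the word or exact phrase appears as whole words in the text."""
    return re.search(r"\b" + re.escape(term) + r"\b", text.lower()) is not None

def matches(query, text):
    """Every group needs at least one hit (implicit AND); no excluded term may appear."""
    groups, excluded = parse(query)
    return (all(any(contains(text, term) for term in group) for group in groups)
            and not any(contains(text, term) for term in excluded))

def score(query, title, body):
    """Toy relevance: word frequency in the body plus a bonus for title hits."""
    groups, _ = parse(query)
    total = 0
    for term in (term for group in groups for term in group):
        total += len(re.findall(r"\b" + re.escape(term) + r"\b", body.lower()))
        if contains(title, term):
            total += 3  # arbitrary weight standing in for 'word location'
    return total

print(matches("+meat -potatoes", "meat and two veg"))
print(matches('"shoulders of giants"', "standing on the shoulders of giants"))
print(score("dog", "Dog breeds", "dog dog cat"))
```

The sketch covers only word frequency and word location from the list above. Word order, proximity and PageRank would each add a further term to the score.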



Links to Learning
History of search engines (http://www.searchenginehistory.com/)

Wikipedia PageRank article (http://en.wikipedia.org/wiki/PageRank)

Google Help Centre (http://www.google.co.uk/help/)

How Google works - flash tutorial (http://www.portfolio.com/images/site/editorial/Flash/google/google.swf)

Google: A behind the scenes look (http://www.researchchannel.org/prog/displayevent.aspx?rID=3898&fID=345)

Google scholar FAQ (http://library.cord.edu/sonsteby/google_scholar.html)
Information Revolution (http://youtube.com/watch?v=-4CV05HyAbM)



Assignment

Complete the Informs Advanced Searching online tutorial.

Write a blog post comparing the results of a search made with Google to one using Intute. Use terms

relating to an assignment you are preparing or resources for a particular unit you are taking.

				
Materials for workshop. Shared by Clive McGoun. Posted 4/5/2008.