Beyond Google Workshop

Contents
• Overview
• Inside Google
• Searching v Browsing
• 10 tips for using Google effectively
• Links to Learning
• Assignment

Google: Overview & Uses

In a very short space of time the Internet has entered into the very fabric of our lives. For an increasing number of people the information technology driving the Internet has become a 'utility' comparable with electricity, water and gas, whether in homes, workplaces or leisure activities. We take it for granted and often use it without thinking.

This unthinking, uncritical use is at its most evident when we search for information. We seem to find what we need and often what we want. Magically, the information appears before us in thousands of 'results' from a search engine. Behind the technologies running such search engines, however, are complex processes, organisations and motivations which affect our lives through the 'hits' they give us. Controlling how we find what we (think) we are looking for has become as important as the question of who controls the Internet itself.

The battle for this control seems to have been won, for the moment. Early search engines such as Alta Vista, Excite, Inktomi, LookSmart, Lycos, Alltheweb and Yahoo are now largely unknown, partly because they tried to control the movement of Web users and keep them viewing the Web through their own services. But mostly the disappearance of these search engines is due to the growth and dominance of Google.

Google:
• controlled 86% of all searches in the UK in 2007 (Hitwise).
• has a market capitalization of $207.6 billion.
• controls 40% of all online advertising (HipMojo).
• employed almost 16,000 people at the end of 2007, a 50% increase over 2006.
• became the No. 1 brand in the world in 2007 (Millward Brown Brandz Top 100).

For something so integral to our information 'habits', the general lack of basic knowledge about how Google works can be surprising. What is Google searching? How does it decide what to search?
Who decides which web page comes first in the results of a search? Are any pages excluded? And how is it that Google can remember your previous searches and allow you to search those as well?

Essentially, any search engine is a tool that lets you explore databases containing the text from hundreds of millions of pages on the Internet. When the search engine software finds pages that match your search request (often referred to as 'hits'), it presents them to you with brief descriptions and clickable links to take you there.

There are two obvious processes at work here:
1. your input (what you search for)
2. the output (what the search engine tells you has been found)

If we can understand a little more about the second process then we should be able to improve our use of the first.

Inside Google

Looking inside the 'engine' of Google we find:
1. automated programs (Googlebots) which crawl through the web looking for new information
2. an 'index' which stores the information found, similar in kind to the index of a book
3. software that provides the search interface (what you see when you look at the search page) and ranks the list of pages corresponding to the search request.

The first element, the automated programs (the details of which are not important to us), reminds us that the index created by Google is not, in the main, created by human hand. There are no human web surfers working for Google and indexing vast amounts of new or updated information on the web.

The second alerts us to the fact that the index created by those Googlebots is always a historical snapshot and so always slightly outdated (the 'bots' may check popular web pages every few seconds for updates, but for less popular sites the check may be made only every few weeks). Because of the speed at which new information is added, even the best automated program can only locate and index a fraction of the Web pages that are out there on the Internet.
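
To picture what an index "similar in kind to the index of a book" might look like, here is a minimal sketch in Python. It is an illustration only, not a description of Google's software: the page addresses, the page text and the function names `build_index` and `search` are all made up for this example. Each word is mapped to the set of pages that contain it, and a search returns the pages that contain every word in the request.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page addresses whose text contains it."""
    index = defaultdict(set)
    for address, text in pages.items():
        for word in text.lower().split():
            index[word].add(address)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start with the pages for the first word, then narrow down word by word.
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Hypothetical pages standing in for what a crawler might have collected.
pages = {
    "example.org/union": "student representative elections this week",
    "example.org/services": "student services opening hours",
    "example.org/sport": "representative teams for the county",
}

index = build_index(pages)
print(sorted(search(index, "student representative")))
```

Because the index is built once from a copy of the pages, a page that changes after it was crawled will not be found under its new wording until it is crawled again, which is why the index is always a snapshot.
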
Currently, Google has the largest index amongst popular search engines, with 4.2 billion pages located and indexed.

However, it is the third component which has been at the heart of Google's success since it was launched in 1998 and which is crucial to our effective use of the service. When we search for information on Google, how does the search engine know that the results it gives us are what we are looking for? And if there is more than one result, how does it order them from most useful to least useful?

Larry Page and Sergey Brin, graduates of Stanford University in the US and originators of Google, explain the thinking behind their answer to this question in a paper they wrote about PageRank:

'The importance of a Web page is an inherently subjective matter, which depends on the readers interests, knowledge and attitudes. But there is still much that can be said objectively about the relative importance of Web pages. This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them.'
(from: The PageRank Citation Ranking: Bringing Order to the Web [http://dbpubs.stanford.edu:8090/pub/1999-66] by Page, Brin, Motwani, Winograd)

Page and Brin drew on the idea of citation, a concept which is familiar in academia. Citations have many functions in academic writing, one of which is akin to 'reputation management'. If a paper I write is cited numerous times, it is understood to have had an impact and to have value: a citation acts a little like a vote.

Obviously some votes count more than others. My Mum quoting my paper in the local paper will not enhance my professional reputation in the same way as a citation in the International Journal of Cultural Studies by Dr X at the University of Y. Being cited by prestigious professors in authoritative academic journals is a lot like receiving a 5-star vote.
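
The idea that some votes count more than others can be made concrete with a small calculation. The Python sketch below applies it to web pages, treating a link from one page to another as a citation. It follows the commonly described simplified form of PageRank and is not Google's actual ranking software. The page names, the function name `rank_pages`, the damping value of 0.85 and the number of rounds are all assumptions chosen for illustration.

```python
def rank_pages(links, damping=0.85, rounds=50):
    """Score pages by repeatedly passing 'votes' along their outgoing links.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    scores = {page: 1.0 / len(pages) for page in pages}

    for _ in range(rounds):
        # Every page starts each round with the same small baseline score.
        new_scores = {page: (1.0 - damping) / len(pages) for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its own score equally among the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * scores[page] / len(pages)
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical web: page_a and page_b each receive exactly one link,
# but page_a's link comes from a page that is itself well linked.
links = {
    "directory": ["page_a"],
    "mums_blog": ["page_b"],
    "fan_1": ["directory"],
    "fan_2": ["directory"],
    "fan_3": ["directory"],
    "page_a": [],
    "page_b": [],
}

scores = rank_pages(links)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this made-up example page_a ends up with a higher score than page_b, even though each has a single incoming link, because the page voting for page_a has itself collected more votes.
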
On the web, links act as citations and count as votes; and again, some votes count more than others. My own position in the ranking, and the power of my vote for others, depends upon my authority: how many people link to me and how trustworthy those links are.

So, I publish on the web. My Mum links to me from her personal blog. Yahoo links to me from its directory pages. For Google it is clear which link will improve my reputation and so push me up the page ranking.

To summarise: the value of a web page is determined by a combination of
1. the number of other pages that link to it; and
2. the value of each of those links.

So the potential value of information in your search results rankings is determined by analysing links.

If you want to know more about how Google works, have a look at this video lecture by John Dean of Google.com: http://www.researchchannel.org/asx/uw_cse05_google_1300k.asx.

Before we look at how to make effective searches of Google's index we need to examine one more element of exploring information on the Internet.

Searching versus Browsing

There is an important distinction between searching and (topic) browsing, and it makes a real difference to finding useful information. We have seen that Google uses automated programs to search, find and index pages on the Web. However, there are more 'human' services which do some of this work for us.

Topic directories such as the one illustrated here are compiled by people who select and classify Web sites based on content. Yahoo maintains its own staff of 'Yahoo! Surfers' who visit selected sites and decide where they best belong in the directory. When we search the directory we are searching an index that has been prepared by others.

Intute (www.intute.ac.uk) is another directory of resources on the Internet, evaluated and selected by a network of subject specialists to meet the needs of Higher and Further Education in the UK.
When you search the Intute directories you are likely to get far fewer 'hits' in your results, but you can be confident that those results have been vetted for their quality by subject specialists. You may reach more appropriate and relevant resources this way than by searching the 4.2 billion pages indexed by Googlebots.

10 tips for using Google effectively

1. You can enter up to 32 words in the search box (it used to be 10).
2. The default operator is AND, so student representative will retrieve pages with both student and representative (you don't need to type AND).
3. Use OR to find related or synonymous terms, e.g. student services OR service OR representatives
4. Use the plus (+) and minus (-) signs in front of words to force their inclusion and/or exclusion in searches. Example: +meat -potatoes (NO space between the sign and the keyword)
5. Use double quotation marks (" ") around phrases to ensure they are searched exactly as is, with the words side by side in the same order. Example: "standing on the shoulders of giants" (Do NOT put quotation marks around a single word.)
6. Put your most important keywords first in the string. Example: dog breed family pet choose
7. Use wildcards (e.g., *) to look for variations in spelling and word form. Example: colo*r returns colour (British English spelling) and color (American English spelling); theor* will return theory, theories, theorist, theorists, theoretical, theoretically etc.
8. Use the search options for searching images, video, news, maps and more (the list of specialist indexes is getting longer).
9. Use the Google Help Centre to learn more tips to improve your search strategies.
10. Remember: a search retrieves results based on your terms and then ranks them according to:
• word frequency (how many times query words appear in each document),
• word order (first terms in query are given greater weight),
• word proximity (how close terms are to each other),
• word location (e.g. in title or heading),
• PageRank (Google's link popularity algorithm, based on how many other pages link to this one, similar to cited reference searching),
among other complex statistical algorithms.

Links to Learning

• History of search engines (http://www.searchenginehistory.com/)
• Wikipedia PageRank article (http://en.wikipedia.org/wiki/PageRank)
• Google Help Centre (http://www.google.co.uk/help/)
• How Google works - flash tutorial (http://www.portfolio.com/images/site/editorial/Flash/google/google.swf)
• Google: A behind the scenes look (http://www.researchchannel.org/prog/displayevent.aspx?rID=3898&fID=345)
• Google scholar FAQ (http://library.cord.edu/sonsteby/google_scholar.html)
• Information Revolution (http://youtube.com/watch?v=-4CV05HyAbM)

Assignment

Complete the Informs Advanced Searching online tutorial. Write a blog post comparing the results of a search made with Google to one using Intute. Use terms relating to an assignment you are preparing or resources for a particular unit you are taking.