Google Print™, Million Book Project, and Google Scholar™
Presented to the FAO in Rome, February 2005
Gloriana St. Clair, Dean of University Libraries

“Commercialize the great research libraries with a handshake, suddenly and epochally.”
Rory Litwin, in Library Juice [1]

“This is the day the world changes.”
John Wilkin, University of Michigan [2]

Thesis
● Google’s new projects are exciting and, of course, commercial.
● This talk will compare Google Print™ with the NSF-funded Million Book Project, and then touch briefly on Google Scholar™.

Main Points
● Why / Genesis – Leaders, Partners
● Realities – Collections, Logistics
● Worries – Duplication, Copyright, Copyright, Copyright, Printing . . .

Sources For This Talk
● News / web / talks / interviews, with help:
    ● Jean Alexander, Head, and the Hunt Library Reference Department
    ● Denise Troll Covey, Special Projects Librarian
    ● Missy Harvey, Computer Science Librarian
    ● Penn State Reference Department
    ● David Seaman, Digital Library Federation
    ● Anthony Tomasic, E-XMLMedia
    ● Michael Lesk, Rutgers University

Google Print™ Leaders/Partners
● Google, Inc.
● U. Michigan
● Stanford University
● Harvard University
● U. Oxford
● New York Public Library

Million Book Project Leaders/Partners in India
● Indian Institute of Science
● International Institute of Information Technology
● Indian Institute of Information Technology
● Anna University
● Mysore University
● University of Pune
● Goa University
● Tirumala Tirupati Devasthanams
● Shanmugha Arts, Science, Technology & Research Academy
● Arulmigu Kalasalingam College of Engineering
● Maharashtra Industrial Development Corporation

Million Book Project Leaders/Partners in China
● Chinese Academy of Science
● Chinese Ministry of Education
● Fudan University
● Nanjing University
● Peking University
● Tsinghua University
● Zhejiang University

Google Print™ Collections
● Stanford – entire collection
● Harvard – 40,000-volume pilot from a 15-million-volume collection
● U. Michigan – virtually the entire collection; adds seven million volumes to the search engine; Michigan to “receive and own a high quality digital copy” [3] and provide access
● New York Public Library – a subset of a 20-million-volume collection; selection criteria = in the public domain (pre-1923), interesting, not too fragile

Million Book Project Targeted Subcollections
● Books for College Libraries (best books)
● University presses / scholarly societies (copyright permissions work)
● U.N.’s Food and Agriculture Organization content

Google Print™ Handling the Copyright Issue
● Displays “a snippet of text” [4] online for books in copyright
    ● A ‘snippet’ is defined as three lines
    ● A search returns three snippets per book, and lists the number of times your search terms appear in the book
● BUY button
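
The snippet rule on this slide can be illustrated with a short sketch. It is not Google's implementation, which the talk does not describe. The function name `search_book`, the choice to centre each snippet on the matching line, and the plain substring matching are all assumptions made for illustration; only the two limits (three lines per snippet, three snippets per book) and the hit count come from the slide.

```python
from typing import List, Tuple

SNIPPET_LINES = 3       # from the talk: a snippet is three lines
SNIPPETS_PER_BOOK = 3   # from the talk: three snippets per book


def search_book(lines: List[str], term: str) -> Tuple[List[List[str]], int]:
    """Return up to three 3-line snippets plus the total number of hits.

    Hypothetical helper: matching is a case-insensitive substring test,
    which is a simplification (it would also match "rice" inside "price").
    """
    needle = term.lower()
    total_hits = sum(line.lower().count(needle) for line in lines)

    snippets: List[List[str]] = []
    next_free = 0  # first line index not yet shown in any snippet
    for i, line in enumerate(lines):
        if len(snippets) == SNIPPETS_PER_BOOK:
            break
        if i < next_free or needle not in line.lower():
            continue
        # Centre the window on the hit, clamp it to the book's bounds,
        # and avoid overlapping the previous snippet.
        start = max(0, min(i - 1, len(lines) - SNIPPET_LINES))
        start = max(start, next_free)
        snippets.append(lines[start:start + SNIPPET_LINES])
        next_free = start + SNIPPET_LINES
    return snippets, total_hits


book = [
    "rice is a staple", "b", "c", "d",
    "wild rice and rice flour", "f", "g", "h", "i",
    "Rice again",
]
snippets, total_hits = search_book(book, "rice")
print(total_hits, "occurrences;", len(snippets), "snippets shown")
```

The reader sees at most nine lines of any in-copyright book per search, while the hit count still signals how relevant the whole book is.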

Million Book Project Handling the Copyright Issue
● After extensive work, we are experiencing growing success in efforts to gain permission from university presses / scholarly societies to digitize books in searchable full text.

Million Book Project Research Initiatives
● Machine translation
● Massive distributed database
● Storage formats
● Use of digital libraries
● Distribution and sustainability
● Security
● Search engines
● Image processing
● Optical Character Recognition (OCR)
● Language processing
● Copyright laws

Google™ began as a research project at Stanford in 1995.

Google Print™ Logistics
● “Google will be doing all the digitizing with their own staff at Google headquarters and supposedly at Harvard and Michigan.” [5]
    ● Six-year time frame
    ● 2.25 books per minute
    ● Onsite
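
As a rough check on how the two figures above relate, the arithmetic can be worked through directly. The rate and the time frame come from the slide; the assumption that scanning runs around the clock, every day of the year, is not stated in the talk and is introduced here only to make the calculation possible.

```python
BOOKS_PER_MINUTE = 2.25  # from the talk
YEARS = 6                # from the talk

# Assumption (not in the talk): scanning runs 24 hours a day, 365 days a year.
MINUTES_PER_YEAR = 60 * 24 * 365

books_scanned = BOOKS_PER_MINUTE * MINUTES_PER_YEAR * YEARS
print(f"{books_scanned:,.0f} books in {YEARS} years")  # 7,095,600 books in 6 years
```

Under that assumption the total comes to about 7.1 million books, which is close to the seven million volumes cited for Michigan. The talk does not say whether the rate was derived this way.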

Million Book Project Logistics
● With scanning time @ one page per second:
    ● 20,000 pages per day shift x 200 working days per year
    ● 100 years to scan 1 million books ÷ (number of operators/machines)
● Several mega scanning centers are set up in India and China
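
The scanning estimate can be reproduced with a few lines of arithmetic. The page rate, shift output, working days, and target come from the slide. The average of 400 pages per book is not stated in the talk; it is the value implied by the other numbers (4 million pages per year for 100 years, spread over 1 million books) and is labeled as an inference below.

```python
SECONDS_PER_PAGE = 1          # from the talk
PAGES_PER_SHIFT = 20_000      # from the talk
WORKING_DAYS_PER_YEAR = 200   # from the talk
TARGET_BOOKS = 1_000_000      # from the talk

# Inferred, not stated in the talk: the 100-year figure implies this average.
PAGES_PER_BOOK = 400


def years_to_scan(machines: int) -> float:
    """Years needed to scan the target with the given number of machines."""
    pages_per_year = PAGES_PER_SHIFT * WORKING_DAYS_PER_YEAR * machines
    return TARGET_BOOKS * PAGES_PER_BOOK / pages_per_year


# Pure scanning time inside one day shift, ignoring page turns and setup.
shift_hours = PAGES_PER_SHIFT * SECONDS_PER_PAGE / 3600

print(f"1 machine:   {years_to_scan(1):.0f} years")
print(f"50 machines: {years_to_scan(50):.0f} years")
print(f"Scanning time per shift: {shift_hours:.2f} hours")
```

Dividing by the number of machines is what makes the project feasible: a single scanner would need a century, which is why the work is spread across large scanning centers.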

Million Book Project Finances
● India - $25M annually to support a large set of language translation research projects
● China - $8.46M from Ministry of Education over 3 yrs (2006)
● United States - $3.63M from NSF over 4 yrs (2005); and equipment, staff and money from the Internet Archive

Google Print™ has funding of $???, but estimates costs at $10 per book.
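
To give the $10-per-book estimate a sense of scale, it can be multiplied by volume counts quoted elsewhere in this talk. This is only an illustrative sketch: the talk does not say that the estimate applies uniformly to every collection, and the one-million-book row applies Google's figure to the Million Book Project target purely for comparison.

```python
COST_PER_BOOK_USD = 10  # Google's estimate, as quoted in the talk

# Volume counts quoted in the talk; the last row is a comparison only.
volumes = {
    "Harvard pilot": 40_000,
    "U. Michigan": 7_000_000,
    "One million books (comparison)": 1_000_000,
}

estimated_cost = {name: count * COST_PER_BOOK_USD for name, count in volumes.items()}

for name, cost in estimated_cost.items():
    print(f"{name}: ${cost:,}")
```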

Worries
● Duplication
    ● “De-duplication is NOT part of the [Google Print™] process. NOTE Stanford is interested in having multiple copies of the same materials across various partners.” [6]
    ● Million Book Project will use OCLC’s Digital Registry as soon as batch loading is available.

Worries
● Copyright
    ● “Google will be responsible for determining what’s in copyright.” [7]
    ● “A team is working on copyright issues but, in the meantime, Google is treating [copyright] conservatively.” [8]
● Printing
    ● “Google will disable printing for out-of-copyright books.” [9]


More Worries Google Print™
Rory Litwin, “On Google’s Monetization of Libraries” [10]
1. Privacy [cookies]
2. Introduction of commercial bias
3. Questions about democratization and equity of access
4. Disintermediation issues
5. Decontextualization of knowledge
6. Closing of the information commons

More Worries Million Book Project
1. Getting it done
2. Sustainability
3. Cohesion of content
4. Usefulness

Google Scholar™ Beta
● Reviewed by Péter’s [11], Anthony Tomasic, and reference librarians at Carnegie Mellon and Penn State:
    ● Not as good as Citebase, Research Index, RePEc/LogEc (Péter’s)
    ● Not as good as CiteSeer (Tomasic)
    ● Not as suitable as CiteSeer (Lesk)
    ● Not as good as Google press releases indicate (St. Clair)

Google Scholar™ Beta
● What [12]:
    ● Offers free access to bibliographic records and some abstracts
    ● May lead to full text if the university library subscribes or if free-to-read
    ● May lead to a document delivery company
    ● Does not penetrate the invisible Web
    ● Has significantly enlarged the scope by crawling additional publishers, preprint and reprint servers
    ● Competes with other aggregators, such as SFX

Google Scholar™ Beta
● What:
    ● Meets the needs of students looking for a different kind of material, and targets advertising to them
    ● It is easy for a human to identify a scholarly article, but it is a challenge for a machine (Tomasic)

Additional Challenges for a Better Scholarly Search Engine [13]
● Exploit highly structured and tagged web pages with rich metadata from scholarly publishers
● Create field-specific indexes for many distinct data elements
● Offer advanced navigation with pull-down menus for limited search by document type, publisher, publication year, journal
● Consolidate cited references
● Collect information from all relevant materials
● Develop utilities to help libraries find all materials subscribed to, not just one path

Thank you
Gloriana St. Clair
Dean of University Libraries
Carnegie Mellon University
gstclair@andrew.cmu.edu or 412-268-2447

If you would like an electronic copy of this talk, contact Cindy Carroll, stell@cmu.edu

Endnotes
1. Litwin, Rory. “On Google’s Monetization of Libraries.” Library Juice 7,26 (December 17, 2004). Available: http://www.libr.org/Juice/issues/vol7/LJ_7.26.html#3.
2. Wilkin, John. Quoted in “Google to Scan Books from Major Libraries.” MSNBC Tech News & Reviews. Available: http://www.msnbc.msn.com/id/6709342.
3. University of Michigan (Nancy Connell). “Google/U-M Project Opens the Way to Universal Access to Information.” University of Michigan News Service (December 14, 2004). Available: http://www.umich.edu/news/?Releases/2004/Dec04/library/index.
4. University of Michigan. “Google/U-M Project Questions and Answers.” The University Record Online (January 7, 2005). Available: http://www.umich.edu/~urecord/0405/Dec13_04/lib_qa.shtml.
5. Misseli. “The Google Deal (Down on the Farm).” Message posted by a Stanford staff member to Confessions of a Mad Librarian. Available: http://edwards.orcas.net/~misseli/blog/archives/000222.html.
6. Ibid.
7. Ibid.
8. Adam Smith, Senior Business and Product Manager for Google Print and Google Scholar, speaking informally with the ALA Electronic Text Centers Discussion Group. American Library Association Mid-Winter Conference (January 15, 2005).
9. Price, Gary. “Google Partners with Oxford, Harvard & Others to Digitize Libraries.” Search Engine Watch (December 14, 2004). Available: http://searchenginewatch.com/searchday/article.php/3447411.
10. Litwin.
11. Péter’s Digital Reference Shelf. “Google Scholar Beta.” (December 2004). Available: http://www.galegroup.com/servlet/HTMLFileServlet?imprint=9999&region=7&fileName=reference/archive/200412/googlescholar.html.
12. Ibid.
13. Ibid.


				