

HathiTrust Digital Library
Update On January Activities
February 12, 2010

In This Newsletter

Top News
  • New Cost Model
  • Disaster Recovery Plans
  • Digital Library Profile
Working Groups
  • Quality
  • Discovery Interface
  • Development Environment
  • Storage
Ingest
  • Internet Archive Ingest
  • Non-Google Ingest
Development Updates
  • Shibboleth
  • Data API
  • Large-scale Search
  • PageTurner

Top News

New Cost Model – The HathiTrust executive committee approved a new cost model for partnership in December that will be adopted by all partners beginning in 2013. In the new model, partners will share in the cost of public domain and open access volumes preserved in HathiTrust, and in the cost of in-copyright volumes that they hold, or have held, in their physical collections. The model will distribute the costs of curating and managing the digital collections in a way that more accurately reflects the benefits each partner receives from deposited volumes. It will also allow institutions to join HathiTrust that do not necessarily have content to deposit, but that wish to support and benefit from the long-term curation and access services that HathiTrust provides. Such institutions are eligible for partnership effective immediately, and do not need to wait for the 2013 general adoption. Details of the new cost model are available at http://www.hathitrust.org/documents/hathitrust-cost-rationale-2013.pdf. Please contact hathitrust-info@umich.edu for additional information and inquiries about partnership.

Disaster Recovery Planning – Following an evaluation of disaster preparedness performed last summer by an IMLS-funded intern, and the hiring of a preservation librarian in November, the University of Michigan is taking steps to formalize and expand HathiTrust’s policies and practices relating to disaster recovery. The UM preservation librarian is leading a process to form a Disaster Recovery Planning Committee and, with support of a winter intern from the UM School of Information, has begun to gather key inventory, personnel, and workflow documentation. Guided by industry standards such as TRAC and best practices in the digital preservation community, the committee will ensure a high level of preparedness for known and unknown risks to the long-term integrity and use of materials in the repository. A preliminary meeting of key staff will occur in February, and membership in the Disaster Recovery Planning Committee will be finalized soon thereafter.

Digital Library Profile – As part of its participation in an NSF EAGER grant awarded in September 2009, HathiTrust completed a technological profile of its repository based on two frameworks developed by Johns Hopkins University. The profile can be found at [...]

New Growth

Number of volumes added:

                        January       Total
Indiana                  38,344     151,816
Penn State                    0       5,016
Univ. of California         972   1,156,339
Univ. of Michigan        71,094   3,730,968
Univ. of Wisconsin          691     268,044
Total                   104,342   5,312,183

5,384 public domain volumes were added in December, bringing the total number to 764,331 (about 14% of total content).

Working Groups

Quality – In July 2009, the Strategic Advisory Board (SAB) assembled a working group to investigate issues surrounding the quality of partner institution volumes downloaded from Google. The working group was asked to research and provide recommendations on a quality threshold HathiTrust uses to limit ingest of poor quality volumes. The working group presented its recommendations to the SAB in January, and the SAB decided to continue the working group with a revised and expanded charge. The new charge is to a) develop a set of quality principles for HathiTrust, b) monitor quality control as related to user experience, c) track developments in a separate quality working group established by Google and Google library partners following the Google partner summit in October, and d) evaluate HathiTrust practices with regard to thresholding or limiting ingested content. Membership in the new group, called the HathiTrust Quality Ingest and Error Rate Working Group, is currently being determined.

Discovery Interface – With the version 1 catalog beta release only a few months away, the Discovery Interface Working Group is turning its focus to the usability of the catalog and its integration with existing HathiTrust Digital Library services (Collection Builder, Page Turner, and Full-Text Search). The Working Group formed a usability subgroup, which will collaborate with staff at OCLC to begin usability testing of the catalog before it is released. Testing will also be performed in post-release phases. Aspects of the pre-release analysis will include verifying accurate functionality and fulfillment of agreed-upon requirements.

In preparation for loading HathiTrust volumes into WorldCat for the version 1 release, staff at UM provided an API that will allow OCLC to display HathiTrust volume information in WorldCat records.

Collaborative Development Environment – UM staff have been gathering specific topics for the working group to discuss when it reconvenes (now planned for late February), and have developed a draft timeline for the steps ahead. A message to reassemble the group was sent in early February, and scheduling is underway. The area the group will address first is the design of a version control system. UM staff have also begun to research the GlusterFS cluster file system as a storage back-end for the environment.

Storage – The working group tasked with making recommendations on a third instance of storage for HathiTrust presented its final report to the Executive Committee in January. The group concluded that although there were significant benefits to implementing a third instance of storage, given the high level of preservation confidence in HathiTrust and the absence of economic conditions favorable for acquiring and operating new storage, there was no urgency in establishing a new instance. The group noted, however, that HathiTrust should be prepared to establish a third instance of storage if such a course becomes more economically [...]

The Executive Committee would like to solicit broader feedback from partner institutions regarding these recommendations (especially from a collection development perspective), and requests that thoughts on the report and a third instance of storage be sent by email to hathitrust-info@umich.edu. Those who wish to remain anonymous should indicate this in their email. The full report of the working group is available at http://[...]age.

February Forecast

  • Complete and deploy Shibboleth authentication support
  • Complete quality assurance processes for pilot of UC’s Internet Archive-digitized materials and begin ingest into the repository
  • Continue large-scale search performance monitoring
  • Make progress toward the integration of Collection Builder functionality in full-text search results

Presentations

NISO Webinar       Feb 10

Please see http://www.hathitrust.org/papers for links to all HathiTrust presentations, papers, and reports.

Ingest

General – Ingest rates were low in January, due in part to challenges UC experienced in retrieving bibliographic records from one of its systems. UM loaded the first set of bibliographic records for Minnesota, but could not begin ingest because of problems with Google’s delivery of the content files. Ingest numbers from other institutions were also low because HathiTrust caught up with the rate at which partner volumes were made available from Google.
Internet Archive Ingest – UM began testing validation routines on a batch of 200 Internet Archive-digitized volumes from the University of California in January. The teams are revising validation strategies based on the findings of these tests and the results of quality assurance performed by UC staff on transformed, but not yet ingested, objects. UM and UC will proceed with the ingest pilot in February, testing all aspects of bibliographic and content loading, validation, and access. Completion of the pilot is projected for late February.

New Programmer For Non-Google Ingest – UM extended the bidding period for the new programmer position through mid-January, and several new qualified candidates have been interviewed. UM staff are in the final stages of selecting candidates, and expect to have a new full-time staff member and a new part-time staff member on board by the end of February.

Development Updates

Shibboleth – Shibboleth implementation in HathiTrust is nearly complete. Major portions of the code are in place and UM staff have begun to contact partner institutions to exchange information that will allow individuals from partner institutions to authenticate into HathiTrust. Initial benefits to partners will be increased facility in creating personal collections in Collection Builder and full-PDF download of all public domain volumes. Non-partners will still be able to create collections using the University of Michigan “friend account” system. Deployment of Shibboleth is planned for March.

Data API – In January, staff at the University of Michigan began work on a web application that will use the Data API to facilitate the location and download of complete book packages for public domain volumes not digitized by Google. The application is being created entirely with data and services available to the general public and is meant to demonstrate uses that can be made of the API. The first step of crawling the repository for eligible volumes is in progress, and release of a beta version of the application is expected in February.

Large-scale Search – UM improved logging and log analysis in January, enabling staff to monitor search performance in a way that more closely resembles the user’s experience. UM staff documented changes to large-scale search hardware in a new blog post entitled “Scaling up Large Scale Search from 500,000 volumes to 5 Million volumes and beyond”.

New index servers were ordered for the Indiana site and are scheduled to be in service before the end of March. The current index release process already synchronizes an updated version of the index to be stored in Indiana on a daily basis. Acquisition of the new hardware will provide full redundancy of the large-scale search application servers as well. Two additional servers that will be used exclusively for index building are on their way to the Michigan site, and one server originally purchased for production service is being re-purposed for testing and development.

PageTurner – PageTurner development was slowed in January but will pick up in February and March as staff time devoted to the ingest of materials from the Internet Archive decreases.

Outages – There were no outages in [...]
