Extending Data Curation to the Humanities: Curriculum Development and Recruiting

Assessment of Need
Disciplines such as art, history, literature, music, philosophy, jurisprudence, and others embody a record of the human
condition. For centuries libraries and museums have been in the business of keeping, and providing access to, this record
— as primary and secondary sources, and artifacts of cultural and historical significance. The humanities themselves are
an information-based domain in a double sense. Both the development of data on the one hand, and its exploitation in new
scholarship, education, and culture on the other, are intrinsic to these disciplines and thoroughly intertwined.
    “Humanists have traditionally viewed locating and compiling information, or ‘data,’ as a basic task of scholarship.”
    (Nichols, 2007).
Data collection, scholarship, and curation have therefore always intersected in the humanities. Particularly impressive
curatorial practices are obvious in such areas as manuscript studies, bibliography, traditional textual criticism, and
archaeology, but in fact highly theorized curatorial methods are evident in many traditional fields and disciplines.
At the heart of these traditional curatorial pursuits is a quest for authenticity and provenance. But at the same time other
curatorial objectives are in play, not only preservation, of course, but the maintenance of systems that ensure that data can
be found, can be reliably referred to, is understood in context, is in useable formats, and can be connected with recent
relevant findings that extend or qualify it. All of these curatorial objectives assume new, different dimensions, and present
new challenges (as well as opportunities), when the object of study and the corresponding tools for discovery,
exploitation, and enhancement are digital (Hedstrom, 1997; Cullen, Hirtle, Levy, Lynch & Rothenberg, 2000). From the
start of the data lifecycle, curation measures must be in place not only in order to establish authority and prevent data
corruption, deterioration, and loss, but to ensure maximum value in a digital context (Thibodeau, 2002; Lavoie &
Gartner, 2005).
Recently the curation of data has been recognized as an e-Science problem, mainly because of the rapidly growing amount
of computationally created and processed science data (Hey & Trefethen, 2003). However, data curation is no less of an
emerging problem for the humanities (Lesk, 2003; Crane, Babeu & Bamman, 2007). Increasingly, humanities scholarship
engages computing applications for the display, analysis, and understanding of, for example, works of art and literature,
events in history, archaeological discoveries, or linguistic text corpora, all of which have stimulated the gathering of
digital data:
     [R]esearch in the humanities is becoming data-centric, with a large amount of data available in digital formats. These
    developments quickly change the landscape of humanities research. (Blanke, Dunn & Dunning, 2006).
The new digital context — data, tools, practices, and expectations — challenges the methods and techniques with which
we pursue traditional curatorial objectives such as authenticity, provenance, authoritative reference, and annotation. More
than that, it even challenges our understanding of those objectives. The need for improved understanding of curatorial
practices, and education of a humanities data curation workforce, is therefore urgent. This project addresses that need.

Identified Needs
To begin surveying this landscape for the purposes of this proposal, we conducted structured interviews with six leading
figures in digital humanities scholarship, including directors of digital archives, academic researchers, and PIs. Selected
themes that emerged from those interviews are:
    •    Humanities data need to include information that typically is not entered in the written record itself. Important
         examples include the rationale for methods chosen, the development of the project perspective over time, and
         workflow documentation.
    •    Curatorial practice must be informed by our best understanding of how the data will be used in research,
         scholarship, and general access, that is, by current and emerging methods of use.
    •    More and more, metadata becomes data. Without metadata the meaning of data is obscure and inaccessible,
         and metadata play a particular role in connecting data across domain boundaries as well as object boundaries.
    •    Because data formats and data models are necessarily changing, conducting data curation only as part of one’s
         own project does not scale well over time.
    •    21st-century scholarly digital publishing brings a host of new humanities data curation challenges.
According to one PI whose collaborative text-based project aggregates encoded content from a variety of different
production systems, additional data-related problems include: interoperability and transformation output difficulties
stemming from a lack of uniformity in the markup; deciding on a practical yet valuable level of data granularity to make
available to users; creating a user interface for the data to enable standard, usable visualizations of them; and crafting a
pedagogical dimension to the data to encourage user experimentation toward the discovery of new patterns and meanings.
These concerns have been echoed in the literature, and many organizations now acknowledge humanities data
preservation as an urgent problem. Preservation developments are monitored through activity summaries (DPC/PADI,
2007) and workshop reports (ARL, 2006), and possible solutions, such as the construction of data dictionaries and
schemas toward a preservation metadata framework, have been presented (http://loc.gov/standards/premis/). In the UK, data
preservation has long been considered a priority, attested by the forty-year existence of the UK Data Archive. The only
comparable effort in the United States is the Inter-University Consortium for Political and Social Research instituted in
1962 (ACLS, 2006).
Precedents are also developing, especially abroad, for applying e-Science methods to humanities scholarly practice. In the
UK, the Arts and Humanities Research Council (AHRC), along with information and communications technology
organizations, is funding projects that integrate e-Science tools or resources with research in the humanities. A
complementary initiative, also AHRC-based, the Arts and Humanities e-Science Support Centre, offers advisory,
training, and outreach services for humanities students to learn to apply the practices, concepts, and potentials of e-Science.

Intended Results
In 2006 GSLIS received grant funding from IMLS to create a Data Curation Education Program (DCEP) to develop
curriculum for the next generation of science data curators. The concentration in data curation at GSLIS built on faculty
research in technology development and information requirements in the sciences and solid instruction in metadata,
information modeling, document processing, ontology development, archival management, and information storage and
retrieval. Core courses are now in place in DCEP—notably Foundations of Data Curation and Digital Preservation—and
complementary elective courses, such as Biodiversity Informatics, are being offered or are under development. In addition,
the first round of internships is set to begin in summer 2008.
From the beginning, we anticipated that the DCEP would offer parallel science and humanities educational concentrations
in the LIS masters program. Unlike in the UK, however, where “induction” into e-Science is the model for training arts
and humanities students (Arts and Humanities e-Science Support Centre), at GSLIS humanities data curation education
will be reciprocal, with rich interactions between the science and the humanities development activities. These two
educational areas correspond directly with two GSLIS research concentrations in the sciences and the humanities aimed at
advancing development and integration of digital information, coordinated within the GSLIS Center for Informatics
Research in Science and Scholarship (CIRSS) (http://cirss.lis.uiuc.edu/).
The specific objectives of the proposed Humanities Data Curation (HDC) program are to:
    1) Develop and refine a humanities data curation curriculum. This curriculum will be informed by DCEP and also
       build on other graduate programs at GSLIS. Among the outcomes will be new and updated courses and a series of
       case studies in humanities data curation.
    2) Develop a network of internship sites at libraries, museums, data centers, digital humanities centers, and centers
       of digital archive production where students can further develop and apply their growing expertise.
    3) Promote the role of LIS professionals in humanities data curation and expand the understanding of the role of
       digital data curation in humanities research.
    4) Disseminate best practice reports and provide a model curriculum for other data curation programs.
    5) Develop a model institute for delivering the curriculum as continuing professional development.
    6) Deliver the curriculum to both new masters students and continuing professionals.
The program will train a new generation of LIS professionals qualified as humanities data curators and provide continuing
education opportunities for practitioners currently in the field. Through workshops, conference presentations,
publications, and the influence of our graduates, it will also raise the visibility of the importance of information
professionals in managing our knowledge resources for many decades, possibly centuries, to come. These students are the
next-generation information professionals who will build and maintain humanities data applications, tools, and systems to
work in concert with the many digital libraries, archives, repositories, and museums, as well as the indexing systems,
metadata standards, ontologies, taxonomies, and vocabularies associated with digital data and products.
While many professions and disciplines have much to contribute to the development of digital data curation, there is no
doubt, contrary to Ross (2007), that the LIS field is well-grounded theoretically in areas of critical importance: collection
development, management, and preservation; cataloging, classification, and metadata; information storage and retrieval;
and user access, behavior, and services. This range of theory, and the related practice, is necessary for managing the entire
lifecycle of data. Humanities data require, in the words of one of our experts, “close curatorial attention,” not least
because of the access and preservation conundrums surrounding their inherently ephemeral character, but also because
they “introduce horizons of data re-use.” Data curation professionals, in collaboration with humanities scholars, will
construe and exploit the potentials for optimum data re-purposing, harmonization, and integration in the increasingly
complex universe of digital information. LIS is the only field that is concerned with the full landscape of scientific and
scholarly information and the interactions therein, and with provision of services to exploit that base of knowledge (Bates,
1999; White et al., 1992).
GSLIS has a solid and unique base of resources and expertise for a comprehensive HDC program, with top-ranked MSLIS
on-campus and distance programs, a new and growing science data curation concentration, a digital libraries certificate of
advanced study, and a certificate in special collections offered through the Midwest Book and Manuscript Studies
program, all coupled with our current digital research initiatives across the science, humanities and cultural heritage
domains. UIUC is home to the National Center for Supercomputing Applications (NCSA) and the Illinois Center for
Computing in the Humanities, Arts, and Social Sciences (I-CHASS), organizations whose researchers collaborate
regularly with GSLIS faculty on joint digital humanities projects. This year GSLIS co-hosted, with NCSA and I-CHASS,
the Digital Humanities 2007 Conference, an annual conference presenting research on a range of topics in the digital
humanities and involving researchers and practitioners from many professions. In 2009 GSLIS will be the site of the
fourth annual iConference, a meeting of the iSchools Caucus of leading “information schools”. In 2010 GSLIS will host
the International Digital Curation Conference, which has become the premier forum for generating and exchanging new
ideas and research in data curation. By that point DCEP at GSLIS will have been active for three years, enough time for
the first class of data curation students to have completed their concentration, participated in internships, and commenced
applying their curatorial training actively in the workforce.

Scope of current educational programs
While it is not uncommon to find programs in archival studies and electronic records management in schools of library
and information science, only a few schools feature full programs or concentrations in areas related to data curation. The
Universities of Toronto and British Columbia offer established curricula in electronic and digital information and
archiving, and UBC's InterPARES research projects on digital archives are noteworthy. The Universities of Pittsburgh and
Michigan both have specializations in Archiving and Records Management, and the curriculum at Michigan is designed
for training LIS students in digital curation in certain science and social science domains. The University of Texas is
known for traditional conservation and preservation work. The University of Arizona recently announced the start of an
online graduate certificate program in digital information management. In addition, in 2007 the University of North
Carolina launched the Carolina Digital Curation Curriculum Program and hosted an international symposium on digital
curation issues. However, none of the existing educational options provide broad humanities data curation training, nor
are they well-integrated with research activities in the digital humanities.

We expect HDC professionals to have significant local impacts on research enterprises and libraries, and other institutions
with humanities data responsibilities, while also contributing to global development and integration of information
systems for scholarly humanities research and data sharing. Placements will be in libraries, university institutional
repositories, archives, museums, digital archive projects, research institutes, academic departments, and other humanities
data centers. While levels of responsibility will vary, the influence may be far-reaching. Impacts will include: 1) Progress
toward solving information lifecycle problems with humanities data collections as practices for implementation,
evaluation, continual improvement, and sustainability of data systems evolve. HDC professionals will have expertise to
work with data resulting from content production that is not only text-based but also of diverse media formats and
knowledge of emergent tools, applications, and technical methods relevant to data processing and storage. Moreover, they
will work to make sure systems are responsive to the real needs of humanities scholars and other users. 2) Wider adoption
of data standards and improved interoperability across scholarly communities. HDC professionals will keep abreast of
data standards and contribute to data sharing and federation activities within and across disciplines. They will be an
important conduit for communication and collaboration in standards work. 3) Alignment of data and scholarly
communication activities. HDCs will be active in coordinating and reconciling data, literature, and communication
practices and systems in the humanities.
HDC professionals will have a strong base of traditional and technical LIS but also be immersed in the “data-centric”
approaches to humanities research, receiving the specialized training to keep humanities data safe and usable. They will
have skills in project management and in the design of service interfaces to the data they curate. And, local assessment
and coordination of the extensive and ever growing universe of data and information resources, informatics tools, and
scholarly communication will foster more rational and equitable global development in terms of integration of data,
literature, and information technologies.

Three programs at GSLIS provide a unique opportunity to recruit underserved communities to the LIS profession: our
undergraduate minor in information technology studies, the LIS Access Midwest Program (LAMP), and the LEEP online
education program. First, our undergraduate program has proven to be an excellent introduction to LIS for a range of
students from different backgrounds. Second, the LAMP initiative recruits undergraduate students from statistically and
historically under-represented populations for participation in activities and events intended to introduce them to graduate
school and career opportunities in LIS. Third, since the target population of students is national, we will recruit for the on-
campus LIS program as well as for our award-winning online masters education program, LEEP. Through LEEP, students
from anywhere in the world who are not able to leave their homes and travel to central Illinois because of financial, work,
or family constraints will be able to obtain master’s level training in humanities data curation. Because the LEEP program
requires once-a-semester visits to the University of Illinois campus, however, we have requested funds to support travel
and lodging in Champaign-Urbana for students from underserved groups with economic need. Moreover, advertising
through our internship institutions in major metropolitan areas across the United States will help ensure a large pool of
applicants, and our advisory board will be able to direct promising students to the program.
The program will benefit from disciplinary diversity as well. Students come into the program from a range of humanities
and social science disciplines. GSLIS, as a school, is already highly interdisciplinary, and faculty with humanities,
sciences, and computer sciences orientations teach in the program.

Project Design and Evaluation
The HDC program will be an extension and refinement of the current DCEP, redeveloped around four core themes: (i)
text and documents; (ii) digital artifacts and media; (iii) collections, metadata, and ontologies; (iv) scholarly information
use and behavior.
The curriculum will build on and complement two other GSLIS programs, a digital libraries post-masters certificate of
advanced study (DL-CAS) and a certificate in special collections. An expert advisory committee will guide the design and
implementation of the program and assist in establishing student internships at their institutions and at associated libraries,
museums, and data centers. Ongoing GSLIS research projects with humanities data curation components will be
integrated into the student learning experience. Local experts will be consulted in curriculum development and used as
guest instructors, including librarians working in preservation, rare books and manuscripts, modern languages and
literatures, digital services and development, and institutional repositories.
In addition to the internship option, students will have practicum opportunities in academic and special libraries, archives,
museums, and institutional repository operations, to be aligned with their interests and career goals. With IMLS support
we will be able to work with preeminent experts in the field, provide new courses with a base of well-developed case
studies on existing humanities data curation challenges in various settings, offer fellowships, provide internships across a
range of different humanities and cultural heritage institutions involved in data creation, management, archiving, and
preservation, and integrate our curriculum with state-of-the-art research projects at GSLIS.

Engagement with data communities
All of the proposed curriculum development will be carried out in close coordination with communities of practice and
research in the humanities, with strong ties to the institutions involved and understanding of their data collections,
information system requirements, and user communities. It is essential that communities producing and working with
humanities data inform and guide the training of HDC professionals.

Advisory Committee
We will meet at least twice per year with an advisory committee of experts to identify the main topics and best practices
in humanities data curation. One meeting will be face-to-face at UIUC or matched with a conference; the second will be a
teleconference. The committee members will also serve as guest lecturers on special topics in humanities data curation
courses. Some will give symposia open to the larger community on current topics in HDC. Because of our
telecommunications and distance education infrastructure, lecturers need not be physically in the same location as GSLIS
to give a public lecture. The following individuals have agreed to serve on the preliminary advisory board:
Gregory Crane, Professor of Classics and Winnick Family Chair in Technology and Entrepreneurship, Tufts University.
Crane is Editor-in-Chief of Perseus, a pioneering and influential digital library with a wide range of object types.
Lorcan Dempsey, Chief Strategist and Vice President for Research, OCLC. Dempsey leads research and strategy for one
of the most important organizations involved in metadata research and technologies for bibliographic control. He worked
for JISC (UK) before coming to OCLC and is a member of the NISO Board.
Julia Flanders, Director, Women Writers Project, Brown University. Flanders, a recent chair of the Text Encoding
Initiative Consortium and vice president of the Association for Computers and the Humanities, is a well-known expert on
humanities text encoding.
Christian-Emil Ore, Unit for Digital Documentation, Faculty of Arts, University of Oslo. Ore is the President of the
Committee on Documentation (CIDOC) of the International Council of Museums (ICOM).
Daniel Pitti, Associate Director, Institute for Technology in the Arts and Humanities (IATH), the University of Virginia.
Pitti is a Fellow of the Society of American Archivists and a leading figure in the development of the Encoded Archival
Description (EAD).
Harold Short, Director, Centre for Computing in the Humanities, King’s College London. CCH is one of the largest
humanities computing centers in the world and has been prominent in a wide range of initiatives in European digital
resources; currently it is developing, in collaboration with the UK Centre for e-Research, a master’s program in digital
asset management.

Data Centers
The data center collaborators will serve as internship locations and sites for learning about current challenges and state-of-
the-art practices. The advisory committee represents the initial set of participating centers, all of which run, or support,
large-scale digital humanities programs and generate large and diverse stores of data. Additional non-local centers will be
established as the project evolves. Additionally, a number of local initiatives will provide a rich, and convenient, base of
data centers. Two Library of Congress projects under the NDIIPP program, ECHO DEPository and Preserving
Virtual Worlds (PVW), will offer unequaled opportunities for students to participate in major digital preservation
initiatives. ECHO DEP is a 3-year digital preservation and research development project at UIUC in partnership with
OCLC; its main activities include the development of tools for web archiving and investigations into interoperability
architectures for digital repositories, with particular focus on preservation metadata. PVW is a 2-year project at UIUC
exploring the preservation challenges surrounding video games and interactive fiction. Its project partners include
University of Maryland (Maryland Institute for Technology in the Humanities), Stanford University (Stanford Humanities
Laboratory), Rochester Institute of Technology (Game Design and Development), and Linden Lab (creators of Second
Life, the 3D online virtual world). The UIUC University Library Institutional Repository, IDEALS (Illinois Digital
Environment for Access to Learning and Scholarship), will also serve as a data center. Working in partnership with NCSA
and Illinois faculty, the IDEALS team is building a data archiving component over the next three years. The University
Library in particular is concerned with the policies and technology that sustain long-term preservation and management of
data, such as suitable policies for selection and retention and the development of supporting documentation to be archived
with data to guarantee their enduring usefulness.

Local research laboratories and projects
Two Mellon-funded research initiatives will serve as local “learning laboratories” for text and music data.
MONK (“Metadata Offer New Knowledge”) is directed by John Unsworth at UIUC and includes faculty, staff, and
students from Northwestern University, McMaster University, the University of Nebraska, and the University of Alberta,
as well as the National Center for Supercomputing Applications. Providing access to terabytes of full-text humanities
resources publicly available on the web, it is designed to be an inclusive and comprehensive text-mining and text-analysis
tool-kit for scholars in the humanities. Because it works with many collections dispersed across many different institutions (not
only libraries but also publishers and search engines), it is a laboratory not only for contemporary tools but also for the problems of
data heterogeneity, provenance, and authority (www.monkproject.org/).
NEMA (Networked Environment for Music Analysis) – Stephen Downie’s group is investigating the distribution of
music analysis tools and virtual collections and is developing a terascale GRID-based data store of music materials in
symbolic and audio formats that serves as a permanent and stable repository of music materials for researchers to conduct
scientifically valid year-over-year evaluations of music information retrieval techniques.
Other local humanities data-intensive projects that can provide test beds include Digital Collections and Content with a
collection registry and metadata repository for over 200 digital cultural heritage collections (IMLS, Palmer, PI);
Investigating Data Curation across Research Domains (IMLS, Palmer in partnership with Brandt, PI, Purdue University);
Machine Learning for Automatic Museum Label Databasing (NSF) and Georeferencing Museum Specimen Sources
(Moore) (Heidorn, PI); and Bonnie Mak’s work on history of books, libraries, and archives in the digital era.

Internships and Fellowships
It is well known that practical workplace experience builds on student educational experience and improves the prospects
of obtaining an education-related job after graduation. Therefore a core aspect of HDC is a set of internships that help
provide that experience. Within HDC we would aim to offer each humanities data curation student an opportunity to
engage in a for-credit internship at some point in their graduate education. In order to help develop an internship network,
we would seed the internship program with a constellation of opportunities at participating institutions, beginning with 6
paid placements in Years 2 & 3. The funds for these awards would be managed through GSLIS. The standard internship
would include a four- to six-week residency at the data center site. The remainder of the semester would be spent preparing
for the residency portion and preparing a written report on the project after the residency portion. The internship
supervisor at the data center and GSLIS faculty would help develop a study plan for each internship. We would encourage
collaborating data centers to post internship opportunities. Students in the program could apply for the internships, and the
data centers would select among the pool of applicants. Throughout the grant period we would strive to find new
institutions that could provide funded or unfunded internship opportunities to expand the program.
To recruit promising professionals, five fellowships would be offered to exceptional applicants in Years 2 and 3, with two
being designated for minorities with financial aid needs. These fellowships will include tuition remission and a stipend for
three semesters. Each fellowship will carry with it a guaranteed paid internship.

Curriculum Approach
As an LIS school, our approach to curriculum development and course content will reflect our commitment to research
librarianship, a profession that has long been responsible for large-scale monitoring, coordination, and access to
scholarly and scientific material. And, more recently, this training has prepared graduates to move into jobs as data
services librarians, assistant directorships of digital humanities services in libraries, project managers for metadata
federation initiatives, institutional repositories, and digital collection initiatives. The HDC will draw on the MSLIS
courses listed below, which will need to be assessed and refined to include humanities content and two newly developed
core courses for the DCEP program: Foundations of Data Curation and Digital Preservation.

Existing courses in the MSLIS and DL-CAS:
Information Storage and Retrieval
Information Policy
Design of Digitally Mediated Information
Electronic Publishing and Information Processing Standards
Indexing and Abstracting
Metadata in Theory and Practice
Arrangement and Description for Archives and Museums
Information Sources and Services in the Arts and Humanities
Digital Libraries: Research and Practice
Understanding Multimedia Information: Concepts and Practices
Preserving Information Resources
Architecture of Networked Information Systems
Interfaces to Information Systems
Systems Analysis and Management
Representing and Organizing Information Resources
Information Modeling
Electronic Records Management
Ontologies in the Humanities
Museum Informatics
Use and Users of Information
Digital Humanities
Administration and Use of Archival Materials

An initial set of new courses is outlined below. This list will grow and change in consultation with our advisors and based
on our exploration of data curation practices and needs.

Proposed Courses for Humanities Data Curation
Humanities Informatics Resources and Tools
Digital Media in Libraries, Archives, and
Humanities Text-Base Development
Policy and Ethical Issues in Data Curation
Managing Collaborative and Distributed Research (managing across organizations)
Selection, Appraisal and Retention of Cultural Heritage Data
Digital Archive Project Management
Museums, Computing, and Data: Changing Curatorial Practices

Objectives and Activities
The six project objectives will be met through a range of activities, many of which will be ongoing processes throughout
the 3-year period. IRB approval will be secured for all assessment and evaluation activities.
Objective 1 (Method and Evaluation). Develop and refine a humanities data curation curriculum building on existing
graduate programs at GSLIS.
Establish the advisory board. The Board directs outcomes and advises on continual development.
Conduct a needs assessment of expertise in humanities data curation, targeting museums, libraries, and archives, as well
as digital humanities centers across the country. This work has already begun in preparation for this proposal, through
informal interviews with digital humanities figures. In continuing, we will collect data from researchers and project
managers using a multi-method approach, including surveys, interviews, and focus groups. These activities will provide
in-depth information on how new HDC professionals can contribute to organizations responsible for humanities data.
Monitor the current job market by collecting announcements and job descriptions from major research centers and websites.
Refine and develop curriculum. Conduct a full review of existing relevant courses and coordinate with the current DCEP,
DL-CAS, and special collections certificate program committees to evolve courses that meet the needs of all programs.
Design a humanities data curation concentration that includes the current foundations courses and a series of new courses,
based on the outcomes of activities listed above.
Evaluate courses, internships, and the program through the UIUC course evaluation system and a survey of graduating
students on the effectiveness of the program. We also plan to track placement and do follow-up evaluation with employers
one year after placement.
Objective 2 (Training and Sustainability). Develop a network of internship sites at institutions where students can
develop and apply their growing expertise.
Build partnerships with institutions that will offer internship opportunities for students to work on humanities data
curation projects. Collaborate with site supervisors to establish internship guidelines and outcome measures. Continue to
explore and build relationships with other internship sites.
Develop and implement an evaluation process in collaboration with site supervisors to assess the performance of interns
and to assess the value of individual internship positions to students and the organization.
Objective 3 (Recruiting & Integration). Promote the role of LIS professionals in data curation and expand
understanding of the role of humanities data curation in the production of research.
Recruit students through site visits, mailings, conference presentations, and engagement with advisors and new contacts.
Develop mechanisms for integrating course work into current research and practice at GSLIS CIRSS, UIUC library,
internship institutions, and other collaborating and interested institutions.
Develop case-based modules of study and assignments in the four areas (listed under Project Design) in coordination with
GSLIS and UIUC research projects and those at participating institutions.
Identify opportunities for student projects to contribute to ongoing research projects.
Objectives 4 & 5 (Dissemination). Disseminate best practice reports and provide a model curriculum for other data
curation programs. Develop a model institute for delivering the curriculum as continuing professional development.
Develop best practices reports based on needs assessment and educational activities. Make these and all project
documents, including syllabi, case studies, and recorded guest lectures available on the project web site. These materials
will be deposited into the UIUC Institutional Repository, IDEALS, for wide access and long-term preservation.
Disseminate progress and outcomes of the project at information science, museum informatics, digital humanities, and
related conferences. Publish papers and training materials from the 2009 Summer Institute and the DCC pre-workshop to
be held at GSLIS in 2010.
Gather evaluative feedback from institute and workshop instructors and participants. A focus group will be held with the
instructors at the close of these events, and surveys with both closed and open-ended, qualitative questions will be sent to
Objective 6 (Deliver curriculum). Deliver the curriculum to both new master's students and continuing professionals.
Program delivery: conduct classes, coordinate internships, continue integration activities.

Evaluation Plan
Program effectiveness will be measured against both project-generated standards and external measures using a variety of
methods including those outlined above.
Course content and instructional methods will be evaluated using teaching objectives developed in consultation with the
advisory committee in the first year of the project and reviewed systematically at each advisory group meeting and by the
project team at the beginning and end of each semester. These objectives will be derived, in part, from desired student
performance in the workplace.
Teaching effectiveness will be evaluated using instruments developed at the UIUC Center for Teaching Excellence, which
offers online course evaluation.
Workplace evaluation procedures for internships and practicum are already in place for other GSLIS programs and will be
adapted for HDC. Additional interviews and surveys with employers will be implemented in Year 3 after a larger number
of students have been involved with these aspects of the program. We expect time to graduation will be approximately 3-4
semesters, depending on the student’s background.
We will apply outcome-based measures to answer the following questions: Were our students more likely to be employed
in humanities data curation jobs as a result of taking data curation courses oriented toward the humanities? Did the
students meet the qualifications for humanities data curation jobs? Were any of our humanities data curation courses, case
studies, or modules used in other schools of library and information science? Did the program serve as a model for other
institutions resulting in the establishment of related programs?
Degree targets: As of fall 2007, the new DCEP program has begun, with particular emphasis on science data curation.
DCEP expects 20 students to pursue the concentration, and with the addition of HDC activities would aim for 30 overall,
with one-third to one-half of the graduates trained for the humanities.

Project Resources: Budget, Personnel, and Management Plan
Decisions about activities and budget allocations have been largely informed by our experiences starting the DCEP
science concentration, funded by IMLS in 2006. In particular, we have found a need for increased recruiting and
promotion activities and are seeing a great demand for continuing opportunities. We have also been able to provide
considerable in-kind contributions due to progress made in the past year in data curation teaching and activities.
Budget: The funds requested from IMLS will cover core project personnel and student support in the form of fellowships
and internships, and travel for distance education students, as well as general travel, services, and supplies. Travel funds
include advisory board travel to annual meetings, and project personnel conference and data center site travel, which we
have learned is essential for promotion and recruiting, and for building awareness in this kind of new professional role.
Services for data collection and analysis for needs assessment and evaluation are allocated for CIRSS, and basic computer
and office supply lines are also included.
Management plan and personnel: The project will be led by Allen Renear, who will be responsible for overall project
coordination and for coordination with advisory committee members and data centers. Carole Palmer, director of CIRSS, will provide
coordination with the DCEP project and oversee needs assessment and evaluation. John Unsworth will be responsible for
curriculum development in digital humanities and assisting with building partnerships with humanities centers. Jerome
McDonough will be responsible for course preparation and offerings in digital media and preservation, and Bonnie Mak
will be responsible for course preparation and offerings in digital archives instruction. Day-to-day management and
coordination with other GSLIS projects and educational programs will be provided by a ½ time project coordinator, and
research and evaluation activities will be supported by a ½ time RA. Weekly project meetings will be held to plan, assess and
share progress, and facilitate communication among the project team. GSLIS works with the University Grants and
Sponsored Contracts Office to oversee finances and ensure conformance with regulations.
Allen Renear teaches courses on information modeling and ontology development. He has served on the Advisory Board of
the Text Encoding Initiative (TEI), chaired the Open E-Book Publication Structure Working Group, and is a past president
of the Association for Computers and the Humanities. Before joining the GSLIS faculty, he was Director of the Brown
University Scholarly Technology Group.
Carole Palmer is an internationally recognized expert on scientific and scholarly information work, digital collection
evaluation and representation, and use and users of information. She directs the GSLIS Center for Informatics Research in
Science and Scholarship, which conducts research on scholarly communities and their use of information resources and
tools, with focal areas in the digital humanities and collections and metadata, and existing funded projects in data curation.
John Unsworth, Dean of GSLIS, teaches courses and conducts research on humanities computing and scholarly
communication. He is Chair of the Digital Humanities Alliance and former Director of the Institute for Technology in the
Arts and Humanities (IATH) at the University of Virginia.
Jerome McDonough coordinates the Certificate of Advanced Study in Digital Libraries and is lead editor of the Metadata
Encoding and Transmission Standard (METS), a widely used schema for encoding the descriptive, administrative, and
structural metadata about a digital object in a single package.
Bonnie Mak will begin as an assistant professor at GSLIS in Fall 2008. She brings experience from the InterPARES
project, where she held a post-doctoral position, and was recruited to build teaching and research capacity in digital
archives and new media.

Presentations at conferences will build awareness of the program, encourage similar programs elsewhere, and disseminate
our findings on best practices and curation needs. The team will contribute in venues such as ALA, SAA, IFLA, ALISE,
JASIS&T, JCDL, Dublin Core, Digital Humanities (DH), the International Committee on Museum Documentation
(CIDOC), Museum Computer Network (MCN), and the International Conference on Digital Curation.

Over the three-year period, the HDC concentration will be integrated with existing GSLIS program development and
administration, as is currently being done with DCEP and other new educational initiatives in the school. The
development activities are also closely tied to ongoing research interests of several of the faculty members in GSLIS,
providing a continued educational outlet and contextualization for the research. Beyond GSLIS, the participation of the
advisory committee and the internship programs will help ensure good curatorial practices in these institutions and
ongoing recognition of the need for LIS professionals trained in humanities data curation. We will make all project
documents, syllabi, lecture materials, and reports available on the project web site. Finally, dissemination of the findings
in conferences and publications will help spread the best practices in humanities data curation education to other LIS
graduate programs.

American Council of Learned Societies. (2006). Our Cultural Commonwealth: The Report of the American Council of
    Learned Societies Commission on Cyberinfrastructure for the Humanities and Social Sciences, December 13, 2006,
    New York, NY, Sponsors: American Council of Learned Societies, Andrew W. Mellon Foundation.
Arts and Humanities E-Science Support Centre. Training. AHeSSC, King's College London, UK.
Association of Research Libraries (2005). Managing Digital Assets – Strategic Issues for Research Libraries, October 28,
    2005, Washington, DC, Sponsors: Association of Research Libraries, Coalition for Networked Information, Council
    on Library and Information Resources, Digital Library Federation.
Bates, M.J. (1999). The Invisible Substrate of Information Science. Journal of the American Society for Information
    Science, 50(12), 1043-1050.
Blanke, T., Dunn, S., & Dunning, A. (2006). Digital Libraries in the Arts and Humanities – Current Practices and Future
    Possibilities. Presentation at the 2006 International Conference on Multidisciplinary Information Sciences and
    Technologies (InSciT 2006), Mérida, Spain.
Crane, G., Babeu, A., & Bamman, D. (2007). EScience and the Humanities. International Journal of Digital Libraries, 7,
Cantara, L. (2006). Long-Term Preservation of Digital Humanities Scholarship. OCLC Systems and Services, 22(1), 38-
Cullen, C.T., Hirtle, P.B., Levy, D.B., Lynch, C.A., & Rothenberg, J. (2000). Authenticity in a Digital Environment.
    Washington, DC: Council on Library and Information Resources.
Doorn, P. & Tjalsma, H. (2007). Introduction: The Archiving of Research Data. Archival Science, 7(1), 1-20.
DPC/PADI (2007). What’s New in Digital Preservation? Compiled by Najla Semple for the Digital Preservation Coalition
    (DPC) and Gerard Clifton (National Library of Australia), 19 September 2007.
Hedstrom, M. (1997). Digital Preservation: A Time Bomb for Libraries. Computers and the Humanities, 31(3), 189-202.
Hey, T., & Trefethen, A. (2003). The Data Deluge: An E-Science Perspective. In Grid Computing: Making the Global
    Infrastructure a Reality. Eds., F. Berman, G. Fox, & T. Hey. Wiley Series in Communications Networking &
    Distributed Systems. New York: J. Wiley.
Lavoie, B., & Gartner, R. (2005). Preservation Metadata. DPC Technology Watch Report 05-01. Sponsors: OCLC Online
    Computer Library Center, Oxford University Library Services, Digital Preservation Coalition.
Lesk, M. (2003). The Future of Digital Libraries. Wave of the Future: NSF Post Digital Library Futures Workshop, June
    15-17, 2003, Chatham, Massachusetts, Sponsors: National Science Foundation. Available at
    http://www.sis.pitt.edu/~dlwkshop/paper_lesk.html [accessed 23 November 2007].
Nichols, S. (2007). Digital Scholarship: What’s All the Fuss? CLIR Issues, 58 (July/August).
Preservation Metadata Maintenance Activity (PREMIS) website, http://loc.gov/standards/premis [accessed 2 Dec. 2007].
Thibodeau, K. (2002). Overview of Technological Approaches to Digital Preservation and Challenges in Coming Years.
    In The State of Digital Preservation: An International Perspective. Conference Proceedings. Washington, DC: Council
    on Library and Information Resources.
White, H.D., Bates, M.J., & Wilson, P. (1992). For Information Specialists: Interpretations of Reference and
    Bibliographic Work. Norwood: Ablex.