MAC Meeting Columbus, Ohio May 3, 2007
WINING AND DINING; OR, HOW TO WIN FRIENDS AND INFLUENCE PEOPLE: SUCCESSFUL MANAGEMENT OF ELECTRONIC RECORDS
Brian A. Williams, University of Michigan Bentley Historical Library
Blogs (Mblog) and Institutional Repositories (Deep Blue)
My slice of the pie for this session focuses on blogs and institutional repositories. According to a recent technology study, the world generated 161 billion gigabytes of digital information in 2006 [1]. So clearly we have our work cut out for us. Just like everyone in this room, we are still wrestling with digital record issues at the University of Michigan’s Bentley Historical Library. Most of our experience with digital records has been hard earned through practice, experimentation, and adaptation. It has unfolded as an ongoing series of case studies leading to a number of recommendations and best practices.
Our first intensive experience with digital records took place a decade ago when the outgoing university president James D. Duderstadt responded to our overture asking for his personal papers by showing up at our loading dock with a computer and hard drive. The former president proclaimed that all of his speeches, position statements, and other important “documents” were contained on that hard drive. We soon learned firsthand the challenges of item level work with digital files in their original form. Along with that we gained valuable experience in access, preservation, and online delivery (see the Duderstadt Digital Collection).
In the ensuing ten years we have captured and cataloged websites, preserved online course and degree requirement information, received faculty promotion casebooks in purely digital format, and kept an eye on changing technology on campus. Through all of this we’ve been learning how to adapt archival description to reflect increasingly digital content. Most of these experiences involved the Bentley trying to catch up with digital content after the fact. In this presentation the focus is on two cases where the archives has been a partner in the planning and development with a number of invested stakeholders.
(To demonstrate the commitment to high technology at this session, two hardcopy handouts are available. One is on blogs [Handout Number 1 at the end of this document] and the other is on institutional repositories [Handout Number 2 at the end of this document]).
[1] According to a study done by IDC, a technology research firm, the world generated 161 billion gigabytes of digital information in 2006. http://www.cbc.ca/cp/Oddities/070305/K030517AU.html
Since I hope to cover two topics in the brief time allotted, I only have a few minutes to talk about institutional repositories (IRs), so I can skip over most of the technical detail (and fortunately avoid exposing my limited knowledge of the technical underpinnings). Staff members from the university archives were involved in the planning of the IR from its inception and continue to serve on committees related to the IR. This gave us a valuable opportunity to learn more about the system. More importantly, it gave us a forum to introduce a number of archival considerations into the early developmental process.
The IR system at the University of Michigan is called “Deep Blue.” It utilizes DSpace technology. Deep Blue is a place to centrally deposit, access, and preserve the scholarly and creative output of the university. Contributors must be affiliated with the university and submissions to Deep Blue must be educational, artistic, or research oriented. Submissions must also be ready for distribution.
Copyright and intellectual property issues naturally enter the equation. Deep Blue has a four-part policy: an overall intellectual property policy; an author’s deposit agreement; a collection coordinator agreement; and a blanket agreement for batch deposits. Deep Blue also encourages use of the Creative Commons license as an alternative to more restrictive copyright.
Content submitted to Deep Blue typically includes articles, working papers, reports, exhibits, presentations and datasets. Minimum required data fields include author/creator, title, and date of publication. Among the key benefits of submission into the system are a new context, a persistent URL, and increased access and visibility.
The key point I want to make today is about the shifting digital preservation landscape. It is a rather audacious claim to suggest that we will perpetually preserve digital collections, especially in the face of rapidly changing technology.
Based on prevalence of formats, the proprietary nature of many formats, and the availability of emulation and migration tools, Deep Blue developed the rather innovative idea of defining and supporting three different levels of preservation. This tiered approach to preservation is a very effective tool. I think it is the key idea to take away from this presentation. The handout on Deep Blue provides more detail and includes the URL http://deepblue.lib.umich.edu/about/deepbluepreservation.jsp. There are also some very useful best practices outlined and explained for image files and audio files among others. These best practices guidelines had extensive input from the archives and are used by us outside of Deep Blue. To summarize the tiered preservation approach, material deposited in Level 1 formats receives the highest level of support with efforts to maintain the content, structure and functionality of the original material. This applies to a format like TIFF, which is openly available and widely used.
Level 2 promises limited preservation. The content will be preserved, but appearance and functionality are not guaranteed. This applies to proprietary commercial formats like Microsoft Word (see Microsoft Office best practices). The formats are commonly used and there should be tools to convert and migrate them, but the commercial interest could change software, discontinue support, or go out of business.
Level 3 essentially only promises “as is” preservation. These are specialized or highly proprietary formats used in a single software application. Examples are Photoshop (.psd files) and Windows Media Video (.wmv files).
Another way to think of the three levels is as the options at a car wash. The top level “gold package” includes the deluxe wash with wax. The “silver package” includes a quality wash. The “bronze package” or basic wash will get your car through but it might not have side view mirrors or an antenna when it comes out.
Given the time, let’s leave institutional repositories on that note and turn to blogs.
We’ll cover blogs in the remaining three minutes. Mblog is a blogging service sponsored by the University Library in partnership with Information Technology Central Services (ITCS) and the Bentley Library. It is aimed at students, faculty and staff. The archival interest is in viewing blogs as the 21st century equivalent of daily diaries: a window into campus life reflecting observations and experiences. An added benefit of the partnership is campus-wide training, workshops, and technical support for the blogging service.
The key archival component of the Mblog system is the upfront consideration of archiving. When students, faculty or staff create a blog, they are prompted by a check box asking if they would like to have their blog “considered for preservation and access at the Bentley Historical Library.” Additional explanation and information on preservation and access is available via a link to the Bentley Library website.
While the decision to offer the blog for archiving is up to the individual blog creator, there is no automatic assumption that the archives will choose to retain the blog for archival preservation. Our policies state that “the ultimate decision as to whether to accession remains with the archivist and is based upon standard archival appraisal criteria.” For further guidance, some basic appraisal criteria concerning blogs appear on our website. Our site indicates that blogs of particular value over time would include:
content that reflects academic life and culture
content that is significant and unique
content that will enhance other archival records documenting academic life
content that is considered complete as a record in itself
The mblog FAQs include the question “What happens to my blog after I leave UM?” The answer states, “It gets deleted unless you have selected the option to archive your materials at the Bentley Historical Library. The option to archive your blog or blogs is available on the initial mblog policy screen.”
Similarly, another FAQ asks, “What happens if I choose to archive my blog or blogs at the Bentley Historical Library?” The answer states: “Once your blog is no longer active, an archivist at the Bentley Historical Library will appraise the blog for its archival value. Note that only a selection of blogs will ultimately be selected for transfer to the University Archives at the Bentley Historical Library. For further information see the University Archives collection policy for blogs.”
The notion of archiving is further reinforced by explaining that the “archiving” setting used to organize postings in the active blog is different from the long-term archiving offered by the Bentley: “Another type of ‘archiving’ is offering your blog or blogs to the University Archives for possible long-term use and preservation. This type of ‘archiving’ is a process that would happen once your blog is no longer being used. If this is an option that you are interested in, choose the option when you sign up for the blog and then contact the University Archives when you are no longer using a blog. The University Archives staff will then review the content for its archival value.”
Revisiting the idea of format preservation introduced with the tiered support approach in Deep Blue, the university archives took care to note that format and storage decisions are based on available technology and resources and that every effort will be made to keep archived blogs in a digital form. This question of format and storage raises the very real concern of maintaining the look and feel of the original.
It also hints at the boundary issues for links to external sites outside of the mblog domain. At present the plan is to capture information about the external link (the address and where it pointed) but not the content of the link.
Another key consideration is defining when a blog is considered inactive. For now, we have taken the approach that a blog is considered inactive when no new postings or comments have been added to the blog in a 24 month period. Since the mblog service has been live for less than two years, we haven’t had to test this yet. Nor have we actually had any inactive blogs arrive for archival consideration.
What is known is that there are currently 4770 blogs in mblogs (not all are active). Of that number, 436 have checked the initial box indicating that they would like to have their blog archived at the Bentley. Our rigorous time keeping indicates that this is the perfect point at which to stop. Maybe at a future MAC meeting there will be a chance to discuss how the actual archiving process has worked once it has been tested. Thank you.
Handout Number 1 (mBlogs)
mBlog FAQ: Service & Policy
http://www.lib.umich.edu/help/mblog/
What is mBlog?
mBlog is a new blogging service brought to you by the University Library in partnership with ITCS and the Bentley Historical Library. This service allows students, faculty and staff to create blogs for themselves or groups.
Who can use the mBlog service?
The mBlog service is available to current University of Michigan faculty, staff and students.
What happens to my blog after I leave UM?
It gets deleted unless you have selected the option to archive your materials at the Bentley Historical Library. The option to archive your blog or blogs is available on the initial mblog policy screen.
What happens if I choose to archive my blog or blogs at the Bentley Library?
Once your blog is no longer active, an archivist at the Bentley Historical Library will appraise the blog for its archival value. Note that only a selection of blogs will ultimately be selected for transfer to the University Archives at the Bentley Historical Library. For further information see the University Archives collection policy for blogs.
Is archiving at the Bentley the same as “archiving” in the configuration settings?
No. The “archiving” function in mBlog is a way for you to organize past postings and comments within your blog. Preferences for the way you want the entries organized can be selected in the Settings section during setup. Another type of “archiving” is offering your blog or blogs to the University Archives for possible long-term use and preservation. This type of “archiving” is a process that would happen once your blog is no longer being used. If this is an option that you are interested in, choose the option when you sign up for the blog and then contact the University Archives (764-3482) when you are no longer using a blog. The University Archives staff will then review the content for its archival value.
http://bentley.umich.edu/bhl/uarphome/mblog/archivepolicy.htm
Collection Practice
The first obligation of the University of Michigan archives within the Bentley Library is to preserve and promote access to official records of the University of Michigan. As a secondary interest, the archives of the University of Michigan seek to preserve and promote access to a wide variety of original sources, including blogs, which reflect the experiences and observations of individuals who engage with the university either as faculty, administrators, staff, students, or alumni. Blogs of particular value over time would include those that:
have content that reflects academic life and culture
have content that is significant and unique
have content that will enhance other archival records documenting academic life
have content that is considered complete as a record in itself
As with other offerings from individuals, there is no fixed schedule for transfers to the archives. The offering is at the initiative of the individual but the ultimate decision as to whether to accession remains with the archivist and is based upon standard archival appraisal criteria.
Practices Regarding Access and Preservation to Archived Blogs
Blogs that have been selected for preservation to the University Archives will be open for use as soon as the collection is considered fully processed. Note that content is not edited once it is transferred to the University Archive. This applies to all content including name identifiable content. The University Archives staff reserves the right to store content selected for the archives in a format other than the format(s) and/or systems originally used by the blog author. Archives staff will make format and storage decisions based on available technology and resources. Every effort will be made to keep archived blogs in a digital form. Note that once a blog is selected by the University Archive, it cannot be removed by the blog author. Blog authors should be aware that archived blogs may be searchable by commercial web-based search engines such as Google.
When is a blog considered inactive?
A blog is considered inactive when there have not been any new postings or comments added to the blog in a 24 month period.
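The inactivity rule and the opt-in check box can be read together as a simple eligibility test. The following is a minimal Python sketch of that reading, not the actual mBlog implementation: the function names are hypothetical, and treating exactly 24 elapsed months as inactive is an assumption, since the policy does not say whether the boundary is inclusive.

```python
from datetime import date

# Threshold taken from the policy text: 24 months without new postings or comments.
INACTIVITY_MONTHS = 24


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month has not fully elapsed yet
    return months


def is_inactive(last_activity: date, today: date) -> bool:
    """True when the most recent posting or comment is at least 24 months old.

    Assumption: the boundary is inclusive (exactly 24 months counts).
    """
    return months_between(last_activity, today) >= INACTIVITY_MONTHS


def ready_for_appraisal(opted_in: bool, last_activity: date, today: date) -> bool:
    """Hypothetical helper: a blog reaches the archivist only if its creator
    checked the archiving box and the blog has become inactive."""
    return opted_in and is_inactive(last_activity, today)
```

Passing this test only means that a blog becomes a candidate. Under the policy above, the decision to accession still rests with the archivist and standard appraisal criteria.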
Handout Number 2 (Institutional Repositories)
deepblue.lib.umich.edu
Visibility. Making your work accessible via Deep Blue will ensure more of your peers can find it (in Google Scholar, for example) and will cite it.
Permanence. Deep Blue uses special technology that assures the stability of your work’s location online, making the citation to it as reliable as a scholarly journal, while as accessible as any website. No broken links!
Comprehensiveness. Deep Blue supports a variety of formats, and we encourage you to deposit not just the finished work but related materials (including data, images, audio and video files, etc.) to create a “director’s cut” that gives context to that work and promotes further scholarship.
Safe storage. This goes hand-in-hand with permanence. Deep Blue ensures that you only have to deposit the work once. From then on the Library takes care of backups, compatibility, and format issues. There are some technical limitations to the formats we can support indefinitely, but our commitment to preserving the integrity of your work exactly as you deposit it is 100%.
Control over access. Deep Blue allows you to limit who can see various aspects of your work for a given time, if you need to. This is difficult to do on a personal website without hiding the work completely.
Context. Beyond what is described above, Deep Blue provides context in two additional ways. First, UM is a destination for the best researchers and scholars, and Deep Blue places you in the larger context of the UM environment, side-by-side with the scholarly and artistic contributions of your colleagues and students. Second, as other universities, institutions, and organizations begin to provide this service for their work as well, we will collaborate with them to create discipline-specific services.
The University Library provides this service free to you as part of the UM scholarly community. Further, Deep Blue is designed to meet not only today’s demands but also new ones as they evolve. It will continue to grow and evolve to reflect current publishing needs and norms identified by UM faculty, staff, students, and the communities you form.
Deep Blue Preservation and Format Support Policy
Deep Blue (hereafter the “Repository”) is committed to providing long-term access to all deposited content by applying best practices for data management and digital preservation while also acknowledging the complexities involved in preserving digital information. The Repository commits to preserving the content in the form it is originally deposited and, for some formats, will preserve the content, structure and functionality of the files through migration or other preservation strategies. In addition, the Repository will provide basic services including secure storage, backup, management, fixity-checks, and periodic refreshment by copying the data to new storage media.
At the outset, the Repository will provide three levels of preservation support for specific file formats. We have determined these support levels by applying a set of evaluation criteria including prevalence of the file format in the marketplace, whether the format is proprietary, the availability of tools for emulation or migration and the availability of local resources to take specific preservation actions. The Repository will undertake appropriate format monitoring and provide adequate staffing and other resources to support the services offered at each level. Over time, our ability to provide full preservation support for more formats is likely to grow as additional tools and techniques are developed.
To assist content creators in saving and depositing documents that meet the level of quality necessary for full information capture and the highest degree of preservability over time, Deep Blue is developing a set of specification and format best-practice guidelines for common content types.
Feature | Level 1 | Level 2 | Level 3
Persistent identifier that will always point to the object and/or its metadata | • | • | •
Provenance records and other preservation metadata to support accessibility and management over time | • | • | •
Secure storage and backup | • | • | •
Periodic refreshment to new storage media | • | • | •
Fixity checks using proven checksum methods | • | • | •
Storage in a trusted for preservable format (making a normalized version, if necessary) | • | some formats | -
Strategic monitoring of format | • | • | -
Migration to succeeding format upon obsolescence | • | - | -
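Fixity checking, which the feature table lists at all three levels, amounts to recomputing a file's checksum and comparing it with the value recorded at deposit. The following is a minimal Python sketch of that idea. The policy does not name a checksum algorithm, so SHA-256 is an assumption here, and the function names are hypothetical rather than part of Deep Blue or DSpace.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks so that
    large deposits do not have to fit in memory.

    Assumption: SHA-256 stands in for the unspecified "proven checksum method".
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_checksum: str) -> bool:
    """True when the stored bitstream still matches the checksum recorded at deposit."""
    return sha256_of(path) == recorded_checksum
```

A repository would typically store the checksum with the preservation metadata at deposit time and rerun the comparison on a schedule, flagging any mismatch for recovery from backup.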
The Repository provides three levels of support for various submission file formats.
Level 1
The Repository will provide its highest level of preservation support, making its best effort to maintain the content, structure and functionality in the future. This service level is currently provided only for formats that are both publicly documented and widely used, giving us a high degree of confidence in our preservation commitment, making it more likely that tools will exist or be developed to undertake preservation actions, and that those actions will result in an understood and controlled transformation or migration. The content may also be normalized (transformed to another stable format) to provide additional assurance that the information content is preserved. Finally, the content will be preserved as originally deposited to ensure the original bitstream is always available. TIFF is an example of a Level 1-supported format, as its specifications are publicly available, it is well-supported and widely deployed.
Level 2
The Repository will make limited efforts to maintain the usability of the file as well as preserving it as submitted (bit-level preservation). The format will be monitored and may be transformed when significant risk to access is imminent but it is likely to be difficult to predict or control the consequences of any transformation or migration on content, structure or functionality. The file may also be transformed to a more preservable format to ensure that the information content is not lost, even if some structure and functionality are sacrificed. This level of support is generally applied to proprietary formats that are widely used, where there is substantial commercial interest in maintaining access to files saved in the format, and therefore tools will likely be available to migrate them to successor formats (e.g., Microsoft Word).
Level 3
The Repository provides basic preservation of the file (bitstream) and associated metadata as-is with no active effort made to monitor the format and associated risks or to normalize, transform or migrate the file to another format. Files may be openable and/or readable by future applications, but there is no guarantee that the content, structure, or functionality will be preserved. This service level usually applies to files written in highly specialized, proprietary formats, often usable only in a single software environment, formats no longer widely utilized, and/or formats about which little information is publicly available. PhotoCD is an example of a format that would receive Level 3 support in the Repository. Any format not yet reviewed and evaluated by Deep Blue will also receive Level 3 service on deposit. A higher level may be assigned after format review takes place.
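The three support levels and the default for unreviewed formats can be expressed as a small lookup. The following is a minimal Python sketch under stated assumptions: the registry holds only the three example formats named in this policy, and the names `SupportLevel`, `REVIEWED_FORMATS`, `support_level`, and `is_monitored` are hypothetical, not part of Deep Blue or DSpace.

```python
from enum import IntEnum


class SupportLevel(IntEnum):
    LEVEL_1 = 1  # best effort to maintain content, structure and functionality
    LEVEL_2 = 2  # limited effort; bit-level preservation plus format monitoring
    LEVEL_3 = 3  # bitstream and metadata kept as-is, no active monitoring


# Illustrative registry containing only the examples the policy itself gives.
REVIEWED_FORMATS = {
    "tiff": SupportLevel.LEVEL_1,
    "microsoft word": SupportLevel.LEVEL_2,
    "photocd": SupportLevel.LEVEL_3,
}


def support_level(format_name: str) -> SupportLevel:
    """Look up a format's level; formats not yet reviewed default to Level 3."""
    key = format_name.strip().lower()
    return REVIEWED_FORMATS.get(key, SupportLevel.LEVEL_3)


def is_monitored(level: SupportLevel) -> bool:
    """Level 1 and Level 2 formats are monitored; Level 3 formats are not."""
    return level <= SupportLevel.LEVEL_2
```

Defaulting to Level 3 mirrors the policy's statement that a format receives Level 3 service on deposit until a format review assigns a higher level.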