Archives, Digital Archives and Encoded Archival Description


                    Chris Prom
           Assistant University Archivist
               University of Illinois
     Mortenson Visiting Scholars Tech Training
                  April 19, 2006
• Overview of Archives, Arrangement and Description
• Review Standards and Tools related to
  Archival Description
• Review Standards and Tools for providing
  access to digital archival materials
• Lots of interaction
        Archives Background
• Archives: Organized non-current
  “records”; generated by institutions
• Manuscripts: non-current “papers”;
  generated by individuals or families
• Preserved because of ‘enduring’ value
  – Not necessarily ‘permanent value’
• Both generally referred to as “collections”
           The Archival Mission
• Identify, preserve, make available records and papers

     From Gregory Hunter, Developing and Maintaining Practical Archives
                     Libraries                          Archives
Nature               Published, discrete, make          Unpublished, grouped with related
                     sense on own, multiple copies      items, make no sense on own

Creator              Many                               One parent organization

Method of creation   Each created separately            Organically produced as part of
                                                        normal business or life
How Received         Selected as items                  Appraised as groups

How Arranged         By subject classification          Provenance and original order
                                                        (structure and function)

How described        By item                            In aggregate (record group,
                                                        series, collection)
Where described      Built into item itself (provided   Prepared by archivist (e.g.
                     title, author, CIP data), in       supplied title) in finding aids,
                     catalog                            guides, inventories, databases
How accessed         Items circulate                    No circulation

    Based on chart in Hunter, Developing. . . p. 7
           Archival Appraisal 101
• Process of determining the value of records
• Done over aggregates, not items
• Primary: operational, legal,
  fiscal, administrative
• Secondary: Historical or
  ‘archival’ value
• Types of archival value (credit: Hunter, p. 51)
    – Evidential: documents
      organization and
      functioning of organization
    – Informational: sheds light
      on people, events, things
      aside from organization
      Archival Arrangement 101
• Provenance
  – Records from one creator must not be intermingled
    with those from another
  – NOT by subject
• Original order
  – Maintain records in order placed by creator
• Five “levels” of arrangement
  –   Repository
  –   Record group/subgroup (organizationally related group)
  –   Record series (set of files or documents maintained as a unit)
  –   File (folder, binder, packs for convenient use)
  –   Item (one document, letter, etc)
            Levels of Arrangement: Examples

Repository       University Archives       Special Collections

Record Group     College of Engineering    Champaign County Republican Party

Series           Dean’s Office             Speaker’s Committee File
                 Correspondence Files

File Unit        Federal Aviation          Barry Goldwater, 1960-70

Item             Letter to FAA Director,   Copy of remarks by Goldwater to
                 June 12, 1968             CCRP, August 23, 1965
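The five levels can be modeled as a simple tree. The following is a minimal sketch, not a standard data model: the `Unit` class, the `LEVELS` list, and the rule that each child sits exactly one level below its parent are simplifying assumptions made for illustration. The titles come from the University Archives column of the table above.

```python
from dataclasses import dataclass, field

# Levels of arrangement, from most general to most specific.
LEVELS = ["repository", "record group", "series", "file", "item"]


@dataclass
class Unit:
    level: str
    title: str
    children: list = field(default_factory=list)

    def add(self, child):
        # Simplifying assumption: a child sits exactly one level below its parent.
        if LEVELS.index(child.level) != LEVELS.index(self.level) + 1:
            raise ValueError(f"{child.level!r} cannot sit directly under {self.level!r}")
        self.children.append(child)
        return child


def outline(unit, depth=0):
    """Return an indented outline of a unit and everything below it."""
    lines = ["  " * depth + f"{unit.level}: {unit.title}"]
    for child in unit.children:
        lines.extend(outline(child, depth + 1))
    return lines


repo = Unit("repository", "University Archives")
group = repo.add(Unit("record group", "College of Engineering"))
series = group.add(Unit("series", "Dean's Office Correspondence Files"))
file_unit = series.add(Unit("file", "Federal Aviation"))
file_unit.add(Unit("item", "Letter to FAA Director, June 12, 1968"))

print("\n".join(outline(repo)))
```

Real collections often skip or repeat levels (subgroups, subseries), so the strict one-step rule would need loosening in practice.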
               Arrangement of “Papers”

• The mixed repository model
• Term “series” in papers often refers to internal
  divisions in a collection.
• Thurgood Marshall Papers:
   – “The collection is arranged in five series:
      •   United States Court of Appeals File, 1957-1965, n.d.
      •   United States Solicitor General File, 1965-1967, n.d.
      •   Supreme Court File, 1967-1991, n.d.
      •   Miscellany, 1949-1963
      •   Oversize, 1967, 1991”
          Description of Archives
• Establish administrative control over archival materials
   – Locate collections
   – Identify their source, creators (chain of custody)
   – Outline contents
• Establish intellectual control
   –   General nature of repository
   –   General contents of collection
   –   Detailed information on specific collections
   –   Summarize information across several collections
• Important for both authentication and access
• Internal vs. Public finding aids
       Principles of Description*
• “Multilevel Description”
  – Proceed from general to specific
  – Provide information relevant to the level of description
  – Link each level of description to next higher
    unit of description
  – Do not repeat information, provide it only at
    highest appropriate level

* Summarized from ISAD(G) General International Standard Archival Description
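The linking and non-repetition rules can be sketched in code: each unit records only what is specific to its level and points to the next higher unit, and a lookup walks upward until it finds the information. This is an illustrative sketch only. The `Description` class and the access and extent values are hypothetical, not part of ISAD(G).

```python
class Description:
    """One unit of description, linked to the next higher unit."""

    def __init__(self, level, title, parent=None, **fields):
        self.level = level
        self.title = title
        self.parent = parent    # link to the next higher unit of description
        self.fields = fields    # only information specific to this level

    def lookup(self, name):
        """Return a field from this level or from the nearest higher level that records it."""
        node = self
        while node is not None:
            if name in node.fields:
                return node.fields[name]
            node = node.parent
        return None

    def context(self):
        """Titles from the most general unit down to this one."""
        chain = []
        node = self
        while node is not None:
            chain.append(node.title)
            node = node.parent
        return " > ".join(reversed(chain))


# Hypothetical values: the access note is recorded once, at the highest appropriate level.
group = Description("record group", "College of Engineering", access="Open for research")
series = Description("series", "Dean's Office Correspondence Files", parent=group)
folder = Description("file", "Federal Aviation", parent=series, extent="1 folder")

print(folder.context())
print(folder.lookup("access"))
```

The folder never repeats the access note, yet a reader starting at the folder can still find it by following the links upward.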
                    Finding Aid
• Basic Access Tool is the “Finding Aid”, also
  known as ‘inventory’ or ‘register’.
  –   Prefatory material
  –   Introduction
  –   Biographical sketch/agency history
  –   Scope and content note
  –   Series description (organization)
  –   Container Listing
  –   Index (less used now with electronic finding aids)
         Elements of Description
• 26 in ISAD(G)
• Identity
    – Reference code, title, dates, level of description
• Context
    – Name of creator, biographical or admin history, source of acquisition
• Content/Structure
    – Scope/content, appraisal information, arrangement
•   Conditions of Access/Use
•   Allied Materials (copies, originals, related)
•   Notes
•   Description Control (author of description, revisions)
      Finding Aid Examples
• Reston Papers and Third Armored
  Division Assn (bring along)
• American Crystal Sugar Co.
• Thurgood Marshall Papers
• Next:
  – Overview of standards and tools for
    description of paper and electronic materials,
    and tools for access to electronic collections.
   Establishing a good descriptive
• Takes planning, awareness of resources
• Deciding on ‘platform’ or computers should
  be LAST step
• Better to describe all materials at high
  level than put all effort into one collection
• Beware tendency to do lower levels of
  description before higher levels
• Inventory MUST be the key
• Use a content standard
   Describing Archives: A Content Standard
• Provides rules/advice about the quality and
  structure of informational content
  – 8 principles
  – What to put in the 26 elements recommended by
    ISAD (G)
  – Rules for describing creators and forms of names
  – Complement to AACR2
  – Provides mapping to appropriate data structure
 Cataloging Archival Materials
• Advantages: Can use regular library
  software, provides integrated access with
  non-archival materials
• Disadvantages: Can undermine
  provenance, relationship to other materials
  may be lost
• Recommendation: USE MARC Cataloging
  as first step in PUBLIC finding aids
MARC 21 Sample
          Typical Fields for Cataloging
               Archival Materials
Personal Name                               100
Corporate Name                              110
Title                                       245a,b
Inclusive Dates                             245f
Physical Description (volume)               300
Arrangement/Organization                    351
Biographical/Historical Note                545
Scope/content note                          520
Restrictions on Access                      506
Terms of Use                                540
Provenance                                  561
Subject added entry                         650s
Personal name added entry                   700
Personal name as subject                    600
Corporate name as subject                   610
Link to finding aid or digital collection   856
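The field list above can be kept as a lookup table so that draft catalog records are labeled consistently. This is a minimal sketch that uses plain Python data rather than a MARC library. The `$a`-style subfield keys, the `label_record` helper, the simplified name heading, and the example URL are assumptions for illustration, and the output is not a valid MARC 21 record.

```python
# Tag (plus subfield code where the table gives one) mapped to its use.
ARCHIVAL_MARC_FIELDS = {
    "100": "Personal name",
    "110": "Corporate name",
    "245$a$b": "Title",
    "245$f": "Inclusive dates",
    "300": "Physical description (volume)",
    "351": "Arrangement/organization",
    "506": "Restrictions on access",
    "520": "Scope/content note",
    "540": "Terms of use",
    "545": "Biographical/historical note",
    "561": "Provenance",
    "600": "Personal name as subject",
    "610": "Corporate name as subject",
    "650": "Subject added entry",
    "700": "Personal name added entry",
    "856": "Link to finding aid or digital collection",
}


def label_record(values):
    """Return labeled lines in tag order; reject tags that are not in the table."""
    unknown = sorted(set(values) - set(ARCHIVAL_MARC_FIELDS))
    if unknown:
        raise KeyError(f"Tags not in the table: {unknown}")
    return [f"{tag} {ARCHIVAL_MARC_FIELDS[tag]}: {values[tag]}" for tag in sorted(values)]


# Illustrative values only; the URL is a placeholder.
draft = {
    "100": "Marshall, Thurgood",
    "245$a$b": "Thurgood Marshall Papers",
    "351": "Arranged in five series",
    "856": "https://example.org/finding-aids/marshall",
}
print("\n".join(label_record(draft)))
```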
 Word-Processed Finding Aids
• Advantages: Easy to create, maintain
• Disadvantages: Not in standard format,
  cannot exchange with others, lack of
  coded fields
• Recommendation: Very useful for most
  institutions. Can be published to Internet
  via PDF
    Encoded Archival Description
• Data structure standard for descriptions
  of manuscripts or archives (finding aids)
• At any level of granularity
• Typically collection level
• SGML and XML versions of DTD
• <dao> tag for linking to archival surrogates
• Advantages: Best interoperability and data
  exchange, easier to implement with others
• Disadvantages: Tool development still
  weak, steep learning curve.
• Recommendation: If you have good
  technical skills, and a basic archival
  program is in place, and resources are
  available, implement it
                      EAD Samples
• Static:

• Conversion on server:

• PDF:
• In digital library software:

• Other implementations
   – Cheshire:
           EAD Structure 1
• XML: perfect way to implement principles
  of ‘multi-level description’
  – many elements optional
  – most repeatable at any level, nesting can vary
  – Normalization possible, but not common for
    most finding aids
                EAD Structure 2
• <eadheader> (information about EAD File)
   – <eadid> unique id
   – <filedesc>
   – <profiledesc>
   – <revisiondesc>
• <frontmatter> (deprecated element, repeats info for
• <archdesc> (information about materials being described)
           Common Top-Level <archdesc> Elements

  <did> (descriptive identification)
Other elements include <accruals>, <acqinfo>, <altformatavail>, <appraisal>,
<custodhist>, <prefercite>, <processinfo>, <userestrict>,
<separatedmaterial>, <otherfindaid>, <bibliography>, <odd>
(relatedencoding is an attribute of <archdesc>, not an element)
Linking elements: some based on XLink spec, suite of linking elements includes
<archref>, <extref>, <daogrp>

All of above elements are repeatable for components of the collection, at any
level in the <dsc> (description of subordinate components)
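To make the two-part structure concrete, the sketch below builds a bare EAD skeleton with Python's standard `xml.etree.ElementTree` module. The element names are EAD 2002 elements. The `build_minimal_ead` function, the `eadid` value, and the guide title are hypothetical, and the result is not validated against the DTD (a real file needs more required content and a `<dsc>` for components).

```python
import xml.etree.ElementTree as ET


def build_minimal_ead():
    """Build a skeleton: <eadheader> describes the file, <archdesc> the materials."""
    ead = ET.Element("ead")

    # Information about the EAD file itself.
    header = ET.SubElement(ead, "eadheader")
    ET.SubElement(header, "eadid").text = "example-0001"  # hypothetical identifier
    filedesc = ET.SubElement(header, "filedesc")
    titlestmt = ET.SubElement(filedesc, "titlestmt")
    ET.SubElement(titlestmt, "titleproper").text = "Guide to the Thurgood Marshall Papers"

    # Information about the materials being described.
    archdesc = ET.SubElement(ead, "archdesc", level="collection")
    did = ET.SubElement(archdesc, "did")
    ET.SubElement(did, "unittitle").text = "Thurgood Marshall Papers"
    arrangement = ET.SubElement(archdesc, "arrangement")
    ET.SubElement(arrangement, "p").text = "The collection is arranged in five series."
    return ead


root = build_minimal_ead()
print(ET.tostring(root, encoding="unicode"))
```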
      Description of Subordinate Components
• nested components (i.e. <c> [unnumbered] or
  <c01>, <c02>, etc. [numbered]) represent
  intellectual structure of materials being described
• <container> elements (within each level) represent
  physical arrangement
• Maximum depth of 12 levels (not a good idea to use
  all of them)
• All elements available in archdesc top level also
  available in any component (typically not used)
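A short sketch of how nested components and `<container>` elements can be read back out of a `<dsc>`. The series titles are taken from the Thurgood Marshall example earlier in this deck. The folder title, the box and folder numbers, and the `walk` and `is_component` helpers are hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE_DSC = """
<dsc>
  <c01 level="series">
    <did><unittitle>Supreme Court File, 1967-1991, n.d.</unittitle></did>
    <c02 level="file">
      <did>
        <container type="box">1</container>
        <container type="folder">3</container>
        <unittitle>Hypothetical folder title</unittitle>
      </did>
    </c02>
  </c01>
  <c01 level="series">
    <did><unittitle>Miscellany, 1949-1963</unittitle></did>
  </c01>
</dsc>
"""


def is_component(element):
    """True for unnumbered <c> and numbered <c01> to <c12> components."""
    tag = element.tag
    return tag == "c" or (len(tag) == 3 and tag[0] == "c" and tag[1:].isdigit())


def walk(parent, depth=0):
    """Yield (depth, level, title, containers) for each nested component."""
    for element in parent:
        if not is_component(element):
            continue
        did = element.find("did")
        title = did.findtext("unittitle", default="").strip() if did is not None else ""
        containers = []
        if did is not None:
            containers = [f"{c.get('type')} {c.text}" for c in did.findall("container")]
        yield depth, element.get("level"), title, containers
        yield from walk(element, depth + 1)


for depth, level, title, containers in walk(ET.fromstring(SAMPLE_DSC)):
    location = f" [{', '.join(containers)}]" if containers else ""
    print("  " * depth + f"{level}: {title}{location}")
```

The nesting carries the intellectual structure, while the container values carry the physical location, so the two can vary independently.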
             A “raw” EAD File
          EAD Tools: Creation
• Current options
  – Text editors (cheap, no built-in validation,
    transformation or Unicode support)
     • Notetab
     • Word Processors
  – XML editors (graphical view, built-in validation,
    transformation, Unicode support, FOP; tend to be expensive)
     • XML Spy
     • oXygen
     • XMetal (not recommended)
  – EAD Cookbook highly recommended, templates for
    Notetab, oXygen
         EAD Tools: Display
• Most common to transform to HTML
  – Static via XSL stylesheet on command line or in
    authoring software, then upload files to server
  – Client-side via link to CSS or XSL (dicey)
  – Server-side transform engine (Saxon, MSXML,
    Xalan, etc.) via servlets
• Dynamic (searchable)
  – DLXS Findaid Class
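The static option amounts to reading the EAD file once and writing out an HTML page. The sketch below illustrates that idea with Python's standard library only. It is a stand-in for an XSL stylesheet, not XSLT itself, and the `ead_to_html` function and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

SAMPLE_EAD = """
<ead>
  <eadheader><eadid>example-0001</eadid></eadheader>
  <archdesc level="collection">
    <did><unittitle>Thurgood Marshall Papers</unittitle></did>
    <arrangement><p>The collection is arranged in five series.</p></arrangement>
  </archdesc>
</ead>
"""


def ead_to_html(ead_xml):
    """Turn the collection title and arrangement note into a small HTML page."""
    root = ET.fromstring(ead_xml)
    archdesc = root.find("archdesc")
    title = archdesc.findtext("did/unittitle", default="Untitled")
    parts = [f"<h1>{escape(title)}</h1>"]
    for paragraph in archdesc.findall("arrangement/p"):
        parts.append(f"<p>{escape(paragraph.text or '')}</p>")
    return "<html><body>" + "".join(parts) + "</body></html>"


page = ead_to_html(SAMPLE_EAD)
print(page)
```

In a production workflow the same step would normally be done with an XSLT processor and a stylesheet such as those in the EAD Cookbook, and the resulting HTML files uploaded to the server.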
      XML Transformations

        XSLT1                 HTML1

        XSLT2                 HTML2


        XSLT4                 HTML4

        XSL-FO                PDF
Typical XSL file
 Collection Management Tools
• Advantages: Software tailored for
  Archives, easy data entry
• Disadvantages: Few options currently
  exist. May be difficult to ‘migrate’ forward
  at a future point. Also not automatically
               “CMT” Examples
• PastPerfect
• Archivists’ Toolkit
• UIUC “Archival Information System”
             AIS Demo
• Login: guest
• Password: guest
        Break for Questions
• Next: Digital Archives Standards and Tools
         Digital Libraries or Archives?
                     Libraries                          Archives
Nature               Published items, each item         Unpublished, grouped with
                     discrete, make sense on own,       related items, make no sense on
                     multiple copies                    own
Creator              Many different                     One parent organization

Method of creation   Each created separately            Organically produced as part of
                                                        normal business or life
How Received         Selected as items                  Appraised as groups

How Arranged         By subject classification          Provenance and original order
                                                        (structure and function)
How described        By item                            In aggregate (record group,
                                                        series, collection)
Where described      Built into item itself (provided   Prepared by archivist (e.g.
                     title, author, CIP data), in       supplied title) in finding aids,
                     catalog                            guides, inventories, databases
How accessed         Items circulate                    No circulation
The “on a horse” problem
         • Best systems mix archival and
           library approaches
         • Complete item description AND
         • Full context AND
         • Link to complete collection
           (including description of offline materials)
    Sample of Digital Library/Archive
    Digital Library/Archive Standards
•   Background on Metadata
•   For images: Dublin Core
•   For texts: TEI
•   For information exchange: METS, OAI
•   For Digital Preservation: OAIS Reference Model
      Archivists and Metadata
• Structured data about an information resource
• Metadata by itself doesn’t “do” anything.
• Metadata schemas provide “buckets” for information
  about resources.
• Metadata needs to be interpreted by a system or a person
• Metadata provides context to help machines (and
  more importantly people) interpret content
• People usually talk about applying metadata to
  digital materials, but...
[Image slides: examples of what is and is not metadata. Recoverable captions:]
• These are metadata; this is metadata; same thing electronically
• Metadata fields; the metadata itself
• Now as XML “metadata”
• This is not metadata; this is!
• Metadata is about context
• This is metadata, but: incomplete, embedded, not self-explaining
• More complete, not embedded, relational, not self-explaining
• Metadata and code and a human user: beginning to do something with it, but still not self-explaining
• Now as XML metadata: non-embedded, self-explaining, but relationships lost
              Dublin Core
• Developed in 1995 for authors to describe
  their own web resources
• Very simple, only 15 broad categories in
  the “simple” version
• Advantages: commonly held set of
  elements is easy to understand, built into
  many current tools
• Disadvantages: loss of specificity
              The 15 elements:
• Content
  – Coverage, Description, Title, Type, Relation, Source, Subject
  – Audience (a later DCMI addition, not one of the original 15)
• Intellectual Property
  – Contributor, Creator, Publisher, Rights
• Instantiation
  – Date, Format, Identifier, Language
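A simple Dublin Core record is a flat list of elements from the 15-element set, each in the `http://purl.org/dc/elements/1.1/` namespace. The sketch below builds one with Python's standard library. The `dc_record` helper and the `record` wrapper element are assumptions, and the values describe the Goldwater item from the arrangement example only for illustration. Audience, a later DCMI term outside the simple set, is rejected here.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of simple Dublin Core.
SIMPLE_DC = [
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
]


def dc_record(values):
    """Build a flat record; every key must be one of the 15 simple elements."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in values.items():
        if name not in SIMPLE_DC:
            raise ValueError(f"{name!r} is not a simple Dublin Core element")
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


item = dc_record({
    "title": "Copy of remarks by Goldwater to CCRP",
    "creator": "Goldwater, Barry",
    "date": "1965-08-23",
    "type": "Text",
})
print(ET.tostring(item, encoding="unicode"))
```

Note how little context survives: nothing in the record says which series or record group the item belongs to unless a `relation` or `source` value is added, which is the loss of specificity mentioned above.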
      Dublin Core Resources
        Text Encoding Initiative
• Encode any text with structural markup,
  deep semantic markup, or any
  combination of the two
• Section for metadata in <teiHeader>
• Typically need XML editor to create,
  software such as DLXS to display
          OAIS Reference Model
• Based on Archival Principles
• Three parties involved with digital
    – Producers; SIP: Submission Information Package
    – Managers; AIP: Archival Information Package
    – Consumers (Users); DIP: Dissemination Information Package
“Simple” OAIS Model
• METS: Metadata Encoding and Transmission Standard
• Standard for encoding descriptive,
  administrative, and structural metadata
  regarding objects within a digital library
• Outgrowth of Making of America II project
• Provides metadata for compound text and
  image-based works
• Need purpose-built software to display and
           METS: Why bother?
• Based on the OAIS Reference Model. It Includes
  support for:
   – Submission Information Package
   – Archival Information Package
   – Dissemination Information Package
• Not only for transfer and archival management, but for
  giving access to, navigating an object
• It “plays well” with other systems (EAD, MARC, TEI,
  VRA etc)
• Software will be coming (support in Archivists’ Toolkit,
  NDIIPP projects)
• BUT... it is currently very complex.
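Much of the complexity comes from the way a METS file separates descriptive, administrative, file, and structural sections and then cross-references them by ID. The sketch below builds an empty skeleton with Python's standard library to show those sections. The element and attribute names are from the METS schema, but the `build_mets` function, the IDs, and the file names are hypothetical, and the output is not schema-valid (the descriptive and administrative sections are left empty).

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def q(name):
    """Qualify an element name with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def build_mets(object_id, file_hrefs):
    """Build a skeleton with one page-level <div> per file."""
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("xlink", XLINK_NS)
    mets = ET.Element(q("mets"), OBJID=object_id)
    ET.SubElement(mets, q("dmdSec"), ID="dmd1")   # descriptive metadata goes here
    ET.SubElement(mets, q("amdSec"), ID="amd1")   # administrative metadata goes here
    file_grp = ET.SubElement(ET.SubElement(mets, q("fileSec")), q("fileGrp"))
    struct_map = ET.SubElement(mets, q("structMap"))
    top_div = ET.SubElement(struct_map, q("div"), TYPE="item", DMDID="dmd1")
    for number, href in enumerate(file_hrefs, start=1):
        file_id = f"file{number}"
        file_el = ET.SubElement(file_grp, q("file"), ID=file_id)
        ET.SubElement(file_el, q("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
        # The structural map orders the pages and points back to the files by ID.
        page = ET.SubElement(top_div, q("div"), TYPE="page", ORDER=str(number))
        ET.SubElement(page, q("fptr"), FILEID=file_id)
    return mets


mets = build_mets("example-object-1", ["page1.tif", "page2.tif"])
print(ET.tostring(mets, encoding="unicode"))
```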
• Open Archives Initiative Protocol for
  Metadata Harvesting (OAI-PMH)
• Not cross-database searching
• Metadata harvesting
• Data Providers (expose collections in a
  common syntax)
• Service Providers (use metadata
  harvested via the OAI-PMH as a basis for
  building value-added services)
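A harvest request is an ordinary HTTP GET with a `verb` parameter. The sketch below only builds the URL a service provider would request from a data provider and does not contact any server. `ListRecords` and the `oai_dc` metadata prefix are defined by the protocol. The `list_records_url` helper and the base URL are hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec       # optional: restrict to one set
    if from_date:
        params["from"] = from_date     # optional: only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# Placeholder base URL; a real data provider publishes its own.
url = list_records_url("https://archives.example.org/oai", from_date="2006-04-19")
print(url)
```

The service provider stores the returned records locally and searches its own copy, which is why this is harvesting and not cross-database searching.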
              OAI Example

• OAIster:
   Tools for Digital Library/Archive
   – Pros: Very good, support for Dublin Core, OAI
   – Con: expensive
   – Recommendation: Skip it
• Greenstone
   – Pros: Free, (relatively) easy to configure, low
     hardware requirements, can run on internet or publish
     to CD, supported by UNESCO, targeted at
     developing nations
   – Con: tends to be ‘item-centric’, difficult to aggregate
   – Recommendation: Use it, but as part of a larger
     descriptive system
• This powerpoint online at: