					Standards for Long-Term Retention of
        Digital Information:
       Can Ontologies Help?
                        Joshua Lubell
      National Institute of Standards and Technology

           Collaborative Expedition Workshop
              National Science Foundation
                      July 18, 2007
   The Problem
• Too much digital data!
    – It takes about 15 minutes for the world to churn out new digital
      information equivalent to the entire collection in the US Library of
      Congress
• Proprietary file formats
    – Expected lifetime of typical manufacturing software application only
      3 years
• Short-lived computing hardware and software
    – Expected lifetime of today’s storage/retrieval technologies only 10
      years
• Products often outlive computer software/hardware by an order
  of magnitude
    – Aircraft can last 50 years or more
    – Healthcare records should be preserved through the patient’s
      lifetime, and perhaps beyond
• Methods/tools address preservation, but not reuse or
  re-engineering requirements
  Data Standards

• Necessary to avoid being locked into a vendor
  format or application that could disappear in
  the near future
• Likely to be more stable than proprietary formats
• But data standards are only part of the solution
  – Information is more than just data!
       Information = Data + Interpretation
from Reference Model for an Open Archival Information System (ISO 14721:2003)
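
The equation above can be illustrated with a minimal Python sketch. It is not taken from the OAIS reference model itself: the class name `InformationObject`, the `interpretation` dictionary, and its `encoding` and `unit` keys are hypothetical stand-ins chosen for illustration. The point is only that the same bytes mean nothing until something says how to read them.

```python
from dataclasses import dataclass


@dataclass
class InformationObject:
    """Data plus the interpretation needed to understand it (illustrative only)."""
    data: bytes           # the raw bit sequence held by the archive
    interpretation: dict  # hypothetical: how to decode the bytes and what they mean

    def render(self) -> str:
        # Without the interpretation, the bytes cannot be turned into meaning.
        encoding = self.interpretation["encoding"]
        unit = self.interpretation["unit"]
        return f"{self.data.decode(encoding)} {unit}"


# The same bytes b"50" could be a count, a code, or a duration;
# the interpretation is what makes them information.
obj = InformationObject(
    data=b"50",
    interpretation={"encoding": "ascii", "unit": "years"},
)
print(obj.render())
```

If the `interpretation` part is lost, `render` fails even though every bit of `data` has been preserved, which is the gap that data standards alone do not close.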



An Information Package
  [Figure: Information Objects]



  Tools for Tackling Long-Term Retention
• Standards for representing digital artifacts
   –   STEP – ISO 10303 (product data)
   –   XML (documents)
   –   Graphics, audio, video, multimedia standards
   –   Scientific modeling standards
• Methods for representing preservation information
   – Digital object typing/packaging
        • METS (Metadata Encoding and Transmission Standard)
        • MPEG-21
        • DOPs (Digital Object Prototypes)
   – Ontology languages
   – Rules languages
        • Schematron (ISO 19757-3:2006)
• Digital format registries (UK Archives, Harvard, Univ.
  of Maryland)
   Sustaining Digital Information
What is sustainability?
From The Free Dictionary:
• Noun - the act of sustaining life by food or providing a means of
   subsistence; "they were in want of sustenance"; "fishing was their main
• Transitive verb
    – 1. To keep in existence; maintain.
    – 2. To supply with necessities or nourishment; provide for.
    – 3. To support from below; keep from falling or sinking; prop.
    – 4. To support the spirits, vitality, or resolution of; encourage.
    – 5. To bear up under; withstand: can't sustain the blistering heat.
    – 6. To experience or suffer: sustained a fatal injury.
    – 7. To affirm the validity of: The judge has sustained the prosecutor's
         objection.
    – 8. To prove or corroborate; confirm.
    – 9. To keep up (a joke or assumed role, for example) competently.
     Sustaining Digital Information
• Minimal
   – “Prop up”
   – Prevent destruction
• Better
   – Preserve
   – Ensure authenticity, availability
• Ideal
   – Nurture
   – “Care and feeding”
   – Enable reuse
       Sustainability Metrics
• Library of Congress digital format sustainability factors
   –   Disclosure
   –   Adoption
   –   Transparency
   –   Self-documentation
   –   External dependencies
   –   Impact of patents
   –   Technical protection mechanisms
• What are the sustainability factors for an archiving and/or
  records management strategy?
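
One way to make the seven factors above concrete is a simple checklist score, sketched below in Python. This is an assumption-laden illustration, not a Library of Congress method: the 0 to 2 rating scale, the equal weighting, and the function name `sustainability_score` are all hypothetical.

```python
# The seven factor names come from the list above; everything else is hypothetical.
FACTORS = (
    "disclosure",
    "adoption",
    "transparency",
    "self_documentation",
    "external_dependencies",
    "impact_of_patents",
    "technical_protection_mechanisms",
)

MAX_RATING = 2  # assumed scale: 0 = poor, 1 = partial, 2 = good


def sustainability_score(ratings):
    """Return an equally weighted score in [0, 1] for one digital format."""
    missing = [f for f in FACTORS if f not in ratings]
    if missing:
        raise ValueError(f"unrated factors: {missing}")
    for factor in FACTORS:
        if not 0 <= ratings[factor] <= MAX_RATING:
            raise ValueError(f"rating out of range for {factor}")
    return sum(ratings[f] for f in FACTORS) / (MAX_RATING * len(FACTORS))


# A format rated "partial" on every factor scores exactly halfway.
print(sustainability_score({f: 1 for f in FACTORS}))
```

A comparable checklist for an archiving or records management strategy would need its own factor list, which is the open question the last bullet raises.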
OAIS Functional Model
  Access Scenarios: The Three Rs

• Reference
   – Preserve information in its original state
   – Example (product data engineering): 3D visualization
• Reuse
   – Allow for future modification, re-engineering
   – Example: ISO 10303-203:1994 (STEP AP203)
• Rationale
   – Encode construction history, design intent, tolerancing info,
     lifecycle management info, etc.
   – Example: STEP AP203 ed.2 ++
   – Ontologies and/or other representations needed
Extended Functional Model
  So How Can Ontologies Help?

• Digital object type classification
• Prediction of records management policy
• Evaluating a records management system
  based on sustainability criteria
• Tailoring repository access according to the
  Three Rs
• Measuring long-term sustainability based on the
  Three Rs
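
As a rough sketch of the last two bullets, the Python below checks which of the Three Rs an archived package can support. It is only an illustration under stated assumptions: the content labels (`visualization`, `editable_model`, `design_intent`) are hypothetical, and so is the choice to treat the scenarios as cumulative, with each one requiring everything the previous one needs.

```python
from enum import Enum


class AccessScenario(Enum):
    REFERENCE = "reference"
    REUSE = "reuse"
    RATIONALE = "rationale"


# Hypothetical mapping from scenario to the kinds of content it needs.
# Treating the requirements as cumulative is an assumption of this sketch.
NEEDED = {
    AccessScenario.REFERENCE: {"visualization"},
    AccessScenario.REUSE: {"visualization", "editable_model"},
    AccessScenario.RATIONALE: {"visualization", "editable_model", "design_intent"},
}


def supported_scenarios(available):
    """List the access scenarios whose required content is all present."""
    return [s for s in AccessScenario if NEEDED[s] <= set(available)]


# A package holding a 3D view and an editable model supports
# Reference and Reuse, but not Rationale.
package_contents = {"visualization", "editable_model"}
print([s.value for s in supported_scenarios(package_contents)])
```

An ontology could play the role of the `NEEDED` table here, letting a repository infer the supported scenarios from how a digital object is classified rather than from a hand-written mapping.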
