									                 IT Research Challenges in
                     Digital Preservation

                                   Andreas Rauber
                       Department of Software Technology and
                                 Interactive Systems
                          Vienna University of Technology
                          http://www.ifs.tuwien.ac.at/~andi


.................................................
                                                Overview


          Why do we need Digital Preservation?

          Digital Preservation Projects in Europe

          IT-oriented Challenges in Digital Preservation

          Some Digital Preservation Research at TUWIEN

          Conclusions


.................................................
                        Why do we need Digital Preservation?


        Digital objects require a specific environment to be
         accessible:
             - Files need specific programs
             - Programs need specific operating systems (and versions)
             - Operating systems need specific hardware components
        The SW/HW environment is not stable:
             -   Files can no longer be opened
             -   Embedded objects are no longer accessible/linked
             -   Programs won't run
             -   Information in digital form is lost
                 (usually total loss, no gradual degradation)
        Digital preservation aims at keeping digital objects
         authentically usable and accessible over long time
         periods.
.................................................
                             Strategies for Digital Preservation

      Strategies
      (grouped according to Companion Document to UNESCO Charter
         http://unesdoc.unesco.org/images/0013/001300/130071e.pdf)

       Investment strategies:
            - Standardization, Data extraction, Encapsulation, Format limitations
       Short-term approaches:
            - Museum, Backwards-compatibility, Version-migration, Reengineering
       Medium- / long-term approaches:
            - Migration, Viewer, Emulation
       Alternative approaches:
            - Non-digital Approaches, Data-Archeology


       No single optimal solution for all objects
.................................................
                                                    Migration


   Transformation into different format, continuous or
    on-demand (Viewer)
  + Wide-spread adoption
  + Possibility to compare to un-migrated object
  + Immediately accessible
  - Unintended changes, specifically over sequence of
    migrations
  - Cannot be used for all objects
  - Requires continuous action to migrate



.................................................
                                                    Emulation

      Emulation of hardware or software
       (operating system, applications)
     + Concept of emulation widely used
     + Numerous emulators are available
     + Potentially complete preservation of functionality
     + Object is rendered identically
     - Object is rendered identically
     - Requires detailed documentation of system
     - Requires knowledge on how to operate current systems in
       the future
     - Complex technology
     - Emulators must be emulated or migrated themselves
     - Emulators potentially erroneous/incomplete
.................................................
                                              Digital Preservation

 Affects all domains
      -   Cultural heritage
      -   eGovernment
      -   Primary data: Sensor data, experiment data
      -   Industry: production processes, workflows, monitoring
      -   Medical, Insurance/Banking
      -   Society: photos, communications

 Test:
      - Trying to repeat / verify “old” experiments
      - Problems with
         • Data Management: original test data, parameters,
           preprocessing,…
         • Code: compilability, change of libraries/functionality
         • interpretability of results, know-how
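
A minimal sketch of how the items listed above (original test data, parameters, preprocessing, software environment) could be recorded next to an experiment so that it can be re-run and checked later. The function names, field names and example values are illustrative assumptions, not part of the talk.

```python
import hashlib
import json
import platform
import sys


def sha256_of(data: bytes) -> str:
    """Return a hex digest that identifies the exact input data."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(input_data: bytes, parameters: dict, preprocessing: list) -> dict:
    """Collect what is needed to repeat an experiment (hypothetical layout)."""
    return {
        "input_sha256": sha256_of(input_data),
        "parameters": parameters,
        "preprocessing": preprocessing,
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
    }


# Example values are made up for illustration.
manifest = build_manifest(
    b"1.0,2.0,3.0\n",
    {"learning_rate": 0.1, "seed": 42},
    ["normalise", "drop_missing"],
)
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
print(manifest_json)
```

Storing such a manifest with the results does not solve compilability or changing libraries, but it documents which data and settings produced a result.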
.................................................
                                              Digital Preservation


      Is a complex task
      Requires a precise understanding of the objects, their
       intellectual characteristics, the way they were created and
       used and how they will most likely be used in the future
      Requires a continuous commitment to preserve objects to
       avoid the “digital dark ages”
      Requires a solid, trusted infrastructure and workflows to
       ensure digital objects are not lost
      Is essential to maintain electronic publications, research
       data, … accessible
      Will become more complex as digital objects become more
       complex

.................................................
                                                Overview


   Digital Preservation Projects in Europe
    (large number; a small selection is presented below)
         - DPE: Digital Preservation Europe, EU, FP6
         - Caspar: Cultural, Artistic and Scientific Knowledge for
           Preservation, Access and Retrieval
         - Planets: Preservation and Long-term Access through Networked Services
         - Shaman: Sustaining Heritage Access through Multivalent Archiving
         - LIWA: Living Web Archives
         - Keep-it: Kultur, eCrystals, EdShare (and NECTAR) - Preserve It
   IT-oriented Challenges in Digital Preservation
   Some Digital Preservation Research at TUWIEN
   Conclusions

.................................................
                                    DPE
                          FP6 Coordinating Action
                  http://www.digitalpreservationeurope.eu




.................................................
                      What is DPE?

                      Vision
                      FP6 Coordinating Action
                      DigitalPreservationEurope (DPE) intends to create
                         a coherent platform for proactive cooperation,
                         collaboration, exchange and dissemination of
                         research results and experience in the
                         preservation of digital objects
                      Digital Preservation: ensuring long-term
                         accessibility of digital objects
                      Mitigating the risk of a “digital dark age”

                                      http://www.digitalpreservationeurope.eu
.................................................
                                            Two macro objectives:

                         1. to foster collaboration and synergies among
                            ongoing projects and existing initiatives
                            across the ERA [repositories and audit and
                            certification tools]

                         2. to raise awareness of digital preservation
                            challenges among different user
                            communities [different levels of awareness of
                            the subject and its strategic significance]




.................................................
                                                    DPE Activities

                     •     Range of activities to foster research and take-up in
                           digital preservation
                     •     Research Roadmap
                     •     Digital Preservation Challenge
                     •     Researcher and Practitioner Exchange
                     •     DPE Videos




.................................................
                                 Preservation Research Roadmap

                          The Roadmap aims at contributing to the planning
                             of our future R&D in Digital Preservation by
                             means of different actions:

                             - Analysing the state of the art in Digital Preservation
                               research and already existing research agendas on
                               a global level
                             - Researching the needs and demands from the
                               point of view of the Digital Preservation user
                               communities and their leading experts
                             - Researching the needs and demands of future
                               markets for technology and service providers
.................................................
                                         DPE Recommended Research

                                    Restoration
                                    Conservation
                                    Collection and repository management
                                    Preservation as risk management
                                    Preserving the interpretability and
                                     functionality of digital objects
                                    Collection cohesion and interoperability
                                    Automation in preservation
                                    Preserving the context
                                    Storage technologies


.................................................
                                                    DPE Challenge
                      •    Promotion of innovation in DP
                      •    Targeted at students
                      •    Main goal:
                           Provide access to digital objects and make them usable
                      •    Open to participants world-wide
                      •    Submission deadline: May 30, 2008
                      •    http://www.digitalpreservationeurope.eu/challenge
                      •    Assessment of submissions by an international
                           panel of experts in the field
                      •    Different tasks, e.g.
                            • Access data in a legacy client-server system
                            • Proprietary file format
                            • Preservation of multimedia art
.................................................
                               Raising Awareness of DP Issues
                          Experts & Practitioners:
                           Briefing Papers, Seminars
                          General Public:
                           little awareness, everybody affected
                          DPE Videos:
                           series of short cartoons highlighting DP issues
                           aimed at non-experts
                           trying to communicate challenges in a simple style
                           Videos available on YouTube:
                           http://www.youtube.com/user/wepreserve




.................................................
                                       CASPAR
                            http://www.casparpreserves.eu




.................................................
                                                    CASPAR

      How can digital data still be used and understood in the
       future when systems, software, and everyday knowledge
       continue to change? This is the CASPAR challenge.
      The CASPAR project is mainly based on the OAIS
       standard, ISO 14721:2003
      Its architecture is defined for
        - Managing key concepts of the OAIS reference model
        - Supporting the main functionality identified in the OAIS
           functional model
      CASPAR aims to define and implement interfaces and
       functionally independent components

.................................................
                                         Preservation Issue 1


      Users may be unable to understand or use the data e.g.
       the semantics, format, processes or algorithms involved
           - How to guarantee digital information may be accessed and
             understood in the future?
           - How to guarantee retrieval of Archival Information?
           - How to guarantee intelligibility of digital information within
             heterogeneous Designated Communities?
      Non-maintainability of essential hardware, software or
       support environment may make the information
       inaccessible
           - How to guarantee preservation actors are informed about
             change events?
           - How to guarantee appropriate actions are undertaken to
             preserve Archival Information against change events?
.................................................
                                          Preservation Issue 3

      The chain of evidence may be lost and there may be
       lack of certainty of provenance or authenticity
           - How to guarantee adequate integrity and identity for any
             Archival Information?
      Access and use restrictions may make it difficult to reuse
       data, or alternatively may not be respected in future
           - How to guarantee adequately secure access, with the proper
             rights, to any resource and functionality within an Archive?
      The current custodian of the data, whether an
       organisation or project, may cease to exist at some point
       in the future
           - How to guarantee proper information package management
             within an Archive?
           - How to guarantee long-term preservation and maintenance of any
             information package?
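
The integrity question above is commonly approached with fixity checks. The following is a minimal sketch, assuming SHA-256 digests recorded at ingest; the package layout and function names are hypothetical and are not CASPAR interfaces.

```python
import hashlib
from typing import Dict, List


def digest(data: bytes) -> str:
    """SHA-256 digest used as the fixity value of one file."""
    return hashlib.sha256(data).hexdigest()


def record_fixity(package: Dict[str, bytes]) -> Dict[str, str]:
    """Record one digest per file when the package enters the archive."""
    return {name: digest(content) for name, content in package.items()}


def audit(package: Dict[str, bytes], recorded: Dict[str, str]) -> List[str]:
    """List files that are missing or whose content no longer matches."""
    problems = []
    for name, expected in recorded.items():
        if name not in package:
            problems.append(f"missing: {name}")
        elif digest(package[name]) != expected:
            problems.append(f"changed: {name}")
    return sorted(problems)


# Illustrative package; file names and contents are made up.
package = {"report.txt": b"original findings", "data.csv": b"1,2,3\n"}
recorded = record_fixity(package)

package["report.txt"] = b"tampered findings"  # simulate silent corruption
del package["data.csv"]                       # simulate loss
problems = audit(package, recorded)
print(problems)
```

A digest only shows that content changed; provenance and authenticity additionally require trusted records of who recorded the digest and when.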
.................................................
                                                    Planets

                         http://www.planets-project.eu




.................................................
                                           The Planets project

      4-year research and technology development project
       co-funded by the European Union
      Addresses core digital preservation challenges
      Started June 2006 with €15m budget
      Coordinated by the British Library
      16 partners
           - national libraries and archives
           - leading technology companies
           - research universities
      Builds on strong digital archiving and preservation
       programmes


.................................................
                                               Planets partners

                                                      The British Library
                                                      National Library, Netherlands
                                                      Austrian National Library
                                                      State and University Library,
                                                       Denmark
                                                      Royal Library, Denmark




                                                      National Archives, UK
                                                      Swiss Federal Archives
                                                      National Archives, Netherlands



.................................................
                                               Planets partners

                                                           Tessella Plc
                                                           IBM Netherlands
                                                           Microsoft Research
                                                           Austrian Research
                                                            Centers GmbH



                                                         HATII at University of
                                                          Glasgow
                                                         University of Freiburg
                                                         Vienna University of
                                                          Technology
                                                         University of Cologne

.................................................
                                              The Planets team




                                       All Staff Meeting, February 2007



.................................................
                                   Planets Architecture


       [Figure: Planets architecture. Labelled components: Preservation
        Planning Services; Preservation Action Services; Characterisation
        Services; Test Bed (evaluation and validation services);
        Interoperability Framework. Surrounding labels: Digital Content,
        Organisational Context, External Context, Technical Environment.]
.................................................
                           Preservation Action


      Transform content
           - Pluggable infrastructure for third-party
             migration tools

      Transform environment
           - Dioscuri:
             Modular emulation of the full hardware/software environment

           - Universal Virtual Computer (UVC):
             provides a layered durable approach to emulation

      Preservation Action Tools registry
      XML language for describing preservation action tools
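
To illustrate the idea of a pluggable infrastructure for migration tools, here is a minimal sketch of a registry that maps a (source format, target format) pair to a conversion function. The class, the format labels and the example tool are assumptions made for illustration; they are not the Planets interfaces.

```python
from typing import Callable, Dict, Tuple

Migration = Callable[[bytes], bytes]


class ToolRegistry:
    """Maps (source format, target format) to a migration function."""

    def __init__(self) -> None:
        self._tools: Dict[Tuple[str, str], Migration] = {}

    def register(self, source: str, target: str):
        """Decorator that plugs a third-party tool into the registry."""
        def decorator(func: Migration) -> Migration:
            self._tools[(source, target)] = func
            return func
        return decorator

    def migrate(self, data: bytes, source: str, target: str) -> bytes:
        try:
            tool = self._tools[(source, target)]
        except KeyError:
            raise LookupError(f"no tool registered for {source} -> {target}") from None
        return tool(data)


registry = ToolRegistry()


# Hypothetical format labels; a real registry would use agreed identifiers.
@registry.register("text/latin-1", "text/utf-8")
def latin1_to_utf8(data: bytes) -> bytes:
    return data.decode("latin-1").encode("utf-8")


migrated = registry.migrate(b"caf\xe9", "text/latin-1", "text/utf-8")
print(migrated)
```

New tools are added by registering another function, without changing the code that requests a migration.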


.................................................
                   Preservation Characterisation

      Characterisation framework
           - Unifies tools for identifying file formats
             and extracting object properties
      Characterisation registry
           - Based on the file format registry PRONOM
      eXtensible Characterisation
       Languages (XCL)
           - Family of XML languages
             for characterising digital objects
      Comparator verifies effects of preservation actions
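
A minimal sketch of the two steps named above: identifying a format and comparing object properties before and after a preservation action. Identification here uses well-known file signatures (PNG, PDF, GIF); the property names and values are made up, and neither PRONOM nor XCL is actually used.

```python
from typing import Dict, List, Tuple

# Well-known leading byte signatures of three formats.
SIGNATURES: List[Tuple[bytes, str]] = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"%PDF-", "PDF"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def identify(data: bytes) -> str:
    """Return a format name based on the leading bytes, or 'unknown'."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    return "unknown"


def compare(before: Dict[str, object], after: Dict[str, object],
            significant: List[str]) -> Dict[str, Tuple[object, object]]:
    """Report significant properties whose value changed."""
    return {
        prop: (before.get(prop), after.get(prop))
        for prop in significant
        if before.get(prop) != after.get(prop)
    }


# Hypothetical properties extracted before and after a migration.
before = {"width": 640, "height": 480, "colour_depth": 24}
after = {"width": 640, "height": 480, "colour_depth": 8}
changes = compare(before, after, ["width", "height", "colour_depth"])

print(identify(b"%PDF-1.4 ..."))  # PDF
print(changes)                    # {'colour_depth': (24, 8)}
```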


.................................................
                        Infrastructure and Testbed

      Interoperability Framework provides
       common basis
           -   JBoss Application Server
           -   Logging, Security Services
           -   Registry services
           -   User management and Single-Sign-On


      Planets Testbed
           - Controlled environment for the execution of experiments
           - Accumulated experience base collected in registry




.................................................
                              Preservation planning


        Collection profiling services

        Technology watch services

        Risk assessment of digital objects

        Preservation planning methodology

        Tool support: Plato, the Planning Tool




.................................................
                               Preservation planning

         Evaluating preservation strategies
         Variety of solutions and tools exist
         Each strategy has unique strengths and weaknesses
         Requirements vary across settings
         Decision on which solution to adopt is complex
         Documentation and accountability are essential

      Preservation planning assists in decision making
      Evaluation of strategies on representative sample content
       according to specific requirements
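
One simple way to evaluate candidate strategies against weighted requirements is a weighted-sum score. The sketch below assumes that scheme; the criteria, weights and scores are invented for illustration and do not reproduce the Plato tool.

```python
from typing import Dict


def utility(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-criterion scores (higher is better)."""
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


# Hypothetical requirements and their relative importance.
weights = {"preserves_appearance": 0.5, "open_format": 0.3, "low_cost": 0.2}

# Hypothetical scores (1 = poor, 5 = excellent) measured on sample content.
candidates = {
    "migrate_to_pdfa": {"preserves_appearance": 4, "open_format": 5, "low_cost": 3},
    "emulate_original": {"preserves_appearance": 5, "open_format": 2, "low_cost": 2},
    "keep_as_is": {"preserves_appearance": 5, "open_format": 1, "low_cost": 5},
}

ranking = sorted(
    ((name, utility(scores, weights)) for name, scores in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, value in ranking:
    print(f"{name}: {value:.2f}")
```

Keeping the weights and measured scores alongside the result documents why a strategy was chosen, which supports the accountability requirement.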


.................................................
   Securing Communication with the Future




                             Research & Development Project
                                 in Digital Preservation
.................................................
                             SHAMAN Objectives

  SHAMAN will establish an Open Distributed Resource
   Management Infrastructure Framework enabling Grid-
   based Resource Integration, that is firmly grounded in a
   conceptual and technical reference architecture.
  SHAMAN will develop and integrate technologies to
   support Contextual and Multivalent Archival and
   Preservation Processes to enable proper preservation
   management and policies.
  SHAMAN will support Managing of Future Requirements by
   safeguarding Interoperability with Future Environments
   based on evidence gathered through the characterisation of
   digital objects, their (metadata) context and their
   preservation environment, resulting in the evolution of
   preservation policies.
.................................................
                                 SHAMAN Outputs



                                    SHAMAN will deliver a next-generation
                                    Digital Preservation framework, with
                                    three prototypical applications.


                                     scientific publishing in libraries and
                                      documents in governmental archives
                                     digital objects used in industrial design
                                      and engineering
                                     data resources used in e-Science
                                      applications
.................................................
                              SHAMAN Framework




.................................................
                            SHAMAN Consortium




                                                    SHAMAN Collaborators:



.................................................
                         LIWA: Living Web Archives

        FP7 project funded by the European Commission
        Started in Feb 2008
        EA, L3S, Max Planck, Hungarian Academy of Sciences,
         Hanzo Archives, libraries and archives




.................................................
                         Users and challenges identified

    User type                Main concern              Blocking issue

    National Libraries       Size of archives          No control over archive size and its growth
                                                       over time, with implications for cost control
    Other Libraries          Coherence                 Selecting and keeping appropriate content for
                                                       their user community is difficult on the web
    Institutional Archives   Fidelity                  Lack of fidelity to the original
    TV and Radio Archives    Variety of content types  Streaming content cannot be archived
    Museums                  Variety of content types  Non-standard formats are difficult to archive
    Corporate Archives       Fidelity                  Fidelity to the original and temporal
                                                       coherence for compliance
    Researchers              Fidelity                  Gap between the original web and what
                                                       current web archives (WA) can deliver
    End Users                Interpretability          Impression of getting lost in the WA


.................................................
                                  Technology concerned




.................................................
                                                    Approach

 Example: Semantic Evolution Detection
  Time-Specific Term Contexts
   Leningrad@1970 (Soviet Union, Hermitage, Moscow,
   Neva River, Baltic Sea,…)
   Saint Petersburg@2009 (Russia, Hermitage, Moscow,
   Neva River, Baltic Sea,…)
  Across-Time Semantic Similarity compares term contexts
   and shows high similarity between Leningrad@1970 and
   Saint Petersburg@2009
  Term Coherence analyzes term contexts and shows that
   Saint Petersburg@2009 and Hermitage@2009 are
   commonly used together
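
A minimal sketch of across-time semantic similarity using the term contexts listed above. The slide does not say which similarity measure is used; Jaccard overlap of the context sets is an assumption, and only the context terms shown on the slide are included.

```python
from typing import Dict, Iterable, Set, Tuple


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Share of context terms that two time-specific contexts have in common."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Time-specific term contexts, keyed by (term, year).
contexts: Dict[Tuple[str, int], Set[str]] = {
    ("Leningrad", 1970): {
        "Soviet Union", "Hermitage", "Moscow", "Neva River", "Baltic Sea",
    },
    ("Saint Petersburg", 2009): {
        "Russia", "Hermitage", "Moscow", "Neva River", "Baltic Sea",
    },
}

similarity = jaccard(
    contexts[("Leningrad", 1970)],
    contexts[("Saint Petersburg", 2009)],
)
print(f"{similarity:.2f}")  # 4 shared terms out of 6 distinct terms
```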


.................................................
                                                    Approach


 Good query reformulations contain query terms similar to
  the original query terms that are commonly used together

 Examples
  Saint Petersburg Museum → Leningrad Museum                   ✔

   Leningrad Cowboys → Saint Petersburg Cowboys                ✖

   iPod Hearing Damage → Walkman Hearing Damage                ✔

   disabled / handicapped / special needs
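
The rule above can be sketched as two checks: each new term must be similar to the term it replaces, and the new terms must be commonly used together. The similarity pairs and co-occurrence pairs below are hand-written stand-ins for values that would be derived from an archive; they are assumptions chosen to match the examples.

```python
from itertools import combinations
from typing import List

# Hypothetical across-time similar term pairs.
SIMILAR = {("Saint Petersburg", "Leningrad"), ("iPod", "Walkman")}

# Hypothetical pairs of terms that are commonly used together.
COOCCUR = {
    frozenset({"Leningrad", "Museum"}),
    frozenset({"Walkman", "Hearing Damage"}),
}


def similar(a: str, b: str) -> bool:
    """Identical terms, or terms listed as similar in either direction."""
    return a == b or (a, b) in SIMILAR or (b, a) in SIMILAR


def is_good_reformulation(original: List[str], candidate: List[str]) -> bool:
    """Accept only if terms are similar pairwise and coherent as a set."""
    if len(original) != len(candidate):
        return False
    if not all(similar(o, c) for o, c in zip(original, candidate)):
        return False
    return all(frozenset(pair) in COOCCUR for pair in combinations(candidate, 2))


print(is_good_reformulation(["Saint Petersburg", "Museum"], ["Leningrad", "Museum"]))
print(is_good_reformulation(["Leningrad", "Cowboys"], ["Saint Petersburg", "Cowboys"]))
print(is_good_reformulation(["iPod", "Hearing Damage"], ["Walkman", "Hearing Damage"]))
```

The second query is rejected because its terms are similar to the originals but are not used together in the assumed co-occurrence data.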


.................................................
                                                    KeepIt
       Kultur, eCrystals, EdShare (and NECTAR) –
                       Preserve It!




.................................................
                                   Project Overview


   Aim: To create a number of exemplar preservation
    repositories from which others can learn
   Small number of very diverse repositories




      [Figure labels: Training, Development, Deployment]
.................................................
                                        Preservation

       [Figure labels: Long Term Reliable Storage, Risk Analysis,
        Mitigation / Action]

.................................................
                      Long Term Reliable Storage

                             EPrints is expanding the number of places
                                in which plug-ins can be utilised.


       [Figure: EPrints architecture. Labelled components: Import Plug-ins;
        Export Plug-ins; EPrints Core (Interfaces, Submission Manager);
        Database Controller; Storage Controller; Cloud (Amazon S3).]
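
A minimal sketch of a storage controller that writes each object to several pluggable storage back-ends and reads from the first one that still holds it. This is not the EPrints API; the class and method names are hypothetical, and in-memory stores stand in for local disk and a cloud service.

```python
from typing import Dict, Iterable, List


class InMemoryStore:
    """Stand-in storage plug-in; a real one might wrap disk or a cloud service."""

    def __init__(self, label: str) -> None:
        self.label = label
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]  # raises KeyError if absent

    def delete(self, key: str) -> None:
        del self._objects[key]


class StorageController:
    """Replicates objects across all plug-ins and reads from the first hit."""

    def __init__(self, plugins: Iterable[InMemoryStore]) -> None:
        self._plugins: List[InMemoryStore] = list(plugins)

    def store(self, key: str, data: bytes) -> None:
        for plugin in self._plugins:
            plugin.put(key, data)

    def retrieve(self, key: str) -> bytes:
        for plugin in self._plugins:
            try:
                return plugin.get(key)
            except KeyError:
                continue
        raise KeyError(key)


local = InMemoryStore("local disk")
cloud = InMemoryStore("cloud stand-in")
controller = StorageController([local, cloud])

controller.store("eprint/1/paper.pdf", b"%PDF-1.4 example")
local.delete("eprint/1/paper.pdf")  # simulate loss of the local copy
recovered = controller.retrieve("eprint/1/paper.pdf")
print(recovered)
```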




.................................................
                                                Overview




          Why do we need Digital Preservation?

          Digital Preservation Projects in Europe

          IT-oriented Digital Preservation Challenges

          Some Digital Preservation Research at TUWIEN



.................................................
                                                    DP Research
   Some provocative (?) observations
   IT R&D frequently suffers from a disconnect between
    academia and practice
     - research independent of development
     - theoretical results that cannot be applied in practice
   DP R&D is strongly driven by practice
     - many good and useful results
     - reactive instead of proactive
     - results need to be applicable now
     - lacks creative foresight into the problems of the future
     - lacks acceptance of non-perfect solutions
     - little real IT research by IT experts
.................................................
                                                    DP Research
   DP research requires several IT sub-disciplines
   IT research in DP needs to
     - build its own research agenda
     - live in an open-minded environment allowing
        (initially) non-perfect solutions
     - be evaluated following stringent standards of
        empirical evidence, validation and benchmarking
     - be pro-active, foreseeing challenges
        of the future
     - address a broader scope of topics that go beyond
        migration/emulation, metadata and data management
        and similar currently dominant issues
   DP is an integral issue of all IT systems design
.................................................
                                                    DP Research

  Urgently needed within DP community:
   Identify IT areas that need to contribute to DP research
   For each area, come up with the top-5 research questions
   These research questions should be concrete
     - formulated as a research hypothesis
     - formulated as a PhD topic
   How can we get these IT-disciplines involved?
   How can we get IT researchers motivated?
     - e.g. DPE research challenge
     - remember the “test” (repeating old experiments)!

.................................................
                                                    DP Research

  Potential areas:
   Databases:
     - split of data and function and its description
     - PP-aware design and description
     - modeling data semantics for DP
   IT security
     - secure documents, safe formats
     - Signatures, long-term key management
     - DRM
     - long-term non-disclosure

.................................................
                                                    DP Research
  Potential areas:
   Information Retrieval:
     - large-scale indexing and retrieval
     - evolution of semantics and spelling
     - modeling forgetting
   Ethics
     - privacy, digital personalities and forgetting
     - information types and usage + IT support to enforce
   Software Engineering
     - DP as systems engineering
     - secure workflows and trust
     - certification of system for DP fitness
.................................................
                                                    DP Research

  Potential areas:
   Algorithms:
     - semantics from code
     - cross-compilation
     - support for digital archeology
     - evolution of the concept of file formats
   Storage:
     - advanced storage technologies
     - management of large storage systems
     - hybrid analog/digital storage
     - self-describing/monitoring storage systems
.................................................
                                                    DP Research

  Potential areas:
   User interfaces
     - Interfaces of the future
     - How to preserve/communicate interfaces long gone by
   Application domains
     - effect of the quantum computer on DP of conventional
       systems
     - mash-ups and distributed applications
     - pervasive computing and sensor networks
     - virtual worlds
     - threat scenarios in DP
     - home users
.................................................
                                                    DP Research


   Many further areas –
    basically: all sub-disciplines of IT affected?

   What would be the most challenging research questions in
    each of these?

   How can we get experts in these disciplines involved
    with DP research?

   How can we make DP research more solid research by
    IT standards?

.................................................
                                                Overview


          Some Digital Preservation Research at TUWIEN
            - Preservation Planning: PLATO
            - Small Home Office Archiving: HOPPLA
            - Establishing Context of Digital Information
            - Evaluating Emulators
            - Recovering Digital Objects from Audio Wave Form
            - Preserving Virtual Worlds
            - Ethical Issues in Web Archiving
            - Digital Preservation Time Capsule
          Conclusions


.................................................
                                  Preservation Planning

 Why Preservation Planning?
  Several preservation strategies developed
         - For each strategy: several tools available
              - For each tool: several parameter settings available
  How do you know which one is most suitable?
  What are the needs of your users? Now? In the future?
  Which aspects of an object do you want to preserve?
  What are the requirements?
  How to prove in 10, 20, 50, 100 years, that the decision was
   correct / acceptable at the time it was made?
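
One way to make such a decision traceable is to score every candidate against weighted requirements and keep the scores as evidence. The sketch below is a plain weighted sum, not the PLATO implementation; the criteria, weights, tool names and scores are all invented for illustration.

```python
def rank_alternatives(weights, scores):
    """Return (name, weighted score) pairs, best first."""
    total_weight = sum(weights.values())
    ranked = []
    for name, per_criterion in scores.items():
        value = sum(weights[c] * per_criterion[c] for c in weights) / total_weight
        ranked.append((name, value))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical requirements with their relative importance.
weights = {"content preserved": 5, "layout preserved": 3, "low cost": 2}

# Hypothetical measurements on a 0-5 scale for three candidate tools.
scores = {
    "tool A": {"content preserved": 5, "layout preserved": 3, "low cost": 4},
    "tool B": {"content preserved": 4, "layout preserved": 5, "low cost": 2},
    "tool C": {"content preserved": 5, "layout preserved": 5, "low cost": 2},
}

for name, value in rank_alternatives(weights, scores):
    print(f"{name}: {value:.1f}")
```

Storing the weights and scores together with the chosen alternative is what lets the decision be justified years later.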
.................................................
                                                    Hoppla
    HOPPLA: Home Office Painless Persistent Long-term Archiving
    Archiving Solutions for
      - SME
      - SOHO
      - Private Users
    No/little expertise
    Service-oriented concept
    Similar to Antivirus Software
    User sends collection profile
    Experts perform Pres. Planning
    Rules for Preservation Actions are provided
    Combines back-up and migration
.................................................
                                  HOPPLA Principles

   Need for bit-stream and logical object preservation
         - combine back-up and migration
   No expertise on and effort for digital preservation issues
         - fully automatic solution outsourcing DP expertise,
           inspired by current antivirus solutions
   Stability and system independence
         - rely on plain file system storage with redundant XML metadata
   Trust and accountability
         - aim to fulfill core requirements of audit and certification
           initiatives
   Privacy
         - data resides with users, control over information sent to server
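
The "plain file system storage with redundant XML metadata" principle can be illustrated as follows. This is a minimal sketch, not HOPPLA code: the function names, the `.meta.xml` suffix and the XML element names are assumptions made for the example.

```python
import hashlib
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def archive_file(source, archive_dir):
    """Copy source into archive_dir and write an XML sidecar next to the copy."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    shutil.copy2(source, target)

    # The sidecar repeats what is needed to check the copy without a database.
    record = ET.Element("object")
    ET.SubElement(record, "name").text = source.name
    ET.SubElement(record, "size").text = str(target.stat().st_size)
    ET.SubElement(record, "sha256").text = hashlib.sha256(target.read_bytes()).hexdigest()

    sidecar = archive_dir / (source.name + ".meta.xml")
    ET.ElementTree(record).write(str(sidecar), encoding="utf-8", xml_declaration=True)
    return sidecar


def verify(sidecar):
    """Return True if the archived file still matches the checksum in its sidecar."""
    record = ET.parse(str(sidecar)).getroot()
    data = (sidecar.parent / record.findtext("name")).read_bytes()
    return hashlib.sha256(data).hexdigest() == record.findtext("sha256")
```

Because both the copy and its description are ordinary files, the archive stays readable even if the archiving software itself is no longer available.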


.................................................
                                  HOPPLA Architecture




.................................................
                                           Context of Information

   Digital information objects are not isolated
         - Exist in a specific context (to other objects)
   Context is important for
         - Correct interpretation
         - Establishing authenticity
         - Ensuring appropriate use
   Context is difficult to establish/document
   Often missing / incomplete / incorrect when manually
    entered
   Automatically extract context of objects
     - Establish contextual relations between them, generate
       new meta-data
   Visualisation/interaction tool

.................................................
                                           Context of Information
   Context Dimensions:
    Currently establishing context along
          - Time (creation, modification, ...)
          - Type, e.g. MIME types
          - Contributors / Social: people involved
                • Creators, Modifiers, Users
          - Content related features
                • e.g. same images embedded, same keywords

    Other types of dimensions possible, e.g. concurrent usage
     of documents, ...
   Applications:
    Ingest of donations, disaster recovery, IR
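
Two of the dimensions above, time and type, can be extracted from a file system with the standard library alone. The sketch below is an illustration rather than the tool presented here: the function names and the one-hour window are assumptions, and the MIME type is only guessed from the file name.

```python
import mimetypes
from collections import defaultdict
from pathlib import Path


def describe(path):
    """Context features for one file: guessed MIME type and modification time."""
    mime, _ = mimetypes.guess_type(path.name)
    return {"path": path, "type": mime or "unknown", "modified": path.stat().st_mtime}


def related_pairs(paths, window_seconds=3600):
    """Pair files of the same type that were modified within the time window."""
    by_type = defaultdict(list)
    for path in paths:
        features = describe(path)
        by_type[features["type"]].append(features)

    pairs = []
    for items in by_type.values():
        items.sort(key=lambda f: f["modified"])
        # Only neighbours in time order are compared.
        for earlier, later in zip(items, items[1:]):
            if later["modified"] - earlier["modified"] <= window_seconds:
                pairs.append((earlier["path"].name, later["path"].name))
    return pairs
```

Each pair is a candidate contextual relation that could be stored as new metadata; the social and content-related dimensions would need further extractors.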
.................................................
                                           Context of Information

   Data Warehouse – Snowflake schema




.................................................
                                  Evaluation of Emulation


        Testing if significant properties stay intact
        It is well known how to extract and compare significant
         properties for migrated objects
        With emulation the original object is unchanged, so
         comparison of the rendered versions is necessary
        Detection of a change in behaviour of object
        Interactivity has to be considered (applications, video
         games, interactive art)




.................................................
                                         Evaluation of Emulation


       Goals
        Perform repeatable experiments
        Extract significant properties from the rendering process
        Automatically compare significant properties extracted
         from different emulation environments
        Allow preservation planning for emulation environments
        Automate parts of the process of testing emulators




.................................................
                                         Evaluation of Emulation

        Different significant states
             - target state, series of states, continuous stream
        Extracting properties from emulation environment
             - in characterization language (e.g. XCL)
             - e.g. cycles, frame rate (average/min/max), number of files/bytes
               accessed on I/O devices, event logs, screenshots, video streams
             - not supported yet by emulators
        Deterministic behaviour of object necessary
             - identify and keep constant causes of non-deterministic behaviour
             - e.g. user input, hardware timer values, random seed generation
        Extracting rendered object from emulation environment
             - from different levels: system memory, video memory, output
               device
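
Once properties have been extracted from two emulation environments, comparing them automatically can be as simple as the sketch below. The property names, values and tolerance are invented for illustration; real properties would come from a characterisation step such as the one listed above.

```python
def compare_runs(reference, candidate, tolerances=None):
    """Return {property: (reference value, candidate value)} for every mismatch."""
    tolerances = tolerances or {}
    differences = {}
    for name in sorted(set(reference) | set(candidate)):
        ref, cand = reference.get(name), candidate.get(name)
        numeric = isinstance(ref, (int, float)) and isinstance(cand, (int, float))
        if numeric:
            # Numeric properties may differ by up to the allowed tolerance.
            same = abs(ref - cand) <= tolerances.get(name, 0)
        else:
            same = ref == cand
        if not same:
            differences[name] = (ref, cand)
    return differences


# Hypothetical properties from the same object rendered in two emulators.
run_a = {"avg_frame_rate": 50.0, "files_accessed": 12, "screenshot_sha256": "ab12"}
run_b = {"avg_frame_rate": 49.6, "files_accessed": 12, "screenshot_sha256": "ff00"}

print(compare_runs(run_a, run_b, tolerances={"avg_frame_rate": 0.5}))
```

Such a comparison is only meaningful if both runs were deterministic, which is why user input, timer values and random seeds have to be held constant.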
.................................................
                          Recovering Digital Objects from
                                Audio Wave Form
  Original system Philips G7400 from 1983 encodes data in
   audio streams for recording on audio tapes
  Migration Tool to extract the encoded data from the audio
   stream and migrate to non-obsolete formats
  Extracted data: Software, screenshots, text & numeric data
  Can read data that is unreadable with original system
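
The slides do not describe the tape encoding of the G7400, so the sketch below uses generic frequency-shift keying as a stand-in: each bit is a short tone, and the decoder counts zero crossings per bit window. The sample rate, bit length and the two frequencies are assumptions chosen for the example.

```python
import math

SAMPLE_RATE = 44100                 # samples per second (assumed)
SAMPLES_PER_BIT = 441               # 10 ms per bit (assumed)
FREQ_ZERO, FREQ_ONE = 1200, 2400    # tone in Hz for a 0 bit and a 1 bit (assumed)


def encode(bits):
    """Synthesize the waveform that a tape recording of these bits would contain."""
    samples = []
    for bit in bits:
        freq = FREQ_ONE if bit else FREQ_ZERO
        samples.extend(
            math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
            for i in range(SAMPLES_PER_BIT)
        )
    return samples


def decode(samples):
    """Recover bits by counting sign changes in each bit-length window."""
    # A tone of frequency f crosses zero about 2*f*T times in T seconds;
    # the threshold sits halfway between the counts expected for the two tones.
    threshold = (FREQ_ZERO + FREQ_ONE) * SAMPLES_PER_BIT / SAMPLE_RATE
    bits = []
    for start in range(0, len(samples) - SAMPLES_PER_BIT + 1, SAMPLES_PER_BIT):
        window = samples[start:start + SAMPLES_PER_BIT]
        crossings = sum(
            1 for a, b in zip(window, window[1:]) if (a < 0) != (b < 0)
        )
        bits.append(1 if crossings > threshold else 0)
    return bits
```

A real tape recording adds noise, speed variation and the need to find where each bit starts, all of which this clean synthetic signal avoids.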




.................................................
                                  Preserving Virtual Worlds


        Alternative strategy: rather than extracting the objects and
         world data, scenes of interaction are recorded
        A drone moves inside Second Life and video-records
         areas with user activity
        Besides technical difficulties, ethical and legal issues arise




.................................................
                                    Ethics & Web Archiving


       Web is very volatile
       Web archiving is an essential activity to ensure
        valuable content is being preserved
       Web Archives contain a wealth of extremely valuable
        information

      But:

       Currently most archives are closed to public
       Mostly due to legal reasons
       Need a legal solution

      Is this all?
.................................................
                                    Ethics & Web Archiving


    What should such a legal solution look like?
    Is it only a legal problem?

    There are things that are legal, but ethically dubious
    (There are things that are illegal, but ethically acceptable)

    Privacy is an essential good
    Most societies are increasingly privacy-aware
    Are there ethical concerns, and if so
          - Are we aware of them?
          - Can we do something to address them?

.................................................
                                    Ethics & Web Archiving



       Assumptions and a number of questions:

        The Web is a new publication medium – Is it?

        The ephemeral nature of Web pages is a “design fault” -
         Is it?

        A Web Archive is merely a collection of publicly
         available information – Is it?



.................................................
                                    Ethics & Web Archiving


 Assumptions underlying Web Archiving:

  The Web is a new publication medium?
        - Are people “publishing”
          (conscious decision, effort invested,…)
        - If so, are they aware of it?
        - Are kids allowed to publish?
        - Which parts of the Web are publishing,
          which are communication?
          (a kind of chatting on the bus?)
        - Do we have a choice of NOT putting some things on the
          Web?


.................................................
                                    Ethics & Web Archiving


  Assumptions underlying Web Archiving

   The ephemeral nature of Web pages is a “design fault”?
         - Post-it notes are based on a “faulty” glue
           -> should we put real glue onto them?
         - If the Web is a publication medium: may there be some who use it
           as such BECAUSE it is ephemeral?
           (art, temporary announcements, CV, …)
         - Does being ephemeral make it more a communication medium in
           the perception of some people?
         - Does society need a way of communicating with larger
           communities in an ephemeral manner?
           (speaker’s corner, graffiti, …)



.................................................
                                    Ethics & Web Archiving


  Assumptions underlying Web Archiving:

   A Web Archive is merely a
    collection of publicly available information
         - True, but what about Holism?
           (The whole is more than the sum of its parts)
         - Does the ease of use, or the new possibilities of use, change the
           nature of an information collection?
           (full-text search, semantic analysis, IR as opposed to conventional
           archive catalogs)
         - Specialized person profile search engines, used by HR departments
           (special profile generation services to counter-act this)
         - Technical possibilities will increase in the future
           (video analysis, semantic analysis, reasoning, …)
.................................................
                                    Ethics & Web Archiving

  Research Issues:
   What are the ethical constraints, and how can they be
    more precisely defined or formalized?
   Which approaches might users of Web archives with potentially
    dubious intentions employ to obtain information that
    should not be provided by privacy-respecting archives?
   To what extent can technological solutions such as query analysis,
    machine learning and data mining help in identifying
    potentially harmful queries, potentially incriminating
    content on Web pages, information worthy of protection, or
    combinations thereof?
   How might legal regulations be formulated in order to
    allow (partial) access to Web archive content in a safe,
    ethically correct, and useful manner?

.................................................
                     Digital Preservation Time Capsule


       Digital Preservation suffers from
        a lack of public awareness
        a lack of solid understanding of the levels of complexity
        being abstract / intangible
        failing to grasp people’s imagination
        seeming to be rather simple (only storage?)
         even among some experts




.................................................
                     Digital Preservation Time Capsule


     The Planets Digital Preservation TimeCapsule
      is a scientifically solid & visually appealing showcase
       demonstrating general DP challenges & Planets solutions
      is a tangible and exciting demo showing the level of
       complexity & the amount of information involved in
       preserving a few selected objects
      aims at capturing the public’s and experts’ imagination,
       benefitting from a leveraging effect by involving media
      may serve as a basis for training, exhibitions and future
       research


.................................................
                     Digital Preservation Time Capsule


       The Planets TimeCapsule is inspired by

          Voyager Golden Record
          Rosetta Stone
          Long Now Rosetta Project
          Clock for the Long Now

       and other initiatives aimed at
       making long-term thinking
       graspable


.................................................
                     Digital Preservation Time Capsule

      Pick a set of Source Objects
      Describe them with PC-tools and PREMIS metadata
      Add representation information
           - file format standards & documentation
           - programming language definitions, compiler info
             (also for secondary objects)
      Add viewer (binary + source + OS + PREMIS + PC)
      Migrate them to more stable formats
           - PA tools: description + source (+ PREMIS + PC)
           - PP: plan and evaluation of loss (+ PREMIS + PC)
      Store them on different data carriers
           - Carrier description
           - Device description
           - File system description
.................................................
                                                    Summary

      Digital Preservation is an important issue
      Affects everybody and in all domains
           - cultural heritage, industry, science, society at large, you and me!
        Significant research & development efforts
        Number of solid solutions
        Number of challenging open research issues
        Need to involve core IT experts from different domains
        Need to change perspective on DP research:
           - from ex-post to pro-active
           - from external system to integrated part of all IT system design



.................................................
                                   iPRES 02010 Dates



    Paper, Tutorial, Panel & Workshop Submission              5 May 2010
    Notification of Acceptance                                18 Jun 2010
    Submission of Final Versions                              11 Jul 2010

    iPRES 02010                                    September 19-25 2010



                    http://www.ifs.tuwien.ac.at/dp/ipres2010



.................................................
                                              Thank you!




                         http://www.ifs.tuwien.ac.at/dp




.................................................

								