Storage by pengxuebo


   Why is storage an issue?
       Space requirements
       Persistence
       Accessibility
   Needs depend on purpose of storage
       Capture/encoding
       Access/delivery
       Preservation
Storage: Working Space
   Space for storage of digital files during
    capture/encoding/quality control process
   Possibilities
       PC hard drive
       File server, e.g. marengo (LIT)
       DLP file server
   Issues
       Capacity, backup, speed, accessibility
Storage: Access/Delivery
   Storage for web delivery of images, audio,
    text, etc.
   Possibilities
       UITS web server, under library account
       UITS streaming media server (audio/video)
       DLP web server
   Issues: capacity, backup, performance,
    software integration, maintenance/migration
Storage: Preservation
   Much harder problem
   Longer term
       Issues of longevity of media, hardware, file format
       Where are the files?
   Larger files
       Hard disk storage, traditional backup methods not
   Infrequency of access
       Problems do not become immediately evident
Long-Term Storage Options
   Removable media
       e.g. CD-R, DVD-R
       Pros: cheap, easy, produces tangible item
       Cons: Low capacity, physical space
        requirements, unknown longevity,
   Nearline storage
       UITS Massive Data Storage Service
Massive Data Storage Service
   HPSS (High Performance Storage System)
       Developed as a collaboration between IBM and US national
        laboratories
   Four tape robots (two at IUB, two at IUPUI)
       Data can be mirrored
   540 TB total storage
       ~75 TB used as of April 2001
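
The two capacity figures above imply how much headroom remained in April 2001. A minimal arithmetic sketch, using only the numbers on the slide (the variable names are illustrative, and "~75 TB" is approximate):

```python
# Figures taken from the slide; the used amount is approximate.
TOTAL_TB = 540
USED_TB = 75

used_fraction = USED_TB / TOTAL_TB   # share of total capacity in use
free_tb = TOTAL_TB - USED_TB         # remaining capacity in TB

print(f"Used: {used_fraction:.1%} of capacity")
print(f"Free: {free_tb} TB")
```

Roughly one seventh of the system was in use at that point.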
MDSS – A Sense of Scale
   2 Kilobytes : A typewritten page
   5 Megabytes : Complete works of Shakespeare
    OR 30 seconds of TV quality video
   1 Gigabyte (1000MB) : 1 pickup truck filled with paper
    OR a symphony in hi-fi sound
   1 Terabyte (1000GB) : All the X-ray films in a large hospital OR
    paper from 50,000 trees
   10 Terabytes : The printed collection of the US Library of
    Congress
   50 Terabytes : The contents of a large mass store system
   8 Petabytes (8000TB) : All information available on the web
   200 Petabytes : All the printed material (in the world!)
MDSS Storage Infrastructure
   Access
       FTP/PFTP: (Parallel) File Transfer Protocol
       DFS: Distributed File System (being phased out)
       HSI
   Not practical for delivery
       Hierarchical storage (metadata on disk, data on
        tape -> 30-90 seconds to start a transfer)
       File size – chunks of 50 MB or greater work best
            Small files aggregated into larger .tar or .zip files
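
The aggregation step above can be sketched in Python. This is one possible approach under stated assumptions, not MDSS tooling: the function names, the `bundle_0001.tar` naming, and the decimal reading of "50 MB" are all hypothetical choices made for illustration.

```python
import tarfile
from pathlib import Path

# Assumption: "50 MB" is read as 50 * 1000 * 1000 bytes.
MIN_BUNDLE_BYTES = 50 * 1000 * 1000


def plan_bundles(files, min_bytes=MIN_BUNDLE_BYTES):
    """Group (path, size) pairs into lists whose total size reaches min_bytes.

    The last group may be smaller if the files run out first.
    """
    bundles, current, current_size = [], [], 0
    for path, size in files:
        current.append(path)
        current_size += size
        if current_size >= min_bytes:
            bundles.append(current)
            current, current_size = [], 0
    if current:
        bundles.append(current)
    return bundles


def write_bundles(directory, out_dir, prefix="bundle"):
    """Write one uncompressed .tar per planned group and return the tar paths.

    out_dir should sit outside directory so earlier bundles are not re-bundled.
    """
    directory, out_dir = Path(directory), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    files = [(p, p.stat().st_size)
             for p in sorted(directory.rglob("*")) if p.is_file()]
    tar_paths = []
    for i, group in enumerate(plan_bundles(files), start=1):
        tar_path = out_dir / f"{prefix}_{i:04d}.tar"
        with tarfile.open(tar_path, "w") as tar:
            for p in group:
                # Store paths relative to the source directory.
                tar.add(p, arcname=str(p.relative_to(directory)))
        tar_paths.append(tar_path)
    return tar_paths
```

Planning is kept separate from writing so the grouping can be checked before any archive is created.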
DL Objects
   Digital library “objects” have many parts
       Metadata
       Preservation files
       Delivery files
   How do we keep them connected?
       Now: Good practice in file naming, directory
        organization, project documentation (not scalable!)
       Future: Digital object repository
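
To make the "good practice in file naming" approach concrete, here is a small Python sketch. The convention `<object-id>_<role>.<ext>` and the role codes `meta`, `pres`, and `deliv` are hypothetical, invented for this example; the slides do not specify a naming scheme.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical convention: <object-id>_<role>.<ext>, e.g. "img0001_pres.tif".
ROLES = {"meta": "metadata", "pres": "preservation", "deliv": "delivery"}


def group_by_object(filenames):
    """Map each object id to {part name: filename}; skip names that do not fit."""
    objects = defaultdict(dict)
    for name in filenames:
        stem = PurePath(name).stem
        object_id, sep, role = stem.rpartition("_")
        if sep and object_id and role in ROLES:
            objects[object_id][ROLES[role]] = name
    return dict(objects)


def incomplete_objects(objects):
    """Return the ids of objects missing any of the three parts."""
    required = set(ROLES.values())
    return sorted(oid for oid, parts in objects.items()
                  if set(parts) != required)
```

A check like `incomplete_objects` only works while every project follows the same convention, which is one reason this approach does not scale.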
Data Persistence
   Key is migration
   Keeping the bits alive - MDSS responsibility
       Physical media
       Logical media format
   Keeping the bits understandable - MDSS
    user responsibility
       File format
       Metadata
   Small “pockets” of digital content pose a
    problem for migration
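
One common way to confirm that "the bits" survived a migration is to compare checksums of the original and the migrated copy. The slides do not say how MDSS verifies data, so the sketch below is an assumption for illustration only, with hypothetical function names:

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bits(original, migrated):
    """True if both files have identical content according to SHA-256."""
    return sha256_of(original) == sha256_of(migrated)
```

A matching checksum covers only the first responsibility above (the bits themselves); whether the file format can still be read remains the user's concern.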
DL Object Repository
   [Diagram] Users and applications connect to the repository
    system, which ties together:
       Preservation version in MDSS
       Delivery version on web server
       Metadata records
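
The repository idea on this slide, one identifier resolving to a preservation version, a delivery version, and metadata, can be sketched as a small in-memory data structure. The class names, field names, and example locations below are hypothetical; this is not how any particular repository system is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitalObject:
    """One digital library object and where its three parts live."""
    object_id: str
    preservation_location: str   # e.g. a path in mass storage
    delivery_location: str       # e.g. a URL on a web server
    metadata_location: str       # e.g. a record identifier


class Repository:
    """Toy in-memory registry keyed by object id."""

    def __init__(self):
        self._objects = {}

    def register(self, obj):
        self._objects[obj.object_id] = obj

    def locate(self, object_id, part):
        """Return the location of 'preservation', 'delivery' or 'metadata'."""
        obj = self._objects[object_id]   # KeyError if the id is unknown
        locations = {
            "preservation": obj.preservation_location,
            "delivery": obj.delivery_location,
            "metadata": obj.metadata_location,
        }
        return locations[part]


repo = Repository()
repo.register(DigitalObject(
    object_id="demo-0001",
    preservation_location="/archive/demo/demo-0001.tar",      # made-up path
    delivery_location="https://example.org/demo-0001.jpg",    # made-up URL
    metadata_location="record:demo-0001",                     # made-up id
))
```

Users and applications would ask the repository for a part by object id rather than relying on file names or directory layout.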
DL Repository Models
   OAIS: Open Archival Information
    System Reference Model
   Fedora: Flexible and Extensible Digital
    Object and Repository Architecture
       Developed at Cornell and UVa
       IU DLP in deployment group
DLP Storage Services
   Consulting
   Server space for production and access
   Persistent naming service (PURL server)
   Facilitation of access to UITS services
       Streaming media
       MDSS
   Developing repository service

   Contact:
