                            The San Diego Project: Persistent Objects

                                       Reagan W. Moore
                                San Diego Supercomputer Center
                               University of California, San Diego
                                 9500 Gilman Drive, MC-0505
                                    La Jolla, CA 92093-0505
                                        moore@sdsc.edu

Abstract:
The long-term preservation of digital entities requires management of technology evolution. In the
San Diego Project, digital entities are turned into archival objects through the application of
archival processes. Archival objects are organized into collections that provide the context for
discovery and subsequent reuse. The archival processes are managed using data grid technology to
ensure infrastructure independence, and to ensure that the archival processes can be applied in the
future. Traditional preservation approaches of emulation and migration are shown to be variants of
the same technique, differing in the level of software infrastructure and the point in time of the
archival object life cycle at which the preservation steps are applied. We discuss how data grid
technology can be used to build persistent archives.

I. Introduction

The long-term preservation of digital entities is a challenging problem that requires the
characterization of data. One can preserve the digital bits that comprise the digital entity quite
straightforwardly using data grid technology. Preserving the ability to manipulate the digital entity at
some point in the future is much more difficult. Either the relationships that are embedded within the
digital entity must be described, and a new application created that understands how to manipulate
the relationships, or the original application must be preserved. The former approach is called
migration and the latter approach is called emulation. Both approaches attempt to accomplish the
same task, the manipulation and presentation of a digital entity that has been turned into an archival
object.

Data grids provide infrastructure independence for the preservation of digital bits through
abstraction mechanisms. The abstractions are used to define the fundamental operations that are
needed on storage repositories to support access and manipulation of data files. The data grid maps
from the storage repository abstraction to the protocols required by a particular vendor product. By
adding a driver for each new storage protocol as it is created, it is possible for a data grid to
manage digital entities indefinitely into the future. Each time a storage repository becomes
obsolete, the digital entities can be migrated transparently to a new storage repository. The
migration is feasible as long as the data grid uses a logical name space to create global, persistent
identifiers for the digital entities. The logical name space is managed as a collection, independently
of the storage repositories. The data grid maps from the logical name space identifier to the file
name within the vendor storage system.
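
The mapping described above can be illustrated with a minimal sketch. It assumes that repositories can be modeled as in-memory dictionaries; the class name `LogicalNameSpace`, the function `migrate`, and all identifiers and paths are hypothetical and are not taken from any data grid implementation. The point it shows is that migration changes only the mapping, never the persistent identifier.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalNameSpace:
    """Maps persistent logical identifiers to (repository, physical name) pairs."""

    entries: dict = field(default_factory=dict)

    def register(self, logical_id, repository, physical_name):
        self.entries[logical_id] = (repository, physical_name)

    def resolve(self, logical_id):
        return self.entries[logical_id]


def migrate(namespace, stores, logical_id, new_repository, new_physical_name):
    """Copy the bits to a new repository, then update only the mapping."""
    old_repository, old_physical_name = namespace.resolve(logical_id)
    data = stores[old_repository][old_physical_name]
    stores[new_repository][new_physical_name] = data
    namespace.register(logical_id, new_repository, new_physical_name)


# Hypothetical repositories, modeled as dictionaries of file name -> bytes.
stores = {"old_tape": {"/vol1/a.dat": b"archived bits"}, "new_disk": {}}
namespace = LogicalNameSpace()
namespace.register("archive/report-001", "old_tape", "/vol1/a.dat")

# The obsolete repository is replaced; the logical identifier does not change.
migrate(namespace, stores, "archive/report-001", "new_disk", "/pool/0001")
repository, physical_name = namespace.resolve("archive/report-001")
print(repository, physical_name)
```

A user who holds only the logical identifier is unaffected by the move, which is the sense in which the migration is transparent.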

Data grids work by applying mappings to the logical name space and by using abstractions for the
operations on physical resources. We will look at how similar concepts can be used to characterize
and manage archival processes. Archivists apply archival processes to turn digital entities into
archival objects. At the San Diego Supercomputer Center, multiple research efforts are examining
the issues behind the creation of archival objects, and the fundamental concepts that are needed for
long-term preservation. In this paper, we will look at the continuum of approaches that encompass
the traditional archival mechanisms of emulation and migration, examine the levels of abstraction
that are needed to manage technology evolution, and consider self-instantiating archives [4], which
provide a characterization of the archival processes themselves. If we are able to describe how
archival processes can be turned into infrastructure independent abstractions, we should be able to
apply the archival processes in the future on the original digital entities as well as at the time of
accession. The abstraction of archival processes requires a careful assessment of exactly what a
digital entity represents, and what actually occurs when a digital entity is manipulated.

II. Persistent Archive Project

The preservation of digital entities is being explored at the San Diego Supercomputer Center
(SDSC) through multiple projects that examine the application of data grid technology. In
collaboration with the United States National Archives and Records Administration (NARA),
SDSC is developing a prototype persistent archive. The Storage Resource Broker (SRB) data grid
[11] is being used to implement capabilities needed by the traditional archival processes:

       Appraisal        - assignment of a logical name space for the registration of digital entities
       Accession        - characterization of digital entity structures and validation of digital entity
                          integrity
       Description      - extraction of descriptive metadata and assignment of provenance
                          metadata
       Arrangement      - logical organization of digital entities into collections and physical
                          aggregation into containers
       Preservation     - support for storage repository and information repository abstractions,
                          and support for replication across archives for disaster back-up
       Access           - dynamic instantiation of collections for web-based browsing and retrieval
                          of archived material

In collaboration with the University of Maryland, the prototype persistent archive is being
implemented as three federated collections, housed at NARA, U Md, and SDSC. Each collection is
implemented as a separate data grid, controlling and managing the digital entities housed within that
site. Digital entities will be cross-registered between the collections, and replicated onto resources
at other sites to ensure preservation of both descriptive metadata and the digital entities. In
collaboration with the University of California at Berkeley, technologies for the presentation of
digital entities from a variety of data formats are being investigated through the application of
multi-valent document abstractions for the display, annotation, and browsing of documents.

The project is a multi-year effort aimed at demonstrating the viability of the use of data grid
technology to automate all of the archival processes. While the initial sample collections are being
used predominantly to demonstrate automated processing of digital entities, the technology is
scalable to manage collections of millions of digital entities. The underlying data grid technology is
in production use at SDSC to manage over 40 Terabytes of data comprising over 6.5 million files.
The ultimate goal of the prototype persistent archive is to identify the key technologies that
facilitate the creation of a persistent archive of archival objects.

SDSC is aggressively applying the technology in additional projects. For the National Science
Foundation, SDSC is collaborating on the development of a persistent archive [8] for the National
Science Digital Library. Snapshots of digital entities that are registered into the NSDL repository
are harvested from the web and stored into an archive using the SRB data grid. As the digital
entities change over time, versions are tracked to ensure that an educator can find the desired
version of a curriculum module. The system is focused on the archival processes of preservation and
access. For the NSDL, educational material is relevant if it can be displayed and manipulated by
current technology.

Both of these projects rely upon the ability to create archival objects from digital entities through
the application of archival processes. We differentiate between the generation of archival objects
through the application of archival processes, the management of archival objects using data grid
technology, and the characterization of the archival processes themselves, so that archived material
can be re-processed (or re-purposed) in the future.

III. Archival Objects

When we apply archival processes, we characterize digital entities in terms of their data,
information, and knowledge content. Digital entity components include:

       • Bits of data, the zeros and ones that are written to digital media. The bits are stored as files
         in a storage repository.
       • Information, the semantic tags that are applied as labels to the bits. The semantic tags and
         the associated data are treated as metadata and stored in an information repository. Typical
         attributes include provenance and descriptive metadata.
       • Knowledge, the relationships that are applied to the bits to define data structures, time
         stamps, and even assign semantic labels.

Operations can be performed upon each component of a digital entity. We use Unix file operations
to read and write bit streams, we use database query mechanisms to do joins across attribute values,
and we can apply logic rules to test for the presence of a relationship within a digital entity. We
differentiate between the characterization of the relationships embedded within a digital entity, and
the operations that can be applied to the relationships to manipulate and display the digital entity.

An archival object is created by applying archival processes to a digital entity. The result is an
infrastructure independent representation of the associated information and knowledge content. The
operations that are performed by the archival processes identify and characterize the relationships
that are present within the digital entity. The embedded relationships can be named and organized
as ontologies. Each archival process creates a corresponding ontology that represents the semantic
terms and structures that are embedded in the digital entity or collection of digital entities, as shown
in Table 1.

                     Table 1. Naming ontologies for archival processes

                     Archival Processes       Archival Ontologies
                     Appraisal                Logical name space
                     Accession                Data model characterization
                     Description              Collection attributes
                     Arrangement              Collection organization
                     Preservation             Name space mappings
                     Access                   Semantic ontology

The archival processes can be characterized as the set of operations needed to create the associated
archival ontology. The application of an archival process results in the creation of both semantic
terms that are used to describe the digital entities, and relationships that can be assigned between
the semantic terms. The terms and relationships comprise an ontology that can be stored with the
digital entity. If we know what operations can be applied on the relationships present within the
ontology, we then have a characterization that can be migrated forward into the future.

The archival ontologies listed in Table 1 have simple interpretations that are based upon assignment
of semantic/logical relationships, structural/spatial relationships, temporal/procedural relationships,
or functional relationships. The relationships themselves can be named (reified). Each level of the
archival ontologies characterizes an additional set of relationships:

       • Appraisal ontology – the assignment of unique identifiers to each digital entity that can be
         used as global, persistent names, and the mapping of the persistent names to physical file
         names. The mapping is implemented as a logical name space. The fundamental
         relationship is the functional criterion used to associate the unique identifier with the digital
         entity. In the appraisal process, an assessment is made of whether to build archival objects
         out of the named digital entities. The appraisal process may require application of the
         accession, description, and arrangement processes before the assessment can be completed.
       • Accession ontology – the characterization of the data structure within the digital entity. The
         fundamental relationships are the structural criteria used to organize bits into bytes, bytes
         into ASCII text or binary floating-point words, words into arrays, etc. The parts of the data
         structure can be named and counted. In the accession process, the data structures are
         checked to ensure the integrity of each digital entity. The structural relationships are
         typically characterized by assigning a mime type to the digital entity.
       • Description ontology – the assignment of semantic labels to structures within the digital
         entity and to attributes associated with the digital entities to create descriptive and
         provenance metadata. The fundamental relationships are the functional criteria for
         assigning the semantic labels.
       • Arrangement ontology – the characterization of the logical hierarchy used to organize
         multiple digital entities into a collection. The fundamental relationships are the functional
         criteria used to sort digital entities into similar categories, and to choose the semantic labels
         that are applied to the categories. The arrangement process may require repeated
         application of the description process to characterize differences between digital entities.
       • Preservation ontology – the creation of mappings on the logical name space to support
         replication, aggregation of digital entities into physical containers (files), access control
         lists, etc. The fundamental relationships are the logical links between semantic terms used
         to characterize each of the mappings. The semantic terms are typically stored as metadata
         in the logical name space. Preservation also requires the characterization of the collection
         that comprises the logical name space. The collection characterization is used to
         dynamically instantiate the collection on new database technology. The fundamental
         relationships are structural organizations of attributes into tables, and identification of
         foreign keys between tables that are needed to support queries.
       • Access ontology – the creation of logical mappings between semantic terms used to support
         discovery. The meaning of a given semantic term is defined by an associated context that
         varies with the level of sophistication of the user, the user’s social community, the source of
         the archived material, etc. The fundamental relationships are logical associations of
         semantic terms with their explanatory contexts.

There is a hierarchy of ontologies for the relationships within a digital entity, within collections of
digital entities, and between collections. By assembling digital entities into collections, it is
possible to express relationships that have no meaning within a single digital entity. Examples are
the relationships that are present between different descriptive metadata attribute values. The
attribute values may themselves be English words that have a logical semantic relationship that is
independent of the logical relationships between the descriptive metadata names.

We have characterized archival objects as digital entities and their associated archival ontologies.
We also need to characterize the software infrastructure that is used to hold and preserve the
archival objects. We note that there is a tight relationship between the ability to apply archival
processes and the ability to preserve archival objects. The preservation of archival objects can be
accomplished by applying the archival processes at some future point in time, instead of at the time
of accession. This capability is called a self-instantiating archive. The archive migrates forward
into the future the processes that are needed to create archival objects. We can ask what software
infrastructure layers are needed to migrate both the archival objects, and the archival processes.

IV. Persistent Archives

The mechanisms provided by a persistent archive need to manage technology evolution. Since
every component of a persistent archive will evolve, a persistent archive must provide true
infrastructure independence for all components:

       • Storage system evolution. This includes the ability to migrate to new media and the ability
         to migrate to new storage repositories.
       • Information repository evolution. This includes the ability to use new information catalogs
         to manage the descriptive metadata, and the ability to add new attributes over time.
       • Presentation environment evolution. This includes the ability to apply new display tools to
         the relationships that are present within the digital entity.

Current archival software environments are constructed out of storage systems, display systems,
operating systems, and display applications. The storage systems provide the ability to manipulate
bit streams. The display devices map data onto a display system. The operating systems manage
system calls that are used to control the storage systems and display devices. The application
interprets the bit streams, infers the relationships present within the digital entity, applies operations
on the relationships, and manages the display device. In current approaches, the knowledge needed
to manipulate the digital entity is entirely encapsulated within the display application. In the terms
of the prior section, current display applications implement the archival ontologies for a given type
of archival object. The software infrastructure components are illustrated in Figure 1.

                                     Old Application
                                     Wrap Application
                                  Wrap Operating System
                                   New Operating System
                   Wrap Storage System              Wrap Display System
                   Old Storage System               Old Display System
                                 Migrate Encoding Format
                                      Digital Entity

                Figure 1. Infrastructure component interaction points for preservation

It is possible to build a preservation environment by wrapping any component of the infrastructure.
Emulation focuses on the display application, and tries to preserve the original execution
environment that the display application was designed to use. Emulation accomplishes this by
wrapping the display application to transform the original system calls issued by the display
application to the system calls that a new operating system requires. As new operating systems are
created, the display application must be wrapped again to issue the new system calls. Emulation
corresponds to the migration of the display application wrapping technology forward in time.
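
The wrapping step can be sketched as follows, under the assumption that the old display application issues a single obsolete call, here named `putline`, and that the new operating system offers only a different call, here named `draw_text`. Both call names and all classes are hypothetical; the sketch only illustrates that the application code stays unchanged while the wrapper is rewritten for each new platform.

```python
class NewOperatingSystem:
    """Hypothetical current platform that exposes only the new call."""

    def __init__(self):
        self.screen = []

    def draw_text(self, row, text):
        self.screen.append((row, text))


class LegacyCallWrapper:
    """Accepts the obsolete call and re-expresses it as the new call."""

    def __init__(self, operating_system):
        self._os = operating_system
        self._row = 0

    def putline(self, text):
        # Obsolete call assumed by the old application.
        self._os.draw_text(self._row, text)
        self._row += 1


def old_display_application(system, digital_entity):
    """Unchanged legacy code: it knows only the obsolete `putline` call."""
    for line in digital_entity.decode("ascii").splitlines():
        system.putline(line)


new_os = NewOperatingSystem()
old_display_application(LegacyCallWrapper(new_os), b"Title\nBody text")
print(new_os.screen)
```

When the operating system changes again, only `LegacyCallWrapper` has to be replaced, which is the recurring cost of emulation noted above.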

It is also possible to wrap the operating system, such that it is able to respond to prior versions of
system calls. From this perspective, the application remains the same, but future operating systems
are wrapped to ensure they can respond to obsolete system calls. Since the number of types of
operating systems is smaller than the number of applications, this approach requires fewer wrapping
systems. The challenge is that the modification of operating systems is much more complicated
than the modification of applications, requiring consistency mechanisms not typically used in
applications.

If the relationships that are implicit within digital entities are characterized as ontologies, and the
operations that can be performed on a given type of relationship are defined, one can then build a
generic emulation environment. In this case, the generic application operates on the defined
relationships. What is kept invariant is the mapping from a defined set of operations to a defined set
of relationships present within the digital entity. Emulation then consists of building tools that map
the defined operations to current operating system technology. The emulation tools can then be
migrated forward into the future, and implemented in current technology that can be processed by
current operating systems. Instead of migrating the original software code that was used to
manipulate the digital entity, a characterization of the operations that would have been performed
by the software code is migrated into the future.

The difficulty is that current applications tightly integrate the interpretation of the relationships that
are present within a digital entity, with the operations that they support on the relationships. To
build a robust emulation environment that is infrastructure independent, the characterization of the
relationships within an archival object should be expressed explicitly as an archival ontology. The
display application would use the archival ontology to define the structures present within the
archival object. Given the structures present within the digital entity, one would then apply the
operations that are needed to manipulate, query, and display the archival object. In one sense, this
is a generalization of the concept of mime types. The ontology defines how a mime type can be
interpreted and manipulated. The display application then maps from the structures defined by the
ontology to the standard set of operations that can be performed on the structures. New display
applications can be used as long as they can apply the standard operations on the ontology
structures.
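
One way to picture the separation argued for above is a table of standard operations keyed by relationship type, which any display application can consult once the ontology has named the relationship. The sketch below is illustrative only: the relationship types `sequence` and `key_value`, the operation names, and the dictionary layout of the ontology are assumptions, not part of any standard.

```python
# Standard operations, keyed by the type of relationship they act on.
STANDARD_OPERATIONS = {
    "sequence": {
        "count": lambda items: len(items),
        "first": lambda items: items[0],
    },
    "key_value": {
        "keys": lambda mapping: sorted(mapping),
        "lookup": lambda mapping, key: mapping[key],
    },
}


def operations_for(ontology):
    """List the operations a display application may offer for this ontology."""
    return sorted(STANDARD_OPERATIONS[ontology["relationship"]])


def apply_operation(ontology, structure, operation, *arguments):
    """Apply a standard operation to a structure described by the ontology."""
    handler = STANDARD_OPERATIONS[ontology["relationship"]][operation]
    return handler(structure, *arguments)


# Hypothetical ontologies that name the relationship present in each structure.
array_ontology = {"relationship": "sequence"}
record_ontology = {"relationship": "key_value"}

print(operations_for(array_ontology))
print(apply_operation(array_ontology, [3, 1, 2], "count"))
print(apply_operation(record_ontology, {"creator": "NARA"}, "lookup", "creator"))
```

A new display application is acceptable under this scheme as long as it implements the same operation names for each relationship type.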

The remaining layers of the software infrastructure can also be wrapped. Data grids wrap storage
repositories by mapping from a standard set of data manipulation operations to the operations
implemented by a particular vendor product. For each new storage repository protocol that is
created, a new storage repository wrapper is created. The operating system interacts with the
storage repository abstraction, and is shielded from the idiosyncrasies of a particular vendor
product.
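
A storage repository abstraction of this kind can be sketched as an abstract driver interface with one concrete driver per repository type. The two in-memory drivers below are stand-ins chosen so that the example is self-contained; the class and method names are hypothetical and do not reproduce the interface of any existing data grid.

```python
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Standard operations that every repository wrapper must support."""

    @abstractmethod
    def read(self, name: str) -> bytes: ...

    @abstractmethod
    def write(self, name: str, data: bytes) -> None: ...


class InMemoryFileSystemDriver(StorageDriver):
    """Stands in for a file system: names map directly to contents."""

    def __init__(self):
        self._files = {}

    def read(self, name):
        return self._files[name]

    def write(self, name, data):
        self._files[name] = bytes(data)


class InMemoryBlobDriver(StorageDriver):
    """Stands in for a database holding binary large objects in rows."""

    def __init__(self):
        self._rows = []
        self._index = {}

    def read(self, name):
        return self._rows[self._index[name]]

    def write(self, name, data):
        if name in self._index:
            self._rows[self._index[name]] = bytes(data)
        else:
            self._index[name] = len(self._rows)
            self._rows.append(bytes(data))


def copy_entity(source: StorageDriver, target: StorageDriver, name: str) -> None:
    """Written once against the abstraction; works for any pair of drivers."""
    target.write(name, source.read(name))


file_system = InMemoryFileSystemDriver()
database = InMemoryBlobDriver()
file_system.write("memo-17", b"archived bits")
copy_entity(file_system, database, "memo-17")
print(database.read("memo-17"))
```

Supporting a new vendor product then amounts to adding one more subclass; code such as `copy_entity` is not modified.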

The same storage repository abstraction can be accessed by the display application. In this case, the
display application becomes a service that:

       • reads an ontology defined for an archival object,
       • applies the structural relationships listed by the ontology,
       • issues I/O calls at the bit level to the storage repository abstraction to import data into the
         digital entity structures,
       • applies the semantic relationships listed by the ontology to name the structures,
       • applies temporal relationships to define time stamps, and
       • displays operations to the user that can be applied to structural, semantic, and temporal
         relationships.
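
The service steps listed above can be sketched for a single fixed-layout binary record. The ontology is assumed here to be a small dictionary with `structural`, `semantic`, and `temporal` sections, and the storage repository abstraction is reduced to a dictionary lookup; these layouts, the field names, and the sample values are all hypothetical.

```python
import struct
from datetime import datetime, timezone

# Hypothetical ontology for one archival object; every name is illustrative.
ontology = {
    "structural": {
        "byte_order": "<",
        "fields": [("I", "f0"), ("I", "f1"), ("d", "f2")],
    },
    "semantic": {"f0": "station", "f1": "observed", "f2": "temperature_kelvin"},
    "temporal": ["observed"],
}

# Storage repository abstraction, reduced to physical name -> bytes.
repository = {"/vol1/obs.bin": struct.pack("<IId", 7, 0, 273.15)}


def read_bits(physical_name):
    """Bit-level I/O call against the storage repository abstraction."""
    return repository[physical_name]


def instantiate(ontology, physical_name):
    """Apply structural, semantic, and temporal relationships in turn."""
    structural = ontology["structural"]
    layout = structural["byte_order"] + "".join(
        code for code, _ in structural["fields"]
    )
    values = struct.unpack(layout, read_bits(physical_name))
    record = {}
    for (_, field), value in zip(structural["fields"], values):
        label = ontology["semantic"].get(field, field)
        if label in ontology["temporal"]:
            # Interpret the stored integer as seconds since the Unix epoch.
            value = datetime.fromtimestamp(value, tz=timezone.utc)
        record[label] = value
    return record


def available_operations(ontology):
    """Report which kinds of relationship the user can operate on."""
    kinds = ("structural", "semantic", "temporal")
    return [f"operate on {kind} relationships" for kind in kinds if ontology.get(kind)]


record = instantiate(ontology, "/vol1/obs.bin")
print(record)
print(available_operations(ontology))
```

Nothing in `instantiate` is specific to this record type; a different ontology yields a different structure from the same code.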

The interpretation of the archival object ontology actually requires the same processing steps that
are used for the digital entity. Standards are used to simplify the process, such as XML [12] for
annotating information content, and the Resource Description Framework [10] for annotating
relationships. The Semantic Web provides mechanisms to associate meaning with the semantic
labels annotated through XML. The expectation is that the standards for manipulating the archival
object ontology will change slowly in time, and that mechanisms will be developed to migrate
archival object ontologies to new information and knowledge annotation standards. A persistent
archive needs to manage evolution of encoding standards as well as evolution of software
infrastructure.

The archival process of migration explicitly manages the evolution of encoding standards by
applying new encoding standards to the digital entities as they become available. The original
migration concept applied transformative migrations to each of the original digital entities [7]. This
made it possible to manipulate the digital entity with current, more sophisticated technology. New
technology provides a much richer manipulation environment by supporting operations on more of
the relationships that are present within a digital entity. Migration is an attempt to wrap digital
entities by explicitly changing their encoding format so the digital entities can be manipulated by
new technology.

Transformative migrations can also be accomplished by migrating the archival ontology to the new
encoding standards. If archival ontologies are defined that characterize the information and
knowledge content of archival objects, the transformative migration of the archival ontology is
equivalent to the transformative migration of the digital entity. The original digital entity can be
kept unchanged, and the characterization of the information and knowledge content within the
digital entity can be migrated onto new standards.
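
As a minimal sketch of this alternative, the example below re-encodes an ontology from a made-up legacy text encoding (one `term=relationship` pair per line) into XML, and uses a checksum to confirm that the digital entity itself is never rewritten. The legacy encoding, the XML element names, and the sample content are assumptions introduced for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET


def parse_legacy_ontology(text):
    """Read the hypothetical old encoding: one `term=relationship` per line."""
    return dict(line.split("=", 1) for line in text.splitlines() if line.strip())


def encode_as_xml(ontology):
    """Write the same terms and relationships in the newer encoding."""
    root = ET.Element("ontology")
    for term, relationship in ontology.items():
        ET.SubElement(root, "term", name=term).text = relationship
    return ET.tostring(root, encoding="unicode")


digital_entity = b"\x00\x01 original bits, never rewritten"
checksum_before = hashlib.sha256(digital_entity).hexdigest()

legacy_ontology = "title=ASCII text, bytes 0-63\nbody=ASCII text, bytes 64-end"
migrated_ontology = encode_as_xml(parse_legacy_ontology(legacy_ontology))

checksum_after = hashlib.sha256(digital_entity).hexdigest()
print(migrated_ontology)
print(checksum_before == checksum_after)
```

Only the ontology passes through the transformation, so any loss introduced by a migration step is confined to the characterization and can be repeated or corrected against the untouched original.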

Emulation based on the interpretation of archival ontologies then becomes equivalent to the
migration of archival ontologies. Both approaches can isolate the management of standards
evolution to transformative migrations on the archival ontology that characterizes the relationships
and semantics within the digital entity. The use of archival processes to create archival ontologies
unifies the approaches of emulation and migration. Archival ontologies can be viewed as a
generalization of the OAIS archival information packages [9] to represent the knowledge content
within digital entities.

V. Data Grids as Preservation Mechanisms

Persistent archives work by creating abstractions that represent the operations that can be performed
upon digital entities, and the operations that can be sustained by archival software infrastructure.
The role of archival processes is to facilitate the mapping of manipulations of digital entities to
manipulations on storage repositories through the creation of archival ontologies. The archival
ontologies are used as the abstractions for the management of the evolution of encoding standards.

Data grids provide the abstraction mechanisms needed to manage the evolution of software
infrastructure [6]. The abstraction mechanisms implement “grid transparencies” that facilitate the
discovery, access, and retrieval of digital entities distributed across multiple storage repositories.
The transparencies include:

       • Name transparency – find a digital entity without knowing its name. This is typically
         accomplished by queries on attributes that are associated with the digital entity. The
         attributes are mapped to a global persistent identifier that is maintained for each digital
         entity and managed in a logical name space.
       • Location transparency – access a digital entity without knowing where it is stored. A
         mapping is applied to the logical name space to associate a physical file at a specific storage
         repository with the logical name. Additional mappings can be imposed on the logical name
         to support replicas, multiple copies of the digital entities, each stored at a different location.
       • Access transparency – retrieve a digital entity without knowing the access protocol required
         by the remote storage repository. A storage repository abstraction is used to convert from a
         standard set of file manipulation operations to the operations supported by the remote
         storage system. An access abstraction is also provided that characterizes a standard set of
         operations that are performed by applications. The data grid maps from the application
         operation standard to the storage repository standard. This makes it possible for all types of
         applications to manipulate data stored on any type of storage system.
       • Authorization transparency – apply access control mechanisms independently of the storage
         repository. Operating systems typically associate access control mechanisms with the
         storage repository through the creation of user identifiers, and specification of access
         restrictions relative to the user identifiers. Each file is owned by an individual who assigns
         access permissions. Authorization transparency can be achieved by providing an
         abstraction for the ownership of the archived data. Data grids use collection identifiers as
         the owners of the files. The data grid supports access control lists for each digital entity.
         Note that the access control lists are now associated with the digital entity, not the storage
         repository. Access to the data is achieved by authenticating the user to the data grid,
         checking the access controls on the requested digital entity, and having the data grid retrieve
         the files that are stored under the collection identifier managed by the data grid.
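
Three of the transparencies listed above (name, location, and authorization) can be sketched together in a toy data grid. The class `DataGrid`, its method names, and the repository names are hypothetical; access transparency is not modeled, since every repository here is a plain dictionary, and authentication is reduced to passing a user name.

```python
class DataGrid:
    """Toy grid: all state is keyed by the logical identifier."""

    def __init__(self):
        self.repositories = {}  # repository name -> {physical name: bytes}
        self.attributes = {}    # logical id -> descriptive attributes
        self.replicas = {}      # logical id -> [(repository, physical name)]
        self.readers = {}       # logical id -> users allowed to read

    def register(self, logical_id, attributes, readers):
        self.attributes[logical_id] = dict(attributes)
        self.readers[logical_id] = set(readers)
        self.replicas[logical_id] = []

    def store_replica(self, logical_id, repository, physical_name, data):
        self.repositories.setdefault(repository, {})[physical_name] = data
        self.replicas[logical_id].append((repository, physical_name))

    def find(self, **criteria):
        """Name transparency: discover entities by attribute values."""
        return sorted(
            logical_id
            for logical_id, attributes in self.attributes.items()
            if all(attributes.get(k) == v for k, v in criteria.items())
        )

    def retrieve(self, user, logical_id):
        """Authorization is checked per entity; any surviving replica is used."""
        if user not in self.readers[logical_id]:
            raise PermissionError(f"{user} may not read {logical_id}")
        for repository, physical_name in self.replicas[logical_id]:
            files = self.repositories.get(repository, {})
            if physical_name in files:
                return files[physical_name]
        raise FileNotFoundError(logical_id)


grid = DataGrid()
grid.register("nara/memo-17", {"creator": "NARA", "year": 1998}, readers={"archivist"})
grid.store_replica("nara/memo-17", "sdsc_archive", "/hpss/x/17", b"memo")
grid.store_replica("nara/memo-17", "umd_disk", "/data/17.bin", b"memo")

found = grid.find(creator="NARA")
del grid.repositories["sdsc_archive"]  # simulate loss of one storage system
content = grid.retrieve("archivist", found[0])
print(found, content)
```

The caller never supplies a repository name or a physical path, and the access control list travels with the logical identifier rather than with either repository.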

These transparencies are used in persistent archives to implement authenticity mechanisms to track
all manipulations that are done on the original digital entities. A mapping can be imposed on the
logical name space to implement audit trails, recording all accesses to the archival objects by person
and date. A mapping can be imposed on the logical name space to maintain checksums or digital
signatures to prove that the bits have not been modified. Mappings can be imposed on the logical
name space to drive transformative migrations of archival ontologies as encoding standards change.
Mappings can also be imposed on the logical name space to support replication of digital entities for
disaster recovery in case of catastrophic failure of a storage system [5].
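
Two of these mappings, checksums and audit trails, can be sketched as additional tables keyed by the logical identifier. The function names and record layout below are hypothetical, and SHA-256 is chosen only as a readily available digest; the text does not prescribe a particular checksum algorithm.

```python
import hashlib
from datetime import datetime, timezone

checksums = {}    # logical id -> digest recorded at registration
audit_trail = []  # one entry per access: entity, person, date


def register(logical_id, data):
    """Record the digest of the bits when the archival object is created."""
    checksums[logical_id] = hashlib.sha256(data).hexdigest()


def verified_read(logical_id, data, person):
    """Log the access, then confirm that the bits are unmodified."""
    audit_trail.append(
        {"entity": logical_id, "person": person, "date": datetime.now(timezone.utc)}
    )
    if hashlib.sha256(data).hexdigest() != checksums[logical_id]:
        raise ValueError(f"checksum mismatch for {logical_id}")
    return data


register("nara/memo-17", b"original bits")
verified_read("nara/memo-17", b"original bits", person="archivist")

try:
    verified_read("nara/memo-17", b"altered bits", person="archivist")
except ValueError as error:
    print(error)

print(len(audit_trail))
```

Both accesses are logged, including the one that fails verification, so the audit trail also records attempts to read a damaged copy.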

Authenticity requirements can be implemented as mappings on the logical name space used to
organize archival objects.

Data grids are in production use across a wide variety of scientific disciplines and user
communities. The evolution of data grids has been driven by explicit project requirements for data
sharing, data publication, and data preservation. The communities that have been implementing the
associated software infrastructure include the digital library community, the data grid community,
and the persistent archive community. Each community is providing essential technology for the
long-term preservation of data:

       • Data sharing (data grid community) – providing the ability to access data across distributed
         resources through the use of logical name spaces and storage repository abstractions
       • Data publication (digital library community) – providing the standards for the
         characterization of provenance and descriptive metadata, and the manipulation mechanisms
         for annotating and displaying digital entities
       • Data preservation (persistent archive community) – providing the archival processes to
         convert digital entities into archival objects characterized by archival ontologies.

The San Diego Supercomputer Center Storage Resource Broker (SRB) [1, 2, 11] has been used as a
test platform for the development of data grid transparencies, digital library collection management
systems, and persistent archive authenticity mechanisms. The SRB implements a logical name
space that can be organized as a collection hierarchy, with different metadata attributes at each level
in the hierarchy. Mappings are imposed on the logical name space to provide naming, location,
access, and authorization transparencies.
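
A logical name space organized as a collection hierarchy can be sketched as follows. The sketch is illustrative only and does not reproduce the SRB catalog schema: the Collection class, the resolve function, the attribute names, and the example storage URL are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical node in a collection hierarchy with its own metadata attributes."""
    name: str
    metadata: dict = field(default_factory=dict)
    children: dict = field(default_factory=dict)  # sub-collections by name
    objects: dict = field(default_factory=dict)   # logical name -> physical location

    def subcollection(self, name, **metadata):
        # Create the sub-collection if needed and attach level-specific attributes.
        child = self.children.setdefault(name, Collection(name))
        child.metadata.update(metadata)
        return child


def resolve(root, logical_path):
    """Map a logical path to its physical location and the metadata at each level."""
    *levels, leaf = logical_path.strip("/").split("/")
    node = root
    seen = [(node.name, dict(node.metadata))]
    for level in levels:
        node = node.children[level]
        seen.append((node.name, dict(node.metadata)))
    return node.objects[leaf], seen
```

The caller refers only to the logical path; the physical location is looked up at access time, which is the sense in which the mapping provides location transparency.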

The SRB provides support for infrastructure independence through the use of storage repository
abstractions and information repository abstractions.

The SRB storage repository abstraction is based upon standard Unix file system operations, and
supports drivers for accessing digital entities stored in Unix file systems (Solaris, SunOS, AIX, Irix,
Unicos, Mac OS X, Linux), in Windows file systems (98, 2000, NT, XP, ME), in archival storage
systems (HPSS, UniTree, DMF, ADSM), as binary large objects in databases (Oracle, DB2, Sybase,
SQL Server, PostgreSQL), in object ring buffers, in storage resource managers, in FTP sites, in
GridFTP sites, on tape drives managed by tape robots, etc. The SRB has been designed to facilitate
the addition of new drivers for new types of storage systems.
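
The driver-based design can be sketched as a small abstract interface with one implementation per storage system. This is an assumption-laden illustration, not the SRB driver API: the StorageDriver interface, its three operations, and both driver classes are hypothetical, and a dictionary stands in for a database table of binary large objects.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Hypothetical driver interface modeled on Unix-style file operations."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def unlink(self, path: str) -> None: ...


class UnixFileDriver(StorageDriver):
    """Stores each digital entity as a file beneath a root directory."""

    def __init__(self, root):
        self.root = root

    def _full(self, path):
        return os.path.join(self.root, path.lstrip("/"))

    def write(self, path, data):
        full = self._full(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as handle:
            handle.write(data)

    def read(self, path):
        with open(self._full(path), "rb") as handle:
            return handle.read()

    def unlink(self, path):
        os.remove(self._full(path))


class BlobTableDriver(StorageDriver):
    """Stand-in for a database BLOB store; a dict plays the role of the table."""

    def __init__(self):
        self._rows = {}

    def write(self, path, data):
        self._rows[path] = bytes(data)

    def read(self, path):
        return self._rows[path]

    def unlink(self, path):
        del self._rows[path]


def copy_between(source, target, path):
    # Works for any pair of drivers because only the shared interface is used.
    target.write(path, source.read(path))
```

Supporting a new type of storage system then amounts to writing one more subclass; code such as copy_between is unaffected.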

The SRB information repository abstraction supports the manipulation of collections stored in
databases. The manipulations include the ability to add user-defined metadata, import and export
metadata as XML files, support bulk registration of digital entities, apply template-based parsing to
extract metadata attribute values, and support queries across arbitrary metadata attributes. The SRB
automatically generates the SQL that is required to respond to a query, allowing the user to specify
queries by operations on attribute values.
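
Generating SQL from conditions on attribute values can be sketched as below. The table layout (objects and metadata), the function names build_query and find, and the operator whitelist are assumptions made for illustration; the actual SRB catalog schema and query generator are not described in this paper. SQLite is used only because it ships with Python.

```python
import sqlite3

ALLOWED_OPS = {"=", "<", ">", "<=", ">=", "like"}


def build_query(conditions):
    """Turn (attribute, operator, value) triples into parameterized SQL."""
    if not conditions:
        raise ValueError("at least one condition is required")
    sql = "SELECT o.logical_name FROM objects o"
    where, params = [], []
    for index, (attribute, op, value) in enumerate(conditions):
        if op.lower() not in ALLOWED_OPS:
            raise ValueError("unsupported operator: " + op)
        alias = f"m{index}"
        # One join per condition, so several attributes can constrain one object.
        sql += f" JOIN metadata {alias} ON {alias}.object_id = o.id"
        where.append(f"({alias}.attr = ? AND {alias}.value {op} ?)")
        params += [attribute, value]
    sql += " WHERE " + " AND ".join(where) + " ORDER BY o.logical_name"
    return sql, params


def find(conn, conditions):
    sql, params = build_query(conditions)
    return [row[0] for row in conn.execute(sql, params)]


def demo_catalog():
    # Hypothetical attribute-value layout; the value column is left untyped so
    # that numbers and strings are stored as given.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE objects (id INTEGER PRIMARY KEY, logical_name TEXT);"
        "CREATE TABLE metadata (object_id INTEGER, attr TEXT, value);"
    )
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?)",
        [(1, "/sky/a.fits"), (2, "/sky/b.fits")],
    )
    conn.executemany(
        "INSERT INTO metadata VALUES (?, ?, ?)",
        [(1, "format", "FITS"), (1, "year", 1999),
         (2, "format", "FITS"), (2, "year", 1995)],
    )
    return conn
```

A user would call find(conn, [("format", "=", "FITS"), ("year", ">=", 1998)]) and never see the generated joins.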

The challenge for the persistent archive community is the demonstration that data grid technology
provides the correct set of abstractions for the management of software infrastructure. The
Persistent Archive Research Group [3] of the Global Grid Forum is exploring this issue, and is
attempting to define the minimal set of capabilities that need to be provided by data grids to
implement persistent archives. A second challenge is the development of archival ontologies that
characterize the relationships present within digital entities. Is it sufficient to characterize a digital
entity by a mime type, or is there a generic ontology that can be used across mime types? An early
example of this is the METS standard for multi-media objects, which attempts to define the
structural relationships within a document. A third challenge is the specification of a standard set of
operations that can be applied to the relationships within an archival object. For a given type of
relationship, there are associated permissible operations. Thus time stamps can be compared for
causal relationships, structural arrays can be displayed against implicit coordinate systems, and
semantic labels can be queried. Can the permissible operations be quantified for each type of
relationship?
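
One way to picture a standard set of permissible operations is a registry keyed by relationship type. The sketch below is purely illustrative: the type names, the operation names, and the apply_operation function are hypothetical, and no such standard is defined in this paper.

```python
from datetime import date

# Hypothetical registry: each relationship type lists the operations allowed on it.
PERMISSIBLE = {
    "temporal": {"precedes": lambda earlier, later: earlier < later},
    "structural": {"cell_at": lambda grid, row, col: grid[row][col]},
    "semantic": {"has_label": lambda labels, term: term in labels},
}


def apply_operation(relationship_type, operation, *args):
    """Apply an operation only if it is registered for the relationship type."""
    try:
        func = PERMISSIBLE[relationship_type][operation]
    except KeyError:
        raise ValueError(
            f"{operation!r} is not permitted on {relationship_type!r} relationships"
        ) from None
    return func(*args)
```

Under this picture, comparing two time stamps is accepted for a temporal relationship, while requesting the same comparison on a semantic label is rejected before any code runs.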



VI. Summary

Persistent archives manage archival objects by providing infrastructure-independent abstractions for
interacting with both archival objects and software infrastructure. Archival objects are digital
entities that have been characterized through archival ontologies. The archival ontologies define the
structural, semantic, and temporal relationships that are present within the digital entity. The
archival ontology is an abstraction of the information and knowledge content of a digital entity.
The approaches of emulation and migration for managing the evolution of encoding standards can
be unified through the creation of archival ontologies. The new encoding standards are applied to
the archival ontology, making it possible to keep the digital entity unchanged. The display
application uses the archival ontology to define how to manipulate the original digital entity,
independently of the original display application. Migration applies transformative migrations to
the archival ontology. Emulation builds a generic application that applies operations to the
relationships specified within the archival ontology. Data grids provide the abstraction mechanisms
for managing evolution of storage and information repositories. Persistent archives use the
abstractions for digital entities and software infrastructure to preserve the ability to manage, access
and display archival objects while the underlying technologies evolve.


VII. Acknowledgements

The concepts presented here were developed by members of the Data and Knowledge Systems
group at the San Diego Supercomputer Center. Michael Wan developed the data management
systems, Arcot Rajasekar developed the information management systems, Bertram Ludäscher and
Amarnath Gupta developed the logic-based integration and knowledge management systems, and
Ilya Zaslavsky developed the spatial information integration and knowledge management systems.
Chaitan Baru developed information mediation systems. Richard Marciano created digital library
and persistent archive prototypes. This research was supported by the NSF NPACI ACI-9619020
(NARA supplement), the NSF NSDL/UCAR Subaward S02-36645, the DOE SciDAC/SDM DE-
FC02-01ER25486 and DOE Particle Physics Data Grid, the NSF National Virtual Observatory, the
NSF Grid Physics Network, and the NASA Information Power Grid. The views and conclusions
contained in this document are those of the authors and should not be interpreted as representing the
official policies, either expressed or implied, of the National Science Foundation, the National
Archives and Records Administration, or the U.S. government.

VIII. References

[1] Baru, C., R. Moore, A. Rajasekar, M. Wan, “The SDSC Storage Resource Broker,” Proc.
    CASCON'98 Conference, Nov.30-Dec.3, 1998, Toronto, Canada.

[2] Baru, C., R. Moore, A. Rajasekar, W. Schroeder, M. Wan, R. Klobuchar, D. Wade, R. Sharpe,
    J. Terstriep, (1998a) “A Data Handling Architecture for a Prototype Federal Application,”
    Sixth Goddard Conference on Mass Storage Systems and Technologies, March, 1998.

[3] Grid Forum Remote Data Access Working Group.
    http://www.sdsc.edu/GridForum/RemoteData/.

[4] Ludäscher, B., R. Marciano, R. Moore, “Towards Self-Validating Knowledge-Based Archives,”
    pp. 9-16, RIDE-DM 2001.

[5] MCAT - “The Metadata Catalog”, http://www.npaci.edu/DICE/SRB/mcat.html


[6] Moore, R., C. Baru, A. Rajasekar, R. Marciano, M. Wan, “Data Intensive Computing,” in “The
    Grid: Blueprint for a New Computing Infrastructure,” eds. I. Foster and C. Kesselman, Morgan
    Kaufmann, San Francisco, 1999.

[7] Moore, R., C. Baru, A. Rajasekar, B. Ludäscher, R. Marciano, M. Wan, W. Schroeder, and A.
    Gupta, “Collection-Based Persistent Digital Archives – Parts 1 & 2,” D-Lib Magazine,
    March/April 2000, http://www.dlib.org/

[8] Moore, R. (2000a), “Knowledge-based Persistent Archives,” Proceedings of La Conservazione
    Dei Documenti Informatici Aspetti Organizzativi E Tecnici, in Rome, Italy, October, 2000.

[9] Reference Model for an Open Archival Information System (OAIS). submitted as ISO draft,
    http://www.ccsds.org/documents/pdf/CCSDS-650.0-R-1.pdf, 1999.

[10] Resource Description Framework (RDF). W3C Recommendation, www.w3.org/TR/REC-rdf-syntax,
     Feb. 1999.

[11] SRB - “The Storage Resource Broker Web Page,” http://www.npaci.edu/DICE/SRB/

[12] XML - Extensible Markup Language. http://www.w3.org/XML/, 1998.



