

									                       714: Metadata

   Conceptual Models of Metadata

Margaret Kipp
●   Part 1: Entity Relationship Model
    −   1. entities, attributes and relationships
    −   2. E-R models
●   Part 2: Conceptual Models
    −   1. Local Context
    −   2. Conceptual Models
    −   3. FRBR
    −   4. Levels of Granularity
    −   5. Metadata Sources
●   Part 3: Administrative and Structural Metadata Models
    −   1. EAD
    −   2. Preservation
    −   3. Rights Models
Part 1: Entity Relationship Modelling
●   A data model is a way of organising the data
    based on the relationships between the data
●   Metadata data modelling involves identifying
    important entities, attributes and relationships
    of the items in the collection
●   Also need to consider work/item and
    work/image dichotomies. What is being
    described?
Example Data Model
     Benefits of Entity-Relationship Modelling
●   Communicate concepts in collective
    organisational memory
●   Collecting and documenting an organisation's
    information requirements
●   Provide a pictorial map of the system
●   Can be developed and refined
●   Clearly defines the scope of the project
●   Separate information from the activities and
    work practices
Example Conceptual Model
         What is Entity-Relationship Modelling?
●   Entity-Relationship (E-R) modelling is a tool for
    planning the design of a structured data system
●   E-R analysis allows us to:
    −   Identify the major ingredients of complex situations:
        Entities or Elements
    −   Identify the important aspects of these entities:
        Attributes or Subelements
    −   Discover and assess important aspects of
        interconnections between entities: Relationships
    −   Discover important rules concerning all three
        aspects: Constraints
                    E-R Modelling
●   E-R models consist of:
    −   Entities
    −   Attributes
    −   Relationships
    −   Constraints (validation)
●   E-R analysis is:
    −   moving from a description of metadata we want to
        develop to a model of relationships between
        elements (entities)
    −   examining a situation to discover how entities,
        attributes and relationships interconnect to build a
        realistic model of the situation
                  Business Rules
●   Business rules are statements derived from a
    description of an organization's work practices
●   Business rules define one or more of the
    following modeling components:
    −   Entities
    −   Relationships
    −   Attributes
    −   Cardinalities (one-to-many, one-to-one, many-to-many)
    −   Constraints
        Business Rules: Examples
●   An author has a name, birth date, and set of
    associated works.
●   A work has a title, associated author(s),
    associated subject(s).
●   An item has a title, associated author(s),
    subject(s), format, publisher, etc. and is related
    to a work.
     Creating an Entity-Relationship
      Model from Business Rules
●   Create an entity relationship diagram from
    business specifications or narratives.
●   "A book has a publisher. Publishers publish
    many books."

        BOOK            publishes   PUBLISHER

        * title                     * name
        * ISBN          has a       * address
                                    0 URL
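
The diagram above can be read as a small data model. The following is a minimal Python sketch, assuming that `*` marks a mandatory attribute and `0` an optional one (the slide does not define these symbols). The class names, field names and sample values are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Publisher:
    name: str                    # mandatory (*)
    address: str                 # mandatory (*)
    url: Optional[str] = None    # optional (0), assumed from the diagram
    # "publishes": one publisher, many books
    books: List["Book"] = field(default_factory=list, repr=False, compare=False)

    def publish(self, title: str, isbn: str) -> "Book":
        """Create a book and record it on the 'many' side of the relationship."""
        book = Book(title=title, isbn=isbn, publisher=self)
        self.books.append(book)
        return book


@dataclass
class Book:
    title: str                   # mandatory (*)
    isbn: str                    # mandatory (*)
    publisher: Publisher         # "has a": each book has exactly one publisher


# Hypothetical sample data.
press = Publisher(name="Example Press", address="1 Example Street")
first = press.publish("Metadata for Everybody", "0000000000")
second = press.publish("More Metadata", "0000000001")
```

The one-to-many degree shows up as a single `publisher` reference on each book and a list of books on the publisher.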
E-R Modelling Conventions (cont.)
●   relationships are defined by a solid connecting line; temporary
    relationships may be denoted by a dotted line
●   end of line indicates degree of relationship (single line:
    one-to-one; crow's foot: one-to-many or many-to-many)
●   it is possible for an entity to be related to itself
    −   e.g. MODS record can contain a related MODS record
●   Each relationship has:
    −   a name, e.g. written by, published by, is about
    −   an optionality, e.g. mandatory, if available, optional
    −   a degree, e.g. one-to-one, one-to-many, many-to-many
CDWA Entity-Relationship Model

 Represents the relationships between a work and the elements describing a
   Relationship Types (Cardinality)
●   One-to-one (rare)
    −   have a degree of one and only one in both directions
    −   e.g. library director and library
●   Many-to-one or one-to-many (very common)
    −   have a degree of one or more in one direction and only
        one in the other
    −   e.g. users and a library
●   Many-to-many
    −   have a degree of one or more in both directions
    −   resolved with an intersection entity (join table)
    −   e.g. employees and skills
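
As a rough illustration of resolving a many-to-many relationship with an intersection entity, the sketch below uses Python's built-in `sqlite3` module and the employees and skills example. The table names, column names and sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE skill    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Intersection entity: one row per (employee, skill) pairing.
    CREATE TABLE employee_skill (
        employee_id INTEGER NOT NULL REFERENCES employee(id),
        skill_id    INTEGER NOT NULL REFERENCES skill(id),
        PRIMARY KEY (employee_id, skill_id)
    );
    INSERT INTO employee VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO skill    VALUES (1, 'cataloguing'), (2, 'XML');
    INSERT INTO employee_skill VALUES (1, 1), (1, 2), (2, 2);
""")


def skills_of(employee_name):
    """Follow the relationship from one employee to many skills."""
    rows = conn.execute(
        """SELECT skill.name FROM skill
           JOIN employee_skill ON skill.id = employee_skill.skill_id
           JOIN employee ON employee.id = employee_skill.employee_id
           WHERE employee.name = ?""",
        (employee_name,),
    )
    return {name for (name,) in rows}


def employees_with(skill_name):
    """Follow the same relationship from one skill to many employees."""
    rows = conn.execute(
        """SELECT employee.name FROM employee
           JOIN employee_skill ON employee.id = employee_skill.employee_id
           JOIN skill ON skill.id = employee_skill.skill_id
           WHERE skill.name = ?""",
        (skill_name,),
    )
    return {name for (name,) in rows}
```

The join table turns one many-to-many relationship into two one-to-many relationships, which is why it can be queried in either direction.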
          Entity Relationship Models
●   Start from a description or narrative or a set of
    business rules:
    −   "Each book has a publisher. A book can have many
        authors. Authors can write many books. Publishers
        can publish many books. Each book has a title.
        Each book can be translated into other language(s)
        and published in multiple formats."
●   What are the major entities?
●   What are the important relationships?
●   How would you draw this as an E-R model?
In Class Exercise: Modelling the Toy
         Metadata Schema
●   Develop a model for the Toy Metadata
    Schema
●   Write a short paragraph describing common
    interactions a child may have with a toy.
●   Need to consider:
    −   groups of related toys
         ●   related by brand, by maker, by activity, etc.
    −   related people
    −   concepts and objects
    −   attributes of objects
●   Compare your results to our current elements
          Part 2: Conceptual Models
●   basic unit of management and exchange is the
    metadata record
●   metadata records may be defined based on:
    −   composition and relationship to other data
    −   types of records (e.g. descriptive, administrative or
        intellectual content)
    −   physical storage of records or presentation
    −   requirements of minimal and full records (Zeng
        and Qin p. 149)
          Metadata Requirements
●   conforms to community standards based on
●   supports interoperability
●   uses authority control and content standards for
    description and collocation
●   clear statement of the conditions and terms of use
    of digital objects
●   supports long term curation and preservation
●   metadata surrogate records are themselves
    objects (NISO Framework Advisory Group)
    Beyond Descriptive Cataloguing
●   metadata creators need to consider digital
    rights and preservation issues from the
    beginning of record creation just as archivists
    and museum curators do
●   metadata creators also need to consider the
    encoding methods used for the metadata:
    −   encoding format
    −   metadata embedded with item or separately
    −   interoperability/sharing
Framework for Shareable Metadata
●   Content: optimise content for sharing
●   Consistency: records within a set should be
    consistent both semantically and syntactically
●   Coherence: records should be self-explanatory
●   Context: information provided by records
    should have the appropriate context
●   Communication: providers and aggregators
    should maintain good communication
●   Conformance: conform to established
    standards (Shreeves, Riley and Milewicz 2006)
          Omission of Local Context
●   a common problem for librarians: in searching a
    special collection, certain terms are so common
    as to be useless:
    −   e.g. search for health or medicine in PubMed
    −   e.g. picture of Theodore Roosevelt on a horse in a
        museum dedicated to him vs in a natural history
        museum (p. 150-1)
    −   e.g. assume use of LC subject headings
                 Local Context 2
●   context may be unnecessary or redundant in
    defining items in a special collection but
    necessary when sharing metadata (e.g. most
    PubMed items are about medicine or health)
●   DCMI suggests use of the scheme attribute in
    XML to identify specific schemes such as
    controlled vocabularies or formats for
    −   e.g. dcterms.LCSH
    −   dcterms.W3CDTF
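
A small sketch of how a scheme can travel with a value so that local context survives sharing. It assumes the HTML `<meta>` style of encoding Dublin Core, keeps the scheme names as they are spelled on the slide, and uses made-up values; `dc_meta` is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET


def dc_meta(name, content, scheme=None):
    """Build a <meta> element, stating the scheme only when one applies."""
    attrs = {"name": name, "content": content}
    if scheme is not None:
        attrs["scheme"] = scheme
    return ET.Element("meta", attrs)


# Illustrative values: a subject term and a date.
subject = dc_meta("DC.subject", "Metadata", scheme="dcterms.LCSH")
date = dc_meta("DC.date", "2007-03-15", scheme="dcterms.W3CDTF")
title = dc_meta("DC.title", "Metadata for Everybody")

for element in (subject, date, title):
    print(ET.tostring(element, encoding="unicode"))
```

Without the scheme, an aggregator receiving the record has no way to tell which vocabulary or date format the value follows.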
    Conceptual Models of Metadata
●   the basic unit of metadata is a statement
●   a statement consists of a property (element)
    and a value
●   metadata statements describe resources
    (Sutton 2007)

     DCMI Abstract Model

     resource: Metadata for Everybody
     statement: creator = Jane Q. Public
     value: Jane Q. Public
Description Sets, Descriptions and Statements
●   a description set is a set of one or more
    descriptions, each of which describes a single resource
●   a description is made up of one or more
    statements
●   a statement consists of (instantiates) a
    property-value pair made up of a property and a
    value (p. 152)
     DC one-to-one principle: create one metadata
     description for one and only one resource
     and group related descriptions (1,2) into a set

     Description 1: Metadata for Everybody
         Subject: Metadata
         written by: Jane Q. Public
     Description 2: Jane Q. Public
         Name, Birthplace, Hometown
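
The nesting of description sets, descriptions and statements can be sketched as plain data classes. This is one possible reading of the model, not the DCMI specification itself. The class names are chosen for the example, and only values that appear on the slides are used.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Statement:
    prop: str     # the property (element)
    value: str    # the value


@dataclass
class Description:
    resource: str                                   # exactly one resource
    statements: List[Statement] = field(default_factory=list)


@dataclass
class DescriptionSet:
    descriptions: List[Description] = field(default_factory=list)

    def about(self, resource):
        """Return the descriptions whose subject is the given resource."""
        return [d for d in self.descriptions if d.resource == resource]


# One-to-one principle: the book and its author get separate descriptions,
# which are then grouped into one set.
book = Description("Metadata for Everybody", [
    Statement("subject", "Metadata"),
    Statement("creator", "Jane Q. Public"),
])
author = Description("Jane Q. Public", [
    Statement("name", "Jane Q. Public"),
])
record = DescriptionSet([book, author])
```

Keeping the author's own properties in a second description avoids mixing statements about the person into the description of the book.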
               Levels of Granularity
●   item vs collection records, are we describing an
    item or a collection of items?
    ●   many metadata schemas provide support for both
        types of records (e.g. CDWA, EAD)
●   at the repository level, would a collection level
    record describing the entire repository be
    useful? (e.g. EAD header)
    ●   in a multi-use repository it may be necessary to
        describe entire collections as well as individual
        items (e.g. American Memory Project)
         Repository Level Records
●   collection registration: register the repository
    with search and retrieval tools
●   network discovery: provide information to
    network search agents about collection
●   user documentation: provide users of the
    collection with collection information
●   management: manage the collection/collections
    at a higher level
           Resource Decomposition
●   some elements in a metadata schema are
    atomic (have little to no internal structure)
●   other elements have a great deal of internal
    structure
    −   e.g. a paper has an introduction, background or
        literature review, methodology, data collection
        and/or analysis, discussion, conclusion and
        references
    −   a digital archive of articles may find it useful to
        separate some elements such as abstract and
        references and encode them for search or citation
                         Other Issues
●   Semantic Decomposition
    −   a combination of record level metadata and text
        level markup
         ●   e.g. DC record for a TEI encoded text
●   Degree of Granularity
    −   choice of the level of description
    −   use of structural metadata to describe a complex
        resource (e.g. individual pages of a scanned book)
    −   how much access is necessary/essential?
                  CIDOC CRM
●   "The CIDOC Conceptual Reference Model
    (CRM) provides definitions and a formal
    structure for describing the implicit and explicit
    concepts and relationships used in cultural
    heritage documentation."
●   common language for the formulation of
    requirements for metadata systems and a guide
    for good conceptual modelling
CIDOC CRM General Model

                   Doerr 2003, 81
CIDOC CRM General Categories

                      Doerr 2003, 85
CIDOC CRM Properties

                 Doerr 2003, 86
        Methods of Metadata Creation
●   manual
    −   metadata created manually by a human
    −   the traditional library method
●   automatic
    −   metadata created by the system creating the item
        (e.g. digital camera)
    −   metadata generated automatically by analysis of
        the item
●   combination
          Manual Metadata Creation
●   cataloguing of pre-internet resources
    −   metadata created by professionals
●   internet metadata creation using metadata
    editors and web based metadata creation forms
●   metadata created by submitters/authors of
    −   webpages may contain author created metadata
    −   article repositories may also contain author created
        metadata
        Automatic Metadata Creation
●   automatic creation at same time as object
●   metadata extraction
    −   uses layout and other document structures to select
        appropriate metadata via machine learning
●   natural language processing
    −   uses preexisting knowledge systems to select
        important elements of the document
●   challenges: automatic indexing/classification
    −   uses statistical learning methods
         Harvesting and Converting
●   metadata can be harvested from other systems
    using a standard protocol such as the Open
    Archives Initiative - Protocol for Metadata
    Harvesting (OAI-PMH)
●   metadata can also be automatically or manually
    converted from one format to another using a
    crosswalk
●   in either case, it is usually necessary to "clean"
    the data to check for conversion errors
    −   e.g. due to elements which do not match between
        the two formats
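
A minimal sketch of both steps, under stated assumptions. `list_records_url` builds an OAI-PMH `ListRecords` request using the protocol's `verb` and `metadataPrefix` parameters; the repository address is hypothetical and no request is actually sent. `convert` applies a made-up mapping from a local schema to Dublin Core element names and reports the elements that have no match, which is where cleaning usually starts.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL of an OAI-PMH ListRecords request."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hypothetical mapping: local element name -> Dublin Core element name.
CROSSWALK = {"title": "title", "author": "creator", "topic": "subject"}


def convert(record, crosswalk=CROSSWALK):
    """Convert one record and collect the elements that did not map."""
    converted, unmatched = {}, {}
    for element, value in record.items():
        value = value.strip()              # basic cleaning of stray whitespace
        target = crosswalk.get(element)
        if target is None:
            unmatched[element] = value     # needs manual review
        else:
            converted[target] = value
    return converted, unmatched


url = list_records_url("https://repository.example.org/oai")
converted, unmatched = convert(
    {"title": " Metadata for Everybody ", "author": "Jane Q. Public", "shelf": "B12"}
)
```

Returning the unmatched elements instead of silently dropping them makes conversion losses visible.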
Part 3: Administrative and Structural
          Metadata Models
                Archival Metadata
●   EAD (Encoded Archival Description)
    −   encodes archival finding aids which describe an
        archival collection
    −   e.g.
●   XML based encoding for archival finding aids
●   archival material may be stored in boxes or
    folders and is organised by series, box, folder,
●   EAD is hierarchical since archival data is
    hierarchical
    −   e.g. personal papers may include manuscripts,
        financial records, correspondence, journals, etc.
●   contains description of provenance and access
●   includes information about metadata creator
                Some EAD Entities
●   <eadheader>
    −   wrapper element for metadata about the metadata,
        including the metadata creator
●   <archdesc>
    −   wrapper element for the archival description itself
    −   describes the content, context and extent of the
        archival materials
●   <frontmatter>
    −   information about creation, publication and use of
        the finding aid
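
To make the hierarchy concrete, here is a rough sketch that parses a heavily simplified, invented finding aid and prints its series and file structure. The element names follow EAD 2002 as I understand it (`eadheader`, `archdesc`, `dsc`, `c01`, `c02`, `did`, `unittitle`). The sample omits `<frontmatter>` and most required detail, so it is not a valid EAD document.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample; titles and identifiers are placeholders.
EAD_SAMPLE = """
<ead>
  <eadheader>
    <eadid>example-0001</eadid>
    <filedesc>
      <titlestmt>
        <titleproper>Guide to the Example Family Papers</titleproper>
        <author>Finding aid prepared by A. Cataloguer</author>
      </titlestmt>
    </filedesc>
  </eadheader>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did><unittitle>Letters</unittitle></did>
        </c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Financial records</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""

COMPONENT = re.compile(r"c(\d\d)?$")   # <c>, <c01>, <c02>, ...


def outline(element, depth=0):
    """Return an indented list of 'level: title' lines for an archival unit."""
    title = element.findtext("did/unittitle", default="(untitled)")
    lines = [f"{'  ' * depth}{element.get('level')}: {title}"]
    container = element.find("dsc")
    children = container if container is not None else element
    for child in children:
        if COMPONENT.match(child.tag):
            lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(EAD_SAMPLE)
finding_aid_author = root.findtext("eadheader/filedesc/titlestmt/author")
hierarchy = outline(root.find("archdesc"))
print("\n".join(hierarchy))
```

The header describes the finding aid and who made it, while everything under `<archdesc>` describes the materials.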
                Digital Preservation
●   digital records introduce new issues in
    preservation
    −   need to preserve software/hardware required to
        read/display digital files or upgrade files to new
        formats
    −   digital information growing exponentially
●   3 major areas:
    −   long term maintenance of digital files
    −   ongoing accessibility of contents
    −   ensure viability of data contents/organisation
                   Structural Metadata
●   metadata does not just describe the item
●   also describes:
    −   how the digital object was created
         ●   e.g. digitised, born digital, etc.
    −   structural aspects
         ●   e.g. table of contents, links between scanned pages of an
             item, etc.
         ●   often neglected in large scale projects as this is time
             consuming, but extremely important to long term
             preservation as it maintains the linkages between items
             and parts of items
             Preservation Metadata
●   includes: descriptive, administrative (e.g.
    rights), technical and structural metadata
●   information necessary to maintain existing links
    between electronic files that are related:
    −   e.g. individual pages from a scanned document
    −   volumes of a series
    −   links between a digital scan and the object
   Reference Model for an Open
Archival Information System (OAIS)
●   common framework of terms and concepts for
    recording preservation metadata
    −   what to record and how to store it
●   defines functional areas
    −   storage, access, preservation planning, etc.
●   defines interfaces between functional areas
●   defines classes for use in archiving
                     OAIS Model
●   taxonomy of information objects and packages
    for archived objects and the structure of their
    associated metadata (p. 61)
●   preservation issues:
    −   media migration, compression, format conversions,
        access service preservation
●   major categories:
    −   Content Information, Representation Information,
        Preservation Description Information, Provenance
        and Fixity Information
            Preservation Metadata:
           Implementation Strategies (PREMIS)
●   an implementation of the OAIS model
●   records information that supports and documents the
    preservation process
●   five major categories:
    −   object: information about the item
    −   intellectual entity: information about the work
    −   event: actions associated with preservation
    −   agent: person or organization related to the preservation
    −   rights: rights pertaining to the object
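
The five categories can be pictured as linked record types. The sketch below is only an illustration of that shape, not the PREMIS data dictionary: the field names are invented, and a SHA-256 checksum stands in for fixity information. Each fixity check is recorded as an event tied to an agent.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    name: str            # person or organization carrying out preservation


@dataclass
class Rights:
    statement: str       # rights pertaining to the object


@dataclass
class Event:
    event_type: str      # action associated with preservation
    outcome: str
    agent: Agent


@dataclass
class PreservedObject:
    identifier: str
    content: bytes
    sha256: str          # checksum recorded at ingest
    events: List[Event] = field(default_factory=list)
    rights: List[Rights] = field(default_factory=list)


@dataclass
class IntellectualEntity:
    title: str           # the work that the objects represent
    objects: List[PreservedObject] = field(default_factory=list)


def ingest(identifier, content):
    """Record an object together with its checksum."""
    return PreservedObject(identifier, content, hashlib.sha256(content).hexdigest())


def fixity_check(obj, agent):
    """Recompute the checksum and log the result as a preservation event."""
    unchanged = hashlib.sha256(obj.content).hexdigest() == obj.sha256
    event = Event("fixity check", "pass" if unchanged else "fail", agent)
    obj.events.append(event)
    return event
```

Logging the check as an event, rather than only returning true or false, is what documents the preservation process over time.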
       Metadata Encoding and
    Transmission Standard (METS)
●   XML schema designed to store and transmit all
    metadata associated with an item
●   a framework for storing metadata in different
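
As a hedged sketch of METS acting as a container, the code below builds only a file section and a structural map for a scanned book, linking each page to its file. It uses the METS element names `fileSec`, `fileGrp`, `file`, `structMap`, `div` and `fptr` as I understand them, and leaves out the header, the descriptive and administrative sections and the file locations. The identifiers and labels are invented.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(page_labels):
    """Return a skeletal METS tree with one file and one page div per label."""
    mets = ET.Element(q("mets"))
    file_grp = ET.SubElement(ET.SubElement(mets, q("fileSec")), q("fileGrp"))
    struct_map = ET.SubElement(mets, q("structMap"), {"TYPE": "physical"})
    book = ET.SubElement(struct_map, q("div"), {"TYPE": "book"})
    for order, label in enumerate(page_labels, start=1):
        file_id = f"FILE{order:04d}"
        ET.SubElement(file_grp, q("file"), {"ID": file_id})
        page = ET.SubElement(
            book, q("div"), {"TYPE": "page", "ORDER": str(order), "LABEL": label}
        )
        ET.SubElement(page, q("fptr"), {"FILEID": file_id})   # page -> file link
    return mets


mets = build_mets(["Title page", "Page 1"])
print(ET.tostring(mets, encoding="unicode"))
```

The `ORDER` attribute and the file pointers carry the structural metadata that keeps scanned pages in sequence.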
             Rights Metadata Elements
●   What elements are rights metadata?
    −   Element must provide information about:
         ●   the creator
         ●   temporal or spatial location of creation
         ●   creation date, etc.
●   Rights elements:
    −   Author/creator
    −   Nationality of creator
    −   Date of birth and/or death
    −   Title
    −   Date created/modified
    −   Publication status
    −   Date of rights
 ●    California Digital Library (CDL) Rights
      Management Group developed a list of rights
      metadata from existing schemas (Zeng & Qin)
Standard       Category        Element                              Subelement
DCMES (DC)                     rights                               accessRights
DCTERMS                        rightsHolder
CDWA Lite                      Rights for Work
                               Rights for Resource
LOM            6. Rights       6.1 Cost
                               6.2 Copyright & other restrictions
VRA Core 4.0                   Rights                               rightsHolder text
PBCore         18.00 pbcore-   18.01 rightsSummary
MODS                           accessCondition
    Digital Rights Management (DRM)
●   designed to replicate the difficulties of copying print
    materials and enforce control on access and use of
●   designed to enforce use restrictions set by rights
    holders on materials
●   designed to track rights digitally and thus more
●   consequences include:
     −   limits on use of materials even where such uses would
         qualify as fair use
     −   limits on which devices can be used to access and use
     −   limits on tools which alter or remove DRM
                    <indecs>
●   interoperability of data in e-commerce systems
●   a metadata framework for storing and sharing rights
    metadata in e-commerce with nearly 400 elements
●   XML based
●   consists of:
    −   metadata model
    −   high level metadata dictionary
    −   principles for mapping to other schemas
    −   directory of relevant parties
     well formed <indecs> metadata
●   principles:
     −   unique identification
     −   functional granularity
     −   designated authority
     −   appropriate access
●   views entities as related to: commerce,
    intellectual property or general issues
        DOI (Digital Object Identifier)
●   an initiative to create a unique, persistent digital
    identifier for content objects
●   the DOI points to the location of an item on the
    web (the URL) but this location can be changed
    without changing the DOI
●   e.g. DOIs for journal articles:
    −   10.1007/s11192-007-1707-y
    −   10.1145/1255175.1255284
         DOI Model and Elements
●   14 high level elements
●   e.g. DOI, DOIGenre, Identifier, Title, Descriptor,
    Type, Origination, Mode, Form, Extent,
    Context, Subject, Event, CreationLink
●   described in Paskin and Rust 1999:
                            Parts of a DOI
●    DOI has two parts: prefix and suffix
●    prefix has two parts: directory (where DOI is
     registered) and registrant prefix (publisher, etc)
●    suffix: optional code ID to identify a known
     identification scheme and an ID number
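
The two-part structure can be checked with a few lines of Python. This sketch splits a DOI at the first slash and splits the prefix at the first dot; it does not validate the DOI against any registry, and `parse_doi` is a name chosen for the example.

```python
def parse_doi(doi):
    """Split a DOI into prefix (directory + registrant) and suffix."""
    prefix, slash, suffix = doi.partition("/")
    directory, dot, registrant = prefix.partition(".")
    if not (slash and suffix and dot and registrant):
        raise ValueError(f"not a prefix/suffix DOI: {doi!r}")
    return {
        "prefix": prefix,
        "directory": directory,     # where the DOI is registered
        "registrant": registrant,   # publisher, etc.
        "suffix": suffix,           # item identifier chosen by the registrant
    }


parts = parse_doi("10.1007/s11192-007-1707-y")
```

Splitting at the first slash only matters because a suffix may itself contain slashes.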

        ONIX (ONline Information eXchange)
●   developed by EDItEUR, an
    international group for e-commerce in books
    and serials
●   XML based standard for storing marketing and
    distribution information for publishers and
●   standards for books and serials
          ONIX Model and Elements
●   ONIX for Books:
●   Example records:
●   ONIX to MARC crosswalk:
●   Translating between ONIX and MARC:
                                  ONIX Examples
●   Reference names:
    <ProductIdentifier>
    <ProductIDType>02</ProductIDType>
    <IDValue>0816016356</IDValue>
    </ProductIdentifier>
    <ProductForm>BB</ProductForm>
    <Title>
    <TitleType>01</TitleType>
    <TitleText textcase = "02">British English, A to Zed</TitleText>
    </Title>
    <Contributor>
    <SequenceNumber>1</SequenceNumber>
    <ContributorRole>A01</ContributorRole>
    <PersonNameInverted>Schur, Norman W</PersonNameInverted>
    <BiographicalNote>A Harvard graduate in Latin and Italian literature,
    Norman Schur attended the University of Rome and the Sorbonne before
    returning to the United States to study law at Harvard and Columbia
    Law Schools. Now retired from legal practise, Mr. Schur is a fluent
    speaker and writer of both British and American
    English</BiographicalNote>
    </Contributor>
●   Short tags:
    <productidentifier>
    <b221>02</b221>
    <b244>0816016356</b244>
    </productidentifier>
    <b012>BB</b012>
    <title>
    <b202>01</b202>
    <b203 textcase = "02">British English, A to Zed</b203>
    </title>
    <contributor>
    <b035>A01</b035>
    <b037>Schur, Norman W</b037>
    <b044>A Harvard graduate in Latin and Italian literature, Norman Schur
    attended the University of Rome and the Sorbonne before returning to
    the United States to study law at Harvard and Columbia Law Schools.
    Now retired from legal practise, Mr. Schur is a fluent speaker and
    writer of both British and American English</b044>
    </contributor>
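
The two forms carry the same data under different element names, so one can be derived from the other by renaming. The sketch below does this for a fragment of the example, using only the pairings visible on the slide. It is not the full ONIX tag list, and any element outside the table is rejected instead of guessed.

```python
import xml.etree.ElementTree as ET

# Reference name -> short tag, taken from the example above only.
SHORT_TAGS = {
    "ProductIdentifier": "productidentifier",
    "ProductIDType": "b221",
    "IDValue": "b244",
    "ProductForm": "b012",
    "Title": "title",
    "TitleType": "b202",
    "TitleText": "b203",
    "Contributor": "contributor",
    "ContributorRole": "b035",
    "PersonNameInverted": "b037",
    "BiographicalNote": "b044",
}


def to_short_tags(element):
    """Rename an element and all its descendants in place."""
    if element.tag not in SHORT_TAGS:
        raise ValueError(f"no short tag known for <{element.tag}>")
    element.tag = SHORT_TAGS[element.tag]
    for child in element:
        to_short_tags(child)
    return element


reference = ET.fromstring(
    '<Title><TitleType>01</TitleType>'
    '<TitleText textcase="02">British English, A to Zed</TitleText></Title>'
)
short = ET.tostring(to_short_tags(reference), encoding="unicode")
print(short)
```

Attributes such as `textcase` and the text content are left untouched; only the element names change.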
     Open Digital Rights Language (ODRL)
●   an attempt to develop an open standard for
    policy expressions (including rights)
●   extensible model, intended to be encoded in
    XML
●   core entities: assets, rights and parties and
    relationships between them
●   model has been incorporated into other
    metadata schemas since it is an open standard
ODRL Model
         ●   this model uses an
             model to describe
             the relationships
             between entities
         ●   suggests XML
    Discussion: Metadata and Rights:
      Google Books and HathiTrust
●   HathiTrust and Google Books Settlements

●   Google Books Metadata Issues

