					                       714: Metadata




   Conceptual Models of Metadata
              Records


Margaret Kipp - kipp@uwm.edu - https://pantherfile.uwm.edu/kipp/public/courses/714/
                            Outline
●   Part 1: Entity Relationship Model
    −   1. entities, attributes and relationships
    −   2. E-R models
●   Part 2: Conceptual Models
    −   1. Local Context
    −   2. Conceptual Models
    −   3. FRBR
    −   4. Levels of Granularity
    −   5. Metadata Sources
●   Part 3: Administrative and Structural Metadata Models
    −   1. EAD
    −   2. Preservation Models
    −   3. Rights Models
Part 1: Entity Relationship Modelling
●   A data model is a way of organising the data
    based on the relationships between the data
●   Metadata data modelling involves identifying
    important entities, attributes and relationships
    of the items in the collection
●   Also need to consider work/item and
    work/image dichotomies: what exactly is being
    described?
Example Data Model




   http://pbcore.org/elements/
     Benefits of Entity-Relationship
               Modelling
●   Communicate concepts in collective
    organisational memory
●   Collecting and documenting an organisation's
    information requirements
●   Provide a pictorial map of the system
●   Can be developed and refined
●   Clearly defines the scope of the project
●   Separate information from the activities and
    work practices
Example Conceptual Model
         What is Entity-Relationship
                 Modelling?
●   Entity-Relationship (E-R) modelling is a tool for
    planning the design of a structured data system
●   E-R analysis allows us to:
    −   Identify the major ingredients of complex situations:
        Entities or Elements
    −   Identify the important aspects of these entities:
        Attributes or Subelements
    −   Discover and assess important aspects of
        interconnections between entities: Relationships
    −   Discover important rules concerning all three
        aspects: Constraints
                    E-R Modelling
●   E-R models consist of:
    −   Entities
    −   Attributes
    −   Relationships
    −   Constraints (validation)
●   E-R analysis is:
    −   moving from a description of metadata we want to
        develop to a model of relationships between
        elements (entities)
    −   examining a situation to discover how entities,
        attributes and relationships interconnect to build a
        realistic model of the situation
                  Business Rules
●   Business rules are statements derived from a
    description of an organization's work practices
●   Business rules define one or more of the
    following modeling components:
    −   Entities
    −   Relationships
    −   Attributes
    −   Cardinalities (one-to-many, one-to-one, many-to-
        many)
    −   Constraints
        Business Rules: Examples
●   An author has a name, birth date, and set of
    associated works.
●   A work has a title, associated author(s),
    associated subject(s).
●   An item has a title, associated author(s),
    subject(s), format, publisher, etc. and is related
    to a work.
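
The three business rules above can be sketched as simple record types. This is a minimal illustration, not part of the original slides: the class names, field names and the `link` helper are assumptions chosen to mirror the wording of the rules, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Author:
    # "An author has a name, birth date, and set of associated works."
    name: str
    birth_date: Optional[str] = None
    works: List["Work"] = field(default_factory=list)


@dataclass
class Work:
    # "A work has a title, associated author(s), associated subject(s)."
    title: str
    authors: List[Author] = field(default_factory=list)
    subjects: List[str] = field(default_factory=list)


@dataclass
class Item:
    # "An item has a title, associated author(s), subject(s), format,
    # publisher, etc. and is related to a work."
    title: str
    work: Work
    authors: List[Author] = field(default_factory=list)
    subjects: List[str] = field(default_factory=list)
    format: Optional[str] = None
    publisher: Optional[str] = None


def link(author: Author, work: Work) -> None:
    """Record the author-work relationship in both directions (hypothetical helper)."""
    author.works.append(work)
    work.authors.append(author)


# Hypothetical sample data.
jane = Author(name="Jane Q. Public")
book = Work(title="Metadata for Everybody", subjects=["Metadata"])
link(jane, book)
paperback = Item(title=book.title, work=book, authors=[jane],
                 subjects=list(book.subjects), format="paperback")
```

Each rule becomes one entity, each "has a" becomes an attribute, and each "associated" or "related to" becomes a relationship held as a reference to another entity.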
     Creating an Entity-Relationship
      Model from Business Rules
●   Create an entity relationship diagram from
    business specifications or narratives.
●   "A book has a publisher. Publishers publish
    many books."



        BOOK            publishes   PUBLISHER

        * title                     * name
        * ISBN          has a       * address
                                    0 URL
E-R Modelling Conventions (cont.)
Relationships
  defined by a solid connecting line, temporary
   relationships may be denoted by a dotted line
  end of line indicates degree of relationship (single line
   - one-to-one, crow's foot - one-to-many or many-to-
   many)
  it is possible for an entity to be related to itself
     e.g. MODS record can contain a related MODS record
  Each relationship has:
     a name, e.g. written by, published by, is about
     an optionality, e.g. mandatory, if available, optional
     a degree, e.g. one-to-one, one-to-many, many-to-many
CDWA Entity-Relationship Model




 Represents the relationships between a work and the elements describing a
 work.
 http://www.getty.edu/research/publications/electronic_publications/cdwa/
   Relationship Types (Cardinality)
One-to-one (rare)
   Have a degree of one and only one in both directions
  e.g. library director and library
Many-to-one or one-to-many (very common)
   have a degree of one or more in one direction and only
     one in the other
  e.g. users and a library
Many-to-many
   have a degree of one or more in both directions
   resolved with an intersection entity (join table)
   e.g. employees and skills
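
A many-to-many relationship resolved with an intersection entity can be sketched with an in-memory SQLite database. The table names, column names and sample rows below are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE skill    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Intersection entity (join table): one row per employee/skill pairing.
    CREATE TABLE employee_skill (
        employee_id INTEGER NOT NULL REFERENCES employee(id),
        skill_id    INTEGER NOT NULL REFERENCES skill(id),
        PRIMARY KEY (employee_id, skill_id)
    );
    INSERT INTO employee VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO skill    VALUES (1, 'cataloguing'), (2, 'indexing');
    INSERT INTO employee_skill VALUES (1, 1), (1, 2), (2, 2);
""")


def skills_of(employee_name):
    """Return the skills held by one employee, sorted by skill name."""
    rows = conn.execute(
        """
        SELECT skill.name
        FROM employee
        JOIN employee_skill ON employee_skill.employee_id = employee.id
        JOIN skill ON skill.id = employee_skill.skill_id
        WHERE employee.name = ?
        ORDER BY skill.name
        """,
        (employee_name,),
    ).fetchall()
    return [name for (name,) in rows]


def employees_with(skill_name):
    """Return the employees who hold one skill, sorted by employee name."""
    rows = conn.execute(
        """
        SELECT employee.name
        FROM skill
        JOIN employee_skill ON employee_skill.skill_id = skill.id
        JOIN employee ON employee.id = employee_skill.employee_id
        WHERE skill.name = ?
        ORDER BY employee.name
        """,
        (skill_name,),
    ).fetchall()
    return [name for (name,) in rows]
```

The join table turns one many-to-many relationship into two one-to-many relationships, so the same rows answer questions in either direction.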
          Entity Relationship Models
●   Start from a description or narrative or a set of
    elements:
    −   "Each book has a publisher. A book can have many
        authors. Authors can write many books. Publishers
        can publish many books. Each book has a title.
        Each book can be translated into other language(s)
        and published in multiple formats."
●   What are the major entities?
●   What are the important relationships?
●   How would you draw this as an E-R model?
In Class Exercise: Modelling the Toy
         Metadata Schema
●   Develop a model for the Toy Metadata
    Schema.
●   Write a short paragraph describing common
    interactions a child may have with a toy.
●   Need to consider:
    −   groups of related toys
         ●   related by brand, by maker, by activity, etc.
    −   related people
    −   concepts and objects
    −   attributes of objects
●   Compare your results to our current elements
          Part 2: Conceptual Models
●   basic unit of management and exchange is the
    metadata record
●   metadata records may be defined based on:
    −   composition and relationship to other data
    −   types of records (e.g. descriptive, administrative or
        intellectual content)
    −   physical storage of records or presentation
    −   requirements of minimal and full records (Zeng
        and Qin p. 149)
          Metadata Requirements
●   conforms to community standards based on
    materials
●   supports interoperability
●   uses authority control and content standards for
    description and collocation
●   clear statement of conditions and terms of uses
    of digital objects
●   supports long term curation and preservation
●   metadata surrogate records are themselves
    objects (NISO Framework Advisory Group)
    Beyond Descriptive Cataloguing
●   metadata creators need to consider digital
    rights and preservation issues from the
    beginning of record creation just as archivists
    and museum curators do
●   metadata creators also need to consider the
    encoding methods used for the metadata:
    −   encoding format
    −   metadata embedded with item or separately
    −   interoperability/sharing
Framework for Shareable Metadata
●   Content: optimise content for sharing
●   Consistency: records within a set should be
    consistent both semantically and syntactically
●   Coherence: records should be self-explanatory
●   Context: information provided by records
    should have the appropriate context
●   Communication: providers and aggregators
    should maintain good communication
●   Conformance: conform to established
    standards (Shreeves, Riley and Milewicz 2006)
          Omission of Local Context
●   a common problem for librarians: when searching a
    special collection, certain terms are so common
    as to be useless:
    −   e.g. search for health or medicine in Pubmed
    −   e.g. picture of Theodore Roosevelt on a horse in a
        museum dedicated to him vs in a natural history
        museum (p. 150-1)
    −   e.g. assume use of LC subject headings
                 Local Context 2
●   context may be unnecessary or redundant in
    defining items in a special collection but
    necessary when sharing metadata (e.g. most
    Pubmed items are about medicine or health)
●   DCMI suggests use of the scheme attribute in
    XML to identify specific schemes such as
    controlled vocabularies or formats for
    interoperability
    −   e.g. dcterms.LCSH
    −   dcterms.W3CDTF
    Conceptual Models of Metadata
              Schemas
●   the basic unit of metadata is a statement
●   a statement consists of a property (element)
    and a value
●   metadata statements describe resources
    (Sutton 2007)

     DCMI Abstract Model
     [Figure: the resource "Metadata for Everybody" is described by the
     statement "creator = Jane Q. Public", which pairs the property
     "creator" with the value "Jane Q. Public".]
Description Sets, Descriptions and
           Statements
●   a description set is a set of one or more
    descriptions, each of which describes a single resource
●   a description is made up of one or more
    statements
●   a statement consists of (instantiates) a
    property-value pair made up of a property and a
    value (p. 152)
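
The nesting of description sets, descriptions and statements can be sketched as three small record types. This is an illustrative simplification under assumed names (`Statement`, `Description`, `DescriptionSet`, `property_name`), not the full DCMI Abstract Model; the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Statement:
    """One property-value pair, e.g. creator = Jane Q. Public."""
    property_name: str
    value: str


@dataclass
class Description:
    """One or more statements about a single resource."""
    resource: str
    statements: List[Statement] = field(default_factory=list)

    def values(self, property_name: str) -> List[str]:
        """Return every value recorded for the given property."""
        return [s.value for s in self.statements
                if s.property_name == property_name]


@dataclass
class DescriptionSet:
    """One or more related descriptions grouped together."""
    descriptions: List[Description] = field(default_factory=list)


# Hypothetical sample: a book and its author, each described separately.
book = Description("book", [
    Statement("title", "Metadata for Everybody"),
    Statement("creator", "Jane Q. Public"),
])
author = Description("author", [Statement("name", "Jane Q. Public")])
record = DescriptionSet([book, author])
```

Keeping the book and the author in separate descriptions, grouped in one set, is what allows each description to stay about one and only one resource.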
                               Example
     DC one-to-one principle: create one metadata
     description for one and only one resource
     and group related descriptions (1,2) into a set

     [Figure: two linked descriptions. Description 1 (Book): Title =
     Metadata for Everybody, Subject = Metadata, written by = Author.
     Description 2 (Author): Name = Jane Q. Public, plus Birthplace and
     Hometown.]
               Levels of Granularity
●   item vs collection records, are we describing an
    item or a collection of items?
    ●   many metadata schemas provide support for both
        types of records (e.g. CDWA, EAD)
●   at the repository level, would a collection level
    record describing the entire repository be
    useful? (e.g. EAD header)
    ●   in a multi-use repository it may be necessary to
        describe entire collections as well as individual
        items (e.g. American Memory Project)
         Repository Level Records
●   collection registration: register the repository
    with search and retrieval tools
●   network discovery: provide information to
    network search agents about collection
    contents
●   user documentation: provide users of the
    collection with collection information
●   management: manage the collection/collections
    at a higher level
●   http://memory.loc.gov/cgi-bin/oai2_0?verb=Identify
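
The URL above is an OAI-PMH request: a base URL plus a `verb` argument. A minimal sketch of building such requests follows. The `oai_request` helper is hypothetical, no network call is made, and whether the endpoint still responds is not assumed.

```python
from urllib.parse import urlencode

# Base URL taken from the slide.
BASE_URL = "http://memory.loc.gov/cgi-bin/oai2_0"


def oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a verb and optional arguments."""
    params = {"verb": verb}
    params.update(arguments)
    return base_url + "?" + urlencode(params)


# Repository-level information (the request shown on the slide).
identify_url = oai_request(BASE_URL, "Identify")

# Item-level records in unqualified Dublin Core.
list_records_url = oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
```

`Identify` returns a repository-level description, while `ListRecords` returns the item-level records, which matches the repository/item distinction drawn above.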
           Resource Decomposition
●   some elements in a metadata schema are
    atomic (have little to no internal structure)
●   other elements have a great deal of internal
    structure
    −   e.g. a paper has an introduction, background or
        literature review, methodology, data collection
        and/or analysis, discussion, conclusion and
        references
    −   a digital archive of articles may find it useful to
        separate some elements such as abstract and
        references and encode them for search or citation
        analysis
                         Other Issues
●   Semantic Decomposition
    −   a combination of record level metadata and text
        level markup
         ●   e.g. DC record for a TEI encoded text
●   Degree of Granularity
    −   choice of the level of description
    −   use of structural metadata to describe a complex
        resource (e.g. individual pages of a scanned book)
    −   how much access is necessary/essential?
                  CIDOC CRM
●   "The CIDOC Conceptual Reference Model
    (CRM) provides definitions and a formal
    structure for describing the implicit and explicit
    concepts and relationships used in cultural
    heritage documentation."
●   common language for the formulation of
    requirements for metadata systems and a guide
    for good conceptual modelling
●   http://www.cidoc-crm.org/
CIDOC CRM General Model




                   Doerr 2003, 81
CIDOC CRM General Categories




                      Doerr 2003, 85
CIDOC CRM Properties




                 Doerr 2003, 86
        Methods of Metadata Creation
●   manual
    −   metadata created manually by a human
    −   the traditional library method
●   automatic
    −   metadata created by the system creating the item
        (e.g. digital camera)
    −   metadata generated automatically by analysis of
        the item
●   combination
          Manual Metadata Creation
●   cataloguing of pre-internet resources
    −   metadata created by professionals
●   internet metadata creation using metadata
    editors and web based metadata creation forms
●   metadata created by submitters/authors of
    material
    −   webpages may contain author created metadata
    −   article repositories may also contain author created
        metadata
        Automatic Metadata Creation
●   automatic creation at same time as object
●   metadata extraction
    −   uses layout and other document structures to select
        appropriate metadata via machine learning
●   natural language processing
    −   uses preexisting knowledge systems to select
        important elements of the document
●   challenges: automatic indexing/classification
    −   uses statistical learning methods
         Harvesting and Converting
                 Metadata
●   metadata can be harvested from other systems
    using a standard protocol such as the Open
    Archives Initiative - Protocol for Metadata
    Harvesting (OAI-PMH)
●   metadata can also be automatically or manually
    converted from one format to another using a
    crosswalk
●   in either case, it is usually necessary to "clean"
    the data to check for conversion errors
    −   e.g. due to elements which do not match between
        schemes
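
A crosswalk can be sketched as a lookup table from source elements to target elements, with anything that does not match set aside for manual cleaning. The local element names and the sample record below are hypothetical; the target names are Dublin Core elements.

```python
# Hypothetical crosswalk from a local schema to Dublin Core element names.
CROSSWALK = {
    "main_title": "title",
    "author": "creator",
    "topic": "subject",
    "issued": "date",
}


def apply_crosswalk(record, crosswalk):
    """Convert one record; return (converted, unmapped).

    Elements with no target in the crosswalk are returned separately so
    they can be reviewed during data cleaning instead of silently lost.
    """
    converted = {}
    unmapped = {}
    for element, value in record.items():
        target = crosswalk.get(element)
        if target is None:
            unmapped[element] = value
        else:
            # Several source elements may map to one repeatable target element.
            converted.setdefault(target, []).append(value)
    return converted, unmapped


local_record = {
    "main_title": "Metadata for Everybody",
    "author": "Jane Q. Public",
    "shelf_location": "Box 12",  # no equivalent in the target scheme
}
dc_record, leftovers = apply_crosswalk(local_record, CROSSWALK)
```

Reporting the unmapped elements explicitly is one way to surface the conversion errors that arise when elements do not match between schemes.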
Part 3: Administrative and Structural
             Metadata
                Archival Metadata
●   EAD (Encoded Archival Description)
    −   encodes archival finding aids which describe an
        archival collection
    −   e.g. http://hdl.loc.gov/loc.mss/eadmss.ms008069
                            EAD
●   XML based encoding for archival finding aids
●   archival material may be stored in boxes or
    folders and is organised by series, box, folder,
    etc.
●   EAD is hierarchical since archival data is
    hierarchical
    −   e.g. personal papers may include manuscripts,
        financial records, correspondence, journals, etc.
●   contains description of provenance and access
    rights
●   includes information about metadata creator
                Some EAD Entities
●   <eadheader>
    −   wrapper element for metadata about the metadata,
        including the metadata creator
●   <archdesc>
    −   wrapper element for the archival description itself
    −   describes the content, context and extent of the
        archival materials
●   <frontmatter>
    −   information about creation, publication and use of
        the finding aid
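
The split between `<eadheader>` (metadata about the finding aid) and `<archdesc>` (the archival description itself) can be seen by parsing a small fragment. The fragment below is a hypothetical, heavily simplified finding aid written for illustration, not a complete or validated EAD document.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified finding aid.
SAMPLE = """<ead>
  <eadheader>
    <eadid>example-0001</eadid>
    <filedesc>
      <titlestmt>
        <titleproper>Guide to the Example Family Papers</titleproper>
        <author>Finding aid prepared by A. Archivist</author>
      </titlestmt>
    </filedesc>
  </eadheader>
  <archdesc level="collection">
    <did>
      <unittitle>Example Family Papers</unittitle>
      <physdesc><extent>12 boxes</extent></physdesc>
    </did>
  </archdesc>
</ead>"""

root = ET.fromstring(SAMPLE)

# Metadata about the metadata: who created the finding aid.
finding_aid_author = root.findtext("eadheader/filedesc/titlestmt/author")

# The archival description: what the materials are and how much there is.
collection_title = root.findtext("archdesc/did/unittitle")
extent = root.findtext("archdesc/did/physdesc/extent")
level = root.find("archdesc").get("level")
```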
                Digital Preservation
●   digital records introduce new issues in
    preservation
    −   need to preserve software/hardware required to
        read/display digital files or upgrade files to new
        formats
    −   digital information growing exponentially
●   3 major areas:
    −   long term maintenance of digital files
    −   ongoing accessibility of contents
    −   ensure viability of data contents/organisation
                   Structural Metadata
●   metadata does not just describe the item
●   also describes:
    −   how the digital object was created
         ●   e.g. digitised, born digital, etc.
    −   structural aspects
         ●   e.g. table of contents, links between scanned pages of an
             item, etc.
         ●   often neglected in large scale projects as this is time
             consuming, but extremely important to long term
             preservation as it maintains the linkages between items
             and parts of items
             Preservation Metadata
●   includes: descriptive, administrative (e.g.
    rights), technical and structural metadata
●   information necessary to maintain existing links
    between electronic files that are related:
    −   e.g. individual pages from a scanned document
    −   volumes of a series
    −   links between a digital scan and the object
   Reference Model for an Open
Archival Information System (OAIS)
●   common framework of terms and concepts for
    recording preservation metadata
    −   what to record and how to store it
●   defines functional areas
    −   storage, access, preservation planning, etc.
●   defines interfaces between functional areas
●   defines classes for use in archiving
●   http://public.ccsds.org/publications/archive/650x0b1.pdf
                     OAIS Model
●   taxonomy of information objects and packages
    for archived objects and the structure of their
    associated metadata (p. 61)
●   preservation issues:
    −   media migration, compression, format conversions,
        access service preservation
●   major categories:
    −   Content Information, Representation Information,
        Preservation Description Information, Provenance
        and Fixity Information
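
Fixity information is commonly recorded as a checksum that can be recomputed later to confirm a file has not changed. The sketch below assumes SHA-256 and hypothetical helper names; OAIS itself does not prescribe a particular algorithm.

```python
import hashlib


def checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of the data using the named hash algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def fixity_ok(data: bytes, recorded_digest: str, algorithm: str = "sha256") -> bool:
    """Check that the data still matches the digest recorded at ingest."""
    return checksum(data, algorithm) == recorded_digest


# Hypothetical content standing in for an archived file.
original = b"scanned page 1"
recorded = checksum(original)    # stored with the preservation metadata
corrupted = original + b"\x00"   # simulate a later change to the file
```

Here `fixity_ok(original, recorded)` holds while `fixity_ok(corrupted, recorded)` does not, which is the signal a repository would use to trigger repair from another copy.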
            Preservation Metadata:
           Implementation Strategies
                  (PREMIS)
●   an implementation of the OAIS model
●   records information that supports and documents the
    preservation process
●   five major categories:
    −   object: information about the item
    −   intellectual entity: information about the work
    −   event: actions associated with preservation
    −   agent: person or organization related to the preservation
        events
    −   rights: rights pertaining to the object
       Metadata Encoding and
    Transmission Standard (METS)
●   XML schema designed to store and transmit all
    metadata associated with an item
●   a framework for storing metadata in different
    formats
●   http://www.loc.gov/standards/mets/
             Rights Metadata Elements
●   What elements are rights metadata elements?
    −   Element must provide information about:
         ●   the creator
         ●   temporal or spatial location of creation
         ●   creation date, etc.
●   Rights elements:
    −   Author/creator
    −   Nationality of creator
    −   Date of birth and/or death
    −   Title
    −   Date created/modified
    −   Publication status
    −   Date of rights research
                           copyrightMD
 ●    California Digital Library (CDL) Rights
      Management Group developed a list of rights
      metadata from existing schemas (Zeng & Qin)
Standard       Category        Element                              Subelement
DCMES (DC)                     rights                               accessRights
                                                                    license
DCTERMS                        rightsHolder
CDWA Lite                      Rights for Work
                               Rights for Resource
LOM            6. Rights       6.1 Cost
                               6.2 Copyright & other restrictions
VRA Core 4.0                   Rights                               rightsHolder text
PBCore         18.00 pbcore-   18.01 rightsSummary
               rightsSummary
MODS                           accessCondition
    Digital Rights Management (DRM)
●   designed to replicate the difficulties of copying print
    materials and enforce control on access and use of
    materials
●   designed to enforce use restrictions set by rights
    holders on materials
●   designed to track rights digitally and thus more
    efficiently
●   consequences include:
     −   limits on use of materials even where such uses would
         qualify as fair use
     −   limits on which devices can be used to access and use
         materials
     −   limits on tools which alter or remove DRM
                       <indecs>
●   interoperability of data in e-commerce systems
●   a metadata framework for storing and sharing rights
    metadata in e-commerce with nearly 400 elements
●   XML based
●   consists of:
    −   metadata model
    −   high level metadata dictionary
    −   principles for mapping to other schemas
    −   directory of relevant parties
     well formed <indecs> metadata
●   principles:
     −   unique identification
     −   functional granularity
     −   designated authority
     −   appropriate access
●   views entities as related to: commerce,
    intellectual property or general issues
●   http://xml.coverpages.org/indecs2rdd.html
●   http://www.doi.org/topics/indecs-rdd-white-paper-may02.pdf
        DOI (Digital Object Identifier)
●   an initiative to create a unique, persistent digital
    identifier for content objects
●   the DOI points to the location of an item on the
    web (the URL) but this location can be changed
    without changing the DOI
●   e.g. DOIs for journal articles:
    −   10.1007/s11192-007-1707-y
    −   10.1145/1255175.1255284
●   http://www.doi.org/
         DOI Model and Elements
●   14 high level elements
●   e.g. DOI, DOIGenre, Identifier, Title, Descriptor,
    Type, Origination, Mode, Form, Extent,
    Context, Subject, Event, CreationLink
●   described in Paskin and Rust 1999:
    http://www.doi.org/P2VER3.PDF
                            Parts of a DOI
●    DOI has two parts: prefix and suffix
●    prefix has two parts: directory (where DOI is
     registered) and registrant prefix (publisher, etc)
●    suffix: optional code ID to identify a known
     identification scheme and an ID number




    from http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0003.204
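
The prefix/suffix structure described above can be sketched as a small parser. The `split_doi` function and `DOIParts` type are hypothetical names, and the sketch only splits the string: it does not validate a DOI against any registry.

```python
from typing import NamedTuple


class DOIParts(NamedTuple):
    directory: str   # first part of the prefix, e.g. "10"
    registrant: str  # second part of the prefix, assigned to the publisher
    suffix: str      # identifier chosen by the registrant


def split_doi(doi: str) -> DOIParts:
    """Split a DOI into directory, registrant prefix and suffix."""
    # The first "/" separates prefix from suffix; the suffix may contain more.
    prefix, slash, suffix = doi.partition("/")
    if not slash or not suffix:
        raise ValueError("DOI must contain a prefix and a suffix: %r" % doi)
    # The first "." in the prefix separates directory from registrant.
    directory, dot, registrant = prefix.partition(".")
    if not dot or not registrant:
        raise ValueError("DOI prefix must look like 'directory.registrant': %r" % doi)
    return DOIParts(directory, registrant, suffix)


# One of the journal article DOIs listed earlier in the slides.
parts = split_doi("10.1145/1255175.1255284")
```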
        ONIX (ONline Information
              eXchange)
●   developed by EDItEUR (www.editeur.org) an
    international group for e-commerce in books
    and serials
●   XML based standard for storing marketing and
    distribution information for publishers and
    bookstores
●   standards for books and serials
          ONIX Model and Elements
●   ONIX for Books:
    http://www.editeur.org/15/Previous-Releases/
●   Example records:
    −   http://roytennant.com/proto/onix/
●   ONIX to MARC crosswalk:
    http://www.loc.gov/marc/onix2marc.html
●   Translating between ONIX and MARC:
    http://journal.code4lib.org/articles/54
                                  ONIX Examples
●    Reference names:
     <ProductIdentifier>
       <ProductIDType>02</ProductIDType>
       <IDValue>0816016356</IDValue>
     </ProductIdentifier>
     <ProductForm>BB</ProductForm>
     <Title>
       <TitleType>01</TitleType>
       <TitleText textcase="02">British English, A to Zed</TitleText>
     </Title>
     <Contributor>
       <SequenceNumber>1</SequenceNumber>
       <ContributorRole>A01</ContributorRole>
       <PersonNameInverted>Schur, Norman W</PersonNameInverted>
       <BiographicalNote>A Harvard graduate in Latin and Italian literature, Norman Schur attended the University of Rome and the Sorbonne before returning to the United States to study law at Harvard and Columbia Law Schools. Now retired from legal practise, Mr. Schur is a fluent speaker and writer of both British and American English</BiographicalNote>
     </Contributor>

●    Short tags:
     <productidentifier>
       <b221>02</b221>
       <b244>0816016356</b244>
     </productidentifier>
     <b012>BB</b012>
     <title>
       <b202>01</b202>
       <b203 textcase="02">British English, A to Zed</b203>
     </title>
     <contributor>
       <b035>A01</b035>
       <b037>Schur, Norman W</b037>
       <b044>A Harvard graduate in Latin and Italian literature, Norman Schur attended the University of Rome and the Sorbonne before returning to the United States to study law at Harvard and Columbia Law Schools. Now retired from legal practise, Mr. Schur is a fluent speaker and writer of both British and American English</b044>
     </contributor>
    http://www.niso.org/standards/resources/Metadata_Demystified.pdf
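
The two forms in the example carry the same data under different element names, so one can be converted to the other by renaming tags. The sketch below uses only the tag pairs visible in the example above; the `expand` helper is hypothetical and the mapping is not a complete ONIX tag list.

```python
import xml.etree.ElementTree as ET

# Short tag -> reference name, limited to the pairs shown in the example.
SHORT_TO_REFERENCE = {
    "productidentifier": "ProductIdentifier",
    "b221": "ProductIDType",
    "b244": "IDValue",
    "b012": "ProductForm",
    "title": "Title",
    "b202": "TitleType",
    "b203": "TitleText",
    "contributor": "Contributor",
    "b035": "ContributorRole",
    "b037": "PersonNameInverted",
    "b044": "BiographicalNote",
}


def expand(element):
    """Return a copy of the element tree with short tags renamed.

    Tags missing from the mapping are kept unchanged.
    """
    tag = SHORT_TO_REFERENCE.get(element.tag, element.tag)
    copy = ET.Element(tag, dict(element.attrib))
    copy.text = element.text
    copy.tail = element.tail
    for child in element:
        copy.append(expand(child))
    return copy


short = ET.fromstring(
    "<productidentifier><b221>02</b221><b244>0816016356</b244></productidentifier>"
)
reference_xml = ET.tostring(expand(short), encoding="unicode")
```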
     Open Digital Rights Language
               (ODRL)
●   an attempt to develop an open standard for
    policy expressions (including rights)
●   extensible model, intended to be encoded in
    XML
●   core entities: assets, rights and parties and
    relationships between them
●   model has been incorporated into other
    metadata schemas since it is an open standard
●   http://www.w3.org/community/odrl/
ODRL Model
         ●   this model uses an
             entity-relationship
             model to describe
             the relationships
             between entities
         ●   suggests XML
             elements
             http://www.w3.org/community/odrl/two/xml/
    Discussion: Metadata and Rights:
      Google Books and HathiTrust
●   HathiTrust and Google Books Settlements
    −   http://www.michigandaily.com/news/10-hathitrust-ruling-11
    −   http://newsbreaks.infotoday.com/NewsBreaks/HathiTrust-Lawsuit-
        Decision-Reaffirms-Libraries-in-the-Digital-Age-85546.asp
    −   http://en.wikipedia.org/wiki/Google_Book_Search_Settlement_Agreement

    −   http://www.googlebooksettlement.com/
    −   http://money.cnn.com/2012/10/04/technology/google-books-settlement/
●   Google Books Metadata Issues
    −   http://languagelog.ldc.upenn.edu/nll/?p=1701
    −   http://chronicle.com/article/Googles-Book-Search-A/48245/
    −   http://www.insidehighered.com/news/2011/12/08/scholar-continues-find-flawed-metadata-google-books

    −   http://www.salon.com/2010/09/09/google_books/

				