Metadata Registries Workshop by malj


U.S. Bureau of Labor Statistics Conference Center

April 15-17, 1998

SPONSORS


National Committee for Information Technology Standards (NCITS) L8, Data Representation
U.S. Environmental Protection Agency
U.S. Census Bureau
U.S. Bureau of Labor Statistics
U.S. Department of Transportation Intelligent Transportation Systems Joint Program Office
U.S. Department of Defense - Health System, Health Data Administration Program
National Institute of Standards and Technology

ORGANIZERS
       

Bruce Bargmeyer - U.S. Environmental Protection Agency
Cathryn Dippo - U.S. Bureau of Labor Statistics
Daniel Gillman - U.S. Census Bureau
William P. LaPlant, Jr. - U.S. Census Bureau
Douglas Mann - Battelle Memorial Institute
Judith Newton - National Institute of Standards and Technology
Phong Ngo - SAIC
CDR. Robert W. Mayes, R.N. - Health Care Financing Administration (HCFA)
Burton Parker - Paladin Integration Engineering
Andrew M. Shoka - MITRETEK Systems

 

Workshop Goals

Share knowledge and experience
Focus on metadata registration standards
  ISO/IEC 11179, Specification and Standardization of Data Elements
  DpANS X3.285, Metamodel for the Management of Sharable Data
Discuss implementations based on these standards

SDC-0055-057-JE-7031

EPA Information and Data Management

Workshop Goals

Facilitate collaborative efforts
  Metadata Registry Development
  Metadata exchange between registries
  Standardize Content
Next generation registry standards
  Traditional data
  Terminology
  Unify text and data
  XML, RDF Schema, XML-Data (Content model?)


Standards for Data Administration
Data Element Definitions
ISO/IEC 11179, Part 4

Bruce Bargmeyer
U.S. Environmental Protection Agency
Tel: (202) 260-5306
Internet: bargmeyer.bruce@epa.gov
WWW: http://sdct-sunsrv1.ncsl.nist.gov/~bargmeye

Challenges

Data element definitions and descriptions are not sufficient to support reuse or multiple users of data
Finding one standard data element among thousands is difficult or impossible without classification schemes, thesaurus structures and other reference guides
Need to focus data standardization on the definition and domain values rather than names
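The last point, comparing data elements by definition and domain values rather than by name, can be sketched in a few lines. This is only an illustration: the function name, the tuple layout, and the sample elements are assumptions made here, and exact text matching stands in for the human review a real registry would need.

```python
from collections import defaultdict

def find_synonyms_and_homonyms(elements):
    """Group (name, definition, domain_values) tuples by meaning and by name.

    Synonyms: different names sharing one definition and one value domain.
    Homonyms: one name attached to more than one meaning.
    """
    by_meaning = defaultdict(set)
    by_name = defaultdict(set)
    for name, definition, domain_values in elements:
        # Meaning = normalized definition text plus the set of domain values.
        meaning = (definition.strip().lower(), frozenset(domain_values))
        by_meaning[meaning].add(name)
        by_name[name].add(meaning)
    synonyms = [sorted(names) for names in by_meaning.values() if len(names) > 1]
    homonyms = sorted(n for n, meanings in by_name.items() if len(meanings) > 1)
    return synonyms, homonyms

# Hypothetical sample elements, invented for this sketch.
SAMPLE = [
    ("STATE_CD", "Two-letter postal abbreviation of a U.S. state", ["MD", "VA"]),
    ("ST_ABBR", "Two-letter postal abbreviation of a U.S. state", ["VA", "MD"]),
    ("STATUS", "Current operating condition of a facility", ["active", "closed"]),
    ("STATUS", "Review stage of a permit application", ["pending", "approved"]),
]

synonyms, homonyms = find_synonyms_and_homonyms(SAMPLE)
```

Here the two state elements are reported as synonyms despite their different names, and STATUS is reported as a homonym because one name carries two meanings.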

Types of Definitions

Definitions can be:
  Stipulative
  Precising
  Persuasive
  Intensional, Extensional, Lexical, ...

A type of definition for data elements:
A word or phrase expressing the essential nature of a person or thing or class of person or things: an answer to the question “what is x?” or “what is an x?”... (Webster’s Third New International Dictionary Unabridged, 1986)

Data Definition Rules

A data definition shall:
  Be unique (within a data dictionary)
  Be stated in the singular
  State what the concept is, rather than what it is not
  Be stated as a descriptive phrase or sentence(s)
  Contain only commonly understood abbreviations
  Be expressed without embedding definitions of other data elements or underlying concepts
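Some of these rules lend themselves to a rough automated check. The sketch below is a set of heuristics only, not part of ISO/IEC 11179: the function name, the messages, and the abbreviation allow-list are assumptions made for illustration. The rules on singular phrasing and embedded definitions need human judgment and are not attempted.

```python
import re

# Hypothetical allow-list of abbreviations treated as commonly understood.
COMMON_ABBREVIATIONS = {"ID", "ZIP"}

def check_definition(definition, existing_definitions):
    """Return a list of likely rule violations found by simple heuristics."""
    problems = []
    text = definition.strip()
    normalized = text.lower()
    words = normalized.split()

    # Rule: be unique within the data dictionary.
    if normalized in {d.strip().lower() for d in existing_definitions}:
        problems.append("not unique")

    # Rule: state what the concept is, rather than what it is not.
    # Heuristic: a "not" among the first six words suggests a negative definition.
    if "not" in words[:6]:
        problems.append("states what the concept is not")

    # Rule: be stated as a descriptive phrase or sentence(s).
    if len(words) < 2:
        problems.append("not a descriptive phrase or sentence")

    # Rule: contain only commonly understood abbreviations.
    # Heuristic: treat runs of two or more capital letters as abbreviations.
    for token in re.findall(r"\b[A-Z]{2,}\b", text):
        if token not in COMMON_ABBREVIATIONS:
            problems.append("unfamiliar abbreviation: " + token)

    return problems

clean = check_definition("Date on which the permit was issued", [])
flagged = check_definition(
    "Facility that is not a POTW", ["facility that is not a POTW"]
)
```

A check like this can only flag candidates for review; a definition that passes may still break the rules.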

Data Definition Guidelines

A data definition should:
  State the essential meaning of the concept
  Be precise and unambiguous
  Be concise
  Be able to stand alone
  Be expressed without embedding rationale, functional usage, domain information or procedural information
  Avoid circular reasoning
  Use consistent terminology and structure for related definitions

Status

ISO 11179, Part 4 - Rules and Guidelines for the Formulation of Data Definitions
  Passed International Standard Ballot in 1994
  Published as International Standard 1995

Epilog

There is useful information that is not included in the definition.
  Purpose of collection
  Statistical method of collection
  Data values (domain), usage, ...

DpANS X3.285 extends data attribution to include some of the useful information left out of a definition.
  Basic attributes
  Extensible set of attributes
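One way to picture the split between a definition and the surrounding information is a record with a few fixed fields plus an open-ended set of extra attributes. The field names below are hypothetical and are not taken from DpANS X3.285; the sketch only shows the shape of "basic attributes plus an extensible set".

```python
from dataclasses import dataclass, field

@dataclass
class DataElementRecord:
    # Basic attributes (names are assumptions made for this sketch).
    name: str
    definition: str
    purpose_of_collection: str = ""
    collection_method: str = ""
    permissible_values: list = field(default_factory=list)
    # Extensible set: any further attribute a registry chooses to record.
    extensions: dict = field(default_factory=dict)

record = DataElementRecord(
    name="Sample Collection Date",
    definition="Date on which a sample was collected",
    purpose_of_collection="Trend reporting",
)
# Extra information is added without touching the definition itself.
record.extensions["steward"] = "Example program office"
```

Keeping the definition in its own field means purpose, method, and usage never have to be folded into the definition text.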

CASE Tools and Metadata Registries

Many CASE tools do not have a place to store the definition as a separate attribute.
  “Description” can be a jumble of things
We are working to incorporate the X3.285 metamodel into the designs of CASE Tools and Registries.

Standards for Data Administration
Data Element Classification
ISO/IEC 11179, Part 2

Bruce Bargmeyer
U.S. Environmental Protection Agency
Tel: (202) 260-5306
Internet: bargmeyer.bruce@epa.gov
WWW: http://sdct-sunsrv1.ncsl.nist.gov/~bargmeye

Data Elements - Fundamentals

[Diagram with the labels: Object Class, Property, Representation, Data Element Concept, Data Element, Value Domain, Core Data Element, Application Data Element]
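The relationships among these terms can be sketched as a small data model. In ISO/IEC 11179 terms, a data element concept pairs an object class with a property, and a data element pairs a concept with a representation. The class and field names below are assumptions made for illustration, the representation is reduced to a value domain, and the core versus application distinction is not modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataElementConcept:
    object_class: str   # the kind of thing described, e.g. "Facility"
    prop: str           # the characteristic of interest, e.g. "State"

@dataclass(frozen=True)
class ValueDomain:
    datatype: str
    permitted_values: tuple = ()

@dataclass(frozen=True)
class DataElement:
    concept: DataElementConcept
    value_domain: ValueDomain

# Hypothetical example: one concept, two representations.
facility_state = DataElementConcept("Facility", "State")
state_code = DataElement(facility_state, ValueDomain("char(2)", ("MD", "VA")))
state_name = DataElement(facility_state, ValueDomain("text", ("Maryland", "Virginia")))

same_concept = state_code.concept == state_name.concept
same_element = state_code == state_name
```

Two data elements can share a concept and still differ as data elements because their value domains differ.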

Utility of Data Element Classification

Helps to locate one data element among many (thousands)
Helps to design similar data elements in a uniform manner
Helps to resolve synonym and homonym problems
Provides context not possible to put into a definition
Provides definitions for words found in data element definitions and names

Classification Structures

What forms can classification take?

Keywords
Controlled word lists
Terms from models
Thesaurus
Taxonomy
Ontology
Acyclic directed graph, lattice
Multiple inheritance
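The difference between a strict tree and an acyclic directed graph with multiple inheritance is that a node may have more than one broader node. A minimal sketch, with invented terms and a plain dictionary standing in for a real classification scheme:

```python
def ancestors(parents, node):
    """Collect every broader node reachable from `node` in an acyclic graph."""
    seen = set()
    stack = list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

# Hypothetical scheme: each term maps to its broader terms.
# "drinking water" has two parents, which a tree could not express.
PARENTS = {
    "drinking water": ["water", "public health"],
    "water": ["environmental media"],
    "public health": [],
    "environmental media": [],
}

broader = ancestors(PARENTS, "drinking water")
```

A search for any of the broader terms can then surface items classified under "drinking water".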

Schemes

Library of Congress keywords
General European Multilingual Environmental Thesaurus (GEMET)
Integrated Taxonomic Information System (ITIS) - biological
Bill Kenworthey’s taxonomy of common abstract unit nouns

Classification Fundamental Notions

Each node in a classification structure is a taxon (plural: taxa).
Given a classification structure, any taxa relating to a data element can be recorded
  The taxa can be recorded in a separate “classification” attribute
With adequate software, users could access and navigate the classification structure
A nonintelligent identifier for each taxon helps to deal with change
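These notions can be sketched together: taxa keyed by meaningless identifiers, a separate classification attribute per data element, and a simple navigation function. All names and sample values are hypothetical, and each taxon has a single parent here only to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Taxon:
    taxon_id: int                    # nonintelligent: carries no meaning
    label: str
    parent_id: Optional[int] = None  # None marks a top-level taxon

TAXA = {
    1: Taxon(1, "Environmental media"),
    2: Taxon(2, "Water", parent_id=1),
    3: Taxon(3, "Ground water", parent_id=2),
}

# The "classification" attribute: data element name -> recorded taxon ids.
CLASSIFICATION = {
    "well depth measure": {3},
    "stream flow rate": {2},
}

def ids_to_root(taxa, taxon_id):
    """Return the taxon id and all of its broader taxon ids."""
    ids = []
    while taxon_id is not None:
        ids.append(taxon_id)
        taxon_id = taxa[taxon_id].parent_id
    return ids

def elements_under(taxa, classification, taxon_id):
    """Names of data elements classified at `taxon_id` or anywhere below it."""
    return sorted(
        name
        for name, recorded in classification.items()
        if any(taxon_id in ids_to_root(taxa, t) for t in recorded)
    )

under_water = elements_under(TAXA, CLASSIFICATION, 2)

# Relabeling a taxon does not disturb recorded classifications,
# because data elements refer to the identifier, not the label.
TAXA[3].label = "Groundwater"
under_renamed = elements_under(TAXA, CLASSIFICATION, 3)
```

Because the identifier carries no meaning, a taxon can be renamed or moved without editing every data element that refers to it.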

Status

ANSI & ISO
  Final committee draft is out for JTC1 ballot
Concept is evolving
Continuing R&D
  Search engines
  Middleware - agents, mediators, request brokers
  XML tags
Relationship to terminology management


								