3rd International Digital Curation Conference Washington, DC

					             3rd International Digital Curation Conference
                      Washington, DC, Dec 2007

       Paper Presentations: Interoperability, Metadata & Standards
     Data Documentation Initiative:
Toward a Standard for the Social Sciences

                   Mary Vardigan, Pascal Heus, Wendy Thomas
ICPSR/University of Michigan / Open Data Foundation / Minnesota Population Center
    vardigan@umich.edu / pheus@opendatafoundation.org / wlt@pop.umn.edu
What is Metadata?

• Common definition: Data about Data

   [Figure: unlabeled stuff vs. labeled stuff]
   The bean example is taken from: A Manager’s Introduction to Adobe eXtensible Metadata Platform,

                     DDI Alliance – http://www.ddialliance.org
Managing data and metadata is
   [Figure: stakeholders and their needs]
   – “We want easy access to high quality and well documented data!”
   – “We need to collect the information from the producers, preserve it,
     and provide access to our users!”
   – “We are in charge of the data. We support our users but also need to
     protect our” (remainder not recoverable)
   – Groups shown: Producers, Users, Academic, Policy Makers, General Public,
     Media/Press

Metadata issues

• Without producer / archive metadata
   – Researchers can’t discover data or perform efficient work
• Without researcher metadata
   – Research process is not documented and cannot be
     reproduced (Gary King: replication standard!)
   – Other researchers are not aware of what has been done
     (duplication / lack of visibility)
   – Producers don’t know about data usage and quality issues
• Without standards
   – Such information can’t be properly managed and
     exchanged between actors or with the public
• Without tools
   – We can’t capture, preserve or share knowledge

XML to the rescue!

• XML stands for eXtensible Markup Language
• Technology that is driving today’s web service
  oriented architecture of the Internet and Intranets
• Using XML, we can capture, structure, transform,
  discover, exchange, query, edit and secure
  metadata and data
• XML is platform & language independent and can
  be used by everyone
• XML is both machine and human readable
• XML is non-proprietary, public domain and many
  open tools exist
• Domain specific standards are available!
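
The capture, structure and query steps listed above can be sketched in a few lines of Python using only the standard library. The record below is a hypothetical, simplified example: the element names (`study`, `variable`, `label`) and the helper `variable_labels` are assumptions made for illustration and do not follow DDI or any other published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified metadata record. Element and attribute names are
# illustrative only and are NOT taken from DDI or any other standard.
RECORD = """\
<study id="S001">
  <title>Example household survey</title>
  <variable name="AGE"><label>Age in years</label></variable>
  <variable name="SEX"><label>Sex of respondent</label></variable>
</study>
"""


def variable_labels(xml_text):
    """Return a dict mapping each variable name to its label."""
    root = ET.fromstring(xml_text)
    return {
        var.get("name"): var.findtext("label", default="")
        for var in root.iter("variable")
    }


labels = variable_labels(RECORD)

# Query a single variable by attribute value (limited XPath support).
root = ET.fromstring(RECORD)
sex_label = root.find("variable[@name='SEX']/label").text

for name, label in labels.items():
    print(f"{name}: {label}")
```

Because the record is plain text, the same file can be read by a person, parsed by any XML library on any platform, and transformed or exchanged without proprietary software.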

Suggested XML metadata specifications for
socio-economic data
• Statistical Data and Metadata Exchange (SDMX)
   – Macrodata, time series, indicators, registries
   – http://www.sdmx.org
• Data Documentation Initiative (DDI)
   – Microdata (surveys, studies)
   – http://www.ddialliance.org
• ISO 11179
   – Semantic modeling, concepts, registries
   – http://metadata-standards.org/11179/
• ISO 19115
   – Geography
   – http://www.isotc211.org/
• Dublin Core
   – Resources (documentation, images, multimedia)
   – http://www.dublincore.org
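
As a small illustration of the simplest specification in this list, the sketch below builds a Dublin Core style description of this presentation. The `title`, `creator` and `date` elements and the namespace URI belong to the Dublin Core element set. The wrapping `record` element and the helper `dublin_core_record` are assumptions made for the example, not part of any standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a record from (element name, value) pairs.

    The <record> wrapper is a hypothetical container chosen for this sketch.
    """
    root = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


record = dublin_core_record([
    ("title", "Data Documentation Initiative: "
              "Toward a Standard for the Social Sciences"),
    ("creator", "Mary Vardigan"),
    ("creator", "Pascal Heus"),
    ("creator", "Wendy Thomas"),
    ("date", "2007-12"),
])

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A list of pairs is used instead of a dictionary so that an element such as `creator` can be repeated.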

The Data Documentation Initiative (DDI)
• International XML based specification for the
  documentation of social and behavioral data
   – Started in 1995, now driven by DDI Alliance (30+ members)
   – Became XML specification in 2000 (v1.0)
   – Current version is 2.1 with focus on archiving
• New Version 3.0 (2008)
   – Focus on entire survey “Life Cycle”
   – Provide comprehensive metadata on the entire survey
     process and usage
   – Aligned with other metadata standards (DC, MARC, ISO
     11179, SDMX, …)
   – Include machine actionable elements to facilitate
     processing, discovery and analysis
• DDI is being adopted by producers/archives but
  needs to extend to the researchers (who are using
  the data!)

DDI 3.0 and the Survey Life Cycle

 •   A survey is not a static process: it evolves dynamically over time and
     involves many agencies/individuals
 •   DDI 2.x is about archiving; DDI 3.0 spans the entire “life cycle”
 •   3.0 focuses on metadata reuse (minimizes redundancies/discrepancies,
     supports comparison)
 •   Also supports multilingual, grouping, geography, and others
 •   3.0 is extensible

Metadata Components

• Producer metadata:
  – Codebook, questionnaires, reports,
    methodologies, processing, scripts, quality,
    admin, etc.
• Research metadata
  – Recodes, analysis, table, scripts, papers, logs,
    data quality, usage
  – Citations, references
  – Activities, discussions, knowledge base
• Outputs
  – Papers, presentations, tables, reports

When to capture metadata?

• Metadata must be captured at the time the event
  occurs! (not after the fact)
• Documenting after the fact leads to considerable
  loss of information
• This is true for producers and researchers

• Simple solutions: use good practices
   – File and variable naming conventions, sound
     statistical methods (metadata in names!)
   – Comment source code
   – Document your work
• Adopt DDI & other standard based metadata
   – DDI tools, citation database, source code level
     metadata capture, variable recodes, table
     disclosure, data quality feedback, comparability
• Take advantage of web based collaborative tools
   – Wiki, blogs, discussion groups, lists
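
One way to picture source code level metadata capture for variable recodes is a recode helper that writes its own documentation at the moment it runs. The sketch below is only an illustration: the function `recode`, the log structure and the variable names `SEX` and `SEX_LABEL` are hypothetical and do not come from any DDI tool.

```python
import json
from datetime import datetime, timezone

# Metadata is appended here as a side effect of every recode, so the
# documentation is produced when the event occurs, not afterwards.
recode_log = []


def recode(values, mapping, source, target, note=""):
    """Recode `values` with `mapping` and record what was done.

    Values without an entry in `mapping` are passed through unchanged.
    """
    recoded = [mapping.get(v, v) for v in values]
    recode_log.append({
        "source": source,
        "target": target,
        "mapping": {str(k): v for k, v in mapping.items()},
        "note": note,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return recoded


sex = [1, 2, 2, 1]
sex_label = recode(
    sex,
    {1: "male", 2: "female"},
    source="SEX",
    target="SEX_LABEL",
    note="Numeric codes replaced by text labels",
)

print(sex_label)
print(json.dumps(recode_log, indent=2))
```

The log could later be converted to a standard metadata format. Here it is kept as JSON only to keep the example short.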


• Comprehensive data documentation
   – Through good metadata practices, comprehensive
     documentation captured by producers, librarians and users
     is available to ALL researchers
• Preservation, integration and sharing of knowledge
   – Research process is captured and preserved in standard
   – Research knowledge becomes an integral part of the survey
     and available to all
   – Reduces duplication of effort and facilitates reuse
   – Producer gets feedback from the data users (usage, quality
     issues), which leads to better and more relevant data
• Research outputs and dissemination
   – Facilitates production of research outputs
   – Facilitates dissemination and fosters broader visibility of
     research results


• Metadata is a crucial component of social and
  behavioral science
• The Data Documentation Initiative (DDI) is a globally
  accepted specification for capturing microdata
  documentation and knowledge
• Latest version 3.0 extends into the entire survey Life Cycle
• Producers and data archives are rapidly adopting
  metadata standards.
• This adoption process should extend into the
  research community
• Best practices in data and metadata management
  benefit all users and have the potential to change
  the way we conduct research
• http://www.ddialliance.org or ddi@ddialliance.org
