Release of Statistical Data and Metadata Exchange (SDMX) Standards, Version 1.0
Arofan Gregory, Aeon LLC
Outline
• Background
• Business Scope and Requirements
• Technical Design Approach
• SDMX Version 2.0 and Beyond
Background
• SDMX is a joint initiative of seven international and regional organizations:
– Bank for International Settlements (BIS)
– European Central Bank (ECB)
– Statistical Office of the European Communities (Eurostat)
– International Monetary Fund (IMF)
– Organisation for Economic Co-operation and Development (OECD)
– United Nations (UN)
– The World Bank
• Initial meeting in June 2002
• Version 1.0 standards released September 30, 2004
• Will be put forward to ISO for international standardization
Goal
“To explore common e-standards and ongoing standardization activities that could allow us to gain efficiency and avoid duplication of effort in our own work and possibly for the work of others in the field of statistical information.”
Initial Projects
• Case Study
– Examination of XML, registry, and web services technologies
• Batch Data Exchange
– EDIFACT and XML formats for exchange of large databases
• Metadata Common Vocabulary
– Harmonized definitions of common terms and statistical concepts
• Metadata Repositories
– Metadata reporting and resources on the web
Clarification
• “Statistical information” refers primarily to aggregated statistical data and metadata:
– Not survey results
– Not quality assurance test results
• In the future, this may be expanded to include “microdata” (“raw data”)
– Not currently in scope
Examples
• Financial and economic data
• Trade data
• Development data
• Health data
• Education
• Environmental
• Population
• And so on…
Statistical Information Flows
[Figure: statistical information flows across three levels, all reachable via Internet search and navigation: banks, corporates, and individual households (transactions, accounts; www.x.org); national statistical organisations in 180+ countries (accounts, statistics; www.y.org); and international and regional organisations (accounts, statistics; www.z.org, www.hub.org).]
Challenges
• Duplication of data
• Multiplicity of formats
• Timeliness of data reporting
• Quality of reported data
• “Pushing” data is inefficient
• Inconsistent or missing metadata
• Lack of semantic agreement
Solutions
• Step One: Standardize an information model and formats
• Step Two: Standardize architecture, services, and metadata
SDMX v 1.0 Package
• Framework Document
• Information Model (UML Conceptual Design)
• SDMX-ML
• SDMX-EDI
• Implementor’s Guide for Format Standards
• Web Services Guidelines
• 200+ pages of comments received during the public review in summer 2004
Overall Technical Approach
• Model-driven
– The SDMX information model is a metamodel
– All formats are derived from the information model and are equivalent
• Requirements-driven
– Different formats created for different use cases
– Different but consistent formats for each domain
Meta-Models
• Each domain uses one or more models for its data
• The data is just tables of numbers
– Common structure
• Each model describes how metadata is attached to that domain’s structure
• A meta-model describes how the domain models can be described
Key Families
• The information model describes how metadata is attached to multi-dimensional “cubes” of data
– The structural description is termed a “key family”
– Each axis has an associated concept and representation (“dimensions”)
– Additional metadata can be attached and represented at different levels
Key Family Example
[Figure: an example key family drawn as a data cube, labelled Country, Topic, Time/Frequency, Stock/Flow, Unit, and Unit Multiplier.]
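A key family fixes an ordered set of dimensions, each tied to a concept and a coded representation, plus attribute concepts attached at other levels. The sketch below is a minimal illustration of that idea. It assumes the following:
- The class names are invented for this example.
- Series keys are written dot-separated.
- The codes are hypothetical.

As in SDMX time series, time is carried on each observation rather than in the series key. This is a toy, not the SDMX information model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dimension:
    """One axis of the cube: a concept plus the codes allowed for it."""
    concept: str
    codes: frozenset[str]


@dataclass
class KeyFamily:
    """Toy key family: ordered dimensions plus attached attribute concepts.

    Far simpler than the SDMX information model; for illustration only.
    """
    key_family_id: str
    dimensions: list[Dimension]
    attributes: list[str] = field(default_factory=list)

    def series_key(self, **values: str) -> str:
        """Return a dot-separated series key after checking every code.

        Time is deliberately not a key dimension here: each observation
        carries its own time period.
        """
        known = {d.concept for d in self.dimensions}
        unknown = set(values) - known
        if unknown:
            raise KeyError(f"unknown dimensions: {sorted(unknown)}")
        parts = []
        for dim in self.dimensions:
            if dim.concept not in values:
                raise KeyError(f"missing value for {dim.concept}")
            code = values[dim.concept]
            if code not in dim.codes:
                raise ValueError(f"{code!r} is not a valid {dim.concept} code")
            parts.append(code)
        return ".".join(parts)


# Hypothetical key family with frequency, country, and topic dimensions.
kf = KeyFamily(
    "DEMO_KF",
    dimensions=[
        Dimension("FREQ", frozenset({"A", "Q", "M"})),
        Dimension("COUNTRY", frozenset({"DE", "JP", "US"})),
        Dimension("TOPIC", frozenset({"CPI", "GDP"})),
    ],
    attributes=["UNIT", "UNIT_MULT"],
)
print(kf.series_key(FREQ="Q", COUNTRY="DE", TOPIC="GDP"))  # Q.DE.GDP
```

Checking each code against its codelist at key-construction time is one simple way to catch reporting errors early.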
SDMX Formats
• SDMX-EDI
– EDIFACT format for describing key families, codelists, and concepts
– EDIFACT format for generically describing data
• SDMX-ML
– XML schema for key families, concepts, and codelists
– XML schema for generically describing data
– XML schema for shared constructs
– XML schema for common administrative data
– XML schema for data and metadata queries
PLUS…
Key-Family-Specific SDMX-ML Formats
• Utility Schemas (“typical” XML schemas for validation and guided tools)
• Compact Data Schemas (large databases, full and partial datasets, incremental updates)
• Cross-Sectional Schemas (non-time-series data)
• Each meets a different use case
Key-Family-Specific Schemas
• Each domain model (“key family”) is mapped to a namespace owned by the creator of the key family
• Mappings from a standard XML expression of the model are made in a standard fashion
– If you can process the key family XML, you can predict exactly what each derived schema will look like
– If you can predict what each schema looks like, you can generate much of the code needed to process it
[Figure: Key Family X, expressed in generic structure XML, is mapped to a Compact Data XML Schema, a Utility Data XML Schema, and a Cross-Sectional Data XML Schema for X. Each schema structures its own XML instance of X; these instances are equivalent to one another and to the same data expressed in Generic Data XML.]
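Because the mapping from a key family to its derived schemas follows fixed rules, a program can generate those schemas mechanically. The sketch below shows that property with a deliberately simplified, hypothetical mapping rule: a `SeriesType` complex type with one required attribute per dimension, each restricted to that dimension's codes. It is not the normative SDMX-ML mapping. The namespace URN and codes are invented for illustration.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS_NS)


def xs(tag: str) -> str:
    """Qualify a local name with the W3C XML Schema namespace (Clark notation)."""
    return f"{{{XS_NS}}}{tag}"


def generate_series_schema(target_ns: str, dimensions: dict[str, list[str]]) -> ET.Element:
    """Derive a simplified XSD fragment from a key family's dimensions.

    Hypothetical mapping rule, NOT the normative SDMX-ML one: one
    SeriesType complex type with a required attribute per dimension,
    each restricted to that dimension's codes. Because the rule is fixed,
    the same key family always yields the same schema.
    """
    schema = ET.Element(xs("schema"), {"targetNamespace": target_ns,
                                       "elementFormDefault": "qualified"})
    series_type = ET.SubElement(schema, xs("complexType"), {"name": "SeriesType"})
    for concept, codes in dimensions.items():
        attr = ET.SubElement(series_type, xs("attribute"),
                             {"name": concept, "use": "required"})
        # Anonymous simple type: the attribute may only take listed codes.
        restriction = ET.SubElement(ET.SubElement(attr, xs("simpleType")),
                                    xs("restriction"), {"base": "xs:string"})
        for code in codes:
            ET.SubElement(restriction, xs("enumeration"), {"value": code})
    return schema


# Hypothetical key family namespace and codes, for illustration only.
dims = {"FREQ": ["A", "Q", "M"], "COUNTRY": ["DE", "JP", "US"]}
schema = generate_series_schema("urn:example:keyfamily:DEMO", dims)
print(ET.tostring(schema, encoding="unicode"))
```

The same determinism applies to processing code: whatever knows the rule can also emit parsers or validators for every key family without hand-writing them.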
An Example
[Figure: one example shown three ways: the KEY FAMILY definition, a UTILITY INSTANCE (beginning “Q …”), and a COMPACT INSTANCE.]
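The derived instances are said to be equivalent: the same series can be written in different layouts without losing information. A minimal sketch of that idea, assuming two simplified layouts:
- An element-style (“utility-like”) series, where values sit in child elements.
- An attribute-style (“compact-like”) series, where values sit in attributes.

The element names (`Series`, `Key`, `Obs`, `TIME`, `OBS_VALUE`) and the values are made-up stand-ins, not the normative SDMX-ML 1.0 schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical element-style series; values are made up.
UTILITY_SAMPLE = """<Series>
  <Key><FREQ>Q</FREQ><COUNTRY>DE</COUNTRY></Key>
  <Obs><TIME>2004-Q1</TIME><OBS_VALUE>1.5</OBS_VALUE></Obs>
  <Obs><TIME>2004-Q2</TIME><OBS_VALUE>1.7</OBS_VALUE></Obs>
</Series>"""


def read_utility(series: ET.Element) -> tuple[dict, list[dict]]:
    """Extract (key, observations) from the element-style layout."""
    key = {dim.tag: dim.text.strip() for dim in series.find("Key")}
    obs = [{child.tag: child.text.strip() for child in o} for o in series.findall("Obs")]
    return key, obs


def read_compact(series: ET.Element) -> tuple[dict, list[dict]]:
    """Extract (key, observations) from the attribute-style layout."""
    return dict(series.attrib), [dict(o.attrib) for o in series.findall("Obs")]


def utility_to_compact(series: ET.Element) -> ET.Element:
    """Rewrite an element-style series as an attribute-style one."""
    key, obs = read_utility(series)
    compact = ET.Element("Series", key)
    for values in obs:
        ET.SubElement(compact, "Obs", values)
    return compact


utility = ET.fromstring(UTILITY_SAMPLE)
compact = utility_to_compact(utility)
print(ET.tostring(compact, encoding="unicode"))
# Both layouts carry exactly the same information.
print(read_utility(utility) == read_compact(compact))  # True
```

The attribute-style layout is noticeably terser, which is one reason a compact form suits large data transfers.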
Other Info About SDMX-ML
• “Venetian Blind” style generally used
– Type-rich schemas
– A pinch of “Garden of Eden”
• OASIS UBL Naming and Design Rules
– Not slavishly followed, but used in most cases
– XML namespaces used to “package” schema modules
– We *did* use substitution groups, though…
• Emphasis on simplicity:
– “As simple as possible and no simpler” by use case
Early Adopters: Examples
• Federal Reserve (many financial data sets)
• UN/TRADECOM (commodity trade data)
• NAWWE (national accounts data)
• Joint External Debt Hub (external debt)
Web Services Guidelines
• Suggested set of services for:
– Obtaining metadata
– Obtaining data
• Advocates use of WS-Interoperability profiles for:
– SOAP
– WSDL
• Will be expanded in version 2.0
“Starter” Toolkit (v 1.0)
• Simple freeware tools:
– Key family creation and management
– SDMX-ML/SDMX-EDI transforms
– Key family to standard schema transforms
– Transforms between different types of XML for a single key family
– Data publishing tools (to HTML, CSV)
– Data validation tools
– Data creation tools
– Conformance testing tools
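A data publishing tool of the kind listed above can be quite small once the data layout is predictable. The sketch below flattens a compact-like dataset into CSV, one row per observation. It assumes an attribute-style input in which each `Series` carries dimension values as attributes and each `Obs` carries `TIME` and `OBS_VALUE`. Those names and the values are hypothetical, not taken from the SDMX-ML schemas.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up, compact-like input: dimension values and observations as attributes.
SAMPLE = """<DataSet>
  <Series FREQ="Q" COUNTRY="DE"><Obs TIME="2004-Q1" OBS_VALUE="1.5"/></Series>
  <Series FREQ="Q" COUNTRY="US"><Obs TIME="2004-Q1" OBS_VALUE="2.0"/></Series>
</DataSet>"""


def compact_to_csv(xml_text: str, dimensions: list[str]) -> str:
    """Flatten attribute-style series into one CSV row per observation."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([*dimensions, "TIME", "OBS_VALUE"])
    for series in root.iter("Series"):
        # The series key is repeated on every row so each row stands alone.
        key = [series.get(dim, "") for dim in dimensions]
        for obs in series.iter("Obs"):
            writer.writerow([*key, obs.get("TIME", ""), obs.get("OBS_VALUE", "")])
    return out.getvalue()


print(compact_to_csv(SAMPLE, ["FREQ", "COUNTRY"]), end="")
```

For this sample the output is a header row `FREQ,COUNTRY,TIME,OBS_VALUE` followed by `Q,DE,2004-Q1,1.5` and `Q,US,2004-Q1,2.0`.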
Version 2.0 and Beyond
• SDMX “Content” Standards
– SDMX Core Statistical Concepts: a set of universal concepts and rules for their use in key families (e.g., Frequency, Reference Country), plus a system for describing domain “core” concept sets
– Metadata Common Vocabulary: harmonized definitions of terms and concepts
– SDMX Core Statistical Subject-Matter Domains: a harmonized categorization of all statistical domains
• These will be published and maintained by SDMX, not put forward to ISO
Version 2.0 and Beyond (cont.)
• SDMX Registry Services: standard service interfaces for registration, navigation, and querying of SDMX registries
• SDMX Web Services: specifications for creating interoperable web services using SDMX standards
• SDMX “Pure” Metadata Reporting: formats for metadata reporting independent of data reporting flows
• Enhancements to the existing data and metadata formats
• Will also have a starter toolkit, including a registry implementation based on the freebXML Registry/Repository
SDMX Reference Implementation
[Figure: the Joint External Debt Hub and an SDMX Registry; the hub gets creditor data from several creditor data sources and debtor data from a debtor database.]
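The reference implementation illustrates “pull” rather than “push”: a hub fetches data from its sources instead of waiting for data to be sent. The sketch below models that pattern with an in-memory stand-in for a registry. Providers register which key family they can serve, and the hub asks the registry for sources before pulling from each. The class, method names, key family IDs, providers, and amounts are all hypothetical. The actual SDMX registry interfaces were planned for version 2.0 and are not modeled here.

```python
from collections import defaultdict
from typing import Callable

# A data source is anything that returns rows when asked (pull, not push).
DataSource = Callable[[], list[dict]]


class Registry:
    """In-memory stand-in for a registry (hypothetical interface, not SDMX's)."""

    def __init__(self) -> None:
        self._sources = defaultdict(list)

    def register(self, key_family_id: str, provider: str, source: DataSource) -> None:
        """A provider announces that it can serve data for a key family."""
        self._sources[key_family_id].append((provider, source))

    def find(self, key_family_id: str) -> list[tuple[str, DataSource]]:
        """Return registered (provider, source) pairs without creating empty entries."""
        return list(self._sources.get(key_family_id, []))


def pull_all(registry: Registry, key_family_id: str) -> list[dict]:
    """Hub side: discover sources via the registry, then pull from each one."""
    rows = []
    for provider, source in registry.find(key_family_id):
        rows.extend({"PROVIDER": provider, **row} for row in source())
    return rows


# Hypothetical key family IDs, providers, and amounts, for illustration only.
registry = Registry()
registry.register("EXT_DEBT_CREDITOR", "CREDITOR_A", lambda: [{"DEBTOR": "X1", "AMOUNT": "100"}])
registry.register("EXT_DEBT_CREDITOR", "CREDITOR_B", lambda: [{"DEBTOR": "X1", "AMOUNT": "40"}])
registry.register("EXT_DEBT_DEBTOR", "DEBTOR_DB", lambda: [{"DEBTOR": "X1", "AMOUNT": "150"}])

creditor_rows = pull_all(registry, "EXT_DEBT_CREDITOR")
debtor_rows = pull_all(registry, "EXT_DEBT_DEBTOR")
print(len(creditor_rows), len(debtor_rows))  # 2 1
```

The hub never needs a hard-coded list of creditors: adding a new source is a registration, not a change to the hub.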
Target Timeline
• SDMX version 1.0 standards available now
• Toolkit for v 1.0 over the next 6 months
• Version 2.0 standards Q2/Q3 2005
• Version 2.0 toolkit Q3/Q4 2005
Summary
• Increased access to more usable data
• Increased efficiency in processing
• Greater transparency through metadata
• Fewer reporting errors, higher quality
• Version 2.0:
– Process efficiency gain (“pull” not “push”)
– Greater visibility through the registry
For More Information
• SDMX website: http://www.sdmx.org
– Sign up for e-alerts on the site
– Join the contact group for public reviews
• Questions: stuart.feder@bis.org, agregory@aeon-llc.com