Corporate Data Model CSO - Ireland - CROS-portal

ESSnet on microdata linking
and data warehousing
in statistical production

Best practice case:

Comparing the implementations of
the Irish CDM and the Dutch DSC


Harry Goossens – Statistics Netherlands
Head Data Service Centre / ESSnet Coordinator
hct.goossens@cbs.nl
The CSO Corporate Data Model (CDM)

Underlying principle:         4 datastores

•       INPUT                 -    raw data
•       CLEAN UNIT            -    cleaned data
•       AGGREGATE             -    aggregated data
•       DISSEMINATION         -    published data



        → CDM was seen as ≈ an active DWH




    ESSnet Data Warehousing
The CSO Corporate Data Model (CDM)

Main characteristics:

•       All (statistical) processes must use the 4 data stores
•       Processing systems interact via the data stores
•       At certain moments snapshots are taken,
        which build the next data store
•       Work can continue on the same
        (snapshotted) data store
•       Simultaneous updating of data is mainly
        an organisational issue
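
The snapshot mechanism described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the CSO implementation: the store names come from the slide, while the class `CorporateDataModel`, its `snapshot` method and the sample records are hypothetical.

```python
from copy import deepcopy

# Store names follow the slide; everything else in this sketch is hypothetical.
STAGES = ["INPUT", "CLEAN UNIT", "AGGREGATE", "DISSEMINATION"]


class CorporateDataModel:
    """Toy model of four data stores chained by snapshots."""

    def __init__(self):
        self.stores = {stage: [] for stage in STAGES}

    def snapshot(self, stage, transform):
        """Freeze a copy of `stage` and use it to build the next data store."""
        next_stage = STAGES[STAGES.index(stage) + 1]
        frozen = deepcopy(self.stores[stage])
        self.stores[next_stage] = transform(frozen)
        return frozen


cdm = CorporateDataModel()
cdm.stores["INPUT"] = [
    {"unit": 1, "turnover": 100},
    {"unit": 2, "turnover": None},
]

# Cleaning step: drop records with a missing value.
cdm.snapshot("INPUT", lambda rows: [r for r in rows if r["turnover"] is not None])

# Work continues on the INPUT store after the snapshot was taken.
cdm.stores["INPUT"].append({"unit": 3, "turnover": 50})

# Aggregation step: total turnover over the cleaned units.
cdm.snapshot(
    "CLEAN UNIT",
    lambda rows: [{"total_turnover": sum(r["turnover"] for r in rows)}],
)
```

Because each snapshot is a deep copy, later updates to a store do not alter the stores already built from it. Coordinating who updates a store, and when, remains an organisational matter outside the code.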




The CSO Corporate Data Model

[Diagram: the four stores INPUT, CLEANED DATASETS, AGGREGATE DATASETS and DISSEMINATION, with 2 operational implementations: the DATA MANAGEMENT STORE (surveys) and the ADMINISTRATIVE DATA CENTRE (admin data).]

Data Management Store (DMS)

•       First implementation of the CDM
•       Survey data only
•       Data tables are created and populated through
        the DMS applications
•       Metadata must be entered as the data tables
        are created
•       Metadata capture = minimal
        → bottleneck
•       Business register (BR) outside the DMS (stand-alone)




CDM – Data Management Store

[Diagram: DMS, mainly surveys. Columns: INPUT, CLEANED DATASETS, AGGREGATE DATASETS, DISSEMINATION. Elements: DATA COLLECTION ACTIVITIES; SHARED INPUT; SNAPSHOTS; SHARED CLEANED UNIT; AGGREGATE STORE; BI; SYS 1, SYS 2 ... SYS n. Layers: APP layer, incl. I/O interfaces; DMS meta layer with basic descriptions.]

    Administrative Data Centre (ADC)

•      Developed for organisational reasons
•      Admin data only
•      A catalyst to exploit administrative data for
       statistical purposes
•      Interface with public authorities on admin data
       flows to CSO
•      Clearing house inside CSO for admin data
•      Data governance with respect to admin data




    Administrative Data Centre (ADC)

•      Has an analysis layer
•      R&D on available data
•      To develop new datasets
•      Without specific needs / demands from
       statistics




CDM – Administrative Data Centre

[Diagram: ADC, only admin data. Columns: INPUT, CLEANED DATASETS, AGGREGATE DATASETS, DISSEMINATION. Elements: DATA COLLECTION ACTIVITIES; DATA SOURCES; ETL; Data Products; ADC Front Door; BI; SYS 1, SYS 2 ... SYS n. Layers: ADC meta layer; LEAN INTERFACE.]


Corporate Data Model CSO - Ireland

[Diagram: the DMS and ADC implementations combined under the four columns INPUT, CLEANED DATASETS, AGGREGATE DATASETS and DISSEMINATION. DMS part: DATA COLLECTION ACTIVITIES; SHARED INPUT; SNAPSHOTS; SHARED CLEANED UNIT; AGGREGATE STORE; APP layer, incl. I/O interfaces; DMS meta layer with basic descriptions. ADC part: DATA COLLECTION ACTIVITIES; DATA SOURCES; ETL; Data Products; ADC Front Door; ADC meta layer; LEAN INTERFACE. Shared output side: BI; SYS 1, SYS 2 ... SYS n.]
The CBS Data Service Centre (DSC)

The concept:

•       No data without metadata
•       Dedicated metadata model as basis
•       Strict distinction between:
           -    Statistical data (facts & figures)
           -    Conceptual metadata (definitions, description of
                quality, process activities etc.)
•       Steady states explicitly designed for re-use
•       All metadata (of steady states) are generally accessible
        and are standardised as much as possible
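
The "no data without metadata" rule can be illustrated with a small Python sketch. It assumes a hypothetical in-memory vault; the class `SteadyStateVault`, its methods and the dataset name are illustrative and not part of the DSC.

```python
class SteadyStateVault:
    """Hypothetical store that refuses data lacking registered metadata."""

    def __init__(self):
        self._metadata = {}  # conceptual metadata, keyed by dataset name
        self._data = {}      # statistical data, kept strictly separate

    def register_metadata(self, name, description):
        self._metadata[name] = description

    def store(self, name, rows):
        # Enforce the rule: data is only accepted once metadata exists.
        if name not in self._metadata:
            raise ValueError(f"no metadata registered for {name!r}")
        self._data[name] = list(rows)

    def describe(self, name):
        # Metadata is readable by anyone, independently of the data.
        return self._metadata[name]


vault = SteadyStateVault()
vault.register_metadata("turnover_2013", "Enterprise turnover, reference year 2013")
vault.store("turnover_2013", [{"unit_id": 1, "turnover": 100}])
```

Keeping metadata and data in separate structures mirrors the strict distinction on the slide, while the check in `store` ties the two together.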



The CBS Data Service Centre (DSC)


What is it?

•       Fundamental cornerstone of the CBS
        Business Architecture
•       Central ‘vault’ with Steady States, linking:
           -    statistical data (facts & figures)
           -    conceptual metadata (description)
           -    technical metadata (user’s guide)
           -    documentation
•       Implementation of the Dutch metadata model


The CBS Data Service Centre (DSC)


What does it offer?

Generic services:
    -    Metadata coordination
    -    Centralised data distribution
    -    Authorisation management
    -    Automatic process interfacing (in development)
    -    Archiving of statistical datasets




The CBS Data Service Centre (DSC)

Why do we do it?
•       Data sharing / re-use of data
        Intermediary, archive and distribution: the CBS data vault. Maximally
        efficient use of data and metadata
•       Process guarantee / security
        Safety net in case of calamity: static, ‘frozen’ data
•       Process standardisation
        Transparency & efficiency
•       Coordination of metadata & classifications
        One single source of elements for the statistical process
•       Process chain support
        Steady States as data hubs
•       Generic process for data linking
        DSC structure enables linking of datasets with the same object type
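
The last point, linking datasets with the same object type, might look as follows in Python. This is a minimal sketch under assumptions: the dictionary layout, the `unit_id` key and the function `link` are hypothetical and do not describe the actual DSC linking process.

```python
def link(left, right):
    """Link two datasets on unit identifier, only if they share an object type."""
    if left["object_type"] != right["object_type"]:
        raise ValueError("datasets describe different object types")
    right_by_id = {row["unit_id"]: row for row in right["rows"]}
    linked = []
    for row in left["rows"]:
        match = right_by_id.get(row["unit_id"])
        if match is not None:
            # Merge the variables of both datasets for the same unit.
            linked.append({**row, **match})
    return {"object_type": left["object_type"], "rows": linked}


turnover = {
    "object_type": "enterprise",
    "rows": [{"unit_id": 1, "turnover": 100}, {"unit_id": 2, "turnover": 250}],
}
employment = {
    "object_type": "enterprise",
    "rows": [{"unit_id": 2, "employees": 12}],
}
linked = link(turnover, employment)
```

The object-type check is what makes the process generic: any two datasets that describe the same kind of unit can pass through the same routine.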



  CBS Business Architecture: Layers


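[Diagram: the layers of the CBS Business Architecture, from top to bottom: Strategy; Design; Chain management; Statistics Production; Steady States. The DSC Metadata Catalogue is drawn at the Design layer and the DSC Data Storage at the Steady States layer.]
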
CBS Business Architecture: Steady States




 DSC: What are Steady States ?

 •      A steady state is a dataset together with information
        for its correct interpretation.
 •      Rectangular
          - Rows represent units (micro) or classes of units (macro)
          - Columns represent variables
 •      Heading: population, time
 •      Dataset design is like a template of a table:
        only borders and heading
 •      1 Dataset design, n Datasets
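
The relation between one dataset design and its n datasets can be sketched with Python dataclasses. The class and field names below are assumptions made for illustration and are not taken from the Dutch metadata model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetDesign:
    """The template: borders and heading only, no data."""
    name: str
    unit_type: str    # what a row represents
    variables: tuple  # what the columns represent


@dataclass
class Dataset:
    """One filled-in instance of a design, for a given population and time."""
    design: DatasetDesign
    population: str
    period: str
    rows: list = field(default_factory=list)

    def add_row(self, **values):
        # A row must supply exactly the variables named in the design.
        if set(values) != set(self.design.variables):
            raise ValueError("row does not match the dataset design")
        self.rows.append(values)


design = DatasetDesign("turnover", "enterprise", ("unit_id", "turnover"))

# 1 dataset design, n datasets: one per reference period here.
datasets = [Dataset(design, "all enterprises", year) for year in ("2012", "2013")]
datasets[0].add_row(unit_id=1, turnover=100)
```

Since every dataset points at the same immutable design, a process written against the design can read any of its datasets without change.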




DSC: Why Steady States ?

•      Reduce storage:
        - Store once
        - Re-use many times
•      Secure the statistical process:
        - Each steady state is a guaranteed fall-back
           point
•      Improve consistency:
        - Every following process uses the same dataset
•      Improve flexibility:
        - Enables independent, generic process design




Conclusions

Both CSO & CBS
•   use the same basic principle of 4 (static) stages/bases
•   had the same 'drivers' to start a DWH:
    - re-use of data,
    - decoupling input from output (= getting rid of stovepipes)
CSO
•   strong focus on practical results and (successful) quick wins
•   2 different implementations of the CDM
•   organisational driver for the ADC
CBS
•   strong focus on the metadata model
•   DSC = essential element of the business architecture
•   1 implementation supporting all processes

Conclusions

Regarding the DWH ESSnet

•   The S-DWH architecture covers both best practices
•   The ESSnet identified the right issues to focus on:
       -    metadata
       -    role/position of the BR
•   Strong desire for knowledge exchange and
    learning from other NSIs
•   CSO = very helpful best practice case
•   CSO acknowledges the importance of the ESSnet and
    wants to stay closely involved


