Documents
Resources
Learning Center
Upload
Plans & pricing Sign in
Sign Out

Data Warehousing Architecture and Techniques in Statistical

VIEWS: 7 PAGES: 21

									   USING THE METADATA IN
STATISTICAL PROCESSING CYCLE
   – THE PRODUCTION TOOLS
          PERSPECTIVE
     Matjaž Jug, Pavle Kozjek, Tomaž Špeh
  Statistical Office of the Republic of Slovenia
                 Overview

   Current statistical production cycle in SORS
   Using the metadata in Blaise applications
   The role of metadata in automatic editing
    system in SAS
   Metadata connected with the data in Oracle
    data warehouse
   Lessons learnt
   Questions
Current statistical production cycle

   Entry and micro editing (Blaise)
   Macro and statistical editing (SAS)
   Storing and analysis (Oracle)
   Dissemination (PC-Axis)
   Central metadata stores (Klasje & Metis)
       Using the metadata in Blaise
               applications

   Generation of (high speed) data-entry
    applications using Gentry (using by non-
    IT personnel)
   Metadata-based transformations between
    different data structures (EXTRA-FAT,
    FAT, THIN)
      Gentry – tool for generation of the
        Blaise data-entry application

   Questionnaire
    structure and
    layout (name,
                          Data type
    blocks, tables,
    routing etc.)
   Field
    characteristics
    (length, data type,
    constants, other
    parameters)

                          Field characteristics
 Gentry – example of
generated application

           header




 section




  Data entry for table 12
                              Transformations

All data for one unit (provider) in one row (EXTRA FAT): suitable for micro editing
PROVIDER Industry SizeClass    QuantityProduct1 ValueEURProduct1 QuantityProduct2 ValueEURProduct2
Provider 1 A      big                      150               300               200             400

                                                            Metadata-based transformation in Blaise

 Classification and continuous variables in the columns (FAT): suitable for analysis
           PROVIDER PRODUCT Industry SizeClass              Quantity         ValueEUR
           Provider 1 Product 1 A    big                               150              300
           Provider 1 Product 2 A    big                               200              400
                                                            Metadata-based transformation in SAS

  Classification variables in the columns and continuous variables in the rows (THIN)
            PROVIDER     PRODUCT     Industry   SizeClass   ContVariables ContObservations
            Provider 1   Product 1   A                big   Quantity                    150
            Provider 1   Product 2   A                big   ValueEUR                    300
            Provider 1   Product 1   A                big   Quantity                    200
            Provider 1   Product 2   A                big   ValueEUR                    400
    The role of metadata in automatic
          editing system in SAS

   General system for automated editing
   Process metadata
    The role of metadata in automatic
          editing system in SAS

   In order to be general the tool must be able
    to:
    -   recognize the data which are due to be subjected
        to editing and/or imputation;
    -   recognize which editing method should be
        applied,
    -   and with what parameters
        Process indicators – level 1

   Mode of data collection
    -   1 data provided directly by reporting unit
    -   2 data from administrative source
    -   3 data computed from original values
    -   4 imputed data – imputation of non-response
    -   5 imputed data – imputation due to invalid
        values detected through the editing process
    -   6 data missing because the unit is not eligible for
        the item (logical skip)
        Process indicators – level 2

   Data status
    -   1 original value
    -   2 corrected value
        Process indicators – level 3

   Method of data correction
    -   11 correction after telephone contact
    -   12 data reported at a later stage
        Process indicators – level 3

   Reporting methods
    -   11 reporting by mail questionnaire
    -   12 computer assisted telephone interview(CATI)
    -   13 telephone interview without computer assistance
    -   14 paper assisted personal interview (PAPI)
    -   15 computer assisted personal interview (CAPI)
    -   16 paper assisted self interviewing
    -   17 computer assisted self interviewing
    -   18 web reporting
        Process indicators – level 3

   Imputation methods
    -   10 method of zero values
    -   11 logical imputation
    -   12 historical data imputation
    -   13 mean values imputation
    -   14 nearest neighbour imputation
    -   15 hot-deck imputation
    -   16 cold-deck imputation
    -   17 regression imputation
    -   18 method of the most frequent value
    -   19 estimation of anual value based on infraanual data
    -   21 stochastic hot-deck (random donor)
    -   22 regression imputation with random residuals
    -   23 multiple imputation
Process indicators examples - xy.zz

   11.15 means:                    42.19 means:
    1 - data provided directly       4 - imputed data –
       by reporting unit                imputation of non-
    11 - original value                 response
    11.15 - computer assisted        42 - corrected value
       personal interview            42.19 - estimation of anual
       (CAPI)                           value based on
                                        infraanual data
               Statistical process

                                  Blaise



Blaise                                     Oracle
         SAS   Key responders



                    Other units




                                  SAS
Metadata connected with the data in
     Oracle data warehouse

   On-line access to:
    -   Historical data
    -   Data from different phases (not only final data)
    -   Data for multiple surveys (not only data marts)
    -   Statistical (variables & classifications) and
        process (time stamps, status indicators...)
        metadata connected with the data
   ...accessible for third-party tools
 Conceptual star scheme for SBS
                                DIM_TIME
     DIM_NACE                                       DIM_VARIABLE




     DIM_NUTS
                               FACTS_SBS            DIM_SOURCE



                               OBS_VALUE
                                WEIGHT

           ...              THIN table design


   DIM_ORG_FORM                                    DIM_INDICATOR
                              DIM_OBS_UNIT




                                                METADATA DIMENSIONS
CLASSIFICATION DIMENSIONS
               Metadata                         Classification
                               Input tables
                server                             server

                                                                                       Business
                                Input data
                                                                                        register
                Variables
                Sources...                     Classifications
Imputation       Imputed                                          Editing
                   data
                               Loading        Corrected data                                   Editing
  table                                       Uncorrect data
                                                                   table        Searching
                                                                                                form
                                                                  Automatic
                                 Clean                            corrections
 Imputations                     data

                                                                                              Manual
                                                                     Editing                corrections
                Parameter
                             Transactional
Imputations       query          star              Control
                                                    query
                             Extracted data

                                                                                   Oracle
                              Extractions            Analysis       Results      Discoverer

                               Analytical
                                 data


                              Analytical             Analytical
                               cubes                  query
                   Lessons learnt

   The role of central repositories for metadata
    -   Natural source of conceptual metadata
    -   Metadata have to be exact, complete and consistant
    -   Process metadata should be connected with the data
   Harmonisation of metadata concepts
    -   Local metadata vs. global metadata
    -   The cultural change is needed
   Technical considerations
    -   The possibilities for metadata exchange and system
        integration are good (XML, SQL)
Questions

								
To top