Docstoc

Conceptual and physical model of the integrated database

Document Sample
Conceptual and physical model of the integrated database Powered By Docstoc
					                              EUROPEAN COMMISSION
                        INFORMATION SOCIETY TECHNOLOGIES
                                 (IST) PROGRAMME




                                                                      Deliverable 2.1

                                Conceptual, logical and physical
                                             model of datamarts

    Work Package No. 2: Concept of IT framework issues
           and development of software for the creation
                           of a multi-source data base

Alberto Sorce                                             Filippo Oropallo
ISTAT - National Statistical Institute of Italy        ISTAT – National Statistical Institute of Italy
Information System                                     Economic Research and Microsimulation
Via Balbo, 16 - 00184 ROMA                             Via Torino, 6 - 00184 ROMA
Tel: +39 06 4673 2575                                  Tel: +39 06 4673 4186
E-Mail: sorce@istat.it                                 E-Mail: oropallo@istat.it


                                        ISTAT - September 2002
25             / Deliverable 2.1                                      1




IST Programme, contract No 2000-31125

Development of a System of Indicators on Competitiveness and Fiscal
Impact on Enterprises Performance (DIECOFIS)




PROJECT PARTNERS
Coordinator
Istituto Nazionale di Statistica (I)

Principal Contractors
Board of Inland Revenue (UK)
London School of Economics (UK)
University of Cambridge (UK)
Università di Roma Tor Vergata (I)
Università di Firenze (I)
Centro di Ricerca Economica e Sociale (I)
European Commission Joint Research Centre (I)
Informer SA Computing Information Systems and Management Cons. (EL)
Wirtschaftsuniversität Wien (A)
25           / Deliverable 2.1                                               2




IST Programme, contract No 2000-31125

Development of a System of Indicators on Competitiveness and Fiscal
Impact on Enterprises Performance (DIECOFIS)




SUMMARY OF DELIVERABLE

The development of the integrated database was concerned with the study
of the architecture of the information system. The logical and physical
schemes have been produced. In these, substantially, we can distinguish
three main groups of tables:

- Enterprise life (demography, activity, size)
- Nomenclatures and definitions to harmonize different sources (Metadata)
- Tax oriented tables containing account, production and other variables
needed by the modules of the microsimulation model for the estimation of
tax burden and for the construction of indicators.

The construction of indicators will be possible with the implementation of
Data marts for specific analysis, where granular data are prepared with
aggregations by size, or geographical area, economic activity etc.
25               / Deliverable 2.1                                                               3




IST Programme, contract No 2000-31125

Development of a System of Indicators on Competitiveness and Fiscal
Impact on Enterprises Performance (DIECOFIS)




CONTENTS OF DELIVERABLE


Introduction ............................................................................... 4
ISTAT experiences ...................................................................... 5
Diecofis Database ....................................................................... 6
Data model ................................................................................ 7
The identifying variables .............................................................. 8
The analysis of the Logical model ............................................... 10
Logical Model ........................................................................... 12
Physical Model.......................................................................... 16
Data marts for specific analysis .................................................. 20
Select Bibliography ................................................................... 23
25             / Deliverable 2.1                                                         4




Introduction
Workpackage 2 is concerned with the development of the integrated database and
with the building of several collections of datasets to serve specific micro-analysis.
Integration will take place by means of the development of software containing
matching techniques of different sources of data and re-weighting units of firms’
samples. A software user manual will guide in the integrating process.


Objectives of WP2:
•      Creation of a multi-source integrated and systematised database
•      Collection of datasets for micro-analysis


Work of WP2:
•      Development of the logical and physical scheme of the data base
•      Analysis of the software for the integration and the collection of datasets
•      Code for data downloading and user interface
•      Software user manual


Deliverable:
The deliverable contains:
•      A description of the ISTAT experience concerning integrated databases.
•      Features of Diecofis Database, with a look at the identifying variables.
•      Logical and Physical schemes.
•      Data Marts for specific analysis.
25              / Deliverable 2.1                                                           5




ISTAT experiences

In the last years the Italian NSI has started several studies and projects about
integration of data from different sources. One project concern the development of
the business register ―ASIA‖. This archive is the result of an integration process of
different administrative sources. It leads to the creation of the statistical register of
active enterprises. Basically activity is presumed by means of a logistic model
where the probability of existence is a function of various signs of life drawn from
several administrative sources (Revenue Agency, Chamber of Commerce, Social
Security Institute, Insurance Institute at work, yellow pages).
Other projects are attempting the integration of the business register, which
identify the business unit, with other administrative data. In the database Coe are
integrated information about foreign trade. Other projects are attempting the
integration of business register with accounts data from Chambers of Commerce.
An important activity in this direction is the development of the Statistical
Information System on Enterprises and Institutions. The part concerning with
enterprises would integrate all business statistics for the management of business
units and for data analysis. The system is based on database network which
integrate business surveys of the Department of Economic Statistics. This
integrated system would also bring to systematise all information and to the
development of a modern system of dissemination. IT developments, during recent
years, permits to develop an unique environment and to build Data Warehouses for
the dissemination of large databases, which can fulfil different users’ needs.
25             / Deliverable 2.1                                                         6




Diecofis Database

The study of Istat surveys of enterprises within Work Package 1 leads to plan a
data programme. The work deals with the creation of a database in order to gather
information deriving from different statistical surveys. The idea comes out from the
need of having an on-line source able to provide as many information as possible
on structure and activity of Italian enterprises. The project’s first stage deals with
the storage of micro-data of SCI survey which concerns all Italian enterprises with
at least 100 employed; further steps will involve integration with other business
surveys (PMI, COE, PRODCOM, New economy etc.), and of administrative data
(data of Chambers of Commerce and possibly fiscal data).


The specificity of this work can be summarised in the following items:


- A big amount of data (they occupy various dozens of gigabytes on the disk);


- Complete flexibility of the System, it must be free from specific surveys, in order
to include step by step present and future information.;


- Systematic use of the server of Istat by Diecofis consortium.


All this imposes a careful study of the logic structure of the database that, when set
up cannot be modified without a radical change of the whole system. In addition to
that it imposed a new coding of statistical items in order to unify those items so far
codified according to different surveys.
25               / Deliverable 2.1                                                      7




Data model

Groups forming the System are shown in the following picture:




                                                     Statistical Information:
                                                      ISTAT business surveys
                  Business Register
                                                     Foreign trade information
                                                    Annual report and fiscal data




                                           Metadata



                                Figure 1 – Data model


The structure hub is the enterprise list, which for each enterprise has a record with
an internal code (primary key) and codes used in ASIA archive, fiscal code and VAT
number.


General information and statistical data share dimensional information (workers,
employees. turnover). This approach enables to deal in the same way with
information provided by ASIA Archive and statistical surveys, as well as by other
archives. The modularity of the System allows:
     -   Prompt widening in case of new information (new surveys, new sources) to
         be added;
     -   Easy and efficient location of information on various computer systems.
         Information can be filed in various server in order to have a distributed
         architecture.
25               / Deliverable 2.1                                                       8




The identifying variables

The data form was organised considering the following objectives:
     -   To keep the historical memory of changes in the situation of a single unit;
     -   To allow also retroactive changes of specific information, with a new
         reference date;
     -   To collect information on identity up-dating from a wide range of different
         sources, keeping the information on the reference source;
     -   To file information on the reference people of enterprises and on specific
         surveys.


Three options were considered in choosing the logic model:


     -   One single table including all information on the enterprise.
     This solution is little suitable in case of up-dating, and the up-dating of one
     single field requires to file the historical archive of the whole record;


     -   One single table in which each line contains an information, with an
         identifying sign enabling to recognise the information and its format.
     This solution is more flexible: it allows adding all information even-though they
     were not in the original database. In this way research and management of
     information is very complicated. In addition to that the physical room they need
     is about three times the space used by each single field. Each field must have
     identification and we must consider the space required by its index. Due to the
     final dimension of the System, the use of space becomes unacceptable;


     -   Few tables with grouped information according to the area.
     This was the chosen option because of its flexibility and efficiency. Information
     are grouped as follows:
     1. Existence and activity of the enterprise
     2. Fiscal and identity information
     3. Location information
     4. Dimensional information.
25               / Deliverable 2.1                                                      9




These four groups allow flexibility in putting new information as new tables or new
fields, efficiency in management, since no space is used to file identification (the
only identification is the one of the enterprise). Moreover the grouping of variables
was set minimising the space required by historical information:
     -   shift of municipality implies a new address, but not a new enterprise name;
     -   a new enterprise name usually implies a change of fiscal code and VAT
         number, without any change in location.
Each table of the field ―year‖ enables to find immediately the record in force in the
analysed year. Dimensional information (turnover, workers, employees) are
recorded according to the reference year. In this way we have dimensional
variables for each year.
25              / Deliverable 2.1                                                                10




The analysis of the Logical model

In order to set the logic structure of the system, we considered three different
configurations, for each of them we considered qualities and defects. We then
experimented them in order to find out the most efficient configuration. The last
phase involved also the development of data navigator, which was used as an
instrument of performance analysis. The three analysed configurations can be
summarised in the following scheme:
                             Table 1 – Efficiency and flexibility
TYPE          DESCRIPTION             EFFICIENCY                    FLEXIBILITY
Horizontal    Tables contain one      It requires to read one       It is the least flexible
tables        single record for       single record for each        solution, for it requires
              each enterprise and enterprise and for each           to change the structure
              for each year with      year, but this record has     of tables for any
              as many fields as       many empty fields             variation of surveyed
              the number of                                         variables
              surveyed variables


Completely    Each line in a table    It is the solution            It is the most correct
vertical      records the amount requiring the highest              solution from a relational
tables        of variables            number of accesses to         point of view for it leads
              referred to one         the disk to be read: for      to a complete
              year                    each enterprise and year normalisation of tables:
                                      various dozens of             This implies a complete
                                      records have to be read,      independence between
                                      but is not said that it is    tables and the type of
                                      the least efficient for       considered survey.
                                      only fields containing
                                      information are
                                      memorised.


Vertical      Each line of a table    Requires a less number        Also this solution is very
tables with   contains the            of accesses than the          flexible regarding
horizontal    amount of a             previous one, but             surveys and their
years         variable for all        imposes to record also        content; unfortunately it
              memorised years         fields related with years     obliges to establish in
25                 / Deliverable 2.1                                                           11




                                       in which values were not    advance the number of
                                       surveyed.                   years included in the
                                                                   database. Each line of a
                                                                   table contains the
                                                                   amount of a variable for
                                                                   all the years in which
                                                                   values were not
                                                                   recorded.
A     test   was   submitted   to   analyse   the   System   performance    in   the   three
configurations, using data related with SCI surveys 1989-1998. In order to achieve
the best performance, DBMS Oracle requires a synchronic ―end‖ of the all System,
and this procedure was repeated in the three configurations. Some changes in the
physical database structure (i.e. indices structure) or in the System structure (i.e.
areas of memory allocated to DBMS) enabled to improve performances of one rank.
At the end of this matching we saw that there where no relevant differences of
performance. Executing a research by the data browser, we can consider that the
server turnaround1 time is of some dozens of milliseconds (the same of some
access to disks) time to send the answer on the web ranges from some dozens of
milliseconds to (with a 10Mbit LAN no overloaded) to some seconds (for a 64Kbps
CDN line) the rest of the time is used by the client station to organise results.
Once we have checked that a research operation occupies the server for a
negligible time, we decided to adopt the second configuration, the most flexible
and most correct from a relation point of view.
Tables of information coding were kept separated from the whole System. Those
tables are geographical codes (municipality, provinces, regions, geographical areas,
foreign countries), typology codes (legal form of enterprises, type of events, etc.)
and economic codes (ateco 91, metadata of economic variables, etc.).
These tables were memorised in a plan separated from the whole System, for
management of these codes deals with a different organising structure from the
one of single surveys on enterprises, and also because these codes have to be
shared at a reading level by all informative subsystems, in order to have
consistency among all information.
Following figures represent the logic scheme followed by the physical one:




1
    The time between the acceptance of a request by the System and its answer.
25            / Deliverable 2.1                                                                                       12




Logical Model
ENTERPRISE LIFE AND RELATED TABLES: STAFF and TURNOVER
ACTIVITY (Sector of activity – CPA - NACE standard) - LEGAL (Legal form) - SITE (Localisation) - EVENTS
(Transformations, etc.) - (Years 1996-2000)
                                                                                                                              Produce / Produced
                                                       Turnover
                                                                                   Staff
                                                       ID (FK)
                                                       YEAR                         ID (FK)
                                                                                    YEAR                                             s taff
                                                       Value
                                                                                    independents
                                                                                    employee                                           ENTERPRISE
                          carried out or carries out
                                                                                                                                        ID
                                                                                                                                        legal_no
           Activity                                                                     had old s ite or has other s ites               birth
            ID (FK)                                               Site                                                                  death
            ateco5 (FK)                 has or had                 ID (FK)                                                              s tatus
            year                                                   UnitID                                      Endure                   update
                                                                   year
                                                                   Prov (FK)                                                                        of
                          Legal                                    Mun (FK)
                           ID (FK)                                 pcode
                                                                                                            Events
                           lform (FK)                              lform (FK)
                                                                   ateco5 (FK)                               ID (FK)
                           year
                                                                                                             eventID (FK)
                                                                                                             year
                                                                                                             id_old


                                                                             of other
                                                                                                                            before


                      previous                                                   before new

Figure 2 – Logical Model: Diecofis Database (Part 1 of 4)
25           / Deliverable 2.1                                                                     13




ENTERPRISE NOMENCLATURES AND DEFINITIONS:
These tables contain the description of codes
                                                        LegalForm
                                                                                  Countries
                                                        lform                                            Regions
                         S ource                                                     Country
                                                        Des cr                                            Reg
                             ids ourc e                                              EU
                                                        lform1                                            des c r
                             des c ription                                           initials
                             regularity                                              des c r


                                             of other                                           P rovinc es
             of                of                                                                P rov

                                                                   E ventCode                    Reg (FK )
                     V ariables
                                                                                                 des c r
                      vcode                                         eventID
                      ids ourc e (FK)                               Des cr
                      year
                      des c r                               A ctivityClas s
                      c urrenc y                                atec o5
                      annotation
                                                                des c r                         Municipalities
                      measure_unit
                                                                atec o3                         Mun
                                                                lett2                           P rov (FK )
                                                                lett1
                                                                                                pcode
                                                                                                population
                                                                    P roduc tID                 des c r
                                                                     c ipro
                                                                     prodcom_c ode
                                                                     atec o5 (FK )
                                                                     CN8
                                                                     des c r
Figure 3 – Logical Model: Diecofis Database (Part 2 of 4)



          des c ription of
25                     / Deliverable 2.1                                                  14




Enterprise’s accounts data and direct taxation


     Corporate
     year (FK )                                                         Corporate Dataset
     ID (FK)
     vcode (FK)                                                                                                            des c rip
     ids ourc e (FK)
     value

                           Regional
                                                                Regional T ax and Soc ial Security Contributios
                           year (FK )
                           ID (FK)                                                                                    des c ription
                           vcode (FK)
                           ids ourc e (FK)
                           value
                           weight

                                                                                                                  P roduc t Datas e


                                             P ers onal                             P ers onal T ax Dataset                 des c r
                                              year (FK )
                                              ID (FK)
                                              vcode (FK)
                                              ids ourc e (FK)
                                              value




Figure 4 – Logical Model: Diecofis Database (Part 3 of 4)
25                / Deliverable 2.1                    P roduc tID                    15
                                                        c ipro
                                                     prodcom_c ode
                                                     atec o5 (FK )
                                                     CN8
Enterprise’s products and other features (ICT     etc.) c r
                                                     des




              des c ription of




         des c ription of
                                      Other Features             of                         Other
                                                                                            year (FK )
                                                                          des c ripion of
                                                                                            ID (FK)
                                                                                            vcode (FK)
                                                                                            ids ourc e (FK)
                                                       P roduc t
     P roduc t Datas et                                                                     value
                                                        c ipro (FK)                         weight
                                                        ID (FK)
                des c ription of                        year (FK )
                                                        vcode (FK)
                                                        ids ourc e (FK)
                                                        V alue
                                                        weight




Figure 5 – Logical Model: Diecofis Database (Part 4 of 4)
25              / Deliverable 2.1                                                                                    16




Physical Model
ENTERPRISE LIFE AND RELATED TABLES:
STAFF and TURNOVER
ACTIVITY (Sector of activity – CPA NACE standard)
LEGAL (Legal form) - SITE (Localisation) - EVENTS (Transformations, etc.) - (Years 1996-1999)

                                                Turnov er
                                                                          Staf f
                                                ID : NU MBER(9)
                                                YEAR : NU MBER(4)          ID : NU MBER(9)
                                                                           YEAR : NU MBER(4)
                                                v alue: N UMBER (13)
                                                                           independents : NU MBER(9)
                                                                           dependents: N UMBER(9)                          ENTERPR ISE
                                                                                                                           ID : NU MBER(9)              Sourc e
                                                                                                                           legal_no: VAR CH AR2(11)      ids ourc e: VARC HAR 2(9)
           Act iv ity                                                                                                      birth: NU MBER(4)
                                                                                                                                                         desc ription: VARC HAR2(20)
            ID : NU MBER(9)                                 Site                                                           deat h: N UMBER (4)
                                                                                                                                                         regularity : VARC HAR 2(2)
            at ec o5: VARCH AR2(5)                          UnitI D: N UMBER (6)                                           st at us : NU MBER(4. 4)
            y ear: N UMBER (4)                              y ear: N UMBER (4)                                             update: DATE
                                                            ID : NU MBER(9)
                                                            Prov : VARCH AR2(3)
                        Legal                               Mun: VAR CHAR 2(3)
                        ID : NU MBER(9)                     pc ode: VARC HAR 2(5)                                                                     Variables
                                                                                               Ev ents
                        lty pe: VAR CHAR 2(3)               lty pe: VAR CHAR 2(3)                                                                     v code: VARCH AR2(9)
                                                            at ec o5: VARCH AR2(5)             ID : NU MBER(9)
                        y ear: N UMBER (4)                                                                                                            ids ourc e: VARC HAR 2(9)
                                                                                               ev ent ID : VAR CH AR2(3)
                                                                                                                                                      y ear: N UMBER (9)
                                                                                               y ear: N UMBER (4)
                                                                                                                                                      desc r: VARC HAR 2(60)
                                                                                               id_old: NU MBER(9)
                                                                                                                                                      currency : VARC HAR 2(20)
                                                                                                                                                      annotation: VARC HAR2(50)




Figure 6 – Physical Model: Diecofis Database (Part 1 of 4)                                                                                                                    Product 1
                                                                                                                                                                                  nc 8: VAR CHA
                                                                                                                                                                                  at ec o5: VARC
                                                                                                                                                                                  desc r: VARC H
                 25          / Deliverable 2.1                                                                     17




                 ENTERPRISE NOMENCLATURES AND DEFINITIONS:
                 These tables contain the description of codes
                                                                 LegalT ypes
NT ERPRISE                                                                              Countries
                                                                 lform: V ARCHAR2(3)                               Regions
D: NUMBE R(9)                   S ource                                                 c od_paes e: NUMB ER(3)
                                                                 des c r: V ARCHAR2(60)                              Reg: NUMB ER(2)
gal_no: V ARCHAR2(11)            ids ourc e: V ARCHAR2(9)
                                                                 lform1: VA RCHAR2(1) UE: VA RCHA R2(1)              des c r: V ARCHAR2(40)
rth: NUMB ER(4)                                                                         s igla: V ARCHAR2(2)
                                 des c ription: V ARCHA R2(20)
eath: NUMBE R(4)                                                                        des c r: V ARCHAR2(50)
                                 regularity: V ARCHAR2(2)
atus: NUMB E R(4.4)
pdate: DA TE
                                                                                                            P rovinc es
                                                                                                             P rov: V ARCHA R2(3)

                                                                         E ventCode                          Reg: NUMB ER(2)
                            V ariables
                                                                                                             des c r: V ARCHAR2(40)
                             vcode: VA RCHAR2(9)                          eventID: V A RCHA R2(3)
                             ids ourc e: V ARCHAR2(9)                     Des cr: VA RCHAR2(40)
                             year: NUMB ER(9)
                             des c r: V ARCHAR2(60)                   A ctivityClas s
                             c urrenc y: V ARCHAR2(20)                 atec o5: VA RCHA R2(5)
                             annotation: VA RCHA R2(50)
                                                                       des c r: V ARCHAR2(40)               Municipalities
                             measure_unit: VA RCHA R2(20)
                                                                       atec o3: VA RCHA R2(3)                Mun: V A RCHA R2(3)
                                                                       lett2: V ARCHA R2(2)                  P rov: V ARCHA R2(3)
                                                                       lett1: V ARCHA R2(1)
                                                                                                             pcode: VA RCHAR2(5)
                                                                                                             population: NUMB ER(9)
                                                                        P roduc tID                          des c r: V ARCHAR2(50)
                                                                         c ipro: V ARCHAR2(12)
                                                                         prodcom_c ode: V ARCHA R2(8)
                                                                         atec o5: VA RCHA R2(5)
                                                                         CN8: V ARCHA R2(20)
                                                                         des c r: V ARCHAR2(40)
                 Figure 7 – Physical Model: Diecofis Database (Part 2 of 4)
25               / Deliverable 2.1                                          18




Enterprise’s accounts data and direct taxation


     Corporate
     year: NUMB ER(9)
     ID: NUMBE R(9)
     vcode: VA RCHAR2(9)
     ids ourc e: V ARCHAR2(9)
     value: NUMBE R(12)



                    Regional
                     year: NUMB ER(9)
                     ID: NUMBE R(9)
                     vcode: VA RCHAR2(9)
                     ids ourc e: V ARCHAR2(9)
                     value: NUMBE R(9)
                     weight: NUMB ER(9)


                                                P ers onal
                                                 year: NUMB ER(9)
                                                 c odice: NUMB ER(9)
                                                 vcode: VA RCHAR2(9)
                                                 ids ourc e: V ARCHAR2(9)
                                                 value: NUMBE R(12)




Figure 8 – Physical Model: Diecofis Database (Part 3 of 4)
25          / Deliverable 2.1                P roduc tID                     19
                                              c ipro: V ARCHAR2(12)
                                              prodcom_c ode: V ARCHA R2(8)
                                              atec o5: VA RCHA R2(5)
                                              CN8: V ARCHA R2(20)
Enterprise’s products and other features   (ICT etc.)
                                              des c r: V ARCHAR2(40)




                                                                                  Other
                                                                                  year: NUMB ER(9)
                                                                                  c odice: NUMB ER(9)
                                                                                  vcode: VA RCHAR2(9)
                                                                                  ids ourc e: V ARCHAR2(9)
                                               P roduc t
                                                                                  value: NUMBE R(12)
                                                c ipro: V ARCHAR2(12)             weight: NUMB ER(4.4)
                                                ID: NUMBE R(9)
                                                year: NUMB ER(4)
                                                vcode: VA RCHAR2(9)
                                                ids ourc e: V ARCHAR2(9)
                                                V alue: NUMBE R(12)
                                                weight: NUMB ER(4.4)




Figure 9 – Physical Model: Diecofis Database (Part 4 of 4)
25               / Deliverable 2.1                                                     20




Data marts for specific analysis

The Multisource database provide the starting point for data warehousing—the
legacy application systems. In the legacy applications, transactions are executed in
which detailed transaction data is collected. The detailed data that is collected is
more of a by-product of transaction processing than anything else. But this raw
data forms the very detailed foundation needed for the data warehouse. The
transaction data that is collected passes through a layer of programs whose
purpose is to integrate and transform the data to make it fit to enter the data
warehouse.
The layer of programs can be called ―integration and transformation‖ programs or
ETL—extract/transform/load programs.
These programs can be written manually or in an automated manner by program
generators specifically designed to suit the needs of this interface.
Once the legacy data passes through the integration and transformation programs,
the data is fundamentally changed to meet the informational needs of the
microsimulation model.
Keys are altered so that there is one group understanding of data, structures are
changed, and encoding algorithms are unified. The data enters the layer of
integration and transformation programs in an unintegrated state and passes out of
the layer of programs in an integrated state.
Emanating from the entire statistical information system of enterprises is the
collection of data marts. In many ways the data marts appear to be the data
warehouse because it is here that the end user has direct interaction with the data
warehouse environment. The data marts are almost always oriented to the needs
and requirements of the analysis. The data marts are requirements- driven in that
they are shaped by the unique and specific needs of the analysis. The data then is
processed by specific modules of the model and is ready to serve as a basis for all
sorts of decision support systems (DSS) processing, to build statistical studies and
indicators for policy analysis.
The diecofis integrated database would contain all the information about enterprise:
- Identifiers;
- Economic variables and balance sheet variables;
- Foreign trade activity;
- Commercialisation and energetic consumption;
25              / Deliverable 2.1                                                    21




- Innovation features and cost structure.


When we shall have all information we may use it for various purposes. So we build
local view of data to serve various modules of the Microsimulation model and to
build indicators.
We could have data-marts oriented to the needs and requirements of the model.
Data-marts are requirement-driven (cf. W.H. Inmon) in that they are shaped by the
unique and specific needs of every module of the Microsimulation model.
Therefore we can compare data-marts with datasets for tax simulating purpose.




                                                            for specific analysis




ODS: Operational Data Store


              Figure 10 - Today’s mature corporate information factory
                                    (cf. W.H. Inmon)


Microsimulation modules are built to simulate taxation of enterprises, these are:
25               / Deliverable 2.1                                                    22




     -   Corporate tax module
     -   Regional (or local) tax and Social Contribution modules
     -   VAT and Excise modules
     -   Personal tax module


As a consequence of this structure we need specific sub sets of enterprises with an
accurate selection of variables needed
So we may have the following subsets of enterprises for every module of the
model:
     -   Corporate tax dataset
     -   Regional tax and social security dataset
     -   Products dataset
     -   Personal tax dataset


At the final stage, after integration and when tax estimates are produced, granular
data are prepared for the construction of indicators. Analysis will be made by:
     -   Percentiles of enterprises (by size or turnover);
     -   Class of economic activities;
     -   Regions or geographical area;
     -   Legal forms etc.
25              / Deliverable 2.1                                                      23




Selected Bibliography

Eurostat - ―Use of Administrative Sources for Business Statistic Purposes:
Handbook on Good Practices‖ – Theme 4 (Industry, Trade and Services) - 1999
Eurostat Edition


Giovannini, E., Sorce, A. – ―Integration of Statistical (survey) data with registers
(administrative) data‖ – Meeting on the Management of Statistical Information
Technology 2001 – paper.


Giovannini, E., Sorce, A. – ―The Data Warehouse, a Modern System for the
Dissemination of Information‖ – 2001 DISE-ISTAT paper.


Inmon, W. H. – ―Data Marts and Data Warehouse: Information Architecture for the
Millenium‖ – Informix Corporation.


Sorce, A. – ―The Impact of Data Warehousing on the management of Statistical
Offices‖ – New Techniques and Technologies for Statistics 2001 – Exchange of
Technology and Know-how – Pre-Proceedings of the Conference (Crete 18-22 June
2001) – Volume II.


Sorce, A., Capasso, G., del Mondo, G., Grasso, F., Vignola, L., Fanizzi C. –
―Statistical Information System on Enterprises (SISSI)‖ 1999 DISE-ISTAT paper.

				
DOCUMENT INFO