					Proceedings of Q2006
European Conference on Quality in Survey Statistics

        Redesigning French Structural Business Statistics using
           Administrative Data: Principles and First Results
                      of Methodological Studies

                                          Philippe Brion 1

                                          1. Introduction

The French National Institute of Statistics (Insee) began redesigning its structural
business statistics in 2005. This project has two main components:
    - a more systematic use of administrative data (accounting data from the
       fiscal source; data on employees and salaries from declarations made by
       enterprises to “social” institutions; external trade data);
    - a greater use of the concept of “enterprise group” in business statistics.

This paper describes the general principles of the project (part 2) and outlines the
methodological studies to be conducted before it is implemented (part 3). The few
results presented are preliminary and have to be confirmed by further studies.

The project is named RESANE (“Refonte des Statistiques ANnuelles d’Entreprises”;
see Depoutot, 2005) and is planned for implementation around 2010. From a
quality point of view, its main contributions will concern the relevance of statistics
(especially through the use of the “enterprise group” concept), timeliness, the
coherence of the statistics produced, and the overall costs of the process (including
the statistical burden on enterprises).

                            2. General principles of the system

2.1 The current system

At present, two “parallel” processes are used: a statistical survey and a process
using tax data (annual income statements).
The annual enterprise survey is conducted by several statistical departments: Insee
for the services and trade sectors, the ministry of industry for industry, the
ministry of equipment for transportation and construction, and the ministry of
agriculture for food industries. Every year, between 150 000 and 200 000
enterprises are surveyed by mail (Rivière, 1997). All big enterprises are surveyed,
while small enterprises are sampled.

1 Philippe Brion, Business Statistics Directorate, Insee, 18 bd A. Pinard, 75675 Paris cedex 14,

The process using the tax data merges the file of annual income returns made to the
tax authorities with the file of the annual enterprise survey (Grandjean, 1997), in
order to “improve” the fiscal data. The final file is then more complete than the
survey file. This “double” system, and especially the fact that a survey collects
information already available in tax files, was implemented at a time when tax data
were not available early enough to give early results to users.

However, some administrative data have also been used in recent years for the
annual enterprise survey (Brion, 2001):
    - at first, tax data, when available early enough for individual units, were
        used to “impute” the non-responses of big enterprises;
    - then, since tax data files became available earlier, it was decided to send a
        questionnaire to small enterprises of some economic sectors only every
        other year, and to use the tax data for the other year; the poor quality of
        the tax data for non-accounting variables made it necessary to keep a
        survey every other year;
    - to improve the treatment of non-responses, and especially to decide whether
        a non-response corresponds to a dead enterprise or to a “real”
        non-response, infra-annual tax data have also been used: the monthly
        turnover statements (used by the fiscal authorities for VAT); if more than
        six months of statements are available for an enterprise, it is considered
        to be alive.
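
The last rule above can be sketched in a few lines of Python. This is only an illustration: the function names and the `None` marker for a missing month are hypothetical, and treating every other non-respondent as presumed dead is an assumption, since the text only states when an enterprise is considered alive.

```python
def is_alive(monthly_statements, min_months=6):
    """Return True if MORE than `min_months` monthly turnover statements exist.

    `monthly_statements` holds one entry per month; None marks a missing
    statement (hypothetical convention for this sketch).
    """
    filed = sum(1 for s in monthly_statements if s is not None)
    return filed > min_months


def classify_non_respondent(monthly_statements):
    """Label a survey non-respondent using its monthly VAT statements.

    Assumption: a unit that is not considered alive is treated as presumed
    dead; the source only defines the 'alive' case.
    """
    if is_alive(monthly_statements):
        return "real non-response"
    return "presumed dead"


# Seven statements filed: the enterprise is considered alive.
seven_months = [100.0] * 7 + [None] * 5
# Exactly six statements: not "more than six", so not considered alive.
six_months = [100.0] * 6 + [None] * 6
```

With this convention, a non-respondent with exactly six statements is not classified as alive, because the rule requires more than six months.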

But this system had to be completely reviewed, since administrative data are now
available earlier than before.

2.2 The future system

The main idea is to use administrative data, and not only tax data, more
systematically. This is made easier in France by the existence of a unique
identification number for enterprises (the Siren number, assigned by Insee within
the business register, whose use is mandatory for administrations such as the fiscal
authorities or social protection agencies), and by the existence of the French “Plan
Comptable Général”, which provides common definitions of accounting variables (the
same definitions being used by statistical and fiscal administrations).

Three kinds of administrative data will be used:
   - annual income statements of enterprises: this information is now available at
      the end of June for big enterprises, and also for smaller ones filing via the
      internet; by October, all data should be available;
   - annual statements of payroll data: these declarations are sent to social
      protection agencies and give information about the number of employees and
      compensation levels; they should be available in September;
   - customs data, available in July for the previous year.

However, these administrative data are not sufficient to cover all the needs of users
of structural business statistics. Some information is missing from them, or is of
poor quality because the administration collecting it does not use it directly for
its own needs: for example, the classification of the enterprise within the activity
nomenclature does not affect the amount of tax the enterprise will pay.
So, it was decided to keep a statistical survey (lighter than the current one),
particularly to collect information about the breakdown of turnover (see part 3.2).

The objective is to use the administrative data and the statistical survey jointly to
produce results at different dates (figure 1):
   - definitive structural business statistics for year n at the end of year n+1;
   - preliminary results earlier, for example at the end of October for the
      preliminary SBS data sent by member states to Eurostat, or even earlier first
      results in July for macro-economists (limited to a few variables).

Figure 1 (timeline from 01/01 to 31/12; elements shown: tax data, other
administrative data, first results, definitive results)


2.3 The use of the concept of enterprise group

At present, French business surveys use legal units as collection units. Complex
structures within big enterprise groups raise more and more problems. Some
production factors, especially employees, may be shared between different legal
units of the group; internal flows may also be generated inside a group when it is
restructured. These internal flows, generated for example by the creation of new
legal units, may cause an apparent increase in variables such as turnover. The
prices used between the units inside the group may also be

To address these questions for a few groups, specific statistical units were created
by grouping legal units and asking the group to fill in, for the annual enterprise
survey (and for other surveys), just one questionnaire for this specific unit
(Brion, 2004). It is intended to extend this approach to the bigger groups in the
future.
But other studies will be conducted to take the concept of enterprise group into
account within this system: in particular, how relevant are statistics on subjects
such as financial questions or research and development when they are based on
legal units? Should the group be considered the pertinent unit for these statistics?
Work is in progress within the French CNIS (National Council for Statistical
Information, see Blanc and Desrosières, 2001) to study this question, through
meetings bringing together economists, social partners and statisticians.

3. Methodological studies to be conducted

Preparing the new system will require many methodological studies. This paper does
not present all of them and focuses on two subjects.

3.1 The production of estimates using different sources

Since administrative data are supposed to be exhaustive, calibration estimators will
be proposed, using classic techniques (Deville and Särndal, 1992). Studies have to
be conducted to decide which variables of the administrative data should be used to
“calibrate” the data coming from the statistical survey (which is conducted on a
sample of enterprises).
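
As an illustration of the calibration idea, the sketch below implements the linear method (chi-square distance) for a single auxiliary variable: the design weights d_k are adjusted to w_k = d_k (1 + lambda x_k) so that the weighted sample total of x reproduces the total known from the exhaustive administrative file. All figures and names are hypothetical, and which administrative variables should actually be used is precisely the open question raised above.

```python
def linear_calibration_weights(design_weights, x, total_x):
    """Linear calibration on one auxiliary variable x.

    Returns weights w_k = d_k * (1 + lam * x_k), where lam is chosen so that
    sum(w_k * x_k) equals the known population total `total_x`.
    """
    estimated_x = sum(d * xk for d, xk in zip(design_weights, x))
    denominator = sum(d * xk * xk for d, xk in zip(design_weights, x))
    lam = (total_x - estimated_x) / denominator
    return [d * (1.0 + lam * xk) for d, xk in zip(design_weights, x)]


# Hypothetical sample of four enterprises.
d = [10.0, 10.0, 5.0, 5.0]            # design (sampling) weights
x = [120.0, 80.0, 300.0, 500.0]       # turnover from the administrative file
y = [30.0, 25.0, 90.0, 140.0]         # variable collected only by the survey
total_x = 6500.0                      # total turnover in the exhaustive file

w = linear_calibration_weights(d, x, total_x)
calibrated_x = sum(wk * xk for wk, xk in zip(w, x))   # reproduces total_x
estimate_y = sum(wk * yk for wk, yk in zip(w, y))     # calibrated estimate
```

With several auxiliary variables, lambda becomes a vector obtained by solving a linear system; the one-variable case is kept here only to show the principle.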

Data editing is another difficult subject: how should data coming from several
sources at different times, with strong links between them, be processed? In
particular, how can the variables collected through the statistical survey during
the first part of the year be checked against the “accounting data” arriving later
from the fiscal source? At first, it was considered possible to keep using the data
editing software built for the present “generation” of annual enterprise surveys, by
using the accounting data of the previous year multiplied by a growth rate. But this
idea was abandoned, since it would still have meant reviewing all existing controls,
which would have been very heavy work.

So, the project aims at using infra-annual administrative data to validate the
information collected in the statistical survey: having information on accounting
variables (such as turnover or number of employees) is very helpful for checking the
quality of the survey data. Enterprises send monthly turnover statements to the tax
authorities (for the calculation of VAT), as well as monthly declarations about
employees and wages. The potential of these infra-annual data as “reference
information” for the data editing of the survey will be studied.
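
A minimal sketch of such a check is given below, assuming that the monthly statements cover the same period as the survey answer and that a relative tolerance of 10 % is acceptable; both the function name and the tolerance are hypothetical choices, not part of the project.

```python
def turnover_needs_review(survey_turnover, monthly_turnover, tolerance=0.10):
    """Flag a survey answer that disagrees with the infra-annual reference.

    The reference is the sum of the monthly turnover statements. The answer
    is flagged when its relative deviation from the reference exceeds
    `tolerance` (hypothetical threshold).
    """
    reference = sum(monthly_turnover)
    if reference == 0:
        # No reference turnover: flag any non-zero survey answer.
        return survey_turnover != 0
    return abs(survey_turnover - reference) / reference > tolerance


# Twelve monthly statements of 100 give a reference of 1200.
monthly = [100.0] * 12
consistent = turnover_needs_review(1200.0, monthly)     # same value, not flagged
suspicious = turnover_needs_review(1500.0, monthly)     # 25 % deviation, flagged
```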

Then, the question of “coherence” between the data of the survey, and the
administrative data arriving later, has to be studied: will it be necessary, in some
cases, to “arbitrate” between some conflicting data? The comparison of the present
annual enterprise survey data with fiscal data (or “social” data) will be very

Figure 2: part of the questionnaire of the annual enterprise survey for the industrial
sector dedicated to the breakdown of the turnover (survey conducted by the
Sessi, statistical office of the ministry of industry)

3.2 A specific variable: the breakdown of the turnover

In the current French system, this variable is considered a “cornerstone” of
structural business statistics. In the questionnaire of the annual enterprise survey,
enterprises are asked to fill in a “table” giving the breakdown of their turnover by
activity.

The extract of the current questionnaire of the “industry” survey (figure 2) shows the
questions for enterprises belonging to one category (enterprises producing
computers, telephones, or machines such as calculators). Different lines are
proposed for the breakdown of turnover and, since these enterprises may also have
other activities, such as repairing machines or commercial activities, “blank lines”
are also proposed (lines nc1, nc2, nc3): the enterprise is asked to describe on these
lines what these activities consist of.

The information given by the enterprise has two main uses:
   - an algorithm computes the value of the principal activity code (APE code),
       referring to the French nomenclature of activities (NAF, derived from the
       European NACE): this value, which is a declared value when the enterprise
       is created, thus becomes, for the surveyed enterprises, a computed value
       resulting from an “economic analysis” of its activities; the business
       register is then updated for this variable;
   - for the national accounts, the information on the turnover of each activity
       is very useful, since it is given for “pure” economic branches.
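
To illustrate the first use, the sketch below derives a principal activity from a breakdown of turnover by simply taking the activity with the largest turnover. This is a deliberate simplification: the paper does not describe the algorithm actually used for the APE code, which may be more elaborate, and the activity labels are hypothetical placeholders rather than real NAF codes.

```python
def principal_activity(breakdown):
    """Return the activity with the largest positive turnover, or None.

    `breakdown` maps an activity label to the turnover declared on the
    corresponding line of the questionnaire. Simplified rule for
    illustration only (largest share wins).
    """
    positive = {code: value for code, value in breakdown.items()
                if value is not None and value > 0}
    if not positive:
        return None
    return max(positive, key=positive.get)


# Hypothetical answer: two pre-printed lines and one coded "blank line".
answer = {"ACT_A": 600.0, "ACT_B": 300.0, "ACT_REPAIR": 100.0}
computed_code = principal_activity(answer)
```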

One may note that a value of the principal activity code is also available within tax
data, but it is only a declared one and cannot be considered of sufficient quality.
So, this variable has to be asked for in the questionnaire of the statistical survey:
it is considered an essential “contribution” of the survey.

Different kinds of studies will be conducted for this variable:
    - study of the answers to the current annual enterprise survey (number of
       missing data, number of lines completed for each category of enterprises,
       number of changes between raw and final data, impact on the value of the
       APE code);
    - study of the questionnaire design (number of items and order of the items
       for this variable);
    - efficiency of selective editing.

Concerning this last point, editing the variable “breakdown of the turnover” is not
easy since it is in fact composed of “n” variables (the lines proposed on the
questionnaire, each one relating to an activity).

Some reference papers (for example Lawrence and McKenzie, 2000; Hedlin, 2003)
propose methods of selective editing based on scores: the idea is to calculate an
item score (here, for every elementary activity) and to combine all item scores
into a global score.
The item score may be, for example, the difference between the raw value and the
value reported by the same enterprise for the same elementary activity in the
previous year, multiplied by the sampling weight. It is then necessary to
standardize the item scores and to combine them using a distance function to
calculate a global score.
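
The computation can be sketched as follows. Two choices here are assumptions made for the illustration and are not taken from the studies described: item scores are standardized by dividing by a reference total for the activity, and they are combined with a Minkowski distance of order p (p = 1 gives the sum, p = 2 the Euclidean norm, p = infinity the maximum).

```python
def item_scores(raw, previous, weight, activity_totals):
    """Standardized item score for each elementary activity.

    score = weight * |raw - previous| / reference total of the activity.
    Dividing by a reference total is an assumed standardization.
    """
    return {
        activity: weight * abs(raw.get(activity, 0.0) - previous.get(activity, 0.0)) / total
        for activity, total in activity_totals.items()
    }


def global_score(scores, p=2):
    """Combine item scores with a Minkowski distance of order p."""
    values = list(scores.values())
    if p == float("inf"):
        return max(values)
    return sum(v ** p for v in values) ** (1.0 / p)


# Hypothetical enterprise with two activities and a sampling weight of 10.
raw = {"ACT_A": 150.0, "ACT_B": 70.0}
previous = {"ACT_A": 100.0, "ACT_B": 50.0}
totals = {"ACT_A": 10000.0, "ACT_B": 5000.0}

scores = item_scores(raw, previous, 10.0, totals)   # ACT_A: 0.05, ACT_B: 0.04
score_sum = global_score(scores, p=1)
score_max = global_score(scores, p=float("inf"))
```

Questionnaires would then be ranked by decreasing global score, and only those at the top of the ranking sent to manual editing.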

Simulations are being made to evaluate the impact of this kind of selective data
editing. Figure 3 gives, for example, the value of the estimator of the turnover of
one economic activity (cars trade) for year 2003, depending on the number of units
to which data editing is applied. The questionnaires are ranked by global score
(beginning with the biggest scores). On the left part of the figure, few units are
edited (raw data are used for most enterprises to calculate the estimate); moving
towards the right, more and more units are edited (raw data are used only for the
“non edited” units, definitive values being used for the others), so that the
estimator converges towards the definitive value (for which all questionnaires have
been checked). Four different methods (differing in the distance function used to
combine the item scores) have been tested: the figure shows that the results are not
very different for these four methods, and that checking only half of the
questionnaires would leave the estimator unchanged compared with checking all of
them.
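
The simulation described above can be sketched as follows, with hypothetical data: units are ranked by decreasing global score, the k first units take their definitive (edited) value, the others keep their raw value, and the weighted total is recomputed for each k. The field names are illustrative only.

```python
def estimate_after_editing(units, n_edited):
    """Weighted total when only the `n_edited` highest-score units are edited.

    Each unit is a dict with keys 'weight', 'raw', 'final' and 'score'.
    """
    ranked = sorted(units, key=lambda u: u["score"], reverse=True)
    total = 0.0
    for rank, unit in enumerate(ranked):
        value = unit["final"] if rank < n_edited else unit["raw"]
        total += unit["weight"] * value
    return total


def editing_curve(units):
    """Estimates for 0, 1, ..., len(units) edited questionnaires."""
    return [estimate_after_editing(units, k) for k in range(len(units) + 1)]


# Hypothetical sample: one large error, one small error, one correct answer.
sample = [
    {"weight": 1.0, "raw": 1000.0, "final": 100.0, "score": 9.0},
    {"weight": 2.0, "raw": 50.0, "final": 50.0, "score": 0.0},
    {"weight": 2.0, "raw": 40.0, "final": 45.0, "score": 0.5},
]
curve = editing_curve(sample)
```

The last value of the curve is the definitive estimate; the point at which the curve stabilizes indicates how many questionnaires need to be edited.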

However, these results are only preliminary: they were obtained by restricting the
survey file to units present in two successive years, and the estimator was not
adjusted to take this restriction into account. These studies therefore have to be
continued; they must be conducted on the estimators of all economic activities taken
together, and not just on one (figure 3 gives results only for the “cars trade”
sector). They will also have to focus on estimators of evolutions rather than on
estimators of aggregates for a given year.





Figure 3: comparison of the results of four different methods of combining
local scores to produce a global score, for the estimator of the turnover of the
economic branch “cars trade” (plot not reproduced; surviving axis values:
49 200 000 and 49 300 000; surviving legend entries: 501-methodA1, 501-methodA3)
Source: data of annual enterprise survey for year 2003

References

Blanc, M., and A. Desrosières (2001), “France's National Council for Statistical
Information (CNIS): origin, missions, and role for improving quality”, paper presented
at the first international conference on quality in official statistics, Stockholm, Sweden

Brion, Ph. (2001), “Use of administrative data for structural business surveys in
France”, paper presented at the 15th roundtable on business survey frames,
Washington, U.S.A.

Brion, Ph. (2004), “The management of quality in French business surveys”, paper
presented at the second international conference on quality in official statistics,
Mainz, Germany

Depoutot, R. (2005), “Refonte des statistiques annuelles d’entreprises”, powerpoint
presented at the meeting « Inter-formation statistiques d’entreprises 2005 » of the
French « Conseil National de l’Information Statistique », available on

Deville, J.-C. and C.-E. Särndal (1992), “Calibration estimators in survey sampling”,
Journal of the American Statistical Association, 87, pp. 376-382

Grandjean, J.-P. (1997), “The system of enterprise statistics”, Courrier des
statistiques, English series n°3

Hedlin, D. (2003), “Score functions to reduce business survey editing at the U.K.
office for national statistics”, Journal of official statistics, vol 19, n°2

Lawrence, D. and R. McKenzie (2000), “The general application of significance
editing”, Journal of official statistics, vol 16, n°3

Rivière, P. (1997), “The new annual enterprise surveys in France”, Courrier des
statistiques, English series n°3
