FROM SAMPLE SURVEYS TO TOTALLY REGISTER-BASED
HOUSEHOLD INCOME STATISTICS: EXPERIENCES FROM
FINLAND AND NORWAY [1]
Veli-Matti Törmälehto (Statistics Finland) & Jon Epland (Statistics Norway)
With the launch of the European Union Statistics on Income and Living Conditions (EU-
SILC) in 2003, all European countries now conduct an annual sample survey on income and
living conditions in order to deliver micro data with output-harmonised target variables on
income, demography, labour, material deprivation, housing, and health for Eurostat and the
The complete set of target variables required for EU-SILC cannot be collected without direct
data collection from households. The flexible approach of EU-SILC permits, however,
combined use of register and interview data. This combination method has been used in
Norway and Finland to produce the national income distribution surveys (IDS). For the
income data, EU-SILC is then simply a continuation and adaptation of national IDS practices.
With the available register information on income and demography, totally register-based
income statistics may also be produced in both countries. The advantages of compiling
income distribution statistics from data which cover the whole population instead of a sample
are obvious, especially for national users with greater demand for accuracy, small domain
analyses, and geographical and longitudinal data than is required for EU-SILC.
The paper presents the experiences from Finland and Norway in producing income inequality
and low income indicators from census-type sources. The main objective is to compare the
key income inequality and low income indicators obtained from the total sources to those
estimated from sample sources (IDS/EU-SILC), and to discuss the validity of the results.
The paper first discusses the extent to which such totally register-based data would satisfy the
internationally agreed concepts, e.g. those stated in the Canberra Group report and the
EU-SILC regulation for household income statistics. The construction of an appropriate income
sharing unit is particularly relevant, the choices being the survey-defined economic
households versus register-based dwelling units. The validity of income concepts needs
discussion as well, especially in Finland where register income data still have some
shortcomings. The common inequality and low income indicators from the Norwegian and
the Finnish experiments are then presented and benchmarked against the sample-based sources.
[1] Paper prepared for the conference of the European Survey Research Association, Prague, 25-29 June 2007.
2. Defining income and the income sharing unit
Income data for Norway
In Statistics Norway the national income distribution statistics are constructed by linking a
large number of administrative and statistical registers that cover different types of income
data. Table 1 gives a crude overview of all the inputs of the register based income statistics.
The main data providers are the Tax Authorities and the National Insurance Service. The
single most important source is the Tax Return Register. This register gives detailed
information on all kinds of taxable income, e.g. wages and salaries, self-employment income,
income from property and taxable pensions. Another important source is the Tax Register,
where information on personal income taxes and social security contributions is collected.
From the National Insurance Service, all types of tax-free transfers (e.g. family allowance,
support to single parents) are collected as well as different types of pension income (e.g. old
age and disability). In addition to tax registers and social security registers some minor
income items are collected from other administrative registers, for example dwelling support
(The State Housing Bank) and scholarships (The State Educational Loan Fund). Note that
register data are also used to collect some biographical data for individuals,
such as highest level of completed education, formal marital status, citizenship, immigrant
status and municipality of residence.
Once the different registers have been linked to each individual by the use of the Personal
Identification Number, there exists a comprehensive data source that can provide income data
to all social surveys that Statistics Norway currently conducts, including the national Income
Distribution Survey and the EU-SILC.
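
The linkage step described above can be illustrated with a minimal sketch. It is not the
production system of Statistics Norway: the register extracts, the PINs, the item names and
the amounts are all invented, and the function `link_registers` is a hypothetical helper.
The only point shown is that each register contributes income items that are collected
under the same Personal Identification Number.

```python
from collections import defaultdict

# Hypothetical register extracts as (PIN, income item, amount) tuples.
# PINs, item names and amounts are invented for illustration only.
tax_return_register = [
    ("PIN-001", "wages_and_salaries", 320000.0),
    ("PIN-002", "self_employment_income", 150000.0),
]
national_insurance_service = [
    ("PIN-001", "family_allowance", 11600.0),
    ("PIN-003", "old_age_pension", 180000.0),
]
state_housing_bank = [
    ("PIN-003", "dwelling_support", 9000.0),
]


def link_registers(*registers):
    """Combine records from several registers into one income record per PIN."""
    linked = defaultdict(dict)
    for register in registers:
        for pin, item, amount in register:
            # Add up amounts if the same item is reported more than once.
            linked[pin][item] = linked[pin].get(item, 0.0) + amount
    return dict(linked)


person_file = link_registers(
    tax_return_register, national_insurance_service, state_housing_bank
)
```

The resulting `person_file` holds one record per person, which is the form in which income
data could then be attached to any survey sample that carries the same identifier.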
Income data available from registers include more or less all types of cash income received by
households, and compare well with external sources such as the National Accounts
(Epland 2006). However, even income data from registers suffer from some obvious
limitations. One serious weakness is that tax registers do not cover any payments
related to the "shadow economy", i.e. earnings that have illegally evaded
taxation. These income items will undoubtedly be difficult to capture in surveys as well.
Another limitation worth mentioning is the non-inclusion of purely private inter-household
transfers, for instance regular cash transfers from parents to children who no longer
belong to the same household.[2] In the Norwegian EU-SILC, on the other hand, this income
item is collected from the household interview. It is, however, difficult to assess the total size
of the income components that are not available from registers.
Table 1. Overview of income concepts and corresponding administrative data sources in
the Norwegian IDS.
Income concept                            Administrative register
Employee income                           Tax-return register,
                                          The Register for End-of-the-Year Certificates
Self-employment income                    Tax-return register
Income from property                      Tax-return register
family related allowances                 National Insurance Service
housing allowances                        The State Housing Bank
unemployment benefits                     The Register for End-of-the-Year Certificates
sickness benefits                         National Insurance Service
student grants                            The State Educational Loan Fund
old-age, survivor & disability benefits   National Insurance Service
social assistance                         Kostra (Municipality-State-Reporting)
private pensions                          Tax-return register
Taxes paid and social contributions       Tax registers
[2] Some important inter-household transfers are, however, available from administrative registers, for example paid child
maintenance and alimony to a former spouse.
Income data for Finland
At Statistics Finland, the system of producing household income distribution statistics is quite
similar to that in Norway. A number of independent income registers (Table 2) are combined
using the unique Personal Identification Number (PIN), resulting in a statistical income data
file which covers all registered income recipients during a year: residents in private
households, population in institutions, and some non-residents. With the PINs, these income
data can be linked to other statistical systems of Statistics Finland.
Table 2. Overview of income concepts (excluding imputed rent) and corresponding data
sources in the Finnish household income distribution statistics.
Income concept       Register source                          Additional income data            TSID total income /
                     (TSID and IDS/EU-SILC)                   collected for survey              survey estimate, %
Wages and salaries   Tax register                             Tax-free wages and salaries       100
                                                              from abroad
Self-employment      Tax register                             Supplementary data on income      96
                                                              from forestry
Income from          Tax register                             Interest received                 92
property
Transfers received   The registers of                         Inter-household transfers         93
                     a) the Social Insurance Institution      received; some supplementary
                        of Finland,                           transfer components
                     b) Finnish Centre for Pensions,
                     c) National Research and Development
                        Center for Welfare and Health,
                     d) Tax register,
                     e) the Education Fund,
                     f) Military Injury Indemnity register
                        of State
Transfers paid       Tax register                             Source tax on interest received   98
                                                              (imputed), real estate tax,
                                                              alimonies
For income statistics, the population is restricted to the resident non-institutionalised population at
the end of the year, and income data are summed over domicile codes (dwelling units). The result is
a second statistical income data file at dwelling-unit level. The two TSID files (at person and
dwelling-unit level) allow the production of totally register-based household income statistics
according to standard conventions, e.g. by treating the dwelling unit as an approximation of the
income receiving unit and the individual as the unit of analysis. The two income files are used to
produce the Total Statistics on Income Distribution (TSID) in December N+1, i.e. about one
year after the end of the income reference period. In addition to standard inequality measures
(Gini, decile shares etc.), detailed regional data and longitudinal at-risk-of-poverty figures
are also published (e.g. persistent low income rates at NUTS3 level).
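
As a rough illustration of how a dwelling-unit file can be derived from a person-level
file, the sketch below sums personal income over domicile codes after restricting the
records to the target population. The record layout, the field names (including the
`in_target_population` flag) and all values are assumptions made for this example; they do
not describe the actual TSID files.

```python
from collections import defaultdict

# Hypothetical person-level records; field names and values are invented.
person_file = [
    {"pin": "A", "domicile_code": "D1", "income": 30000.0, "in_target_population": True},
    {"pin": "B", "domicile_code": "D1", "income": 22000.0, "in_target_population": True},
    {"pin": "C", "domicile_code": "D2", "income": 18000.0, "in_target_population": True},
    # Institutionalised or non-resident at the end of the year: excluded below.
    {"pin": "D", "domicile_code": "D3", "income": 12000.0, "in_target_population": False},
]


def build_dwelling_unit_file(persons):
    """Sum personal income over domicile codes for the target population."""
    units = defaultdict(lambda: {"members": 0, "income": 0.0})
    for person in persons:
        if not person["in_target_population"]:
            continue
        unit = units[person["domicile_code"]]
        unit["members"] += 1
        unit["income"] += person["income"]
    return dict(units)


dwelling_unit_file = build_dwelling_unit_file(person_file)
```

Keeping both files makes it possible to attach the dwelling-unit total back to each member,
which is what the convention "dwelling unit as income receiving unit, individual as unit of
analysis" requires.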
The total income file TSID is not used as such in the annual sample survey on household
incomes, the fully integrated national Income Distribution Survey / cross-sectional EU-SILC.[3]
The income definition is more extensive in the IDS/EU-SILC, for both conceptual and practical
reasons. Regarding the practical reasons, there are two major and some minor omissions in
the register income data in Finland. The two major missing components in the registers are
inter-household transfers (including all alimonies received) and interest received. In the
IDS/EU-SILC, these missing components are collected directly from households by
Inter-household transfers are an important omission because they are known to be
concentrated in subgroups with a substantial risk of poverty: students and single
parents. The only transfer item available in registers is alimonies paid, as these can be
deducted in taxation. Since alimonies received are not available, alimonies paid are also
dropped from the register-based concept. In the IDS/EU-SILC, several questions on alimonies
and other inter-household transfers are asked. While the validity of these income data can be
questioned to some extent, there are no aggregates to compare them with. In the IDS/EU-SILC,
they are concentrated as expected: among the young and single parents.
Interest received is not available in person registers because it is taxed at source: the final tax is
deducted when the amount is paid/received. Neither the recipient nor the payer reports the
withheld taxes at the individual (PIN-code) level to the tax authorities. Consequently, the only
way to gather household data on interest received is direct data collection in interviews.
[3] In the coming years, the use of register-based income data for IDS/EU-SILC purposes will be harmonised with the TSID as
far as possible.
For the purposes of the integrated national Income Distribution Survey/EU-SILC, interest
received is collected by asking for total amounts and for income bands. There is sizeable item
non-response as well as substantial underestimation in interviewed interest received in the
IDS/EU-SILC. Relatively modest total amounts in the National Accounts, concentration in the upper
part of the distribution, and stable interest rates suggest that the bias that results from
missing interest received in the TSID is not too severe.
Regarding the conceptual differences, the "disposable income" concept which is applied in the
national IDS has included net imputed rent of owner-occupiers in property income since the
beginning of the survey (1977). In EU-SILC, imputed rent will be added from 2007 onwards.
The TSID does not include imputed rent, nor are there plans to impute it for the whole
population data set, although this might in fact be feasible with the user-cost method. In the
comparisons presented in this paper, only employment-related benefits in kind are included in
income, not imputed rent.
Comparison of TSID aggregates with survey estimates reveals that there is an almost 100 percent
match in the total sums of wages and salaries.[4] There is some discrepancy in self-employment
income, as the survey collects supplementary data on timber sales and models the cost of
acquisition of income differently (this will be harmonised in 2007). The total sums of
property income and transfers received are lower in the total source than in the survey, mostly
because interest received and inter-household transfers are not covered in the registers. In the case
of transfers, a number of other smaller transfer items also contribute to the
difference. In transfers paid, source tax on interest received, alimonies paid, and real estate
taxes completely explain the difference.
In summary, the differences in income concept should cause some discrepancy between
totally register-based statistics (TSID) and income surveys (IDS/EU-SILC). The extent of the
differences in income definitions will be examined in section 3. The current assumption is
that the TSID income concept is comprehensive enough for monitoring changes in income
inequality, and that the gains from having no sampling error clearly outweigh what may be
lost in validity of concepts. For low income or income poverty studies, the situation may be
[4] Thanks to the calibration model used, sampling error should have only a marginal effect on the comparison; total income
sums are used in the survey calibration and the TSID is used as the re-weighting frame.
Defining the income sharing unit in Norway
When all income data from the registers have been linked to each individual, the next step is to
construct the income sharing unit, i.e. the unit most relevant to the study of the economic
well-being of individuals. According to international guidelines, e.g. the Canberra report and the
Eurostat Income Measurement Manual, the best statistical unit for income distribution
analysis is the household (Expert Group on Household Income Statistics 2001, Church &
Verma 2001). Until now, the national Income Distribution Survey in Norway has
collected information on household composition from the household interview. The definition
of households follows international guidelines, for example the household definition used in
the EU-SILC.[5] All individuals sharing the same dwelling and having common board or
housekeeping are considered to belong to the same household.[6]
When the aim is to collect household information for the total population, one obviously has to
settle for a definition that is less accurate than the one based on data collected from a personal
interview. In general, a household definition based on register data will be restricted to the
household dwelling concept, i.e. there will be no information available on whether or not
the household members actually have common board or housekeeping.[7] Yet, even
constructing the household dwelling concept from registers is a challenging task. Two groups
that pose a particular problem are the institutionalised and students.
When household surveys are conducted it is common practice to omit the institutional
population from the sampling frame. In practice this is done during the data collection process
(the household interview). In order to have a household definition comparable to the survey,
one needs to identify the institutional population in the registers. From the National Insurance
Service there exists a register of the residents of old-age homes and long-stay hospitals. Other
people living in institutions (e.g. child welfare institutions, prisons etc.) are identified by
combining information on addresses, number of residents etc. The most visible effect of
omitting the institutional population from the household register is a reduction in the number
of older people.
[5] According to the SILC guidelines, "Private household" means a person living alone or a group of people who live together in
the same private dwelling and share expenditures, including the joint provision of the essentials of living (EU-SILC
065/03: Description of Target Variables).
[6] Note, however, that non-response households are also included in the Norwegian IDS. For those households that fail to
respond to the household interview (usually 25-30 %), missing information on household composition is replaced by
register information on 'family' composition from population registers.
[7] With respect to Norway there is reason to believe that the difference between the housekeeping concept and the dwelling
concept will be very small indeed. According to the quality assessment of the Census 2001, the difference between the two
concepts was less than 1 per cent (Hurlen Foss and Solheim 2006).
The treatment of students is a far more challenging task. In the guidelines of the Income
Distribution Survey and the Norwegian EU-SILC, a student is considered a household member
only if he or she spends at least 4 days per week at the address of the (parent)
household. In Norway it is, however, rather rare for university students to reside with their
parents. According to a recent survey, only 7 per cent of the students reported that they still
lived at home (Løwe 2007). In register data based on the Central Population Register, by
contrast, most students are still 'formally' registered as residents of their parents' household.
From a household income perspective this may cause a problem. The main livelihood for
Norwegian students is either grants or student loans (63 %) or income from own work (22 %).
Only 2 per cent of the students report that support from their parents is their main income
source (Løwe 2007). Thus it would be wrong to assume that students are part of their parents'
household and that they belong to the same income sharing unit.[8]
In order to have a household definition that is closer to the one used in surveys, one therefore
needs to identify students that are registered as part of their parents' household, and then
"move" them to another household. Several strategies have been used to identify
these students. The main administrative source used to identify students that de facto live
away from their parents is the State Educational Loan Fund. One of the criteria for being
eligible to receive a student grant (administered by the Loan Fund) is that the student actually lives
on his or her own, away from the parents. Students receiving this type of grant are therefore removed
from their parents' household. However, not all students are entitled to a student grant. For
example, "working" students with a substantial employee income are disqualified from
receiving a grant. In order to identify these working students, register information on the
employer (location of the work place) is used. If the place of work is situated far
away from the parental dwelling, it is presumed that the student de facto no longer belongs to
the parents' household.
[8] Despite the fact that support from parents is not considered the main income source for students, such financial support
may of course be an important additional source of livelihood, particularly for the youngest students.
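
The two identification rules for students (grant receipt, and place of work far from the
parental dwelling) can be written down as a small decision rule. This is only a sketch
under stated assumptions: the field names are invented, and the distance cut-off `far_km`
is purely hypothetical, since no numerical threshold is given for what counts as "far away".

```python
def lives_away_from_parents(student, far_km=100.0):
    """Return True if the student is presumed to live away from the parents.

    `far_km` is a hypothetical cut-off chosen only for this illustration.
    """
    # Rule 1: the grant is only paid to students who live away from their parents.
    if student["receives_living_away_grant"]:
        return True
    # Rule 2: working students without a grant, identified by workplace location.
    distance = student.get("workplace_distance_km")
    return distance is not None and distance > far_km


def assign_households(students):
    """Keep each student in the registered household or move them to an own one."""
    assignment = {}
    for student in students:
        if lives_away_from_parents(student):
            assignment[student["pin"]] = "own:" + student["pin"]
        else:
            assignment[student["pin"]] = student["registered_household"]
    return assignment


# Invented example records.
students = [
    {"pin": "S1", "registered_household": "H-parents-1",
     "receives_living_away_grant": True, "workplace_distance_km": None},
    {"pin": "S2", "registered_household": "H-parents-2",
     "receives_living_away_grant": False, "workplace_distance_km": 350.0},
    {"pin": "S3", "registered_household": "H-parents-3",
     "receives_living_away_grant": False, "workplace_distance_km": 5.0},
]
de_facto = assign_households(students)
```

In this sketch a student moved out of the parental household simply becomes a one-person
household; placing such students in a shared or couple household would need further
information of the kind discussed in the text.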
Other methods of identifying de facto households are also applied. One group that is
particularly difficult to identify in registers is couples living in a consensual union
without common children. In Norway, as in many other countries, this is a
common living arrangement, particularly among the young.[9] There is reason to believe that
many adult children who according to the population register live with their parents actually
live in a separate household as part of a consensual union. When a cohabiting couple
has their first common child, they are immediately recorded as a couple household in the
population registers. This information can be used to make presumptions about previous
household composition. When a couple has a common child, there is strong reason to
presume that they were in fact already living as a couple before the birth of the child. By
using this information, an additional number of de facto households can be identified.
Information from tax records can also be used to identify couples living together without
being married. Couples that are co-owners of a dwelling or who share a mortgage often report
this piece of information to the Tax Authorities so that both partners may benefit from a tax
Comparing household distribution
In Table 3 we compare register data and survey data with respect to the distribution of different
household types. The formal household definition is a household definition based on legal
residence addresses, e.g. students are in most cases registered as part of their parents'
household. After performing several adjustments to the formal definition, some of them
described above, we end up with what we call the de facto household definition. Both the
formal and the de facto household definitions are then compared to the survey estimates,
which may in a way be considered a kind of benchmark. The survey estimates are from the
Income Distribution Survey of 2004.[10]
[9] In Norway 50 % of all firstborns have parents living in a consensual union.
As the table shows, there is in general good overlap between the distribution of household
types based on register data and the survey estimates. It is, however, apparent that the
transformation from the 'formal' to the 'de facto' household definition improves
comparability with the survey estimates. This is particularly noticeable with respect to the
distribution of young singles and couples with adult children. When students and other young
people have been moved from their 'formal' parental household to their 'de facto'
household, we find a substantial rise in the proportion of single-person households under 30
years of age. At the same time there is a clear reduction in the proportion of couples with
adult children, many of which now turn up in the category 'couples without children, 45-66
years'. Despite the positive effect of moving students away from the parental household, the
proportions of both couples with adult children and couples without children in the age group
45-66 are still significantly different from the survey estimates. This is a clear indication that
even more effort should be made to identify the real household situation of students and
other young people.[11]
It should be noted that the total number of households increases when many households are
split during the process of creating a more de facto household definition. The total number of
households is now closer to the survey estimates.
[10] The sample size for the Income Distribution Survey 2004 is roughly 13,000 households.
[11] An alternative method of measuring the difference between a household definition based on interview and one based on
registers is to show the overlap between the two definitions restricted to a sample survey where both definitions are
available, similar to what has been done in Finland (Table 4). When comparing the household composition reported in
the Norwegian EU-SILC (N = 6000) with the household definition derived from registers, we find a total overlap of 84 %,
i.e. somewhat weaker than in Finland (88 %). However, the deviations are mainly restricted to the household types where
we would expect such deviations. The register definition still has too many adult children living with their parents while,
according to the SILC interview, these adult children themselves report that they are either single or cohabiting. Some of
the deviation between the sources may also be explained by the difference in reference date. The register data use 31
December as reference date, while the Norwegian SILC records the household situation at the date of the interview, which
usually takes place some months later, in February-March (the fieldwork ending in June).
Table 3. The distribution of households by household types in Norway. 2004. Register
data and survey estimates. Per cent
                                        'Formal'     'De facto'   Survey      95% confidence interval
                                        household    household    estimates   Lowest     Highest
                                        definition   definition
All households                          100          100          100
Singles < 30 years                      7.0          10.7         9.9         9.1        10.7
Singles 30-44 years                     8.6          8.0          8.5         7.7        9.3
Singles 45-66 years                     10.7         10.5         10.1        9.3        10.9
Singles 67+ years                       11.9         11.4         12.3        11.3       13.3
Couples without children < 30 years     1.4          1.7          2.4         2.0        2.8
Couples without children 30-44 years    2.2          2.3          2.5         2.1        2.9
Couples without children 45-66 years    9.6          10.8         12.0        11.0       13.0
Couples without children 67+ years      7.8          7.5          7.7         6.9        8.5
Couples with children 0-5 years         11.0         10.8         10.5        9.7        11.3
Couples with children 6-17 years        11.9         11.6         11.3        10.5       12.1
Couples with children 18+ years         6.3          4.5          3.6         3.0        4.2
Single with children 0-5 years          1.6          1.3          1.8         1.4        2.2
Single with children 6-17 years         4.1          3.8          3.8         3.2        4.4
Single with children 18+ years          2.7          2.1          1.7         1.3        2.1
Other household types                   3.1          2.8          2.1         1.7        2.5
Total number of households (1 000)      2 010        2 085        2 135
Notes: Survey estimates are from the Income Distribution Survey 2004 (N = 13 000). For households
without children, age refers to the age of the oldest person in the household. For households with
children, age refers to the age of the youngest child in the household.
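
The comparison in Table 3 can be screened mechanically by checking, for each household
type, whether the register-based share lies inside the 95% confidence interval of the
survey estimate. The sketch below uses the figures of Table 3 as printed; the function name
and data layout are illustrative choices. It is only a rough screen: the published figures
are rounded to one decimal, so values lying exactly on an interval limit are ambiguous and
are counted here as inside.

```python
# Rows of Table 3: (household type, formal, de facto, CI lowest, CI highest), per cent.
TABLE_3 = [
    ("Singles < 30 years", 7.0, 10.7, 9.1, 10.7),
    ("Singles 30-44 years", 8.6, 8.0, 7.7, 9.3),
    ("Singles 45-66 years", 10.7, 10.5, 9.3, 10.9),
    ("Singles 67+ years", 11.9, 11.4, 11.3, 13.3),
    ("Couples without children < 30 years", 1.4, 1.7, 2.0, 2.8),
    ("Couples without children 30-44 years", 2.2, 2.3, 2.1, 2.9),
    ("Couples without children 45-66 years", 9.6, 10.8, 11.0, 13.0),
    ("Couples without children 67+ years", 7.8, 7.5, 6.9, 8.5),
    ("Couples with children 0-5 years", 11.0, 10.8, 9.7, 11.3),
    ("Couples with children 6-17 years", 11.9, 11.6, 10.5, 12.1),
    ("Couples with children 18+ years", 6.3, 4.5, 3.0, 4.2),
    ("Single with children 0-5 years", 1.6, 1.3, 1.4, 2.2),
    ("Single with children 6-17 years", 4.1, 3.8, 3.2, 4.4),
    ("Single with children 18+ years", 2.7, 2.1, 1.3, 2.1),
    ("Other household types", 3.1, 2.8, 1.7, 2.5),
]


def outside_interval(definition):
    """List household types whose register share falls outside the survey interval."""
    column = {"formal": 1, "de_facto": 2}[definition]
    return [
        row[0]
        for row in TABLE_3
        if not (row[3] <= row[column] <= row[4])
    ]


formal_outside = outside_interval("formal")
de_facto_outside = outside_interval("de_facto")
```

Under this screen the couples with adult children and the couples without children aged
45-66 remain outside the survey interval under both definitions, which matches the
discussion of Table 3 in the text.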
The income sharing unit in Finland
The basic setting is again similar to that in Norway. The concept of dwelling unit is the only
option available for totally register-based statistics. In household sample surveys (IDS/EU-SILC,
HBS etc.), a standardised housekeeping concept is sought. Interviews must be used to
verify the assumption that incomes are used together, in order to define the household as an economic unit.
The construction of the income sharing unit for the TSID can be based on an established
register on dwelling units. The TSID concept is the pure dwelling-unit concept from registers, as
used in e.g. the population census; no adjustments are made. While the income concept may be
more problematic in Finland than in Norway, the procedure for the income sharing unit is more
straightforward in Finland.
All individuals who are registered at the same address have the same domicile code and
constitute a dwelling unit. The domicile code is one of the key identification codes in the
register-based systems alongside PIN-codes and business codes (Statistics Finland, 2004).
The registration information comes from the Population Information System of the Population
Register Centre. The Population Information System is maintained and continuously updated by
the Population Register Centre and the population register districts, on the basis of
statutory notifications of changes. The inhabitants themselves are responsible only for
notifying changes of residence.
The Population Information System includes information on Finnish citizens and foreigners
permanently resident in Finland, on persons living in households, institutions, persons living
temporarily abroad, and also homeless persons. Persons living in institutions and collective
households do not belong to the target population of household income statistics, but they are
included in the PIS household population and can be identified and excluded when
delineating target population for household income statistics (for both the TSID and survey
Statistics Finland uses the Population Information System to maintain its own statistical
register on dwelling units, which is needed to produce the annual statistics on household-dwelling
units and housing conditions. The total statistics on income distribution (TSID) may
then link dwelling units from this source to the personal income register file using the
Personal Identification Number. As already explained, summing the personal income data
over the domicile codes results in the TSID dwelling-unit file. Finally, background variables
can then be merged from other statistical systems, e.g. employment statistics or the register of
completed education and degrees.
In the IDS/EU-SILC, the co-residence criterion is supplemented and in some cases overruled by
the criterion of using incomes together. The operational basis for the housekeeping concept is the registered
household-dwelling unit: the interviewer has a list of members living at the same address (having
the same domicile code) as the selected respondent, i.e. the interviewer knows the dwelling
unit, and then excludes persons from this list or adds new ones if the assumption about using
incomes together is not validated or if errors are found in the register-based list of members. The
institutionalised population has already been excluded from the sampling frame; a very
limited number of cases come up in the interviews and are treated as over-coverage.
One would expect deviations in certain types of households between the household-dwelling
unit and the housekeeping unit definitions. Sub-tenants, tenants, boarders, domestic staff or au
pairs residing in the dwelling, as well as students living together when they are not married or
cohabiting,[12] may be registered at the same address and thus be in the same dwelling unit.
Interviewers are instructed to treat them as separate economic households. Students who
move to live on their own must notify a change of address; they should be registered as their own
household-dwelling units. Before legislation was changed in 1994, problems similar to those in
Norway with students living with their parents (registered at the childhood home but living
elsewhere) were acute in Finland as well.
Table 4 reports the difference between the number of dwelling unit members and household
members of the selected respondents of the first rotation of the IDS 2005/EU-SILC 2006. In
87.5 % of the cases, the two definitions coincided. The dwelling unit was bigger than the household
in 10.2 % and smaller in 1.4 % of the cases. Classified by the selected respondents'
socio-economic status, we find that the overlap is weakest with students, as expected, but also with
The differences seem surprisingly large. Apart from differences in definitions, both registered
dwelling-unit and surveyed household surely contain measurement error. In principle, there is
no difference in the reference times; the composition of household at end of year is asked in
the interviews (January-May), and it also is the reference point for dwelling units. In the
survey, memory problems and misunderstandings in the interviewer/respondent interaction
may take place; while for the registers, despite being based on statutory notifications, not all
changes are duly reported. It is quite difficult to quantify the extent of errors in both sources.
[12] A typical case is one with three students living in a three room student flat with a shared kitchen. There would be one
dwelling unit but three households in this case.
It must be noted that Table 4 gives unweighted distributions. Because the frame (of
persons) is sorted by domicile codes, the inclusion probabilities are greater for larger
dwelling units and consequently their sampling weights are smaller. The weighted
distributions of households and dwelling units are therefore closer than the unweighted ones: the
overlap using sampling weights is around 95 %.
Table 4. Household-dwelling unit members minus household members. Finland.
IDS/EU-SILC selected respondents (n=5564). Unweighted percentage distribution.
                                  -3     -2     -1     0      1      2      3      4-     Total
Farmers                           0.3    0.0    3.2    77.8   13.2   4.5    0.3    0.6    100
Self-employed                     0.2    0.7    1.8    87.9   6.6    1.1    0.6    1.1    100
Upper-level salaried employees    0.1    0.3    1.0    90.7   5.6    1.4    0.4    0.4    100
Lower-level salaried employees    0.0    0.5    1.5    87.6   5.3    3.1    1.6    0.4    100
Manual workers                    0.0    0.3    0.9    87.3   7.9    1.6    1.1    0.9    100
Students                          0.0    0.0    1.6    76.6   9.2    5.4    4.0    2.9    100
Pensioners                        0.0    0.0    0.8    91.1   4.9    1.4    0.7    1.0    100
Unemployed                        0.3    0.0    1.2    91.7   3.1    1.2    0.6    1.9    100
Others                            0.7    0.0    0.7    88.5   5.4    2.7    1.4    0.7    100
All                               0.1    0.2    1.3    87.5   6.4    2.3    1.2    1.0    100
Source: IDS 2005/EU-SILC 2006, first rotation wave.
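
The remark that weighted overlap is higher than unweighted overlap can be illustrated with
a toy calculation. The sketch assumes, as a simplification, that the inclusion probability
of a dwelling unit is proportional to its number of members, so that the design weight is
proportional to the inverse of dwelling-unit size; the actual IDS/EU-SILC weights also
involve calibration, which is ignored here. The six respondents are invented.

```python
def overlap_rates(respondents):
    """Return (unweighted, weighted) shares of cases where both unit sizes coincide.

    Each respondent is a (dwelling_unit_size, household_size) pair.
    """
    matches = [dwelling == household for dwelling, household in respondents]
    # Assumed design weight: inverse of dwelling-unit size.
    weights = [1.0 / dwelling for dwelling, _ in respondents]
    unweighted = sum(matches) / len(respondents)
    weighted = sum(w for w, m in zip(weights, matches) if m) / sum(weights)
    return unweighted, weighted


# Invented respondents: mismatches occur in the larger dwelling units.
respondents = [(1, 1), (1, 1), (2, 2), (3, 1), (4, 4), (3, 2)]
unweighted, weighted = overlap_rates(respondents)
```

Because the mismatching cases in this toy data sit in larger dwelling units, they receive
smaller weights and the weighted overlap exceeds the unweighted one, which is the direction
of the effect reported for Finland.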
The strategy for the register-based income statistics in Finland is to take the register definition
of dwelling-unit as given and to assume that the differences between household concepts are
not important for income statistics. A study on differences between the two concepts and their
effects on income inequality measures was conducted by Ruotsalainen (2004). The study
concerned the year 1999 and the results were similar to those reported in Table 4: in
85.5 percent of the cases in Ruotsalainen's study, the two definitions yielded the same result.
The conclusion of the study was that when equivalent incomes are used, the household-dwelling
unit may be used in income distribution statistics. For some subgroups, notably students,
non-equivalent mean incomes were significantly different depending on the household definition.
3. Comparing income distribution
The aim of this section is to compare the income distribution from the totally register-based
household income statistics with that of the Income Distribution Surveys in the two countries.
The main objective is to compare the key income inequality and low income indicators
obtained from the two sources, and to discuss the validity of the results.
For both countries and both data sources, standard methodology of using modified OECD-
scale, household/dwelling-unit as the income recipient unit, and individual as the unit of
analysis is applied. In other words, indicators are calculated from the distribution of
equivalent disposable income (after-tax) of persons. In the Norwegian case, both sources have
the same income definition, and the differences are explained by differences in household
concepts and by sampling error. In the Finnish case, the operational income definitions are
different although theoretically both attempt to measure cash disposable income of
households. Therefore income and household definitions as well as sampling error contribute
to the observed differences.
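The equivalisation step described above can be sketched as follows. This is a minimal illustration, not the production code of either statistical office. It assumes the usual parameters of the modified OECD scale (1.0 for the first person aged 14 or over, 0.5 for each additional person aged 14 or over, 0.3 for each child under 14); the function names and the example family are hypothetical.

```python
def consumption_units(ages):
    """Consumption units of one household or dwelling-unit on the modified OECD scale.

    Assumed weights: 1.0 for the first person aged 14+, 0.5 for each further
    person aged 14+, 0.3 for each child under 14.
    """
    persons_14_plus = sum(1 for age in ages if age >= 14)
    children = len(ages) - persons_14_plus
    if persons_14_plus == 0:
        raise ValueError("expected at least one person aged 14 or over")
    return 1.0 + 0.5 * (persons_14_plus - 1) + 0.3 * children


def equivalent_income_per_person(members):
    """members: list of (age, disposable_income) tuples for one income unit.

    The unit's total disposable income is divided by its consumption units,
    and the result is assigned to every member (the person is the unit of analysis).
    """
    total_income = sum(income for _, income in members)
    units = consumption_units([age for age, _ in members])
    return [total_income / units] * len(members)


# Hypothetical family: two adults and two children under 14.
family = [(40, 30000.0), (38, 12000.0), (10, 0.0), (5, 0.0)]
print(consumption_units([age for age, _ in family]))
print(equivalent_income_per_person(family))
```

The same function applies whether the income unit is a housekeeping household (survey) or a dwelling-unit (register); only the list of members changes.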
For Norway the income reference year is 2004. The survey data is based on the annual
Income Distribution Survey. In 2004 the IDS consisted of 13,000 households or 34,000
individuals. The IDS is made up of two sub-samples, the EU-SILC and the national Level of
Living Survey. The response rates of these surveys were 70-73 per cent. The IDS, however,
also includes non-responding households. The only information the IDS collects from these
surveys is household composition. For persons who declined to participate in the survey,
the missing self-reported data on household situation were replaced by register data on 'family'
composition from the Central Population Register. All income data are collected from
registers. In addition to the traditional Horvitz-Thompson method, the weights in the IDS
were adjusted by calibration. In brief, this method uses a regression technique to construct
new weights that produce estimates which, for some variables, are identical to known totals
(from tax registers).
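The idea of calibration can be illustrated with linear (regression-type) calibration, one common variant. The paper does not state which calibration function or auxiliary variables the Norwegian IDS uses, so this is a sketch under assumptions: design weights d_i are adjusted to w_i = d_i (1 + x_i'lambda), with lambda chosen so that the weighted totals of the auxiliary variables x equal the known totals. The data and function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: auxiliary variables are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def calibrate(design_weights, aux, totals):
    """Linear calibration: return weights whose weighted aux totals equal `totals`."""
    p = len(totals)
    pairs = list(zip(design_weights, aux))
    gram = [[sum(d * x[j] * x[k] for d, x in pairs) for k in range(p)] for j in range(p)]
    current = [sum(d * x[j] for d, x in pairs) for j in range(p)]
    lam = solve_linear(gram, [t - c for t, c in zip(totals, current)])
    return [d * (1.0 + sum(l * xj for l, xj in zip(lam, x))) for d, x in pairs]


# Hypothetical sample of four units: aux = (1, taxable income in thousands).
design = [100.0, 100.0, 100.0, 100.0]          # Horvitz-Thompson weights (1 / inclusion probability)
aux = [[1.0, 20.0], [1.0, 35.0], [1.0, 50.0], [1.0, 80.0]]
known_totals = [420.0, 20000.0]                # population count and register income total
weights = calibrate(design, aux, known_totals)
print(weights)
```

After calibration, the weighted sample reproduces both the population count and the register income total exactly, which is the property described in the text.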
For Finland, data from the income reference period 2005 are used. Both household and
dwelling-unit compositions are fixed at the end of 2005. The sample survey data are from the
Income Distribution Survey 2005 (EU-SILC 2006), which has 10,868 households and 28,039 persons.13
The response rate in the survey was 75 percent. Demographic as well as income data were used in
the calibration model to reduce non-response bias. Response rates have recently fallen to 75
percent in the 2005 and 2006 surveys, while in the surveys of 1997-2002 they were still around
80 percent or better.
13 The IDS is completely integrated with EU-SILC, which means that the only difference between the IDS 2005 and EU-SILC
2006 is the income concept (it is more extensive in the IDS than currently in EU-SILC).
Income inequality indicators
Figures 1 and 2 give a broad overview of the income distribution in the two countries based
on the share of total income received by different proportions of the population, e.g. deciles.
From the graphs it is apparent that there are only small differences between the sample
estimates and the total register-based source with respect to income distribution.
In Norway, register data show a slightly more unequal distribution compared to the sample
survey, mainly at the top of the distribution. It is, however, only with respect to the share of
total income received by the top 5% and top 1% of the population that there are noticeable
differences between the sources. Most likely, the explanation is that the sample was not able
to capture the 'extremely' wealthy households at the very top of the distribution.
Figure 1. Share of equivalent income (deciles and top 5% and 1%) in Norway. Register
data and survey data
[Figure not reproduced. Horizontal axis: decile groups 1-10, top 5 %, top 1 %. Series: Income Distribution Survey, Register data.]
Figure 2. Share of equivalent income (deciles and top 5 % and 1 %) in Finland. Total
Statistics on Income Distribution 2005 (TSID) and Income Distribution Survey 2005
[Figure not reproduced. Horizontal axis: decile groups 1-10, top 5 %, top 1 %. Series: Income Distribution Survey, TSID (total).]
In Finland as well, the total source yields a slightly more unequal distribution than the
sample-based IDS. Less income is received by the first decile group and more by high income
groups (deciles 9-10), while in the middle of the distribution (deciles 2-8) the shares are very
close to each other (the differences there are mostly only 0.1 percentage points). The difference
in the tenth decile group is explained by the lower share of the sample survey in the top 1 percent.
The lower share in the 9th decile is somewhat harder to explain. Generally, it seems clear that
the results are in good accordance with each other. Furthermore, the sample estimates are
surprisingly close to the total source even among the very well-off.
Table 5 gives more information on income distribution. The Gini coefficient and the S80/S20
ratio are frequently used by Eurostat as indicators of inequality and are included in the Laeken
indicators of social inclusion. When comparing these summary measures of inequality for
Norway, the conclusion is that there are no significant differences in the results from the two
sources. This is also true for the more sensitive S90/S10 ratio, although register data once
again tend to report a slightly more unequal distribution than the survey.
In Finland, the total source TSID gives higher values than the survey point estimates, but for
the Gini coefficient and the quintile share ratio S80/S20 the TSID value is just within the survey
confidence limits. The average incomes are lower in the total source, as they should be because
of the differences in the income definition, and perhaps also because of selective unit
non-response in the survey that is not corrected by re-weighting. It must be noted that the
estimated confidence band in Table 5 may be too wide, because calibration to margins is not taken
into account in the standard error calculations.
Table 5. Indicators of inequality
                    Register data   Survey data   95% confidence interval
                    (total)         (sample)      Lowest      Highest
Norway (2004)
Gini                0.283           0.278         0.260       0.296
S90/S10             7.0             6.7           6.1         7.5
S80/S20             4.1             3.9           3.7         4.2
Mean income         250 900         248 600       242 700     254 500
Median income       219 400         219 200       217 600     220 800
Finland (2005)
Gini                0.282           0.270         0.259       0.284
S90/S10             6.4             5.7           5.0         6.3
S80/S20             4.1             3.8           3.5         4.1
Mean income         20343           21002         20588       21416
Median income       17977           18713         18572       18855
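The two summary measures in Table 5 can be computed from person-level equivalent incomes and weights roughly as follows. This is a sketch, not the offices' estimation code: it uses a standard weighted Gini formula and a quantile-slicing helper, and it ignores practical details such as the treatment of negative or zero incomes. All names are hypothetical.

```python
def gini(incomes, weights):
    """Weighted Gini coefficient: 1 - sum_i p_i * (L_i + L_{i-1}),
    where p_i is the population share and L_i the cumulative income share."""
    pairs = sorted(zip(incomes, weights))
    total_w = sum(w for _, w in pairs)
    total_y = sum(y * w for y, w in pairs)
    cumulative_share = 0.0
    area = 0.0
    for y, w in pairs:
        previous = cumulative_share
        cumulative_share += y * w / total_y
        area += (w / total_w) * (previous + cumulative_share)
    return 1.0 - area


def income_between(incomes, weights, lo, hi):
    """Income received by persons between cumulative population fractions lo and hi.
    A person straddling a boundary contributes in proportion to the overlap."""
    pairs = sorted(zip(incomes, weights))
    total_w = sum(w for _, w in pairs)
    start, end = lo * total_w, hi * total_w
    position = 0.0
    total = 0.0
    for y, w in pairs:
        overlap = min(position + w, end) - max(position, start)
        if overlap > 0:
            total += y * overlap
        position += w
    return total


def s80_s20(incomes, weights):
    """Income of the richest 20 % of persons divided by that of the poorest 20 %."""
    top = income_between(incomes, weights, 0.8, 1.0)
    bottom = income_between(incomes, weights, 0.0, 0.2)
    return top / bottom


# Hypothetical equivalent incomes with equal weights.
incomes = [10.0, 20.0, 30.0, 40.0, 100.0]
weights = [1.0] * 5
print(gini(incomes, weights), s80_s20(incomes, weights))
```

For the total register source every weight is 1; for the sample survey the calibrated person weights would be passed instead.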
At risk of poverty indicators
We now change focus to the bottom of the income distribution. In Table 6 we present a wide
range of low-income thresholds, from 30 % to 80 % of median equivalent income; persons with
an income below the threshold are counted as having low income.
Once more the overall picture for Norway is that survey data and register data tend to yield
more or less the same results. It is only for the rather strict definition of an equivalent income
below 30 per cent of the median that differences between the sources are significant. The
survey estimates are, however, based on very few observations at this low threshold. For the
at-risk-of-poverty threshold recommended by Eurostat, i.e. 60 % of median income, there
seem to be only small differences between the estimate from the Income Distribution Survey and
the result from the totally register-based income statistics.
In Finland, the register-based low income indicators are higher than survey estimates. Up to
60 % of median the difference is statistically significant in the sense that register-based
figures are higher than the estimated upper confidence limit from the survey. In relative terms
the ratio of the poverty rates is higher at the low end; in absolute terms the difference at 50 %
of median is 1.4 percentage points and at 60 % of median 1.3 percentage points.
Table 6. The proportion of individuals below different low-income thresholds
                    Register data   Survey data   95% confidence interval
                    (total)         (sample)      Lowest      Highest
Norway (2004)
30% of median       2.2             2.0           1.8         2.1
40% of median       3.5             3.4           3.2         3.7
50% of median       6.1             6.2           5.9         6.5
60% of median       11.0            11.3          10.9        11.7
70% of median       18.4            18.4          17.9        18.9
80% of median       27.7            27.9          27.4        28.5
Finland (2005)
30% of median       1.5             0.7           0.6         0.9
40% of median       3.1             2.0           1.7         2.2
50% of median       7.0             5.6           5.1         6.1
60% of median       14.1            12.8          12.0        13.5
70% of median       23.0            22.1          21.2        23.0
80% of median       31.9            31.4          30.4        32.4
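As an illustration of how the figures in Table 6 are defined and compared, the sketch below computes a low-income rate at a given fraction of the weighted median and then checks, for the Finnish rows of Table 6, at which thresholds the register value falls outside the survey confidence interval. The function names are hypothetical, the median rule (first income at which cumulative weight reaches half the total) is a simplifying assumption, and the interval check is only the informal significance criterion used in the text.

```python
def weighted_median(incomes, weights):
    """Smallest income at which the cumulative weight reaches half of the total weight."""
    pairs = sorted(zip(incomes, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for income, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return income
    raise ValueError("empty input")


def low_income_rate(incomes, weights, fraction=0.6):
    """Percentage of persons with equivalent income below fraction * median."""
    threshold = fraction * weighted_median(incomes, weights)
    below = sum(w for y, w in zip(incomes, weights) if y < threshold)
    return 100.0 * below / sum(weights)


# Finnish rows of Table 6: threshold (% of median) -> (register, CI lowest, CI highest).
TABLE6_FINLAND = {
    30: (1.5, 0.6, 0.9),
    40: (3.1, 1.7, 2.2),
    50: (7.0, 5.1, 6.1),
    60: (14.1, 12.0, 13.5),
    70: (23.0, 21.2, 23.0),
    80: (31.9, 30.4, 32.4),
}


def outside_interval(table):
    """Thresholds at which the register value lies outside the survey confidence interval."""
    return [t for t, (register, lowest, highest) in sorted(table.items())
            if not lowest <= register <= highest]


print(low_income_rate([5.0, 10.0, 20.0, 30.0, 100.0], [1.0] * 5))
print(outside_interval(TABLE6_FINLAND))
```

Applied to the Finnish rows, the check flags the thresholds from 30 % up to 60 % of the median, in line with the statement that the difference is significant up to 60 % of the median.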
In Tables 7 and 8 we compare the at-risk-of-poverty rate for different subgroups of the
population. The low-income threshold is now restricted to an equivalent income of less than
60 % of the median income. With respect to age groups (Table 7) there seems to be consistency
between the two sources in the Norwegian data for all age groups except the elderly (65
years or older). For those 65 and older the totally register-based source reports a poverty rate
that is significantly lower than the survey. One reason for the difference for this particular age
group may be that the register data exclude a larger number of the institutional population
than the survey, due to a difference in definition. In general the institutionalized have lower
personal income than the elderly living in private households.
In Finland, the total source TSID gives higher rates in younger age groups, while for those
aged 50-64 and the elderly the difference is not statistically significant. The flow of
inter-household transfers mostly goes from the older age groups to the younger (students,
single parents) in Finland, so this may partly explain the result. The difference between the
survey point estimate and the TSID result for children (0-15 years) is 3.4 percentage points,
which is a somewhat worrying result.
Table 7. The proportion of individuals at-risk of poverty, by age groups
                    Register data   Survey data   95% confidence interval
                    (total)         (sample)      Lowest      Highest
Norway (2004)
0-15 years          8.2             8.4           7.7         9.1
16-24 years         25.4            25.5          24.0        27.0
25-49 years         8.6             8.6           8.0         9.2
50-64 years         5.1             4.9           4.3         5.5
65+                 17.9            19.8          18.4        21.2
All                 11.0            11.3          10.9        11.7
Finland (2005)
0-15 years          12.8            9.4           7.9         10.9
16-24 years         25.3            22.3          19.9        24.6
25-49 years         10.8            9.1           8.2         10.0
50-64 years         9.9             9.5           8.3         10.6
65+                 20.7            22.3          20.2        24.4
All                 14.1            12.8          12.0        13.5
Table 8. At-risk-of-poverty rate for different household types
                                           Register data   Survey data   95% confidence interval
                                           (total)         (sample)      Lowest      Highest
Norway (2004)
Singles:
  Under 30 years                           59.3            54.8          51.2        58.4
  30-44 years                              16.5            16.4          13.4        19.4
  45-66 years                              14.8            13.3          10.7        15.9
  67 years or older                        38.3            40.6          37.2        44.0
Couples without children, oldest person:
  Under 30 years                           12.7            19.6          16.8        22.4
  30-44 years                              4.8             4.7           3.1         6.3
  45-66 years                              2.6             2.8           2.2         3.3
  67 years or older                        5.3             6.3           5.3         7.3
Couples with children, youngest child:
  Under 5 years                            7.4             7.1           6.5         7.7
  6-17 years                               4.1             4.2           3.7         4.6
  18 years or older                        2.0             2.6           2.0         3.2
Single parents, youngest child:
  Under 5 years                            21.1            22.6          18.4        26.8
  6-17 years                               12.9            13.4          11.2        15.6
  18 years or older                        8.1             6.1           4.1         8.1
All                                        11.0            11.3          10.9        11.7
Finland (2005)
Singles:
  Under 25 years                           60.9            61.8          54.9        68.6
  25-44 years                              22.9            20.2          16.4        23.9
  45-64 years                              24.7            25.6          21.9        29.3
  65 +                                     38.7            43.1          38.8        47.5
Couples without children, oldest person:
  Under 25 years                           36.0            22.7          15.7        29.6
  25-44 years                              7.5             4.5           2.7         6.3
  45-64 years                              5.5             5.2           4.0         6.5
  65 +                                     8.9             10.5          8.2         12.8
Couples with children, youngest child:
  0-4 years                                10.0            11.1          8.5         13.7
  5 years or older                         6.2             5.5           4.0         7.0
Single parents                             35.3            16.4          11.8        20.1
Other household types                      7.3             6.6           5.0         8.2
All                                        14.1            12.8          12.0        13.5
Table 8 presents the at-risk-of-poverty rate for various types of households. The overall picture
is once more that the totally register-based source and survey data report more or less the
same incidence of poverty within the household types. For Norway the only household types
where there is a significant difference in the poverty rate between the sources are young
singles and young couples without children (oldest person is younger than 30 years old). The
table shows that for the young singles the register-based income statistics show an
at-risk-of-poverty rate that is significantly higher than the survey, while for the young couples
the situation is the other way round. However, much of the difference can probably be explained
by the treatment of students in the register-based income statistics. The extremely high
proportion of young singles below the poverty line (almost 60 % according to the register
data) can only be explained by the high number of students belonging to this household type.
When students are 'removed' from their parents' household in the process of constructing a
more 'actual' household situation, the majority of them will be classified as young singles.
The main income source for students is student loans. Because the income definition does not
include such loans as part of household income, many of them end up as 'poor'. This is also
the case for the household definition based on the household interview. However, from survey
data it is apparent that not all students live alone (although the majority of them do). Some of
them report in the interview that they live together with a partner, e.g. another student. This
piece of information is not (yet) available in the register-based income statistics. As a
consequence, (cohabiting) students constitute a large part of the young couples in the survey
data, but very few according to the register-based data. This partly explains why survey data
report a significantly higher poverty rate for this group, compared to the totally
register-based income statistics.14
14 In national income statistics it is common practice to exclude students from analyses of at-risk-of-poverty.
In the Finnish case there are significant differences in two groups: young childless couples
and single parents. The latter is an important population group for poverty analysis and
therefore the result definitely needs attention. A simple counterfactual reveals that the lack of
alimonies in the register data explains a large part of the difference. When alimonies are
removed from the IDS survey data, the poverty rate for single parents increases by 13.5
percentage points, from 16.4 per cent to almost 30 per cent. The remaining difference of about
6 percentage points may be due to sampling error, especially given the small sample size for
single parents, and to the difference in the applied household concept. Poverty rates of young
households increase as well when inter-household transfers are removed from income: singles
under 25 years would have a poverty rate five percentage points higher and couples under 25
years 3.4 percentage points higher. Other groups are mostly unaffected.
Table 9 attempts to summarise the effects of the different definitions on indicators for
Finland. The first column gives the sample estimates of low income and income inequality,
i.e. the IDS estimate of a 12.8 percent at-risk-of-poverty rate and a Gini coefficient of 0.270.
If, for the same persons in the sample, we used the TSID income concept, the at-risk-of-poverty
rate would increase to 13.6 percent and the Gini coefficient to 0.275 (column two). As already
explained, these increases are almost completely due to the lack of inter-household transfers in
the totally register-based income definition.
For the third column, register-based definitions are applied to IDS sample data by first
gathering dwelling-unit members (instead of household members) around the selected
respondents of the sample and then summing up the register-defined incomes for these
persons. Dividing the income by consumption units (of the dwelling unit) gives register-based
equivalent incomes of the selected respondents. These are then assigned to all persons in the
original sample. Original population-calibrated sampling weights are finally used to estimate
the distribution of equivalent incomes of persons according to register-based concepts. The
change from the second column to the third column is taken to indicate the effect of different
household concepts on indicators. If this method is correct, then the data here indicates that
low income rate would increase further to 14.0 percent but Gini would decrease slightly.
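The procedure behind the third column can be sketched as below. This is only an illustration of the steps as described in the text, with hypothetical data structures and names; the consumption-unit weights are the assumed modified OECD parameters (1.0 / 0.5 / 0.3), and the unit is assumed to contain at least one person aged 14 or over.

```python
def consumption_units(ages):
    """Modified OECD scale; assumes at least one person aged 14 or over in the unit."""
    persons_14_plus = sum(1 for age in ages if age >= 14)
    children = len(ages) - persons_14_plus
    return 1.0 + 0.5 * max(persons_14_plus - 1, 0) + 0.3 * children


def register_equivalent_incomes(sample_households, register):
    """Apply register concepts (dwelling-unit, register income) to a household sample.

    sample_households: list of dicts with keys 'respondent' (selected person id),
        'members' (person ids in the interviewed household) and 'weight'.
    register: dict person_id -> {'dwelling': id, 'age': int, 'income': float}.
    Returns (person_id, equivalent_income, weight) for every sample person.
    """
    by_dwelling = {}
    for person_id, record in register.items():
        by_dwelling.setdefault(record["dwelling"], []).append(person_id)

    result = []
    for household in sample_households:
        # 1. Gather the dwelling-unit members around the selected respondent.
        dwelling = register[household["respondent"]]["dwelling"]
        unit = by_dwelling[dwelling]
        # 2. Sum register incomes and divide by the dwelling-unit's consumption units.
        total = sum(register[p]["income"] for p in unit)
        equivalent = total / consumption_units([register[p]["age"] for p in unit])
        # 3. Assign the result to all persons of the original sample household,
        #    keeping the original sampling weight.
        for person_id in household["members"]:
            result.append((person_id, equivalent, household["weight"]))
    return result


# Hypothetical case: three students share one dwelling but form three households.
register = {
    "A": {"dwelling": "d1", "age": 22, "income": 6000.0},
    "B": {"dwelling": "d1", "age": 23, "income": 6000.0},
    "C": {"dwelling": "d1", "age": 21, "income": 6000.0},
}
sample = [{"respondent": "A", "members": ["A"], "weight": 900.0}]
print(register_equivalent_incomes(sample, register))
```

In this example the respondent's equivalent income is 6 000 under the household concept but 9 000 under the dwelling-unit concept, which shows how the choice of income unit alone can move persons across a low-income threshold.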
Finally, the fourth column gives the indicators from the total source. The difference between
the third and the fourth column is here interpreted as sampling error which seems to be
negligible for at risk of poverty rate (this result is sensitive to the threshold, however). To
summarise, of the 1.3 percentage point difference in at risk of poverty rates, about 0.8
percentage points would be due to different income definition, 0.4 percentage points due to
different household definitions, and the sample point estimates seem to be quite close to true
parameter values. For the Gini, the difference is not solely explained by different definitions
but now the difference is within the survey confidence limits.
Table 9. Assessment of the effect of income and household definitions on low income and
inequality indicators, Finland 2005.
Definition                (1)            (2)                (3)                (4)
- Data source             IDS sample     IDS sample         IDS sample         TSID (total)
- Income concept          IDS (survey)   TSID (registers)   TSID (registers)   TSID (registers)
- Income receiving unit   Household      Household          Dwelling-unit      Dwelling-unit
n (persons)               28,039         28,039             28,039             5,178,562
N (persons)               5,175,503      5,175,503          5,175,503          5,178,562
At risk of poverty, %     12.8           13.6               14.0               14.1
Median poverty gap        14.5           15.1               16.2               16.4
Gini-coefficient          0.270          0.275              0.274              0.282
S80/S20                   3.8            3.9                4.0                4.1
Median income             18 719         18 384             18 291             17 977
Low income threshold      11 232         11 030             10 974             10 786
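The decomposition read off Table 9 is simple column-to-column arithmetic, which can be reproduced as follows. The figures are those of Table 9; the labels attached to each step follow the interpretation given in the text, and the variable names are hypothetical.

```python
# At-risk-of-poverty rate (%) and Gini coefficient by column of Table 9.
poverty_rate = {1: 12.8, 2: 13.6, 3: 14.0, 4: 14.1}
gini = {1: 0.270, 2: 0.275, 3: 0.274, 4: 0.282}


def decompose(indicator, digits):
    """Split the total difference (column 4 minus column 1) into three steps."""
    return {
        "income definition (2 - 1)": round(indicator[2] - indicator[1], digits),
        "household definition (3 - 2)": round(indicator[3] - indicator[2], digits),
        "residual, read as sampling error (4 - 3)": round(indicator[4] - indicator[3], digits),
        "total (4 - 1)": round(indicator[4] - indicator[1], digits),
    }


poverty_steps = decompose(poverty_rate, 1)
gini_steps = decompose(gini, 3)
for label, value in poverty_steps.items():
    print(f"poverty rate, {label}: {value}")
for label, value in gini_steps.items():
    print(f"Gini, {label}: {value}")
```

For the at-risk-of-poverty rate the first two steps account for 0.8 and 0.4 of the 1.3 percentage points, leaving 0.1 as the residual. For the Gini coefficient the residual step (0.008) is the largest of the three, which is why the difference is not explained by the definitions alone.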
The main purpose of the paper has been to describe the build-up of totally
register-based income statistics in Finland and Norway. In Norway, the total source will now
replace the national Income Distribution Survey. In Finland, the total source will be used as
the new national source for income distribution data, particularly for regional and longitudinal
purposes. The old Income Distribution Survey remains in production, transformed to serve as
the platform for the Finnish EU-SILC. The official low income (at-risk-of-poverty) indicators
at the national level are still the survey estimates from IDS/EU-SILC. Regional indicators are
not published from the IDS/EU-SILC but from the total source.
In Norway the income definition in the Income Distribution Survey and in the totally register-
based income statistics is identical. Any differences in respect to distribution are thus
explained by differences in defining the income sharing unit (household) or by sampling
errors. The results show that the overall consistency between the two sources is good,
particularly with respect to the main indicators of income distribution and at-risk-of-poverty.
It is only at the extreme tails of the income distribution or for particular sub-groups of the
population (e.g. students) that there are significant differences between the sources.
In the Finnish case, the differences between the results from the two sources are explained by
differences in income concept, differences in housekeeping household vs. dwelling-unit
concepts, and by sampling error. While the results on income dispersion from the total source
and the sample survey are generally well in line with each other, for poverty analysis data on
inter-household transfers seem to be vital, and these are not currently available in the total
register source TSID. The validity of the TSID income concept may not be satisfactory for
groups such as single parents or young households when compared to the IDS/EU-SILC.
Since both countries already have used register income data in the surveys to minimise
measurement errors, item non-response, and to reduce respondent burden, the main advantage
of the new source is absence of sampling error, especially avoiding bias resulting from
selective unit non-response. The new totally register-based source may be used in calibration
models as the re-weighting frame for the EU-SILC surveys. For example, distribution of
equivalent income or number of persons at risk of poverty could be used in the calibration to
When data for the whole population are available, detailed small domain analyses, especially in
regional and longitudinal dimensions, become feasible. As an example, Statistics Finland now
publishes persistent low income indicators from the TSID at NUTS3 level, and distributes in
a database inequality and low income indicators for each of the around 440 municipalities in
Finland. New indicators, such as income mobility indices, could be produced regularly by
extending the income reference period, currently up to ten years (1995-2005).
Despite the obvious advantages of such totally register-based income statistics, there is still
room for further improvement. In Norway there are still some dwellings that cannot be
identified by the unique domicile number. This is particularly a problem in the capital region,
where roughly 20 % of the population is without a unique address code (2004). This
proportion is, however, on the decline. A greater effort will also be made to identify more
'de facto' households, particularly among the young. Information from the Postal Service may be
a new data source available to achieve this goal.
Improvements in the Finnish TSID concern income data: the dwelling-unit is an accepted
concept and it is commonly used in register-based systems at Statistics Finland, including
the population census. For the validity of low income indicators, register estimation or
imputation of alimonies received should be the first priority in the future development of the
Total Statistics on Income Distribution in Finland.
References
Church, J. & V. Verma (2001): Income Measurement Manual. Working group 'Statistics on
Income, Poverty & Social Exclusion', Luxembourg, 24-25 April 2001.
Epland, J. (2006): 'Challenges in Income Comparability. Experiences from the use of
Register Data in the Norwegian EU-SILC'. Paper prepared for the VII International Meeting
on Quantitative Methods of Applied Sciences, University of Siena, 11-13 September 2006.
Eurostat (2003): Description of Target Variables: Cross-sectional and longitudinal. EU-SILC
065/03: Version 2003, Luxembourg.
Expert Group on Household Income Statistics (2001): Final Report and Recommendations.
Hurlen Foss, A. and L. Solheim (2006): Kvaliteten i Folke- og boligtellingen 2001 (Quality
assessment of Census 2001), Rapporter nr. 31, Statistisk sentralbyrå.
Løwe, T. (2007): 'Barn av høyt utdannede får mest støtte' (Children of highly educated
parents get most support), Samfunnsspeilet nr. 1, Statistisk sentralbyrå.
Ruotsalainen, P. (2004): 'Kotitalous haastattelusta vai rekisteristä?' (Household from
interviews or from registers?), unpublished report (in Finnish only), October 2004, Statistics
Statistics Finland (2004): Use of Registers and Administrative Data Sources for Statistical
Purposes: Best Practises of Statistics Finland. Statistics Finland, Helsinki.
Statistics Norway (2007): Population statistics. Births 2006: Highest fertility since 1991.
Published 19 April 2007, www.ssb.no/english/subjects/02/02/10/fodte_en/.