       IPUMS-Europe, 2004-2008:
Restricted-access, anonymized microdata
    for scientific and policy research
                    ***
 Robert McCaa, University of Minnesota Population Center
Nikolai Botev, UN-ECE Population Activities Unit (Geneva)
    www.hist.umn.edu/~rmccaa/ipums-europe


                        Outline

» PAU 1990s project
» IPUMS-International means:
  Restricted access, anonymized microdata
» IPUMS-Europe: sister project (Latin America),
  connections with PAU
» IPUMS-International partners
» Principles: integration, dissemination
         Population Activities Unit
 1990 census round harmonization project:
             focused on Aging
» Begun 1992: PAU/UNECE, UNFPA, US-NIA
» Microdata acquired for 15 countries
» Harmonized
  26 core person variables plus 13 optional;
  10 dwelling/household variables, 18 optional
» Extensive metadata:
  questionnaires, nomenclatures, classifications
» Progressive over-sampling with age
         Population Activities Unit,
  1990 census round harmonization project:
              focused on Aging
» General release:
  samples for 8 countries
» Samples for the other 7 countries available under more
  restrictive conditions
» Dissemination: CDs or other media; no online access
» Sustainability: ICPSR (U. of Michigan)

             Problems with PAU effort:

» Sample design too complex
» Need for time series
» Lacked legal authority
» Inadequate funding
» Insufficient computing infrastructure and human
  resources
» Antiquated distribution system
» Sustainability problematic
          Population Activities Unit:
      samples of older persons based on
         the 2000-round of censuses
» Tightly integrated with IPUMS-Europe
» Based on the same coding schemes, nomenclatures,
  and classifications
» Utilize the same anonymization techniques and
  approaches; same data access modalities
» Ensure sustainability through the integration with
  IPUMS-Europe: ICPSR & European Data Centers
          Population Activities Unit:
      samples of older persons based on
         the 2000-round of censuses
» Sample design:
  - sample of households not included in the core IPUMS-Europe
    sample, where at least one member is over age 60
    (recommended sampling density: 5 percent);
  - geography to match that of the core samples
» Advantages:
  - more straightforward than the design used for the 1990s;
  - in line with the practice of national statistical offices
    (e.g. PUMS-A and PUMS-O of the US Census Bureau)
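The supplementary design above can be sketched in a few lines of Python. This is only an illustration: the record layout (`hh_id`, `ages`), the function name, and the use of simple random selection to reach the 5 percent density are assumptions, not the PAU procedure.

```python
import random

def older_person_supplement(households, core_ids, density=0.05, min_age=60, seed=0):
    """Draw a supplementary sample of households with an older member.

    households: list of dicts with hypothetical keys 'hh_id' and 'ages'.
    core_ids:   set of household ids already in the core sample.
    """
    # Eligible: not in the core sample, and at least one member over min_age.
    eligible = [
        hh for hh in households
        if hh["hh_id"] not in core_ids and any(a > min_age for a in hh["ages"])
    ]
    rng = random.Random(seed)
    k = round(len(eligible) * density)  # target number of households
    return rng.sample(eligible, k)
```

Because core households are excluded before selection, the supplement can be pooled with the core sample without duplicating any household.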
       From IPUMS-USA (1989-)
         & PAU-Aging (1992-)
  to IPUMS-International (1999-), Latin
America (2003-), Europe (2004?) and beyond




       IPUMS-International means
 Restricted access, Anonymized microdata

» Should be “IRAMS” not IPUMS
» Who are IPUMS-International users?
  Those who:
  » Have a demonstrated need for the data (project
   abstract)
  » Agree to abide by the restrictions of use
  » Place themselves under the jurisdiction of
   Institutional Review Boards
              IPUMSi ANONYMIZES
      Using the most demanding standards:
  legal & administrative as well as technical:
» Suppress geographical detail (NUTS2/3?)
» Corrupt the data! (just a little…)
» Blur/aggregate sensitive codes
» Convert dates to ages (blur key vars.)
» Swap cases between districts! (just a few…)
» Scramble order of unit records
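Three of the technical steps listed above (dates to ages, swapping a few cases between districts, scrambling record order) can be illustrated as follows. This is a minimal sketch, not the IPUMS procedure: the field names `birth` and `district`, the 1 percent swap fraction, and the random pairing of records are all assumptions.

```python
import random
from datetime import date

def age_at(birth, census_day):
    """Completed years of age on census day (replaces the exact birth date)."""
    before_birthday = (census_day.month, census_day.day) < (birth.month, birth.day)
    return census_day.year - birth.year - before_birthday

def anonymize(records, census_day, swap_fraction=0.01, seed=0):
    """records: list of dicts with hypothetical keys 'birth' (date) and 'district'."""
    rng = random.Random(seed)
    out = []
    for rec in records:
        rec = dict(rec)  # work on a copy, leave the input untouched
        rec["age"] = age_at(rec.pop("birth"), census_day)  # date -> age
        out.append(rec)
    # Swap the district codes of a few randomly chosen pairs of records.
    n_pairs = int(len(out) * swap_fraction / 2)
    picked = rng.sample(range(len(out)), 2 * n_pairs)
    for i, j in zip(picked[::2], picked[1::2]):
        out[i]["district"], out[j]["district"] = out[j]["district"], out[i]["district"]
    # Scramble the order of unit records so file position reveals nothing.
    rng.shuffle(out)
    return out
```

Swapping leaves the district totals unchanged, so tabulations by district are preserved while any single record's location becomes uncertain.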
        Anonymization example: Italy, 1991
               First assessment
Note: population uniques are anonymized after integration

» 1. Suppress geographical variables below commune
» 2. Convert
   » Dates of birth, marriage, immigration to ages
   » Band small groups
» 3. Suppress sensitive codes for small groups:
   » Citizenship
   » Year of immigration to Italy
   » Commune of work/study
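Step 3 above, suppressing sensitive codes held by only a few people, might look like the sketch below. The minimum group size and the replacement label are assumptions chosen for illustration; the slide does not state the thresholds used for Italy.

```python
from collections import Counter

def suppress_small_groups(values, min_count, suppressed="SUPPRESSED"):
    """Replace any code held by fewer than min_count records."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else suppressed for v in values]
```

For example, applied to a citizenship column, a code that appears only once in the sample would be replaced while common codes pass through unchanged.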


    EUROSTAT statistical anonymity standards
                 (Thorogood, 1999)
      --all accepted by IPUMS-International

» 1. small sample size
» 2. limited geographical detail
» 3. top and bottom coding of unique categories
» 4. signed non-disclosure agreement
» 5. prohibit redistribution of datasets to third parties
» 6. prohibit attempts to identify individuals or the making
  of any claim to that effect
» 7. require users to provide copies of publications
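Standard 3, top and bottom coding, amounts to clamping extreme values into a range so that rare extremes are no longer unique. A minimal sketch, with the bounds as assumptions to be set per variable and country:

```python
def top_bottom_code(value, bottom, top):
    """Clamp value into [bottom, top] so extreme categories are not unique."""
    return max(bottom, min(value, top))

def top_bottom_code_all(values, bottom, top):
    """Apply the same bounds to a whole column of values."""
    return [top_bottom_code(v, bottom, top) for v in values]
```

With an assumed top code of 85 for age, a 104-year-old would be recorded as 85, read as "85 or older".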
    EUROSTAT statistical anonymity standards
                (Thorogood, 1999)
       --all accepted by IPUMSi and more

» 8. Age (constructed from birth date, where necessary)
» 9. Never identify date of birth
» 10. Never identify place of birth
» 11. Migration: timing and place not identified in detail
» 12. Place of residence identified by major civil division
  (pop>60k, 120k, 250k, 1 million--national rule)
» 13. Sensitivity analysis of variables by national experts
» 14. Confidentiality assessment by national experts
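Standard 12 above, identifying place of residence only for sufficiently populous units, can be sketched as a threshold filter. The record layout, the `population` lookup, and the use of `None` as the masked value are assumptions; the threshold itself is whatever the national rule sets (60k, 120k, 250k, or 1 million).

```python
def mask_small_places(records, population, threshold, masked=None):
    """Keep the place code only where population exceeds the threshold.

    records:    list of dicts with a hypothetical 'place' key.
    population: dict mapping place code -> population count.
    """
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the input is not modified
        if population.get(rec["place"], 0) <= threshold:
            rec["place"] = masked
        out.append(rec)
    return out
```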
      Sister-project: IPUMS-Latin America:
 17 countries, ~500 million pop., 5 census rounds
     80+ samples, 100+ million person records
» Scope: Latin American census microdata, 1960-present
» Work Plan ( funded by National Institutes of Health)
  »   2001: Sign licensing agreements with official agencies
  »   2002: Obtain funding from U.S. NIH
  »   2003: Develop/translate microdata & metadata
  »   2004: Country expert teams design national integrations
  »   2005: MPC/expert teams design regional integration
  »   2006: MPC anonymizes/integrates microdata and metadata
  »   2007: MPC disseminates to bona fide researchers who sign
      non-disclosure license.
            National census/data/research institutes may distribute
      national versions via CDs/web.
          IPUMS-Europe Partnership:
                 More…
» Censuses: 1960s – 2000, where microdata exist
» Countries: 16 inclined at present, with >350 million
  population (* = signed):
  Austria, Bulgaria, Czech Republic*,
  France*, Germany, Greece, Ireland,
  Israel, Hungary*, Poland, Portugal,
  Romania, Slovenia*, Spain*,
  Switzerland, Turkey
» Research: more knowledge, more users
          IPUMS-Europe Partnership:
             More uniformity…

» Legal: signed memorandum of understanding
» Administrative: restricted to approved users; strong
  enforcement procedures
» Sample design: every nth household
» Anonymization: includes corrupting data
» Integration: more variables, composite coding
» Dissemination: extract custom-tailored datasets, never
  entire samples
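The "every nth household" design above is a systematic sample. The sketch below assumes the households are already listed in file order and picks a random starting point, which is an assumption of this example rather than something the slide specifies.

```python
import random

def every_nth_household(household_ids, n, seed=None):
    """Systematic sample: random start in the first n, then every nth household."""
    start = random.Random(seed).randrange(n)
    return household_ids[start::n]
```

With `n=10` this yields a 10 percent sample in which selected households are evenly spread through the file.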
                Advantages…
      proven record of accomplishments:

» Uniform legal protocols
» Substantial institutional infrastructure
» Experienced census microdata integrators
» Cost-effective academic environment
» Sustained funding from National Science Foundation,
  National Institutes of Health
» Successful web-based distribution system: users!
     Advantages of IPUMS-International

» Comparability:
  data are rigorously integrated;
  documentation is extensive, both primary (from NSIs)
  and integrated (from MPC)
» Accountability:
  reports on users, usage and publications
  advisory board of statisticians and scientists
» Sustainability: MPC, ICPSR
    IPUMS-Europe, 2004-2008: coverage
  ~20 countries, representing ~400m. people
» Scope: European census microdata, 1950-present
» Work Plan (contingent upon funding)
  » 2003: Sign licensing agreements with census agencies;
          obtain funding from US NIH
  » 2004: Develop/translate microdata & metadata
  » 2005: Country expert teams design national integrations
  » 2006: MPC/expert teams design regional integration
  » 2007: MPC integrates microdata and metadata
  » 2008: MPC disseminates to bona fide researchers who sign
          non-disclosure license.
          National census/data/research institutes may distribute
          national versions via CDs/web.
            IPUMS INTERNATIONAL
      Imagine a new statistical product:
      scientifically anonymized, integrated
      census microdata samples made up of
      unidentifiable individuals...
» Easy-to-use web-interface
» Highest scientific standards
» Proven, powerful integration
» A quantum leap in usage

» 1998: 1 country signed
» 1999: 3 countries
» 2000: 9
» 2001: 15
» 2002: 32; first release, 6 countries
              IPUMSi RESCUES
   UN Demographic Center for Latin America
         (CELADE, Santiago, Chile)
      ~3000 microdata tapes recovered
       and metadata (documentation)
              IPUMSi PAYS
      National experts in each country
       are contracted to assist with:
» Assembling microdata and documentation
» Developing samples
  » to minimize confidentiality risks
  » and to maximize robustness
» Designing national integration plan
  » census-by-census
  » concept-by-concept
  » code-by-code
» Writing integrated documentation
              IPUMSi PARTNERSHIP
Photo captions:
» Census documentation compiled for Colombian microdata
» Standard: UN/Eurostat Principles & Recs...
Photos from Colombia integration project,
February-March, 2000:
  4 experts from DANE (census office)
  +7 academics (3 universities)
         IPUMSi integration principles

» 1. Respect absolute anonymity and confidentiality
» 2. Preserve all original data, except adjustments to
  ensure privacy (top codes, blurrings, masking,
  re-ordering, etc.)
» 3. Harmonize codes using international standards
  occupation: ISCO-88      (detailed, general)
  education:  ISCED        (detailed, general)
  family:     IPUMS, etc.  (detailed, general)
» 4. Enhance with constructed variables
             Composite coding scheme example:
                       marital status
                             Coding Scheme and Category Availability for Marital Status



                                                 Colombia                  France             Kenya          Mexico             United States         Vietnam
Code            Label                       64   73   85    93   62   68    75      82   90   89   99   60   70   90   00   60     70   80       90   89   99


100    SINGLE/NEVER MARRIED                 X     X   X     X    X    X      X      X    X    X    X    X    X    X    X    X      X    X        X    X    X
       MARRIED/IN UNION
210       Married (not specified)           X     X   X     X    X    X      X      X    X    X    X    .    .    .    .    X      X    X        X    X    X
211         Civil                           .     .   .     .    .    .      .      .    .    .    .    X    X    X    X    .      .    .        .     .   .
212         Religious                       .     .   .     .    .    .      .      .    .    .    .    X    X    X    X    .      .    .        .     .   .
213         Civil and religious             .     .   .     .    .    .      .      .    .    .    .    X    X    X    X    .      .    .        .     .   .
214         Polygamous                      .     .   .     .    .    .      .      .    .    X    X    .    .    .    .    .      .    .        .     .   .
220       Consensual union                  X     X   X     X    .    .      .      .    .    .    .    X    X    X    X    .      .    .        .     .   .
       SEPARATED/DIVORCED/SPOUSE ABSENT
310       Separated or Divorced             .     X   X     X    .    .      .      .    .    .    .    .    .    .    .    .      .    .        .     .   .
320       Separated                         .     .   .     .    .    .      .      .    .    X    X    .    X    X    X    X      X    X        X    X    X
330       Divorced                          .     .   .     .    X    X      X      X    X    X    X    X    X    X    X    X      X    X        X    X    X
340       Married, spouse absent (n.s.)     X     X   X     X    X    X      X      X    X    X    X    .    .    .    .    X      X    X        X    X    X
341         MSA, civil                      .     .   .     .    .    .      .      .    .    .    .    X    X    X    X    .      .    .        .     .   .
342         MSA, religious                  .     .   .     .    .    .      .      .    .    .    .    X    X    X    X    .      .    .        .     .   .
343         MSA, civil and religious        .     .   .     .    .    .      .      .    .    .    .    X    X    X    X    .      .    .        .     .   .
344         MSA, polygamous                 .     .   .     .    .    .      .      .    .    X    X    .    .    .    .    .      .    .        .     .   .
350       Consensual union, spouse absent   X     X   X     X    .    .      .      .    .    .    .    X    X    X    X    .      .    .        .     .   .
400    WIDOWED                              X     X   X     X    X    X      X      X    X    X    X    X    X    X    X    X      X    X        X    X    X
999    UNKNOWN/MISSING                      .     X   X     X    .    .      .      .    .    X    X    X    X    X    X    .      .    .        .    X    X
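Reading the table above, the first digit of each three-digit code appears to carry the general category and the remaining digits the country-specific detail. Under that reading (an inference from the table, not a documented rule), a detailed code collapses to its general category as follows; the names `MARST_GENERAL` and `general_category` are illustrative only.

```python
# General categories, keyed by the first digit of the composite code.
MARST_GENERAL = {
    1: "SINGLE/NEVER MARRIED",
    2: "MARRIED/IN UNION",
    3: "SEPARATED/DIVORCED/SPOUSE ABSENT",
    4: "WIDOWED",
    9: "UNKNOWN/MISSING",
}

def general_code(detailed):
    """First digit of a three-digit composite code."""
    return detailed // 100

def general_category(detailed):
    """Label of the general category that a detailed code belongs to."""
    return MARST_GENERAL[general_code(detailed)]
```

A Mexican record coded 213 (civil and religious marriage) and a French record coded 210 (married, not specified) both collapse to general code 2, which is what makes the two samples comparable at the general level.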
            Occupation: the ISCO standard,
              preliminary release: “1” digit
     final: 2-3 or 4 digit, depending upon country
                                          Coding Schemes and Category Availability for Occupation

                                                          Colombia               France         Kenya        Mexico         United States Vietnam
Code             Label                               64    73 85 93    62   68     75 82   90   89 99   60   70 90    00   60 70 80 90 89 99

OCCUPATION, ISCO
01     Legislators, senior officials and managers    X     X   .   .   X    X      X   X   X    .   X   X    X   X    X    X   X   X        X   .   X
02     Professionals                                 X     X   .   .   X    X      X   X   X    .   X   X    X   X    X    X   X   X        X   .   X
03     Technicians and associate professionals       X     X   .   .   X    X      X   X   X    .   X   X    X   X    X    X   X   X        X   .   X
04     Clerks                                        X     X   .   .   X    X      X   X   X    .   X   X    X   X    X    X   X   X        X   .   X
05     Service workers and shop and market sales     X     X   .   .   X    X      X   X   X    .   X   X    X   X    X    X   X   X        X   .   X
06     Skilled agricultural and fishery workers      X     X   .   .   X    X      X   X   X    .   X   X    X   X    X    X   X   X        X   .   X
07     Crafts and related trades workers             X     X   .   .   X    X      X   X   X    .   X   X    X   X    X    X   X   X        X   .   X
08     Plant and machine operators and assemblers    X     X   .   .   X    X      X   X   X    .   X   X    X   X    X    X   X   X        X   .   X
09     Elementary occupations                        X     X   .   .   X    X      X   X   X    .   X   X    X   X    X    X   X   X        X   .   X
10     Armed forces                                  X     .   .   .   X    X      X   X   X    .   X   X    X   X    X    X   X   X        X   .   .
98     Unknown                                       X     X   .   .   .    .      .   .   X    .   X   X    X   X    X    .   .   .        .   .   .
99     N/A                                           X     .   .   .   X    X      X   X   X    .   X   X    X   X    X    X   X   X        X   .   X




      Variable availability, preliminary release
                                          Selected Variable Topic Availability, by Country and Census Year

                                                                 Colombia               France          Kenya        Mexico             United States    Vietnam
                                                            64    73 85 93    62   68     75 82   90   89 99    60   70 90    00   60     70 80 90       89 99
Geography and internal migration
    Place of usual residence                                 x    x   x   x   x    x      x   x   x    x   x    x    x   x    x    x      x    x   x     x    x
    Place of birth                                           x    x   x   x   x    x      x   x   x    x   x    x    x   x    x    x      x    x   x     .    .
    Duration of residence                                    x    x   .   .   .    .      .   .   .    .   x    x    x   .    .    x      x    x   x     .    .
    Place of previous residence                              x    x   .   .   .    .      .   .   .    .   .    x    x   .    .    .      .    .   .     .    .
    Place of residence at a specified date in the past       .    .   x   x   x    x      x   x   x    x   x    .    .   x    x    x      x    x   x     x    x
Household and family structure
    Relationship to head of household/householder            x    x   x   x   x    x      x   x   x    x   x    x    x   x    x    x      x    x   x     x    x
Demographic and social
    Sex                                                      x    x   x   x   x    x      x   x   x    x   x    x    x   x    x    x      x    x   x     x    x
    Age                                                      x    x   x   x   x    x      x   x   x    x   x    x    x   x    x    x      x    x   x     x    x
    Marital Status                                           x    x   x   x   x    x      x   x   x    x   x    x    x   x    x    x      x    x   x     x    x
    Citizenship                                              .    .   .   .   x    x      x   x   x    x   x    x    .   .    .    .      x    x   x     .    .
    Religion                                                 .    .   .   .   .    .      .   .   .    .   x    x    x   x    x    .      .    .   .     .    x
    Language                                                 .    .   .   .   .    .      .   .   .    .   .    .    x   x    x    .      .    x   x     .    .
    National and/or ethnic group                             .    .   .   x   .    .      .   .   .    x   x    x    .   .    x    x      x    x   x     x    x
Fertility and mortality
    Children ever born                                       .    x   x   x   .    .      .   .   .    x   x    x    x   x    x    x      x    x   x     x    x
    Children living                                          .    x   x   x   .    .      .   .   .    x   x    .    .   .    x    .      .    .   .     x    x
    Date of birth of last child born alive                   .    x   .   x   .    .      .   .   .    x   x    .    .   .    x    .      .    .   .     x    x
    Deaths in the past 12 months                             .    .   .   .   .    .      .   .   .    .   .    .    .   .    .    .      .    .   .     .    x
    Maternal or paternal orphanhood                          .    .   .   .   .    .      .   .   .    x   x    .    .   .    .    .      .    .   .     .    .
    Age, date or duration of first marriage                  .    .   .   .   .    .      .   .   .    .   .    .    .   .    .    x      x    x   .     .    .
Education
    Literacy                                                 x    x   x   x   .    .      .   .   .    x   .    x    x   x    x    .      .    .   .     x    x
    School attendance                                        .    x   x   x   .    .      .   .   .    x   x    .    x   x    x    x      x    x   x     x    x
    Educational attainment                                   x    x   x   x   x    x      x   x   x    x   x    x    x   x    x    x      x    x   x     x    x
    Field of education and educational qualification         .    .   .   .   x    x      .   .   .    .   .    .    .   .    x    .      .    .   .     x    x
Economics
    Activity status                                          x    x   x   x   x    x      x   x   x    x   x    .    x   x    x    x      x    x   x     x    x
    Time worked                                              x    .   x   .   .    .      .   .   .    .   .    x    x   x    x    x      x    x   x     .    .
    Occupation                                               x    x   .       x    x      x   x   x    x   .    x    x   x    x    x      x    x   x     x    x
    Industry                                                 x    x   .   x   x    x      x   x   x    .   .    x    x   x    x    x      x    x   x     x    x
    Status in employment                                     x    x   x   x   x    x      x   x   x    x   .    x    x   x    x    x      x    x   x     .    .
    Income                                                   .    .   .   .   .    .      .   .   .    .   .    .    x   x    x    x      x    x   x     .    .
    Institutional sector of employment                       .    .   .   .   x    x      x   x   x    .   .    .    .   .    .    .      .    .   .     .    x
    Place of work                                            .    .   .   .   x    x      x   x   x    .   .    .    .   .    x    x      x    x   x     .    .
International migration
    Country of birth                                         x    x   x   x   x    x      x   x   x    x   x    x    x   x    x    x      x    x   x      .   .
    Citizenship                                              .    .   .   .   x    x      x   x   x    .   .    .    .   .    .    x      x    x   x      .   .
    Year or period of arrival                                .    .   x   .   .    .      .   .   .    .   .    .    .   .    .    .      x    x   x      .   .
Disability
    Disability                                               .    .   .   x   x    .      .   .   .    .   .    .    .   .    x    .      x    x   x      .   .
    Cause of disability                                      .    .   .   .   x    .      .   .   .    .   .    .    .   .    x    .      .    .   .      .   .
              IPUMSi DISSEMINATES
» Web-based extraction system
» Legally-binding license agreement
  » protects privacy and confidentiality
  » assures proper use
  » new sanction: loss of employment.
» Researcher selects
  » countries
  » censuses
  » cases/sub-populations
  » variables
  » sample densities
» Facilitates comparative research
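The researcher's selections listed above map naturally onto a filter over pooled records. The sketch below is hypothetical: it is not the IPUMS extraction system, the record keys `country` and `year` are assumed, and the choice of sample density is left out for brevity.

```python
def make_extract(records, countries, years, variables, keep=lambda rec: True):
    """Return only the requested variables for the requested cases.

    records:   list of dicts with hypothetical keys 'country' and 'year'.
    countries: set of country names to include.
    years:     set of census years to include.
    variables: names of the variables to keep in the extract.
    keep:      optional predicate selecting a sub-population.
    """
    return [
        {v: rec.get(v) for v in variables}
        for rec in records
        if rec["country"] in countries and rec["year"] in years and keep(rec)
    ]
```

For instance, `make_extract(records, {"Colombia"}, {1993}, ["age", "marst"], keep=lambda r: r["age"] >= 60)` would return age and marital status for older persons in the Colombia 1993 sample only, so the user never receives an entire sample.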
       additional information at:

www.hist.umn.edu/~rmccaa/ipums-europe

                   contact:
           rmccaa@umn.edu

                    *****

                Thank you