IPUMS-International Disseminating Census Microdata

					Welcome to the 7th IPUMS-International workshop

     Accomplishments, plans and challenges
 Robert McCaa, Professor of population history
             University of Minnesota
        for additional details, please see:
    1. Introductions, organization and program
» Introductions
    » Doña Carmen Miró, director and founder of CELADE
      thanks to your vision: the world’s largest microdata archive
   »  Don Evelio Fabbroni, technical secretary of IASI
   »  Don Dimas Quiel, Director General of DEC-Panamá, and your staff
   »  Delegates of the National Statistical Institutes of Latin America
   »  Minnesota Population Center: 9 members of the team
» Organization
   » Dinner per-diem for Tuesday
   » All activities are in the Hotel Crowne Plaza
» Program (please see the workshop folder): two intensive days
   » Invited speakers: INEs, CELADE, CED (Barcelona)
   » MPC:
        » Listen and learn
        » Demonstrate what has been accomplished: recovery, confidentiality, integration
        » Show the website and how to make good use of it
        » Discuss plans and innovations for the 2nd five-year plan of IPUMS-AL: 2009-2013
          Outline of the presentation
                                                     no. of slides
1. Introduction                               5
2. Accomplishments—celebrate a success far beyond…
  a.   Censuses and documentation recovered               9
  b.   Censuses integrated (see map and inventory)        3
  c.   Confidentiality and security                       5
  d.   Methods and procedures                             3
3. Plan and challenges                                    2
  » Censuses to integrate, 2010 round, tabulator
       (REDATAM?), GIS, laboratories of high security
   IPUMS – Latin America: past, present and future
» Past: Thank you!
   » All the statistical institutes of Latin America are
     cooperating with the IPUMS project.
» Present: Samples of 9 countries (43 censuses) of the
  American continents integrated in the IPUMS system
   » 2002-8: Argentina, Brazil, Chile, Colombia, Costa Rica,
     Ecuador, Mexico, Panamá, and Venezuela
» Future: 2009-13:
   » The remaining Latin American countries
   » Censuses of the 2010 round, entrusted before 2013
   » Tabulator, GIS, Laboratories of high security
   Some members of the IPUMS team (2008)

                                          Steven Ruggles, inventor of IPUMS,
                                          Professor of History, and Director of
                                          the Minnesota Population Center

(Not present: computer gurus, some researchers, and others who were too busy
                       to make time for taking a photo!)
                The objectives of IPUMS

1.   Preserve census microdata and documentation for all the
     countries in the world
2.   Integrate microdata and metadata
3.   Disseminate--without cost--extracts of samples with the
     corresponding documentation to researchers
                       IPUMS Milestones
» 1995: IPUMS-USA first release of integrated microdata
           IPUMS-USA continues: 1850-2000 + ACS samples
»   1999: IPUMS-International funded
»   2002 - 1st International release: 7 countries, including
    Colombia and Mexico
»   2006 release: 20 countries, 63 censuses
»   2008 release: 35 countries, 111 censuses
    » ~263 million person records
    » Two thousand users
» 2013 release: ~60 countries, ~200 censuses
          Note: microdata are already entrusted to MPC
                   Workshop goals
•   Evaluate the integrations to date (9 countries, 43 censuses)
•   Discuss plans to complete the remaining integrations
    (12 countries, 43 censuses)
•   Consider the incorporation of electronic boundary files
    corresponding to the microdata
•   Study the obstacles and challenges in the
    harmonization of census samples for the 2010 census
•   Discuss methods and procedures of the IPUMS project
    to improve them
•   Respond to doubts, questions, or concerns regarding
    any aspect of the project
              6 presentations on IPUMS

1.   Introduction: past, present, and future – Bob McCaa
2.   Metadata: The IPUMS dynamic system – Toni López
3.   How to make an extract (to obtain microdata) – Miguel
4.   How Integration is accomplished – Matt Sobek
5.   Complementing microdata with GIS: the example of NHGIS
     – Petra Noble
6.   The REDATAM tabulator of integrated microdata
     implemented by IECM (CED, Barcelona) – Toni López
               2a. The Americas:
  The global vanguard in preserving microdata
» 1959: CELADE began the grand project OMUECE
  (Operation of Census Samples).
» Only CELADE, of all the UN demographic centers,
  » Began a project to archive microdata
  » Stimulated an archival program for both data and
» Already, in 1977, CELADE was entrusted with 61 sets
  of microdata encompassing 20 countries.
  » Principal goal: special and comparative tabulations
  » Comparative demographic research of many countries
  » Standardization of basic codes to attain a minimal level of
                 Census Microdata: 1950s
             few countries archived microdata
(a country in green indicates microdata exist for the decade)

                           Mollweide projection
                  Census Microdata: 1960s
                       The Americas:
in the vanguard for encouraging the preservation of microdata

                           Mollweide projection
                    Census Microdata: 1970s
 already in the Americas, the preservation of microdata is almost
universal and is becoming widespread in Europe, Africa and Asia

                             Mollweide projection
                     Census Microdata: 1980s
          The preservation of microdata became generalized

Perú: Can the tapes for the census of 1981 be

                               Mollweide projection
                    Census Microdata: 1990s
               many countries preserved microdata
                (or are disposed to recover them)

Rep. Dom.: Can the data for the census of 1993 be

                             Mollweide projection
    Inventory of census microdata archived by region
         and decade (% of censuses conducted)

    Region/continent    Countries   2000s   1990s   1980s   1970s   1960s

    Latin America          21       100%    100%     89%     81%     72%
    North America          27        91%     72%     64%     24%      8%
    Africa                 58        15%     22%     25%     15%      2%
    Asia                   44         ?%     54%     31%     30%     13%
    Europe                 46         ?%     67%     55%     41%     13%
    (pop>.5m)               7       100%    100%    100%     43%     29%
• Note: cases confirmed by the corresponding official statistical institute. Some
datasets remain to be certified. Some countries have not responded to the invitation to
inventory their stocks of data.
          The CELADE archives
~3000 microdata tapes preserved with the
     corresponding documentation
For the entire region, manuals are lacking for only 10 censuses:
1. El Salvador:  1992, 2007
2. Guatemala:    1973, 1994, 2002
3. Honduras:     1974
4. Nicaragua:    1995, 2005
5. Rep. Dom.:    2002
6. Perú:         2007
   2b. Integration: IPUMS-Latin America in global context
                dark green = already integrated
    (35 countries, 111 censuses, 263 million person records)
green = to be integrated (39 countries, 103 censuses, 150 million)

                            Mollweide projection
Census documentation assembled for the microdata of Colombia
Standard: UNSD Principles and Recommendations...

             Photos from the first integration project:
                      Colombian microdata,
                  February-March, 2000:
                    4 experts from DANE
                +7 academics (3 universities)
             IPUMS-Latin America
» Samples currently available in IPUMS
  » 9 Latin American countries, 43 censuses:
    average = 4.8 censuses per country
  » 26 countries for other regions, 68 censuses:
    average = 2.6 censuses per country
» In 2013, if all goes well:
  » 21 Latin American countries, 86 censuses
  » 60 countries for other regions, 120 censuses.
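
As a quick check on the figures above, a minimal Python sketch can recompute the per-country averages from the counts quoted on this slide. The variable and function names are illustrative assumptions, not part of any IPUMS tool.

```python
# Counts quoted on the slide; names here are illustrative only.
latin_america = {"countries": 9, "censuses": 43}
other_regions = {"countries": 26, "censuses": 68}


def censuses_per_country(region):
    """Average number of censuses per country, rounded to one decimal."""
    return round(region["censuses"] / region["countries"], 1)


la_avg = censuses_per_country(latin_america)
other_avg = censuses_per_country(other_regions)

# The two groups together should match the 2008 release totals.
total_countries = latin_america["countries"] + other_regions["countries"]
total_censuses = latin_america["censuses"] + other_regions["censuses"]

print(la_avg, other_avg, total_countries, total_censuses)
```

The two groups sum to 35 countries and 111 censuses, which agrees with the 2008 release figures given under IPUMS Milestones.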
        2c. Statistical Confidentiality
                  and security
» Cited by UN-ECE as “good practice”
» On-site inspection: the Dennis Trewin Report
Why was IPUMS cited as “good practice” by
               the UN-ECE
      (2007, Annex 23, pp. 98-103)?
                Good practices (see annex 23):
» High level of confidence and transparency between the
    researchers (users) and the national statistical institutes
»   The conditions of use are well defined
»   Sanctions for misuse are clearly spelled out
»   Good use is assured by both legal and administrative
    mechanisms to prevent violations
»   Sanctions are imposed not only against those who misuse the
    data but also against their institutions.
»   The data are anonymized by highly efficient technical means
The standard agreement between National Statistical
     Institutes and the University of Minnesota
        Statistical confidentiality and security:
                 see the Trewin Report
» “The best practice for an international
  repository of microdata”
» “The security of IPUMS is first class…the
  standard of the best national statistical offices”
» “in full compliance with the principles and
  recommendations of the ECE”
    2d. IPUMS methods and procedures
» Dissemination by internet
» Comprehensive documentation, including
  » Data dictionaries and codebooks
  » Complete original source documentation in the official
     questionnaires, manuals, etc.
  » All translated into English and converted into a metadata
    database for each census
» Integration ≠ standardization
  » Composite codes (11, 12, 21, 22…) ≠ serial codes (1, 2, 3, …)
     (see next slide)
                IPUMS—Integration method:
               composite codes (multiple digits)
             retains not only significant distinctions
            but also integrates comparable concepts
                                                           Chile            México
Code   Label                                          1992     2002    1990     2000
0      NIU                                             X        X       X        X
       ACTIVE (In Labor Force)
100     EMPLOYED, not specified                        ·           ·    ·            ·
110      At work                                       X           X    X            X
111       At work, and 'student'                       ·           ·    ·            X
112       At work, and 'housework'                     ·           ·    ·            X
113       At work, and 'seeking work'                  ·           ·    ·            X
114       At work, and 'retired'                       ·           ·    ·            X
115       At work, and 'no work'                       ·           ·    ·            X
116       At work, and 'other'                         ·           ·    ·            X
117       At work, family holding, not specified       ·           ·    ·            ·
118       At work, family holding, not agricultural    ·           ·    ·            ·
119       At work, family holding, agricultural        ·           ·    ·            ·
120     Have job, not at work last week                X           X    X            X
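
The composite-code idea in the table above can be sketched in a few lines of Python. This is only an illustration, not the IPUMS implementation: it assumes 3-digit codes in which each further digit adds detail and trailing zeros mean "not further specified". The function names `collapse` and `general_label` are hypothetical; the codes and labels are copied from the table.

```python
# Codes and labels copied from the table above (subset).
LABELS = {
    0: "NIU",
    100: "EMPLOYED, not specified",
    110: "At work",
    111: "At work, and 'student'",
    112: "At work, and 'housework'",
    120: "Have job, not at work last week",
}


def collapse(code: int, digits: int) -> int:
    """Keep the leading `digits` digits of a 3-digit composite code
    and zero out the rest (assumed hierarchy, see lead-in)."""
    if not 1 <= digits <= 3:
        raise ValueError("digits must be 1, 2 or 3")
    factor = 10 ** (3 - digits)
    return (code // factor) * factor


def general_label(code: int, digits: int) -> str:
    """Label of the more general category that `code` falls under."""
    return LABELS[collapse(code, digits)]


# Detailed codes recorded only in México 2000 fall back to the
# 2-digit category that all four samples share.
for code in (111, 112, 120):
    print(code, "->", collapse(code, 2), general_label(code, 2))
```

Under this assumption a researcher can keep the full detail where a census provides it (111, 112, ...) and collapse to the shared two-digit level (110, 120) when comparing across countries and years.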
                    In addition…
» Microdata: new high precision samples not
  only for contemporary censuses but also for
  historical ones (before the 90s)
» Systematic metadata for all variables
   » Universes
   » Definitions
   » Comparability
   » Dynamic System—facilitates comparing the
    wording of questionnaires and instructions for any
    combination of countries and censuses
3. IPUMS-Latin America II, 2009-13: objectives
  1. Conclude the integration of the censuses of the remaining
       countries in the region
  2.   Incorporate samples for the 2010 round
  3.   Add digital boundary files at the second administrative
  4.   Facilitate pre-analysis with an on-line tabulator
  5.   Construct a laboratory of high security
  1. Confirm participation in IPUMS-AL II
  2. Facilitate copies of digital boundary files
  3. In time, make available census microdata and
     documentation for the 2010 round
  4. Discuss participation in the high security laboratory
           Appreciation, reflections and invitation
» Appreciation:
   » To the founders (and current members) of CELADE for having had the
     vision and ability to assemble and preserve microdata
   » To the official statistical institutes for, first, cooperating with CELADE,
     and second, for participating in the IPUMS project
» Reflections
   » Latin America: a model for statistical cooperation
   » IPUMS: already the world’s largest microdatabase
» Invitation:
   » Participate in IPUMS-AL II
   » Entrust microdata and documentation
   » Consider: Tabulator, GIS, laboratory of high security
  Thank you!!