Public Use Microdata Samples

					  Public Use Microdata
Using PDQ Explore Software

            Grace York
  University of Michigan Library
            May 2004
       2000 Census Data

• Summary Files 1-4, Equal Employment
  Opportunity, School District Data, and
  Work Flow data are TABULATED data

• American FactFinder EXTRACTS the
  tabulated data
     Public Use Microdata

• Copies of the original questionnaires
  with identifying information edited

• Create your own cross tabulations
  of census data
   Typical PUMS Questions
• Single years of age by sex for teachers in
  Michigan (e.g. when will they retire?)

• Race of those with Arab ancestry (no, they are
  not all white)

• Demographic characteristics of immigrants from
  Senegal (age, sex, education, occupation,
  income, citizenship for a social survey)

• Age, race and sex of automotive industry
  employees (campaign for organ donations)
  PUMS Software Programs
• FTP data from Census Bureau (and manipulate
  with SAS or SPSS)
• Census Bureau CD-ROMs (Beyond 20/20)
• SDA Software for Michigan (UMich Only)
• PDQ Explore
   PDQ Explore Software

• Easy interface to
  – Public Use Microdata Samples, 1 and 5%,
  – IPUMS, edited PUMS, 1850-1880, 1900-
    1920, 1940-1990
  – Current Population Survey, 1991+
  – Mortality Schedules
• Permits users to tabulate their own
            Access to PDQ

• Librarians may request free IDs, passwords, and
  software from PDQ

• Send e-mail to
   – You are a librarian who talked to Grace York
   – Requesting ID and password for using PDQ
   – Want to download software for the PDQ
     Toolbox, Expert Edition


• Download the software per
  instructions to your hard drive
• To begin searching, open the icon on
  your desktop
       Before Beginning …
              Choose File

   Two PUMS files – 1% and 5% sample

• 1% has data for the nation, states,
  MSAs and super-PUMAs (areas of

• 5% has data for the nation, states,
  MSAs and PUMAs (areas of 100,000)
Before Beginning…
   Define the data you want in terms of a
   spreadsheet. The longer part should be
   defined as rows rather than columns.

I want single years of age by sex for all
   Vietnam-era veterans in the United States

   Universe = Vietnam-era veterans in the U.S.
   Column=sex (not very wide)
   Row=single years of age (could be long)
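
The universe/row/column idea can be sketched in a few lines of Python. This is only an illustration of what a cross tabulation does, not how PDQ works internally. The records are made up, the sex coding (1 = male, 2 = female) is an assumption, and the sketch ignores the person weights a real PUMS tabulation would apply.

```python
from collections import Counter

# Hypothetical person records; values are made up for illustration only.
# vps5 == 1 marks a Vietnam-era veteran; sex coding 1/2 is an assumption.
records = [
    {"age": 52, "sex": 1, "vps5": 1},
    {"age": 52, "sex": 2, "vps5": 1},
    {"age": 55, "sex": 1, "vps5": 1},
    {"age": 52, "sex": 1, "vps5": 1},
    {"age": 30, "sex": 1, "vps5": 0},
]

def crosstab(rows, universe, row_var, col_var):
    """Count records inside the universe by (row value, column value)."""
    counts = Counter()
    for rec in rows:
        if universe(rec):
            counts[(rec[row_var], rec[col_var])] += 1
    return counts

# Universe = vps5=1, Row = age, Column = sex
table = crosstab(records, lambda r: r["vps5"] == 1, "age", "sex")

# One printed line per age (the long dimension), one count per sex.
for age in sorted({a for a, _ in table}):
    print(age, table[(age, 1)], table[(age, 2)])
```

Putting age on the rows keeps the printed table narrow: many lines, two counts per line.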
 Before Beginning…

Consult Chapter 7 of the PUMS codebook if you
    want to check the possible variables and the
    appendices for place/language/ancestry and
    occupation codes

Chapter 7 is also available on the University of Michigan web site at:
Before Beginning…

Housing Record
   All geographic codes (state, MSA, PUMA)
   All housing records
   Some population records

Population Record
    All population variables
    Ok to combine with geographic codes in housing
    Ask for help for other population/housing
    combinations at:
Before Beginning…

     Variable Codes for the Question
   in the Technical Documentation Data Dictionary

AGE                  Single Years of Age
SEX                  Male or Female
VPS5                 Veteran’s Period of Service 5:
                     On active duty during the
                     Vietnam Era (Aug. 1964 to Apr.
             Logging On
Enter the subscriber name and password that
you were given by the PDQ staff
           Logging On
Press OK to close the message of the
       Defining Workspace
• To conduct a new search, create a new workspace
• Press Finish or return twice
       Defining Workspace
Name your file on your hard drive and save.
       Defining Workspace
At the next screen, use the top menu to choose
  Workspace; then Add a Data Set
       Defining Workspace
Browse data sets; highlight ipums, pums, cps, or
  mortality file; Open
             Defining Variables
•   Once you choose a data set, its codebook will open up
•   Click on the plus button to get a list of variables, their
    alphabetic symbols, and any numeric values
            Defining Variables
•  Determine the alphanumeric variables you want
        (e.g. Vietnam-era veteran: yes is VPS5=1)
• Use Top Menu to Choose Query/Setup New Expert Query
(Access the codebook later through a tab on the desktop toolbar)
          Expert Query Form
1.   Make sure you have the correct data set
2.   Determine if you want a tabulation (counts or numbers)
3.   Name your file
          Expert Query Form
Enter the code for UNIVERSE (what you’re counting)
     in the Universe box (e.g. vps5=1 are Vietnam-era veterans
     for the entire U.S.)
         Expert Query Form
•   Enter the code for the variables in the ROW box
    (age = single years of age; age/5 would be five year age groups)
•   Enter the code for the variables in the COLUMN box (e.g. sex)
•   Press RESULTS to run the query
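
A rough Python sketch of why age/5 yields five-year groups. It assumes the division drops the remainder, which is what the grouping implies; the ages are made up.

```python
from collections import Counter

# Made-up ages for illustration only.
ages = [18, 19, 20, 23, 24, 25, 64, 65]

# Assumption: age/5 truncates to a whole number, so ages 20-24 all map to 4.
groups = Counter(age // 5 for age in ages)

# Label each group by the age band it covers.
for g in sorted(groups):
    low = g * 5
    print(f"{low}-{low + 4}: {groups[g]}")
```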
           Search Results
Search results appear in spreadsheet format
                   Saving Results
•   Click on File/Export Query Results
•   You can save as CSV, tab-delimited, and several other formats.
    CSV (WYSIWYG) is recommended for use with Excel
•   Use SETUP button to return to query or icon at bottom to review
    the codebook
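
An exported CSV can also be read outside Excel. The sketch below uses Python's standard csv module. The layout (a header row, then one row per age) and the column names are assumptions, so check them against your own export.

```python
import csv
import io

# Assumed layout of an exported table: a header row, then one row per age.
# The real export may differ; the column names here are hypothetical.
exported = io.StringIO("age,male,female\n52,2,1\n55,1,0\n")

rows = list(csv.DictReader(exported))

# Add up one column of counts across all rows.
total_male = sum(int(r["male"]) for r in rows)
print(total_male)
```

For a real file, replace the StringIO object with open("your_export.csv", newline="").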
             Geographic Codes
• Geographic codes are found in the Housing documentation
• Limit files to Michigan with the code state=26
• Click on Query/New Expert Query to continue
Narrowing the Universe
Narrow the universe by adding & and another condition (e.g.
vps5=1 & state=26)
Logical Operators in PDQ

 & is one of numerous operators used in PDQ

     Operator     Name                     Example/Comment

     X:a..b       range                    age:15..44
     unary +      plus                     sex=+1 (never needed)
     unary -      minus                    income4<=-1000
     *            multiply                 73*income1/100
     /            divide                   rhhinc/persons
     %            modulo                   subsample%10
     +            add                      income1+income2
     -            subtract                 rhhinc-rearning
     <            less than                age<65
     >            greater than             age>64
     <=           less than or equal       age<=65
     >=           greater than or equal    age>=65
     = or ==      equal                    age=23
     != or <>     not equal                income!=0
     & or &&      and                      race=2 & looking=1
     ^            exclusive or             bit-wise; use with caution
     | or ||      or                       age<18 | age>=65
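
For readers who think in a programming language, a few of the examples above can be restated as Python tests on one record. This is a loose analogy, not a description of PDQ itself. It assumes the range a..b includes both endpoints, and the record is made up.

```python
def in_range(value, low, high):
    """Mimic X:a..b, assuming both endpoints are included."""
    return low <= value <= high

# Made-up record for illustration only.
person = {"age": 44, "race": 2, "looking": 1}

range_example = in_range(person["age"], 15, 44)               # age:15..44
and_example = person["race"] == 2 and person["looking"] == 1  # race=2 & looking=1
or_example = person["age"] < 18 or person["age"] >= 65        # age<18 | age>=65

print(range_example, and_example, or_example)
```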
      Altering the Spreadsheet
Once you have a spreadsheet, click on Options to
  create totals or percentages for tables or columns
Adding More Parameters
Expand the table detail by repeating the row and column
  data for another parameter (e.g. race) as shown in
  Dimension 3
           Altering Spreadsheet
•   The default shows separate tables for each of the values in the
    third dimension (e.g. separate spreadsheets for white and black)
•   Change the Axis3 tab to FOREACH to put everything on the same spreadsheet
    Calculating Means or Averages
•   Calculate averages by changing the query type to summary
    statistics (e.g. mean or average) at the top
•   Fill in the new Describe Expression box at the bottom with a
    variable code (e.g. age, income)
               Complex Table
Mean income of white male Vietnam-era veterans in Michigan
  by age, whether or not they have earnings
You can respecify only veterans with earnings
       Altering Mean Income
Add & incws > 0 to universe to count only Vietnam-era
veterans who are earning more than $0
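
A small Python sketch of why adding incws > 0 changes the mean. The wage values are made up and unweighted; a real query works on the weighted sample.

```python
from statistics import mean

# Made-up wage values (incws) for records already inside the universe.
incws_values = [0, 0, 30000, 45000, 60000]

# Mean over everyone in the universe, including those with no earnings.
mean_all = mean(incws_values)

# Mean after the extra condition incws > 0 is added to the universe.
mean_earners = mean([v for v in incws_values if v > 0])

print(mean_all, mean_earners)
```

Dropping the zero values leaves the total unchanged but divides it by fewer people, so the mean rises.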
               Complex Table
Mean income is higher when the data are limited to wage-earning veterans
     Small Area Geography
• Data from the PUMS 5% file is available for
  states, metropolitan areas, and Public Use
  Microdata Areas (PUMAs) of 100,000

• You can identify a PUMA or group of PUMAs
  – Maps in American FactFinder
  – PDF maps on the Census Bureau web site
  – MABLE/Geocorr Search Engine
     Small Area Geography
This map shows Detroit as PUMAs 3701-3708
   PUMA Codes for Michigan

Ann Arbor           3200
Detroit             3701-3708
Flint               2200
Grand Rapids        1300
Lansing             1800

PUMA to Place

Place to PUMA
Codebook and PUMAs
The Explore Codebook shows PUMA5 as term for
  5% PUMA boundaries
Small Area Geography and
When creating data sets for PUMAs, be sure to
 include the correct state as the universe (e.g.
  Small Area Geography and
Puma5: 3701..3708 will list the data for each individual area
 Small Area Geography and
Search result for each individual PUMA
Small Area Geography for Ranges
To get the total for the area, list it in the universe as
  puma5 >3700 & puma5 <3709 & state=26
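
The difference between listing each PUMA and totaling the whole area can be sketched in Python. The records are made up and unweighted. The last record shows why state=26 belongs in the universe: the sketch assumes the same PUMA number can appear in another state.

```python
from collections import Counter

# Made-up records for illustration only.
records = [
    {"state": 26, "puma5": 3701},
    {"state": 26, "puma5": 3701},
    {"state": 26, "puma5": 3705},
    {"state": 26, "puma5": 3708},
    {"state": 26, "puma5": 3200},  # Ann Arbor, outside the Detroit range
    {"state": 39, "puma5": 3702},  # same PUMA number, different state
]

def in_detroit(rec):
    # puma5 >3700 & puma5 <3709 & state=26
    return rec["state"] == 26 and 3700 < rec["puma5"] < 3709

# PUMA as a row variable: one count per individual area.
per_puma = Counter(r["puma5"] for r in records if in_detroit(r))

# PUMA range in the universe only: a single total for the whole area.
area_total = sum(1 for r in records if in_detroit(r))

print(dict(per_puma), area_total)
```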
Small Area Geography for Ranges
To get a listing of single years of age between 65 and 85,
  list column as age: 65..85
           Calculating Totals
• To calculate the most spoken languages by 65-85 year
  olds as a group
• Click on Options/Total Options/Row
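
What a row total does can be sketched in Python: add up each row across the age columns, then rank the rows. The language labels and counts below are placeholders, not census results.

```python
# Made-up counts: rows are placeholder language labels,
# columns are single years of age.
table = {
    "lang_a": {65: 3, 66: 1, 67: 2},
    "lang_b": {65: 0, 66: 4, 67: 0},
    "lang_c": {65: 1, 66: 1, 67: 5},
}

# Row total = sum across all age columns for each language.
row_totals = {lang: sum(by_age.values()) for lang, by_age in table.items()}

# Rank languages from most to least spoken for the age group as a whole.
ranked = sorted(row_totals, key=row_totals.get, reverse=True)

print(row_totals, ranked)
```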
         Complex Result
Spanish and Polish are the two most popular
  languages spoken by seniors 65-85 in Detroit
   Contacts for Research
                      Initial Queries
Grace York, Documents Center, 203 Hatcher or 936-2378
JoAnn Dionne, Numeric and Spatial Data Services, 825
                    Complex Data Sets
Lisa Neidert, Population Studies Center, 426 Thompson, 763-2163
PDQ Staff, 310 Depot Street, Suite C, Ann Arbor
