Introduction to the cross-sectional government surveys

            Vanessa Higgins
          ESDS(Government)
     CCSR, University of Manchester

            Queen’s University, Belfast
                   7 Nov 2007
           Introductions


• Who are you and where are you from?

• Have you used any govt datasets? (which,
  what for?)

• What is your “consumption” area of
  interest?
             My mission

•   What data is available?
•   What is it like?
•   Choosing the right data
•   Considerations when using the data
•   Ideas for how you can use the data

• 5 minute exercise
    Why should you want to know
          about the data?
• Because the data are...
• Very cost effective: data free of charge to
  not for profit researchers
• Saves time: no need to conduct survey
• Access to high quality, well documented
  data
• Can provide nationally representative
  data - allows generalisation to population
• Allows historical and geographical
  comparisons to be made
• ESRC funded data support services
 What data am I talking about?
• UK is particularly rich in microdata which is
  available for secondary analysis
• Today’s focus: cross-sectional microdata from
  government surveys
  – ESDS Government Surveys (e.g. Labour Force
    Survey, Expenditure and Food Survey)


• Other major sources:
  –   Longitudinal data (e.g. LS, BHPS)
  –   International microdata (e.g. ESS)
  –   ESDS core function/UK Data Archive
  –   Aggregate data
            Which surveys?
•   General Household Survey/Continuous Household Survey
•   Labour Force Survey
•   Expenditure and Food Survey (previously the National Food
    Survey and Family Expenditure Survey)
•   Family Resources Survey
•   Time Use Survey
•   British Social Attitudes/Scottish Social Attitudes/Northern
    Ireland Life & Times/Young People’s Social Attitudes

•   ONS Omnibus Survey
•   Annual Population Survey
•   National Travel Survey (NI Travel Survey)
•   British Crime Survey/Scottish Crime Survey (NI Crime
    Survey)
•   Health Survey for England/Wales/Scotland (NI
    Health and Social Well Being Survey)
•   Survey of English Housing
What are ESDS Government data
            like?
•   ‘Nationally’ representative survey microdata
•   Continuous surveys – always up-to-date
•   Large sample sizes (GHS 20K, HSE 10k)
•   Cross-sectional (although the LFS has a 5-quarter
    panel element and the new GHS has a 4-yearly
    panel)
• Specialist topic surveys – more depth than
    the Census
• Face-to-face computer-assisted personal
    interviewing
• Identifying information is removed
     All of these microdata:

• Are individual-level information akin to the sort of data
  you would collect if you were conducting your
  own survey
• Need to be analysed in an appropriate software
  package (like SPSS or Stata)
• Are good quality: collected by a professional data
  collection organisation for policy purposes
   – Office for National Statistics
   – National Centre for Social Research

• Have good quality documentation & support
  services
Microdata
     Integrated Household Survey
    (Continuous Population Survey)
•   Will bring together
     – Labour Force Survey
     – Annual Population Survey
     – General Household Survey (L)
     – Expenditure and Food Survey
     – Omnibus Survey
     – English Housing Survey (EHS)

•   Benefits
     – Sample size c 265,000 households (over half a million
       adults) for core topics
         • Improve reporting between Censuses
         • Allow some data at regional and sub-regional level
     – Better sampling strategy

•   Expected to start in 2008 (phased implementation)

•   Materials on our website - IHS Consultation Meetings (2006 and
    2007)
             Choosing data

• Choosing data:

   – What data is available for my topic?
      • Which surveys cover my topic?
      • What other topics am I interested in?

   – Are the variables I need available?

   – Does it cover the population I’m interested in?
      • Sampling strategy (geography, respondents,
        sample size)
 What variables are available for
           my topic?

• To understand the variables you
  have available
  – View the documentation/user guide
  – A list of variables & codings should be
    available
  – Information on how derived variables
    were created should be available
  – Double check in the dataset!
What do the variables mean?
Unless...
• you can track your variable back
  to the question(s) asked on the
  questionnaire
• you know who the questions were
  asked of
• and you know what was done with the raw
  data to turn it into the final data...
You don’t understand the data
Routeing in the documentation: GHS

Variable Name : ECSTILO
Variable Label : Economic status (harmonised)
Topic : Employment
Population : Adults
Hhld/indiv.level : Individual
Range : 1 to 10
Missing values : -6, -8

1 'Working (incl Unpaid FW)'
2 'Gov sch with emp'
3 'Gov sch at coll'
4 'Unemployed (ILO)'
5 'Other Unemployed'
7 'Retired'
6 'Perm unable to work'
8 'Keeping house'
9 'Student'
10 'Other inactive'
-8 'NA, ECSTA not known'
-6 'Child/No int'.

Derived variables

DO IF SCHEDTYP = 3 OR AGE LT 16.
+   COMPUTE ECSTILO = -6.
ELSE.
+   DO IF DVILO3A = 1.
+       DO IF SCHEMEET = 1.
+           DO IF TRN = 1.
+               COMPUTE ECSTILO = 2.
+           ELSE IF TRN = 2.
+               COMPUTE ECSTILO = 3.
+           END IF.
+       ELSE.
+           COMPUTE ECSTILO = 1.
+       END IF.
+   ELSE IF DVILO3A = 2.
+       COMPUTE ECSTILO = 4.
+   ELSE IF DVILO3A = 3.
+       DO IF YINACT = 1.
+           COMPUTE ECSTILO = 9.
+       ELSE IF YINACT = 2.
+           COMPUTE ECSTILO = 8.
+       ELSE IF YINACT = 3.
+           COMPUTE ECSTILO = 10.
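To make the routeing easier to follow, the visible part of the derived-variable syntax can be restated as a small Python function. This is only a sketch: the function name is hypothetical, it covers only the branches shown in the extract (the syntax on the slide is cut off after YINACT = 3), and it does not model how SPSS treats missing values in DO IF conditions.

```python
from typing import Optional


def derive_ecstilo(schedtyp: int, age: int, dvilo3a: int,
                   schemeet: int, trn: int, yinact: int) -> Optional[int]:
    """Mirror the visible branches of the ECSTILO syntax.

    Returns None where the extract assigns no value or where the
    syntax is not shown (an assumption of this sketch).
    """
    # Children and non-interviews are coded -6 ('Child/No int').
    if schedtyp == 3 or age < 16:
        return -6
    if dvilo3a == 1:
        if schemeet == 1:
            if trn == 1:
                return 2   # 'Gov sch with emp'
            if trn == 2:
                return 3   # 'Gov sch at coll'
            return None    # no value assigned in the visible syntax
        return 1           # 'Working (incl Unpaid FW)'
    if dvilo3a == 2:
        return 4           # 'Unemployed (ILO)'
    if dvilo3a == 3:
        if yinact == 1:
            return 9       # 'Student'
        if yinact == 2:
            return 8       # 'Keeping house'
        if yinact == 3:
            return 10      # 'Other inactive'
        return None        # further YINACT branches are not shown
    return None


# A 30-year-old adult respondent with DVILO3A = 3 and YINACT = 2.
print(derive_ecstilo(schedtyp=1, age=30, dvilo3a=3,
                     schemeet=0, trn=0, yinact=2))
```

Tracing a few hypothetical respondents through a function like this is one way to check that you understand who ends up in each category before using the variable.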
    Population base: type of
             survey
• Most large scale surveys are household
  surveys: they interview one or more people in
  private households
  – This will exclude people in institutions
  – Has knock-on effects for particular topics:
    health, age etc.
• Surveys tend to gather limited
  information about children
  – May only relate to their existence, age and
    relationships to other household members
  – There may also be other age restrictions on all
    or part of the survey
The population base: nation
    • Most large scale surveys seek to be
      nationally representative but what is a
      nation?
       –   Labour Force Survey = UK
       –   General Household Survey = GB
       –   Health Survey for England = England
       –   Not always apparent from the name
       –   Increase of country-specific surveys
           following devolution
             • Over 80% of the population live in
               England (9% Scotland, 5% Wales, 3%
               NI) so surveys designed for UK wide
               analyses will not generally have large
               enough samples to analyse separate
               countries
  The sampling strategy will affect
           your results
• Few data sources approximate simple
  random sampling
• Stratification increases the precision of
  estimates – the Labour Force Survey is
  stratified
• Clustering reduces the precision of
  estimates – e.g. the General Household
  Survey
• Many major surveys use stratification and
  clustering
• Guidance should be available in the
  documentation
• Practical Exemplars in Analysing
  Surveys (PEAS)
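As a rough illustration of why clustering reduces precision, the sketch below uses Kish's approximate design effect, deff = 1 + (m - 1) * rho, where m is the average cluster size and rho is the intra-cluster correlation. This formula is not given in the slides; it assumes roughly equal cluster sizes, and all numbers in the example are hypothetical. For real analyses, follow the survey documentation and use survey-aware software.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Kish's approximate design effect for a clustered sample."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n: int, deff: float) -> float:
    """Size of a simple random sample with the same precision."""
    return n / deff


def adjusted_standard_error(srs_se: float, deff: float) -> float:
    """Inflate a simple-random-sampling standard error by sqrt(deff)."""
    return srs_se * math.sqrt(deff)


# Hypothetical survey: 10,000 respondents, 20 per cluster, rho = 0.05.
deff = design_effect(cluster_size=20, icc=0.05)
print(f"design effect: {deff:.2f}")
print(f"effective sample size: {effective_sample_size(10_000, deff):.0f}")
print(f"standard error of 1.0% becomes: {adjusted_standard_error(1.0, deff):.2f}%")
```

Stratification works in the opposite direction, so the net design effect of a survey that uses both depends on the variable being analysed.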
  Disproportionate sampling
• The British Social Attitudes survey
  takes only 1 person per household
  – If left uncorrected, the chance of selection in
    the sample is related to the size of one’s
    household, so probabilities of selection are
    unequal
• Over-sampling in order to obtain
  satisfactory sample sizes for minority
  groups (often referred to as ‘boosts’)
  – Health Survey for England has done this
    with ethnic minorities
 Weighting can be used to prevent bias
    from disproportionate sampling

Number in household including R? (Q37)

                     Weighted                Unweighted
Number          Frequency   % of all    Frequency   % of all
1                   759.2       17.1         1326       29.9
2                  1608.4       36.3         1522       34.3
3                   838.3       18.9          671       15.1
4                   774.6       17.5          596       13.4
5                   311.3        7.0          232        5.2
6                    91.4        2.1           57        1.3
7                    31.4        0.7           16        0.4
8                    13.8        0.3            9        0.2
9                     1.1        0.0            1        0.0
10                    1.7        0.0            1        0.0
12                    1.1        0.0            1        0.0
Total              4432.1      100.0         4432      100.0

Dataset: British Social Attitudes Survey, 2003
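The pattern in the table above (single-person households shrink from about 30% to about 17% once weighted) can be reproduced in miniature. The sketch below assumes a simple selection weight proportional to the number of eligible adults in the household, rescaled so the weights sum to the sample size. The function names and the six-respondent sample are hypothetical, and the actual BSA weights may be constructed differently; check the survey documentation.

```python
from collections import Counter


def selection_weights(adults_per_household):
    """Weight each respondent by household adults, rescaled to sum to n."""
    n = len(adults_per_household)
    total = sum(adults_per_household)
    return [adults * n / total for adults in adults_per_household]


def percentage_distribution(values, weights=None):
    """Percentage of the (weighted) sample falling in each category."""
    if weights is None:
        weights = [1.0] * len(values)
    totals = Counter()
    for value, weight in zip(values, weights):
        totals[value] += weight
    grand_total = sum(totals.values())
    return {value: 100.0 * t / grand_total for value, t in sorted(totals.items())}


# Hypothetical sample: one respondent per household, with the number of
# eligible adults in that respondent's household.
adults = [1, 1, 1, 2, 2, 3]
weights = selection_weights(adults)

print("unweighted:", percentage_distribution(adults))
print("weighted:  ", percentage_distribution(adults, weights))
```

Because one adult is interviewed per household, people in larger households are under-represented in the raw sample; the weight restores their share.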
Practical research uses of the
             data

• Looking at change over time

• Look at sub-populations

• Using the flexibility of the data to
  look at alternative definitions

• Looking within households
Using successive cross-sectional
        data over time

Pros…
• Reasonable amount of comparability
• Can pool years/quarters to look at periods
• Data is representative at each time point
• Good at looking at impacts on groups
  (not individuals)

Cons…
• Limits to continuity in the data (e.g. ethnic)
• Cannot establish individual change
Change over time
 (source: GHS)
               Secondary analysis:
            change for subpopulations

[Figure: Smoking and social class, men. Percentage (%) by year,
1994 to 2001; series: all, sc I&II, sc IV&V.
Source: HSE; Marmot, M (2003)]
 Looking at small populations

• Many surveys have 10,000+
  respondents
  – Permits minority groups to be
    represented
  – For rare subpopulations the sample size may
    still be too small; consider combining
    years if appropriate
  Combining datasets to increase
          sample size
Survey data is subject to sampling error!
Example: Pregnancy and Employment

• Using 1998-99 General Household Survey data
  alone there are only 168 pregnant women aged
  16-49
• 95% confidence interval for the % of pregnant women
  who are economically inactive: 34.2 – 49.1%
• Combined 3 years’ data to obtain sample of 465
  pregnant women
• Confidence interval using 3 years’ data: 34.9 –
  43.9%
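The narrowing of the interval can be checked with the usual normal-approximation confidence interval for a proportion, p ± 1.96 * sqrt(p(1 - p)/n). The slide does not report the point estimates, so the sketch below assumes they are roughly the midpoints of the quoted intervals (41.65% and 39.4%). It also ignores design effects and weighting, so it only approximately reproduces the published figures.

```python
import math


def proportion_ci(p: float, n: int, z: float = 1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    se = math.sqrt(p * (1.0 - p) / n)
    return p - z * se, p + z * se


# Point estimates are assumptions (interval midpoints), not source figures.
single_year = proportion_ci(0.4165, 168)   # 1998-99 GHS alone
pooled = proportion_ci(0.394, 465)         # three years combined

for label, (low, high) in [("1 year", single_year), ("3 years", pooled)]:
    print(f"{label}: {100 * low:.1f}% to {100 * high:.1f}% "
          f"(width {100 * (high - low):.1f} points)")
```

Roughly tripling the sample size cuts the interval width by a factor of about sqrt(3), which is why pooling years helps with small subpopulations.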
Using the flexibility of the data to
  look at alternative definitions
What are ‘hours worked’?
• Is it just paid work? Or unpaid as well?
• Hours usually worked, or actually
  worked last week?
• In main job, or in any job?
• What about students?
• Overtime – paid?
• Overtime – unpaid?
• Lunch hours?
• Do non-workers work zero hours or
  should they be excluded?
Hierarchical data: conceptually

Household 1: North West, Social rented
  – Person 1: HoH, Female, 28, GCSE, P/T Work, No LTILL
  – Person 2: Son of HoH, Male, 12, N/A, N/A, No LTILL

Household 2: Wales, Owner occupier
  – Person 1: HoH, Male, 33, Degree, F/T Employee, No LTILL
  – Person 2: Spouse of HoH, Female, 31, Degree, P/T Employee, No LTILL
  – Person 3: Parent of HoH, Female, 72, No quals, Econ Inactive, LTILL
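The two-level structure above can be sketched in code as a household table and a person table linked by a household identifier. The field names below (hh_id, ltill and so on) are hypothetical and chosen only for illustration; real datasets use their own variable names, documented in the user guide.

```python
# Household-level records, keyed by a hypothetical household identifier.
households = {
    1: {"region": "North West", "tenure": "Social rented"},
    2: {"region": "Wales", "tenure": "Owner occupier"},
}

# Person-level records; hh_id links each person to a household.
persons = [
    {"hh_id": 1, "person": 1, "relationship": "HoH", "sex": "Female", "age": 28, "ltill": False},
    {"hh_id": 1, "person": 2, "relationship": "Son of HoH", "sex": "Male", "age": 12, "ltill": False},
    {"hh_id": 2, "person": 1, "relationship": "HoH", "sex": "Male", "age": 33, "ltill": False},
    {"hh_id": 2, "person": 2, "relationship": "Spouse of HoH", "sex": "Female", "age": 31, "ltill": False},
    {"hh_id": 2, "person": 3, "relationship": "Parent of HoH", "sex": "Female", "age": 72, "ltill": True},
]


def attach_household(persons, households):
    """Copy household-level fields onto each person record."""
    return [{**p, **households[p["hh_id"]]} for p in persons]


def household_summary(persons):
    """Aggregate person-level data up to household level."""
    summary = {}
    for p in persons:
        s = summary.setdefault(p["hh_id"], {"size": 0, "any_ltill": False})
        s["size"] += 1
        s["any_ltill"] = s["any_ltill"] or p["ltill"]
    return summary


flat = attach_household(persons, households)
print(flat[4]["region"], flat[4]["age"])
print(household_summary(persons))
```

Moving information down (household to person) and up (person to household) in this way is what makes it possible to look within households.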
    In addition to straightforward
         secondary analysis:
•   Context to your own primary research
     – Your research could be quantitative or qualitative
     – To assess the national context of an area study
     – To assess whether your sample is typical of national data
     – To assess the scale of behaviours – how big is the behaviour you are
        looking at?

•   Teaching
     – Methods courses
        • Using the data in a hands-on manner
        • Using substantive exemplars to demonstrate a methodological
           point
        • Using the surveys as methodological exemplars

     – Substantive courses
        • Making your point using data
        • Integrating methods into substantive courses

     – Teaching datasets
         • General Household Survey
         • Labour Force Survey
         • British Crime Survey
         • Health Survey for England
               Exercise
Suggest datasets which would fulfil
   the following criteria, for a range
   of employment projects:
1. A large up-to-date UK dataset with
   extensive questions on employment
   and training
2. Any 1960s employment microdata
3. A dataset with extensive questions on
   income from sources other than just
   earnings
4. A dataset which could be used to look
   at attitudes to work

								