					     Canadian Community Health Survey
A new program for collecting health information
         Interuniversity Research Data Seminar
             University of British Columbia

                     Yves Béland

          Household Survey Methods Division
                  Statistics Canada
                    February 19, 2002
           Presentation Outline
Health Information Roadmap
   –Origin of the CCHS
   –Objectives / Content
   –CCHS two-year plan
CCHS Cycle 1.1 - Sample Design
   –Allocation, frame
   –Selection - Oversampling
   –Data Collection
   –Imputation
   –Weighting, sampling error
   –Bootstrap Variance Estimation
   –Data Quality
   –Data Dissemination
CCHS Cycle 1.2 - Overview
Future Cycles of CCHS
             Health Information Roadmap
Four-year action plan to strengthen Canada’s health
information system
Earmarks funds for specific priorities/activities based on
national vision and provincial/regional consultations
Partners: Health Canada, Canadian Institute for Health
Information (CIHI) and Statistics Canada

Key elements:
   –fill critical data gaps in health services and address population health
   data gaps at a sub-provincial level
   –foster common data and technical standards
   –develop indicators and conduct special studies
    Canadian Community Health Survey
     Results of the Consultation Process

Assess health measure variations at many levels of geography
Collect data on issues unique to a health region or province
Respond quickly to emerging issues
Explore certain key health issues in-depth
Analyse the effects of shocks including policy changes
       Canadian Community Health Survey
                Two-year Plan
Cycle 1.1 - Health region-level survey
   –Produce reliable estimates for sub-provincial areas
   –Continuous monthly collection : Sept. 2000 - Nov. 2001
   –Sample size : 133,300 respondents
   –Questionnaire content
       •health determinants
       •health status
       •utilization of health services
       •socio-demographic / socio-economic characteristics
Cycle 1.2 - Provincial-level survey
   –Produce reliable provincial estimates from a sample of 30,000 respondents
   –Monthly collection : May 2002 - Dec. 2002
   –In-depth focus content: 90-100 minute interviews on mental health and
   well-being
             CCHS and NPHS
    A More Robust Health Survey Program

CCHS
 – cross-sectional
 – sample of 160,000 respondents over two years
 – national, provincial and regional level estimates
 – customized questionnaires at regional level
 – built-in flexibility for buy-in sample and/or content
 – continuous development of in-depth health content

NPHS - Household
 – "goes longitudinal" only, starting in wave 4
 – sample of 20,000 persons
 – national and provincial level estimates

NPHS - Health Care Institutions
 – longitudinal and cross-sectional
 – sample of 2,500
 – national level estimates
                   CCHS - Cycle 1.1
               Health Region-level survey

Produce timely cross-sectional estimates for 136 health regions

Target population
   –individuals aged 12 or over living in private occupied dwellings
   –Exclusions: those living on Indian Reserves and Crown Lands, residents
   of institutions, full-time members of the Canadian Armed Forces and
   residents of some remote areas


CCHS 1.1 covers ~98% of the Canadian population
              CCHS - Questionnaire content

45-minute interview questionnaire

    –30 minutes of modules common to all health regions

    –10 minutes of optional items selected by health regions from a predefined
    list of modules

    –5 minutes of standard socio-economic items


 27 different versions of the questionnaire


The complete questionnaire can be found at www.statcan.ca/health_surveys
       CCHS - Sample Allocation to Provinces
Prov     Pop          # of            1st Step           2nd Step            Total
         Size         HRs             500/HR             X-prop              Sample

NFLD     551K         6               *2,780             1,230               4,010
PEI      135K         2               1,000              1,000               2,000
NS       909K         6               3,000              2,040               5,040
NB       738K         7               3,500              1,650               5,150
QUE      7,139K       16              8,000              16,280              24,280
ONT      10,714K      37              18,500             23,760              42,260
MAN      1,114K       11              5,500              2,500               8,000
SASK     990K         11              *5,400             2,320               7,720
ALB      2,697K       17              *8,150             6,050               14,200
BC       3,725K       20              10,000             8,090               18,090
CAN      29,000K      133             65,830             64,920              130,750

* The sampling fraction in some small HRs was capped at 1 in 20 households
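
The first allocation step can be checked with a few lines of Python. This is only a sketch: it assumes, as the column header suggests, that the first step assigns a flat 500 units to every health region, and it reuses the figures from the table above. The names `allocation` and `PER_HR` are illustrative and not part of any CCHS system.

```python
# Figures copied from the allocation table: (number of HRs, 1st step, 2nd step)
allocation = {
    "NFLD": (6, 2780, 1230),
    "PEI": (2, 1000, 1000),
    "NS": (6, 3000, 2040),
    "NB": (7, 3500, 1650),
    "QUE": (16, 8000, 16280),
    "ONT": (37, 18500, 23760),
    "MAN": (11, 5500, 2500),
    "SASK": (11, 5400, 2320),
    "ALB": (17, 8150, 6050),
    "BC": (20, 10000, 8090),
}

PER_HR = 500  # assumed flat first-step allocation per health region

# Provinces whose first-step sample falls below 500 x number of HRs
capped = [p for p, (hrs, step1, _) in allocation.items() if step1 < PER_HR * hrs]

total_step1 = sum(step1 for _, step1, _ in allocation.values())
total_step2 = sum(step2 for _, _, step2 in allocation.values())

print(capped)
print(total_step1, total_step2, total_step1 + total_step2)
```

The provinces that come out as capped are the ones marked with * in the table, and the totals agree with the CAN row.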
 CCHS - Sample Allocation to Health Regions

          Pop. Size           # of   Mean
          Range               HRs    Sample Size

Small     less than 75,000    41       525
Medium    75,000 - 240,000    60       900
Large     240,000 - 640,000   25       1,500
X-Large   640,000 and more    7        2,500
CCHS - Sample Allocation to Territories

               Population      Sample

 Yukon           25,000          850
 NWT             36,000          900
 Nunavut         22,000          800
                 CCHS - Sample Frame

CCHS sample selected from three frames:
    •Area frame (Labour Force Survey structure)
    •RDD frame of telephone numbers (Random Digit Dialling)
    •List frame of telephone numbers

Three frames are needed for CCHS for the following reasons:

   1. To yield the desired sample sizes in all health regions
   2. Have a telephone data collection structure in place to quickly address
   provincial/regional requests for buy-in sample and/or content at any point in
   time
   3. Optimize collection costs
         Area frame - Sampling of households


     83% of CCHS sampled households
     Stratified multistage sample design


#1: Each health region is divided into strata
#2: Clusters selected within strata (PPS sampling) (1st stage)
#3: Dwellings selected within clusters (2nd stage)
      RDD frame of telephone numbers
         Sampling of households
Elimination of non-working banks method
  – 7% of CCHS sampled households
  – Telephone bank: area code + first 5 digits of a 7-digit phone #

  1- Keep the banks with at least one valid phone #
  2- Group the banks to encompass as closely as possible the
    health region areas - RDD strata
  3- Within each RDD stratum, first select one bank at random
    and then generate at random one number between 00 and 99
  4- Repeat the process until the required number of telephone
    numbers within the RDD stratum is reached
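
A minimal sketch of steps 3 and 4 for one RDD stratum is given below. It assumes that the non-working banks have already been eliminated (step 1) and that duplicate numbers are discarded; neither detail is spelled out on the slide. The function name `draw_rdd_sample` and the bank values are hypothetical.

```python
import random

def draw_rdd_sample(working_banks, n_numbers, seed=None):
    """Draw n_numbers distinct phone numbers from one RDD stratum.

    working_banks: 8-digit strings (area code + first 5 digits of the
    7-digit number) that contain at least one valid phone number.
    """
    banks = sorted(set(working_banks))
    if n_numbers > 100 * len(banks):
        raise ValueError("not enough numbers in the stratum")
    rng = random.Random(seed)
    sample = set()
    while len(sample) < n_numbers:          # step 4: repeat until enough
        bank = rng.choice(banks)            # step 3: one bank at random
        suffix = rng.randint(0, 99)         # step 3: number between 00 and 99
        sample.add(f"{bank}{suffix:02d}")   # duplicates are dropped (assumption)
    return sorted(sample)

# Hypothetical banks for one stratum
banks = ["60455501", "60455502", "25055590"]
numbers = draw_rdd_sample(banks, 20, seed=1)
print(len(numbers))
```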
       List frame of telephone numbers
            Sampling of households
Simple random sample of telephone numbers
  – 10% of CCHS sampled households
  – Telephone companies’ billing address files and Telephone
    Infobase (repository of phone directories)

  1- Create a list of phone numbers
  2- Stratify the phone numbers by health region using the
    residential postal codes
  3- Select phone numbers at random within a health region
  4- Repeat the process until the required number of telephone
    numbers is reached
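
The list-frame steps amount to a stratified simple random sample. The sketch below assumes that each phone number has already been assigned to a health region through its postal code; the function name, the region codes and the phone numbers are made up for illustration.

```python
import random
from collections import defaultdict

def sample_list_frame(frame, n_per_region, seed=None):
    """Stratified SRS: frame is a list of (phone_number, health_region)."""
    rng = random.Random(seed)
    by_region = defaultdict(list)
    for phone, region in frame:             # step 2: stratify by health region
        by_region[region].append(phone)
    selected = {}
    for region, phones in by_region.items():
        n = min(n_per_region.get(region, 0), len(phones))
        selected[region] = rng.sample(phones, n)   # steps 3-4: SRS without replacement
    return selected

# Hypothetical frame: 50 numbers in region "HR1", 30 in region "HR2"
frame = [(f"604555{i:04d}", "HR1") for i in range(50)]
frame += [(f"250555{i:04d}", "HR2") for i in range(30)]
picked = sample_list_frame(frame, {"HR1": 10, "HR2": 5}, seed=7)
print({region: len(phones) for region, phones in picked.items()})
```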
            CCHS - Sampling of persons

Area frame
    SRS of one person aged 12 or older (82% of households)
    SRS of two persons aged 12 or older (18% of households)

RDD / List frames
    SRS of one person aged 12 or older
        CCHS - Sampling of persons

Age                1996               LFS                 * CCHS
group              Census             sample              simulated
                                      (all persons)       sample
                                                          ( only 1 person)


12-19              13.2               13.7                 8.5
20-29              16.4               14.4                 14.3
30-44              30.8               28.7                 29.1
45-64              25.8               28.0                 27.9
65 +               13.8               15.2                 20.2

* averaged distribution over 100 repetitions using the May 99 LFS sample
  CCHS - Representativity of sub-populations

To address users’ needs, two sub-population groups
needed larger effective sample sizes:

Youths (12-19 years old)
   –Decision > Oversample youths by selecting a second person (12-19) in
   some households based on their composition


Seniors (65 years old and over)
   –Decision > Do not oversample - let the general sample selection process
   address the issue by itself
Sampling strategy based on household composition
                                   Number of persons aged 20 or over

       Number              0         1        2        3         4         5+
       of 12-19

        0                  -         A        A        A         A         B

        1                  A         A        C        C         C         B

        2                  A         C        C        C         C         C

        3+                 A         C        C        C         C         C


   A: SRS of one person aged 12+
   B: SRS of two persons aged 12+
   C: SRS of one person in the age group 12-19 and SRS of one person 20+
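
The grid above can be written as a small lookup function. This is a sketch that simply encodes the table; the function name `selection_strategy` is illustrative.

```python
def selection_strategy(n_youth, n_adult):
    """Return 'A', 'B' or 'C' from the household composition grid.

    n_youth: number of persons aged 12-19 in the household
    n_adult: number of persons aged 20 or over
    Returns None for a household with no eligible person.
    """
    if n_youth == 0 and n_adult == 0:
        return None
    if n_adult == 0:
        return "A"                       # youths only
    if n_youth == 0:
        return "B" if n_adult >= 5 else "A"
    if n_youth == 1:
        if n_adult == 1:
            return "A"
        return "B" if n_adult >= 5 else "C"
    return "C"                           # 2 or more youths with at least one adult

print(selection_strategy(1, 2))  # one youth, two adults
```

With one youth and two adults the function returns C: one person is selected among the 12-19 year olds and one among those aged 20 or over.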
CCHS - Sample Distribution after Oversampling


Age                1996               * CCHS                       * CCHS
group              Census             simulated                    simulated
                                      sample                       sample
                                      ( only 1 person)             ( some 2 persons)

12-19              13.2                 8.5                            14.9
20-29              16.4                 14.3                           13.1
30-44              30.8                 29.1                           28.1
45-64              25.8                 27.9                           26.3
65 +               13.8                 20.2                           17.6

* averaged distribution over 100 repetitions using the May 99 LFS sample
        CCHS - Initial data collection plan

12 monthly samples
12 collection months + 1           (09 / 2000 - 08 / 2001) + 09 / 2001


Area frame                          RDD / List frames

   CAPI                               CATI
   STC field interviewers             STC call centres
   targeted response rate: 90%        targeted response rate: 85%
   anticipated vacancy rate: 13%      telephone hit rate: 15-60%
 CCHS data collection - Observed situation


Field interviewers
  – workload exceeded field staff capacity


Call centres
  – new collection infrastructure
  – unequal allocation of work among call centres
        CCHS - Final response rates
            Field    Call centres   Total
NFLD         86.6       89.3        86.8
PEI          87.7       82.6        84.7
NS           88.8       89.3        88.8
NB           88.4       92.4        88.5
QUE          85.7       84.8        85.6
ONT          82.8       79.5        82.0
MAN          90.0       85.0        89.5
SASK         87.0       85.4        86.8
ALB          85.2       84.9        85.1
BC           83.9       86.7        84.7
YUK          79.3       95.6        82.7
NWT          89.6       85.4        89.2
NUN *        66.3       34.6        62.5
CAN          85.1       83.1        84.7
             CCHS - Proxy interviews


Higher number of proxy interviews than expected
   – ~ 6% instead of 2-3%


Major consequence: one third of the questionnaire is missing,
                   which could be problematic for small
                   health regions

Solution : Imputation
                 CCHS - Imputation

 3-step strategy
   – common modules / mental health related optional modules
     / other optional modules


 more than 2,000 imputation classes (region, age,
 sex, questionnaire type, skip patterns, etc…)

 hot-deck imputation using nearest neighbour
 approach according to 12-16 key characteristics
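
A nearest-neighbour hot-deck can be sketched as follows. The distance used here (the number of key characteristics on which donor and recipient differ) and the field names are assumptions made for illustration; the slide does not state the actual distance function or the list of key characteristics.

```python
def nearest_donor(recipient, donors, keys):
    """Closest donor in the recipient's imputation class, or None."""
    same_class = [d for d in donors if d["imp_class"] == recipient["imp_class"]]
    if not same_class:
        return None
    # Assumed distance: count of key characteristics that differ
    return min(same_class, key=lambda d: sum(d[k] != recipient[k] for k in keys))

def impute(recipient, donors, keys, missing_fields):
    """Copy the missing fields from the nearest donor (hot-deck)."""
    donor = nearest_donor(recipient, donors, keys)
    filled = dict(recipient)
    if donor is not None:
        for field in missing_fields:
            filled[field] = donor[field]
    return filled

# Hypothetical records
donors = [
    {"imp_class": 1, "age_group": "20-29", "sex": "F", "smoker": "daily"},
    {"imp_class": 1, "age_group": "65+", "sex": "M", "smoker": "never"},
    {"imp_class": 2, "age_group": "20-29", "sex": "F", "smoker": "former"},
]
recipient = {"imp_class": 1, "age_group": "20-29", "sex": "F", "smoker": None}
filled = impute(recipient, donors, ["age_group", "sex"], ["smoker"])
print(filled["smoker"])
```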
       CCHS - Weighting and Estimation
Three separate weighting systems:
   –Area frame design
   –RDD frame design
   –List frame design
Several adjustments
   – non-response (household and person)
   – seasonal factor
   – etc...
Integration of the area and telephone weighting systems based on design effects (Deffs)
Calibration using a one-dimensional poststratification
adjustment of ten age/sex poststrata within each health region

Variance estimation : bootstrap re-sampling approach
   –set of 500 bootstrap weights for each individual
                 CCHS Weighting Strategy

          Area frame                      Telephone frame

 Initial weight (dwelling level)    Initial weight (dwelling level)
                 |                                  |
  Remove out-of-scope units          Remove out-of-scope units
             |                                  |
   Household nonresponse              Household nonresponse
              |                                  |
# of people in hhld (person wgt)           No phone lines
                |                                |
  Person level nonresponse         # of people in hhld (person wgt)
               |                                   |
       Final Area weight             Person level nonresponse
                                                  |
                                        Multiple phone lines
                                                  |
                                       Final Telephone weight
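
The area-frame chain can be read as a product of adjustment factors. The sketch below uses the textbook form of each adjustment (a weighted ratio for nonresponse, and eligible persons over selected persons for the person weight); these forms are assumptions, since the slide only names the steps. All function names are illustrative.

```python
def nonresponse_factor(weights, responded):
    """Ratio of the weighted total of all in-scope units to that of respondents."""
    total = sum(weights)
    respondents = sum(w for w, r in zip(weights, responded) if r)
    return total / respondents

def area_person_weight(dwelling_wgt, hh_nr_factor, n_eligible, n_selected,
                       person_nr_factor):
    """Initial weight x household nonresponse x person factor x person nonresponse."""
    return dwelling_wgt * hh_nr_factor * (n_eligible / n_selected) * person_nr_factor

# Hypothetical adjustment class: 4 in-scope households, 3 of them respond
hh_factor = nonresponse_factor([10, 10, 10, 10], [True, True, True, False])
final = area_person_weight(10, hh_factor, n_eligible=3, n_selected=1,
                           person_nr_factor=1.0)
print(round(hh_factor, 3), round(final, 1))
```

The telephone chain would add the factors shown in the diagram for households without a phone line and for households with multiple lines.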
              Weighting & Estimation

Final Area weight                                           Final Telephone weight

                                Integration
                                     |
                             Seasonal effect
                                   |
                           Post Stratification
                    (by health region, 10 age-sex groups)
                                       |
                       Final CCHS master weight
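
The post-stratification step can be sketched as a ratio adjustment within each health region by age-sex group. The record layout, the field names and the population counts below are hypothetical.

```python
from collections import defaultdict

def poststratify(records, pop_counts):
    """Scale weights so that each poststratum sums to its population count.

    records: dicts with keys 'hr', 'poststratum' and 'wgt'
    pop_counts: {(hr, poststratum): known population count}
    """
    totals = defaultdict(float)
    for r in records:
        totals[(r["hr"], r["poststratum"])] += r["wgt"]
    adjusted = []
    for r in records:
        key = (r["hr"], r["poststratum"])
        adjusted.append({**r, "wgt": r["wgt"] * pop_counts[key] / totals[key]})
    return adjusted

# Hypothetical respondents in one health region
records = [
    {"hr": "HR1", "poststratum": "F 12-19", "wgt": 100.0},
    {"hr": "HR1", "poststratum": "F 12-19", "wgt": 300.0},
    {"hr": "HR1", "poststratum": "M 12-19", "wgt": 50.0},
]
pop_counts = {("HR1", "F 12-19"): 800, ("HR1", "M 12-19"): 75}
final = poststratify(records, pop_counts)
print([round(r["wgt"], 1) for r in final])
```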
               CCHS - Special Weights

For various reasons, many other weights are produced

   – Quarter 4 special weight
   – PEI special weight

   – Share weights (master, Q4 and PEI special)
   – Link weights (master, Q4 and PEI special)
                      Sampling Error

Difference in estimates obtained from a sample as
 compared to a census
The extent of this error depends on four factors:
   – sample size
   – variability of the characteristic of interest
   – sample design
   – estimation method
Generally, the sampling error decreases as the size of the
 sample increases
                    Sampling Error

Measure of precision, reliability of the estimates
 – Variance (standard deviation)
 – Coefficient of variation
    • Standard deviation of estimate x 100% / estimate itself
    • CV allows comparison of precision of estimates with
      different scales
 – Example:
    • 24% of population are daily smokers, std dev. = 0.003
    • CV=0.003/0.24 x 100%=1.25%
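
The same calculation in Python, as a minimal sketch (the function name `cv_percent` is illustrative):

```python
def cv_percent(estimate, std_dev):
    """Coefficient of variation: standard deviation as a percentage of the estimate."""
    return std_dev / estimate * 100

cv = cv_percent(0.24, 0.003)   # daily smokers example from the slide
print(round(cv, 2))
```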
           Sampling Variability Guidelines

Type of estimate     CV        Guidelines

Acceptable         0.0-16.5    General unrestricted release

Marginal           16.6-33.3   General unrestricted release but with
                               warning cautioning users of the high
                               sampling variability.
                               Should be identified by letter M.

Unacceptable       > 33.3      No release.
                               Should be flagged with letter U.
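
The guidelines translate into a simple classification rule. The sketch below assumes that the CV is rounded to one decimal before it is compared with the thresholds, since the table leaves a gap between 16.5 and 16.6; the function name and the return format are illustrative.

```python
def release_category(cv_percent):
    """Return (type of estimate, flag) according to the CV guidelines."""
    cv = round(cv_percent, 1)        # assumption: thresholds apply to the rounded CV
    if cv <= 16.5:
        return ("Acceptable", "")
    if cv <= 33.3:
        return ("Marginal", "M")
    return ("Unacceptable", "U")

for value in (1.25, 20.0, 40.0):
    print(value, release_category(value))
```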
                 Sampling Error

Measuring sampling error for complex sample designs:
 – Simple formulas not available
 – Most software packages do not incorporate design
   effect (and weights adjustments) appropriately for
   calculations
 – Solution for CCHS: the Bootstrap method
                   Bootstrap method

Principle:
  – You want to know how precise your estimate of the number of
    smokers in Canada is
  – You could draw 500 totally new CCHS samples and compare the
    500 estimates you would get from these samples. The variance
    of these 500 estimates would indicate the precision.
  – Problem: drawing 500 new samples is $$$
  – Solution: Use your sample as a population, and take many
    smaller subsamples from it.
                  Bootstrap method
How CCHS Bootstrap weights are created
 (the secret is now revealed!!!)
Starting point: full data file (example presented for a given stratum)
  1- Select n-1 clusters among n within each stratum (with replacement)
  2- Adjust the survey weight (Wgt) for the fact that we picked n-1
    among n (factor of n / (n-1) = 1.33)
  3- The adjusted weights are the *BOOTSTRAP WEIGHTS*
  4- Repeat the process 500 times (*BOOTSTRAP REPLICATES*)
  5- Using the bootstrap weights: estimate the number of smokers

# of times each cluster is selected

Cluster     B1     B2     . . .     B500

1           1      0                3
2           1      1                0
3           0      0                0
4           1      2                0

Bootstrap weights (Wgt x # of times selected x 1.33, rounded)

ID     Wgt     Cluster     Smoke     B1     B2     . . .     B500

A      10      1           X         13     0                40
B      10      1           X         13     0                40
C      10      1                     13     0                40
D      10      2                     13     13               0
E      10      2                     13     13               0
F      10      2                     13     13               0
G      10      3           X         0      0                0
H      10      3                     0      0                0
I      10      4                     13     27               0
J      10      4           X         13     27               0

Smokers        40                    39     27     . . .     80

T = 40 (estimate from the full sample)
Var = \frac{1}{499} \sum_{i=1}^{500} (B_i - \bar{B})^2
where B_i is the estimate from replicate i and \bar{B} is the mean of the 500 B_i
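
The toy data file above (10 records, 4 clusters, weight 10 each) can be replayed in Python. This is a sketch of the simplified example only: it treats the four clusters as a single stratum, which the factor of 1.33 = 4/3 implies, and it skips the later weight adjustments and post-stratification. The random seed and all names are arbitrary.

```python
import random
from statistics import mean

# Toy data file: (ID, weight, cluster, smoker)
records = [
    ("A", 10, 1, True), ("B", 10, 1, True), ("C", 10, 1, False),
    ("D", 10, 2, False), ("E", 10, 2, False), ("F", 10, 2, False),
    ("G", 10, 3, True), ("H", 10, 3, False),
    ("I", 10, 4, False), ("J", 10, 4, True),
]
CLUSTERS = sorted({cluster for _, _, cluster, _ in records})
N = len(CLUSTERS)  # n = 4 clusters in the stratum

def bootstrap_weights(picks):
    """Weights for one replicate, given the n-1 clusters that were picked."""
    factor = N / (N - 1)  # adjustment for picking n-1 among n
    return [wgt * picks.count(cluster) * factor for _, wgt, cluster, _ in records]

def smokers_total(weights):
    """Weighted number of smokers."""
    return sum(w for w, rec in zip(weights, records) if rec[3])

full_estimate = smokers_total([wgt for _, wgt, _, _ in records])  # T

rng = random.Random(2002)
replicates = []
for _ in range(500):
    picks = [rng.choice(CLUSTERS) for _ in range(N - 1)]  # with replacement
    replicates.append(smokers_total(bootstrap_weights(picks)))

b_bar = mean(replicates)
variance = sum((b - b_bar) ** 2 for b in replicates) / (len(replicates) - 1)
print(full_estimate, round(variance, 1))
```

Calling `bootstrap_weights([1, 2, 4])` corresponds to replicate B1 and `bootstrap_weights([1, 1, 1])` to B500. The slide rounds the adjusted weight 13.33 to 13, which is why it shows 39 for B1 where the unrounded calculation gives 40.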
               Bootstrap Method
How Bootstrap replicates are built (cont’d)
  The “real” recipe
    1- Subsampling of clusters (SRS) within strata
    2- Apply (initial design) weight
    3- Adjust weight for selection of n-1 among n
    4- Apply all standard weight adjustments
      (nonresponse, share, etc.)
    5- Post-stratification to population counts
  The bootstrap method intends to mimic the same
   approach used for the sampling and weighting
   processes
                Bootstrap Method

Sampling weight vs. Bootstrap weights
   – Sampling weight: used to compute the estimate of a
     parameter (e.g. the number of smokers)
   – Bootstrap weights: used to compute the precision of the
     estimate (e.g. the CV of the estimated number of smokers)
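
The division of labour between the two kinds of weights can be sketched as follows. The variance is centred on the mean of the replicate estimates and divided by the number of replicates minus one, as in the formula of the worked example. The data are hypothetical and use three replicates instead of 500 to keep the example short.

```python
from math import sqrt
from statistics import mean

def estimate_total(values, weights):
    """Weighted total of a 0/1 characteristic (e.g. smoker)."""
    return sum(v * w for v, w in zip(values, weights))

def estimate_and_cv(values, sampling_wgt, bootstrap_wgts):
    """Point estimate from the sampling weight, CV (%) from the bootstrap weights."""
    estimate = estimate_total(values, sampling_wgt)
    reps = [estimate_total(values, bw) for bw in bootstrap_wgts]
    b_bar = mean(reps)
    variance = sum((b - b_bar) ** 2 for b in reps) / (len(reps) - 1)
    return estimate, sqrt(variance) / estimate * 100

# Hypothetical file: 3 respondents, 3 sets of bootstrap weights
smoker = [1, 0, 1]
sampling_wgt = [10, 10, 10]
bootstrap_wgts = [[12, 8, 10], [8, 12, 10], [10, 10, 10]]
estimate, cv = estimate_and_cv(smoker, sampling_wgt, bootstrap_wgts)
print(estimate, round(cv, 1))
```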
        CCHS - Data Dissemination Strategy
Wide range of users and capacity
   –136 health regions
   –13 provincial/territorial Ministries of Health
   –Health Canada and CIHI
   –Internal STC analysts
   –Academics
   –Others


Data products
   –Microdata
   –Analytical products (Health Reports, How Healthy are Canadians, etc…)
   –Tabular statistics (ePubs, Cansim II, community profiles, etc…)
   –Client support (head and regional offices, CCHS website, workshops, etc…)
                 CCHS - Access to microdata
Master file
   –all records, all variables
       •Statistics Canada
       •university research data centres
       •remote access


Share / Link files
   –respondents who agreed to share / link
       •provincial/territorial Ministries of Health
       •health regions (through the STC third-party share agreement)


Public Use Microdata File (PUMF)
   –all records, subset of variables with collapsed response categories
       •free for 136 health regions
       •cost recovery for others
          CCHS - Overview of Cycle 1.2

Produce provincial cross-sectional estimates from a sample of
30,000 respondents

Area frame sample only / one person per household

CAPI only

90-100 minute in-depth interviews on mental health and well-
being based on WMH2000 questionnaire

Scheduled to begin collection in May 2002
                   CCHS - Future Plans

Same two-year cycle approach:
   –health region level survey starting in January 2003
   –provincial level survey starting in January 2004


New consultation process with provincial and regional
authorities

Flexible sample designs (adaptable to regional needs)

Development of an in-depth nutrition focus content (Cycle 2.2)
    CCHS Web site


www.statcan.ca/health_surveys

www.statcan.ca/enquetes_santé
       Contacts in Methodology

Yves Béland:
 yves.beland@statcan.ca

François Brisebois:
   francois.brisebois@statcan.ca

				