Pitfalls in Genetic Association Studies by sammyc2007

VIEWS: 187 PAGES: 42

									         Pitfalls in Genetic
        Association Studies

                M. Tevfik DORAK

Paediatric and Lifecourse Epidemiology Research Group
             Sir James Spence Institute
              Newcastle University, U.K.


            Clinical Studies & Objective Medicine
                 Bodrum, 15-16 April 2006
          Incident or prevalent cases
   Comparable controls or convenience samples
               Population based?

Confounding by ethnicity / Population stratification
  Confounding by locus / Linkage disequilibrium

                Statistical power
               Multiple comparisons

      No adjustment for known associations
            Effect modification by sex
            Different genetic models

                 Overinterpretation
                   LD ruled out?
    Biological plausibility (a priori hypothesis)?

             Replication / Consistency
                  Publication bias
Tabor, Risch & Myers. Nat Rev Genet 2002   (www)
Internal Validity of an Association Study

              Avoid "BIAS"
       Control for "CONFOUNDING"
           Rule out "CHANCE"
Internal Validity of an Association Study

               Avoid "BIAS"
                 Be careful!
       Control for "CONFOUNDING"
    Matching, Stratification, MV Analysis
            Rule out "CHANCE"
         Large sample, Replication
       Special Kinds of Confounding in
           Genetic Epidemiology

           CONFOUNDING by Locus (LD)
             MV Analysis, LD Analysis

            CONFOUNDING by Ethnicity
             (Population Stratification)
        Matching, Stratification, MV Analysis
         Family-Based Association Studies
Genomic Controls and Specially Designed GE Analysis
       Special Kinds of Confounding in
           Genetic Epidemiology

        CONFOUNDING by Locus (LD)
             MV Analysis, LD Analysis

            CONFOUNDING by Ethnicity
             (Population Stratification)
        Matching, Stratification, MV Analysis
         Family-Based Association Studies
Genomic Controls and Specially Designed GE Analysis
   Mapping Disease Susceptibility Genes by Association Studies




Plot of minus log of P value for case-control test for allelic association with AD, for SNPs immediately
                                     surrounding APOE (<100 kb)


Martin, 2000 (www)
       Example: Linkage Disequilibrium


  HLA-B47 association with congenital adrenal
    hyperplasia (Dupont et al, Lancet 1977)
  HLA-B14 association with late-onset adrenal
hyperplasia (Pollack et al, Am J Hum Genet 1981)

  Is congenital adrenal hyperplasia an immune
          system-mediated disease?
     Example: Linkage Disequilibrium


HLA-B47 association with congenital adrenal
hyperplasia is due to deletion of CYP21A2 on
          HLA-B47DR7 haplotype
HLA-B14 association with late-onset adrenal
 hyperplasia is due to an exon 7 missense
mutation (V281L) in CYP21A2 on HLA-B14DR1
                  haplotype
Preliminary evidence of an association between HLA-
DPB1*0201 and childhood common ALL supports an
infectious aetiology
Leukemia 1995;9(3):440-3

Evidence that an HLA-DQA1-DQB1 haplotype influences
susceptibility to childhood common ALL in boys provides
further support for an infection-related aetiology
Br J Cancer 1998;78(5):561-5



                                         Why not LD?
               Publication Bias

     Negative studies do not get published -
      NIH Genetic Associations Database ?

       A different kind of publication bias?


Preliminary evidence of an association between HLA-
DPB1*0201 and childhood common ALL supports an
infectious aetiology
Leukemia 1995;9(3):440-3

Evidence that an HLA-DQA1-DQB1 haplotype influences
susceptibility to childhood common ALL in boys provides
further support for an infection-related aetiology
Br J Cancer 1998;78(5):561-5
       Special Kinds of Confounding in
           Genetic Epidemiology

           CONFOUNDING by Locus (LD)
             MV Analysis, LD Analysis

            CONFOUNDING by Ethnicity
             (Population Stratification)
        Matching, Stratification, MV Analysis
         Family-Based Association Studies
Genomic Controls and Specially Designed GE Analysis
Wacholder, 2002 (www)
Population Stratification




                            Marchini, 2004 (www)
Cardon & Palmer, 2003 (www)
Internal Validity of an Association Study

               Avoid "BIAS"
                 Be careful!
       Control for "CONFOUNDING"
    Matching, Stratification, MV Analysis
            Rule out "CHANCE"
         Large sample, Replication
Example: Functional Correlation
                             Example: Replication


       HFE-C282Y Association in Childhood ALL
 30                                                    50
                                                   %
%        Patients                                      45      Patients
 25      Controls                                              Controls
                                                       40
                                                       35
 20
                                                       30
 15                                                    25
                                                       20
 10
                                                       15
                                                       10
  5
                                                        5
  0                                                     0
       All patients   Males Only    Males (cALL)              All patients   Males Only      Males (cALL)


      WELSH GROUP                                           SCOTTISH GROUP
      117 patients - 415 newborns                           135 patients - 238 newborns
      P = 0.005; OR = 2.8 (1.4 to 5.4)                      P = 0.0004; OR = 3.0 (1.7 to 5.4)
      In cALL: P = 0.02; OR = 2.9 (1.4 to 6.4)              In cALL: P < 0.0001; OR = 4.7 (2.5 to 8.9)


                                                                                      Dorak et al, Blood 1999
Palmer LJ. Webcast   (www)
Multiple Comparisons & Spurious Associations




                                      Diepstra, Lancet 2005 (www)
Genetic Models and
Case-Control Association Data Analysis
The data may also be analysed assuming a prespecified
genetic model. For example, with the hypothesis that
carrying allele B increased risk of disease (dominant
model), the AB and BB genotypes are pooled giving a
2x3x2 table. This is particularly relevant when allele B is
rare, with few BB observations in cases and controls.
Alternatively, under a recessive model for allele B, cells AA
and AB would be pooled. Analysing by alleles provides an
alternative perspective for case control data. This breaks
down genotypes to compare the total number of A and B
alleles in cases and controls, regardless of the genotypes
from which these alleles are constructed. This analysis is
counter-intuitive, since alleles do not act independently, but
it provides the most powerful method of testing under a
multiplicative genetic model, where risk of developing a
disease increases by a factor r for each B allele carried: risk
r for genotype AB and r2 for genotype BB. If a multiplicative
genetic model is appropriate, both case and control
genotypes will be in Hardy–Weinberg equilibrium, and this
can be tested for. A fourth possible genetic model is
additive, with an increased disease risk of r for AB
genotypes, and 2r for BB genotypes. This model shows a
clear trend of an increased number of AB and BB
genotypes, with the risk for AB genotypes approximately
half that for BB genotypes. The additive genetic model can
be tested for using Armitage’s test for trend.


Lewis CM. Brief Bioinform 2002 (www)
           HLA-DRB4 Association in Childhood ALL
40                                            25
                                             %
%
                                  *
           Patients                                    Patients
                                              20
30         Controls                                    Controls

        Boys, n=64                                   Girls, n=53
                                              15

20

                                              10
                                                                              *
10
                                                 5


    0                                            0
            DRB5      DRB3         DRB4                 DRB5       DRB3        DRB4



    Homozygosity for HLA-DRB4 family is associated with susceptibility to childhood ALL
                  in boys only (P < 0.0001, OR = 6.1, 95% CI = 2.9 to 12.6 )
         Controls are an unselected group of local newborns (201 boys & 214 girls)
               * Case-only analysis P = 0.002 (OR = 5.6; 95% CI = 1.8 to 17.6)
    This association extends to a DRB4-HSP70 haplotype (OR = 8.3; 95% CI = 3.0 to 22.9)
                This association has been replicated in Scotland and Turkey
                    HLA-DRB4 ASSOCIATION

                                  ADDITIVE MODEL
Linear Model

Logit estimates                                   Number of obs    =        265
                                                  LR chi2(1)       =      14.24
                                                  Prob > chi2      =     0.0002
Log likelihood = -139.37794                       Pseudo R2        =     0.0486
------------------------------------------------------------------------------
        caco | Common Odds Ratio   Std. Err.    z    P>|z|         [95% CI]
-------------+----------------------------------------------------------------
     drb4add |      2.208651       .4734163    3.70 0.000     1.45103 - 3.36186
------------------------------------------------------------------------------


Heterozygosity and Homozygosity

Logit estimates                                   Number of obs   =        265
                                                  LR chi2(2)      =      22.00
                                                  Prob > chi2     =     0.0000
Log likelihood = -135.49623                       Pseudo R2       =     0.0751
------------------------------------------------------------------------------
         caco | Odds Ratio    Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+----------------------------------------------------------------
Wild-type      |   1.00 (ref)
Heterozygosity |   1.060652   .3557426     0.18   0.861     .549642    2.04676
Homozygosity   |   6.258503    2.65464     4.32   0.000     2.72534   14.37211
------------------------------------------------------------------------------
  HLA-DRB4 - HSPA1B HAPLOTYPE ASSOCIATION

                           EFFECT MODIFICATION

Logit estimates                                   Number of obs    =       532
                                                  LR chi2(3)       =     23.97
                                                  Prob > chi2      =    0.0000
Log likelihood = -268.27826                       Pseudo R2        =    0.0428
------------------------------------------------------------------------------
        caco |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex | -.0299037    .2229554    -0.13   0.893    -.4668883    .4070808
       hsp53 |   2.530033   .5929603     4.27   0.000     1.367852    3.692214
_IsexXhsp5~2 | -2.758189    .8812645    -3.13   0.002    -4.485436   -1.030943
       _cons | -1.321474    .3517969    -3.76   0.000    -2.010984   -.6319651
------------------------------------------------------------------------------


                           CONFOUNDING BY SEX

Logit estimates                                   Number of obs    =       532
                                                  LR chi2(2)       =     11.99
                                                  Prob > chi2      =    0.0025
Log likelihood = -274.26995                       Pseudo R2        =    0.0214
------------------------------------------------------------------------------
        caco | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hsp53 |    3.32777   1.191429     3.36   0.001     1.649684    6.712832
         sex |   .7693041   .1636106    -1.23   0.218     .5070725    1.167148
------------------------------------------------------------------------------
                                Adjusted for sex?
  HLA-DRB4 - HSPA1B HAPLOTYPE ASSOCIATION


                                 BOYS ONLY
Logit estimates                                   Number of obs    =       265
                                                  LR chi2(1)       =     22.41
                                                  Prob > chi2      =    0.0000
Log likelihood = -135.29119                       Pseudo R2        =    0.0765
------------------------------------------------------------------------------
        caco | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hsp53 |   12.55392   7.444028     4.27   0.000     3.926876    40.13392
------------------------------------------------------------------------------


                                 GIRLS ONLY
Logit estimates                                   Number of obs    =       267
                                                  LR chi2(1)       =      0.13
                                                  Prob > chi2      =    0.7205
Log likelihood = -132.98706                       Pseudo R2        =    0.0005
------------------------------------------------------------------------------
        caco | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hsp53 |      0.796   .5189439    -0.35   0.726       .22181    2.856571
------------------------------------------------------------------------------

                  The association is modified by sex
Cardon & Bell. Nat Rev Genet 2001   (www)
Current Criteria for Good Association Studies
Cardon & Bell. Nat Rev Genet 2001   (www)
Statistical checklist for genetic association studies
- In a case-control study:
             Cases and controls derive from the same study base
             There are more controls than cases (up to 5-to-1, for increased statistical power)
             There are at least 100 cases and 100 controls
- Statistical power calculations are presented
- Hardy-Weinberg equilibrium (HWE) is checked and appropriate tests are used
- If HWE is violated, allelic association tests are not used
- Possible genotyping errors and counter-measures are discussed
- All statistical tests are two-tailed
- Alternative genetic models of association considered
- The choice of marker/allele/genotype frequency (for comparisons) is justified
- For HLA associations, a global test for association (G-test, RxC exact test) for each locus is used
(if necessary, with correction for multiple testing)
- Chi-squared and Fisher tests are NOT used interchangeably
- P values are presented without spurious accuracy (with two decimal places)
- Strength of association has been measured (usually odds ratio and its 95% CI)
- In a retrospective case-control study, ORs are presented (as opposed to RRs)
- Multiple comparisons issue is handled appropriately (this does not necessarily mean Bonferroni
corrections)
- Alternative explanations for the observed associations (chance, bias, confounding) are
discussed


                                                       http://www.dorak.info/hla/stat.html
Multifactorial Etiology




   ROCHE Genetic Education (www)
Models of gene–environment interactions




                Hunter, 2005 (www)
Generating Protein Diversity from the 'Small' Genome




                                           Banks, 2000 (www)
Generating Protein Diversity from the 'Small' Genome

   Alternative Splicing Can Generate Very Large Numbers of
              Related Proteins From a Single Gene
     Most extreme example is the Drosophila Dscam Gene:




           12 x 48 x 33 x 2 = 38,016 alternative splice variants


                                                          Black, Cell 2000 (www)
                                                     Wojtowicz, Cell 2004 (www)

                                   DSCAM = Down syndrome cell adhesion molecule
Generating Protein Diversity from the 'Small' Genome

      Alternative Splicing Can be Tissue or Cell-Specific




                     Lodish et al. Molecular Cell Biology, 5th Ed, WH Freeman (www)
Generating Protein Diversity from the 'Small' Genome

        mRNA editing (base modification) is a different
             mechanism of alternative splicing




                                   Lodish et al. Molecular Cell Biology, 5th Ed, WH Freeman (www)




                              OMIM 107730 (www)
                            Chen & Chan, 1996 (www)
                             Wedekind, 2003 (www)
                   RNA Editing in The Cell – NCBI Online (www)
     Integration of proteomics in genetic
 epidemiology studies would eliminate a lot of
     obstacles arising from the following
Only <2% of the genome is protein-coding and most sequence
                 variants are silent changes
 Even genome-wide sequence variant studies cannot identify
      the genomic counterparts of 1.5 million proteins
  Epigenetic changes, alternative transcription/splicing and
posttranslational modifications cannot be predicted by study of
                      sequence variants
  Genomic DNA studies does not take into account selective
       expression of genes in certain cell or tissues
 No genetic association is complete without demonstration of
                   the functional relevance
              50s Rule
         Genome - 1:50 coding
        Gene:Protein - 1:50 ratio


Overall efficiency of pure genomic studies
                 1:2500
http://www.dorak.info

								
To top