SNP Applications





statwww.epfl.ch/davison/teaching/Microarrays/snp.ppt
    Human Genome and SNPs
• Now that the human genome is (mostly)
  sequenced, attention is turning to the
  evaluation of variation
• Alterations in DNA involving a single base
  pair are called single nucleotide
  polymorphisms, or SNPs
• Map of ~1.4 million SNPs (Feb 2001)
• It is estimated that ~60,000 SNPs occur
  within exons; 85% of exons within 5 kb
  of nearest SNP
                SNP Initiatives
• Industrial
  –   Genset
  –   Incyte
  –   Celera
  –   CuraGen
• Academic – Industry Consortium
• Governmental
  – US
  – Japan
• Non-industrial scale academic programs
   Goals of SNP Initiatives
• Immediate goals:
   – Detection/identification of …
   – The hundreds of thousands of SNPs
     estimated to be present in the
     human genome
   – Interest also in other organisms, e.g.
     potatoes(!)
   – Establishment of SNP Database(s)
Longer term goals: Areas of
      SNP Application
• Gene discovery and mapping
• Association-based candidate
  polymorphism testing
• Diagnostics/risk profiling
• Response prediction
• Homogeneity testing/study design
• Gene function identification
• …etc.
• See Schork, Fallin, Lanchbury 2000
            Polymorphism
• Technical definition: most common
  variant (allele) occurs with less than 99%
  frequency in the population
• Also used as a general term for variation
• Many types of DNA polymorphisms,
  including RFLPs, VNTRs, microsatellites
• ‘Highly polymorphic’ = many variants
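The technical definition above can be checked mechanically. A minimal sketch, assuming allele frequencies are supplied as a dictionary; the function name and the example frequencies are hypothetical, not from the slides:

```python
def is_polymorphic(allele_freqs, threshold=0.99):
    """True if the most common allele has frequency below `threshold`."""
    total = sum(allele_freqs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return max(allele_freqs.values()) < threshold


# Hypothetical biallelic loci (frequencies invented for illustration)
print(is_polymorphic({"A": 0.95, "G": 0.05}))    # True
print(is_polymorphic({"A": 0.995, "G": 0.005}))  # False
```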
    Use of Polymorphism in
        Gene Mapping
• 1980s – RFLP marker maps
• 1990s – microsatellite marker maps
    SNPs in Genetic Analysis
• Abundance – lots
• Position – throughout genome
• Haplotype patterns – groups of SNPs
  may provide exploitable diversity
• Rapid and efficient to genotype
• Increased stability over other types of
  mutation
• Recombination patterns – e.g. ‘hot spots’
Gene Discovery and Mapping
• Linkage Analysis
  – Within-family associations between
    marker and putative trait loci
• Linkage Disequilibrium (LD)
  – Across-family associations
       One locus: Founder
      genotype probabilities
• Founder: individual whose parents are
  not in the pedigree
• Usually obtain genotype probs. assuming
  Hardy-Weinberg Equilibrium (HWE):
             Say P(D) = p, P(d) = 1-p;
 Then P(DD) = p^2, P(Dd) = 2p(1-p), P(dd) = (1-p)^2
• Genotypes of founder couples treated as
  independent:
      P(Father Dd and Mother dd) = 2p(1-p) x (1-p)^2 = 2p(1-p)^3
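The Hardy-Weinberg probabilities and the independence of founder couples can be sketched as follows. This is a minimal illustration; the function names and the value p = 0.01 are assumptions chosen for the example:

```python
def hwe_genotype_probs(p):
    """Genotype probabilities at a biallelic locus under HWE, with P(D) = p."""
    q = 1.0 - p
    return {"DD": p * p, "Dd": 2.0 * p * q, "dd": q * q}


def founder_couple_prob(father, mother, p):
    """Joint probability of two founder genotypes, treated as independent."""
    probs = hwe_genotype_probs(p)
    return probs[father] * probs[mother]


p = 0.01  # hypothetical allele frequency
print(hwe_genotype_probs(p))
print(founder_couple_prob("Dd", "dd", p))  # equals 2p(1-p) * (1-p)^2
print(founder_couple_prob("Dd", "DD", p))  # equals 2p(1-p) * p^2
```

The three genotype probabilities always sum to 1, which is a quick sanity check on any chosen p.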
      One locus: Transmission
         probabilities (I)
• Offspring get their genes according to Mendel’s
  rules…
• Independently for different offspring

  Parents:    1 = Dd       2 = Dd
  Offspring:  3 = dd

            P(3 dd | 1 Dd & 2 Dd) = ½ x ½
One locus: Transmission
  probabilities (II)
  Parents:    1 = Dd       2 = Dd
  Offspring:  3 = dd       4 = Dd       5 = DD

P(3 dd & 4 Dd & 5 DD | 1 Dd & 2 Dd)
  = (½ x ½) x (2 x ½ x ½) x (½ x ½)
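The transmission probabilities can be reproduced by enumerating the four equally likely allele pairs a child can receive. A minimal sketch, assuming genotypes are written as two-character strings such as 'Dd'; the helper names are hypothetical:

```python
from collections import Counter
from itertools import product


def offspring_genotype_probs(father, mother):
    """Offspring genotype distribution given parental genotypes such as 'Dd'."""
    probs = Counter()
    # Each parent transmits either of its two alleles with probability 1/2.
    for paternal, maternal in product(father, mother):
        # sorted() puts 'D' before 'd', so 'dD' is recorded as 'Dd'.
        genotype = "".join(sorted(paternal + maternal))
        probs[genotype] += 0.25
    return dict(probs)


def sibship_prob(children, father, mother):
    """Probability of the children's genotypes; offspring are independent."""
    probs = offspring_genotype_probs(father, mother)
    result = 1.0
    for child in children:
        result *= probs.get(child, 0.0)
    return result


print(offspring_genotype_probs("Dd", "Dd"))
print(sibship_prob(["dd"], "Dd", "Dd"))              # 0.25
print(sibship_prob(["dd", "Dd", "DD"], "Dd", "Dd"))  # 0.03125
```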
    One locus: Penetrance
• Usual to assume that the chance of
  having a particular phenotype (being
  affected with a disease, say) depends
  only on the genotype at one locus
• Complete penetrance:
            P(affected|DD) = 1
• Incomplete penetrance:
          P(affected|DD) = p (<1)
One locus: putting it all together
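  Parents:    1 = Dd       2 = Dd
  Offspring:  3 = dd       4 = Dd       5 = DD

  Assume:  P(Aff|dd) = .1,  P(Aff|Dd) = .3,  P(Aff|DD) = .8,  P(D) = .01

  Affection status implied by the factors below:
  2 and 5 affected; 1, 3 and 4 unaffected

P(pedigree) = (2 x .01 x .99 x .7) x (2 x .01 x .99 x .3)
    x (½ x ½ x .9) x (2 x ½ x ½ x .7) x (½ x ½ x .8)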
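The pedigree probability multiplies three kinds of factors: founder genotype probabilities (HWE), transmission probabilities (Mendel), and penetrances. The sketch below recomputes the product. The affection statuses are an assumption inferred from the factors in the expression (for example, .7 = 1 - P(Aff|Dd) implies an unaffected Dd individual); all names are hypothetical:

```python
import math

P_D = 0.01
PENETRANCE = {"dd": 0.1, "Dd": 0.3, "DD": 0.8}
# Offspring genotype distribution when both parents are Dd
TRANSMISSION = {"DD": 0.25, "Dd": 0.5, "dd": 0.25}


def founder_prob(genotype, p=P_D):
    """HWE genotype probability for a founder."""
    q = 1.0 - p
    return {"DD": p * p, "Dd": 2.0 * p * q, "dd": q * q}[genotype]


def phenotype_prob(genotype, affected):
    """P(phenotype | genotype) from the penetrance table."""
    f = PENETRANCE[genotype]
    return f if affected else 1.0 - f


# (genotype, affected) pairs; affection statuses inferred, not stated in the text
founders = [("Dd", False), ("Dd", True)]                      # individuals 1, 2
offspring = [("dd", False), ("Dd", False), ("DD", True)]      # individuals 3, 4, 5

likelihood = 1.0
for genotype, affected in founders:
    likelihood *= founder_prob(genotype) * phenotype_prob(genotype, affected)
for genotype, affected in offspring:
    likelihood *= TRANSMISSION[genotype] * phenotype_prob(genotype, affected)

print(likelihood)  # about 1.3e-6
```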
Crossing over and Recombination
      Two loci: Linkage and
         Recombination
  Parents:    1 = Dd, TT       2 = Dd, tt
  Offspring:  3 = Dd, Tt

3 produces gametes in proportions:

               T           t
      D     (1-θ)/2       θ/2        ½
      d       θ/2       (1-θ)/2      ½
               ½           ½
        Recombination Fraction

• θ = ½ : independent assortment (Mendel)
• θ < ½ : linked loci
• θ = 0 : tightly linked loci (no recombination)
• In 3, if the loci are linked then D-T and d-t
  are parental haplotypes, D-t and d-T are
  recombinant haplotypes
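The gamete proportions as a function of the recombination fraction θ can be written out directly. A minimal sketch, assuming D-T and d-t are the parental haplotypes in individual 3; the function name is hypothetical:

```python
def gamete_proportions(theta):
    """Gamete proportions for a double heterozygote with parental haplotypes D-T and d-t."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 1/2]")
    return {
        "D-T": (1.0 - theta) / 2.0,  # parental
        "d-t": (1.0 - theta) / 2.0,  # parental
        "D-t": theta / 2.0,          # recombinant
        "d-T": theta / 2.0,          # recombinant
    }


print(gamete_proportions(0.5))  # independent assortment: all four equal 0.25
print(gamete_proportions(0.0))  # no recombination: only parental haplotypes
```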
  LOD-score Linkage Analysis
• LOD(θ*) = log10 of the odds (likelihood ratio) L:
         L = P(data|θ = θ*)/P(data|θ = ½)
• LOD(θ*) measures the relative strength
  of the data for θ = θ* rather than θ = ½
• Can compute LOD(θ) at several values of θ
• Can find the value of θ maximizing the LOD
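A LOD curve can be sketched for the simplest possible data. The example assumes n fully informative, phase-known meioses of which k are recombinant, so the likelihood is θ^k (1-θ)^(n-k); real pedigree data require a full likelihood computation. The counts n = 10 and k = 1 are invented for illustration:

```python
import math


def lod(theta, n, k):
    """LOD score for k recombinants among n phase-known, informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 1/2]")
    if theta == 0.0:
        # Any observed recombinant rules out theta = 0.
        return n * math.log10(2.0) if k == 0 else float("-inf")
    # log10[ theta^k (1-theta)^(n-k) / (1/2)^n ]
    return (k * math.log10(theta)
            + (n - k) * math.log10(1.0 - theta)
            + n * math.log10(2.0))


n, k = 10, 1  # hypothetical counts
grid = [i / 100 for i in range(1, 51)]
for theta in (0.05, 0.1, 0.2, 0.3, 0.5):
    print(theta, round(lod(theta, n, k), 3))

best_theta = max(grid, key=lambda t: lod(t, n, k))
print("theta maximizing the LOD on the grid:", best_theta)
```

Under these assumptions the maximizing value is the observed recombinant proportion k/n, and the LOD at θ = ½ is 0 by construction.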
IBD Allele Sharing
    Allele-sharing Methods
• Based on number (or proportion) of
  alleles shared identical by descent
  (IBD) of related individuals
• Can be done either assuming
  (likelihood-based) or not assuming
  (nonparametric) a genetic mode of
  inheritance for a trait
               Errors
• Genotyping errors can result in false
  positive or false negative findings
• Data checking/cleaning necessary
  (although there are approaches which
  model error)
• Must be especially careful with SNP
  genotypes, because errors often pass
  simple Mendelian checks
  Disease-Marker Association
• A marker locus is associated with a
  disease if the distribution of genotypes
  at the marker locus in disease-affected
  individuals differs from the distribution
  in the general population
• A specific allele may be positively
  associated (over-represented in
  affecteds) or negatively associated
  (under-represented)
      Examples: Alzheimer’s
 • Alzheimer’s disease and ApoE

               E4 present   E4 absent
    Patients   58           33
    Controls   16           55


The E4 allele appears to be positively associated
with Alzheimer’s disease:
       Odds Ratio = (58/16)/(33/55) ≈ 6.0
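The odds ratio can be recomputed from the 2x2 table. The sketch below also computes a Pearson chi-square statistic (1 degree of freedom, no continuity correction) as one common way to test the association; that test is an addition here, not part of the slide, and the function names are hypothetical:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] (rows: patients, controls)."""
    return (a * d) / (b * c)


def pearson_chi_square(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table, without continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# ApoE table: patients (E4 present, E4 absent), controls (E4 present, E4 absent)
or_apoe = odds_ratio(58, 33, 16, 55)
chi2 = pearson_chi_square(58, 33, 16, 55)
print(round(or_apoe, 2))  # 6.04
print(round(chi2, 1))
```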
          Examples: HLA
  Disease                        Allele   RR
  Ankylosing spondylitis         B27      87
  Myasthenia gravis              B8       4.1
  Systemic lupus erythematosus   B8       2.1
  Hemochromatosis                A3       8.2

(and many more…)
     Linkage Disequilibrium

[Diagram: marker locus (alleles M, m) --LD-- disease locus (alleles D, d) --penetrance--> disease]
      Linkage Disequilibrium
• Concept of the ‘historical recombinant’
• Explanations for observed association
  between marker and disease:
  – Marker locus may be a disease
    susceptibility locus
  – Marker locus may be linked to disease
    susceptibility locus
  – Spurious result due to, e.g., admixture,
    population stratification, heterogeneity
             Linkage and LD


Mutation occurs: allele D is created
Nearby marker:   allele M was nearby

  D and M subsequently transmitted together
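How long D and M stay together depends on the recombination fraction between the two loci. A minimal sketch using the standard LD coefficient, D_LD = P(DM) - P(D)P(M), and its textbook decay by a factor (1 - θ) per generation under random mating; neither formula appears on the slides, and the haplotype and allele frequencies are invented for illustration:

```python
def ld_coefficient(p_dm, p_d, p_m):
    """LD coefficient: haplotype frequency minus the product of allele frequencies."""
    return p_dm - p_d * p_m


def ld_after_generations(d0, theta, generations):
    """Expected LD after random mating: reduced by a factor (1 - theta) per generation."""
    return d0 * (1.0 - theta) ** generations


# Hypothetical frequencies: the D allele is always found on an M-bearing haplotype
d0 = ld_coefficient(p_dm=0.01, p_d=0.01, p_m=0.30)
for theta in (0.001, 0.01, 0.5):
    print(theta, ld_after_generations(d0, theta, generations=100))
```

Tightly linked markers (small θ) retain most of the initial association after 100 generations, while unlinked markers (θ = ½) lose it almost immediately.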
Candidate Polymorphism Testing

 • Linkage and LD assume markers have
   indirect association with the trait
 • Large SNP collections may allow
   testing for direct, physiologically
   relevant associations with trait
  Diagnostics/Risk Profiling
• Identified SNP associations can
  potentially be used to develop
  diagnostic tools
• Applicability will require large-scale
  studies, since most diseases of
  interest now are influenced by many
  genetic and nongenetic factors
       Response Prediction
• Related to diagnosis/risk assessment
• Strategy: stratify populations to
  improve effectiveness of interventions
• Pharmaceutical companies especially
  interested in this:
  – Aim to identify those likely to respond
  – Predict toxicity reactions in susceptible
    individuals
• Response to any kind of substance;
  creation of ‘functional foods’
       Homogeneity Testing

• Test to protect against false inferences
  about the relationship between
  endpoints (e.g. disease) and risk factors
• Assess generalizability of results
• Can assess the homogeneity of the
  genetic background of study
  participants using a panel of randomly
  distributed SNPs
Gene Function Identification
• Alternative to other experimental
  procedures (e.g. knock-outs, which
  cannot be used in humans)
• Studies to compare individuals with
  and without naturally occurring
  disease-predisposing genetic profiles
      Haplotype Variation
• The large databases already available
  (and increasing in size) should allow
  characterization of haplotype
  variation across the genome in
  different populations
• Can help population geneticists trace
  evolution and reveal connections
  between populations/ethnic groups