
									                                      Slide 1




   Linkage Disequilibrium


         Joe Mychaleckyj
Center for Public Health Genomics
             982-1107
       jcm6t@virginia.edu

                    Joe Mychaleckyj
                                           Slide 2



Today we’ll cover…

•   Haplotypes
•   Linkage Disequilibrium
•   Visualizing LD
•   HapMap




                                                                                        Slide 3



References


                                       Hartl DL, Clark AG. Principles of
                                       Population Genetics, Fourth Edition.

                                       Weir BS. Genetic Data Analysis II.
                                            Slide 4



References

       Neale BM, Ferreira MAR, Medland SE, Posthuma D (Eds).
       Statistical Genetics: Gene Mapping Through
       Linkage and Association.




                                                                         Slide 5




                   SNP1                 SNP2                SNP3
                   [A / T]              [C / G]             [A / G]
                      A                    C                   G
                      A                    C                   A
                      T                    G                   G
Haplotype: specific combination of alleles occurring (cis) on the same
chromosome (segment of chromosome)

N SNPs - How many Haplotypes are possible ?




          $2^N$ (i.e. very large diversity possible)
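A minimal Python sketch of the $2^N$ count (the helper name `all_haplotypes` is illustrative, not from any library): it enumerates every allele combination for the three SNPs in the table above.

```python
from itertools import product

def all_haplotypes(snps):
    """Return every possible haplotype for a list of biallelic SNPs.

    Each SNP is a pair of alleles, e.g. ("A", "T"); a haplotype picks one
    allele from each SNP and is written as a string.
    """
    return ["".join(alleles) for alleles in product(*snps)]

# SNP1 [A/T], SNP2 [C/G], SNP3 [A/G] from the table above
snps = [("A", "T"), ("C", "G"), ("A", "G")]
haps = all_haplotypes(snps)
print(len(haps))  # 8, i.e. 2**3
print(haps)
```

Only three of the eight combinations (ACG, ACA, TGG) appear in the example table; each extra SNP doubles the number of possible haplotypes.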


                                            Slide 6



Terminology

• Haplotype: Specific combination
  (phasing) of alleles occurring (cis) on
  the same chromosomal segment
• Linkage/Linked Markers: Physical co-
  location of markers on the same
  chromosome
• Diplotype (haplogenotype): the pair of
  phased haplotypes, one maternally and one
  paternally inherited

                                                             Slide 7

                 SNP1 [ A / a ]           SNP2 [ B / b ]



Major Allele Freq:     p(A)                           p(B)
Minor Allele Freq:     p(a)                           p(b)
 Independently segregating SNPs:
 Haplotype Frequency p(ab) = p(a) x p(b)

             LINKAGE EQUILIBRIUM
 (How many haplotypes in total ?)

 LINKAGE DISEQUILIBRIUM
 Haplotype Frequency p(ab)≠ p(a) x p(b)


                                          Slide 8



Linkage Disequilibrium

• Non-random assortment of alleles at 2
  (or more) loci
• The closer the markers, the stronger
  the LD tends to be, since recombination
  between them occurs at a lower rate
• Markers co-segregate within and
  between families


                                                             Slide 9


* LINKAGE EQUILIBRIUM *
(Not a Punnett Square!)

                 SNP2 Allele
SNP1 Allele      B              b              Total
  A              p(A)p(B)       p(A)p(b)       p(A)
  a              p(a)p(B)       p(a)p(b)       p(a)
  Total          p(B)           p(b)

 Example (column B margin):
 p(A)p(B) + p(a)p(B) = p(B){p(A) + p(a)} = p(B), since p(A) + p(a) = 1
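As a sketch (hypothetical helper `le_table`, with illustrative allele frequencies p(A) = 0.8 and p(B) = 0.7), the linkage-equilibrium table is simply the product of the marginal allele frequencies, and its column margin recovers p(B) as in the example above.

```python
def le_table(pA, pB):
    """Expected haplotype frequencies under linkage equilibrium:
    each cell is the product of the two allele frequencies."""
    pa, pb = 1 - pA, 1 - pB
    return {
        ("A", "B"): pA * pB, ("A", "b"): pA * pb,
        ("a", "B"): pa * pB, ("a", "b"): pa * pb,
    }

table = le_table(pA=0.8, pB=0.7)
col_B = table[("A", "B")] + table[("a", "B")]   # p(A)p(B) + p(a)p(B)
print(round(col_B, 10))                # 0.7, i.e. p(B)
print(round(sum(table.values()), 10))  # 1.0
```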

                                                             Slide 10

                 SNP1 [ A / a ]           SNP2 [ B / b ]



Major Allele Freq:       p(A)                         p(B)
Minor Allele Freq:       p(a)                         p(b)


 LINKAGE DISEQUILIBRIUM
 Haplotype Frequency p(ab) = p(a) p(b) + D
 (sign of D is generally arbitrary, unless comparing D values
 between populations or studies)
 D: Lewontin’s LD Parameter (Lewontin 1960)




                                                   Slide 11


* LINKAGE DISEQUILIBRIUM *

                 SNP2 Allele
SNP1 Allele      B              b              Total
  A              p(A)p(B)+D     p(A)p(b)-D     p(A)
  a              p(a)p(B)-D     p(a)p(b)+D     p(a)
  Total          p(B)           p(b)

 p(A)p(B)+D + p(a)p(B)-D = p(B){p(A) + p(a)} = p(B)
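A minimal sketch (helper names `ld_table` and `allele_freqs` are hypothetical) that builds the table above for a chosen D and confirms that the +D and -D terms cancel in every margin, so D shifts haplotype frequencies without changing allele frequencies. It assumes the convention p(ab) = p(a)p(b) + D used on these slides.

```python
def ld_table(pA, pB, D):
    """Haplotype frequencies for two biallelic SNPs given allele frequencies
    and Lewontin's D, using the convention p(ab) = p(a)p(b) + D."""
    pa, pb = 1 - pA, 1 - pB
    return {
        ("A", "B"): pA * pB + D, ("A", "b"): pA * pb - D,
        ("a", "B"): pa * pB - D, ("a", "b"): pa * pb + D,
    }

def allele_freqs(table):
    """Recover p(A) (row margin) and p(B) (column margin) from a haplotype table."""
    pA = table[("A", "B")] + table[("A", "b")]
    pB = table[("A", "B")] + table[("a", "B")]
    return pA, pB

for D in (0.0, 0.05, 0.1):
    pA, pB = allele_freqs(ld_table(0.8, 0.7, D))
    print(D, round(pA, 10), round(pB, 10))  # margins stay at 0.8 and 0.7
```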
                                                      Slide 12

     b         B
a   0.16      0.04    p(a)=0.20
A   0.14      0.66    p(A)=0.80
    p(b)=0.30 p(B)=0.70

What is the LD? Is D ≠ 0, i.e. p(ab) ≠ p(a) p(b)?

    p(ab) = p(a) p(b) + D
    0.16 = 0.2 x 0.3 + D
    D = 0.1
    D is positive here because we defined p(ab) = p(a)p(b) + D, but
    the sign is arbitrary: e.g. swapping the allele labels at one SNP
    (b <-> B) flips the sign of D
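The same calculation as a Python sketch, using the haplotype frequencies from the table above (the function name `lewontin_D` is illustrative). It also shows the sign flip when the allele labels at SNP2 alone are swapped.

```python
def lewontin_D(p_ab, pa, pb):
    """Lewontin's D under the convention p(ab) = p(a)p(b) + D."""
    return p_ab - pa * pb

# Observed haplotype frequencies from the table above
hap = {("a", "b"): 0.16, ("a", "B"): 0.04, ("A", "b"): 0.14, ("A", "B"): 0.66}
pa = hap[("a", "b")] + hap[("a", "B")]   # 0.20
pb = hap[("a", "b")] + hap[("A", "b")]   # 0.30
D = lewontin_D(hap[("a", "b")], pa, pb)
print(round(D, 10))  # 0.1

# Relabel SNP2 only (b <-> B): the "ab" haplotype is now the old aB
D_swapped = lewontin_D(hap[("a", "B")], pa, 1 - pb)
print(round(D_swapped, 10))  # -0.1
```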
                                                                Slide 13

Range of D values (-ve to +ve)

D has minimum and maximum values that depend on the allele
frequencies of the markers
Since haplotype frequencies cannot be -ve:
p(aB) = p(a)p(B) - D ≥ 0                D ≤ p(a)p(B)
p(Ab) = p(A)p(b) - D ≥ 0                D ≤ p(A)p(b)
Both must hold, so D ≤ min( p(a)p(B), p(A)p(b) )
p(ab) = p(a)p(b) + D ≥ 0                D ≥ -p(a)p(b)
p(AB) = p(A)p(B) + D ≥ 0                D ≥ -p(A)p(B)
Both must hold, so D ≥ max( -p(a)p(b), -p(A)p(B) )


* Similar equations if we had defined p(ab) = p(a)p(b) - D
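A sketch of these bounds as a function (hypothetical name `d_bounds`), assuming the p(ab) = p(a)p(b) + D convention; with the example frequencies p(a) = 0.2 and p(b) = 0.3 it returns the admissible range of D.

```python
def d_bounds(pA, pB):
    """Admissible (min, max) of D for given allele frequencies, using
    p(ab) = p(a)p(b) + D and requiring every haplotype frequency >= 0."""
    pa, pb = 1 - pA, 1 - pB
    d_min = max(-pa * pb, -pA * pB)  # from p(ab) >= 0 and p(AB) >= 0
    d_max = min(pa * pB, pA * pb)    # from p(aB) >= 0 and p(Ab) >= 0
    return d_min, d_max

lo, hi = d_bounds(pA=0.8, pB=0.7)
print(round(lo, 10), round(hi, 10))  # -0.06 0.14
```

D = 0.1 from the worked example lies inside this range, and the upper bound 0.14 is the Dmax used to standardize D.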


                                            Slide 14

Limits of D LD Parameter

 Limits of D are a function of allele
 frequencies

 Standardize D by rescaling to a
 proportion of its maximal value for the
 given allele frequencies (D')
          D’ = D / Dmax



                                                Slide 15

D’ (Lewontin, 1964)
D’ = D / Dmax
Dmax = min (p(A)p(B), p(a)p(b))       D<0
Dmax = min (p(A)p(b), p(a)p(B))       D>0
Again, sign of D’ depends on definition

D’ = 1 or -1 if one of the haplotype frequencies p(AB), p(Ab),
  p(aB), p(ab) = 0
= Complete LD (i.e. at most 3 haplotypes seen)
D’ = 1 or -1 suggests that no recombination has
  taken place between the markers
Beware rare markers: the sample may be too small (too little
  power) to detect the 4th haplotype
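To see why rare markers are a concern, assume (as a simplification) that chromosomes are sampled independently, so a haplotype with population frequency f is missed entirely in n sampled chromosomes with probability $(1-f)^n$. The helper `prob_unobserved` is hypothetical.

```python
def prob_unobserved(f, n):
    """Chance that a haplotype of population frequency f is absent from
    n independently sampled chromosomes (simple binomial sampling)."""
    return (1 - f) ** n

# A 4th haplotype at 1% frequency, for 100 vs 1000 sampled chromosomes
for n in (100, 1000):
    print(n, f"{prob_unobserved(0.01, n):.2g}")
```

This gives about 0.37 for 100 chromosomes (50 diploid people) but only about 4.3e-05 for 1000, so in small samples a real 4th haplotype is easily missed and D’ then appears to be 1 or -1.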
                                                                        Slide 16

D’ Interpretation

        b       B                         b          B

  a   0.06      0.14   p(a)=0.20    a   0.2           0        p(a)=0.20

  A   0.24      0.56   p(A)=0.80    A   0.1           0.7      p(A)=0.80

      p(b)=0.30 p(B)=0.70               p(b)=0.30 p(B)=0.70

D = 0 ; D’ = 0                     D = Dmax = 0.14 ; D’ = +1

  D’ = 1 (perfect LD using the D’ measure):
   - No recombination between markers
   - Only 3 haplotypes are seen
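A sketch of D’ (hypothetical function `d_prime`) that chooses Dmax by the sign of D, as on the previous slide, applied to the two tables above. A small tolerance treats floating-point noise as D = 0.

```python
def d_prime(hap, eps=1e-12):
    """Lewontin's D' = D / Dmax, with D defined by p(ab) = p(a)p(b) + D.

    hap maps (SNP1 allele, SNP2 allele) -> haplotype frequency.
    """
    pa = hap[("a", "b")] + hap[("a", "B")]
    pb = hap[("a", "b")] + hap[("A", "b")]
    pA, pB = 1 - pa, 1 - pb
    D = hap[("a", "b")] - pa * pb
    if abs(D) < eps:                 # treat rounding noise as D = 0
        return 0.0
    if D > 0:
        d_max = min(pA * pb, pa * pB)
    else:
        d_max = min(pA * pB, pa * pb)
    return D / d_max                 # negative when D < 0

left = {("a", "b"): 0.06, ("a", "B"): 0.14, ("A", "b"): 0.24, ("A", "B"): 0.56}
right = {("a", "b"): 0.2, ("a", "B"): 0.0, ("A", "b"): 0.1, ("A", "B"): 0.7}
print(round(d_prime(left), 3), round(d_prime(right), 3))  # 0.0 1.0
```

For the earlier table with p(ab) = 0.16 (D = 0.1, Dmax = 0.14) the same function gives D’ ≈ 0.714.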



                                                            Slide 17



Creation of LD

• Easiest to understand when markers are
  physically linked
• Creation of LD
  –   Mutation
  –   Founder effect
  –   Admixture
  –   Inbreeding / non-random mating
  –   Selection
  –   Population bottleneck or stratification
  –   Epistatic interaction
• LD can occur between unlinked markers
• Gametic phase disequilibrium is a more
  general term
                                                       Slide 18


SNP1 alone: alleles A, a                      n=2 haplotypes

SNP1 + SNP2 (new allele b arises on an A chromosome):
   SNP1     SNP2
     A        B
     A        b                               n=3 haplotypes
     a        B

After recombination:
   SNP1     SNP2
     A        B
     A        b                               n=4 haplotypes
     a        B
     a        b
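A toy sketch of the diagram above (the helper `crossover` is hypothetical): a single crossover between SNP1 and SNP2 in an individual carrying the Ab and aB haplotypes produces the missing ab haplotype, together with AB.

```python
def crossover(h1, h2, point):
    """Single crossover after position `point`: swap the tails of two
    haplotype strings and return the two recombinant haplotypes."""
    return h1[:point] + h2[point:], h2[:point] + h1[point:]

# Haplotypes after the new SNP2 allele appears (SNP1 allele, SNP2 allele)
population = {"AB", "Ab", "aB"}
print(len(population))  # 3

# Recombination between SNP1 and SNP2 in an Ab / aB individual
population |= set(crossover("Ab", "aB", point=1))
print(sorted(population), len(population))  # ['AB', 'Ab', 'aB', 'ab'] 4
```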
                                          Slide 19



Destruction of LD

• Main force is recombination
• Gene conversion may also act at short
  distances (~ 100-1,000 bases)
• LD decays over time (generations of
  interbreeding)




                                                           Slide 20

   SNP1       SNP2        Probability recombination
                          occurs = θ
                          Probability recombination
                          does not occur = 1-θ

Initial LD between SNP1 - SNP2: D0

After 1 generation, preservation of LD:
       $D_1 = D_0(1-\theta)$

After t generations:
       $D_t = D_0(1-\theta)^t$

NB: Overly simple model - does not account for allele
frequency drift over time
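A sketch of the decay model above (hypothetical helpers, with the same simplifications as the slide: no drift, selection or change in allele frequencies). It prints the fraction of the initial D left after 100 generations and the number of generations for D to halve, for a few recombination fractions.

```python
import math

def decay_D(D0, theta, t):
    """D remaining after t generations under D_t = D_0 (1 - theta)^t."""
    return D0 * (1 - theta) ** t

def generations_to_halve(theta):
    """Generations until D_t = D_0 / 2 under the same model."""
    return math.log(0.5) / math.log(1 - theta)

# Tightly linked, roughly 1 cM apart, and unlinked markers
for theta in (0.001, 0.01, 0.5):
    print(theta, round(decay_D(1.0, theta, 100), 3),
          round(generations_to_halve(theta), 1))
```

For θ = 0.01 about 37% of the initial D remains after 100 generations (halving takes about 69 generations), while for unlinked markers (θ = 0.5) D halves every generation.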

                                                         Slide 21




                          $D_t = D_0(1-\theta)^t$




                                                    Slide 22

$r^2$ LD Parameter (Hill & Robertson, 1968)

   $r^2 = \frac{D^2}{p(a)p(b)p(A)p(B)}$

 • Squared correlation coefficient; ranges from 0 to 1
 • Frequency dependent
 • Better LD measure for allele correlation
   between markers - predictive power of SNP1
   alleles for those at SNP2
 • Used extensively in disease gene or
   phenotype mapping through association
   testing
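A sketch of $r^2$ (hypothetical function `r_squared`) applied to the earlier haplotype table with D = 0.1. Because D enters squared, the arbitrary sign convention for D does not affect $r^2$.

```python
def r_squared(hap):
    """Hill & Robertson's r^2 = D^2 / (p(a)p(b)p(A)p(B)).

    hap maps (SNP1 allele, SNP2 allele) -> haplotype frequency.
    """
    pa = hap[("a", "b")] + hap[("a", "B")]
    pb = hap[("a", "b")] + hap[("A", "b")]
    pA, pB = 1 - pa, 1 - pb
    D = hap[("a", "b")] - pa * pb
    return D * D / (pa * pb * pA * pB)

example = {("a", "b"): 0.16, ("a", "B"): 0.04, ("A", "b"): 0.14, ("A", "B"): 0.66}
print(round(r_squared(example), 3))  # 0.298
```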

                                                                                 Slide 23

$r^2$ Interpretation

           b        B                               b               B

    a    0.06        0.14 p(a)=0.20         a     0.2               0     p(a)=0.20

    A    0.24        0.56 p(A)=0.80         A     0.1               0.7   p(A)=0.80

         p(b)=0.30 p(B)=0.70                      p(b)=0.30 p(B)=0.70

D = 0 ; D’ = 0                            D = Dmax = 0.14 ; D’ = +1

$r^2 = 0$                                 $r^2 = 0.14/0.24 = 0.58$
                                          (D = p(a)p(B), so $r^2 = \frac{p(a)p(B)}{p(A)p(b)}$)

$r^2 \neq 1$: correlation is not perfect, even though D’ = 1
$r^2 = 1$ would require D’ = 1 and p(a) = p(b) (e.g. both 0.3, as on the next slide)


                                                                Slide 24

$r^2$ Interpretation
                                      p(a) = 0.3   p(b) = 0.3
Only 2 haplotypes:


$r^2 = 1$: correlation is perfect
D’ = 1 (fewer than 4 haplotypes)
p(a) = p(b) (= 0.3 in this example)

   • $r^2 = 1$ when there is perfect correlation between
     markers and one genotype predicts the other exactly
       – Only 2 haplotypes present
   • D’ = 1 does not imply $r^2 = 1$
   • $r^2 = 1$ requires no recombination AND identical
     allele frequencies at the two markers
       – SNPs are of similar age
   • Corollary
      – Low $r^2$ values do not necessarily mean high recombination
      – They may instead reflect discrepant allele frequencies
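A sketch illustrating the corollary (hypothetical helpers): fix p(a) = 0.3 and remove the aB haplotype so that D’ = 1, then vary p(b). Only when p(b) = p(a) does the Ab haplotype also vanish, leaving 2 haplotypes and $r^2 = 1$.

```python
def complete_ld_table(pa, pb):
    """Haplotype table with the aB haplotype absent (so D' = 1).
    Requires pb >= pa so that p(Ab) = pb - pa is not negative."""
    if pb < pa:
        raise ValueError("need pb >= pa")
    return {("a", "b"): pa, ("a", "B"): 0.0,
            ("A", "b"): pb - pa, ("A", "B"): 1 - pb}

def r_squared(hap):
    """r^2 = D^2 / (p(a)p(b)p(A)p(B)) from a 2x2 haplotype table."""
    pa = hap[("a", "b")] + hap[("a", "B")]
    pb = hap[("a", "b")] + hap[("A", "b")]
    pA, pB = 1 - pa, 1 - pb
    D = hap[("a", "b")] - pa * pb
    return D * D / (pa * pb * pA * pB)

# p(a) fixed at 0.3: D' = 1 throughout, but r^2 drops as p(b) moves away from p(a)
for pb in (0.3, 0.5, 0.9):
    print(pb, round(r_squared(complete_ld_table(0.3, pb)), 3))
```

This prints $r^2$ values of about 1.0, 0.429 and 0.048, even though no recombinant haplotype is present in any of the three cases.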
                                                   Slide 25


Common Measures of Linkage Disequilibrium

   D’: ranges from -1 to 1; reflects recombination

   $r^2$: ranges from 0 to 1; reflects correlation

     Other LD measures exist but are less commonly used
                                  Slide 26




Visualizing LD metrics




                                                   Slide 27



[Figure: pairwise |D’| for all pairs of SNP1 to SNP6, shaded on a 0 to 1.0 scale]

 The sign of D’ is usually ignored in these plots
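A sketch of how the pairwise values in such a plot could be computed from phased haplotypes. The toy 0/1 haplotypes are invented for illustration, and `pair_ld` is a hypothetical helper, not Haploview's implementation. Following the slide, it reports |D’| and ignores the sign.

```python
from itertools import combinations

def pair_ld(haps, i, j):
    """Return (|D'|, r^2) for SNPs i and j from phased 0/1 haplotype strings.

    Allele "1" plays the role of a (SNP i) and b (SNP j); D is defined by
    p(ab) = p(a)p(b) + D. Both SNPs must be polymorphic in `haps`.
    """
    n = len(haps)
    pa = sum(h[i] == "1" for h in haps) / n
    pb = sum(h[j] == "1" for h in haps) / n
    pab = sum(h[i] == "1" and h[j] == "1" for h in haps) / n
    pA, pB = 1 - pa, 1 - pb
    D = pab - pa * pb
    d_max = min(pA * pb, pa * pB) if D > 0 else min(pA * pB, pa * pb)
    return abs(D) / d_max, D * D / (pa * pb * pA * pB)

# Toy phased haplotypes for three SNPs (invented for illustration)
haps = ["000", "000", "110", "110", "111", "011"]
for i, j in combinations(range(len(haps[0])), 2):
    dp, r2 = pair_ld(haps, i, j)
    print(f"SNP{i + 1}-SNP{j + 1}: |D'| = {dp:.2f}, r2 = {r2:.2f}")
```

For these toy haplotypes, SNP1-SNP2 and SNP2-SNP3 both have |D’| = 1 but $r^2$ of 0.50 and 0.25, while SNP1 and SNP3 are in equilibrium; a plot would shade each pair's cell by one of these values.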
                  Slide 28




                                         Slide 29



Haploview: TCN2 ($r^2$)




                                              Slide 30




            http://www.hapmap.org

Launched October 2002


                                                                 Slide 31


International HapMap
Project
• Initiated Oct 2002
• Collaboration of scientists worldwide
• Goal: describe common patterns of human
  DNA sequence variation
• Identify LD and haplotype distributions
• Populations of different ancestry (European,
  African, Asian)
   – Identify common haplotypes and population-specific
     differences
• Has had major impact on:
   – Understanding of human population history as reflected in
     genetic diversity and similarity
   – Design and analysis of genetic association studies

                                                       Slide 32



HapMap samples

• 90 Yoruba individuals (30 parent-parent-offspring
  trios) from Ibadan, Nigeria (YRI)
• 90 individuals (30 trios) of European descent from
  Utah (CEU)
• 45 Han Chinese individuals from Beijing (CHB)
• 44 Japanese individuals from Tokyo (JPT)




                                                          Slide 33


Project feasible
because of:
• The availability of the human genome sequence
• Databases of common SNPs (subsequently enriched by
  HapMap) from which genotyping assays could be
  designed
• Development of inexpensive, accurate technologies for
  high-throughput SNP genotyping
• Web-based tools for storing and sharing data
• Frameworks to address associated ethical and cultural
  issues




                                                        Slide 34



HapMap goals

• Define patterns of genetic variation across the human
  genome
• Guide selection of SNPs efficiently to “tag” common
  variants
• Public release of all data (assays, genotypes)
• Phase I: 1.3 M markers in 269 people
   1 SNP/5kb
   Minor allele frequency (MAF) >5%
• Phase II: +2.8 M markers in 270 people




                                           Slide 35




http://www.hapmap.org/
                  Slide 36




                  Slide 37




                                                                      Slide 38



HapMap publications

•   The International HapMap Consortium. A Haplotype Map of the
    Human Genome.
    Nature 437, 1299-1320. 2005.

•   The International HapMap Consortium. The International
    HapMap Project.
    Nature 426, 789-796. 2003.

•   The International HapMap Consortium. Integrating Ethics and
    Science in the International HapMap Project.
    Nature Reviews Genetics 5, 467-475. 2004.

•   Thorisson, G.A., Smith, A.V., Krishnan, L., and Stein, L.D. The
    International HapMap Project Web site.
    Genome Research 15, 1591-1593. 2005.




                                                Slide 39



ENCODE project

• Aim: To compare the genome-wide resource
  to a more complete database of common
  variation—one in which all common SNPs
  and many rarer ones have been discovered
  and tested
• Selected a representative collection of ten
  regions, each 500 kb in length
• Each 500-kb region was sequenced in 48
  individuals, and all SNPs in these regions
  (discovered or in dbSNP) were genotyped in
  the complete set of 269 DNA samples


                            Slide 40

      Comparison of linkage
      disequilibrium and
      recombination for two ENCODE
      regions




       Nature 437, 1299-1320. 2005




                                Slide 41




LD in Human Populations




                                                  Slide 42



Haplotype Blocks

N SNPs: $2^N$ haplotypes possible, i.e. very large
diversity possible
But: we do not see the full extent of haplotype
diversity in human populations
      Extensive LD, especially at short distances
            (e.g. ~20 kb)
Haplotypes are broken into blocks of markers with
high mutual LD separated by recombination hotspots
Non-uniform LD across the genome


                                                                  Slide 43



   Haplotype Blocks




  Haplotype blocks: at least 80% of observed haplotypes
  with frequency >= 5% could be grouped into common
  patterns



Whole Genome Patterns of Common DNA Variation
in Three Human Populations, Science 2005, Hinds et al.
                                            Slide 44



Length of LD spans

[Figure: LD span lengths measured by $r^2$]




                                                              Slide 45

Example: Large block of LD on chromosome 17
Cluster of common SNPs in high LD
518 SNPs, spanning 800 kb
25% in EUR, 9% in AFR, missing in CHN
Genes:
  Microtubule-associated protein tau
  Mutations associated with a variety of
  neurodegenerative disorders
  Gene coding for a protease similar to
  presenilins
  Mutations result in Alzheimer’s disease
  Gene for corticotropin-releasing hormone
  receptor
     • Immune, endocrine, autonomic, behavioral response to
       stress
                                          Slide 46



Chromosome 17 LD Region




             Prevalent inversion in EUR
             human population
             ~25%



