Lecture 9: Linkage Analysis II


           Date: 9/24/02
• Unknown linkage phase
• Mixture of linkage phase
• Mixture of self and random mating
Unknown Linkage Phase for Backcross

Phase            Cross            Gametes
coupling         AB/ab x AB/AB    AB, ab and Ab, aB
repulsion        Ab/aB x AB/AB    AB, ab and Ab, aB
no information   Ab/ab x aB/ab    Ab, ab (?) and aB, ab (?)
Unknown Linkage Phase F2

Phase                 Cross            Gametes
coupling-coupling     AB/ab x AB/ab    AB, ab, aB, Ab
repulsion-repulsion   Ab/aB x Ab/aB    AB, ab, aB, Ab
coupling-repulsion    AB/ab x Ab/aB    AB, ab, aB, Ab (dealt with later)
Determining Linkage Phase: F2-CD
Goal: Calculate the likelihood for an F2 with one codominant and one
dominant locus. Show that the coupling and repulsion likelihoods
are mirror images of each other about q = 0.5.

1. Determine the possible gametes and their probabilities.
Assume coupling of A and B in both parents.
     Gamete        AB         Ab         aB         ab
     Probability   (1-q)/2    q/2        q/2        (1-q)/2
2. Determine the observable genotypes and their probabilities.
     Genotype      AAB-        AAbb     AaB-           Aabb        aaB-        aabb
     Probability   (1-q^2)/4   q^2/4    (1-q+q^2)/2    q(1-q)/2    q(2-q)/4    (1-q)^2/4
3. Write an expression for the likelihood, then the log likelihood,
   where f1, ..., f6 are the observed counts of the six genotype
   classes in the order listed in step 2.

\[
L_C(q) = \left[\frac{1-q^2}{4}\right]^{f_1}
         \left[\frac{q^2}{4}\right]^{f_2}
         \left[\frac{1-q+q^2}{2}\right]^{f_3}
         \left[\frac{q(1-q)}{2}\right]^{f_4}
         \left[\frac{q(2-q)}{4}\right]^{f_5}
         \left[\frac{(1-q)^2}{4}\right]^{f_6}
\]

\[
l_C(q) = f_1\log(1-q^2) + 2f_2\log q + f_3\log(1-q+q^2)
       + f_4\log[q(1-q)] + f_5\log[q(2-q)] + 2f_6\log(1-q) + \text{const}
\]

4. Repeat the whole process, now assuming repulsion phase,
   and obtain an expression for lR(q).

5. Confirm lC(q) = lR(1-q).
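
The five steps above can be checked numerically. The following is a minimal sketch, not the lecture's own code: it builds the six observable F2-CD classes from the gamete table for either phase and compares lC(q) with lR(1-q). The function names and the counts are hypothetical, chosen only for illustration.

```python
import math

def gametes(q, phase):
    """Gamete probabilities from a double heterozygote with recombination fraction q."""
    parental, recombinant = (1 - q) / 2, q / 2
    if phase == "coupling":   # parent is AB/ab
        return {"AB": parental, "ab": parental, "Ab": recombinant, "aB": recombinant}
    return {"Ab": parental, "aB": parental, "AB": recombinant, "ab": recombinant}  # Ab/aB

def f2_cd_probs(q, phase):
    """Probabilities of the six observable classes (A codominant, B dominant)."""
    g = gametes(q, phase)
    probs = {}
    for g1, p1 in g.items():
        for g2, p2 in g.items():
            a_locus = "".join(sorted(g1[0] + g2[0]))           # 'AA', 'Aa' or 'aa'
            b_pheno = "B-" if "B" in (g1[1], g2[1]) else "bb"  # B masks b
            key = a_locus + b_pheno
            probs[key] = probs.get(key, 0.0) + p1 * p2
    return probs

def loglik(counts, q, phase):
    """Log likelihood, including the constant terms, for observed class counts."""
    probs = f2_cd_probs(q, phase)
    return sum(n * math.log(probs[k]) for k, n in counts.items())

# Hypothetical counts for the six classes.
counts = {"AAB-": 30, "AAbb": 1, "AaB-": 52, "Aabb": 6, "aaB-": 8, "aabb": 23}
for q in (0.1, 0.25, 0.4):
    print(q, loglik(counts, q, "coupling"), loglik(counts, 1 - q, "repulsion"))
```

Each printed pair should agree, because replacing q by 1-q in the coupling gamete table gives the repulsion gamete table.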
Symmetry Around 0.5

[Figure: log likelihood (y-axis, 0 to -8000) against recombinant fraction (x-axis, 0.01 to 0.91) for the coupling-phase and repulsion-phase models.]
An Ad Hoc Linkage Phase Determination Method I
• When the likelihood surfaces for the coupling and
  repulsion phases are symmetric about 0.5 (backcross,
  and F2 with one codominant marker), a single
  test is sufficient.
• Calculate the G statistic under the coupling
  assumption (use lC(q)).
      If it is significant and the estimate of q is < 0.5, the linkage is coupling.
      If it is significant and the estimate of q is > 0.5, the linkage is
       repulsion.
      If it is not significant, no determination can be made.
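
As an illustration of Method I, here is a minimal sketch for a backcross, where the coupling-phase log likelihood is n_rec log q + (n_total - n_rec) log(1-q). The function names are hypothetical, and the 5% chi-squared critical value with one degree of freedom (3.841) is an assumed threshold, not one given in the lecture.

```python
import math

CHI2_1DF_5PCT = 3.841  # assumed significance threshold (chi-squared, 1 df, 5%)

def xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def backcross_loglik(n_rec, n_total, q):
    """Coupling-phase log likelihood: n_rec apparent recombinants out of n_total."""
    return xlogy(n_rec, q) + xlogy(n_total - n_rec, 1 - q)

def phase_by_method_one(n_rec, n_total, critical=CHI2_1DF_5PCT):
    """Return (decision, q_hat, G) using a single test under the coupling assumption."""
    q_hat = n_rec / n_total  # MLE of q under the coupling assumption
    g = 2 * (backcross_loglik(n_rec, n_total, q_hat)
             - backcross_loglik(n_rec, n_total, 0.5))
    if g <= critical or q_hat == 0.5:
        return "undetermined", q_hat, g
    return ("coupling" if q_hat < 0.5 else "repulsion"), q_hat, g

print(phase_by_method_one(10, 100))   # few apparent recombinants
print(phase_by_method_one(90, 100))   # mostly apparent recombinants
print(phase_by_method_one(50, 100))   # no evidence of linkage
```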
An Ad Hoc Linkage Phase Determination Method II
• When the likelihood surface is not symmetric
  (e.g. F2 with dominant markers):
• Calculate GC under the coupling model and GR under
  the repulsion model.
• If either is significant and
     GC > GR, then linkage is coupling.
     GR > GC, then linkage is repulsion.
• Otherwise, no determination can be made.
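
Method II can be sketched for an F2 with two dominant markers. This is an illustration under stated assumptions rather than the lecture's procedure: each G statistic is maximized by grid search over q in (0, 0.5], significance is judged against an assumed critical value of 3.841, and the function names are hypothetical. The class probabilities follow from the double recessive aabb requiring two ab gametes.

```python
import math

def f2_dd_probs(q, phase):
    """Phenotype class probabilities for an F2 with two dominant markers."""
    p_ab = (1 - q) / 2 if phase == "coupling" else q / 2   # probability of an ab gamete
    aabb = p_ab ** 2
    return {"A-B-": 0.5 + aabb, "A-bb": 0.25 - aabb, "aaB-": 0.25 - aabb, "aabb": aabb}

def g_statistic(counts, phase, steps=500):
    """G = 2[l(q_hat) - l(0.5)] with q_hat found on a grid over (0, 0.5]."""
    def loglik(q):
        p = f2_dd_probs(q, phase)
        return sum(n * math.log(p[k]) for k, n in counts.items())
    grid = [0.5 * i / steps for i in range(1, steps + 1)]
    return 2 * (max(loglik(q) for q in grid) - loglik(0.5))

def phase_by_method_two(counts, critical=3.841):
    gc = g_statistic(counts, "coupling")
    gr = g_statistic(counts, "repulsion")
    if max(gc, gr) > critical and gc != gr:
        return "coupling" if gc > gr else "repulsion"
    return "undetermined"

# Hypothetical counts with an excess of the double recessive class.
print(phase_by_method_two({"A-B-": 141, "A-bb": 9, "aaB-": 10, "aabb": 40}))
```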
Statistical Phase Determination: Error
• There is a high chance of making an error when
  linkage is loose.
• When q < 0.3, the chance of error is small
  even with sample sizes of ~20, except for F2-DD.
• An F2-DD cross needs a sample size > 100 to keep
  the error rate down.
• The sample size needed decreases as linkage becomes
  tighter.
Once Linkage Phase Is Determined
• Once linkage phase has been determined, the
  analysis continues as before.
• Assume linkage phase is now known and do
  a phase-known analysis.
Phase-Unknown Gametes

Parents:                      AaBb (father) x aabb
Gametes produced by father:   AB, ab   or   Ab, aB
Offspring:                    AaBb, aabb, Aabb, aaBb

• There are multiple reasons why you may not know phase.
• One reason is that grandparents are unavailable.
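Likelihood for Phase-Unknown Gametes
Let X be the count of AB and ab gametes.
Let Y be the count of Ab and aB gametes.

\[
\begin{aligned}
L(q) &= P(\text{data} \mid q)
      = P(\text{data}, \text{coupled} \mid q) + P(\text{data}, \text{repulsion} \mid q) \\
     &= P(\text{data} \mid \text{coupled}, q)\,P(\text{coupled})
      + P(\text{data} \mid \text{repulsion}, q)\,P(\text{repulsion}) \\
     &= \tfrac{1}{2}\,q^{Y}(1-q)^{X} + \tfrac{1}{2}\,q^{X}(1-q)^{Y}
\end{aligned}
\]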
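
A minimal sketch of this likelihood, assuming both phases are equally likely a priori (the 1/2 weights). The function name and the counts in the usage lines are hypothetical.

```python
def phase_unknown_likelihood(x, y, q):
    """L(q) for x gametes of type AB or ab and y gametes of type Ab or aB."""
    coupled = q ** y * (1 - q) ** x      # AB and ab are parental under coupling
    repulsion = q ** x * (1 - q) ** y    # Ab and aB are parental under repulsion
    return 0.5 * coupled + 0.5 * repulsion

# Hypothetical family: 8 gametes of type AB/ab and 2 of type Ab/aB.
print(phase_unknown_likelihood(8, 2, 0.2))
print(phase_unknown_likelihood(8, 2, 0.8))   # same value: L(q) = L(1-q)
print(phase_unknown_likelihood(8, 2, 0.5))   # value under no linkage, 0.5 ** 10
```

Because the two phase terms swap when q is replaced by 1-q, the likelihood is symmetric about 0.5, so the data alone cannot separate q from 1-q.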
Distribution of the Log Likelihood Ratio Test Statistic
• Unfortunately, the test statistic
                   G = 2(ln L1 - ln L0)
  does not follow a standard distribution under the null hypothesis of
  no linkage.
• Numerical approximation of the distribution is
  required.
• On the other hand, there is usually insufficient data
  in one family to get a significant test statistic.
Distribution When There Are Multiple Families

\[
L(q) = \ln\left[\tfrac{1}{2}\,q^{Y}(1-q)^{X} + \tfrac{1}{2}\,q^{X}(1-q)^{Y}\right]
\]

• The distribution of G approaches a 50:50 mixture of
  a point mass at 0 and a chi-squared
  distribution with one degree of freedom. In other
  words, we can simply perform a one-tailed chi-squared
  test for linkage when large numbers of
  families are included in the study.
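
The multi-family test can be sketched as follows. This is an illustration under assumptions, not the lecture's code: families are treated as independent so their log likelihoods add, the maximum is found by grid search over q in (0, 0.5], and the family counts are hypothetical. The chi-squared tail with one degree of freedom is computed as erfc(sqrt(G/2)).

```python
import math

def family_loglik(x, y, q):
    """ln of the phase-unknown likelihood for one family (x: AB + ab, y: Ab + aB)."""
    return math.log(0.5 * q ** y * (1 - q) ** x + 0.5 * q ** x * (1 - q) ** y)

def g_statistic(families, steps=500):
    """G = 2[ln L(q_hat) - ln L(0.5)], summing log likelihoods over families."""
    def total(q):
        return sum(family_loglik(x, y, q) for x, y in families)
    grid = [0.5 * i / steps for i in range(1, steps + 1)]
    return 2 * (max(total(q) for q in grid) - total(0.5))

def mixture_p_value(g):
    """P-value under a 50:50 mixture of a point mass at 0 and chi-squared(1)."""
    if g <= 0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(g / 2))

# Hypothetical (X, Y) counts for three families of unknown phase.
families = [(8, 2), (1, 9), (7, 3)]
g = g_statistic(families)
print(g, mixture_p_value(g))
```

Halving the chi-squared tail probability is what makes the test one-tailed: a G of about 2.71, rather than 3.84, corresponds to the 5% level.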
General Analysis with Missing Information: Step 1

Offspring:   AaBb, aabb, Aabb, aaBb

1. Identify all possible mating types that could produce these
   offspring and their expected frequencies. (Retain phase
   information.)
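All Possible Mating Types
 Mating Type      Expected Frequency
 AB/ab x AB/ab    (2 p1 p2 q1 q2)^2
 AB/ab x Ab/aB    2 (2 p1 p2 q1 q2)^2
 Ab/aB x Ab/aB    (2 p1 p2 q1 q2)^2
 AB/ab x Ab/ab    2 (2 p1 p2 q1 q2)(2 p1 p2 q2 q2)
 Ab/aB x Ab/ab    2 (2 p1 p2 q1 q2)(2 p1 p2 q2 q2)
 AB/ab x aB/ab    2 (2 p1 p2 q1 q2)(2 p2 p2 q1 q2)
 Ab/aB x aB/ab    2 (2 p1 p2 q1 q2)(2 p2 p2 q1 q2)
 AB/ab x ab/ab    2 (2 p1 p2 q1 q2)(p2 p2 q2 q2)
 Ab/aB x ab/ab    2 (2 p1 p2 q1 q2)(p2 p2 q2 q2)
 Ab/ab x aB/ab    2 (2 p1 p2 q2 q2)(2 p2 p2 q1 q2)
Here p1 and p2 are the frequencies of alleles A and a, and q1 and q2 are the
frequencies of alleles B and b (allele frequencies, not the recombination
fraction q).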
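
The frequencies in this table can be reproduced with a short sketch. It assumes linkage equilibrium and random mating in the parental population, which is what the product form of the table entries implies; the function names and the allele frequencies in the usage lines are hypothetical.

```python
def haplotype_freqs(p1, q1):
    """Haplotype frequencies under linkage equilibrium (p1: freq of A, q1: freq of B)."""
    p2, q2 = 1 - p1, 1 - q1
    return {"AB": p1 * q1, "Ab": p1 * q2, "aB": p2 * q1, "ab": p2 * q2}

def phased_genotype_freq(genotype, hap):
    """Frequency of a phased genotype such as 'AB/ab'."""
    h1, h2 = genotype.split("/")
    return hap[h1] * hap[h2] * (1 if h1 == h2 else 2)

def mating_type_freq(g1, g2, hap):
    """Frequency of the unordered mating g1 x g2."""
    f = phased_genotype_freq(g1, hap) * phased_genotype_freq(g2, hap)
    return f if g1 == g2 else 2 * f

MATING_TYPES = [
    ("AB/ab", "AB/ab"), ("AB/ab", "Ab/aB"), ("Ab/aB", "Ab/aB"),
    ("AB/ab", "Ab/ab"), ("Ab/aB", "Ab/ab"), ("AB/ab", "aB/ab"),
    ("Ab/aB", "aB/ab"), ("AB/ab", "ab/ab"), ("Ab/aB", "ab/ab"),
    ("Ab/ab", "aB/ab"),
]

hap = haplotype_freqs(p1=0.6, q1=0.3)   # hypothetical allele frequencies
for g1, g2 in MATING_TYPES:
    print(f"{g1} x {g2}: {mating_type_freq(g1, g2, hap):.6f}")
```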
General Analysis with Missing Information: Step 2

2. Conditional on parental mating type, calculate the
   probability of each offspring genotype.

Probability of Offspring Conditional on Mating Type

e.g. AB/ab x Ab/aB. Columns are gametes from the AB/ab parent, rows are
gametes from the Ab/aB parent.

                 AB             Ab             aB             ab
                 (1-q)/2        q/2            q/2            (1-q)/2
 AB  q/2         0.25q(1-q)     0.25q^2        0.25q^2        0.25q(1-q)
 Ab  (1-q)/2     0.25(1-q)^2    0.25q(1-q)     0.25q(1-q)     0.25(1-q)^2
 aB  (1-q)/2     0.25(1-q)^2    0.25q(1-q)     0.25q(1-q)     0.25(1-q)^2
 ab  q/2         0.25q(1-q)     0.25q^2        0.25q^2        0.25q(1-q)
General Analysis with Missing Information: Step 3

\[
P(AaBb \mid AB/ab \times Ab/aB) = 4 \times 0.25\,q(1-q) = q(1-q)
\]

3. Calculate the unconditional probability of each
   offspring genotype.

\[
P(AaBb) = \sum_{\text{mating types}} P(AaBb \mid \text{mating type})\,P(\text{mating type})
\]
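
Steps 2 and 3 can be sketched by enumerating the gamete table for a given mating type. This is a minimal illustration with hypothetical function names; it handles any phased two-locus parent written like 'AB/ab'.

```python
from collections import defaultdict

def gamete_probs(parent, q):
    """Gamete probabilities for a phased parent such as 'AB/ab'."""
    h1, h2 = parent.split("/")
    r1, r2 = h1[0] + h2[1], h2[0] + h1[1]   # gametes formed by recombination
    probs = defaultdict(float)
    for gamete, p in ((h1, (1 - q) / 2), (h2, (1 - q) / 2), (r1, q / 2), (r2, q / 2)):
        probs[gamete] += p                   # identical gametes pool their probability
    return dict(probs)

def offspring_probs(parent1, parent2, q):
    """P(offspring genotype | mating type), phase of the offspring ignored."""
    out = defaultdict(float)
    for g1, p1 in gamete_probs(parent1, q).items():
        for g2, p2 in gamete_probs(parent2, q).items():
            genotype = "".join(sorted(g1[0] + g2[0])) + "".join(sorted(g1[1] + g2[1]))
            out[genotype] += p1 * p2
    return dict(out)

q = 0.2
probs = offspring_probs("AB/ab", "Ab/aB", q)
print(probs["AaBb"], q * (1 - q))   # both give P(AaBb | AB/ab x Ab/aB)
```

The unconditional probability in step 3 is then the sum of such conditional probabilities, each weighted by the expected frequency of its mating type.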
General Analysis with Missing Information: Step 4

4. Sum the log-likelihood contributions over all possible
   offspring genotypes.

\[
l(q) = \sum_{j \,\in\, \text{offspring genotypes}} f_j \log P_j
\]
General Analysis with Missing Information: Step 5

5. The log-likelihood ratio statistic is asymptotically a
   50:50 mixture of a point mass at 0 and a chi-squared
   distribution with one degree of freedom.

\[
G = 2\left[\ln L_1 - \ln L_0\right]
\]
Mixture of Linkage Phase

• A mixture of linkage phase results when the
  two parents have different phases. Consider
  the F2 with coupling-repulsion parents.
               AB/ab x Ab/aB
Mixture of Linkage Phase: Expected Genotype Frequency
Genotype   Count   Expected Frequency     Pi(R|G)
 AABB       f1     0.25q(1-q)             0.5
 AABb       f2     0.25[(1-q)^2+q^2]      q^2/[(1-q)^2+q^2]
 AAbb       f3     0.25q(1-q)             0.5
 AaBB       f4     0.25[(1-q)^2+q^2]      q^2/[(1-q)^2+q^2]
 AaBb       f5     q(1-q)                 0.5
 Aabb       f6     0.25[(1-q)^2+q^2]      q^2/[(1-q)^2+q^2]
 aaBB       f7     0.25q(1-q)             0.5
 aaBb       f8     0.25[(1-q)^2+q^2]      q^2/[(1-q)^2+q^2]
 aabb       f9     0.25q(1-q)             0.5
The genotypes AABb, AaBB, Aabb and aaBb arise either from two nonrecombinant
gametes, with probability 0.25(1-q)^2, or from two recombinant gametes, with
probability 0.25q^2, which is why Pi(R|G) for them is q^2/[(1-q)^2+q^2].

Mixture of Linkage Phase: Log Likelihood

\[
L(q) = (f_1 + f_3 + f_5 + f_7 + f_9)\,\log[q(1-q)]
     + (f_2 + f_4 + f_6 + f_8)\,\log\left[(1-q)^2 + q^2\right] + \text{const}
\]

Analytic MLE available:

\[
\hat{q}\,(1-\hat{q}) = \frac{f_1 + f_3 + f_5 + f_7 + f_9}{2N},
\qquad\text{so for tight linkage}\qquad
\hat{q} \approx \frac{f_1 + f_3 + f_5 + f_7 + f_9}{2N}
\]
Mixture of Self and Random Mating (MSR)
• Controlled crosses are not always available.
• Frequently, progeny from open-pollinated
  populations are available instead. These lead to MSR.
• Assume loci A and B are linked in coupling phase
  with recombination fraction q.
• Assume alleles a and A at locus A, and b and B at locus B.
• Assume u and v are the frequencies of A and B in
  the pollen pool (e.g. the frequency of a is 1-u).
• Assume linkage equilibrium in the pollen.
MSR - Expected Frequencies for Codominant Alleles
Genotype   Count   Expected Frequencies
                   Outcross                        Self
 AABB       f1     0.5uv(1-q)                      0.25(1-q)^2
 AABb       f2     0.5u[(1-v)(1-q)+vq]             0.5q(1-q)
 AAbb       f3     0.5u(1-v)q                      0.25q^2
 AaBB       f4     0.5v[(1-u)(1-q)+uq]             0.5q(1-q)
 AaBb       f5     0.5[(1-q)-(1-2q)(u+v-2uv)]      0.5[(1-q)^2+q^2]
 Aabb       f6     0.5(1-v)(u-2uq+q)               0.5q(1-q)
 aaBB       f7     0.5(1-u)vq                      0.25q^2
 aaBb       f8     0.5(1-u)(v-2vq+q)               0.5q(1-q)
 aabb       f9     0.5(1-u)(1-v)(1-q)              0.25(1-q)^2
MSR - Log Likelihood Function

\[
L(q) = \sum_{i=1}^{9} f_i \log\left[t\,p_{oi} + (1-t)\,p_{si}\right]
\]

• t is the probability of outcrossing (vs. selfing).
• p_oi is the expected frequency of type i progeny from outcrossing.
• p_si is the expected frequency of type i progeny from selfing.
• q enters through the expected frequencies given in the previous
    table.
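
The MSR log likelihood can be sketched directly from the expected-frequency table. This is an illustration with hypothetical function names and parameter values. One assumption to note: the selfed AaBb class is taken as 0.5[(1-q)^2 + q^2], so that the nine selfed frequencies sum to one.

```python
import math

def msr_frequencies(q, u, v):
    """Return (outcross, selfed) expected frequencies for genotypes f1..f9."""
    outcross = [
        0.5 * u * v * (1 - q),                                 # AABB
        0.5 * u * ((1 - v) * (1 - q) + v * q),                 # AABb
        0.5 * u * (1 - v) * q,                                 # AAbb
        0.5 * v * ((1 - u) * (1 - q) + u * q),                 # AaBB
        0.5 * ((1 - q) - (1 - 2 * q) * (u + v - 2 * u * v)),   # AaBb
        0.5 * (1 - v) * (u - 2 * u * q + q),                   # Aabb
        0.5 * (1 - u) * v * q,                                 # aaBB
        0.5 * (1 - u) * (v - 2 * v * q + q),                   # aaBb
        0.5 * (1 - u) * (1 - v) * (1 - q),                     # aabb
    ]
    selfed = [
        0.25 * (1 - q) ** 2, 0.5 * q * (1 - q), 0.25 * q ** 2,
        0.5 * q * (1 - q), 0.5 * ((1 - q) ** 2 + q ** 2), 0.5 * q * (1 - q),
        0.25 * q ** 2, 0.5 * q * (1 - q), 0.25 * (1 - q) ** 2,
    ]
    return outcross, selfed

def msr_loglik(counts, q, u, v, t):
    """Sum of f_i * log[t * p_oi + (1 - t) * p_si] over the nine genotypes."""
    outcross, selfed = msr_frequencies(q, u, v)
    return sum(f * math.log(t * po + (1 - t) * ps)
               for f, po, ps in zip(counts, outcross, selfed))

# Hypothetical check: use expected counts at q = 0.1 as the data.
u, v, t = 0.3, 0.6, 0.7
po, ps = msr_frequencies(0.1, u, v)
counts = [1000 * (t * a + (1 - t) * b) for a, b in zip(po, ps)]
print(msr_loglik(counts, 0.1, u, v, t), msr_loglik(counts, 0.2, u, v, t))
```

With expected counts as data, the log likelihood is highest at the q used to generate them, which gives a quick sanity check on the table entries.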
Estimating Allelic Frequencies in Pollen Pool (u and v)
• Use a single locus, say A.
• Consider heterozygous maternal plants (Aa).
• Write an expression for the log-likelihood in
  the MSR population.
• Condition on the outcrossing rate t.
• Solve analytically for the MLE of u.
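
One way to carry out these steps, offered as a sketch rather than the lecture's own derivation: for an Aa mother, the progeny probabilities are P(AA) = 0.5tu + 0.25(1-t), P(Aa) = 0.5 and P(aa) = 0.5t(1-u) + 0.25(1-t). Only the AA and aa counts carry information about u, and setting P(AA)/[P(AA)+P(aa)] equal to the observed proportion gives the estimate below. The function name and the counts are hypothetical.

```python
def u_mle(n_AA, n_aa, t):
    """MLE of the pollen-pool frequency of A from progeny of Aa mothers, given t > 0."""
    r = n_AA / (n_AA + n_aa)            # observed share of AA among homozygous progeny
    u = (r - 0.5 * (1 - t)) / t         # solves t*u + 0.5*(1 - t) = r
    return min(1.0, max(0.0, u))        # keep the estimate inside [0, 1]

print(u_mle(190, 310, t=0.6))   # counts chosen to match u = 0.3 at t = 0.6
print(u_mle(30, 10, t=1.0))     # complete outcrossing: u is simply n_AA / (n_AA + n_aa)
```

Since P(Aa) = 0.5 whatever the values of t and u, heterozygous progeny contribute nothing to this estimate.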
Estimating the Outcrossing Rate t
• The prior analysis conditioned on the
  outcrossing rate t.
• Unfortunately, an Aa heterozygous mother is
  necessary to determine linkage but is least
  informative for t.
MSR - Estimating Recombination Fraction q I
• EM: Calculate the conditional probabilities
    of recombination given the genotype.

\[
q_{n+1} = \frac{1}{N}\sum_{i=1}^{9} f_i\left[\left(p_{oi}^{Ab} + p_{oi}^{aB}\right)t
        + \left(p_{si}^{Ab} + p_{si}^{aB}\right)(1-t)\right]
\]

• NR: Calculate the score and information.

\[
S(q) = \sum_{i=1}^{9} f_i\,\frac{d\log\left[(1-t)\,p_{si} + t\,p_{oi}\right]}{dq},
\qquad
I(q) = -E\left[\sum_{i=1}^{9} f_i\,\frac{d^2\log\left[(1-t)\,p_{si} + t\,p_{oi}\right]}{dq^2}\right]
\]
MSR - Estimating q, u, and v (EM)
• Pick initial estimates (u0, v0, q0).
• Calculate expected gametic frequencies in
  selfed and outcrossed populations conditional
  on current estimates and observed genotype
  frequencies. t f_i p_oi^g
• Calculate the MLE for (u1, v1, q1).
• Iterate.
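MSR - Estimating q, u, and v (NR)

\[
S(q,u,v) = \begin{pmatrix}
\dfrac{\partial L}{\partial u} \\[6pt]
\dfrac{\partial L}{\partial v} \\[6pt]
\dfrac{\partial L}{\partial q}
\end{pmatrix},
\qquad
I(q,u,v) = -\begin{pmatrix}
\dfrac{\partial^2 L}{\partial u^2} & \dfrac{\partial^2 L}{\partial u\,\partial v} & \dfrac{\partial^2 L}{\partial u\,\partial q} \\[6pt]
\dfrac{\partial^2 L}{\partial u\,\partial v} & \dfrac{\partial^2 L}{\partial v^2} & \dfrac{\partial^2 L}{\partial q\,\partial v} \\[6pt]
\dfrac{\partial^2 L}{\partial u\,\partial q} & \dfrac{\partial^2 L}{\partial q\,\partial v} & \dfrac{\partial^2 L}{\partial q^2}
\end{pmatrix}
\]

\[
\begin{pmatrix} u_{n+1} \\ v_{n+1} \\ q_{n+1} \end{pmatrix}
= \begin{pmatrix} u_n \\ v_n \\ q_n \end{pmatrix} + \frac{1}{N}\,I^{-1} S
\]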
MSR - Linkage Information

• Linkage information content is sensitive to allele
  frequencies when outcrossing is high.
• Linkage information content decreases rapidly as
  the allelic frequencies approach 0.5.
• When linkage is tight, MSR provides less
  information relative to F2 than when linkage is
  loose, but tight linkage is always more informative
  than loose linkage.
MSR - Bias and Variance

• Bias and mean square error are higher for dominant
  markers than for codominant markers.
• Bias and mean square error are acceptable for
  q < 0.2 only when the dominant allele frequency is less
  than the true q.
• When the dominant allele frequency is > 0.5, there is a high
  negative bias on q.
• Allele frequency cannot be accurately estimated
  when the true frequency is < 0.1 or > 0.5 and outcrossing
  is low.
Summary

• Unknown linkage phase
     Reducing the problem to a phase-known problem
     Likelihood when phase unknown
• Likelihood for general pedigree with missing
  information
• Likelihood for mixture of linkage phase
• Mixture of self and random mating (MSR)

						