# Lecture 9: Linkage Analysis II


```

Date: 9/24/02
 Mixture of self and random mating
Backcross
[Diagram: backcross parental configurations labeled coupling (AB/ab),
repulsion (Ab/aB), and no information (Ab/ab x aB/ab).]

[Diagram: matings between double heterozygotes labeled coupling-coupling
(AB/ab x AB/ab), repulsion-repulsion (Ab/aB x Ab/aB), and
coupling-repulsion (AB/ab x Ab/aB). The coupling-repulsion case is
dealt with later.]
F2-CD
Goal: Calculate the likelihood for an F2 with one codominant and one
dominant locus. Show that the coupling and repulsion likelihoods
are symmetric about 0.5.

1. Determine the possible gametes and their probabilities.
Assume coupling of A and B in both parents.

Gamete        AB         Ab      aB      ab
Probability   (1-q)/2    q/2     q/2     (1-q)/2

2. Determine the observable genotypes and their probabilities.

Genotype      AAB-        AAbb    AaB-          Aabb       aaB-       aabb
Probability   (1-q^2)/4   q^2/4   (1-q+q^2)/2   q(1-q)/2   q(2-q)/4   (1-q)^2/4

3. Write an expression for the likelihood, then log likelihood.
Here f_1, ..., f_6 are the observed counts of the six genotype
classes, in the order AAB-, AAbb, AaB-, Aabb, aaB-, aabb.

L_C(q) = [(1-q^2)/4]^{f_1} [q^2/4]^{f_2} [(1-q+q^2)/2]^{f_3}
         [q(1-q)/2]^{f_4} [q(2-q)/4]^{f_5} [(1-q)^2/4]^{f_6}

l_C(q) = f_1 \log(1-q^2) + 2 f_2 \log q + f_3 \log(1-q+q^2)
         + f_4 \log[q(1-q)] + f_5 \log[q(2-q)] + 2 f_6 \log(1-q)
         + constant
4. Repeat the whole process now assuming repulsion phase
and obtain expression for lR(q).

5. Confirm lC(q)=lR(1-q).
Symmetry Around 0.5
[Figure: log likelihood (y-axis, from 0 down to -8000) against
recombinant fraction (x-axis, 0.01 to 0.91), with one curve for the
coupling phase and one for the repulsion phase.]
Statistical Phase Determination: Method I
• When the likelihood surfaces for the coupling and repulsion phases
  are symmetric about 0.5 (backcross, and F2 with one codominant
  marker), a single test is sufficient.
• Calculate the G statistic under the coupling assumption
  (use lC(q)).
    • If it is significant and the estimate of q is < 0.5, then the
      linkage is coupling.
    • If it is significant and the estimate of q is > 0.5, then the
      linkage is repulsion.
    • If it is not significant, no determination can be made.
Statistical Phase Determination: Method II
• Use when the likelihood surface is not symmetric
  (e.g. F2 with dominant markers).
• Calculate GC under the coupling model and GR under the
  repulsion model.
• If either is significant and
    • GC > GR, then the linkage is coupling.
    • GR > GC, then the linkage is repulsion.
• Otherwise, no determination can be made.
Statistical Phase Determination: Error
• There is a high chance of making an error when ...
• When q<0.3, the chance of error is small, even with sample sizes
  of ~20, except for F2-DD.
• For an F2-DD cross, a sample size >100 is needed to keep the
  error down.
• The sample size needed decreases as linkage becomes tighter.
Phase Determined
• Once linkage phase has been determined, the analysis continues
  as before.
• Assume linkage phase is now known and do a phase-known analysis.
Phase-Unknown Gametes

Mating:                       AaBb (father) x aabb
Gametes produced by father:   AB, ab, Ab, aB
Offspring:                    AaBb, aabb, Aabb, aaBb

• There are multiple reasons why you may not know phase.
• One reason is that grandparents are unavailable.
Likelihood for Phase-Unknown Gametes
Let X be the count of AB and ab gametes.
Let Y be the count of Ab and aB gametes.

L(q) = P(data | q)
     = P(data, coupled | q) + P(data, repulsion | q)
     = P(data | coupled, q) P(coupled) + P(data | repulsion, q) P(repulsion)
     = \frac{1}{2} q^Y (1-q)^X + \frac{1}{2} q^X (1-q)^Y
Distribution of the Log Likelihood Ratio Test Statistic
• Unfortunately, the test statistic
      G = 2(\ln L_1 - \ln L_0)
  does not have a regular distribution under the null of no
  linkage (q = 0.5).
• Numerical approximation of the distribution is required.
• On the other hand, there is usually insufficient data in one
  family to get a significant test statistic.
Distribution When There Are Multiple Families

l(q) = \sum_{i} \ln[ \frac{1}{2} q^{Y_i} (1-q)^{X_i} + \frac{1}{2} q^{X_i} (1-q)^{Y_i} ]

where X_i and Y_i are the gamete counts X and Y in family i.

• The distribution of G approaches a 50:50 mixture of a probability
  mass at 0 and a chi-squared distribution with one degree of
  freedom. In other words, we can simply perform a one-tailed
  chi-square test to test linkage when large numbers of families
  are included in the study.
General Analysis with Missing Information: Step 1

Offspring genotypes: AaBb, aabb, Aabb, aaBb

1. Identify all possible mating types that could produce these
offspring, and their expected frequencies. (Retain phase
information.)
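All Possible Mating Types
Here p_1 and p_2 are the frequencies of alleles A and a, and q_1 and
q_2 are the frequencies of alleles B and b. They are distinct from
the recombination fraction q.

Mating Type      Expected Frequency
AB/ab x AB/ab    (2 p_1 p_2 q_1 q_2)^2
AB/ab x Ab/aB    2 (2 p_1 p_2 q_1 q_2)^2
Ab/aB x Ab/aB    (2 p_1 p_2 q_1 q_2)^2
AB/ab x Ab/ab    2 (2 p_1 p_2 q_1 q_2)(2 p_1 p_2 q_2^2)
Ab/aB x Ab/ab    2 (2 p_1 p_2 q_1 q_2)(2 p_1 p_2 q_2^2)
AB/ab x aB/ab    2 (2 p_1 p_2 q_1 q_2)(2 p_2^2 q_1 q_2)
Ab/aB x aB/ab    2 (2 p_1 p_2 q_1 q_2)(2 p_2^2 q_1 q_2)
AB/ab x ab/ab    2 (2 p_1 p_2 q_1 q_2)(p_2^2 q_2^2)
Ab/aB x ab/ab    2 (2 p_1 p_2 q_1 q_2)(p_2^2 q_2^2)
Ab/ab x aB/ab    2 (2 p_1 p_2 q_2^2)(2 p_2^2 q_1 q_2)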
General Analysis with Missing Information: Step 2

2. Conditional on parental mating type, calculate the
probability of each offspring genotype.

Probability of Offspring Conditional on Mating Type

e.g. AB/ab x Ab/aB. Columns are gametes from the AB/ab parent,
rows are gametes from the Ab/aB parent.

                 AB              Ab              aB              ab
                 (1-q)/2         q/2             q/2             (1-q)/2
AB   q/2         0.25 q(1-q)     0.25 q^2        0.25 q^2        0.25 q(1-q)
Ab   (1-q)/2     0.25 (1-q)^2    0.25 q(1-q)     0.25 q(1-q)     0.25 (1-q)^2
aB   (1-q)/2     0.25 (1-q)^2    0.25 q(1-q)     0.25 q(1-q)     0.25 (1-q)^2
ab   q/2         0.25 q(1-q)     0.25 q^2        0.25 q^2        0.25 q(1-q)
General Analysis with Missing Information: Step 3

P(AaBb | AB/ab x Ab/aB) = 4 \times 0.25 q(1-q) = q(1-q)

3. Calculate the unconditional probability of each
offspring genotype.

P(AaBb) = \sum_{mating types} P(AaBb | mating type) P(mating type)
General Analysis with Missing Information: Step 4

4. Sum the log-likelihood contributions over all possible
offspring genotypes.

l(q) = \sum_{j \in offspring genotypes} f_j \log P(j)
General Analysis with Missing Information: Step 5

5. The log-likelihood ratio statistic is asymptotically a
50:50 mixture of a point mass at 0 and a chi-squared
distribution with one degree of freedom.

G = 2(\ln L_1 - \ln L_0)

• A mixture of linkage phases results when the two parents have
  different phases. Consider the F2 with coupling-repulsion parents:
  AB/ab x Ab/aB
Expected Genotype Frequency
Genotype   Count   Expected Frequency      P_i(R|G)
AABB       f1      0.25 q(1-q)             0.5
AABb       f2      0.25 [(1-q)^2 + q^2]    q^2/[(1-q)^2 + q^2]
AAbb       f3      0.25 q(1-q)             0.5
AaBB       f4      0.25 [(1-q)^2 + q^2]    q^2/[(1-q)^2 + q^2]
AaBb       f5      q(1-q)                  0.5
Aabb       f6      0.25 [(1-q)^2 + q^2]    q^2/[(1-q)^2 + q^2]
aaBB       f7      0.25 q(1-q)             0.5
aaBb       f8      0.25 [(1-q)^2 + q^2]    q^2/[(1-q)^2 + q^2]
aabb       f9      0.25 q(1-q)             0.5

Likelihood

l(q) = (f_1 + f_3 + f_5 + f_7 + f_9) \log[q(1-q)]
       + (f_2 + f_4 + f_6 + f_8) \log[(1-q)^2 + q^2] + constant

Analytic MLE available: with N the total count, the classes f_1, f_3,
f_5, f_7, f_9 have total probability 2q(1-q). Setting 2q(1-q) equal
to their observed proportion gives

\hat{q} = \frac{1}{2} [ 1 - \sqrt{1 - 2(f_1 + f_3 + f_5 + f_7 + f_9)/N} ]

which is approximately (f_1 + f_3 + f_5 + f_7 + f_9)/(2N) when
linkage is tight.
Mixture of Self and Random Mating (MSR)
• Controlled crosses are not always available.
• Frequently, crosses resulting from open-pollinated populations
  are available instead. These lead to MSR.
• Assume loci A and B are linked in coupling phase with
  recombination fraction q.
• Assume alleles A and a at locus A, and B and b at locus B.
• Assume u and v are the frequencies of A and B in the pollen pool
  (e.g. the frequency of a is 1-u).
• Assume linkage equilibrium in the pollen.
MSR - Expected Frequencies for Codominant Alleles
Genotype   Count   Expected Frequencies
                   Outcross                           Self
AABB       f1      0.5 uv(1-q)                        0.25 (1-q)^2
AABb       f2      0.5 u[(1-v)(1-q) + vq]             0.5 q(1-q)
AAbb       f3      0.5 u(1-v)q                        0.25 q^2
AaBB       f4      0.5 v[(1-u)(1-q) + uq]             0.5 q(1-q)
AaBb       f5      0.5 [(1-q) - (1-2q)(u+v-2uv)]      0.5 [(1-q)^2 + q^2]
Aabb       f6      0.5 (1-v)(u - 2uq + q)             0.5 q(1-q)
aaBB       f7      0.5 (1-u)vq                        0.25 q^2
aaBb       f8      0.5 (1-u)(v - 2vq + q)             0.5 q(1-q)
aabb       f9      0.5 (1-u)(1-v)(1-q)                0.25 (1-q)^2
MSR - Log Likelihood Function

L(q) = \sum_{i=1}^{9} f_i \log[ t p_{oi} + (1-t) p_{si} ]

• t is the probability of outcrossing (vs. selfing).
• p_{oi} is the expected frequency of type i progeny from outcrossing.
• p_{si} is the expected frequency of type i progeny from selfing.
• q enters through the expected frequencies given in the previous
  table.
Estimating Allelic Frequencies in Pollen Pool (u and v)
• Use a single locus, say A.
• Consider heterozygous maternal plants (Aa).
• Write an expression for the log-likelihood in the MSR population.
• Condition on the outcrossing rate t.
• Solve analytically for u_mle.
Estimating the Outcrossing Rate t
• The prior analysis conditioned on the outcrossing rate t.
• Unfortunately, a heterozygous (Aa) mother is necessary to
  determine linkage but is least informative for t.
MSR - Estimating Recombination Fraction q I
• EM: Calculate the conditional probabilities of recombination
  given the genotype.

q_{n+1} = \frac{1}{N} \sum_{i=1}^{9} f_i [ (p_{oi}^{Ab} + p_{oi}^{aB}) t + (p_{si}^{Ab} + p_{si}^{aB}) (1-t) ]

• NR: Calculate the score and information.

S(q) = \sum_{i=1}^{9} f_i \frac{d \log[(1-t) p_{si} + t p_{oi}]}{dq}

I(q) = -E[ \sum_{i=1}^{9} f_i \frac{d^2 \log[(1-t) p_{si} + t p_{oi}]}{dq^2} ]
MSR - Estimating q, u, and v (EM)
• Pick initial estimates (u_0, v_0, q_0).
• Calculate expected gametic frequencies in selfed and outcrossed
  populations conditional on current estimates and observed
  genotype frequencies, e.g. t f_i p_{oi}^{g}.
• Calculate the MLE (u_1, v_1, q_1).
• Iterate.
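MSR - Estimating q, u, and v (NR)

S(q,u,v) = \begin{pmatrix}
  \partial L / \partial u \\
  \partial L / \partial v \\
  \partial L / \partial q
\end{pmatrix}

I(q,u,v) = -\begin{pmatrix}
  \partial^2 L / \partial u^2          & \partial^2 L / \partial u \partial v & \partial^2 L / \partial u \partial q \\
  \partial^2 L / \partial u \partial v & \partial^2 L / \partial v^2          & \partial^2 L / \partial q \partial v \\
  \partial^2 L / \partial u \partial q & \partial^2 L / \partial q \partial v & \partial^2 L / \partial q^2
\end{pmatrix}

\begin{pmatrix} u_{n+1} \\ v_{n+1} \\ q_{n+1} \end{pmatrix}
  = \begin{pmatrix} u_n \\ v_n \\ q_n \end{pmatrix} + \frac{1}{N} I^{-1} S

• Linkage information content is sensitive to allele frequencies
  when outcrossing is high.
• Linkage information content decreases rapidly as the allelic
  frequencies approach 0.5.
• When linkage is tight, MSR provides less information relative to
  F2 than when linkage is loose.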
MSR - Bias and Variance

• Bias and mean square error are higher for dominant markers than
  for codominant markers.
• Bias and mean square error are acceptable for q<0.2 only when the
  dominant allele frequency is less than the true q.
• When the dominant allele frequency is > 0.5, there is a high
  negative bias on q.
• Allele frequency cannot be accurately estimated when the true
  frequency is <0.1 or >0.5 and outcrossing is low.
Summary

• Reducing the problem to a phase-known problem
• Likelihood when phase unknown
• Likelihood for general pedigree with missing information
• Likelihood for mixture of linkage phase
• Mixture of Self and Random mating (MSR)

```
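
The F2-CD steps in the lecture (gamete probabilities, observable genotype classes, log likelihood, and the check that lC(q) = lR(1-q)) can be sketched numerically. The following is a minimal Python illustration, not the lecturer's code: the function names (`gamete_probs`, `offspring_probs`, `f2_cd_classes`, `log_likelihood`) and the genotype counts are hypothetical, chosen only for this example. It enumerates the 4 x 4 table of gamete pairs instead of using closed-form expressions, assuming both parents are double heterozygotes with the same recombination fraction q.

```python
import math
from collections import defaultdict

def gamete_probs(phase, q):
    """Gamete probabilities for a double heterozygote with recombination fraction q."""
    parental, recombinant = (1 - q) / 2, q / 2
    if phase == "coupling":    # parent is AB/ab
        return {"AB": parental, "ab": parental, "Ab": recombinant, "aB": recombinant}
    if phase == "repulsion":   # parent is Ab/aB
        return {"Ab": parental, "aB": parental, "AB": recombinant, "ab": recombinant}
    raise ValueError("phase must be 'coupling' or 'repulsion'")

def genotype(g1, g2):
    """Unordered two-locus genotype of two gametes, e.g. ('ab', 'AB') -> 'AaBb'."""
    locus_a = "".join(sorted(g1[0] + g2[0]))  # uppercase sorts before lowercase
    locus_b = "".join(sorted(g1[1] + g2[1]))
    return locus_a + locus_b

def offspring_probs(phase1, phase2, q):
    """Probabilities of the nine genotypes from a mating of two double heterozygotes."""
    probs = defaultdict(float)
    for g1, p1 in gamete_probs(phase1, q).items():
        for g2, p2 in gamete_probs(phase2, q).items():
            probs[genotype(g1, g2)] += p1 * p2
    return dict(probs)

def f2_cd_classes(phase, q):
    """Collapse to the six observable classes: A codominant, B dominant."""
    classes = defaultdict(float)
    for geno, p in offspring_probs(phase, phase, q).items():
        a, b = geno[:2], geno[2:]
        classes[a + ("bb" if b == "bb" else "B-")] += p
    return dict(classes)

CLASS_ORDER = ["AAB-", "AAbb", "AaB-", "Aabb", "aaB-", "aabb"]

def log_likelihood(counts, phase, q):
    """Multinomial log likelihood (up to a constant) for counts given in CLASS_ORDER."""
    probs = f2_cd_classes(phase, q)
    return sum(f * math.log(probs[c]) for c, f in zip(CLASS_ORDER, counts))

# Hypothetical counts f1..f6, for illustration only.
counts = [30, 2, 45, 8, 6, 9]

for q in (0.1, 0.25, 0.4):
    lc = log_likelihood(counts, "coupling", q)
    lr = log_likelihood(counts, "repulsion", 1 - q)
    print(f"q={q:.2f}  lC(q)={lc:.6f}  lR(1-q)={lr:.6f}")
```

Because the repulsion gamete table is the coupling table with q replaced by 1-q, the two printed columns should agree for every q. The same `offspring_probs` function can be called with `("coupling", "repulsion", q)` to tabulate the genotype probabilities for the AB/ab x Ab/aB mating.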