Lecture 9: Linkage Analysis II
Date: 9/24/02
Unknown linkage phase
Mixture of linkage phase
Mixture of self and random mating
Unknown Linkage Phase for
Backcross
[Figure: backcross diagrams for three cases: coupling, repulsion, and no information.]
Unknown Linkage Phase F2
[Figure: F2 parental configurations: coupling-coupling, repulsion-repulsion, and coupling-repulsion (dealt with later).]
Determining Linkage Phase:
F2-CD
Goal: Calculate likelihood for F2 with one codominant and one
dominant locus. Show that the coupling and repulsion likelihoods
are symmetric about 0.5.
1. Determine the possible gametes and their probabilities.
Assume coupling of A and B in both parents.
Gamete:        AB        Ab     aB     ab
Probability:   (1-q)/2   q/2    q/2    (1-q)/2
2. Determine the observable genotypes and their probabilities.
Genotype:      AAB-        AAbb     AaB-          Aabb       aaB-       aabb
Probability:   (1-q^2)/4   q^2/4    (1-q+q^2)/2   q(1-q)/2   q(2-q)/4   (1-q)^2/4
Determining Linkage Phase:
F2-CD
3. Write an expression for the likelihood, then log likelihood.
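$$L_C(q) \propto \left[\frac{1-q^2}{4}\right]^{f_1} \left[\frac{q^2}{4}\right]^{f_2} \left[\frac{1-q+q^2}{2}\right]^{f_3} \left[\frac{q(1-q)}{2}\right]^{f_4} \left[\frac{q(2-q)}{4}\right]^{f_5} \left[\frac{(1-q)^2}{4}\right]^{f_6}$$
Up to an additive constant:
$$l_C(q) = f_1\log(1-q^2) + 2f_2\log q + f_3\log(1-q+q^2) + f_4\log[q(1-q)] + f_5\log[q(2-q)] + 2f_6\log(1-q)$$
Here f1, ..., f6 are the observed counts of the six genotype classes, in the order AAB-, AAbb, AaB-, Aabb, aaB-, aabb.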
4. Repeat the whole process now assuming repulsion phase
and obtain expression for lR(q).
5. Confirm lC(q)=lR(1-q).
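Steps 1 to 5 can be checked numerically. The following is a minimal sketch, not the lecture's own code: it builds the six observable F2-CD class probabilities from the gamete probabilities for either phase and compares lC(q) with lR(1-q). The function names and the genotype counts are hypothetical, chosen only for illustration.

```python
import math

def gamete_probs(q, phase):
    """Gamete probabilities for a double heterozygote with recombination fraction q."""
    nonrec, rec = (1 - q) / 2, q / 2
    if phase == "coupling":  # parent is AB/ab
        return {"AB": nonrec, "Ab": rec, "aB": rec, "ab": nonrec}
    return {"AB": rec, "Ab": nonrec, "aB": nonrec, "ab": rec}  # parent is Ab/aB

def f2_cd_probs(q, phase):
    """Six observable classes: locus A codominant, locus B dominant."""
    g = gamete_probs(q, phase)
    classes = dict.fromkeys(["AAB-", "AAbb", "AaB-", "Aabb", "aaB-", "aabb"], 0.0)
    for g1, p1 in g.items():
        for g2, p2 in g.items():
            n_big_a = (g1[0] == "A") + (g2[0] == "A")
            a_part = ["aa", "Aa", "AA"][n_big_a]
            b_part = "B-" if "B" in (g1[1], g2[1]) else "bb"
            classes[a_part + b_part] += p1 * p2
    return classes

def log_lik(q, counts, phase):
    """Log likelihood up to an additive constant."""
    probs = f2_cd_probs(q, phase)
    return sum(f * math.log(probs[k]) for k, f in counts.items())

# Hypothetical counts, for illustration only.
counts = {"AAB-": 24, "AAbb": 1, "AaB-": 46, "Aabb": 6, "aaB-": 5, "aabb": 18}
for q in (0.1, 0.25, 0.4):
    print(q, log_lik(q, counts, "coupling"), log_lik(1 - q, counts, "repulsion"))
```

Each printed row should show the same value twice, which is the symmetry about 0.5 that step 5 asks for.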
Symmetry Around 0.5
[Figure: log likelihood (vertical axis, 0 to -8000) against recombinant fraction (horizontal axis, 0.01 to 0.91) for the coupling phase and the repulsion phase.]
An Ad Hoc Linkage Phase
Determination Method I
When the likelihood surfaces for the coupling and
repulsion phases are symmetric about 0.5 (backcross,
and F2 with 1 codominant marker), a single
test is sufficient.
Calculate the G statistic under the coupling
assumption (use lC(q)).
If it is significant and the estimate of q is <0.5, then the linkage is coupling.
If it is significant and the estimate of q is >0.5, then the linkage is
repulsion.
If it is not significant, no determination can be made.
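Method I can be sketched for a backcross as follows. This is an illustration under stated assumptions rather than the lecture's procedure in full: x is the count of AB and ab gametes, y is the count of Ab and aB gametes, G is computed as 2[lC(q_hat) - lC(0.5)], and significance is judged against 3.841, the 5% critical value of chi-squared with one degree of freedom. The 5% level and all function names are assumptions.

```python
import math

CRITICAL_5PCT_1DF = 3.841  # chi-squared (1 df) 5% critical value; the level is an assumption

def _xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def coupling_log_lik(q, x, y):
    """Backcross log likelihood under coupling: AB/ab gametes (x) are non-recombinant."""
    return _xlogy(y, q) + _xlogy(x, 1 - q)

def phase_method_one(x, y, critical=CRITICAL_5PCT_1DF):
    """Return (decision, q_hat, G) using the single coupling-phase test."""
    q_hat = y / (x + y)  # MLE of q under the coupling assumption
    g = 2 * (coupling_log_lik(q_hat, x, y) - coupling_log_lik(0.5, x, y))
    if g <= critical:
        return "undetermined", q_hat, g
    return ("coupling" if q_hat < 0.5 else "repulsion"), q_hat, g

print(phase_method_one(45, 5))
print(phase_method_one(5, 45))
print(phase_method_one(26, 24))
```

Because the surface is symmetric, an estimate above 0.5 under the coupling assumption corresponds to the same fit at 1 - q under repulsion, so one test covers both phases.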
An Ad Hoc Linkage Phase
Determination Method II
When the likelihood surface is not symmetric
(e.g. F2 with dominant markers).
Calculate GC under coupling and GR under
repulsion model.
If either is significant and
GC > GR, then linkage is coupling.
GR > GC, then linkage is repulsion.
Otherwise, no determination can be made.
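A sketch of Method II for an F2 with two dominant markers (F2-DD) follows. The class probabilities are the standard ones for that cross, written with psi = (1-q)^2 under coupling and psi = q^2 under repulsion; they are not given in this lecture and are an assumption here, as are the 5% critical value 3.841, the grid search over q in (0, 0.5], and the counts.

```python
import math

def f2_dd_probs(q, phase):
    """Four observable classes when both loci are dominant."""
    psi = (1 - q) ** 2 if phase == "coupling" else q ** 2
    return {"A-B-": (2 + psi) / 4, "A-bb": (1 - psi) / 4,
            "aaB-": (1 - psi) / 4, "aabb": psi / 4}

def log_lik(q, counts, phase):
    """Log likelihood up to a constant; empty classes contribute nothing."""
    probs = f2_dd_probs(q, phase)
    return sum(f * math.log(probs[k]) for k, f in counts.items() if f > 0)

def g_statistic(counts, phase):
    """G = 2[l(q_hat) - l(0.5)], with q_hat found on a grid over (0, 0.5]."""
    grid = [i / 1000 for i in range(1, 501)]
    best = max(log_lik(q, counts, phase) for q in grid)
    return 2 * (best - log_lik(0.5, counts, phase))

def phase_method_two(counts, critical=3.841):
    g_c = g_statistic(counts, "coupling")
    g_r = g_statistic(counts, "repulsion")
    if max(g_c, g_r) > critical:
        if g_c > g_r:
            return "coupling"
        if g_r > g_c:
            return "repulsion"
    return "undetermined"

# Hypothetical counts, for illustration only.
print(phase_method_two({"A-B-": 70, "A-bb": 5, "aaB-": 5, "aabb": 20}))
print(phase_method_two({"A-B-": 52, "A-bb": 24, "aaB-": 24, "aabb": 0}))
```

The two models are fitted separately because here lC(q) and lR(1-q) are no longer the same curve.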
Statistical Phase Determination:
Error
There is a high chance of making an error when
linkage is loose.
When q<0.3, the chance of error is small, even with
sample sizes of ~20, except for F2-DD.
An F2-DD cross needs a sample size >100 to keep the
error down.
Sample size needed decreases as linkage becomes
tighter.
Once Linkage Phase
Determined
Once linkage phase has been determined, the
analysis continues as before.
Assume linkage phase is now known and do
a phase-known analysis.
Phase-Unknown Gametes
Offspring genotype:           AaBb   aabb   Aabb   aaBb
Gamete produced by father:    AB     ab     Ab     aB
• There are multiple reasons why you may not know phase.
• One reason is that grandparents are unavailable.
Likelihood for Phase-Unknown
Gametes
Let X be the count of AB and ab gametes.
Let Y be the count of Ab and aB gametes.
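$$L(q) = P(\text{data} \mid q) = P(\text{data}, \text{coupled} \mid q) + P(\text{data}, \text{repulsion} \mid q)$$
$$= P(\text{data} \mid \text{coupled}, q)\,P(\text{coupled}) + P(\text{data} \mid \text{repulsion}, q)\,P(\text{repulsion})$$
$$= \tfrac{1}{2}(1-q)^X q^Y + \tfrac{1}{2} q^X (1-q)^Y$$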
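The phase-unknown likelihood is an equal-weight mixture of the coupling and repulsion likelihoods, so it is symmetric in q and 1 - q. A minimal sketch, with hypothetical counts and function names, evaluates it and finds the maximum on a grid restricted to (0, 0.5]:

```python
def phase_unknown_lik(q, x, y):
    """L(q) for x gametes of type AB/ab and y gametes of type Ab/aB, phase unknown."""
    coupled = (1 - q) ** x * q ** y      # AB/ab gametes are non-recombinant
    repulsion = q ** x * (1 - q) ** y    # AB/ab gametes are recombinant
    return 0.5 * coupled + 0.5 * repulsion

def grid_mle(x, y, steps=500):
    """Maximise L(q) on a grid over (0, 0.5]; the grid is an assumption of this sketch."""
    grid = [i / (2 * steps) for i in range(1, steps + 1)]
    return max(grid, key=lambda q: phase_unknown_lik(q, x, y))

x, y = 8, 2  # hypothetical counts
print(phase_unknown_lik(0.2, x, y), phase_unknown_lik(0.8, x, y))
print(grid_mle(x, y))
```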
Distribution of the Log
Likelihood Ratio Test Statistic
Unfortunately, the test statistic
G = 2(ln L1 - ln L0)
does not have a standard distribution under the null of
no linkage.
Numerical approximation of the distribution is
required.
On the other hand, there is usually insufficient data
in one family to get a significant test statistic.
Distribution When There Are
Multiple Families
$$L(q) = \ln\left[\tfrac{1}{2}(1-q)^X q^Y + \tfrac{1}{2} q^X (1-q)^Y\right]$$
The distribution of G approaches a 50:50 mixture of
a probability mass at 0 and a chi-squared
distribution with one degree of freedom. In other
words, we can simply perform a one-tailed chi-
square test to test linkage when large numbers of
families are included in the study.
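A sketch of the multi-family test follows. It assumes that families are independent, so that their phase-unknown log likelihoods add, and it takes the p-value as half the upper tail of chi-squared with one degree of freedom. The tail is computed with the identity P(chi-squared_1 > g) = erfc(sqrt(g/2)). The family counts and function names are hypothetical.

```python
import math

def family_log_lik(q, x, y):
    """Phase-unknown log likelihood of one family (x AB/ab gametes, y Ab/aB gametes)."""
    return math.log(0.5 * (1 - q) ** x * q ** y + 0.5 * q ** x * (1 - q) ** y)

def mixture_p_value(g):
    """P-value under a 50:50 mixture of a point mass at 0 and chi-squared (1 df)."""
    if g <= 0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(g / 2))

def linkage_test(families, steps=500):
    """Return (q_hat, G, p) with q_hat from a grid search over (0, 0.5]."""
    def total(q):
        return sum(family_log_lik(q, x, y) for x, y in families)
    grid = [i / (2 * steps) for i in range(1, steps + 1)]
    q_hat = max(grid, key=total)
    g = max(0.0, 2 * (total(q_hat) - total(0.5)))
    return q_hat, g, mixture_p_value(g)

families = [(4, 0), (0, 3), (3, 1), (1, 4), (5, 0)]  # hypothetical (x, y) per family
print(linkage_test(families))
```

Note that each family contributes through its own mixture term, so a family in repulsion, such as (0, 3) above, supports linkage just as a coupling family does.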
General Analysis with Missing
Information: Step 1
AaBb aabb Aabb aaBb
1. Identify all possible mating types that could produce these
offspring and their expected frequency. (Retain phase
information).
All Possible Mating Types
Mating Type        Expected Frequency
AB/ab x AB/ab      (2 p1 p2 q1 q2)^2
AB/ab x Ab/aB      2 (2 p1 p2 q1 q2)^2
Ab/aB x Ab/aB      (2 p1 p2 q1 q2)^2
AB/ab x Ab/ab      2 (2 p1 p2 q1 q2)(2 p1 p2 q2^2)
Ab/aB x Ab/ab      2 (2 p1 p2 q1 q2)(2 p1 p2 q2^2)
AB/ab x aB/ab      2 (2 p1 p2 q1 q2)(2 p2^2 q1 q2)
Ab/aB x aB/ab      2 (2 p1 p2 q1 q2)(2 p2^2 q1 q2)
AB/ab x ab/ab      2 (2 p1 p2 q1 q2)(p2^2 q2^2)
Ab/aB x ab/ab      2 (2 p1 p2 q1 q2)(p2^2 q2^2)
Ab/ab x aB/ab      2 (2 p1 p2 q2^2)(2 p2^2 q1 q2)
General Analysis with Missing
Information : Step 2
2. Conditional on parental mating type, calculate the
probability of each offspring genotype.
Probability of Offspring
Conditional on Mating Type
e.g. AB/ab x Ab/aB (columns: gametes from AB/ab; rows: gametes from Ab/aB)

                 AB            Ab            aB            ab
                 (1-q)/2       q/2           q/2           (1-q)/2
AB   q/2         0.25q(1-q)    0.25q^2       0.25q^2       0.25q(1-q)
Ab   (1-q)/2     0.25(1-q)^2   0.25q(1-q)    0.25q(1-q)    0.25(1-q)^2
aB   (1-q)/2     0.25(1-q)^2   0.25q(1-q)    0.25q(1-q)    0.25(1-q)^2
ab   q/2         0.25q(1-q)    0.25q^2       0.25q^2       0.25q(1-q)
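The table is an outer product of the two parents' gamete probabilities. A minimal sketch, with hypothetical function names, rebuilds it and collapses the 16 cells into the 9 unordered offspring genotypes:

```python
def gametes(q, phase):
    """Gamete probabilities for AB/ab ("C") or Ab/aB ("R") with recombination fraction q."""
    nonrec, rec = (1 - q) / 2, q / 2
    if phase == "C":
        return {"AB": nonrec, "Ab": rec, "aB": rec, "ab": nonrec}
    return {"AB": rec, "Ab": nonrec, "aB": nonrec, "ab": rec}

def offspring_probs(q, phase1, phase2):
    """P(offspring genotype | mating type) for two double-heterozygote parents."""
    probs = {}
    for g1, p1 in gametes(q, phase1).items():
        for g2, p2 in gametes(q, phase2).items():
            # sorted() puts the upper-case allele first, e.g. "aA" -> "Aa"
            genotype = "".join(sorted(g1[0] + g2[0])) + "".join(sorted(g1[1] + g2[1]))
            probs[genotype] = probs.get(genotype, 0.0) + p1 * p2
    return probs

q = 0.1
table = offspring_probs(q, "C", "R")  # AB/ab x Ab/aB
for genotype in sorted(table):
    print(genotype, table[genotype])
```

For this mating type the AaBb entry is the sum of four cells of 0.25q(1-q), that is q(1-q).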
General Analysis with Missing
Information : Step 3
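3. Calculate the unconditional probability of each
offspring genotype.
$$P(AaBb) = \sum_{\text{mating types}} P(AaBb \mid \text{mating type})\,P(\text{mating type})$$
For example, one term of this sum uses
$$P(AaBb \mid AB/ab \times Ab/aB) = 4\,[0.25\,q(1-q)] = q(1-q)$$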
General Analysis with Missing
Information : Step 4
4. Sum the log-likelihood contributions over all possible
offspring genotypes.
$$l(q) = \sum_{j} f_j \log P_j, \qquad j = \text{offspring genotype}$$
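Steps 3 and 4 can be sketched together. The example below is restricted, as a simplifying assumption, to the three matings between double heterozygotes; the mating-type weights and offspring counts are hypothetical, and treating every offspring as an independent draw is also an assumption of this sketch.

```python
import math

def gametes(q, phase):
    """Gamete probabilities for AB/ab ("C") or Ab/aB ("R")."""
    nonrec, rec = (1 - q) / 2, q / 2
    if phase == "C":
        return {"AB": nonrec, "Ab": rec, "aB": rec, "ab": nonrec}
    return {"AB": rec, "Ab": nonrec, "aB": nonrec, "ab": rec}

def conditional_probs(q, phase1, phase2):
    """Step 2: P(offspring genotype | mating type)."""
    probs = {}
    for g1, p1 in gametes(q, phase1).items():
        for g2, p2 in gametes(q, phase2).items():
            genotype = "".join(sorted(g1[0] + g2[0])) + "".join(sorted(g1[1] + g2[1]))
            probs[genotype] = probs.get(genotype, 0.0) + p1 * p2
    return probs

def unconditional_probs(q, mating_freqs):
    """Step 3: sum over mating types of P(genotype | mating) * P(mating)."""
    total = {}
    for (phase1, phase2), weight in mating_freqs.items():
        for genotype, p in conditional_probs(q, phase1, phase2).items():
            total[genotype] = total.get(genotype, 0.0) + weight * p
    return total

def log_lik(q, counts, mating_freqs):
    """Step 4: l(q) = sum_j f_j log P_j."""
    probs = unconditional_probs(q, mating_freqs)
    return sum(f * math.log(probs[g]) for g, f in counts.items())

mating_freqs = {("C", "C"): 0.6, ("C", "R"): 0.3, ("R", "R"): 0.1}  # hypothetical weights
counts = {"AaBb": 30, "aabb": 12, "Aabb": 5, "aaBb": 4}             # hypothetical counts
print(log_lik(0.1, counts, mating_freqs), log_lik(0.5, counts, mating_freqs))
```

With weights 0.25, 0.5, 0.25 (each parent equally likely to be in either phase) every unconditional genotype probability in this sketch equals its value at q = 0.5, so single offspring treated independently carry no linkage information in that case.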
General Analysis with Missing
Information : Step 5
5. The log-likelihood ratio statistic is asymptotically a
50:50 mixture of a point mass at 0 and a chi-squared
distribution with one degree of freedom.
G = 2(ln L1 - ln L0)
Mixture of Linkage Phase
A mixture of linkage phase results when the
two parents have different phases. Consider
the F2 with coupling-repulsion parents.
AB/ab x Ab/aB
Mixture of Linkage Phase:
Expected Genotype Frequency
Genotype   Count   Expected Frequency     Pi(R|G)
AABB       f1      0.25q(1-q)             0.5
AABb       f2      0.25[(1-q)^2+q^2]      q^2/[(1-q)^2+q^2]
AAbb       f3      0.25q(1-q)             0.5
AaBB       f4      0.25[(1-q)^2+q^2]      q^2/[(1-q)^2+q^2]
AaBb       f5      q(1-q)                 0.5
Aabb       f6      0.25[(1-q)^2+q^2]      q^2/[(1-q)^2+q^2]
aaBB       f7      0.25q(1-q)             0.5
aaBb       f8      0.25[(1-q)^2+q^2]      q^2/[(1-q)^2+q^2]
aabb       f9      0.25q(1-q)             0.5
Mixture of Linkage Phase: Log
Likelihood
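Let S = f1 + f3 + f5 + f7 + f9. Classes f1, f3, f5, f7, f9 have expected frequency proportional to q(1-q), and classes f2, f4, f6, f8 have expected frequency 0.25[(1-q)^2 + q^2], so up to an additive constant
$$L(q) = S\log[q(1-q)] + (N - S)\log\left[(1-q)^2 + q^2\right]$$
Analytic MLE available:
$$\hat{q}(1-\hat{q}) = \frac{S}{2N}, \qquad \hat{q} = \frac{1 - \sqrt{1 - 2S/N}}{2}$$
When q is small enough that the q^2 term in the second logarithm can be neglected, this reduces to
$$L(q) \approx S\log q + (N + f_2 + f_4 + f_6 + f_8)\log(1-q), \qquad \hat{q} \approx \frac{f_1 + f_3 + f_5 + f_7 + f_9}{2N}$$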
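A minimal sketch of the analytic estimate for the coupling-repulsion F2 follows. It assumes that classes f1, f3, f5, f7, f9 have probability proportional to q(1-q) and that classes f2, f4, f6, f8 have probability 0.25[(1-q)^2 + q^2]; under that model the likelihood equation gives q(1-q) = S/(2N), with S/(2N) itself as a small-q approximation to q. The counts are hypothetical (they are the expected counts for q = 0.2 and N = 1000), and the function names are not from the lecture.

```python
import math

def mixture_phase_mle(counts):
    """Exact MLE of q in [0, 0.5] from the nine genotype counts f1..f9 (table order)."""
    n = sum(counts)
    s = sum(counts[0::2])  # f1 + f3 + f5 + f7 + f9
    w = s / (2 * n)        # MLE of q(1-q)
    if w >= 0.25:          # q(1-q) cannot exceed 0.25
        return 0.5
    return (1 - math.sqrt(1 - 4 * w)) / 2

def small_q_approximation(counts):
    """S / (2N): approximates q when the q^2 term is negligible."""
    return sum(counts[0::2]) / (2 * sum(counts))

def log_lik(q, counts):
    """Log likelihood up to an additive constant."""
    s, n = sum(counts[0::2]), sum(counts)
    return s * math.log(q * (1 - q)) + (n - s) * math.log((1 - q) ** 2 + q ** 2)

counts = [40, 170, 40, 170, 160, 170, 40, 170, 40]  # hypothetical, N = 1000
print(mixture_phase_mle(counts), small_q_approximation(counts))
```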
Mixture of Self and Random
Mating (MSR)
Controlled crosses not always available.
Frequently, crosses resulting from open-pollinated
populations are. These lead to MSR.
Assume loci A and B are linked in coupling phase
with recombination fraction q.
Assume alleles A and a at locus A, and B and b at locus B.
Assume u and v are the frequencies of A and B in
the pollen pool (e.g. the frequency of a is 1-u).
Assume linkage equilibrium in the pollen.
MSR - Expected Frequencies
for Codominant Alleles
Genotype   Count   Expected Frequencies
                   Outcross                           Self
AABB       f1      0.5uv(1-q)                         0.25(1-q)^2
AABb       f2      0.5u[(1-v)(1-q)+vq]                0.5q(1-q)
AAbb       f3      0.5u(1-v)q                         0.25q^2
AaBB       f4      0.5v[(1-u)(1-q)+uq]                0.5q(1-q)
AaBb       f5      0.5[1-q-(1-2q)(u+v-2uv)]           0.5[(1-q)^2+q^2]
Aabb       f6      0.5(1-v)(u-2uq+q)                  0.5q(1-q)
aaBB       f7      0.5(1-u)vq                         0.25q^2
aaBb       f8      0.5(1-u)(v-2vq+q)                  0.5q(1-q)
aabb       f9      0.5(1-u)(1-v)(1-q)                 0.25(1-q)^2
MSR – Log Likelihood
Function
$$L(q) = \sum_{i=1}^{9} f_i \log\left[t\,p_{oi} + (1-t)\,p_{si}\right]$$
• t is the probability of outcrossing (vs. selfing)
• poi is the expected frequency of type i progeny from outcross.
• psi is the expected frequency of type i progeny from self.
• q enters through the above expected frequencies as provided in previous
table.
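The MSR log likelihood can be sketched directly from the expected-frequency table. In this illustration the selfed AaBb frequency is taken as 0.5[(1-q)^2 + q^2], which is what makes the selfed frequencies sum to one; the parameter values, the use of expected counts as data, and the function names are all assumptions.

```python
import math

def outcross_freqs(q, u, v):
    """Expected outcross frequencies for AABB, AABb, AAbb, AaBB, AaBb, Aabb, aaBB, aaBb, aabb."""
    return [
        0.5 * u * v * (1 - q),
        0.5 * u * ((1 - v) * (1 - q) + v * q),
        0.5 * u * (1 - v) * q,
        0.5 * v * ((1 - u) * (1 - q) + u * q),
        0.5 * (1 - q - (1 - 2 * q) * (u + v - 2 * u * v)),
        0.5 * (1 - v) * (u - 2 * u * q + q),
        0.5 * (1 - u) * v * q,
        0.5 * (1 - u) * (v - 2 * v * q + q),
        0.5 * (1 - u) * (1 - v) * (1 - q),
    ]

def self_freqs(q):
    """Expected selfing frequencies for a coupling-phase AB/ab mother, same order."""
    r = 0.5 * q * (1 - q)
    return [0.25 * (1 - q) ** 2, r, 0.25 * q ** 2, r,
            0.5 * ((1 - q) ** 2 + q ** 2),
            r, 0.25 * q ** 2, r, 0.25 * (1 - q) ** 2]

def msr_log_lik(q, counts, t, u, v):
    """L(q) = sum_i f_i log[t * p_oi + (1 - t) * p_si]."""
    pairs = zip(counts, outcross_freqs(q, u, v), self_freqs(q))
    return sum(f * math.log(t * po + (1 - t) * ps) for f, po, ps in pairs)

# Hypothetical setting: expected counts for N = 1000 at q = 0.1, t = 0.7, u = 0.3, v = 0.6.
t, u, v = 0.7, 0.3, 0.6
counts = [1000 * (t * po + (1 - t) * ps)
          for po, ps in zip(outcross_freqs(0.1, u, v), self_freqs(0.1))]
grid = [i / 100 for i in range(1, 51)]
print(max(grid, key=lambda q: msr_log_lik(q, counts, t, u, v)))
```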
Estimating Allelic Frequencies
in Pollen Pool (u and v)
Use a single locus, say A.
Consider heterozygous maternal plants (Aa).
Write an expression for the log-likelihood in
MSR population.
Condition on the outcrossing rate t.
Solve analytically for umle.
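One way to carry out these steps is sketched below. It is a derivation under the single-locus MSR model, not necessarily the one used in the lecture. For an Aa mother, offspring frequencies are 0.5tu + 0.25(1-t) for AA, 0.5 for Aa, and 0.5t(1-u) + 0.25(1-t) for aa. Maximising the likelihood for fixed t gives u = (2p - 1 + t)/(2t), where p is the proportion of AA among homozygous offspring. The counts and function names are hypothetical.

```python
def single_locus_freqs(u, t):
    """Offspring frequencies (AA, Aa, aa) from an Aa mother under MSR."""
    aa_big = 0.5 * t * u + 0.25 * (1 - t)
    aa_small = 0.5 * t * (1 - u) + 0.25 * (1 - t)
    return aa_big, 0.5, aa_small

def u_mle(n_AA, n_aa, t):
    """MLE of the pollen-pool frequency of A, conditional on the outcrossing rate t."""
    p = n_AA / (n_AA + n_aa)      # heterozygous offspring carry no information on u
    u = (2 * p - 1 + t) / (2 * t)
    return min(1.0, max(0.0, u))  # keep the estimate inside [0, 1]

# Hypothetical counts matching u = 0.3, t = 0.8 in a sample of 1000.
print(single_locus_freqs(0.3, 0.8))
print(u_mle(170, 330, 0.8))
```

The Aa frequency is 0.5 whatever the value of t, which is one way to see why offspring of a heterozygous mother say little about the outcrossing rate.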
Estimating the Outcrossing
Rate t
The prior analysis conditioned on the
outcrossing rate t.
Unfortunately, an Aa heterozygous mother is
necessary to determine linkage but is the least
informative for t.
MSR - Estimating
Recombination Fraction q I
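EM: Calculate the conditional probabilities
of recombination given the genotype.
$$q_{n+1} = \frac{1}{N}\sum_{i=1}^{9} f_i\left[\left(p_{oi}^{Ab} + p_{oi}^{aB}\right)t + \left(p_{si}^{Ab} + p_{si}^{aB}\right)(1-t)\right]$$
NR: Calculate the score and information.
$$S(q) = \sum_{i=1}^{9} f_i\,\frac{d\log\left[(1-t)p_{si} + t\,p_{oi}\right]}{dq}, \qquad I(q) = -E\left[\sum_{i=1}^{9} f_i\,\frac{d^2\log\left[(1-t)p_{si} + t\,p_{oi}\right]}{dq^2}\right]$$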
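The score-and-information iteration for q alone can be sketched as below, with t, u, and v treated as known. Two choices here are assumptions of the sketch: derivatives of the class probabilities are taken by central differences, and the expected information is evaluated as N times the sum of (dp_i/dq)^2 / p_i, which equals -E of the summed second derivatives because the class probabilities add to one. The selfed AaBb frequency is taken as 0.5[(1-q)^2 + q^2], and the data are hypothetical expected counts.

```python
def msr_probs(q, t, u, v):
    """t * p_oi + (1 - t) * p_si for the nine codominant genotype classes."""
    w = u + v - 2 * u * v
    po = [0.5 * u * v * (1 - q), 0.5 * u * ((1 - v) * (1 - q) + v * q),
          0.5 * u * (1 - v) * q, 0.5 * v * ((1 - u) * (1 - q) + u * q),
          0.5 * (1 - q - (1 - 2 * q) * w), 0.5 * (1 - v) * (u - 2 * u * q + q),
          0.5 * (1 - u) * v * q, 0.5 * (1 - u) * (v - 2 * v * q + q),
          0.5 * (1 - u) * (1 - v) * (1 - q)]
    r = 0.5 * q * (1 - q)
    ps = [0.25 * (1 - q) ** 2, r, 0.25 * q ** 2, r, 0.5 * ((1 - q) ** 2 + q ** 2),
          r, 0.25 * q ** 2, r, 0.25 * (1 - q) ** 2]
    return [t * o + (1 - t) * s for o, s in zip(po, ps)]

def scoring_q(counts, t, u, v, q0=0.25, tol=1e-10, max_iter=100, h=1e-6):
    """Iterate q <- q + S(q) / I(q), keeping q inside [0.001, 0.5]."""
    n = sum(counts)
    q = q0
    for _ in range(max_iter):
        p = msr_probs(q, t, u, v)
        up, down = msr_probs(q + h, t, u, v), msr_probs(q - h, t, u, v)
        dp = [(a - b) / (2 * h) for a, b in zip(up, down)]         # dp_i/dq
        score = sum(f * d / pi for f, d, pi in zip(counts, dp, p))  # S(q)
        info = n * sum(d * d / pi for d, pi in zip(dp, p))          # expected I(q)
        q_new = min(max(q + score / info, 1e-3), 0.5)
        if abs(q_new - q) < tol:
            return q_new
        q = q_new
    return q

t, u, v = 0.7, 0.3, 0.6
counts = [1000 * p for p in msr_probs(0.1, t, u, v)]  # hypothetical expected counts
print(scoring_q(counts, t, u, v))
```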
MSR - Estimating q, u, and v
EM
Pick initial estimates (u0, v0, q0).
Calculate expected gametic frequencies in
selfed and outcrossed populations conditional
on current estimates and observed genotype
frequencies (terms of the form $t f_i p_{oi}^{g}$).
Calculate the mle for (u1, v1, q1).
Iterate.
MSR - Estimating q, u, and v
(NR)
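$$S(q,u,v) = \begin{pmatrix} \dfrac{\partial L}{\partial u} \\[2mm] \dfrac{\partial L}{\partial v} \\[2mm] \dfrac{\partial L}{\partial q} \end{pmatrix}, \qquad I(q,u,v) = -\begin{pmatrix} \dfrac{\partial^2 L}{\partial u^2} & \dfrac{\partial^2 L}{\partial u\,\partial v} & \dfrac{\partial^2 L}{\partial u\,\partial q} \\[2mm] \dfrac{\partial^2 L}{\partial u\,\partial v} & \dfrac{\partial^2 L}{\partial v^2} & \dfrac{\partial^2 L}{\partial q\,\partial v} \\[2mm] \dfrac{\partial^2 L}{\partial u\,\partial q} & \dfrac{\partial^2 L}{\partial q\,\partial v} & \dfrac{\partial^2 L}{\partial q^2} \end{pmatrix}$$
$$\begin{pmatrix} u_{n+1} \\ v_{n+1} \\ q_{n+1} \end{pmatrix} = \begin{pmatrix} u_n \\ v_n \\ q_n \end{pmatrix} + \frac{1}{N}\,I^{-1} S$$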
MSR – Linkage Information
Linkage information content is sensitive to allele
frequencies when outcrossing is high.
Linkage information content decreases rapidly as
the allelic frequencies approach 0.5.
When linkage is tight, MSR provides less
information relative to F2 than when linkage is
loose, but tight linkage is always more informative
than loose linkage.
MSR - Bias and Variance
Bias and mean square error are higher for dominant
markers than for codominant markers.
Bias and mean square error are acceptable for
q<0.2 only when the dominant allele frequency is less
than the true q.
When the dominant allele frequency is > 0.5, there is a high
negative bias in the estimate of q.
Allele frequency cannot be accurately estimated
when the true frequency is <0.1 or >0.5 and outcrossing
is low.
Summary
Unknown linkage phase
Reducing the problem to a phase-known problem
Likelihood when phase unknown
Likelihood for general pedigree with missing
information.
Likelihood for mixture of linkage phase
Mixture of Self and Random mating (MSR)