
QTL


Quantitative Trait Loci, QTL
An introduction to quantitative genetics and to common methods for mapping the loci that underlie continuous traits

Why study quantitative traits?
• Many (most) human traits and disorders are complex, in the sense that they are governed by several genetic loci and are also influenced by environmental agents;
• Many of these traits vary continuously by nature and need specialized statistical models and methods to localize and estimate the genetic contributions;
• In addition, in several cases there are potential benefits from studying continuously varying quantities rather than a binary affected/unaffected response.

For example:
• in a study of risk factors, the underlying quantitative phenotypes that predispose to disease may be more etiologically homogeneous than the disease phenotype itself;
• some qualitative phenotypes occur once a threshold of susceptibility has been exceeded, e.g. type 2 diabetes, obesity, etc.;
• in such a case the binary phenotype (affected/unaffected) is less informative than the actual phenotypic measurements.

A pedigree representation

Variance and variability
• methods for linkage analysis of QTL in humans rely on a partitioning of the total variability of trait values;
• in statistical theory, the variance is the expected squared deviation around the mean value $\mu_Y = E(Y)$:
$$V(Y) = E[(Y - \mu_Y)^2];$$
• it can be estimated from data as:
$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2;$$

• the square root of the variance is called the standard deviation;
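A minimal Python sketch of the estimator above, assuming the trait values are held in a plain list. It divides by n exactly as in the formula (not by n - 1), which is what the standard library's `statistics.pvariance` also does. The four trait values are hypothetical.

```python
import math
import statistics


def sample_variance(values):
    """Variance estimate s^2 = (1/n) * sum((y_i - ybar)^2), dividing by n."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    ybar = sum(values) / n
    return sum((y - ybar) ** 2 for y in values) / n


def sample_sd(values):
    """Standard deviation: the square root of the variance estimate."""
    return math.sqrt(sample_variance(values))


# Hypothetical trait values (e.g. body weights in g).
weights = [14.0, 12.0, 6.0, 12.0]
s2 = sample_variance(weights)
s = sample_sd(weights)
# Cross-check against the standard library's divide-by-n variance.
assert math.isclose(s2, statistics.pvariance(weights))
print(s2, s)  # 9.0 3.0
```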

A simple model for the phenotype
$$Y = X + e$$
where
• Y is the phenotypic value, i.e. the trait value;
• X is the genotypic value, i.e. the mean or expected phenotypic value given the genotype;
• e is the environmental deviation, with mean 0.
• We assume that the total phenotypic variance is the sum of the genotypic variance and the environmental variance, V(Y) = V(X) + V(e), i.e. the environmental contribution is assumed to be independent of the genotype of the individual.

Distribution of Y : a single biallelic locus

A single biallelic locus: genetic effects
Genotype:          A1A1     A1A2        A2A2
Genotypic value:   0        a(1 + k)    2a

• a is the homozygous effect;
• k is the dominance coefficient: k = 0 means complete additivity, k = 1 means complete dominance (of A2), and k > 1 if A2 is overdominant.

Example: The pygmy gene, pg
• From data we have the following mean weights: $X_{++}$ = 14 g, $X_{+pg}$ = 12 g, $X_{pgpg}$ = 6 g (here A1 = pg and A2 = +);
• 2a = 14 - 6 = 8 implies a = 4;
• (1 + k)a = 12 - 6 = 6 implies k = 0.5.
The data suggest that the pygmy gene is recessive, although not completely.
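The two-step calculation above can be written as a small helper. This is a sketch assuming the slide's parameterization (A1A1 at 0, A1A2 at a(1 + k), A2A2 at 2a, all measured relative to the A1A1 mean); the function name is hypothetical.

```python
def homozygous_and_dominance(x11, x12, x22):
    """Recover (a, k) from the three genotype means.

    Uses the parameterization A1A1 -> 0, A1A2 -> a(1 + k), A2A2 -> 2a,
    measured relative to the A1A1 mean.
    """
    a = (x22 - x11) / 2          # 2a = X22 - X11
    if a == 0:
        raise ValueError("homozygotes have equal means; k is undefined")
    k = (x12 - x11) / a - 1      # (1 + k)a = X12 - X11
    return a, k


# Pygmy example: A1 = pg, A2 = +, mean weights in g.
a, k = homozygous_and_dominance(x11=6.0, x12=12.0, x22=14.0)
print(a, k)  # 4.0 0.5
```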

Decomposition of the genotypic value, X
• $X_{ij}$ is the mean of Y for $A_iA_j$ individuals;
• when k = 0 the two alleles of a biallelic locus behave in a completely additive fashion: X is a linear function of the number of A2 alleles;
• we can then think of each allele as contributing a purely additive effect to X;
• this can be generalized to k ≠ 0 by decomposing X into additive contributions of alleles together with deviations resulting from dominance;
• the generalization is accomplished using least-squares regression of X on the gene content.

Least-squares linear regression

$$X = \hat{X} + \delta,$$
i.e. fitted value plus residual deviation; the fit minimizes the sum of squared residuals, which gives the variance decomposition
$$V(X) = V(\hat{X}) + V(\delta).$$

Model 1
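$$X_{ij} = \hat{X}_{ij} + \delta_{ij} = \mu + \alpha_i + \alpha_j + \delta_{ij}$$
• μ is the population mean phenotype,
• $\alpha_i$ is the additive effect of allele $A_i$,
• $\delta_{ij}$ is the residual deviation due to dominance;
$$\hat{X}_{ij} = \mu + \alpha_1N_1 + \alpha_2N_2,$$
with $N_k$ the number of $A_k$ alleles in the genotype, so that
$$\hat{X}_{ij} = \begin{cases} \mu + 2\alpha_1 & \text{for } A_1A_1,\\ \mu + \alpha_1 + \alpha_2 & \text{for } A_1A_2,\\ \mu + 2\alpha_2 & \text{for } A_2A_2. \end{cases}$$
$$\alpha_1p_1 + \alpha_2p_2 = 0, \quad \alpha = \alpha_2 - \alpha_1 \;\Rightarrow\; \alpha_1 = -p_2\alpha, \quad \alpha_2 = p_1\alpha.$$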
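To make Model 1 concrete, here is a sketch that fits $\hat{X}$ by weighted least squares for one biallelic locus. It assumes Hardy-Weinberg genotype frequencies (not stated on this slide) and the genotypic values 0, a(1 + k), 2a from the earlier table. It returns μ, $\alpha_1$, $\alpha_2$ and the dominance deviations, so the centring $\alpha_1p_1 + \alpha_2p_2 = 0$ and the zero weighted mean of δ can be checked numerically. The function name is hypothetical.

```python
def fit_additive_model(a, k, p2):
    """Weighted least-squares fit of X on N2 for one biallelic locus.

    Assumes Hardy-Weinberg genotype frequencies p1^2, 2 p1 p2, p2^2
    and genotypic values 0, a(1 + k), 2a for A1A1, A1A2, A2A2.
    Returns mu, alpha1, alpha2, the dominance deviations and the frequencies.
    """
    if not 0.0 < p2 < 1.0:
        raise ValueError("p2 must lie strictly between 0 and 1")
    p1 = 1.0 - p2
    n2 = [0, 1, 2]                            # number of A2 alleles
    x = [0.0, a * (1 + k), 2 * a]             # genotypic values
    f = [p1 * p1, 2 * p1 * p2, p2 * p2]       # genotype frequencies
    mu = sum(fi * xi for fi, xi in zip(f, x))
    mean_n2 = sum(fi * ni for fi, ni in zip(f, n2))
    cov = sum(fi * (xi - mu) * (ni - mean_n2) for fi, xi, ni in zip(f, x, n2))
    var = sum(fi * (ni - mean_n2) ** 2 for fi, ni in zip(f, n2))
    alpha = cov / var                         # regression slope
    alpha1, alpha2 = -p2 * alpha, p1 * alpha  # centred allelic effects
    # Fitted values mu + alpha_i + alpha_j, then residuals delta_ij.
    fitted = [mu + 2 * alpha1, mu + alpha1 + alpha2, mu + 2 * alpha2]
    delta = [xi - hi for xi, hi in zip(x, fitted)]
    return mu, alpha1, alpha2, delta, f


mu, alpha1, alpha2, delta, freqs = fit_additive_model(a=4.0, k=0.5, p2=0.3)
print(mu, alpha1, alpha2, delta)
```

With k = 0 all dominance deviations vanish, as the additive case on the previous slide requires.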

Interpretations
• in the linear regression $X = \hat{X} + \delta$, $\hat{X}$ is the heritable component of the genotypic value and δ is the non-heritable part;
• the sum of an individual's additive allelic effects, $\alpha_i + \alpha_j$, is called the breeding value and is denoted $\Lambda_{ij}$;
• under random mating, $\alpha_i$ can be interpreted as the average excess of allele $A_i$;
• this is defined as the difference between the expected phenotypic value when one allele (e.g. the paternally transmitted one) is fixed at $A_i$ and the population average, μ.

Linear Regression
• $p_k$ = proportion of $A_k$ alleles in the population;
• $\alpha_1p_1 + \alpha_2p_2 = 0$, i.e. the expected additive effect of a randomly drawn allele is 0, which implies that the corresponding population variance is $\alpha_1^2p_1 + \alpha_2^2p_2$;
• since for a biallelic locus $N_1 = 2 - N_2$,
$$X_{ij} = \tilde{\mu} + \alpha N_2 + \delta_{ij},$$
where $\tilde{\mu} = \mu + 2\alpha_1$ and $\alpha = \alpha_2 - \alpha_1$.

Graphically

Solving the linear regression model
• $X_{ij} = \tilde{\mu} + \alpha N_2 + \delta_{ij}$, with

N2:      0        1           2
X:       0        a(1 + k)    2a
prob.:   p1²      2p1p2       p2²

• $\alpha = \dfrac{\mathrm{cov}(X, N_2)}{\mathrm{var}(N_2)}$
$$E(X) = a(1+k)\cdot 2p_1p_2 + 2a\,p_2^2 = 2ap_2(1 + p_1k)$$
$$V(X) = a^2(1+k)^2\cdot 2p_1p_2 + 4a^2p_2^2 - 4a^2p_2^2(1 + p_1k)^2$$
$$E(N_2) = 2p_2, \qquad \mathrm{Var}(N_2) = 2p_1p_2$$
$$E(XN_2) = a(1+k)\cdot 1\cdot 2p_1p_2 + 2a\cdot 2\cdot p_2^2 = 2ap_2\,[2p_2 + p_1(1+k)]$$
$$\begin{aligned} \mathrm{Cov}(X, N_2) &= 2ap_2\,[2p_2 + p_1(1+k) - 2p_2(1 + p_1k)] \\ &= 2ap_1p_2\,[1 + k - 2p_2k] \\ &= 2ap_1p_2\,[1 + k(p_1 - p_2)] \;\Rightarrow\; \alpha = a\,[1 + k(p_1 - p_2)] \end{aligned}$$
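A numerical check of the derivation: compute cov(X, N2)/var(N2) directly from the genotype table and compare it with the closed form α = a[1 + k(p1 - p2)]. This is a sketch assuming Hardy-Weinberg frequencies as in the table; the parameter tuples are arbitrary test values.

```python
def regression_slope(a, k, p2):
    """alpha = cov(X, N2) / var(N2) under Hardy-Weinberg frequencies."""
    p1 = 1.0 - p2
    rows = [(0, 0.0, p1 * p1), (1, a * (1 + k), 2 * p1 * p2), (2, 2 * a, p2 * p2)]
    e_x = sum(f * x for _, x, f in rows)        # E(X) = 2 a p2 (1 + p1 k)
    e_n = sum(f * n for n, _, f in rows)        # E(N2) = 2 p2
    e_xn = sum(f * x * n for n, x, f in rows)   # E(X N2)
    e_n2 = sum(f * n * n for n, _, f in rows)   # E(N2^2)
    cov = e_xn - e_x * e_n
    var = e_n2 - e_n * e_n                      # Var(N2) = 2 p1 p2
    return cov / var


def alpha_closed_form(a, k, p2):
    p1 = 1.0 - p2
    return a * (1 + k * (p1 - p2))


for a, k, p2 in [(4.0, 0.5, 0.3), (1.0, 1.0, 0.5), (2.0, -0.5, 0.9), (1.0, 2.0, 0.1)]:
    print(a, k, p2, regression_slope(a, k, p2), alpha_closed_form(a, k, p2))
```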

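Average excesses
$$\alpha_1^* = E(X \mid \text{one allele is } A_1) - \mu = X_{11}\,P(\text{other is } A_1 \mid A_1) + X_{12}\,P(\text{other is } A_2 \mid A_1) - \mu$$
Under random mating, $P(\text{other is } A_k \mid A_1) = p_k$, so
$$\alpha_1^* = X_{11}p_1 + X_{12}p_2 - \mu = (2\alpha_1)p_1 + (\alpha_1 + \alpha_2)p_2 = \alpha_1,$$
using $\alpha_1p_1 + \alpha_2p_2 = 0$ and the fact that the dominance terms $\delta_{11}p_1 + \delta_{12}p_2$ vanish.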

Interpretations under random mating
• $\alpha = a[1 + k(p_1 - p_2)]$; $\alpha_1 = -p_2\alpha$; $\alpha_2 = p_1\alpha$: population parameters for k ≠ 0;
• α is called the average effect of allelic substitution: replace a randomly chosen A1 allele by A2;
• the expected change in X is then $(X_{12} - X_{11})p_1 + (X_{22} - X_{12})p_2$,
• which equals α (simple calculations).

α: average effect of allelic substitution
(A1A1 → A1A2 with probability p1; A1A2 → A2A2 with probability p2)
$$p_1(X_{12} - X_{11}) + p_2(X_{22} - X_{12}) = p_1\,a(1+k) + p_2\,a(1-k) = a\,[1 + k(p_1 - p_2)] = \alpha$$
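A short sketch of the substitution argument: replace a random A1 allele by A2, average the change in genotypic value over the partner allele, and compare with the closed form for α (here with a = 1, an arbitrary choice).

```python
def substitution_effect(a, k, p2):
    """Expected change in X when a random A1 allele is replaced by A2."""
    p1 = 1.0 - p2
    x11, x12, x22 = 0.0, a * (1 + k), 2 * a
    # Partner allele is A1 (prob. p1): A1A1 -> A1A2; A2 (prob. p2): A1A2 -> A2A2.
    return p1 * (x12 - x11) + p2 * (x22 - x12)


for k in (0.0, 0.5, 1.0):
    for p2 in (0.1, 0.5, 0.9):
        closed = 1.0 * (1 + k * ((1 - p2) - p2))   # a = 1
        print(k, p2, substitution_effect(1.0, k, p2), closed)
```

At p2 = 0.5 the term k(p1 - p2) vanishes, so α = a whatever the dominance coefficient.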

α is a function of p2 and k :

Partitioning the genetic variance
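• the variance, V(X), of the genotypic values in a population is called the genetic variance:
$$V(X) = V(\hat{X} + \delta) = V(\hat{X}) + V(\delta) = V_A + V_D$$
• $V_A = 2p_1p_2\alpha^2 = 2(p_1\alpha_1^2 + p_2\alpha_2^2)$ is the additive genetic variance, i.e. the variance associated with additive allelic effects;
• $V_D = (2p_1p_2ak)^2$ is the dominance genetic variance, i.e. the variance due to dominance deviations.

Derivation of $V_A$: under random mating each genotype carries two independently drawn alleles whose additive effects have mean $p_1\alpha_1 + p_2\alpha_2 = 0$ and variance $p_1\alpha_1^2 + p_2\alpha_2^2$, so
$$V_A = 2(p_1\alpha_1^2 + p_2\alpha_2^2) = 2(p_1p_2^2 + p_2p_1^2)\alpha^2 = 2p_1p_2\alpha^2;$$
equivalently (linear regression),
$$V_A = \alpha^2\,\mathrm{Var}(N_2) = 2p_1p_2\alpha^2 = 2p_1p_2a^2[1 + k(p_1 - p_2)]^2.$$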

V (X); VA; VD are functions of p2 and k:

$V_A$ [dashed] $= 2p_1p_2[a(1 + k(p_1 - p_2))]^2$; $V_D$ [dotted] $= (2p_1p_2ak)^2$.
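The curves can be reproduced numerically, and the decomposition V(X) = V_A + V_D checked against the variance computed directly from the genotype distribution. This is a sketch assuming Hardy-Weinberg frequencies; a = 1 and k = 1 are arbitrary choices.

```python
def genetic_variances(a, k, p2):
    """Return V(X) computed directly, plus V_A and V_D from the formulas."""
    p1 = 1.0 - p2
    freqs = [p1 * p1, 2 * p1 * p2, p2 * p2]
    values = [0.0, a * (1 + k), 2 * a]
    mean = sum(f * x for f, x in zip(freqs, values))
    v_x = sum(f * (x - mean) ** 2 for f, x in zip(freqs, values))
    v_a = 2 * p1 * p2 * (a * (1 + k * (p1 - p2))) ** 2
    v_d = (2 * p1 * p2 * a * k) ** 2
    return v_x, v_a, v_d


# Trace V_A and V_D across allele frequencies for a = 1, k = 1 (A2 dominant).
for p2 in (0.1, 0.3, 0.5, 0.7, 0.9):
    v_x, v_a, v_d = genetic_variances(1.0, 1.0, p2)
    print(f"p2={p2:.1f}  V(X)={v_x:.4f}  V_A={v_a:.4f}  V_D={v_d:.4f}")
```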

Example: The Booroola gene, (Lynch and Walsh, 1998)

In summary
• The homozygous effect a and the dominance coefficient k are intrinsic properties of allelic products.
• The additive effect $\alpha_i$ and the average excess $\alpha_i^*$ are properties of alleles in a particular population.
• The breeding value is a property of a particular individual in reference to a particular population. It is the sum of the additive effects of an individual's alleles.
• The additive genetic variance, $V_A$, is a property of a particular population. It is the variance of the breeding values of individuals in the population.

Multilocus traits
• Do the separate locus effects combine in an additive way, or are there non-linear interactions between different loci (epistasis)?
• Do the genes at different loci segregate independently?
• Does gene expression vary with the environmental context (gene-by-environment interaction)?
• Are specific genotypes associated with particular environments (covariation of genotypic values and environmental effects)?

Example: epistasis

Average length of vegetative internodes in the lateral branch (in mm) of teosinte. Table from Lynch and Walsh (1998).

Two independently segregating loci
• Extending the least-squares decomposition of X:
$$X = \mu + \Lambda_1 + \delta_1 + \Lambda_2 + \delta_2 + \varepsilon$$
• $\Lambda_k$ is the breeding value of the k'th locus, $\delta_k$ is the dominance deviation of the k'th locus, and ε is a residual term due to epistasis;
• if the loci are independently segregating,
$$\begin{aligned} V(X) &= V(\Lambda_1) + V(\Lambda_2) + V(\delta_1) + V(\delta_2) + V(\varepsilon) \\ &= V_{A,1} + V_{A,2} + V_{D,1} + V_{D,2} + V(\varepsilon) \\ &= V_A + V_D + V(\varepsilon) \end{aligned}$$
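When there is no epistasis (ε = 0) and the two loci segregate independently, the genetic variance is the sum of the single-locus variances. The sketch below checks this by enumerating all nine two-locus genotypes; it assumes Hardy-Weinberg frequencies at each locus and purely additive combination across loci, and the locus parameters are hypothetical.

```python
from itertools import product


def locus_distribution(a, k, p2):
    """(genotypic value, frequency) pairs for one locus under Hardy-Weinberg."""
    p1 = 1.0 - p2
    return [(0.0, p1 * p1), (a * (1 + k), 2 * p1 * p2), (2 * a, p2 * p2)]


def variance(dist):
    mean = sum(x * f for x, f in dist)
    return sum(f * (x - mean) ** 2 for x, f in dist)


def two_locus_variance(locus1, locus2):
    """V(X) when X = X1 + X2 (no epistasis) and loci segregate independently."""
    joint = [(x1 + x2, f1 * f2) for (x1, f1), (x2, f2) in product(locus1, locus2)]
    return variance(joint)


l1 = locus_distribution(4.0, 0.5, 0.3)   # hypothetical parameters
l2 = locus_distribution(1.0, 1.0, 0.6)
print(two_locus_variance(l1, l2), variance(l1) + variance(l2))
```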

Neglecting V (ε)
• the epistatic variance components contributing to V(ε) are often small compared to $V_A$ and $V_D$;
• in linkage analysis it is thus often assumed that V(ε) = 0;
• note, however, that the relative magnitudes of the variance components provide only limited insight into the physiological mode of gene action;
• epistatic interactions can greatly inflate the additive and/or dominance components of variance.

Resemblance between relatives
A model for the trait values of two relatives: $Y_k = X_k + e_k$, k = 1, 2, where for the k'th relative
• $Y_k$ is the phenotypic value,
• $X_k$ is the genotypic value,
• $e_k$ is the mean-zero environmental deviation;
• the $e_k$'s are assumed to be mutually independent and also independent of the genotypic values $X_k$.
Hence, the covariance of the trait values of two relatives is given by the genetic covariance, $C(X_1, X_2)$, i.e. $C(Y_1, Y_2) = C(X_1, X_2)$.

A (preliminary) formula for C(X1 ,X 2)
For a single-locus trait
$$C(X_1, X_2) = c_1V_A + c_2V_D$$
• $c_1$ and $c_2$ are constants determined by the type of relationship between the two relatives;
• the same formula applies for multilocus traits if no epistatic variance components are included in the model, i.e. V(ε) = 0;
• in this latter case $V_A$ and $V_D$ are given by summation of the corresponding locus-specific contributions.

Joint distribution of sibling trait values

Single biallelic, dominant (k = 1) model. Correlation 0.46.

Measures of relatedness
• N = the number of alleles shared IBD by two relatives at a given locus;
• the kinship coefficient, θ, is given by 2θ = E(N)/2, i.e. twice the kinship coefficient equals the expected proportion of alleles shared IBD at the locus;
• the coefficient of fraternity, Δ, is defined as Δ = P(N = 2).

Some examples
With $z_j = P(N = j)$:
• Siblings: (z0, z1, z2) = (1/4, 1/2, 1/4), implying E(N) = 1. Thus θ = 1/4 and Δ = 1/4;
• Parent-offspring: (z0, z1, z2) = (0, 1, 0), implying E(N) = 1. Thus θ = 1/4 and Δ = 0;
• Grandparent-grandchild: (z0, z1, z2) = (1/2, 1/2, 0), implying E(N) = 1/2. Thus θ = 1/8 and Δ = 0.
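The three examples follow mechanically from the IBD distribution: E(N) = z1 + 2 z2, θ = E(N)/4 and Δ = z2. A sketch using exact fractions so the results print as 1/4, 1/8 and so on; the function name is hypothetical.

```python
from fractions import Fraction


def relatedness(z0, z1, z2):
    """Return E(N), kinship theta and fraternity Delta from z_j = P(N = j)."""
    z0, z1, z2 = (Fraction(v) for v in (z0, z1, z2))
    if z0 + z1 + z2 != 1:
        raise ValueError("IBD probabilities must sum to 1")
    expected_n = z1 + 2 * z2
    theta = expected_n / 4      # from 2*theta = E(N)/2
    delta = z2                  # Delta = P(N = 2)
    return expected_n, theta, delta


h = Fraction(1, 2)
q = Fraction(1, 4)
for name, z in [("siblings", (q, h, q)),
                ("parent-offspring", (0, 1, 0)),
                ("grandparent-grandchild", (h, h, 0))]:
    print(name, *relatedness(*z))
```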

Covariance formula for a single locus
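Under the assumed model
$$X_1 = \mu + \alpha_{i_1} + \alpha_{j_1} + \delta_{i_1j_1}, \qquad X_2 = \mu + \alpha_{i_2} + \alpha_{j_2} + \delta_{i_2j_2}$$
$$\mathrm{Cov}(X_1, X_2) = \mathrm{Cov}(\alpha_{i_1} + \alpha_{j_1},\ \alpha_{i_2} + \alpha_{j_2}) + \mathrm{Cov}(\delta_{i_1j_1},\ \delta_{i_2j_2})$$
$$C(Y_1, Y_2) = C(X_1, X_2) = 2\theta V_A + \Delta V_D = \frac{E(N)}{2}V_A + P(N = 2)\,V_D$$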
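Plugging the relatedness coefficients from the examples into $C(Y_1, Y_2) = 2\theta V_A + \Delta V_D$ shows how relationship types differ. A sketch with hypothetical variance components $V_A$ = 8 and $V_D$ = 4, using exact fractions.

```python
from fractions import Fraction


def relative_covariance(theta, delta, v_a, v_d):
    """C(Y1, Y2) = 2*theta*V_A + Delta*V_D for a single locus (no epistasis)."""
    return 2 * theta * v_a + delta * v_d


# Hypothetical variance components.
V_A, V_D = Fraction(8), Fraction(4)
pairs = {
    "siblings": (Fraction(1, 4), Fraction(1, 4)),
    "parent-offspring": (Fraction(1, 4), Fraction(0)),
    "grandparent-grandchild": (Fraction(1, 8), Fraction(0)),
}
for name, (theta, delta) in pairs.items():
    print(name, relative_covariance(theta, delta, V_A, V_D))
```

Siblings and parent-offspring pairs share the same additive term, but only siblings pick up a dominance contribution, since only they can share both alleles IBD.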

A single locus; perfect marker data
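$$C(Y_1, Y_2 \mid N) = \frac{N}{2}V_A + I\{N = 2\}\,V_D, \qquad I\{N = 2\} = \begin{cases} 1 & \text{if } N = 2\\ 0 & \text{if } N = 0 \text{ or } N = 1 \end{cases}$$
i.e.
$$C(Y_1, Y_2 \mid N) = \begin{cases} 0 & \text{if } N = 0\\ V_A/2 & \text{if } N = 1\\ V_A + V_D & \text{if } N = 2 \end{cases}$$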
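Averaging the conditional covariance over the prior IBD distribution should recover the unconditional formula 2θV_A + ΔV_D. A sketch for sib pairs with hypothetical $V_A$ = 8 and $V_D$ = 4.

```python
def conditional_covariance(n_ibd, v_a, v_d):
    """C(Y1, Y2 | N) = (N/2) V_A + I{N = 2} V_D for one locus."""
    if n_ibd not in (0, 1, 2):
        raise ValueError("N must be 0, 1 or 2")
    return n_ibd / 2 * v_a + (v_d if n_ibd == 2 else 0.0)


def averaged_covariance(z, v_a, v_d):
    """Average the conditional covariance over z = (P(N=0), P(N=1), P(N=2))."""
    return sum(z_n * conditional_covariance(n, v_a, v_d) for n, z_n in enumerate(z))


v_a, v_d = 8.0, 4.0          # hypothetical variance components
sib_z = (0.25, 0.5, 0.25)
print([conditional_covariance(n, v_a, v_d) for n in (0, 1, 2)])  # [0.0, 4.0, 12.0]
print(averaged_covariance(sib_z, v_a, v_d))                      # 5.0
```

The averaged value 5.0 matches 2θV_A + ΔV_D = 2(1/4)(8) + (1/4)(4) for siblings.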

Covariance formula for multiple loci
n independently segregating loci, assuming no epistatic interaction, i.e. putting V(ε) = 0:
$$C(Y_1, Y_2) = C(X_1, X_2) = 2\theta V_A + \Delta V_D = \sum_l \left(2\theta V_{A,l} + \Delta V_{D,l}\right) = \sum_l \left(\frac{E(N_l)}{2}V_{A,l} + P(N_l = 2)\,V_{D,l}\right);$$
$N_l$ is the number of alleles shared IBD at locus l; $V_{A,l}$ and $V_{D,l}$ are the locus-specific additive and dominance variance contributions, respectively.
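Because the prior IBD distribution depends only on the relationship, summing locus-specific contributions gives the same answer as applying 2θ and Δ to the total $V_A$ and $V_D$. A sketch for sib pairs with three hypothetical loci.

```python
def multilocus_covariance(z, loci):
    """Sum of locus-specific contributions E(N_l)/2 * V_A,l + P(N_l = 2) * V_D,l.

    z = (P(N=0), P(N=1), P(N=2)) is the same at every locus, because it
    depends only on the relationship; loci is a list of (V_A,l, V_D,l).
    """
    expected_n = z[1] + 2 * z[2]
    return sum(expected_n / 2 * v_a + z[2] * v_d for v_a, v_d in loci)


sib_z = (0.25, 0.5, 0.25)
loci = [(8.0, 4.0), (2.0, 0.0), (1.0, 0.5)]   # hypothetical (V_A,l, V_D,l)
total_va = sum(v for v, _ in loci)
total_vd = sum(v for _, v in loci)
theta, delta = 0.25, 0.25
print(multilocus_covariance(sib_z, loci), 2 * theta * total_va + delta * total_vd)
```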


Covariance... continued
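Define for every pair of relatives
$$\pi(x) = E[N_x \mid MD_x]/2 \quad\text{and}\quad \Delta(x) = P(N_x = 2 \mid MD_x),$$
where $MD_x$ denotes the marker data used to infer IBD sharing at the test locus x.

For two related individuals we then have
$$C(Y_1, Y_2 \mid MD_x) = \sum_l \left(\frac{E[N_l \mid MD_x]}{2}V_{A,l} + P(N_l = 2 \mid MD_x)\,V_{D,l}\right) = \pi(x)V_{A,x} + \Delta(x)V_{D,x} + 2\theta V_{A,R} + \Delta V_{D,R},$$
where $V_{A,R}$ and $V_{D,R}$ are the additive and dominance variance contributed by the remaining loci (assumed unlinked to x).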
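The conditional covariance needs only the posterior IBD probabilities at x. The sketch below assumes these probabilities come from a separate marker-based IBD computation (not shown) and uses hypothetical variance components for a sib pair. With uninformative markers the posterior equals the prior, and the result reduces to the unconditional 2θV_A + ΔV_D.

```python
def covariance_given_markers(post, v_ax, v_dx, v_ar, v_dr, theta, delta):
    """C(Y1, Y2 | MD_x) = pi(x) V_A,x + Delta(x) V_D,x + 2 theta V_A,R + Delta V_D,R.

    post = (P(N_x=0|MD_x), P(N_x=1|MD_x), P(N_x=2|MD_x)): posterior IBD
    probabilities at x, assumed to come from a separate marker-based
    IBD computation (not shown). theta and delta are the relationship's
    kinship and fraternity coefficients, used for the remaining loci R.
    """
    if abs(sum(post) - 1.0) > 1e-9:
        raise ValueError("posterior IBD probabilities must sum to 1")
    pi_x = (post[1] + 2 * post[2]) / 2      # E[N_x | MD_x] / 2
    delta_x = post[2]                       # P(N_x = 2 | MD_x)
    return pi_x * v_ax + delta_x * v_dx + 2 * theta * v_ar + delta * v_dr


# Sib pair; hypothetical posterior and variance components.
print(covariance_given_markers((0.1, 0.3, 0.6), 2.0, 1.0, 4.0, 0.5, 0.25, 0.25))
# With uninformative markers the posterior equals the prior (1/4, 1/2, 1/4):
print(covariance_given_markers((0.25, 0.5, 0.25), 2.0, 1.0, 4.0, 0.5, 0.25, 0.25))
```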

Haseman-Elston method
• Uses pairs of relatives of the same type, most often sib pairs;
• for each relative pair, calculate the squared phenotypic difference $Z = (Y_1 - Y_2)^2$;
• given $MD_x$, regress the Z's on the expected proportion of alleles IBD, $\pi(x) = E[N_x \mid MD_x]/2$, at the test locus;
• a slope coefficient β < 0, if statistically significant, is considered as evidence for linkage.
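The HE procedure is an ordinary least-squares regression of Z on the estimated IBD proportion. A minimal sketch, assuming the $\hat{\pi}$ values have already been estimated from marker data; the three sib pairs are hypothetical and chosen so that Z is exactly linear in $\hat{\pi}$.

```python
def haseman_elston(pairs):
    """Ordinary least-squares fit of Z = (Y1 - Y2)^2 on pi_hat.

    pairs: iterable of (y1, y2, pi_hat), where pi_hat is the estimated
    proportion of alleles shared IBD at the test locus (assumed given).
    Returns (intercept, slope); a negative slope points towards linkage.
    """
    pairs = list(pairs)
    z = [(y1 - y2) ** 2 for y1, y2, _ in pairs]
    x = [pi_hat for _, _, pi_hat in pairs]
    n = len(pairs)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("pi_hat must vary across pairs")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return z_bar - slope * x_bar, slope


# Hypothetical sib pairs: (trait of sib 1, trait of sib 2, pi_hat).
pairs = [(10.0, 3.0, 0.0), (10.0, 5.0, 0.5), (10.0, 9.0, 1.0)]
print(haseman_elston(pairs))  # (49.0, -48.0)
```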

HE: an example

[Figure: HE regression; x-axis: proportion of marker alleles identical by descent.]

Solid line is the fitted regression line; dotted line indicates the true underlying relationship.

HE: motivation
$$E[(Y_1 - Y_2)^2] = V[Y_1 - Y_2] = V(Y_1) + V(Y_2) - 2C(Y_1, Y_2) = 2V(Y) - 2C(Y_1, Y_2),$$
where the first equality uses $E(Y_1) = E(Y_2)$ and the last uses $V(Y_1) = V(Y_2) = V(Y)$.
Assume strictly additive gene action at each locus, i.e. $V_D = 0$. Then, for a putative QTL at x,
$$E[(Y_1 - Y_2)^2 \mid MD_x] = 2V(Y) - 2C(Y_1, Y_2 \mid MD_x) = 2V(Y) - 2[\pi(x)V_{A,x} + 2\theta V_{A,R}].$$
NOTE: This is a linear function in π(x)!
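The linearity in π(x) is easy to see numerically. This sketch evaluates the expected squared difference for a hypothetical sib-pair setting (V(Y) = 10, $V_{A,x}$ = 3, $V_{A,R}$ = 2, θ = 1/4, no dominance); the function name is hypothetical.

```python
def he_expected_sq_diff(pi_x, var_y, v_ax, v_ar, theta):
    """E[(Y1 - Y2)^2 | MD_x] = 2 V(Y) - 2 [pi(x) V_A,x + 2 theta V_A,R], with V_D = 0."""
    return 2 * var_y - 2 * (pi_x * v_ax + 2 * theta * v_ar)


# Hypothetical sib-pair setting: V(Y) = 10, V_A,x = 3, V_A,R = 2, theta = 1/4.
values = [he_expected_sq_diff(p, 10.0, 3.0, 2.0, 0.25) for p in (0.0, 0.5, 1.0)]
print(values)   # [18.0, 15.0, 12.0]
```

Each increase of 0.5 in π(x) lowers the expectation by 3, i.e. the slope is -2V_A,x = -6, while the value at π(x) = 0 is 2[V(Y) - 2θV_A,R] = 18.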

HE: linkage test
$$E[(Y_1 - Y_2)^2 \mid MD_x] = \alpha + \beta\,\pi(x),$$
where
$$\alpha = 2[V(Y) - 2\theta V_{A,R}], \qquad \beta = -2V_{A,x}.$$
The linkage test is $H_0: \beta = 0$ ($V_{A,x} = 0$) vs $H_1: \beta < 0$.
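One common way to carry out this one-sided test is a t statistic for the OLS slope, compared with a t distribution on n - 2 degrees of freedom. The sketch below computes the slope and its t statistic only (it does not compute a p-value) and relies on the usual regression assumptions, which, as noted later, are not generally satisfied for HE data. The six data points are hypothetical.

```python
import math


def he_slope_test(x, z):
    """OLS slope of z on x and its t statistic for H0: beta = 0.

    Under the usual regression assumptions the statistic is compared with a
    t distribution on n - 2 degrees of freedom; for H1: beta < 0, only large
    negative values count as evidence for linkage.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least three pairs")
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    if rss == 0:
        raise ValueError("perfect fit; standard error is zero")
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


pi_hat = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]       # hypothetical IBD proportions
z = [50.0, 48.0, 26.0, 24.0, 2.0, 0.0]        # hypothetical squared differences
print(he_slope_test(pi_hat, z))
```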

HE: examples with simulated data

Simulated data from n = 200 sib pairs; top to bottom: h² = 0.50, 0.33, 0.25.

Heritability and power
• for a given locus we may define the locus-specific heritability as the proportion of the total variance 'explained' by that particular locus, e.g. (in the narrow sense)
$$h^2 = \frac{V_A}{V(Y)},$$
with $V_A$ the additive variance contributed by the locus;
• the locus-specific heritability is the single most important parameter for the power of QTL linkage methods;
• heritabilities below about 10% lead, in general, to unrealistically large sample sizes.
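Combining the single-locus formulas for $V_A$ and $V_D$ with the definition above gives the locus-specific heritability from (a, k, p2). In this sketch the variance from all other sources (other loci plus environment) is a hypothetical input chosen so that V(Y) = 100.

```python
def locus_heritability(a, k, p2, v_rest):
    """Narrow-sense locus-specific heritability V_A / V(Y) for one biallelic QTL.

    v_rest is the variance from everything else (other loci plus environment),
    a hypothetical input here; V(Y) = V_A + V_D + v_rest.
    """
    p1 = 1.0 - p2
    v_a = 2 * p1 * p2 * (a * (1 + k * (p1 - p2))) ** 2
    v_d = (2 * p1 * p2 * a * k) ** 2
    return v_a / (v_a + v_d + v_rest)


h2 = locus_heritability(4.0, 0.5, 0.3, v_rest=89.6176)
print(round(h2, 4))   # 0.0968
```

Even this sizeable genotypic effect falls just below the 10% level mentioned above.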

HE: two-point analysis
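$$E[(Y_1 - Y_2)^2 \mid \text{marker genotypes}] = \tilde{\alpha} + \tilde{\beta}\,\pi^{(m)},$$
where $\pi^{(m)}$ is the expected proportion of marker alleles shared IBD.
• $\tilde{\beta}$ depends on the type of relatives considered;
• for sib pairs, $\tilde{\beta} = -2(1 - 2\theta)^2 V_{A,l}$, where θ here denotes the recombination fraction between marker and QTL;
• the recombination fraction (θ) and the effect size ($V_{A,l}$) are confounded and cannot be separately estimated.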
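The confounding can be shown with two parameter pairs that give the same slope. In this sketch the recombination fraction is written r (rather than θ) to avoid a clash with the kinship coefficient; the parameter values are hypothetical.

```python
def he_two_point_slope(v_al, r):
    """Sib-pair HE slope at a marker: -2 (1 - 2r)^2 V_A,l, r = recombination fraction."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return -2 * (1 - 2 * r) ** 2 * v_al


# A tightly linked small effect and a looser larger effect give the same slope.
print(he_two_point_slope(4.0, 0.0))    # -8.0
print(he_two_point_slope(16.0, 0.25))  # -8.0
```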

HE: in summary
Simple, transparent and comparatively robust, but:
• poor statistical power in many settings;
• different types of relatives cannot be mixed;
• parents and their offspring cannot be used in HE;
• the assumptions of the statistical model are not generally satisfied.

• Remedy: use one of several suggested extensions of HE or, alternatively, use VCA instead.

VCA
[Figure: trait value decomposed into QTL, polygenes and independent environment.]

Mathematically:
$$Y_i = \mu + \beta^T a_i + g_i + q_i + e_i$$
where μ is the population mean, $a_i$ are the "environmental" predictor variables (with regression coefficients β), q is the major trait locus, g is the polygenic effect, and e is the residual error.

VCA: an additive model
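$$Y = \mu + \sum_{i=1}^{p}\beta_iz_i + \sum_{l=1}^{n}X_l + e$$
$$E(Y) = \mu + \sum_{i=1}^{p}\beta_iz_i;$$
$$V(Y) = V_A + V_D + V(e) = V_{A,x} + V_{D,x} + V_{A,R} + V_{D,R} + V(e)$$
$$C(Y_1, Y_2 \mid MD_x) = \pi(x)V_{A,x} + \Delta(x)V_{D,x} + 2\theta V_{A,R} + \Delta V_{D,R},$$
with $\pi(x) = E[N_x \mid MD_x]/2$ and $\Delta(x) = P(N_x = 2 \mid MD_x)$.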

VCA: major assumption
• The joint distribution of the phenotypic values in a pedigree is assumed to be multivariate normal with the given mean values, variances and covariances;
• the multivariate normal distribution is completely specified by the mean values, variances and covariances;
• the likelihood, L, of the data can be calculated, and we can estimate the variance components $V_{A,x}$, $V_{D,x}$, $V_{A,R}$, $V_{D,R}$.
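For the smallest pedigree, a pair of relatives, the multivariate normal reduces to a bivariate normal with common variance V(Y) and covariance $C(Y_1, Y_2 \mid MD_x)$. The sketch below evaluates that log-likelihood; for a data set of independent pairs the total log-likelihood is the sum over pairs. The posterior IBD probabilities `post` are assumed to come from a separate marker-based IBD computation, the covariates are folded into a single `mean`, and all function names are hypothetical.

```python
import math


def pair_loglik(y1, y2, mean, var_y, cov):
    """Log-density of (y1, y2) under a bivariate normal with equal variances."""
    det = var_y * var_y - cov * cov
    if var_y <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    d1, d2 = y1 - mean, y2 - mean
    # Quadratic form d' S^{-1} d with S^{-1} = [[v, -c], [-c, v]] / det.
    quad = (var_y * d1 * d1 - 2 * cov * d1 * d2 + var_y * d2 * d2) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def vca_pair_loglik(y1, y2, mean, post, v_ax, v_dx, v_ar, v_dr, v_e, theta, delta):
    """Relative-pair log-likelihood under the VCA model at test locus x."""
    var_y = v_ax + v_dx + v_ar + v_dr + v_e
    pi_x = (post[1] + 2 * post[2]) / 2                 # E[N_x | MD_x] / 2
    cov = pi_x * v_ax + post[2] * v_dx + 2 * theta * v_ar + delta * v_dr
    return pair_loglik(y1, y2, mean, var_y, cov)


# Hypothetical sib pair and variance components.
print(vca_pair_loglik(1.2, 0.8, 0.0, (0.1, 0.3, 0.6),
                      2.0, 0.5, 3.0, 0.5, 4.0, 0.25, 0.25))
```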

VCA: linkage test
The linkage test of $H_0: V_{A,x} = V_{D,x} = 0$ uses the LOD score statistic
$$\mathrm{LOD}_x = \log_{10}\frac{L(\text{full model})}{L(V_{A,x} = V_{D,x} = 0)}$$
When the position of the test locus, x, is varied over a chromosomal region, the result can be summarized in a LOD score curve.
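A simplified sketch of the LOD computation for independent relative pairs. Unlike the full model on this slide, it assumes no dominance ($V_D$ = 0), holds the mean, $V_{A,R}$ and $V(e)$ fixed, and maximizes over $V_{A,x}$ on a grid only (the grid always includes 0, so the LOD is never negative). The data, in the form (y1, y2, π(x)), and all parameter values are hypothetical.

```python
import math


def pair_loglik(y1, y2, mean, var_y, cov):
    """Bivariate normal log-density with equal variances var_y and covariance cov."""
    det = var_y * var_y - cov * cov
    if var_y <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    d1, d2 = y1 - mean, y2 - mean
    quad = (var_y * d1 * d1 - 2 * cov * d1 * d2 + var_y * d2 * d2) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def total_loglik(data, mean, v_ax, v_ar, v_e, theta):
    """Additive-only model (V_D = 0); data holds (y1, y2, pi_x) for independent pairs."""
    var_y = v_ax + v_ar + v_e
    return sum(pair_loglik(y1, y2, mean, var_y, pi_x * v_ax + 2 * theta * v_ar)
               for y1, y2, pi_x in data)


def lod_score(data, mean, v_ar, v_e, theta, grid):
    """LOD_x = log10 L(full) / L(V_A,x = 0), maximizing over V_A,x on a grid only."""
    null = total_loglik(data, mean, 0.0, v_ar, v_e, theta)
    best = max(total_loglik(data, mean, v, v_ar, v_e, theta) for v in [0.0, *grid])
    return (best - null) / math.log(10)   # natural-log likelihoods -> log10


data = [(1.2, 1.0, 1.0), (0.9, 1.1, 0.5), (-1.0, 1.5, 0.0), (0.3, 0.2, 1.0)]
grid = [0.25 * i for i in range(1, 9)]
print(lod_score(data, mean=0.0, v_ar=0.5, v_e=1.0, theta=0.25, grid=grid))
```

In practice all variance components would be estimated jointly by numerical maximization; the grid keeps the sketch dependency-free.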

VCA vs HE: LOD score profiles

From Pratt et al., Am. J. Hum. Genet. 66:1153-1157 (2000).

Linkage methods for QTL
• A fully parametric linkage approach is difficult;
• model-free tests are the alternative choice;
• we will discuss Haseman-Elston regression (HE) and variance components analysis (VCA).

Both can be viewed as two-step procedures:
1. use polymorphic molecular markers to extract information on inheritance patterns;
2. evaluate the evidence for a trait-influencing locus at specified locations.

Similarities and differences
• HE and VCA are based on estimated IBD sharing given marker data;
• both methods require specification of a statistical model ('model-free' means 'does not require specification of a genetic model');
• similarity in IBD sharing is used to evaluate trait similarity, using either linear regression (HE) or variance components analysis (VCA).


								