Lecture 5 Assaying Genetic Variation by nml23533


Lecture 5: Assaying Genetic Variation

     September 11, 2009
                  Exam 1
September 30 in lab

I will be out of town September 26 through
 October 2

Baneshwar will do review September 28 and
 Gancho Slavov will help out

Old exam posted at:
              Last Time
Hardy-Weinberg review

More about dominant loci and allele

Hypothesis testing

Detecting departures from H-W
Measures of genetic variation

Hardy-Weinberg departures

Exact tests for assessing
 departures from expectations
What is a Population?
Operational definition:
 an assemblage of

Population genetics
 definition: a collection
 of randomly mating

Why does this
Measuring diversity
  Allele frequency is same as
   sampling probability

  Two allele system: frequency
   of one allele provides
   frequency of other: p and q

  Homozygotes: individuals
   with the same allele at both
   homologous loci

  Heterozygotes: individuals
   with different alleles at
   homologous loci
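
A minimal sketch of how allele frequencies follow from genotype counts in a two-allele system. The function name and the counts are hypothetical, chosen only for illustration; they are not data from the lecture.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q), the frequencies of alleles A and a, from diploid genotype counts."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)  # each diploid individual carries two alleles
    if total_alleles == 0:
        raise ValueError("need at least one individual")
    # Each AA homozygote contributes two A alleles, each heterozygote one.
    p = (2 * n_AA + n_Aa) / total_alleles
    return p, 1.0 - p  # two-allele system: q = 1 - p


# Hypothetical sample of 100 individuals.
p, q = allele_frequencies(n_AA=30, n_Aa=50, n_aa=20)
print(f"p = {p:.3f}, q = {q:.3f}")
```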
Dominance and Additivity
 Dominance: masking of action of
  one allele by another allele

    Homozygotes indistinguishable
     from heterozygotes
 Additivity: phenotype can be
  perfectly predicted from

    Intermediate heterozygote
 Codominant: both alleles are
  apparent in genotype: does NOT
  refer to phenotype!

Hardy-Weinberg Law
 Hardy and Weinberg came up with
  this simultaneously in 1908

 Frequencies of genotypes can be
  predicted from allele frequencies
  following one generation of
  random mating

 Assumptions:

    Infinite population
    Random mating
    No selection
    No migration
    No mutation
Hardy-Weinberg Law and Probability
                 A (p)            a (q)

        A (p)   AA ($p^2$)       Aa ($pq$)

        a (q)   aA ($qp$)        aa ($q^2$)

                 $p^2 + 2pq + q^2 = 1$
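
As an illustration of the table above, the following sketch computes the expected genotype frequencies from a single allele frequency. The function name and the example value of p are assumptions made for this example.

```python
def hw_genotype_frequencies(p):
    """Expected genotype frequencies under Hardy-Weinberg for allele frequency p of A."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p
    # The two heterozygote cells of the table (Aa and aA) are combined into 2pq.
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


freqs = hw_genotype_frequencies(0.7)  # hypothetical allele frequency
print(freqs)
print("sum of genotype frequencies:", sum(freqs.values()))
```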
Expected Heterozygosity
If a population is in Hardy-Weinberg Equilibrium, the
  probability of sampling a heterozygous individual at a
  particular locus is the Expected Heterozygosity:
      $2pq$
         for a 2-allele, 1-locus system

      $1 - (p^2 + q^2)$, or $1 - \sum(\text{expected homozygosity})$
         more general: what's left over after calculating
          expected homozygosity

      $$H_E = 1 - \sum_{i=1}^{n} p_i^2$$

   Homozygosity is overestimated at small
   sample sizes. Must apply correction factor:

          Correction for bias in parameter estimates
          due to small sample size:

          $$H_E = \frac{2N}{2N-1}\left(1 - \sum_{i=1}^{n} p_i^2\right)$$
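
A short sketch of the expected heterozygosity calculation, with the small-sample correction applied only when the number of sampled individuals N is supplied. The function and parameter names, and the example frequencies, are hypothetical.

```python
def expected_heterozygosity(allele_freqs, n_individuals=None):
    """H_E = 1 - sum(p_i^2); multiplied by 2N/(2N-1) when n_individuals (N) is given."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    h_e = 1.0 - sum(p * p for p in allele_freqs)  # one minus expected homozygosity
    if n_individuals is not None:
        two_n = 2 * n_individuals  # number of sampled chromosomes
        h_e *= two_n / (two_n - 1)  # small-sample correction factor
    return h_e


# Hypothetical three-allele locus sampled in 10 diploid individuals.
print(expected_heterozygosity([0.5, 0.3, 0.2]))
print(expected_heterozygosity([0.5, 0.3, 0.2], n_individuals=10))
```

With only two alleles the uncorrected value reduces to 2pq, which matches the first form on the slide.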
Maximum Expected Heterozygosity
  Expected heterozygosity is maximized when all
   allele frequencies are equal

  Approaches 1 when number of alleles = number
   of chromosomes

  $$H_{E(max)} = 1 - \sum_{i=1}^{2N}\left(\frac{1}{2N}\right)^2 = 1 - 2N\left(\frac{1}{2N}\right)^2 = \frac{2N-1}{2N}$$

  Applying small sample correction factor:

  $$H_E = \frac{2N}{2N-1}\left(1 - \sum_{i=1}^{n} p_i^2\right) = \frac{2N}{2N-1}\left(\frac{2N-1}{2N}\right) = 1$$
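
The limiting case can be checked numerically. This sketch assumes every one of the 2N sampled chromosomes carries a different allele, so each allele has frequency 1/(2N); the function name is hypothetical.

```python
def max_expected_heterozygosity(n_individuals):
    """Return (uncorrected, corrected) H_E when all 2N chromosomes carry distinct alleles."""
    two_n = 2 * n_individuals
    freqs = [1.0 / two_n] * two_n  # 2N alleles, all equally frequent
    uncorrected = 1.0 - sum(p * p for p in freqs)  # should equal (2N - 1) / 2N
    corrected = two_n / (two_n - 1) * uncorrected  # should equal 1
    return uncorrected, corrected


for n in (5, 50, 500):  # hypothetical sample sizes
    print(n, max_expected_heterozygosity(n))
```

The uncorrected value approaches 1 as N grows, while the corrected value equals 1 (up to floating-point rounding) at any sample size.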
Departures from Hardy-Weinberg
   Chi-Square test is simplest (frequentist) way to
    detect departures from Hardy-Weinberg

   Compare calculated Chi-Square value versus “critical
    value” to determine if significant departures are
                 Observed Heterozygosity
 Proportion of individuals in a population that are
  heterozygous for a particular locus:

  $$H_O = \frac{1}{N}\sum_{i<j} N_{ij}$$

  where $N_{ij}$ is the number of diploid individuals with genotype $A_iA_j$, $i \neq j$,
  and $N$ is the total number of individuals sampled
 Difference between observed and expected heterozygosity
  will become very important soon

 This is NOT how we test for departures from Hardy-
  Weinberg equilibrium!
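
A minimal sketch of the observed heterozygosity calculation. Genotypes are represented here as tuples of two allele labels, which is a convention assumed for this example; the counts are hypothetical.

```python
def observed_heterozygosity(genotype_counts):
    """Proportion of individuals whose two alleles differ.

    genotype_counts maps (allele_1, allele_2) tuples to numbers of individuals.
    """
    total = sum(genotype_counts.values())
    if total == 0:
        raise ValueError("need at least one individual")
    heterozygotes = sum(n for (a1, a2), n in genotype_counts.items() if a1 != a2)
    return heterozygotes / total


counts = {("A", "A"): 30, ("A", "a"): 50, ("a", "a"): 20}  # hypothetical data
print(observed_heterozygosity(counts))
```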
How do you calculate deviations from Hardy-
       Weinberg for this example?
   Observations of Malate Dehydrogenase Genotype
              Frequencies in Drosophila
Meaning of P-value

   Probability of a Chi-square value of the
    calculated magnitude or greater if the null
    hypothesis is true

   Critical values are not magical numbers

   Important to state hypotheses correctly

   Interpret results within parameters of test

    p<0.05: The null hypothesis of no
      significant departure from Hardy-
      Weinberg equilibrium is rejected.
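
A sketch of the Chi-square test for a two-allele locus, using only the standard library. The genotype counts are hypothetical (they are not the Mdh data), and the sketch assumes 1 degree of freedom: 3 genotype classes, minus 1, minus 1 for the allele frequency estimated from the data. For 1 degree of freedom the upper-tail probability can be written as erfc(sqrt(chi2 / 2)).

```python
import math


def hw_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square statistic and P-value (1 d.f.) for departure from Hardy-Weinberg."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("need at least one individual")
    p = (2 * n_AA + n_Aa) / (2 * n)  # allele frequency estimated from the sample
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    if min(expected) == 0:
        raise ValueError("an expected count is zero; the test is undefined")
    observed = [n_AA, n_Aa, n_aa]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # P(chi-square with 1 d.f. >= chi2): probability of a value this large or larger under H0.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p_value = hw_chi_square(n_AA=40, n_Aa=30, n_aa=30)  # hypothetical counts
print(f"chi-square = {chi2:.3f}, P = {p_value:.4g}")
```

If any expected count is below about 5, the Chi-square approximation is unreliable and an exact test is the safer choice.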
Alternatives to Chi-Square Calculation

   If expected numbers are very small (less than
    5), Chi-square distribution is not accurate

   Exact tests are required when expected
    genotype counts are small

   Essentially a sample-point method based on
      Sample space is too large to sample exhaustively
      Take a random sample of all possible outcomes
      Determine if observed values are extreme compared to
       simulated values
   Fisher’s Exact Test in lab last time
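
The random-sampling idea above can be sketched as a simple permutation test. This is one possible illustration under stated assumptions, not necessarily the procedure used in lab: alleles are shuffled and re-paired into genotypes at random, and the test statistic is how far the heterozygote count falls from the mean of the simulated counts. The function name, counts, and number of permutations are hypothetical.

```python
import random


def hw_permutation_test(n_AA, n_Aa, n_aa, n_permutations=10000, seed=None):
    """Monte Carlo P-value for departure from Hardy-Weinberg at a two-allele locus."""
    rng = random.Random(seed)
    n = n_AA + n_Aa + n_aa
    # Pool all 2N alleles; allele counts stay fixed across permutations.
    alleles = ["A"] * (2 * n_AA + n_Aa) + ["a"] * (2 * n_aa + n_Aa)
    simulated = []
    for _ in range(n_permutations):
        rng.shuffle(alleles)  # random union of alleles
        # Pair consecutive alleles into genotypes and count heterozygotes.
        hets = sum(1 for i in range(0, 2 * n, 2) if alleles[i] != alleles[i + 1])
        simulated.append(hets)
    mean_hets = sum(simulated) / n_permutations
    observed_dev = abs(n_Aa - mean_hets)
    # Fraction of simulated samples at least as extreme as the observed one.
    extreme = sum(1 for h in simulated if abs(h - mean_hets) >= observed_dev)
    return extreme / n_permutations


print(hw_permutation_test(n_AA=12, n_Aa=4, n_aa=4, n_permutations=5000, seed=1))
```

Because the P-value is estimated from a random sample of outcomes, it varies slightly between runs unless the seed is fixed.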
  Exact Tests for Detecting Departures from
              Expected Patterns
 Father of exact tests: R.A. Fisher

 Prompted by a dispute over tea
Applying Fisher’s Exact Test to Hardy Weinberg

 Probability of observing a particular group of genotypes
  follows a multinomial probability distribution:
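  $$P(\text{Data}) = \frac{N!}{N_{11}!\,N_{22}!\,N_{12}!}\; p_1^{2N_{11}}\, p_2^{2N_{22}}\, (2p_1p_2)^{N_{12}}$$

             $N_{11}$, $N_{22}$: homozygote counts
             $N_{12}$: heterozygote count
             $p_1^2$, $p_2^2$: expected frequencies of homozygotes
             $2p_1p_2$: expected frequency of heterozygotes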

 How extreme is your distribution of genotypes relative
  to what would be expected by chance if genotypes
  follow Hardy-Weinberg proportions?
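
A sketch of the multinomial probability for the two-allele case, written directly from the formula above. The function name, the counts, and the allele frequency are hypothetical.

```python
import math


def hw_multinomial_probability(n11, n22, n12, p1):
    """P(Data) for n11 and n22 homozygotes and n12 heterozygotes, given allele frequency p1."""
    p2 = 1.0 - p1
    n = n11 + n22 + n12
    # Multinomial coefficient N! / (N11! N22! N12!), computed with exact integers.
    coefficient = math.factorial(n) // (
        math.factorial(n11) * math.factorial(n22) * math.factorial(n12)
    )
    return coefficient * p1 ** (2 * n11) * p2 ** (2 * n22) * (2 * p1 * p2) ** n12


# Hypothetical sample: 3 A1A1, 2 A2A2, 5 A1A2, with p1 = 0.55.
print(hw_multinomial_probability(3, 2, 5, 0.55))
```

Summing this probability over every possible set of counts for a given N gives 1, which is what makes it possible to ask how extreme the observed counts are.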
Probability of Observing Mdh Genotypic

$$P(\text{Data}) = \frac{N!}{N_{11}!\,N_{22}!\,N_{33}!\,N_{12}!\,N_{23}!\,N_{13}!}\; p_1^{2N_{11}}\, p_2^{2N_{22}}\, p_3^{2N_{33}}\, (2p_1p_2)^{N_{12}}\, (2p_2p_3)^{N_{23}}\, (2p_1p_3)^{N_{13}}$$

         Is this an extremely low probability?

         How many combinations must we calculate?
              114! = 2.5 x 10186
         In practice, this sample space must be

         Markov chain Monte Carlo (MCMC) methods are often used
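
The size of 114! quoted above can be checked with exact integer arithmetic. This short sketch only confirms the order of magnitude; the variable names are illustrative.

```python
import math

n_combinations = math.factorial(114)  # exact arbitrary-precision integer
exponent = len(str(n_combinations)) - 1  # power of ten of the leading digit
print(f"114! is on the order of 10^{exponent}")
```

A number this large cannot be enumerated, which is why the sample space is sampled at random rather than evaluated exhaustively.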
               Example: Merling Pattern in Dogs
 Merle or “dilute” coat color is a desired trait in collies and
  other breeds

 Homozygotes for mutant gene lack most coat color
    Heterozygotes         Homozygous mutants   Homozygous wild-type

     N=2531                    N=197

