Homework Assignment 4 (Due Tuesday, October 13, 2009) by ikk84581


									                                                                                Statistical Computing (2009)

                                       Homework Assignment 4
                                    (Due Tuesday, October 13, 2009)

Please email your tarred and zipped R package to your TA (sboca@jhsph.edu). The data are available on the
class web page as an R binary.

The proband data set contains genotypes for 250 trios of affected probands and parents. There are 20 single
nucleotide polymorphisms (SNPs) of interest; each genotype is represented generically as AA, AB, or BB. The
father, mother, and child in each family are identified by the letters F, M, and C (for example, M17 is the
mother in trio 17).

Independent segregation of alleles means that a child inherits one of the two alleles from each parent, with
equal probabilities for the alleles. For example, if a parent is AB, there is a 50% chance that the offspring
inherits the A allele, and a 50% chance that the offspring inherits the B allele. Because we sample only
families with affected probands, we might be able to see a departure from the assumption of independent
segregation if there is a genotype-phenotype association. For example, if B were the risk allele, we should see
B allele transmission from parent to offspring in excess of 50%. Note that only heterozygous (AB) parents
are informative.
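
As a small illustration of the test implied above, the sketch below applies binom.test() to one SNP. The two
counts are made up for illustration and do not come from the proband data; the variable names are likewise
assumptions.

```r
# Hypothetical counts for a single SNP (invented for illustration only)
transmitted     <- 60   # B alleles passed on by heterozygous (AB) parents
non_transmitted <- 40   # B alleles not passed on by heterozygous (AB) parents

# Under independent segregation, each informative transmission is a fair
# coin flip, so the transmitted count is Binomial(n, 0.5) under the null
test <- binom.test(transmitted, transmitted + non_transmitted, p = 0.5)

test$estimate   # observed proportion of transmitted B alleles
test$p.value    # two-sided p-value for the null of 50% transmission
```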

  1. Write an R function that accepts data in the format of the proband data set, and for each SNP, counts
     the number of transmitted and non-transmitted B alleles from heterozygous parents, across all trios.
     For example, if the parents in a trio are AA and AB, then one allele of the offspring will be A (from
     parent 1), and the second allele could be A or B. Thus, the offspring can be AA or AB. In the former
     case the B allele is non-transmitted; in the latter case it is transmitted. When the parents are, for
     example, BB and AB, we ignore the homozygous BB parent, as its genotype is not informative for the
     departure from independent segregation. Note that when both parents are heterozygous AB, we
     have two informative genotypes. If the offspring is AA (AB, BB), then we have 0 (1, 2) transmitted B
     alleles and 2 (1, 0) non-transmitted B alleles, respectively. Your function should return an object that
     includes the number of transmitted and non-transmitted B alleles from heterozygous parents for each
     SNP, and the respective p-value derived from testing the null hypothesis of independent segregation (the
     R function binom.test() allows you to do this very conveniently). This object should have a class that
     you specifically defined.

  2. We want to use the generic functions summary() and plot() to operate on objects of your particular
     class. Using S3 methods, write a function that summarizes your data in a table. For each SNP (a row),
     you should include the total number of informative transmissions (which is the sum of the next two
     columns), the number of transmitted and the number of non-transmitted B alleles, and the p-value from
     the hypothesis test (so 4 columns of numbers, annotated with the SNP name). Also write a function to
     plot the -log10 p-values (y-axis) for all SNPs.

  3. Generate an R package that can run the provided proband data set as an example.
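
One possible shape for tasks 1 and 2 is sketched below, on a toy example. It is not a reference solution. The
layout of the input is an assumption: a character matrix with one row per person (row names such as F1, M1,
C1) and one column per SNP, which may differ from the actual proband data. The class name trioTest and the
function names count_b, count_one_snp, and tdt_counts are hypothetical. The counting relies on the fact that
a BB parent always passes on one B allele, so the B alleles received from AB parents equal the child's B count
minus the number of BB parents.

```r
# Count the B alleles in a genotype string: "AA" -> 0, "AB" -> 1, "BB" -> 2
count_b <- function(g) unname(c(AA = 0, AB = 1, BB = 2)[g])

# Transmitted / non-transmitted B alleles from AB parents for one SNP.
# father, mother, child: genotype vectors with one entry per trio.
count_one_snp <- function(father, mother, child) {
  n_het   <- (father == "AB") + (mother == "AB")   # informative parents per trio
  n_hom_b <- (father == "BB") + (mother == "BB")   # parents certain to pass on a B
  transmitted <- count_b(child) - n_hom_b          # B alleles received from AB parents
  # Drop trios with missing genotypes, no AB parent, or inconsistent genotypes
  ok <- !is.na(transmitted) & n_het > 0 & transmitted >= 0 & transmitted <= n_het
  c(transmitted     = sum(transmitted[ok]),
    non_transmitted = sum(n_het[ok] - transmitted[ok]))
}

# geno: ASSUMED layout, a character matrix with row names F<i>, M<i>, C<i>
tdt_counts <- function(geno) {
  ids <- unique(sub("^[FMC]", "", rownames(geno)))
  f <- geno[paste0("F", ids), , drop = FALSE]
  m <- geno[paste0("M", ids), , drop = FALSE]
  k <- geno[paste0("C", ids), , drop = FALSE]
  counts <- t(sapply(colnames(geno), function(s) {
    count_one_snp(f[, s], m[, s], k[, s])
  }))
  # Exact binomial test per SNP; NA when a SNP has no informative parents
  p <- apply(counts, 1, function(x) {
    if (sum(x) > 0) binom.test(x[1], sum(x), p = 0.5)$p.value else NA_real_
  })
  structure(list(counts = counts, p.value = p), class = "trioTest")
}

# S3 method: one row per SNP, four numeric columns
summary.trioTest <- function(object, ...) {
  data.frame(informative     = rowSums(object$counts),
             transmitted     = object$counts[, "transmitted"],
             non_transmitted = object$counts[, "non_transmitted"],
             p.value         = object$p.value,
             row.names       = rownames(object$counts))
}

# S3 method: -log10 p-values on the y-axis, SNP index on the x-axis
plot.trioTest <- function(x, ...) {
  plot(seq_along(x$p.value), -log10(x$p.value),
       xlab = "SNP", ylab = "-log10(p-value)", ...)
}

# Toy data: three trios, one SNP (invented for illustration)
toy <- matrix(c("AB", "AA", "AB",    # trio 1: F1, M1, C1
                "AB", "AB", "BB",    # trio 2: F2, M2, C2
                "BB", "AB", "AB"),   # trio 3: F3, M3, C3
              ncol = 1,
              dimnames = list(c("F1", "M1", "C1", "F2", "M2", "C2",
                                "F3", "M3", "C3"), "snp1"))
res <- tdt_counts(toy)
summary(res)
```

In the toy data, trio 1 contributes one transmitted B allele, trio 2 contributes two, and trio 3 contributes
one non-transmitted B allele from its AB mother (the BB father is ignored). Trios whose genotypes are missing
or inconsistent with Mendelian inheritance are silently dropped here; whether that is the desired behavior is
a design choice the assignment leaves open.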
