Statistical Computing (2009)
Homework Assignment 4 (Due Tuesday, October 13, 2009)

Please email your tarred and zipped R package to your TA (firstname.lastname@example.org). The data are available on the class web page as an R binary.

The proband data set contains genotypes for 250 trios of affected probands and their parents. There are 20 single nucleotide polymorphisms (SNPs) of interest; each genotype is represented generically as AA, AB, or BB. The father, mother, and child in each family are identified by the letters F, M, and C (for example, M17 is the mother in trio 17).

Independent segregation of alleles means that a child inherits one of the two alleles from each parent, with equal probabilities for the two alleles. For example, if a parent is AB, there is a 50% chance that the offspring inherits the A allele, and a 50% chance that the offspring inherits the B allele. Because only families with affected probands are sampled, we might see a departure from independent segregation if there is a genotype-phenotype association. For example, if B were the risk allele, we should see the B allele transmitted from parent to offspring more than 50% of the time. Note that only heterozygous (AB) parents are informative.

1. Write an R function that accepts data in the format of the proband data set and, for each SNP, counts the number of transmitted and non-transmitted B alleles from heterozygous parents, across all trios. For example, if the parents in a trio are AA and AB, then one allele of the offspring will be A (from parent 1), and the second allele could be A or B. Thus, the offspring can be AA or AB. In the former case the B allele is non-transmitted; in the latter case it is transmitted. When the parents are, for example, BB and AB, we ignore the homozygous BB parent, as its genotype is not informative for a departure from independent segregation. Note that when both parents are heterozygous AB, we have two informative genotypes: if the offspring is AA (AB, BB), then we have 0 (1, 2) transmitted B alleles and 2 (1, 0) non-transmitted B alleles, respectively. Your function should return an object that includes the number of transmitted and non-transmitted B alleles from heterozygous parents for each SNP, and the respective p-value derived from testing the null hypothesis of independent segregation (the R function binom.test() allows you to do this very conveniently). This object should have a class that you specifically defined.

2. We want to use the generic functions summary() and plot() to operate on objects of your particular class. Using S3 methods, write a function that summarizes your data in a table. For each SNP (a row), you should include the total number of informative transmissions, the numbers of transmitted and non-transmitted B alleles (which sum to that total), and the p-value from the hypothesis test (so 4 columns of numbers, annotated with the SNP name). Also write a function to plot the -log10 p-values (y-axis) for all SNPs (x-axis).

3. Generate an R package that can run the provided proband data set as an example.
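
As a rough illustration of the counting rule in part 1 and the S3 methods in part 2, the sketch below works on a tiny made-up data set. The assignment does not spell out the layout of the proband object, so the layout used here is an assumption: a character matrix with one row per person (row names such as F1, M1, C1) and one column per SNP. The function name count_transmissions, the class name trioCounts, and the toy data are likewise hypothetical, and this is only one possible approach, not the expected solution.

```r
# Assumed layout (not given in the assignment): a character matrix with rows
# named "F1", "M1", "C1", ... and one column per SNP, entries "AA"/"AB"/"BB".
count_transmissions <- function(geno) {
  ids  <- rownames(geno)
  role <- substr(ids, 1, 1)               # "F", "M" or "C"
  trio <- as.integer(substring(ids, 2))   # trio number

  # Number of B alleles per genotype (AA = 0, AB = 1, BB = 2), same shape as geno
  nB <- matrix(match(geno, c("AA", "AB", "BB")) - 1L,
               nrow = nrow(geno), dimnames = dimnames(geno))

  # Rows for one family role, sorted by trio so F, M and C line up
  pick <- function(r) {
    m <- nB[role == r, , drop = FALSE]
    m[order(trio[role == r]), , drop = FALSE]
  }
  f <- pick("F"); m <- pick("M"); ch <- pick("C")

  het      <- (f == 1) + (m == 1)   # informative (AB) parents per trio and SNP
  from_hom <- (f == 2) + (m == 2)   # B alleles a BB parent necessarily passes on
  trans    <- ch - from_hom         # B alleles transmitted by AB parents

  # Drop missing genotypes and Mendelian inconsistencies
  ok <- !is.na(trans) & trans >= 0 & trans <= het
  trans[!ok] <- 0
  het[!ok]   <- 0

  transmitted    <- colSums(trans)
  nontransmitted <- colSums(het) - transmitted

  # Two-sided binomial test of a 50% transmission rate; NA if no informative parent
  p <- mapply(function(x, n) {
    if (n > 0) binom.test(x, n, p = 0.5)$p.value else NA_real_
  }, transmitted, transmitted + nontransmitted)

  structure(list(transmitted = transmitted,
                 nontransmitted = nontransmitted,
                 p.value = p),
            class = "trioCounts")
}

# S3 method: one row per SNP, four numeric columns
summary.trioCounts <- function(object, ...) {
  data.frame(informative    = object$transmitted + object$nontransmitted,
             transmitted    = object$transmitted,
             nontransmitted = object$nontransmitted,
             p.value        = object$p.value,
             row.names      = names(object$transmitted))
}

# S3 method: -log10 p-values (y-axis) against SNPs (x-axis)
plot.trioCounts <- function(x, ...) {
  plot(seq_along(x$p.value), -log10(x$p.value), xaxt = "n",
       xlab = "SNP", ylab = "-log10(p-value)", ...)
  axis(1, at = seq_along(x$p.value), labels = names(x$p.value), las = 2)
}

# Made-up toy data: two trios, two SNPs (filled column by column)
toy <- matrix(c("AA", "AB", "AB", "AB", "AB", "AA",
                "BB", "AB", "BB", "AA", "BB", "AB"),
              ncol = 2,
              dimnames = list(c("F1", "M1", "C1", "F2", "M2", "C2"),
                              c("snp1", "snp2")))

res <- count_transmissions(toy)
print(summary(res))
```

Calling plot(res) on the same object would dispatch to plot.trioCounts. Subtracting the B alleles that BB parents must pass on from the child's B count gives the number transmitted by AB parents directly, which avoids a loop over trios; a loop over families would be an equally valid design.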