# Homework Assignment 4 (Due Tuesday, October 13, 2009) by ikk84581

```
Statistical Computing (2009)

Homework Assignment 4
(Due Tuesday, October 13, 2009)

Please email your tarred and zipped R package to your TA (sboca@jhsph.edu). The data are available on the
class web page as an R binary.

The proband data set contains genotypes for 250 trios of affected probands and parents. There are 20 single
nucleotide polymorphisms (SNPs) of interest; each genotype is represented generically as AA, AB, or BB. The
father, mother, and child in each family are identified by the letters F, M, and C (for example, M17 is the
mother in trio 17).

Independent segregation of alleles means that a child inherits one of the two alleles from each parent, with
equal probabilities for the alleles. For example, if a parent is AB, there is a 50% chance that the offspring
inherits the A allele, and a 50% chance that the offspring inherits the B allele. Because only families with
affected probands are sampled, we might be able to see a departure from the assumption of independent
segregation if there is a genotype-phenotype association. For example, if B were the risk allele, we should see
a B allele transmission from parent to offspring in excess of 50%. Note that only heterozygous (AB) parents
are informative.

1. Write an R function that accepts data in the format of the proband data set, and for each SNP, counts
the number of transmitted and non-transmitted B alleles from heterozygous parents, across all trios.
For example, if the parents in a trio are AA and AB, then one allele of the offspring will be A (from
parent 1), and the second allele could be A or B. Thus, the offspring can be AA or AB. In the former
case the B allele is non-transmitted; in the latter case it is transmitted. When the parents are, for
example, BB and AB, we ignore the homozygous BB parent, as its genotype is not informative for the
departure from independent segregation. Note that when both parents are heterozygous AB, we
have two informative genotypes. If the offspring is AA (AB, BB), then we have 0 (1, 2) transmitted B
alleles and 2 (1, 0) non-transmitted B alleles, respectively. Your function should return an object that
includes the number of transmitted and non-transmitted B alleles from heterozygous parents for each
SNP, and the respective p-value derived from testing the null hypothesis of independent segregation (the
R function binom.test() allows you to do this very conveniently). This object should have a class that
you specifically defined.

2. We want to use the generic functions summary() and plot() to operate on objects of your particular
class. Using S3 methods, write a function that summarizes your data in a table. For each SNP (a row),
you should include the total number of informative transmissions (which is the sum of the transmitted
and the non-transmitted B alleles), the number of transmitted B alleles, the number of non-transmitted
B alleles, and the p-value from the hypothesis test (so 4 columns of numbers, annotated with the SNP
name). Also write a function to plot the -log10 p-values (y-axis) for all SNPs (x-axis).

3. Generate an R package that can run the provided proband data set as an example.

```
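The counting rule and the S3 requirement in problems 1 and 2 can be sketched as follows. This is a minimal illustration, not a reference solution. It assumes the genotypes are stored as a character matrix with one row per individual (row names such as "F1", "M1", "C1") and one column per SNP; the handout does not state the actual layout of the proband object. The names `tdt_counts`, the class name `"tdt"`, and the `toy` data are hypothetical. Trios with missing genotypes or Mendelian inconsistencies are simply skipped.

```r
# Assumption: `geno` is a character matrix of "AA"/"AB"/"BB" with row names
# "F<i>", "M<i>", "C<i>" for trio i and one column per SNP (layout not
# confirmed by the assignment text).
tdt_counts <- function(geno) {
  b_count <- function(g) unname(c(AA = 0, AB = 1, BB = 2)[g])
  trio <- sub("^C", "", grep("^C", rownames(geno), value = TRUE))
  one_snp <- function(snp) {
    f <- geno[paste0("F", trio), snp]
    m <- geno[paste0("M", trio), snp]
    k <- geno[paste0("C", trio), snp]
    het   <- (f == "AB") + (m == "AB")  # informative parents per trio
    hom_b <- (f == "BB") + (m == "BB")  # B alleles a BB parent must pass on
    trans <- b_count(k) - hom_b         # B alleles received from AB parents
    # keep informative, consistent trios; which() also drops missing values
    ok  <- which(het > 0 & trans >= 0 & trans <= het)
    t_b <- sum(trans[ok])
    n_b <- sum(het[ok]) - t_b
    p <- if (t_b + n_b > 0) {
      binom.test(t_b, t_b + n_b, p = 0.5)$p.value
    } else {
      NA_real_
    }
    c(transmitted = t_b, nontransmitted = n_b, p.value = p)
  }
  res <- t(sapply(colnames(geno), one_snp))
  structure(list(counts = as.data.frame(res)), class = "tdt")
}

# S3 method: one row per SNP, four numeric columns
summary.tdt <- function(object, ...) {
  cnt <- object$counts
  data.frame(informative    = cnt$transmitted + cnt$nontransmitted,
             transmitted    = cnt$transmitted,
             nontransmitted = cnt$nontransmitted,
             p.value        = cnt$p.value,
             row.names      = rownames(cnt))
}

# S3 method: -log10 p-value (y-axis) against SNP index (x-axis)
plot.tdt <- function(x, ...) {
  plot(-log10(x$counts$p.value), xlab = "SNP", ylab = "-log10(p-value)", ...)
}

# Hypothetical toy data: 3 trios, 2 SNPs (filled column by column)
toy <- matrix(
  c("AA", "AB", "AB",  "AB", "AB", "BB",  "BB", "AB", "AB",
    "AB", "AA", "AA",  "AA", "AA", "AA",  "BB", "BB", "BB"),
  ncol = 2,
  dimnames = list(paste0(c("F", "M", "C"), rep(1:3, each = 3)),
                  c("snp1", "snp2")))
fit <- tdt_counts(toy)
print(summary(fit))
```

Subtracting the number of BB parents from the child's B count gives the B alleles that came from heterozygous parents, which reproduces the cases listed in problem 1. For problem 3, `package.skeleton()` from the utils package followed by `R CMD build` is one possible starting point.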