Allred, Adam Forrest

The CFTR Protein: Evolution and Disease
Faculty Mentor: David McClellan, Integrative Biology

The principal aim of my research has been to aid in the understanding of complex molecular
pathologies by identifying evolutionary patterns that maintain the structural and functional
integrity of a particular protein product. A disease phenotype often results when such delicate
patterns are disturbed. The Cystic Fibrosis Transmembrane Conductance Regulator (CFTR), an
epithelial chloride channel, has proven to be an especially worthy vehicle for understanding the
link between molecular evolution and human disease. Aberrant functioning of this
physiologically important channel results in cystic fibrosis, the most common lethal genetic
disorder among European-derived populations.

Understanding evolutionary and pathological patterns of mutation in CFTR has required the
confluence of data from multiple databases. These include databases devoted to the storage of
disease-causing mutations, such as the Cystic Fibrosis Mutation Database and the Human Gene
Mutation Database. In addition, molecular databases such as GenBank provide an evolutionary
context in which the significance of single amino acid changes in CFTR can be ascertained.
Single amino acid changes can be characterized by the resulting changes in biochemical properties
(e.g., hydrophobicity or alpha-helical tendency). AAindex is the database of choice for accessing
amino acid property values for over 500 biochemical properties.

Relying upon the aforementioned databases, I generated a matrix to describe holistically the
biochemical consequences of 447 deleterious amino acid changes on CFTR. For each amino
acid change, the difference in each of 515 amino acid properties was calculated. The resulting
matrix is of size 447 x 515, with rows corresponding to the deleterious amino acid changes and
columns corresponding to the differences in the amino acid properties. The following nested
loop was used to generate the matrix:

for (each mutation i) {
  for (each property j) {
    matrix[i][j] = calculate_the_difference(property j, wild_type_amino_acid, mutant_amino_acid);
  }
}

The average difference across all amino acid changes was then calculated for each property.
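
The matrix construction and the per-property average can be sketched in Java as follows. This is a minimal illustration, not the program used in the study: the class and method names are hypothetical, the property values are toy numbers rather than AAindex entries, and taking the absolute value of each difference is an assumption, since the text does not say whether differences were signed.

```java
import java.util.List;
import java.util.Map;

/** Sketch: build a (changes x properties) difference matrix and average each column. */
final class PropertyDifferenceMatrix {

    /** A single amino acid change, e.g. wild-type 'G' replaced by mutant 'D'. */
    static final class Change {
        final char wildType;
        final char mutant;
        Change(char wildType, char mutant) {
            this.wildType = wildType;
            this.mutant = mutant;
        }
    }

    /**
     * Each map in `properties` gives one property's value per one-letter amino acid code.
     * Returns diff[i][j] = |value_j(mutant_i) - value_j(wildType_i)|.
     * The absolute value is an assumption made for this sketch.
     */
    static double[][] differences(List<Change> changes, List<Map<Character, Double>> properties) {
        double[][] diff = new double[changes.size()][properties.size()];
        for (int i = 0; i < changes.size(); i++) {
            Change c = changes.get(i);
            for (int j = 0; j < properties.size(); j++) {
                Map<Character, Double> p = properties.get(j);
                diff[i][j] = Math.abs(p.get(c.mutant) - p.get(c.wildType));
            }
        }
        return diff;
    }

    /** Average difference across all amino acid changes, one value per property (column). */
    static double[] columnMeans(double[][] diff) {
        int cols = diff.length == 0 ? 0 : diff[0].length;
        double[] mean = new double[cols];
        for (double[] row : diff) {
            for (int j = 0; j < cols; j++) {
                mean[j] += row[j] / diff.length;
            }
        }
        return mean;
    }

    /** Toy data only: two changes and two made-up properties (not AAindex values). */
    static double[] demoMeans() {
        List<Change> changes = List.of(new Change('G', 'D'), new Change('F', 'L'));
        Map<Character, Double> toyA = Map.of('G', 1.0, 'D', 4.0, 'F', 2.0, 'L', 2.5);
        Map<Character, Double> toyB = Map.of('G', 0.0, 'D', -1.0, 'F', 10.0, 'L', 8.0);
        return columnMeans(differences(changes, List.of(toyA, toyB)));
    }

    public static void main(String[] args) {
        double[] means = demoMeans();
        System.out.println("mean difference, toy property A: " + means[0]);
        System.out.println("mean difference, toy property B: " + means[1]);
    }
}
```

With the real data, `changes` would hold the 447 deleterious changes and `properties` the 515 property tables, giving the 447 x 515 matrix described above.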

A second data set consisting of 888 “natural” amino acid changes observed in the course of
evolution of CFTR was prepared using PAML [2]. The same nested loop for calculating amino
acid property differences was applied to this second set of amino acid changes in order to assess
their biochemical consequences and compare these differences to those associated with
pathological amino acid changes. A statistical analysis (including a Bonferroni correction) of the
two data sets (evolutionary and pathological) revealed significant differences between the natural
and deleterious amino acid changes for 286 properties. For 285 of these properties, the change
associated with deleterious amino acid changes was significantly greater than the change
associated with natural amino acid changes.
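
The text does not name the statistical test that was used. As one plausible illustration, the sketch below compares the two sets for a single property with Welch's two-sample t statistic and applies a Bonferroni correction by dividing the family-wise significance level by the number of properties tested. The choice of Welch's test, the 0.05 family-wise level, the class and method names, and the toy numbers are all assumptions; only the count of 515 properties comes from the text.

```java
/** Sketch: per-property comparison of deleterious vs. natural changes (assumed Welch's t). */
final class PropertyComparison {

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double sampleVariance(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) {
            ss += (v - m) * (v - m);
        }
        return ss / (x.length - 1);
    }

    /** Welch's t statistic: (mean(a) - mean(b)) / sqrt(var(a)/n_a + var(b)/n_b). */
    static double welchT(double[] a, double[] b) {
        double se = Math.sqrt(sampleVariance(a) / a.length + sampleVariance(b) / b.length);
        return (mean(a) - mean(b)) / se;
    }

    /** Bonferroni correction: the per-test level is the family-wise level divided by the test count. */
    static double bonferroniAlpha(double familyAlpha, int tests) {
        return familyAlpha / tests;
    }

    public static void main(String[] args) {
        // Toy differences for one property; not data from the study.
        double[] deleterious = {2.0, 4.0, 6.0};
        double[] natural = {1.0, 2.0, 3.0};
        System.out.println("t = " + welchT(deleterious, natural));
        System.out.println("per-test alpha = " + bonferroniAlpha(0.05, 515));
    }
}
```

A positive t here means the deleterious changes show the larger average difference for that property. Turning t into a p-value requires the t distribution, which is left to a statistics library and is not shown.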

Thus, by comparing evolutionary and pathological sets of amino acid changes, we have
identified those properties that play an integral role in the determination of cystic fibrosis.
Likewise, we have identified those properties that are not inextricably linked to the pathology of
CFTR. Such properties show comparably large average differences in the evolutionary and
pathological sets of single amino acid changes. For these properties, evolution appears to allow
some leeway when a wild-type amino acid is replaced.

While some patterns of mutation may be pathology-specific, others appear to be universal. An
example of a pattern that falls into the latter category is the overabundance of deleterious amino
acid changes at conserved sites. Miller and Kumar [1] previously observed this pattern in CFTR
and other genes associated with disease; we have demonstrated that it extends to genes such as
p53 and BRCA1, which are associated with cancer (Fig. 1).
Such a non-random distribution of deleterious amino acid changes along the sequence of proteins
associated with disease appears to be a shared characteristic among diverse pathologies. I
automated the testing of overabundance at conserved sites by writing a program in Java that
takes the following parameters: an alignment of a set of orthologous protein sequences, a set of
known deleterious amino acid changes in the protein of interest, and a definition of conservation.
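
The core bookkeeping behind such a test can be sketched as follows: take the three inputs named above, mark which alignment columns are conserved, and compare the number of deleterious changes observed at conserved sites with the number expected if changes fell uniformly along the sequence. This is not the original program. The names are hypothetical, "conserved" is defined here as every sequence sharing the same residue (one possible definition among many), the sequences are toy strings, and the test statistic itself is omitted because the text does not describe how it was computed.

```java
import java.util.List;
import java.util.function.Predicate;

/** Sketch: observed vs. expected deleterious changes at conserved alignment columns. */
final class ConservedSiteTally {

    /** Returns one alignment column as a string, one residue per sequence. */
    static String column(List<String> alignment, int col) {
        StringBuilder sb = new StringBuilder();
        for (String seq : alignment) {
            sb.append(seq.charAt(col));
        }
        return sb.toString();
    }

    /** Assumed definition of conservation: no gap, and all sequences share the same residue. */
    static final Predicate<String> IDENTICAL =
            c -> c.charAt(0) != '-' && c.chars().allMatch(ch -> ch == c.charAt(0));

    /**
     * changeSites holds one 0-based alignment column per deleterious change (repeats allowed).
     * Returns {observed, expected}: the number of changes at conserved columns, and the number
     * expected if changes were spread uniformly over all columns.
     */
    static double[] tally(List<String> alignment, int[] changeSites, Predicate<String> isConserved) {
        int length = alignment.get(0).length();
        boolean[] conserved = new boolean[length];
        int conservedCount = 0;
        for (int col = 0; col < length; col++) {
            conserved[col] = isConserved.test(column(alignment, col));
            if (conserved[col]) {
                conservedCount++;
            }
        }
        int observed = 0;
        for (int site : changeSites) {
            if (conserved[site]) {
                observed++;
            }
        }
        double expected = changeSites.length * (double) conservedCount / length;
        return new double[] {observed, expected};
    }

    /** Toy alignment of three made-up sequences; columns 0, 3, and 4 are conserved. */
    static double[] demo() {
        List<String> alignment = List.of("MKTAL", "MKSAL", "MRTAL");
        int[] changeSites = {0, 0, 3, 2};
        return tally(alignment, changeSites, IDENTICAL);
    }

    public static void main(String[] args) {
        double[] result = demo();
        System.out.println("observed at conserved sites: " + result[0]);
        System.out.println("expected under uniformity:   " + result[1]);
    }
}
```

Passing the definition of conservation as a predicate mirrors the third parameter described above: a stricter or looser definition can be swapped in without touching the counting logic.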

1. Miller, M. and S. Kumar. 2001. Understanding human disease mutations through the use of
interspecific genetic variation. Human Molecular Genetics 10:2319-2328.

2. Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood.
Computer Applications in the Biosciences 13:555-556.

Figure 1: t-statistics for five human proteins associated with disease. These data were obtained
from a program that I wrote in Java to test the hypothesis of overabundance of deleterious amino
acid changes at conserved sites. The analysis returned positive results for all proteins tested.
For four of the proteins (CFTR, F9, p53, and PAH), significance was achieved at the alpha = .001
level (a very high level of confidence). For one protein (BRCA1), significance was achieved at
the alpha = .05 level (less confidence, but still considered significant).

[Bar chart, "t-statistics for Human Proteins": one bar each for BRCA1, CFTR, F9, p53, and PAH on
a vertical axis running from 0 to 10, with critical values* marked at alpha = .001 and alpha = .05.]

*Note that while critical values vary slightly for each protein, this does not affect the levels
of significance shown.

						