Date Submitted: 28 September 2010
Introduction: Bioinformatics is the use of databases and other computer-based tools to store
and organize information about proteins and other macromolecules so that it can be easily
searched and retrieved. These tools let scientists and other interested persons quickly call upon
the enormous wealth of biochemical data compiled over the past several decades. In this
exercise, the amino acid sequence “gakqntgvwl vkvpkylsqq wakasgrgev” was identified and
analyzed using publicly available bioinformatics tools found on the internet.
Materials and Methods: This was a research exercise. No experimental procedures were performed.
Results: The protein fragment was identified as a sequence within RAP30, the β subunit of
TFIIF (General Transcription Factor IIF), and matched RAP30 from both Homo sapiens and
Mus musculus¹. In the human protein, the fragment corresponds to residues 11-40. The full
sequences of the two human TFIIF subunits, RAP74 and RAP30, are shown below.
RAP74 maalgpssqn vteyvvrvpk nttkkynima fnaadkvnfa twnqarlerd lsnkkiyqee
empesgagse fnrklreear rkkygivlke frpedqpwll rvngksgrkf kgikkggvte
ntsyyiftqc pdgafeafpv hnwynftpla rhrtltaeea eeewerrnkv lnhfsimqqr
rlkdqdqded eeekekrgrr kaselrihdl eddlemssda sdasgeeggr ipkakkkapl
akggrkkkkk kgsddeafed sddgdfegqe vdymsdgsss sqeepeskak apqqeegpkg
vdeqsdssee seeekppeed keeeeekkap tpqekkrrkd sseesdssee sdidseassa
lfmakkktpp krerkpsggs srgnsrpgtp saeggstsst lraaaskleq gkrvsempaa
krlrldtgpq slsgkstpqp psgkttpnsg dvqvtedavr ryltrkpmtt kdllkkfqtk
ktglsseqtv nvlaqilkrl nperkmindk mhfslke
RAP30 maergeldlt gakqntgvwl vkvpkylsqq wakasgrgev gklriaktqg rtevsftlne
dlanihdigg kpasvsapre hpfvlqsvgg qtltvftess sdklslegiv vqraecrpaa
senymrlkrl qieesskpvr lsqqldkvvt tnykpvanhq ynieyerkkk edgkraradk
qhvldmlfsa fekhqyynlk dlvditkqpv vylkeilkei gvqnvkgihk ntwelkpeyr
Figure 1: Human TFIIF sequences. Human TFIIF is composed of two RAP74 α
subunits and two RAP30 β subunits.
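Once the matching protein is known, the fragment's position can be confirmed with a simple exact string search. The Python sketch below is not the online database search used to identify the fragment; it only strips the 10-residue spacing used in Figure 1 and locates the query. It assumes only the first line of the RAP30 sequence is needed (the fragment lies entirely within it), and the helper names `clean_sequence` and `locate_fragment` are hypothetical.

```python
# Sketch: confirm where the query fragment sits in human RAP30 (Figure 1).
# Assumption: only the first 60 residues of RAP30 are needed for this check.
from typing import Optional, Tuple

QUERY = "gakqntgvwl vkvpkylsqq wakasgrgev"
RAP30_FIRST_LINE = "maergeldlt gakqntgvwl vkvpkylsqq wakasgrgev gklriaktqg rtevsftlne"


def clean_sequence(text: str) -> str:
    """Drop the whitespace used for 10-residue grouping and upper-case the result."""
    return "".join(text.split()).upper()


def locate_fragment(fragment: str, protein: str) -> Optional[Tuple[int, int]]:
    """Return 1-based (start, end) residue positions of an exact match, or None."""
    frag = clean_sequence(fragment)
    prot = clean_sequence(protein)
    index = prot.find(frag)  # 0-based index; -1 means no exact match
    if index == -1:
        return None
    return index + 1, index + len(frag)


if __name__ == "__main__":
    print(locate_fragment(QUERY, RAP30_FIRST_LINE))  # (11, 40)
```

Running the script prints `(11, 40)`, i.e. residues 11-40 of human RAP30. An exact match suffices here because the fragment is identical to the human sequence; finding more distant homologs would require an alignment-based search instead.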
TFIIF is required for the function of RNA polymerase II; it binds Pol II, TFIIB, TAF250,
and DNA in interactions necessary for transcription to proceed⁴. The RAP30 sequence shown in
Figure 1 is the human form, but RAP30 belongs to the TFIIF_beta superfamily⁵ and therefore
has a large number of homologs in a wide variety of organisms. Organisms that are genetically
close to humans, such as Pan troglodytes (chimpanzee), Bos taurus (cattle), and Mus musculus
(mouse), show the highest primary sequence similarity to human RAP30⁶.
Using the Protein Calculator online tool⁷, the molecular weight of RAP30 was calculated to be
28380.1758 g/mol. The pI was estimated to be 9.23, and the UV extinction coefficient at 278 nm
was estimated to be 30800 M⁻¹ cm⁻¹.
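The values above came from an online calculator (reference 7). As a rough sketch of how such tools typically derive these numbers from sequence alone, the Python code below computes (i) an average molecular weight by summing average residue masses and adding one water, (ii) an extinction coefficient from Trp, Tyr and cystine counts, and (iii) an isoelectric point by bisection on the Henderson-Hasselbalch net charge. The residue masses, the 280 nm coefficients (5500, 1490 and 125 M⁻¹ cm⁻¹) and the pKa set are assumptions chosen for illustration; calculators differ in constants and wavelength, so this sketch is not expected to reproduce the 278 nm value or the pI of 9.23 exactly. It runs on the 30-residue query fragment, which is short enough to check by hand.

```python
"""Sketch of sequence-based protein property estimates (all constants assumed)."""

# Average residue masses in g/mol (amino acid minus water); assumed table.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # one water restores the free N- and C-termini

# Assumed pKa set; published sets differ and shift the estimated pI.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average molecular weight: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def extinction_coefficient(seq: str) -> int:
    """Molar extinction coefficient (M^-1 cm^-1) at 280 nm.

    Assumes every pair of Cys forms a cystine; coefficients are
    5500 (Trp), 1490 (Tyr) and 125 (cystine).
    """
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * (seq.count("C") // 2)


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of the net charge at a given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection for the pH of zero net charge (net charge falls as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still positive, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    fragment = "GAKQNTGVWLVKVPKYLSQQWAKASGRGEV"  # query fragment from the Introduction
    print(f"MW   ~ {molecular_weight(fragment):.1f} g/mol")
    print(f"e280 = {extinction_coefficient(fragment)} M^-1 cm^-1")  # 12490
    print(f"pI   ~ {isoelectric_point(fragment):.2f}")
```

For the fragment (two Trp, one Tyr, no Cys) the extinction coefficient is 2 × 5500 + 1490 = 12490 M⁻¹ cm⁻¹. Its four Lys and one Arg against one Glu and the C-terminus give a basic pI, consistent in direction with the basic pI reported for full-length RAP30.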
Because of its central role in transcription by RNA polymerase II, TFIIF has been the subject of
many studies aimed at understanding how its subunits interact with each other and with the rest of
the transcription machinery. Abstracts from three such studies are shown below.
The small subunit of transcription factor IIF recruits RNA polymerase II into the preinitiation complex.
We found that transcription factor IIF mediates the association of RNA
polymerase II with promoter sequences containing transcription factors IID, IIB, and IIA
(DAB complex). The resulting DNA-protein complex contained RNA polymerase II and
the two subunits of transcription factor IIF (RAP 30 and RAP 74). Cloned human RAP
30 was sufficient for the recruitment of RNA polymerase II to the DAB complex. This
ability of RAP 30 to recruit RNA polymerase to a promoter is also a characteristic of
sigma factors in prokaryotes.⁸
RNA polymerase II-associated protein (RAP) 74 binds transcription factor (TF) IIB and
blocks TFIIB-RAP30 binding.
A set of deletion mutants of human RNA polymerase II-associated protein (RAP)
30, the small subunit of transcription factor IIF (TFIIF; RAP30/74), was constructed to
map functional domains. Mutants were tested for accurate transcriptional activity, RAP74
binding, and TFIIB binding. Transcription assays indicate the importance of both N- and
C-terminal sequences for RAP30 function. RAP74 binds to the N-terminal region of
RAP30 between amino acids 1 and 98. TFIIB binds to an overlapping region of RAP30,
localized to amino acids 1-176 (amino acids 27-152 comprise a minimal binding region).
The C-terminal region of RAP74 (amino acids 358-517) binds directly and independently
to TFIIB. Interestingly, RAP74 blocks TFIIB-RAP30 binding, both by binding TFIIB and
by binding RAP30. When the TFIIF complex is intact, therefore, TFIIB-TFIIF contact is
maintained through RAP74. If the TFIIB-RAP30 interaction is physiologically important,
the TFIIF complex must dissociate within some transcription complexes.⁹
Novel dimerization fold of RAP30/RAP74 in human TFIIF at 1.7 Å resolution.
General transcription factor IIF (TFIIF) is required for transcription by RNA
polymerase II; it consists minimally of a heterodimer of RNA polymerase-associated
proteins RAP30 and RAP74. According to solution and mutagenesis studies, the multiple
domains of RAP30 and RAP74 bind PolII, TFIIB, TAF250 and DNA in interactions that
are essential for transcription initiation and elongation. The X-ray structure of the
RAP30/RAP74 interaction domains at 1.7 Å resolution reveals a novel "triple barrel"
dimerization fold and suggests with mutant data that interactions with the transcription
apparatus are mediated not only by this tripartite beta-barrel, but also via flexible loops
and alpha and beta-structures extending from it.⁴
Conclusions: This exercise was performed to demonstrate the wealth of bioinformatics
information that can be readily obtained from credible online sources. In principle, using such
online databases is no different from consulting any other literature. In practice, searching
up-to-date databases designed with bioinformatics in mind produces useful results in far less
time than it would take to sift through print or electronic journals for the same information.
This report illustrates the point: all of the cited references were located in less than an hour
of research, starting from only the 30-residue fragment given in the Introduction.
1. Strausberg, R.L., et al.; Generation and initial analysis of more than 15,000 full-length
human and mouse cDNA sequences, 2002. NCBI.
http://www.ncbi.nlm.nih.gov/protein/20072603?report=genpept (accessed 27 September 2010).
2. Finkelstein, A., Kostrub, C.F., Li, J., Chavez, D.P., Wang, B.Q., Fang, S.M., Greenblatt, J. and
Burton, Z.F.; A cDNA encoding RAP74, a general initiation factor for transcription by RNA
polymerase II, 1992. NCBI. http://www.ncbi.nlm.nih.gov/protein/CAA45404.1 (accessed 26 September 2010).
3. Horikoshi, M., Fujita, H., Wang, J., Takada, R. and Roeder, R.G.; Nucleotide and amino acid
sequence of RAP30, 1991. NCBI. http://www.ncbi.nlm.nih.gov/protein/CAA42419.1
(accessed 26 September 2010).
4. Gaiser, F., Tan, S. and Richmond, T.J.; Novel dimerization fold of RAP30/RAP74 in human TFIIF
at 1.7 Å resolution. J. Mol. Biol. [Online] 2000, 302(5), 1119-1127.
http://www.ncbi.nlm.nih.gov/pubmed/11183778?dopt=Abstract (accessed 26 September 2010).
5. Marchler-Bauer, A., et al.; CDD: specific functional annotation with the Conserved Domain
Database. Nucleic Acids Res. [Online] 2009, 37(Database issue), D205-D210.
http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=cl02122 (accessed 26 September 2010).
6. HomoloGene Result. http://www.ncbi.nlm.nih.gov/homologene/37884 (accessed 26 September 2010).
7. Putnam, C.; Protein Calculator v3.3. http://www.scripps.edu/~cdputnam/protcalc.html
(accessed 26 September 2010).
8. Flores, O., Lu, H., Killeen, M., Greenblatt, J., Burton, Z.F. and Reinberg, D.; The small subunit of
transcription factor IIF recruits RNA polymerase II into the preinitiation complex. Proc. Natl.
Acad. Sci. USA [Online] 1991, 88(22), 9999-10003.
http://www.ncbi.nlm.nih.gov/pubmed/1946469 (accessed 26 September 2010).
9. Fang, S.M. and Burton, Z.F.; RNA polymerase II-associated protein (RAP) 74 binds transcription
factor (TF) IIB and blocks TFIIB-RAP30 binding. J. Biol. Chem. [Online] 1996, 271(20),
11703-11709. http://www.ncbi.nlm.nih.gov/pubmed/8662660 (accessed 26 September 2010).