Assignment: Option1 (No programming)
Due: Wednesday, November 23, 2011
Genes and proteins
Problem1: Gene Finder
You have just sequenced a shot segment of DNA. You wish to analyze this DNA
sequence to determine whether it could encode a protein.
5’ TCAATGTAACGCGCTACCCGGAGCTCTGGGCCCAAATTTCATCCACT 3’
1) Find the longest possible coding region (also called open reading frame, or ORF).
Remember that there are six possibilities to check
2) Label which strand on the DNA will be the coding strand, and which will be the
template strand when this DNA is transcribed
3) Transcribe this ORF into mRNA, indicating the 5’ and 3’ ends
4) Translate this mRNA into a protein sequence, indicating the N and C termini.
Problem 2: Identification of a disease
A short fragment of a protein has been identified as a marker for a human genetic disease.
This fragment is:
1) Identify the wild type human protein associated with this fragment. (Blast can be
run at: http://www.ncbi.nlm.nih.gov/blast/)
2) Obtain the one letter AA code sequence of the full wild type protein
3) Describe the single mutation between the wild type sequence and the marker
4) This mutation has been associated with an inherited disorder. Find the name of
that disorder, and write a small paragraph on the nature, and consequences of this
5) Visit the Genome pages at Ensembl (http://www.ensembl.org). Search the
Human Genome using the wild type sequence corresponding to the fragment of
protein given above (make sure to use the wild type fragment of the same size;
use BLASTP, and use “exact match”). Show the location(s) of the corresponding
gene projected on a human karyotype.
Problem 3: Accessing and Analyzing Structures from the Protein Data Bank
The Protein Data Bank, or PDB, is the best place to obtain 3-Dimensional structures of
proteins, nucleic acids and other macromolecules. From the PDB site
http://www.rcsb.org, you can locate proteins by keyword searching or by entering the
PDB accession number for the structure file, like 5PTI. Details on the molecule (how the
structure was determined, pertinent research articles, position of secondary structures,
unusual amino acids, etc) can be found on the RCSB web site but also in the PDB file
PDB files are just formatted text files, so you can open them in a text editor or even Word
and read them. There is a wealth of information there! The molecular viewing programs
use the ATOM records in the file, which contain residue name, residue number, atom
name and atom number. Often, the structure files include other molecules besides the
protein, such as water molecules, nucleic acid and bound ligands.
1) Find a protein that contains the unusual amino acid HYP. What is this amino
2) We first look at the structure of an HIV protease. We choose the PDB file 1A30.
Download this PDB file, and display its content using RASMOL or PYMOL.
This structure shows HIV protease as a dimer (chain A and B), bound to a small
tri-peptide (chain C, sequence EDL). Generate an image in which the ligand is
shown in spacefilling mode (CPK), and the protein dimer is shown in cartoon
mode. Save this picture in GIF or PNG format, and include it in your report
3) Now we look at the PDB file 1CWA, which contains the structure of the
complex between two proteins. Find the protein names, sizes and secondary
structure contents. Identify the non standard amino acids in chain C.
a) Describe briefly the importance of this structure
b) Generate a GIF or PNG image containing a CPK representation of the
Good Luck !