Your Federal Quarterly Tax Payments are due April 15th Get Help Now >>

Word document - Patrice Koehl's research web site by ajizai


									                                      ECS 129
                   Assignment: Option1 (No programming)

Due: Wednesday, November 23, 2011

Genes and proteins

Problem1: Gene Finder

You have just sequenced a shot segment of DNA. You wish to analyze this DNA
sequence to determine whether it could encode a protein.


   1) Find the longest possible coding region (also called open reading frame, or ORF).
      Remember that there are six possibilities to check
   2) Label which strand on the DNA will be the coding strand, and which will be the
      template strand when this DNA is transcribed
   3) Transcribe this ORF into mRNA, indicating the 5’ and 3’ ends
   4) Translate this mRNA into a protein sequence, indicating the N and C termini.

Problem 2: Identification of a disease
A short fragment of a protein has been identified as a marker for a human genetic disease.
This fragment is:

   1) Identify the wild type human protein associated with this fragment. (Blast can be
      run at:
   2) Obtain the one letter AA code sequence of the full wild type protein
   3) Describe the single mutation between the wild type sequence and the marker
      given above.
   4) This mutation has been associated with an inherited disorder. Find the name of
      that disorder, and write a small paragraph on the nature, and consequences of this
   5) Visit the Genome pages at Ensembl ( Search the
      Human Genome using the wild type sequence corresponding to the fragment of
      protein given above (make sure to use the wild type fragment of the same size;
      use BLASTP, and use “exact match”). Show the location(s) of the corresponding
      gene projected on a human karyotype.

Problem 3: Accessing and Analyzing Structures from the Protein Data Bank

The Protein Data Bank, or PDB, is the best place to obtain 3-Dimensional structures of
proteins, nucleic acids and other macromolecules. From the PDB site, you can locate proteins by keyword searching or by entering the
PDB accession number for the structure file, like 5PTI. Details on the molecule (how the
structure was determined, pertinent research articles, position of secondary structures,
unusual amino acids, etc) can be found on the RCSB web site but also in the PDB file
PDB files are just formatted text files, so you can open them in a text editor or even Word
and read them. There is a wealth of information there! The molecular viewing programs
use the ATOM records in the file, which contain residue name, residue number, atom
name and atom number. Often, the structure files include other molecules besides the
protein, such as water molecules, nucleic acid and bound ligands.

    1) Find a protein that contains the unusual amino acid HYP. What is this amino
    2) We first look at the structure of an HIV protease. We choose the PDB file 1A30.
       Download this PDB file, and display its content using RASMOL or PYMOL.
       This structure shows HIV protease as a dimer (chain A and B), bound to a small
       tri-peptide (chain C, sequence EDL). Generate an image in which the ligand is
       shown in spacefilling mode (CPK), and the protein dimer is shown in cartoon
       mode. Save this picture in GIF or PNG format, and include it in your report
    3) Now we look at the PDB file 1CWA, which contains the structure of the
       complex between two proteins. Find the protein names, sizes and secondary
       structure contents. Identify the non standard amino acids in chain C.
           a) Describe briefly the importance of this structure
           b) Generate a GIF or PNG image containing a CPK representation of the

Good Luck !

To top