									                   EXERCISE 3
        For your report, use the worksheet (available for download on WebCT). Turn in all of
the printouts, in order, clearly labeled with the section heading (e.g., D3), along with the
answers to all the questions asked throughout the exercise. Instead of answering these
questions in your laboratory notebook, type them into your worksheet as you work on the
exercise, or hand-write them on the printouts.

First Lab Period

Analysis of Protein Structure

        In this exercise, you will visualize the structure of a protein of your choice that has
the coordinates deposited in the Protein Data Bank. You will use viewing software to
analyze and describe structural motifs found in the protein. You will then compare the
amount and location of secondary structural elements found in the crystal structure to those
predicted by a secondary structure prediction algorithm.

A.     Pick a protein with known structure.

1.     Decide on a protein you are interested in. You may choose any protein or enzyme of
       interest but it must have a solved crystal or NMR structure. Caution: Many
       membrane proteins do not have solved structures. (But the ones that do are very
2.     Just in case the protein you picked is not in the database, also choose another
       protein or two as a backup.

B.     Find the protein in the Protein Data Bank.

1.     Open the web browser of your choice and go to the PDB at the Research Collaboratory
       for Structural Bioinformatics (http://www.rcsb.org/pdb/).
2.     Find the structure of your protein of interest by typing the name of your protein in the
       search window. (Caution: A common name, like immunoglobulin, will pull up a
       huge list of proteins. Many of these will be the same protein crystallized under
       different conditions while others will be mutant versions of the wild-type protein.) If
       the protein you picked does not appear to be in the database, choose another protein.
3.     You will see a list of structures that match your query. Search through them until you
       find one that interests you. (Alternatively, just pick the first structure on the list.)

4.     Record the full name of your protein (e.g., “A Ykr043C/ fructose-1,6-bisphosphate
       product complex following ligand soaking”) and the PDB ID of your protein. (The
       PDB ID is at the top left of the frame. It is usually a combination of four letters and
       numbers, e.g., 3LG2.)
5.     Link to the page for your protein of interest by clicking on the structure title.

6.     The biological function of the protein can often be found by choosing the “Biology &
       Chemistry” tab. Describe the biological function of the protein, along with any other
       information you find interesting.
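
If you keep your notes electronically, a quick check can catch a mistyped PDB ID from step 4 before you search with it. The sketch below is only illustrative: it assumes the classic ID shape (one digit followed by three letters or digits, as in 3LG2), and the function name is hypothetical rather than part of any PDB tool.

```python
import re

# Assumption: a classic PDB ID is one digit followed by three letters
# or digits (e.g. 3LG2). This pattern is an illustration, not an
# official PDB validation rule.
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def looks_like_pdb_id(text: str) -> bool:
    """Return True if text has the shape of a four-character PDB ID."""
    return PDB_ID_PATTERN.fullmatch(text.strip()) is not None


print(looks_like_pdb_id("3LG2"))            # True
print(looks_like_pdb_id("immunoglobulin"))  # False
```

A passing check only means the text has the right shape. It does not confirm that the entry exists in the database.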

C.     View and analyze the protein structure from the PDB.

         Once you have pulled up your protein in the PDB, there are numerous ways to view
the structure. Clicking on "View in Jmol" (in the right-hand menu panel) opens an
application that allows you to view the structure of your protein. Explore the capabilities of
Jmol by left- or right-clicking in the viewing window. Make notes as you investigate how to
zoom in, how to view individual atoms vs. secondary structures, etc. When you are finished,
click your browser’s back button, then click on “Other viewers”, then on “Kiosk”. Explore
the capabilities of Kiosk, also, making notes as you did for Jmol. Compare these two viewers
and describe their strengths and weaknesses in your worksheet. Use Kiosk to examine the
tertiary structure of your protein. How much helix and sheet do you see? How are the
secondary structural elements arranged? Look for motifs such as the β hairpin. Does
your protein fit into a particular category, such as an α/β barrel or a helical bundle?
Describe the protein's structure in your worksheet.

D.     Retrieve the sequence of your protein.

1.     Return to the PDB page for your protein. Note the organism from which your protein
       came, and whether it was wild-type or mutant.

2.     View the sequence(s) of your protein by clicking on the “sequence” tab at the top.
       Note that the sequence is annotated, meaning that the page shows information about it,
       such as which regions are α-helices and which are β-sheets.

3.     Download the sequence of the protein by clicking on "fasta". (If you don’t see this
       word, search for it with the “find” function in your browser.)

4.     Open the sequence using MS Word and SAVE the sequence in Word on your USB
       drive for later use.

5.     Print out the sequence to add at the back of your worksheet set. (It’s best to do this
       at home, so that you don’t have to wait for the printer in the computer lab.)
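
A FASTA file is plain text: each record starts with a header line beginning with ">", followed by one or more lines of sequence. If you want to work with the downloaded sequence outside Word, the following minimal sketch reads that layout. The function name and the example record are made up for illustration.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without the leading '>') to its sequence."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines are wrapped, so join them back together.
            records[header] += line
    return records


# Hypothetical record, not a real protein sequence.
example = """>EXAMPLE_1 made-up header
MKTAYIAK
QRQISFVK
"""

for name, sequence in parse_fasta(example).items():
    print(name, len(sequence), "residues")
```

A file with several chains will produce one dictionary entry per header line.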

E.     Predict the secondary structure of your protein.

1.     Open a new browser window or tab.
2.     Use the following address to get to a program that will predict secondary structure
       from your sequence: http://bioinf.cs.ucl.ac.uk/psipred/. The name of this program is
       PSIPRED.
3.     Copy the sequence you saved in Word in the previous section. Paste your sequence
       into the sequence window. Scroll down and enter your PDB ID into the “short
       identifier” field. Submit your sequence.
4.     Your sequence will again be annotated, indicating which residues are predicted to
       form α helices (H) and β strands (E). Click on “Download PDF version”, then
       “OK”. Print out the results at home and attach them to your worksheet.

5.     Estimate the percent helix and sheet in your known protein, as determined from
       PSIPRED. Does this prediction match with the percent of α-helices and β-sheets
       shown in the three-dimensional picture of your protein? Based on this one
       “experiment”, how good do you think this prediction software is?
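
The estimate in step 5 comes down to counting letters in the prediction line: H for helix, E for strand, and (assuming the usual three-state output) C for everything else. The sketch below shows the arithmetic on a made-up prediction string. The function name is hypothetical, and you would paste in your own prediction line.

```python
def secondary_structure_percentages(prediction: str) -> dict[str, float]:
    """Return the percentage of helix (H), strand (E), and coil (C) residues."""
    # Keep only the three state letters so stray spaces or digits are ignored.
    states = [c for c in prediction.upper() if c in "HEC"]
    total = len(states)
    if total == 0:
        raise ValueError("no H, E, or C characters found in the prediction")
    return {
        "helix": 100.0 * states.count("H") / total,
        "strand": 100.0 * states.count("E") / total,
        "coil": 100.0 * states.count("C") / total,
    }


# Made-up 16-residue prediction: 6 helix, 4 strand, 6 coil.
result = secondary_structure_percentages("CCHHHHHHCCEEEECC")
print(result)  # {'helix': 37.5, 'strand': 25.0, 'coil': 37.5}
```

The same count applied to the annotated sequence from the PDB gives the figure to compare against.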

Second Lab Period

Homology Searching of the Arabidopsis Genome

       In this exercise you will identify sequences in the Arabidopsis genome that have
homology to yeast sequences of known biochemical function. You will then align those
sequences to evaluate whether overall homology or amino-terminal features correlate with
one aspect of protein function, the subcellular localization of the protein.

A.     Obtain a yeast protein sequence to query the Arabidopsis genome.

1.     Go to the NCBI Entrez Website at http://www.ncbi.nlm.nih.gov/Entrez/ and search
       the available databases using one of the enzyme names below, limiting the search to
       Saccharomyces cerevisiae by using the AND command (e.g., enter Pyruvate Kinase
       AND Saccharomyces cerevisiae):

       Enzyme Names:
       Pyruvate dehydrogenase (any subunit)
       Pyruvate kinase

2.     The results are grouped according to the various databases. Choose “Protein:
       sequence database”.

3.     From the list of proteins that comes up, choose a result that contains the sequence of
       the enzyme of interest. (Some results may be of proteins that are related to the
       enzyme of interest, but are not the enzyme itself.) Write or copy and paste the exact
       name of the protein you selected into your worksheet.

4.     Display the FASTA format (available below the name of the enzyme) for the yeast
       protein. Copy the sequence and paste it into Notepad or Word, then paste the
       sequence into your worksheet.

B.     Query the Arabidopsis genome with the yeast sequence.

1.     Open a new browser window or tab and go to the Arabidopsis Information Resource
       website at http://www.arabidopsis.org/.

2.     Under the “Tools” tab, click on WU-BLAST. Paste in the yeast sequence from part
       A, and then perform a BlastP search against the TAIR9 Proteins (Protein) dataset.
       Why do you use the BlastP program?

3.     Highlight the text and graphics from “Summary of BLAST Results” through the first
       sequence alignment. Copy and paste the highlighted section into your worksheet.

4.     Identify which sequences you think have the same functionality as the yeast
       sequence. Justify your choice based on the score, the probability value and the

5.     From the results of your homology search, copy and paste all of the locus names (e.g.
       “At1g32440”) for your gene family (the sequences you decided share the same
       function as the yeast sequence, approx. 8-10 loci) into your worksheet.
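
Copying locus names by hand in step 5 invites typos. As a rough aid, the sketch below pulls every name shaped like At1g32440 out of pasted BLAST text. It assumes the standard locus form ("At", a chromosome code of 1-5, C, or M, "g", then five digits). The sample text is a placeholder (At2g00001 is made up), and you must still decide which hits belong to your gene family.

```python
import re

# Assumed locus shape: At + chromosome (1-5, C, or M) + g + five digits.
LOCUS_PATTERN = re.compile(r"\bAT([1-5CM])G(\d{5})\b", re.IGNORECASE)


def extract_loci(blast_text: str) -> list[str]:
    """Return unique locus names in order of first appearance."""
    loci: list[str] = []
    for match in LOCUS_PATTERN.finditer(blast_text):
        chromosome, digits = match.groups()
        # Normalize capitalization to the At1g32440 style.
        locus = f"At{chromosome.upper()}g{digits}"
        if locus not in loci:
            loci.append(locus)
    return loci


# Placeholder text standing in for pasted BLAST results.
sample = """AT1G32440.1 placeholder hit line
AT2G00001.1 placeholder hit line (made-up locus)
At1g32440.2 second gene model of the first locus
"""

print(extract_loci(sample))  # ['At1g32440', 'At2g00001']
```

Gene-model suffixes such as ".1" are dropped, so each locus appears only once.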

C.     Predict the subcellular localization of the enzymes encoded by the gene family.

1. Open a new window and again go to the Arabidopsis Information Resource website at
   http://www.arabidopsis.org/. Under the “Search” listing, open the “Proteins” page.

2. Click on “Bulk Protein Search” (in the first paragraph near the top of the page).

3. Choose the “text” format.

4. Under “Output”, check the “Intracellular Location” box (since our goal is to find the
   subcellular location of each of the proteins). Make sure that all of the other output boxes
   are not checked, or you will get a lot of information you don’t need.

5. Under “Perform search in following subset”, enter the locus IDs of all the genes in your
   family.

6. Click “Get Protein Data”.

7. The results show the predicted subcellular location of each protein, based on its sequence.
   Copy and paste the results into your worksheet. (You’ll analyze them later.)

D.     Perform a Multiple Sequence Alignment and Create a Homology Tree

1.     To get the sequences of your family of proteins, go to “Search Overview” under the
       “Search” tab. Then click on “Sequence”.

2.     Enter the locus IDs of all the genes in your family. Under “Dataset”, choose “AGI
       Protein Sequences”. Click “Get Sequences”.

3.     Highlight and copy all the sequences, including their locus names, to the clipboard.

4.     Open the EBI ClustalW site at http://www.ebi.ac.uk/Tools/msa/clustalw2/.

5.     Paste in your sequences and submit your sequences for analysis.

6.     The results show the alignment of the sequences of all of the proteins in your family.
       Each protein sequence is shown, one above the other. (When the sequences get to the
       right side of the page, they continue below the set.) Amino acids are denoted with
       their one-letter abbreviation. Amino acids that are missing are denoted as “-”.
       Examine the alignment. Do all of the sequences start at the same amino acid? Do
       they end at the same amino acid? Are there any obvious features associated with the
       subcellular localization? (Hint: Check the amino and carboxy termini.)

7.     Click on “Guide Tree”, then be patient as the results come up.

8.     At the bottom of the results page, there is a phylogram (family tree or homology
       tree). Draw the phylogram by hand (since the page doesn’t allow you to copy it).

9.     Indicate the predicted subcellular localization next to each locus name on the
       phylogram. Compare the predicted subcellular localization to the groupings shown
       on the phylogram. Is there any relationship between the phylogram and the
       subcellular localization? Explain.
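
For step 6, one way to compare the termini is to count how many “-” characters each aligned sequence has at its start and end. The sketch below does this for a made-up alignment. The function name, locus labels, and sequences are placeholders, so substitute the aligned rows from your own ClustalW output after joining each sequence's wrapped blocks into a single string.

```python
def terminal_gaps(aligned: dict[str, str]) -> dict[str, tuple[int, int]]:
    """Return (leading gaps, trailing gaps) for each aligned sequence."""
    result: dict[str, tuple[int, int]] = {}
    for name, sequence in aligned.items():
        leading = len(sequence) - len(sequence.lstrip("-"))
        trailing = len(sequence) - len(sequence.rstrip("-"))
        result[name] = (leading, trailing)
    return result


# Made-up alignment rows of equal length (15 columns each).
alignment = {
    "locus_A": "MASLLRSTPAMKV--",
    "locus_B": "------TPAMKVLEQ",
}

for name, (leading, trailing) in terminal_gaps(alignment).items():
    print(f"{name}: {leading} leading gaps, {trailing} trailing gaps")
```

A sequence with fewer leading gaps than the others extends further at the amino terminus than the rest of the family, which is worth noting next to its predicted localization.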

Answer the questions in the worksheet.
