EXERCISE 3
COMPUTER-AIDED ANALYSIS OF BIOLOGICAL MOLECULES

III. PROCEDURE

For your report, use the worksheet (available for download on WebCT). Turn in all of the printouts, in order, clearly labeled with the section heading (e.g., D3). Also turn in the answers to all of the questions asked throughout the exercise. Instead of answering these questions in your laboratory notebook, type them into your worksheet as you work through the exercise, or hand-write them on the printouts.

First Lab Period
Analysis of Protein Structure

In this exercise, you will visualize the structure of a protein of your choice whose coordinates have been deposited in the Protein Data Bank (PDB). You will use viewing software to analyze and describe structural motifs found in the protein. You will then compare the amount and location of secondary structural elements found in the crystal structure to those predicted by a secondary structure prediction algorithm.

A. Pick a protein with known structure.
1. Decide on a protein you are interested in. You may choose any protein or enzyme of interest, but it must have a solved crystal or NMR structure. Caution: Many membrane proteins do not have solved structures. (But the ones that do are very cool!)
2. In case the protein you picked is not in the database, also choose another protein or two as a backup.

B. Find the protein in the Protein Data Bank.
1. Open the web browser of your choice and go to the PDB at the Research Collaboratory for Structural Bioinformatics (http://www.rcsb.org/pdb/).
2. Find the structure of your protein of interest by typing the name of your protein in the search window. (Caution: A common name, like immunoglobulin, will pull up a huge list of proteins. Many of these will be the same protein crystallized under different conditions, while others will be mutant versions of the wild-type protein.) If the protein you picked does not appear to be in the database, choose another protein.
3.
You will see a list of structures that match your query. Search through them until you find one that interests you. (Alternatively, just pick the first structure on the list.)
4. Record the full name of your protein (e.g., “A Ykr043C/ fructose-1,6-bisphosphate product complex following ligand soaking”) and the PDB ID of your protein. (The PDB ID is at the top left of the frame. It is usually a combination of four letters and numbers, e.g., 3LG2.)
5. Go to the page for your protein of interest by clicking on the structure title.
6. The biological function of the protein can often be found by choosing the “Biology & Chemistry” tab. Describe the biological function of the protein, along with any other information you find interesting.

C. View and analyze the protein structure from the PDB.
Once you have pulled up your protein in the PDB, there are numerous ways to view the structure. Clicking on “View in Jmol” (in the right-hand menu panel) opens an application that allows you to view the structure of your protein. Explore the capabilities of Jmol by left- or right-clicking in the viewing window. Make notes as you investigate how to zoom in, how to view individual atoms vs. secondary structures, etc.

When you are finished, click your browser’s back button, then click on “Other viewers”, then on “Kiosk”. Explore the capabilities of Kiosk as well, making notes as you did for Jmol. Compare the two viewers and describe their strengths and weaknesses in your worksheet.

Use Kiosk to examine the tertiary structure of your protein. How much helix and sheet do you see? How are the secondary structural elements arranged? Look for common motifs, such as the β hairpin. Does your protein fit into a particular category, such as an α/β barrel or a helical bundle? Describe the protein’s structure in your worksheet.

D. Retrieve the sequence of your protein.
1. Return to the PDB page for your protein. Note the organism from which your protein came, and whether it was wild-type or mutant.
2.
View the sequence(s) of your protein by clicking on the “sequence” tab at the top. Note that the sequence is annotated, meaning that the page shows information about it, such as which regions are α-helices and which are β-sheets.
3. Download the sequence of the protein by clicking on “fasta”. (If you don’t see this word, search for it with the “find” function in your browser.)
4. Open the sequence using MS Word and SAVE the sequence in Word on your USB drive for later use.
5. Print out the sequence to add at the back of your worksheet set. (It’s best to do this at home, so that you don’t have to wait for the printer in the computer lab.)

E. Predict the secondary structure of your protein.
1. Open a new browser window or tab.
2. Use the following address to get to a program that will predict secondary structure from your sequence: http://bioinf.cs.ucl.ac.uk/psipred/. The name of this program is PSIPRED.
3. Copy the sequence you saved in Word in the previous section. Paste your sequence into the sequence window. Scroll down and enter your PDB ID into the “short identifier” field. Submit your sequence.
4. Your sequence will again be annotated, indicating which residues are predicted to form helices (H) and strands (E). Click on “Download PDF version”, then “OK”. Print out the results at home and attach them to your worksheet.
5. Estimate the percent helix and sheet in your protein, as predicted by PSIPRED. Does this prediction match the percent of α-helices and β-sheets shown in the three-dimensional picture of your protein? Based on this one “experiment”, how good do you think this prediction software is?

Second Lab Period
Homology Searching of the Arabidopsis Genome

In this exercise you will identify sequences in the Arabidopsis genome that have homology to yeast sequences of known biochemical function.
You will then align those sequences to evaluate whether overall homology or amino-terminal features correlate with one aspect of protein function, the subcellular localization of the protein.

A. Obtain a yeast protein sequence to query the Arabidopsis genome.
1. Go to the NCBI Entrez website at http://www.ncbi.nlm.nih.gov/Entrez/ and search the available databases using one of the enzyme names below, limiting the search to Saccharomyces cerevisiae by using the AND command (e.g., enter Pyruvate Kinase AND Saccharomyces cerevisiae).
Enzyme names:
Pyruvate dehydrogenase (any subunit)
Pyruvate kinase
2. The results are grouped according to the various databases. Choose “Protein: sequence database”.
3. From the list of proteins that comes up, choose a result that contains the sequence of the enzyme of interest. (Some results may be proteins that are related to the enzyme of interest, but are not the enzyme itself.) Write, or copy and paste, the exact name of the protein you selected into your worksheet.
4. Display the yeast protein in FASTA format (the link appears below the name of the enzyme). Copy the sequence and paste it into Notepad or Word. Also paste the sequence into your worksheet.

B. Query the Arabidopsis genome with the yeast sequence.
1. Open a new browser window or tab and go to the Arabidopsis Information Resource website at http://www.arabidopsis.org/.
2. Under the “Tools” tab, click on WU-BLAST. Paste in the yeast sequence from part A, and then perform a BlastP search against the TAIR9 Proteins (Protein) dataset. Why do you use the BlastP program?
3. Highlight the text and graphics from “Summary of BLAST Results” through the first sequence alignment. Copy and paste the highlighted section into your worksheet.
4. Identify which sequences you think have the same function as the yeast sequence. Justify your choice based on the score, the probability value, and the alignments.
5.
From the results of your homology search, copy and paste all of the locus names (e.g., “At1g32440”) for your gene family (the sequences you decided share the same function as the yeast sequence, approx. 8-10 loci) into your worksheet.

C. Predict the subcellular localization of the enzymes encoded by the gene family.
1. Open a new window and again go to the Arabidopsis Information Resource website at http://www.arabidopsis.org/. Under the “Search” listing, open the “Proteins” page.
2. Click on “Bulk Protein Search” (in the first paragraph near the top of the page).
3. Choose the “text” format.
4. Under “Output”, check the “Intracellular Location” box (since our goal is to find the subcellular location of each of the proteins). Make sure that none of the other output boxes are checked, or you will get a lot of information you don’t need.
5. Under “Perform search in following subset”, enter the locus IDs of all the genes in your family.
6. Click “Get Protein Data”.
7. The results show the predicted subcellular location of each protein, based on its sequence. Copy and paste the results into your worksheet. (You’ll analyze them later.)

D. Perform a Multiple Sequence Alignment and Create a Homology Tree
1. To get the sequences of your family of proteins, go to “Search Overview” under the “Search” tab. Then click on “Sequence”.
2. Enter the locus IDs of all the genes in your family. Under “Dataset”, choose “AGI Protein Sequences”. Click “Get Sequences”.
3. Highlight and copy all the sequences, including their locus names, to the clipboard.
4. Open the EBI ClustalW site at http://www.ebi.ac.uk/Tools/msa/clustalw2/.
5. Paste in your sequences and submit them for analysis.
6. The results show the alignment of the sequences of all of the proteins in your family. The protein sequences are shown one above the other. (When the sequences reach the right side of the page, they continue below the set.) Amino acids are denoted with their one-letter abbreviation.
Amino acids that are missing are denoted as “-”. Examine the alignment. Do all of the sequences start at the same amino acid? Do they end at the same amino acid? Are there any obvious features associated with the subcellular localization? (Hint: Check the amino and carboxy termini.)
7. Click on “Guide Tree”, then be patient as the results come up.
8. At the bottom of the results page, there is a phylogram (family tree or homology tree). Draw the phylogram by hand (since the page does not allow you to copy it).
9. Indicate the predicted subcellular localization next to each locus name on the phylogram. Compare the predicted subcellular localization to the groupings shown on the phylogram. Is there any relationship between the phylogram and the subcellular localization? Explain.

Conclusion
Answer the questions in the worksheet.
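
Optional: checking the percent helix and sheet estimate (First Lab Period, step E5)
The estimate in step E5 can be made by counting the H and E characters in the PSIPRED prediction line and dividing by the total number of residues. The short Python sketch below shows one way to do that arithmetic. It is not part of PSIPRED or of the worksheet: the function name and the example string are hypothetical, and it assumes you have typed or pasted the prediction line (H, E, and C characters) into a text string yourself.

```python
from collections import Counter


def secondary_structure_percentages(prediction):
    """Return percent helix (H), strand (E), and coil (all other states)."""
    # Keep only letters so that spaces or line breaks from copy-and-paste
    # are ignored.
    states = [ch for ch in prediction.upper() if ch.isalpha()]
    if not states:
        raise ValueError("prediction string contains no residues")
    counts = Counter(states)
    total = len(states)
    helix = 100.0 * counts["H"] / total
    strand = 100.0 * counts["E"] / total
    # Anything that is not H or E is treated as coil in this sketch.
    coil = 100.0 - helix - strand
    return {"helix": helix, "strand": strand, "coil": coil}


# Hypothetical 20-residue prediction, for illustration only.
example = "CCHHHHHHHHCCEEEEECCC"
result = secondary_structure_percentages(example)
print(result)
```

The same counting can be done by hand on the printout. Either way, compare the percentages with what you saw in the three-dimensional structure before judging the prediction.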