Protein Structure and Proteomics

         BIOL/CHEM 4900
   Part I
       Introduction
       Primary Structure
       Secondary Structure
   Part II
       Tertiary Structure
       Quaternary Structure
   Reading
       Krane and Raymer: Chapters 7 and 8
                           Some useful links
   Protein Data Bank (
        A worldwide archive of 3-D structural data of biological macromolecules. The PDB collects,
         validates, and distributes as widely as possible the experimental models of proteins. The
         management of the PDB is the responsibility of the Research Collaboratory for Structural
         Bioinformatics (RCSB). The vision of the RCSB is to create a resource based on the most modern
         technology that facilitates the use and analysis of structural data and thus creates an enabling
         resource for biological research.
   Swiss-Prot
        A protein sequence database that strives to provide a high level of annotation (such as the
         description of the function of a protein, its domain structure, post-translational modifications,
         variants, etc.), a minimal level of redundancy, and a high level of integration with other databases.
   Protein Explorer (
        Protein Explorer is free software for visualizing the three-dimensional structures of protein, DNA,
         and RNA macromolecules, and their interactions and binding of ligands, inhibitors, and drugs.
   Cn3D (
        Cn3D is a helper application for your web browser that allows you to view 3-dimensional
         structures from NCBI's Entrez retrieval service. Cn3D displays structure, sequence, and alignment,
         and has powerful annotation and alignment editing features.
   Graves, P.R. and Haystead, T.A., “Molecular Biologist’s Guide to Proteomics,” Microbiol.
    Mol. Biol. Rev. 66, 39-63 (2002)
        Introduction to Proteomics
   Proteome
       Sum total of an organism’s proteins
       Essential to understanding how organisms work
       Characterization difficult
            Protein structure is complicated (20 AAs)
            Chemical modification can occur
            Proteins must be isolated and purified
            Proteins must be crystallized
            Structure predictions (bioinformatics) are difficult and often
   Structure reveals function
       Ex: transmembrane proteins
              Let’s Review…
   Protein separation and purification
   Protein structure
    Protein Separation/Purification
   In general, proteins contain > 40 residues
       Minimum needed to fold into tertiary structure
   Usually 100-1000 residues; percent of each AA varies
   Proteins separated based on differences in size and
   Proteins must be pure to analyze, determine
   Factors to control (to avoid denaturation or chemical
       pH
       Presence of enzymes
       Temperature
       Reactive thiol groups
       Exposure to air, water
Methods of Separation/Purification

   Solubility (salts, solvents, pH, temperature)
   Chromatography
       Ion exchange
       Gel filtration
       Affinity
   Electrophoresis
        Protein Structure Review
   Amino Acids
       Chemical structure, alpha carbon, side chain
       Amphoteric, pK1, pK2, pKR, pI, zwitterion
       Stereochemistry
       Hydrophobic, polar, charged
       3-letter and 1-letter abbreviations
   Peptide Bonds, residues, N-terminus, C-terminus
   Disulfide bridges
   Conformation, native structure, denatured protein
   Primary, secondary, tertiary, quaternary structures
   Supersecondary structure, motifs
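                         Amino Acids
   pK1 ~ 2.2 (α-carboxyl group; protonated below pH 2.2)
   pK2 ~ 9.4 (α-amino group; NH3+ below pH 9.4)
   pKR (side chain; when applicable)
   Note 1-letter abbreviations
       Glu/E vs. Gln/Q
       Asp/D vs. Asn/N
       Glx/Z or Asx/B when the pair cannot be distinguished
        (Gln and Asn are converted to Glu and Asp by hydrolysis)
       X = undetermined or nonstandard AA
   [Figure: amino acid structures grouped by side chain; surviving group label: Basic]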
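As a numerical check on the pK values above, the net charge of an amino acid with no ionizable side chain can be estimated from the Henderson-Hasselbalch relation. This is a minimal sketch that assumes pK1 = 2.2 and pK2 = 9.4 as quoted on the slide; the function names are illustrative and not part of the course material.

```python
def fraction_protonated(pH, pKa):
    """Fraction of an ionizable group in its protonated form."""
    return 1.0 / (1.0 + 10 ** (pH - pKa))


def net_charge(pH, pK1=2.2, pK2=9.4):
    """Net charge of an amino acid whose side chain does not ionize."""
    # The carboxyl group carries -1 when deprotonated (COO-).
    carboxyl = -(1.0 - fraction_protonated(pH, pK1))
    # The amino group carries +1 when protonated (NH3+).
    amino = fraction_protonated(pH, pK2)
    return carboxyl + amino


def isoelectric_point(pK1=2.2, pK2=9.4):
    """pI for an amino acid with no ionizable side chain."""
    return (pK1 + pK2) / 2.0


for pH in (1.0, isoelectric_point(), 12.0):
    print(round(pH, 1), round(net_charge(pH), 3))
```

At the pI the two contributions cancel (the zwitterion dominates); below pK1 the molecule is mostly cationic, and above pK2 it is mostly anionic. Amino acids with an ionizable side chain need a third term for pKR.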

            The Peptide Bond
   Peptide bonds connect AA residues (backbone)
   Polypeptide chains (note N- and C-termini)

              Rigid; restricted rotation
             Disulfide Bonds
   Formed from oxidation of cysteine residues
                       Protein Folding
   Folded shape = conformation
   Three-dimensional, functional
    structure = native
        Energy of native conformation?
   Molecular chaperones
   There are thousands of possible
    conformations, but not an infinite number
   Conformations are restrained by
        planarity of peptide bond
        “allowed” angles
   No algorithm predicts the 3D
    shape with high accuracy
           Levels of Protein Structure
   Primary
        Linear AA sequence
        Covalent bonds
   Secondary
        Local structure; certain
         “motifs” are common
        Mostly H-bonds
   Tertiary
        Complete 3D shape
        H-bonds, hydrophobic
         interactions, ionic bonds,
         van der Waals interactions,
         disulfide bonds
   Quaternary
        >1 peptide chain
        Mostly H-bonds
             Primary Structure
   AA sequence of polypeptide chain(s)
   Linked by peptide bonds
   Linear sequence
   Prediction of primary structure?
   Experimental determination: protein sequencing
                 Protein Sequencing
   Determination of primary structure
   Approach:
       Denature protein
       Break protein into small segments
       Determine sequences of segments
   End group Analysis
       Number and ID of terminal AAs
       Dansyl and dabsyl chloride
       Sanger method (FDNB)
       Edman degradation
   Cleave disulfide bonds
       Reduce with mercaptans
   Hydrolysis
   Endopeptidases and/or chemical cleavage
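The "break protein into small segments" step can be simulated in silico. The following is a minimal sketch that assumes trypsin as the endopeptidase (it cleaves on the carboxyl side of Lys and Arg, except when the next residue is Pro); the function name and the example sequence are made up for illustration.

```python
def trypsin_digest(sequence):
    """Split a one-letter sequence after K or R unless the next residue is P."""
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else None
        # Cut only at internal K/R sites that are not followed by proline.
        if residue in "KR" and next_residue is not None and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])  # remaining C-terminal fragment
    return fragments


print(trypsin_digest("AGKTLRPEKMW"))
```

For this made-up sequence the fragments are AGK, TLRPEK, and MW: the Arg-Pro bond is left intact. Each fragment would then be sequenced separately, and a second digest with different specificity supplies the overlaps needed to order them.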
                  Secondary Structure
   Regular repeating structure
       Helices
       Sheets
   Torsion/dihedral angles
       Angles of rotation around Cα
       Clockwise (+) and counterclockwise (-)
       φ = rotation around Cα-N
       ψ = rotation around Cα-C
   How free is rotation?
       Not very (sterics)
       Avoid collision of C=O, N-H, R
       Calculations of allowed values = Ramachandran diagram
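The torsion angles above can be computed directly from atomic coordinates. The sketch below uses the standard four-point dihedral formula; for a residue i, φ is taken over the atoms C(i-1), N, Cα, C and ψ over N, Cα, C, N(i+1). The helper names are illustrative, and the coordinates in the example are toy points rather than real protein data.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle in degrees defined by four points.

    Positive values correspond to a clockwise rotation when looking
    along the central bond from p2 toward p3.
    """
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal of the plane p1-p2-p3
    n2 = _cross(b2, b3)  # normal of the plane p2-p3-p4
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: a quarter turn around the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Plotting the resulting (φ, ψ) pair for every residue of a structure gives the points on a Ramachandran diagram.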
Ramachandran Diagram

             Does not apply to Pro
                 Limited range due to
                  cyclic structure
             Does not apply to Gly
                 Increased range due
                  to lack of steric strain
         Classification of Proteins by
            Secondary Structure
   Fibrous
       High composition of single secondary structure
       Strong and flexible
       Collagen
            Triple helix (left-handed helices, right-handed super helix)
       Silk fibroin
            Anti-parallel β-sheet
       α-Keratin
            Left-handed coiled coil
   Globular
       Majority of all proteins
       Contain several types of secondary structure (regular and non-regular)
       Percentage of protein (on average):
            31% α-helix
            28% β-sheet
            13% turns/bends
            28% loops and random coil
Alpha Helix

   φ = -57°, ψ = -47°
   Discovered by Pauling: 1951
   Tightly wound, repeating
   “Right-handed”
   Each turn = 5.4 Å; 3.6 residues per turn
   Average length = 18 residues
   R-groups are on outside of helix
   Stabilized by H-bonds between
    C=O (i) and N-H (i + 4)
Figure courtesy of Dr. Loren Williams, Department of Chemistry and Biochemistry, Georgia Institute of Technology
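The helix numbers above imply a few derived quantities that are easy to check. This is a small sketch using only the slide's values (5.4 Å per turn, 3.6 residues per turn, H-bonds from residue i to i + 4); the function names are illustrative.

```python
PITCH_ANGSTROM = 5.4      # length of one full turn, from the slide
RESIDUES_PER_TURN = 3.6   # residues per turn, from the slide


def rise_per_residue():
    """Distance along the helix axis contributed by one residue."""
    return PITCH_ANGSTROM / RESIDUES_PER_TURN


def rotation_per_residue():
    """Rotation about the helix axis per residue, in degrees."""
    return 360.0 / RESIDUES_PER_TURN


def helix_length(n_residues):
    """Approximate axial length of an ideal helix of n residues."""
    return n_residues * rise_per_residue()


def backbone_hbonds(n_residues):
    """(i, i + 4) pairs, 1-based: C=O of residue i to N-H of residue i + 4."""
    return [(i, i + 4) for i in range(1, n_residues - 3)]


print(round(rise_per_residue(), 2), "angstrom per residue")
print(round(helix_length(18), 1), "angstrom for an 18-residue helix")
print(len(backbone_hbonds(18)), "backbone H-bonds in an 18-residue helix")
```

An 18-residue helix comes out at about 27 Å long with 14 backbone H-bonds; the first four N-H groups and the last four C=O groups have no partner within the helix.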
                  Alpha helix, cont.
   Deviates from ideal conformation at ends
    (less H-bonding)
   Some amino acids are “α-helix breakers”
       Repeating like-charges
       Repeating “bulky” groups
       Pro and Gly
   Effects on helical stability:
       Electrostatic interactions between adjacent residues
       Steric interference between adjacent residues
       Interactions between residues 3-4 amino
        acids away
       Polarity of residues at both ends of helix
        (positive at amino end; negative at carboxyl)
                       Beta Sheet
   Pauling and Corey: 1951
   Extended, zigzag conformation
   Interstrand H-bonding
   Average 6 residues/strand; up to 15
   2-12 strands/sheet; average 6
   R-groups alternate on opposite sides of sheet
   Distortions:
       Beta-bulge = extra residue
       Kink = Pro
           Anti-parallel vs. Parallel
   Anti-parallel β-sheet
       Opposite orientation
       φ = -140°, ψ = 135°
       More stable
       Can be twisted
       Can withstand distortions and
        exposure to solvent
   Parallel β-sheet
       Same amino-carboxyl direction
       Less twisted
       Tend to be buried
       φ = -120°, ψ = 115°
   Can have mix of parallel and anti-parallel strands
   Interacting strands can be
    many amino acids apart
   Turns are 180°; “connect”
    strands in folded (globular) proteins
   Interaction is between carbonyl
    oxygen of AA 1 and amino
    hydrogen of AA 4
   Pro and Gly are often present
       Gly: small and flexible
        (Type II turns)
       Pro: Cis conformation makes
        inclusion in tight turn favorable
Ramachandran Plot for
 Secondary Structure
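To connect the ideal angles quoted on the helix and sheet slides to positions on the plot, a (φ, ψ) pair can be matched to the nearest ideal conformation. This is a deliberately crude sketch: it assumes only the three reference points given in these slides and uses a plain nearest-point rule, which is not how structure assignment programs work. All names are illustrative.

```python
import math

# Ideal (phi, psi) values in degrees, as quoted on the slides.
CANONICAL = {
    "alpha helix": (-57.0, -47.0),
    "anti-parallel beta sheet": (-140.0, 135.0),
    "parallel beta sheet": (-120.0, 115.0),
}


def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_conformation(phi, psi):
    """Name of the reference conformation closest to (phi, psi)."""
    def distance(name):
        ref_phi, ref_psi = CANONICAL[name]
        return math.hypot(angular_difference(phi, ref_phi),
                          angular_difference(psi, ref_psi))
    return min(CANONICAL, key=distance)


print(nearest_conformation(-60.0, -45.0))
print(nearest_conformation(-135.0, 140.0))
```

The angular difference wraps at ±180°, so that, for example, 170° and -170° are treated as 20° apart rather than 340°.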
    Secondary Structure Prediction
   Chou-Fasman
       Conformational Parameters
            P(α), P(β), P(turn)
            Probability of an AA participating in
             various secondary structures
            Based on observed frequencies in
             known structures
            Uses a window of 6 residues
            50-60% accuracy
       CHOFAS in Biology Workbench
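The window scan at the heart of Chou-Fasman can be sketched as follows. This is a simplified illustration of the nucleation step only: it slides a 6-residue window along the sequence and flags windows in which at least 4 residues are "formers" (propensity above 1.0). Extension of the nucleated regions and resolution of helix/strand overlaps are omitted. The propensity tables are the commonly quoted Chou-Fasman values and may differ from Table 7.1, which should be substituted when working the course exercise; the function name, thresholds as defaults, and test sequences are assumptions made for illustration.

```python
# Commonly quoted Chou-Fasman propensities (assumed; check against Table 7.1).
P_ALPHA = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13,
    "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00,
    "R": 0.98, "T": 0.83, "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67,
    "P": 0.57, "G": 0.57,
}
P_BETA = {
    "V": 1.70, "I": 1.60, "Y": 1.47, "F": 1.38, "W": 1.37, "L": 1.30,
    "C": 1.19, "T": 1.19, "Q": 1.10, "M": 1.05, "R": 0.93, "N": 0.89,
    "H": 0.87, "A": 0.83, "S": 0.75, "G": 0.75, "K": 0.74, "P": 0.55,
    "D": 0.54, "E": 0.37,
}


def nucleation_sites(sequence, propensity, window=6, min_formers=4,
                     threshold=1.0):
    """Return 0-based start indices of windows with enough formers."""
    sites = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        formers = sum(1 for aa in segment if propensity[aa] > threshold)
        if formers >= min_formers:
            sites.append(start)
    return sites


# Made-up test sequences, not taken from the course material.
print(nucleation_sites("EAMLKE", P_ALPHA))
print(nucleation_sites("GPGPGP", P_ALPHA))
print(nucleation_sites("VIVYVI", P_BETA))
```

Passing the table as an argument makes it easy to run the same scan with P(α) or P(β). A residue code missing from the table raises a `KeyError`, which is a reasonable way to surface ambiguous codes such as B, Z, or X.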
                    Let’s try it…
   Consider the following sequence:


   Using the Chou-Fasman algorithm and the
    parameters in Table 7.1, predict the regions
    of beta strands in this sequence.
       Excel file on course webpage
    Secondary Structure Prediction
   GOR method
       Garnier, Osguthorpe, and Robson
       Uses a window of 17 residues
            1 central residue, 8 on the N-terminal side and 8 on the C-terminal side
       Based on probability of each window being located within a β
        sheet or an α helix
            Probability values determined from known proteins
       65-75% accuracy
       GOR4 in Workbench
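The 17-residue window can be sketched in code. The example below only illustrates the windowing and a simple "sum the contributions, pick the best state" rule; the scoring table is a hypothetical placeholder keyed by (state, offset, residue), not the published GOR parameters, and GOR4 itself uses a more elaborate scheme. All names are assumptions made for illustration.

```python
def windows(sequence, flank=8, pad="-"):
    """Yield (index, window) with residue `index` centered in the window.

    The window holds 2 * flank + 1 positions (17 for flank=8); positions
    that fall off either end of the sequence are filled with `pad`.
    """
    padded = pad * flank + sequence + pad * flank
    for i in range(len(sequence)):
        yield i, padded[i:i + 2 * flank + 1]


def predict_states(sequence, info, states="CHE", flank=8):
    """Assign each residue the state with the highest summed score.

    `info` maps (state, offset, residue) to a score, where offset runs
    from -flank to +flank; missing entries count as 0. Ties go to the
    state listed first in `states`.
    """
    prediction = []
    for _, window in windows(sequence, flank):
        totals = {
            state: sum(info.get((state, pos - flank, aa), 0.0)
                       for pos, aa in enumerate(window))
            for state in states
        }
        prediction.append(max(states, key=lambda s: totals[s]))
    return "".join(prediction)


# Toy table with made-up scores, for demonstration only.
toy_info = {("H", 0, "A"): 1.0, ("E", 0, "V"): 1.0}
print(predict_states("AVG", toy_info, flank=1))
```

In a real implementation the table would be estimated from proteins of known structure, as the slide notes, with one entry for every state, window offset, and residue type.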

   PELE in Workbench
       Many different algorithms
   H = α helix
   E = β strand
   T = β turn
   C = random coil
   Algorithms listed to the right (initials indicate authors)
   JOI = Joint prediction
        Assigns the structure using a "winner takes all" procedure
         (using the other methods)
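A "winner takes all" joint prediction amounts to a per-position vote across the individual methods. The sketch below assumes each method's output is a string of H/E/T/C codes of equal length; how PELE actually breaks ties is not stated in these slides, so here a tie goes to the state that appears first among the tied methods. The function name and the example strings are made up.

```python
from collections import Counter


def joint_prediction(predictions):
    """Per-position plurality vote over equal-length prediction strings."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    joint = []
    for column in zip(*predictions):
        # most_common keeps first-seen order among equal counts,
        # so ties resolve in favor of the earliest listed method.
        state, _count = Counter(column).most_common(1)[0]
        joint.append(state)
    return "".join(joint)


methods = ["HHEC", "HHEE", "HCEC"]  # made-up outputs from three methods
print(joint_prediction(methods))
```

For these three made-up predictions the joint result is HHEC: each position takes the state predicted by the most methods.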
Secondary Structure Prediction in
      Biology Workbench
