   Understanding Sequence, Structure and Function
   Relationships and the Resulting Redundancy

   Pharm 201/Bioinformatics I
       Philip E. Bourne
 Department of Pharmacology, UCSD


           Pharm 201 Lecture 09, 2010   1
                    Agenda

• Understand the relationship between sequence,
  structure and function. Consider specifically:
  – sequence-structure
  – structure-structure
  – structure-function
• Take-home message: a non-redundant set of
  sequences is not the same as a non-redundant set
  of structures, which in turn is not the same as a
  non-redundant set of functions


              Why Bother?
• Biology:
  – A full understanding of a molecular system
    comes from careful examination of the
    sequence-structure-function triad
  – Each triad is then a component in a biological
    process
• Method:
  – Bioinformatics studies invariably start from a
    non-redundant set of data to achieve
    appropriate statistical significance
       Background – RMSD Defined
RMSD (root-mean-square deviation) represents the overall distance between
two proteins, usually computed over their Calpha atoms, denoted here
a_i (Protein A) and b_i (Protein B), with d_i the distance between a_i and b_i:

  Protein A    distance    Protein B
     a1           d1          b1
     a2           d2          b2
     a3           d3          b3
     a4           d4          b4
     ...          ...         ...
     aN           dN          bN

  RMSD = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} d_i^2 }

Thus RMSD is the square root of the mean of the squared distances
between all N pairs of equivalent Calpha atoms.

Rule of thumb:
  1-2 Å RMSD: the proteins are close
  <6 Å RMSD: they are likely related

Note: Assumes you know the residue correspondences
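The RMSD definition on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a structure-alignment tool: it assumes the two coordinate lists are already superimposed and already in residue correspondence, as the note on the slide requires. The function name `rmsd` and the coordinate format (lists of x, y, z tuples in Å) are assumptions made for the example.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) Calpha coordinates, assumed already superimposed and
    listed in residue correspondence (a_i pairs with b_i)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total_sq = 0.0
    for a, b in zip(coords_a, coords_b):
        # d_i^2: squared Euclidean distance between a_i and b_i
        total_sq += sum((x - y) ** 2 for x, y in zip(a, b))
    # mean of the squared distances, then the square root
    return math.sqrt(total_sq / len(coords_a))

# Hypothetical coordinates: protein B is protein A shifted by 3 Å along z
protein_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
protein_b = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
print(rmsd(protein_a, protein_b))
```

In practice the hard part is finding the superposition and the residue correspondences; this sketch deliberately leaves both out.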
     Some Useful Observations
• Below 30% protein sequence identity, detection of a
  homologous relationship is not guaranteed by sequence
  alone
• Structure is much more conserved than sequence
• Distinguishing between divergent versus convergent
  evolution is an issue
• Structure is limited relative to sequence, on the order of
  1:100 – 1:10000 (depending on how you count)
• Structure follows a power law with respect to function –
  each structural template has from 1 to n functions



 Relationship Between
Sequence and Structure




The classic HSSP curve from Sander and Schneider (1991) Proteins 9:56-68
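The HSSP curve gives a length-dependent percent-identity cutoff above which structural similarity can be inferred from sequence. The sketch below uses the commonly quoted form of that curve. The constants (290.15, -0.562, and the 24.8% plateau at 80 residues and longer) are not on the slide and are an assumption here, so check them against the paper before relying on them. The function names are hypothetical.

```python
def hssp_threshold(length):
    """Percent-identity cutoff for an alignment of `length` residues,
    using the commonly quoted form of the HSSP curve (assumed constants)."""
    if length < 10:
        raise ValueError("curve is not defined for alignments shorter than 10")
    if length >= 80:
        return 24.8  # plateau for long alignments (assumed value)
    return 290.15 * length ** -0.562

def likely_homologous(percent_identity, length):
    """True if the alignment lies above the cutoff curve."""
    return percent_identity > hssp_threshold(length)

# A 100-residue alignment at 30% identity lies above the curve,
# the same length at 20% identity falls below it.
print(likely_homologous(30.0, 100), likely_homologous(20.0, 100))
```

Short alignments need much higher identity than long ones to be convincing, which is why the cutoff is a curve and not a single number.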
This Analysis was Updated by
        Rost in 1999
  http://peds.oupjournals.org/cgi/content/full/12/2/85



 Sequence vs Structure – Another
          Perspective
     Random 1000 structurally similar PDB polypeptide chains from
     CE with z > 4.5 (% sequence identity vs alignment length)

     [Figure: scatter plot; x-axis: Alignment Length; y-axis: % Seq. Id.;
      regions labeled Twilight Zone and Midnight Zone]
There Are No Absolute Rules - Similar Sequences
             – Different Structures
     1PIV:1    Viral Capsid Protein
     1HMP:A    Glycosyltransferase

 [Figure: the two structures, with an 80-residue stretch (yellow)
  sharing over 40% sequence identity]
Given This Complex Relationship
    a Non-redundant Set of
  Sequences Does not Imply a
Non-redundant Set of Structures




Structure vs Structure




  Structure Is Highly Redundant


                   The Russian Doll Effect

 [Figure: structure alignments using CE with z > 4.0; annotation:
  homology modeling is used here]
We will be revisiting this in the next
         couple of lectures
• Specifically:
  – How do we capture this redundancy?
  – What systems are commonly used to express
    this redundancy and what do they bring to our
    understanding of biology?
• For now consider what this means using
  the most popular structure classification
  scheme - SCOP

Nature’s Reductionism
    There are ~20^300 possible proteins
    (20 amino acids at each of ~300 positions),
    far more than all the atoms in the Universe



    11.2M protein sequences from
    10,854 species (source RefSeq)



   38,221 protein structures
   yield 1195 domain folds (SCOP 1.75)
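The scale of the reduction on this slide is easy to check with exact integer arithmetic. The sketch below assumes a protein length of 300 residues (implied by the 20^300 figure) and takes roughly 10^80 as an often-quoted order-of-magnitude estimate for the number of atoms in the observable Universe. That estimate is not from the slide. The variable names are illustrative.

```python
import math

ALPHABET_SIZE = 20          # standard amino acids
PROTEIN_LENGTH = 300        # assumed typical length, as implied by 20^300
ATOMS_IN_UNIVERSE = 10 ** 80  # assumption: order-of-magnitude estimate

# Python integers are arbitrary precision, so this is exact
possible_sequences = ALPHABET_SIZE ** PROTEIN_LENGTH

# Orders of magnitude by which sequence space exceeds the atom count
excess_orders = math.log10(possible_sequences) - math.log10(ATOMS_IN_UNIVERSE)

print(len(str(possible_sequences)), "decimal digits")
print(f"exceeds the atom estimate by about {excess_orders:.0f} orders of magnitude")

# Ratios taken from the slide: sequences per fold
sequences_per_fold = 11_200_000 / 1195
print(f"about {sequences_per_fold:.0f} known sequences per SCOP fold")
```

The last line uses only the counts given on the slide (11.2M sequences, 1195 folds) to show how strongly known sequences collapse onto known folds.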
The SCOP Hierarchy v1.75
Based on 38221 Structures
   Classes              7
   Folds             1195     This is remarkable! Explains the
                              one fold, many functions observation
   Superfamilies     1962
   Families          3902
   Domains         110800
Specific Examples
 From the SCOP
    Hierarchy




         Protein Domains

• Definition
  – Compact,
    spatially distinct
  – Fold in isolation
  – Recurrence




Structure vs Function




   Some Basic Rules Governing
Structure-Function Relationships …
• The golden rule is there are no golden
  rules – George Bernard Shaw
• Above 40% sequence identity, sequences
  tend to have the same structure and
  function, but there are exceptions
• Structure and function tend to diverge at
  the same level of sequence identity


  Structure vs Function


This is even more complicated than the
 relationship between sequence and
 structure and not as well understood




   Complication Comes from One
    Structure Multiple Functions

• We saw this from GO already
• Phosphoglucose isomerase acts as a
  neuroleukin, a cytokine and a differentiation
  mediator as a monomer in the extracellular
  space, and as a dimer inside the cell it is
  involved in glucose metabolism


 Consider an Example Relative to
             SCOP
• Lysozyme and alpha-lactalbumin:
  – Same class – alpha+beta
  – Same fold – lysozyme-like
  – Same superfamily – lysozyme-like
  – Same family – C-type lysozyme
  – Different function at 40% sequence identity
    • Lysozyme – hydrolase, EC 3.2.1.17
    • Alpha-lactalbumin – Ca-binding, lactose
      biosynthesis
               More Details…

Lysozyme is an O-glycosyl hydrolase, but α-lactalbumin
does not have this catalytic activity. Instead it regulates
the substrate specificity of galactosyl transferase through
its sugar binding site, which is common to both
α-lactalbumin and lysozyme. Both the sugar binding site and
catalytic residues have been retained by lysozyme during
evolution, but in α-lactalbumin, the catalytic residues have
changed and it is no longer an enzyme.




Why is It Not so Well Understood?
1. Function is often ill-defined, e.g.,
   biochemical, biological, phenotypical
2. The PDB is biased – it does not have a
   balanced repertoire of functions and
   those functions are ill-defined
3. There are a number of functional
   classifications, e.g., EC and GO, that have
   differing coverage and depth
                          Point 2 PDB Bias
                       PDB vs Human Genome
        EC – Hydrolases – Begins to Illustrate the Bias in the PDB

 [Figure: PDB (top) vs Ensembl Human Genome Annotation (bottom)]
   – 2.5 Transferring alkyl or aryl groups: over-represented in PDB
   – 2.4 Glycosyltransferases: under-represented in PDB

Xie and Bourne 2005 PLoS Comp. Biol. 1(3) e31    http://sg.rcsb.org
          Structure vs Function Follows a
              Power Law Distribution


• Some folds are promiscuous and adopt many different
  functions - superfolds

Qian J, Luscombe NM, Gerstein M. JMB 2001; 313(4):673-81
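A power-law distribution means the number of folds with k functions falls off roughly as a * k^(-b), which appears as a straight line on a log-log plot. The sketch below estimates a and b by least squares on the logged values. The data are synthetic and generated only to illustrate the fit. They are not taken from the cited paper, and the function name `fit_power_law` is hypothetical. Log-log regression is a simple but statistically crude estimator, so treat it as a first look only.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**(-b) by ordinary least squares on
    log(y) = log(a) - b * log(x). Returns (a, b)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic example: number of folds that carry k distinct functions,
# generated from 1000 * k**-2 purely for illustration.
functions_per_fold = [1, 2, 4, 8, 16]
fold_counts = [1000 * k ** -2.0 for k in functions_per_fold]

a, b = fit_power_law(functions_per_fold, fold_counts)
print(f"a ~ {a:.1f}, exponent b ~ {b:.2f}")
```

The long tail of such a distribution is where the superfolds sit: a few folds with many functions, against many folds with only one or two.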
Examples of Superfolds..
                               1TIM




Examples of Superfolds



                                    3ADK




                                   1FXI




   Specific Examples of the
Relationship Between Structure
         and Function




Same Structure and Function Low
      Sequence Identity




The globin fold is resilient to amino acid changes. V. stercoraria (bacterial)
hemoglobin (left) and P. marinus (eukaryotic) hemoglobin (right) share just
8% sequence identity, but their overall fold and function are identical.

Same Structure Different Function - Alpha/beta
proteins characterized as different superfamilies




      1ymv                   1fla              1pdo




       Example – Same Structure Different
                   Function




     PDB ID    Protein                Function
     1ymv      CheY                   Signal Transduction
     1fla      Flavodoxin             Electron Transport
     1pdo      Mannose Transporter

              Less than 15% sequence identity
           Convergent Evolution




Subtilisin and chymotrypsin are both serine endopeptidases. They share no
sequence identity, and their folds are unrelated. However, they have an
identical, three-dimensionally conserved Ser-His-Asp catalytic triad, which
catalyses peptide bond hydrolysis. These two enzymes are a classic
example of convergent evolution.
     Example: Same Fold but Not Function

• “Integrin-linked kinase” (Ilk) is a novel protein kinase fold
  with strong sequence similarity to known structures (Hannigan
  et al. 1996 Nature 379, 91-96)
• Aligns to Src kinases with BLAST e-value of 10^-19 and
  27% identity (alignment shown is to a known Src kinase
  structure)
• Several key residues are conserved, but residues
  important to catalysis, including catalytic Asp, are missing
• Recent experimental evidence suggests that Ilk lacks kinase
  activity (Lynch et al. 1999 Oncogene 18, 8024-8032)

           150                                                            200
Ilk____PSS   ..........     .......... ........CC ....CEEEHH       HHCCCCCCEE
Ilk____Seq   ..........     .......... ........FK ....QLNFLT       KLNENHSGEL
------------                                   -+     +L-+++       KL-+---GE-
1fmk--_Seq   KHADGLCHRL     TTVCPTSKPQ TQGLAKDAWE IPRESLRLEV       KLGQGCFGEV
1fmk--_SS    HCCCCCCCCC     CEECCCCCCC CCCCCCCCCE CCHHHEEEEE       EEEECCCEEE
                                                                     * * *
              200                                                               250
Ilk____PSS   EEEECCCCE.     EEEEEEECCC   CCCCCHHHHH   HHHHHHHHHC   CCCEEEEEEE
Ilk____Seq   WKGRWQGND.     IVVKVLKVRD   WSTRKSRDFN   EECPRLRIFS   HPNVLPVLGA
------------ W+G+W-G+-      +-+K+LK-      +T+++-+F-   +E---++-++   H++++-++++
1fmk--_Seq   WMGTWNGTTR     VAIKTLKP..   .GTMSPEAFL   QEAQVMKKLR   HEKLVQLYAV
1fmk--_SS    EEEEECCCEE     EEEEEECC..   .CCCCHHHHH   HHHHHHHHCC   CCCECCEEEE
                               *                       *
                250                                                           300
Ilk____PSS     EECCCCEEEE   EEHHHHCCCC   HHHHHHCCCC   CCCCHHHHHH   HHHHHHHHHH
Ilk____Seq     CQSPPAPHPT   LITHWMPYGS   LYNVLHEGTN   FVVDQSQAVK   FALDMARGMA
------------   ++++P   --   ++T--M++GS   L-++L-+-T+   --+--+Q-V+   +A+++A+GMA
1fmk--_Seq     VSEEP...IY   IVTEYMSKGS   LLDFLKGETG   KYLRLPQLVD   MAAQIASGMA
1fmk--_SS      ECCCC...EE   EEEECCCCCE   HHHHHCCCCC   CCCCHHHHHH   HHHHHHHHHH
                300                                                           350
Ilk____PSS     HHHCCCCCEE   CCCCCCCCEE   ECCCCEEEEC   CCCCEEECCC   CCCCCCCCCC
Ilk____Seq     FLHTLEPLIP   RHALNSRSVM   IDEDMTARIS   MADVKFSFQC   PGRMYAPAWV
------------   ++++--- -    ---L-+++++   ++E+-+++++   ---+--           +---W-
1fmk--_Seq     YVERMNY..V   HRDLRAANIL   VGENLVCKVA   DFGLAR....   ....FPIKWT
1fmk--_SS      HHHHHCC..C   CCCCCHHHEE   EECCCEEEEC   CCCCCC....   ....CCHHHC
                              *    *                  *
                             Cat. Loop
                350                                                             400
Ilk____PSS     HHHHHHCCCC   CCCCEEEEEE   EEHHHHHHHH   H.CCCCCCCC   CHHHHHHHHH
Ilk____Seq     APEALQKKPE   DTNRRSADMW   SFAVLLWELV   T.REVPFADL   SNMEIGMKVA
------------   APEA++++-      ---++D+W   SF++LL+EL+   T -+VP+-++   +N-E+-++V
1fmk--_Seq     APEAALYGR.   ..FTIKSDVW   SFGILLTELT   TKGRVPYPGM   VNREVLDQV.
1fmk--_SS      CHHHHHHCC.   ..CCHHHHHH   HHHHHHHHHH   CCCCCCCCCC   CHHHHHHHH.
               ***
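The percent identity quoted for the Ilk alignment can be illustrated with a small helper that counts identical residues over aligned columns. This is a sketch under stated assumptions: both "." and "-" are treated as gap characters, and the denominator is the number of columns in which neither sequence has a gap. Other conventions exist and give different percentages. The function name `percent_identity` is hypothetical, and the two example strings are single 10-residue blocks copied from the alignment on this slide.

```python
GAP_CHARS = {".", "-"}  # assumption: both symbols denote gaps

def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns where neither
    aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    aligned_columns = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # skip columns containing a gap
        aligned_columns += 1
        if x == y:
            matches += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * matches / aligned_columns

# One block of the Ilk / 1fmk alignment (last block of the 250-300 row)
ilk_block = "FALDMARGMA"
src_block = "MAAQIASGMA"
print(percent_identity(ilk_block, src_block))
```

Identity over a short, well-conserved block can be far higher than the 27% reported over the whole alignment, which is one reason the alignment length always needs to be quoted alongside the percentage.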
 Non-Redundant Sets: Sequences
• NR dataset (NCBI) - All non-redundant
  GenBank CDS translations + RefSeq
  Proteins + PDB + SwissProt + PIR + PRF
• RefSeq (NCBI) – Annotated
• CD-HIT http://bioinformatics.org/cd-hit/ -
  popular algorithm for fast clustering of
  sequences
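The idea behind building a non-redundant sequence set by clustering can be sketched as a greedy incremental procedure: process sequences from longest to shortest, and let each one either join the first representative it resembles closely enough or become a new representative. This is a toy illustration of that general idea only. It is not CD-HIT's implementation, and the ungapped position-by-position identity used here stands in for a real alignment. All function names and the example sequences are hypothetical.

```python
def ungapped_identity(a, b):
    """Fraction of identical residues when the two sequences are
    compared position by position from the start (no alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n

def greedy_cluster(sequences, threshold=0.9):
    """Greedy incremental clustering. Returns a dict mapping each
    representative sequence to the list of its cluster members."""
    clusters = {}
    # Longest first, so representatives are the longest members
    for seq in sorted(sequences, key=len, reverse=True):
        for representative in clusters:
            if ungapped_identity(representative, seq) >= threshold:
                clusters[representative].append(seq)
                break
        else:
            # No representative is similar enough: start a new cluster
            clusters[seq] = [seq]
    return clusters

# Hypothetical toy sequences
toy = ["ACDEFGHIKL", "ACDEFGHIKV", "WWWWWWWWWW"]
clusters = greedy_cluster(toy, threshold=0.9)
non_redundant = list(clusters)  # one representative per cluster
print(non_redundant)
```

The resulting set depends on the identity threshold and on the processing order, so two non-redundant sets built from the same data at different cutoffs are not interchangeable.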


       Non-Redundant Sets:
      Sequences with Structure
• PDBselect - http://bioinfo.tg.fh-giessen.de/pdbselect/
• Astral http://astral.berkeley.edu/
• Pisces
  http://dunbrack.fccc.edu/Guoli/PISCES_OptionPage.php
• RCSB PDB queries
• RCSB Sequence Similarity