
Permissions Statement

(c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu
This presentation is copyright Mark Gerstein, Yale University, 2002. Feel free to use images in it with PROPER acknowledgement.
Do not reproduce without permission

Computational Proteomics:
Genome-scale studies of protein function, structure, and evolution

Mark B Gerstein
Yale U

H Hegyi, J Lin, J Qian, N Luscombe, T Johnson, A Drawid, R Jansen, V Alexandrov, M Snyder, A Kumar, H Zhu, D Greenbaum, N Lan, P Harrison, N Echols, S Balasubramanian, P Bertone, Z Zhang, R Das, Y Liu, Y Kluger, H Yu, P Miller, K Cheung, S Weissman

Talk at Harvard
02.02.11

Understand Proteins, through analyzing populations
   Structures (motions, packing, folds)
   Functions
   Evolution

Integration of Information
   Structures   Sequences   Microarrays

The central post-genomic term

Term           Google hits   PubMed hits   1st PubMed hit (year)
Genome         ~1,880,000    66,171        1932
Proteome       ~63,000       703           1995
Transcriptome  3,520         72            1997
Physiome       2,980         15            1997
Metabolome     349           12            1998
Phenome        4,980         6             1995
Morphome       238           2             1996
Interactome    56            2             1999
Glycome        46            1             2000
Secretome      21            1             2000
Ribonome       1             1             2000
Orfeome        42            -             -
Regulome       18            -             -
Cellome        17            -             -
Operome        8             -             -
Transportome   1             -             -
Functome       1             -             -

[Repeat of the post-genomic term table, highlighting the PubMed hits for Proteome]

Proteins are central to the 2 major post-genomic challenges
(Initial step: genome sequence & genes)
1. Understanding genes in detail: predicting protein function on a genomic scale
2. Understanding what's between genes: analyzing protein fossils

Proteins are central to the 2 major post-genomic challenges
   Predicting protein function on a genomic scale

How to predict function for 1000s of proteins?
• 250 of ~650 known on chr. 22 [Dunham et al.]
• >>30K proteins in the entire human genome (alt. splicing)

How to predict functions for 1000s of proteins?
1) "Traditional" sequence patterns
2) Via fold similarity (structural genomics)
3) Clustering a microarray experiment
4) Data integration

1) "Traditional" sequence patterns
   Compare uncharacterized genome sequences against known sequences in databases, transferring functional annotation to similar sequences.
   Issue: the similarity threshold is the major parameter and limitation.
   Also, look for motifs & sites [Sternberg, Thornton, Rose, Koonin]
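A minimal sketch of the annotation-transfer step, assuming database search hits have already been reduced to (id, % identity, annotation) tuples. The function name, tuple layout, and example hits are hypothetical; the 40% default is borrowed from the "practical threshold >40%" caveat later in this talk and is exactly the parameter flagged here as the main limitation.

```python
from typing import Optional

# Hypothetical hit format: (database_id, percent_identity, annotation)
Hit = tuple[str, float, str]


def transfer_annotation(hits: list[Hit], min_identity: float = 40.0) -> Optional[str]:
    """Return the annotation of the most similar hit at or above min_identity.

    Returns None when no hit clears the threshold, i.e. nothing is transferred.
    """
    eligible = [h for h in hits if h[1] >= min_identity]
    if not eligible:
        return None
    best = max(eligible, key=lambda h: h[1])  # highest percent identity wins
    return best[2]


# Made-up hits for one uncharacterized query sequence
hits = [("db_1", 92.0, "5.3.1.1 TP isomerase"), ("db_2", 31.0, "4.1.3.3 aldolase")]
print(transfer_annotation(hits))                     # 5.3.1.1 TP isomerase
print(transfer_annotation(hits, min_identity=95.0))  # None
```

Raising min_identity trades coverage for reliability: fewer queries receive an annotation, but those that do are less likely to be wrong.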

1000s of structurally based alignments of structurally and functionally characterized sequences

Sequence     Function
(Human)      5.3.1.1    (TP isomerase)
(Chick)      5.3.1.1    (TP isomerase)
(E coli)     5.3.1.1    (TP isomerase)
(E coli)     5.3.1.24   (PRA isomerase)
(B ster.)    5.3.1.15   (Xylose isomerase)
(E coli)     4.1.3.3    (Aldolase)
(Yeast)      4.2.1.11   (Enolase)

Sequence identity vs. functional agreement:
   90%: same, exact function
   45%: both class 5 (isomerases)
   20%: different classes
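EC numbers are four-level hierarchical codes (class.subclass.sub-subclass.serial number), so the three outcomes listed with the alignment (same exact function, both class 5, different classes) can be read off by counting how many leading levels two EC numbers share. A minimal sketch; the function names and output labels are our own, chosen to mirror the slide's wording.

```python
def shared_ec_levels(ec_a: str, ec_b: str) -> int:
    """Count how many leading EC levels (0 to 4) two EC numbers have in common."""
    shared = 0
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        if a != b:
            break
        shared += 1
    return shared


def functional_agreement(ec_a: str, ec_b: str) -> str:
    """Map the shared depth onto the slide's three categories."""
    levels = shared_ec_levels(ec_a, ec_b)
    if levels == 4:
        return "same exact function"
    if levels >= 1:
        return f"same class ({ec_a.split('.')[0]})"
    return "different classes"


print(functional_agreement("5.3.1.1", "5.3.1.1"))   # same exact function
print(functional_agreement("5.3.1.1", "5.3.1.24"))  # same class (5)
print(functional_agreement("5.3.1.15", "4.1.3.3"))  # different classes
```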

Relationship of Similarity in Sequence to that in Function
[Figure: percentage of pairs that have the same precise function, as defined by the Enzyme & FlyBase functional classifications (y-axis, % same function, 0-100), against the sequence similarity of pairs of proteins (x-axis, % identity, from 70 down to 0)]
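The curve is, in effect, a binned proportion: group protein pairs by sequence identity and, within each bin, report the percentage whose functions match. A minimal sketch, assuming each pair has already been reduced to (percent identity, same-function flag) under some classification such as EC; the bin width and example pairs are invented for illustration.

```python
from collections import defaultdict


def percent_same_function(pairs, bin_width=10):
    """pairs: iterable of (percent_identity, same_function) tuples.

    Returns {bin_start: % of pairs in that identity bin with the same function}.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for identity, same in pairs:
        start = int(identity // bin_width) * bin_width  # e.g. 47.5 -> 40
        totals[start] += 1
        matches[start] += int(same)
    return {b: 100.0 * matches[b] / totals[b] for b in sorted(totals)}


example = [(92, True), (95, True), (45, True), (42, False), (21, False), (25, False)]
print(percent_same_function(example))  # {20: 0.0, 40: 50.0, 90: 100.0}
```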

Relationship of Similarity in Sequence to that in Function
[Figure: % same function vs. sequence similarity (% identity, 70 down to 0), divided into three zones, from high to low identity:]
   • Can transfer both fold & functional annotation
   • Can transfer annotation related to fold but not function
   • Cannot transfer fold or functional annotation ("Twilight Zone")

[Same figure, additionally annotated: broad vs. narrow similarity]

Caveats: Sequence Divergence of Multidomain Proteins Implies a Practical Threshold of >40%
[Figure: single-domain vs. multidomain sequences, with example proteins from Human, Chick, E coli, B ster., Yeast and Rat]

Multi-domain proteins have greater divergence in function with sequence
[Figure: % same function (0-100%) vs. (very close) sequence similarity, -log(e-value), from 50 down to 0, for single-domain and multi-domain pairs]

2) Via fold similarity (structural genomics)
   Determine structures of ORFs with unknown function; use fold & site similarity to determine function.
   Rationale for structure prediction
   Issue: To what degree does fold determine function?
   [Kim, Edwards & Arrowsmith, Montelione, Burley, Eisenberg]

Fold-Function Combinations
• Many functions on the same fold (TIM-barrel)
• Different folds with the same function (carbonic anhydrases, 4.2.1.1)

Global View of Fold-Function Combinations
[Figure: matrix of 229 folds against 91 enzymatic functions, plus non-enzymes]

Correlation with Structural Features
[Figure: the 229-fold by 91-enzymatic-function (plus non-enzyme) matrix, with folds grouped by architectural class, e.g. all-α, all-β, small]

Correlation with Structural Features
[Figure: the same matrix, additionally grouped by enzyme class; annotated "Slight Overpopulation"]

Global View of Fold-Function Combinations
[Figure: the 229-fold by 91-enzymatic-function (plus non-enzyme) matrix, sorted]

To what degree is fold associated with function? Folds with multiple functions
[Figure: frequency in the database of 229 folds (y-axis) vs. number of functions associated with a fold (x-axis). Similar results by Thornton]
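The histogram counts, for each fold, how many distinct functions occur on it. A minimal sketch, assuming the fold-function matrix is available as a list of observed (fold, function) pairs; the fold labels below are hypothetical placeholders, not the 229 folds in the figure.

```python
from collections import Counter, defaultdict


def functions_per_fold(observations):
    """observations: iterable of (fold, function) pairs; repeats are allowed.

    Returns a Counter: number of distinct functions -> number of folds.
    """
    by_fold = defaultdict(set)
    for fold, function in observations:
        by_fold[fold].add(function)  # a set ignores repeated observations
    return Counter(len(funcs) for funcs in by_fold.values())


observations = [
    ("fold_A", "5.3.1.1"), ("fold_A", "5.3.1.24"), ("fold_A", "4.1.3.3"),
    ("fold_B", "4.2.1.1"),
    ("fold_C", "non-enzyme"), ("fold_C", "non-enzyme"),
]
hist = functions_per_fold(observations)
print(sorted(hist.items()))  # [(1, 2), (3, 1)]
```

A multifunctional fold such as the TIM-barrel example shows up in the tail of this histogram.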

3) Clustering a microarray experiment

Microarray experiments
• Expression arrays [Brown]
• Proteome chips [Snyder]

Clustering the yeast cell cycle to uncover interacting proteins [Brown, Davis]
[Figure: microarray timecourse of 1 ribosomal protein; mRNA expression level (ratio, -2 to 4) vs. time (0 to 16), with RPL19B and TFIIIC labelled]

Clustering the yeast cell cycle to uncover interacting proteins
[Figure: same timecourse axes; RPL19B vs. TFIIIC, a random relationship from ~18M]

Clustering the yeast cell cycle to uncover interacting proteins [Botstein; Church, Vidal]
[Figure: same timecourse axes; RPL19B vs. RPS6B, a close relationship from 18M (2 interacting ribosomal proteins)]
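Two pieces of arithmetic sit behind these slides. The "~18M" is consistent with scoring every unordered pair of genes: assuming roughly 6,000 yeast ORFs (our assumption; the slide gives only the pair count), n(n-1)/2 = 6000 × 5999 / 2 ≈ 1.8 × 10^7. Each pair can then be scored by a global (Pearson) correlation of the two timecourses. A minimal sketch; the profiles are invented, not the RPL19B/RPS6B measurements.

```python
import math


def n_pairs(n_genes: int) -> int:
    """Number of unordered gene pairs, n(n-1)/2."""
    return n_genes * (n_genes - 1) // 2


def pearson(x, y):
    """Pearson correlation of two equal-length expression timecourses."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


print(n_pairs(6000))  # 17997000

gene_1 = [0.0, 1.0, 3.0, 2.0, 0.0, -1.0]   # invented expression timecourse
gene_2 = [0.2, 1.1, 2.8, 2.1, 0.1, -0.9]   # tracks gene_1 closely
gene_3 = [1.0, -0.5, 0.3, 0.0, 0.8, -0.2]  # unrelated
print(round(pearson(gene_1, gene_2), 2), round(pearson(gene_1, gene_3), 2))
```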

Clustering the yeast cell cycle to uncover interacting proteins
[Figure: same timecourse axes; cluster of RPL19B, RPS6B, RPP1A, RPL15A and an unknown gene (?????)]
Predict functional interaction of unknown member of cluster

Simultaneous relationships: found by traditional global correlation
Local clustering algorithm identifies further (reasonable) types of expression relationships:
   • Time-shifted
   • Inverted
[Church]

Examples of inverted relationships
[Figure: expression ratio (-3 to 3) vs. time (0 to 16) for each pair]
Documented
   YME1: mitochondrial protease involved in complex assembly
   YNT20: known suppressor of YME1
Suggestive
   PUT2: involved in Pro degradation
   SER3: involved in Ser synthesis

Examples of time-shifted relationships
[Figure: expression ratio vs. time (0 to 16) for each group; the predicted panel plots J0544, ATP11, MRPL17, MRPL19 and YDR116C]
Suggestive
   ARP3: in actin remodelling complex
   ARC35: in same complex (required late in cell cycle)
Predicted
   J0544: unknown function
   MRPL19: mitochondrial ribosome
Expression Correlations Segment Large Replication Complex into Component Parts
[Figure: expression-correlation matrix of replication-complex subunits MCM2, MCM3, MCM6, CDC45, CDC46, CDC47, CDC54, DPB2, DPB3, CDC2, CDC7, POL2, HYS2, POL32, DBF4, ORC1 to ORC6]
Expression Correlations Segment Large Replication Complex into Component Parts
[Figure: the same correlation matrix, with clusters labelled MCMs prots., Polym. δ & ε, and ORC]
                Range of Expression
            Correlations within Complexes
Replication Cplx      Proteasome     Ribosome
Overall .05           Overall .43    Overall .80
ORC .19, MCMs .75     20S .50        Large .80
Pol. δ .45, ε .75     19S .51        Small .81
Permanent v. Transient
    Complexes
 Global Network
  of 3 Different
    Types of
 Relationships

~313K significant relationships from ~18M possible
  Global Network
   of 3 Different
     Types of
  Relationships

Simultaneous 188K
Inverted      63K
Shifted       67K

~313K significant relationships from ~18M possible
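The ~18M figure is consistent with counting unordered gene pairs, n(n-1)/2. A minimal sketch, assuming roughly 6,000 yeast genes (an assumption chosen because it reproduces ~18M; the deck's own table lists ~6,200 genes):

```python
from math import comb


def possible_pairs(n_genes: int) -> int:
    """Number of unordered gene pairs: n choose 2 = n(n-1)/2."""
    return comb(n_genes, 2)


n_genes = 6000          # assumption: approximate number of yeast genes
pairs = possible_pairs(n_genes)
significant = 313_000   # ~313K significant relationships (from the slide)

print(pairs)                                  # 17997000
print(f"{100 * significant / pairs:.1f}%")    # 1.7%
```

Roughly 1.7% of all pairs are called related, which is the "random" baseline of ~2% used when scoring coverage of known interactions.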
  Globally, how well
    do expression
    relationships
    predict known
    interactions?

                     Coverage of the 8250     Enrichment compared
                     known interactions in    to randomized expression
                     complexes found          relationships
Random (313K/18M)    ~2%                      1x
CC                   42%                      24x

CC: 313K relationships from ~18M possible, from clustering cell-cycle expt.
       Combining
     Expression Data
      Sets Increases
       Coverage &
     Decreases Noise

                     Coverage of the 8250     Enrichment compared
                     known interactions in    to randomized expression
                     complexes found          relationships
KO                   34%                      22x

KO: 278K relationships from clustering knock-out profiles [Rosetta]
     Combining
   Expression Data
    Sets Increases
     Coverage &
   Decreases Noise

                     Coverage of the 8250     Enrichment compared
                     known interactions in    to randomized expression
                     complexes found          relationships
CC                   42%                      24x
KO                   34%                      22x
KO ∨ CC              55%                      111x
KO ∧ CC              21%                      254x

CC: 313K relationships from ~18M possible, from clustering cell-cycle expt.
KO: 278K relationships from clustering knock-out profiles [Rosetta]
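The CC and KO enrichment values are consistent with dividing each coverage by the coverage expected at random, i.e. the fraction of all possible pairs that a data set calls related (313K/18M for CC, 278K/18M for KO). This reading is our reconstruction, not something the slides state; a minimal sketch:

```python
def random_coverage(n_relationships: float, n_possible: float) -> float:
    """Fraction of known interactions expected to be hit by chance."""
    return n_relationships / n_possible


def enrichment(coverage: float, n_relationships: float, n_possible: float) -> float:
    """Observed coverage divided by the coverage expected by chance."""
    return coverage / random_coverage(n_relationships, n_possible)


N_POSSIBLE = 18e6  # ~18M possible gene pairs

print(round(enrichment(0.42, 313e3, N_POSSIBLE)))  # 24  (CC: cell-cycle clustering)
print(round(enrichment(0.34, 278e3, N_POSSIBLE)))  # 22  (KO: knock-out profiles)
```

The two combined rows cannot be checked this way, because their relationship counts are not given on the slide.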
               How to predict function
               for 1000s of proteins?
1)   "Traditional" sequence patterns
2)   Via fold similarity (structural genomics)
3)   Clustering a microarray experiment
4)   Data integration

        Obviously integration of orthogonal info. is good,
        but how do we achieve it?
        And what are the issues?

        An example: subcellular localization in yeast
            Subcellular Localization,
       a standardized aspect of function
[Figure: cell schematic with compartments: Cytoplasm, Nucleus, Membrane, ER, Golgi, Mitochondria, Extracellular (secreted)]
  "Traditionally" subcellular localization is
     "predicted" by sequence patterns
[Figure: compartment schematic (Cytoplasm, Nucleus, Membrane, ER, Golgi, Mitochondria, Extracellular) annotated with sequence patterns]
  Nucleus: NLS
  Membrane: TM-helix
  ER: HDEL
  Mitochondria: Import Sig.
  Extracellular (secreted): Sig. Seq.
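Several of these patterns can be checked mechanically. A minimal sketch for three of them, with labelled assumptions: the C-terminal HDEL check follows the yeast ER-retention signal; the NLS rule (four consecutive K/R residues) is a deliberately crude, hypothetical stand-in; and the TM-helix test uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a 1.6 threshold, a common rule of thumb rather than whatever rules the original system used.

```python
import re

# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def has_hdel(seq: str) -> bool:
    """C-terminal HDEL, the ER-retention signal used in yeast."""
    return seq.upper().endswith("HDEL")


def has_crude_nls(seq: str) -> bool:
    """Hypothetical crude NLS rule: a run of four basic residues (K/R)."""
    return re.search(r"[KR]{4}", seq.upper()) is not None


def has_tm_helix(seq: str, window: int = 19, threshold: float = 1.6) -> bool:
    """True if any window has mean Kyte-Doolittle hydropathy >= threshold."""
    seq = seq.upper()
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in segment) / window
        if mean >= threshold:
            return True
    return False


def pattern_prediction(seq: str) -> list[str]:
    """List the compartments suggested by whichever patterns match."""
    hits = []
    if has_crude_nls(seq):
        hits.append("nucleus")
    if has_tm_helix(seq):
        hits.append("membrane")
    if has_hdel(seq):
        hits.append("ER")
    return hits


print(pattern_prediction("MPKKKRKV"))  # ['nucleus']
```

Each rule fires independently, so a protein can match several patterns or none; deciding between conflicting hits is exactly the integration problem the following slides address.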
   Subcellular localization is associated with
         the level of gene expression
[Figure: compartment schematic (Cytoplasm, Nucleus, Membrane, ER, Golgi, Mitochondria, Extracellular (secreted)) with expression level in copies/cell]
      Combine Expression Information &
   Sequence Patterns to Predict Localization
[Figure: compartment schematic with expression level in copies/cell, annotated with sequence patterns: NLS (Nucleus), TM-helix (Membrane), HDEL (ER), Import Sig. (Mitochondria), Sig. Seq. (Extracellular, secreted)]
     Issues in Combining Many Features

 Total of 30 diverse features (also including essentiality,
 coiled-coils, expression fluc., & obscure seq. patterns)
[Figure: compartment schematic annotated with NLS, TM-helix, HDEL, Import Sig., Sig. Seq.]

 How to standardize features?
 How to weight them?
Bayesian System for Localizing Proteins

Everything expressed in standardized probabilistic terms
(features as frequencies in the training set)

Sequentially apply features to refine the assumed prior estimate
using Bayes' rule (Feature x Prior / Normalization)

A final estimate that naturally weights the features comes out

[Figure: Prior → Feature 1: NLS (# NLS) → New Estimate → Feature 2: High Expr. → Better Estimate → Feature 3: Is Essential? → Final Estimate]
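The sequential update can be sketched in a few lines. Each step applies Bayes' rule, P(compartment | feature) = P(feature | compartment) x P(compartment) / normalization, and the result becomes the prior for the next feature. Chaining updates this way assumes the features are conditionally independent given the compartment (naive Bayes). The compartments and all probabilities below are hypothetical, chosen only to illustrate the mechanics, and are not taken from the original training set.

```python
def bayes_update(prior, likelihood):
    """Return P(compartment | feature) from a prior and P(feature | compartment).

    Implements Bayes' rule: likelihood x prior / normalization.
    """
    unnormalized = {c: likelihood[c] * p for c, p in prior.items()}
    total = sum(unnormalized.values())
    return {c: value / total for c, value in unnormalized.items()}


def localize(prior, observed_feature_likelihoods):
    """Refine the estimate one feature at a time (naive-Bayes assumption)."""
    estimate = dict(prior)
    for likelihood in observed_feature_likelihoods:
        estimate = bayes_update(estimate, likelihood)
    return estimate


# Hypothetical numbers for illustration only.
prior = {"nucleus": 0.3, "cytoplasm": 0.5, "mitochondria": 0.2}
has_nls = {"nucleus": 0.6, "cytoplasm": 0.1, "mitochondria": 0.05}
high_expression = {"nucleus": 0.2, "cytoplasm": 0.5, "mitochondria": 0.3}

final = localize(prior, [has_nls, high_expression])
print({c: round(p, 3) for c, p in final.items()})
```

Because every feature enters as a likelihood ratio between compartments, informative features shift the estimate strongly and uninformative ones (similar likelihood in every compartment) barely move it, which is the sense in which the weighting comes out naturally.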
 Results on Testing Data
• 7-fold cross-validation training & test sets
• Overall compartment population: 96% accuracy
[Figure: results by compartment: ER, Cyt., TM, Mito., Nuc.]
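For readers unfamiliar with the protocol, a minimal sketch of 7-fold cross-validation: shuffle the labelled proteins, split them into seven folds, and train on six folds while testing on the seventh, rotating through all seven. The classifier here is a hypothetical majority-label stand-in, and the sketch scores per-protein accuracy, which is not necessarily the same measure as the slide's overall compartment-population figure.

```python
import random
from collections import Counter


def k_fold_splits(items, k=7, seed=0):
    """Yield (train, test) lists; each item lands in exactly one test fold."""
    items = list(items)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def majority_baseline(train, labels):
    """Hypothetical stand-in classifier: predict the commonest training label."""
    most_common = Counter(labels[x] for x in train).most_common(1)[0][0]
    return lambda _item: most_common


def cv_accuracy(items, labels, fit, k=7):
    """Per-item accuracy pooled over all k test folds."""
    correct = total = 0
    for train, test in k_fold_splits(items, k):
        predict = fit(train, labels)
        correct += sum(predict(x) == labels[x] for x in test)
        total += len(test)
    return correct / total
```

Pooling predictions over all folds means every labelled protein is scored exactly once, by a model that never saw it during training.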
 Extrapolation to Compartment Populations of Whole Yeast Genome
[Figure: pie charts of compartment populations]
 Cyt. 1570, Nuc. 975, Mito. 452, Sec. Path 346, Not localized 2789

 ~3300 known localizations from existing DB + expt. localizations by
 transposon tagging & direct overexpression [Snyder]
 + predictions (from Bayesian system):
 Cyt. 47%, Nuc. 27%, Mito. 13%, Sec. Path 13%
                How to predict function
                for 1000s of proteins?
1) "Traditional" sequence patterns
  o Limitation: tight threshold
  o Developed 40% threshold for sequence comparison
2) Via fold similarity (structural genomics)
  o Limitation: multifunctionality on a fold, so weak relationship
  o Measured extent of this in current DB
3) Clustering a microarray experiment
  o Limitation: suggestive relationships but not yet predictive
  o Local clustering found ~130K relationships beyond ~180K
    simultaneous ones
4) Data integration
  o Limitation: power is obvious but complicated to achieve
  o Increased power. Works for localization of all proteins in yeast.
          Proteins are central to
   the 2 major post-genomic challenges
(Initial step: genome sequence & genes)

1. Understanding genes in detail:
   predicting protein function on a genomic scale

2. Understanding what’s between genes:
   analyzing protein fossils
[Figure: genome schematic with genes and pseudogenes (ψ)]
          Proteins are central to
   the 2 major post-genomic challenges
[Figure: genome schematic with pseudogenes (ψ)]
Analyzing protein fossils
Pseudogenes (ψG) as Disabled Homologies
 S/T Protein Phosphatase PP1 (C-term)
 …SRILCMHGGLSPHLQTLDQLRQLPRPQDPPNPSIGIDLLWADPDQWVKGWQAN
 TRGVSYVFGQDVVADVCSRLDIDLVARAHQVVQDGYEFFASKKMVTIFSAPHYC
 GQFDNSAATMKVDENMVCTFVMYKPTPKSMRRG*

 Worm genome (chromosomes I, II, III, IV, V, X):
 most multiply disabled pseudogenic fragment
                     TKRTSNGFGQDVVVDLFSILDSGLVARAHX
                     VLQDIFEFFASKKMVTIFS#APHSPHSAPH
                     YCAQFDNSAATVKV
    Types of ψG: Duplicated & Processed
[Figure: Original Gene; Duplicated Gene; Duplicated ψG; Spliced mRNA (AAAAAA>); Processed ψG (AAAAAA>); Processed ψG with disablements] [Heidmann]
   Large-scale Assignment of Pseudogenes
       Size  Genes   Total        Pseudogenes
       (Mb)  (apx)   ψG           Duplicated  Processed
Yeast    12   6200   221 (4%)     221         0
Worm    100  19000   2168 (11%)   1960        208
Fly     116  14000   <100 (1%)
Human*   69    930   350 (38%)    178         172
(*Chr 21+22 only)
                        ψG Calculations
• Large basic calculation
   "Blasting" ~1M chromosome fragments against a protein DB
   50 CPU days for worm
   Parallel computing + large DBs
• Integrating heterogeneous, dynamically changing annotation
   Changing sequences, gene predictions, repeats
[Figure: ψG assignments integrated with Sequences 1 to 3, Genes A and B, Repeats 1 and 2]
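The slide gives only the scale of the search (~1M fragments, 50 CPU days for worm). A minimal sketch of the preparatory step, cutting a chromosome into overlapping fragments and dealing them out to parallel jobs, might look like this. The fragment size, overlap, and job count are hypothetical choices, not the values used in the original pipeline, and the similarity search itself (e.g. BLAST) is not shown.

```python
def fragment_chromosome(sequence, size=10_000, overlap=1_000):
    """Yield (start, fragment) windows that overlap by `overlap` bases.

    `size` and `overlap` are hypothetical defaults for illustration.
    """
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    start = 0
    while True:
        yield start, sequence[start:start + size]
        if start + size >= len(sequence):
            break
        start += step


def assign_to_jobs(fragments, n_jobs):
    """Round-robin fragments into n_jobs batches for parallel searching."""
    return [fragments[i::n_jobs] for i in range(n_jobs)]


fragments = list(fragment_chromosome("ACGT" * 10, size=16, overlap=4))
print([start for start, _ in fragments])  # [0, 12, 24]
```

The overlap means a homology that straddles a fragment boundary still falls wholly inside at least one fragment, as long as it is no longer than the overlap.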
                     ψG: Questions
       Size  Genes   Total        Pseudogenes
       (Mb)  (apx)   ψG           Duplicated  Processed
Yeast    12   6200   221 (4%)     221         0
Worm    100  19000   2168 (11%)   1960        208
Fly     116  14000   <100 (1%)
Human*   69    930   350 (38%)    178         172

1)   What are they composed of? (composition)
2)   Where are they? (which organisms, chr. position)
3)   Which type of proteins are they? Why? (functions)
4)   Do processed & duplicated varieties differ?
           Worm ψG: Lots! Good Stats
       Size  Genes   Total        Pseudogenes
       (Mb)  (apx)   ψG           Duplicated  Processed
Yeast    12   6200   221 (4%)     221         0
Worm    100  19000   2168 (11%)   1960        208
Fly     116  14000   <100 (1%)
Human*   69    930   350 (38%)    178         172

1)   What are they composed of? (composition)
2)   Where are they? (which organisms, chr. position)
3)   Which type of proteins are they? Why? (functions)
4)   Do processed & duplicated varieties differ?
            Amino-acid composition of Pseudogenes is
                     midway between Genes
                 and translated Intergenic DNA
[Figure: Worm; frequency v. amino acid (sorted)]
                   Amino-acid composition
        of Pseudogenes is midway between Genes and
         translated Intergenic DNA in many genomes
[Figure: frequency v. amino acid (sorted) for Worm, Human, Yeast, Fly]
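The slides compare sorted amino-acid frequency curves. One way to put a single number on "midway" (our own hypothetical summary, not the method behind the figures) is to project the pseudogene composition vector onto the line running from the gene composition to the translated-intergenic composition: 0 means gene-like, 1 means intergenic-like, and about 0.5 means midway. The toy sequences below are made up for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences):
    """Amino-acid frequency vector (in AMINO_ACIDS order) over all sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[aa] / total for aa in AMINO_ACIDS]


def midway_position(gene, pseudo, intergenic):
    """Project `pseudo` onto the gene -> intergenic axis.

    Returns 0 for a gene-like composition, 1 for an intergenic-like one.
    """
    axis = [i - g for g, i in zip(gene, intergenic)]
    numerator = sum((p - g) * a for g, p, a in zip(gene, pseudo, axis))
    denominator = sum(a * a for a in axis)
    return numerator / denominator


# Toy sequences, for illustration only.
genes = composition(["AAAA"])
intergenic = composition(["CCCC"])
pseudogenes = composition(["AACC"])
print(midway_position(genes, pseudogenes, intergenic))  # 0.5
```

The same function applies unchanged to codon frequencies if AMINO_ACIDS is replaced by the list of 64 codons and sequences are split into triplets.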
Midway composition also applies to codons
Pseudogene distribution on worm chromosomes: On Ends
[Figure: distribution along worm chr I; genes elevated in middle, pseudogenes elevated at ends]
Pseudogene distribution on worm chromosomes: On Ends
~50% ψG in terminal 3Mb vs. ~30% G
[Figure: ψG/G ratio along the chromosome, ranging from 16% (min) to 29% (max)]
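The "terminal 3Mb" comparison reduces to counting features near either chromosome end. A minimal sketch, assuming features are given as single base-pair positions on one chromosome; the positions and the 15 Mb chromosome length in the usage line are hypothetical.

```python
def fraction_near_ends(positions, chrom_length, edge=3_000_000):
    """Fraction of features lying within `edge` bp of either chromosome end."""
    if not positions:
        return 0.0
    near_ends = sum(
        1 for p in positions if p < edge or p >= chrom_length - edge
    )
    return near_ends / len(positions)


# Hypothetical positions on a hypothetical 15 Mb chromosome.
print(fraction_near_ends([1_000_000, 7_000_000, 14_000_000, 5_000_000], 15_000_000))  # 0.5
```

Computing this once for ψG positions and once for gene positions gives the two percentages being compared.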
                    Default: #ψG ∝ #genes, in a family
[Figure: # genes in family v. # pseudogenes in family; outlier RT family with 28 genes and 59 pseudogenes]
                    Completely Dead Families in Worm Genome

# worm ψG   Organism of       Function
in family   closest match
   3        E. coli           acrAB operon repressor                 Extinction?
   7        Yeast             unknown function
   3        Vaccinia virus    unknown function
   3        Vaccinia virus    Host range protein
   5        Human             Complementing protein
   4        Human             SEX gene
   3        Human             Initiation factor 4A
   4        Frog              Thyroid hormone receptor
   4        Fly               Multidrug resistance protein 1
   5        Cow               Polyadenylation specificity factor
                    Completely Dead Families in Worm Genome

# worm ψG   Organism of       Function
in family   closest match
   3        E. coli           acrAB operon repressor                 Extinction?
   7        Yeast             unknown function
   3        Vaccinia virus    unknown function
   3        Vaccinia virus    Host range protein
                                                                     Horz. Transfer?
   5        Human             Complementing protein
   4        Human             SEX gene
   3        Human             Initiation factor 4A
   4        Frog              Thyroid hormone receptor
   4        Fly               Multidrug resistance protein 1
   5        Cow               Polyadenylation specificity factor
                    Completely Dead Families in Worm Genome

# worm ψG   Organism of       Function
in family   closest match
   3        E. coli           acrAB operon repressor                 Extinction?
   7        Yeast             unknown function
   3        Vaccinia virus    unknown function
   3        Vaccinia virus    Host range protein
                                                                     Horz. Transfer?
   5        Human             Complementing protein
   4        Human             SEX gene
   3        Human             Initiation factor 4A                   Contamination?
   4        Frog              Thyroid hormone receptor
   4        Fly               Multidrug resistance protein 1
   5        Cow               Polyadenylation specificity factor
           Worm ψG families:
chemoreceptors & transposon functions

                             ψG    Associated Gene Family in Worm vs. Fly
Families                           Worm only?    Relative expansion in worm
RT                           59                  6x
7TM chemoreceptor fam. #1    51    *
fam. unk. func. #1           31    *
7TM chemoreceptor fam. #2    27    *
7TM chemoreceptor fam. #3    22    *
Major sperm protein          21                  28x
fam. unk. func. #3           20    *
fam. unk. func. #4           19    *
TcA transposase              19                  23x
7TM receptor fam. #4         17                  1.4x
Environmental Response Family
           Worm ψG families:
Unique or highly expanded relative to fly

                             ψG    Associated Gene Family in Worm vs. Fly
Families                           Worm only?    Relative expansion in worm
RT                           59                  6x
7TM chemoreceptor fam. #1    51    *
fam. unk. func. #1           31    *
7TM chemoreceptor fam. #2    27    *
7TM chemoreceptor fam. #3    22    *
Major sperm protein          21                  28x
fam. unk. func. #3           20    *
fam. unk. func. #4           19    *
TcA transposase              19                  23x
7TM receptor fam. #4         17                  1.4x
Environmental Response Family
                  Common worm Pseudofolds
[Table: fold ranks among ψG and among genes (columns: ψG rank, Genes rank)] [Scop, Murzin]
               Fly ψG: Broken Fossils
       Size  Genes   Total        Pseudogenes
       (Mb)  (apx)   ψG           Duplicated  Processed
Yeast    12   6200   221 (4%)     221         0
Worm    100  19000   2168 (11%)   1960        208
Fly     116  14000   <100 (1%)
Human*   69    930   350 (38%)    178         172

1)   What are they composed of? (composition)
2)   Where are they? (which organisms, chr. position)
3)   Which type of proteins are they? Why? (functions)
4)   Do processed & duplicated varieties differ?
 Fly fossils are "broken up" by deletions

Explanation for fly having only 5% as many ψG as worm:
a high rate of small genomic deletions (~10 - 50 bp) [Petrov, Hartl]
[Figure: pseudomotifs]
          Same density of pseudomotifs
                in fly and worm
   Analyze occurrence of "pseudomotifs" (ancient, broken-
      up fossils) in intergenic regions relative to statistical
      expectation




                                                                                          79 (c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu
   1329 Prosite Motifs
      (e.g ZnF C C f H H & Tubulin)
                   -x(2,4)-   -x(3)-   -x(8)-   -x(3,5)-



   TM-helices
                                                         Worm    Fly
Size (Mb)   Whole genome                                   97    116
            Pure intergenic region (no repeats, yG, &c.)   42     61
            Pseudomotifs                                   1.3    2.1
Number of over-represented Prosite motifs                  40     70
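
The "same density" claim can be checked from the table. This sketch assumes the Pseudomotifs row gives the Mb of pure intergenic sequence covered by pseudomotif matches; the variable names are illustrative.

```python
# Sizes in Mb from the worm/fly table.
# Assumption: "Pseudomotifs" = Mb of pure intergenic sequence covered
# by pseudomotif matches.
INTERGENIC_MB = {"worm": 42.0, "fly": 61.0}
PSEUDOMOTIF_MB = {"worm": 1.3, "fly": 2.1}


def pseudomotif_density(organism: str) -> float:
    """Fraction of pure intergenic sequence covered by pseudomotifs."""
    return PSEUDOMOTIF_MB[organism] / INTERGENIC_MB[organism]


if __name__ == "__main__":
    for organism in ("worm", "fly"):
        pct = 100 * pseudomotif_density(organism)
        print(f"{organism}: {pct:.1f}% of pure intergenic DNA")
```

Under this reading, both genomes come out at roughly 3% (about 3.1% for worm and 3.4% for fly).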

Human yG: Duplicated v. Processed
                  Size   Genes      Pseudogenes (yG)
                  (Mb)   (approx.)  Total        Duplicated   Processed
  Yeast            12      6200      221 (4%)       221            0
  Worm            100     19000     2168 (11%)     1960          208
  Fly             116     14000     <100 (1%)
  Human*           69       930      350 (38%)      178          172
(*Chr 21+22 only)

   1)   What are they composed of? (composition)
   2)   Where are they? (which organisms, chr. position)
   3)   Which type of proteins are they? Why? (functions)
   4)   Do processed & duplicated varieties differ?

 Expands on the genome papers [Venter et al., Dunham et al., Lander et al.]

Top Functional Families in Genes and yG on Human Chr. 21 and 22

Genes                                 Pseudogenes
  Ig*                          69       Ig*                          70
  Other DNA binding            32       Ribosomal protein            43
  Nucleotide binding           28       Transcription factor         12
  Transcription factor         28       Other DNA binding             8
  Other nucleic-acid binding   16       Receptor                      8
  Kinase                       14       Kinase                        5

Duplicated pseudogenes                Processed pseudogenes
  Ig*                          70       Ribosomal protein            42
  Transcription factor          6       Transcription factor          7
  Nucleotide binding            5       Other DNA binding             7
  Kinase                        4       Receptor                      4
  Transferase                   4       Other nucleic-acid binding    4
  Receptor                      4

Environmental Response Family

Distribution of processed yG roughly matches expectation of random insertions

Constant density between human & worm
                                  Human
                                (chr 21+22)    Worm
Size (Mb)       total                69          97
                non-coding           67          70
Processed yG    total               178         208
                per Mb              2.6         3.0

Occurrence related to expression: ribosomal proteins
Among ribosomal proteins:
     Proportional selection between subunits
     Uniform insertion across chr 21 & 22
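
One way to make "matches expectation of random insertions" concrete is to split insertions among regions in proportion to their size and compare observed with expected counts. This is a minimal sketch: the chi-square goodness-of-fit test is a standard choice, not necessarily the test used here, and the density is assumed to be taken over non-coding Mb (which reproduces the worm figure of 3.0 per Mb).

```python
def per_mb(count: int, size_mb: float) -> float:
    """Insertions per Mb of sequence."""
    return count / size_mb


def expected_counts(total: int, sizes_mb: list[float]) -> list[float]:
    """Expected insertions per region if insertion is uniform along the sequence."""
    genome_mb = sum(sizes_mb)
    return [total * size / genome_mb for size in sizes_mb]


def chi_square(observed: list[int], expected: list[float]) -> float:
    """Pearson chi-square goodness-of-fit statistic (df = regions - 1)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    # Worm: 208 processed yG over 70 Mb of non-coding sequence.
    print(round(per_mb(208, 70), 1))
```

With per-chromosome counts for chr 21 and 22 (not given on the slide), the statistic would be compared against a chi-square distribution with 1 degree of freedom.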

     Yeast yG: Simple Story & Mechanism
                Size   Genes      Pseudogenes (yG)
                (Mb)   (approx.)  Total        Duplicated   Processed
Yeast            12      6200      221 (4%)       221            0
Worm            100     19000     2168 (11%)     1960          208
Fly             116     14000     <100 (1%)
Human*           69       930      350 (38%)      178          172
(*Chr 21+22 only)

1)   What are they composed of? (composition)
2)   Where are they? (which organisms, chr. position)
3)   Which type of proteins are they? Why? (functions)
4)   Do processed & duplicated varieties differ?


Yeast yG concentrated near telomeres

[Figure: chromosomal positions of pseudogenes compared with genes]
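
Concentration near telomeres can be quantified by each feature's distance to the nearer chromosome end. This sketch is illustrative: the function names and the window size are hypothetical choices, and comparing the fraction for yG with the fraction for genes is one simple way to express the enrichment.

```python
def telomere_distance(position: int, chrom_length: int) -> int:
    """Distance in bp from a position to the nearer end of its chromosome."""
    if not 0 <= position <= chrom_length:
        raise ValueError("position lies outside the chromosome")
    return min(position, chrom_length - position)


def fraction_near_telomere(positions: list[int], chrom_length: int,
                           window: int) -> float:
    """Fraction of features lying within `window` bp of a chromosome end."""
    if not positions:
        return 0.0
    near = sum(1 for p in positions
               if telomere_distance(p, chrom_length) <= window)
    return near / len(positions)
```

Running the same function over pseudogene positions and over gene positions on each chromosome gives two fractions that can be compared directly.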

Environmental response functions of yeast yG

Family                                                     yG
Growth inhibitor (GIN11)                                   16
Flocculins                                                 11
DUP (hypothetical TM proteins associated with
     pheromone response)                                    6
DEAD box helicase                                           6
Stress response (SRP/TIP, cell-wall mannoproteins)          3

Environmental Response Family

These 5 most common families in yG are not the same as the most common
families in genes.

Yeast yG come from yeast-specific families

Fraction having a non-yeast homolog
   yG       40%
   genes    80%

Resurrecting pseudogenes: is it possible?
Hypothetical example of a flocculin

FLO8 is disabled in the lab strain (S288C); functioning FLO8 causes
filamentous growth in most strains [Fink]

The idea of "untranslatable intermediates" in protein evolution has been
around for a while [Nei, '70; Koch, '72] [Walsh]

A speculative mechanism for resurrecting yeast yG, via [PSI+],
perhaps in environmental response

[PSI+] [Lindquist]
• Prion of Sup35p, a translation-termination protein
• Causes read-through of stop codons
• Causes phenotypic diversity through the expression of new or altered
  proteins

We find suggestive evidence that [PSI+] resurrects yG:
• 35 yG are easily resurrectable, with only 1 stop codon
• Microarrays show that some of these are expressed [M Snyder]
• Many are involved in environmental response

Perhaps testable with selection experiments
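
A yG that is "easily resurrectable with only 1 stop" could be flagged by counting premature in-frame stop codons. This sketch assumes the criterion is an intact reading frame (length divisible by 3, no frameshift) with exactly one premature stop, which stop-codon read-through under [PSI+] could bypass; a real analysis would align each yG to its parent gene first. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cds: str) -> list[int]:
    """Codon indices of in-frame stop codons, ignoring the final codon."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    return [
        i for i in range(n_codons - 1)  # last codon is the normal terminator
        if cds[3 * i:3 * i + 3] in STOP_CODONS
    ]


def one_stop_candidate(cds: str) -> bool:
    """True if the frame is intact and there is exactly one premature stop."""
    return len(cds) % 3 == 0 and len(premature_stops(cds)) == 1
```

For example, `premature_stops("ATGAAATAGCCCTAA")` reports the TAG at codon index 2, so that sequence qualifies as a single-stop candidate.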

Evolutionary Implications of a Reservoir of Resurrectable yG for Creating New Folds

Not all folds are shared between phylogenetic groups: evolution of new folds

[Figure: Venn diagram of folds shared among Eubacteria, Eukaryotes, Animals, and Plants]

Evolutionary Implications of a Reservoir of Resurrectable yG for Creating New Folds

Paradox: going between folds A & Z with all intermediates functional
Pseudogenes are free of the constraints of being transcribed & translated
[Koch '72]

                       yG: Summary
1) What are they composed of?
    Intermediate composition between genes & translated intergenic DNA.
2) Where are they?
    Near chromosome ends (telomeres), as shown for yeast.
    In all organisms, though reduced to pseudomotifs in the fly.
3) Which type of proteins are they? Why?
    Environmental response proteins.
    yG may be extra parts that can be resurrected.
    Potential mechanism suggested for yeast involving [PSI+].
4) Do processed & duplicated varieties differ?
    Yes. Duplicated yG are as described above. Processed yG appear to be
    randomly inserted from the mRNA pool; hence their occurrence shows a
    clear relationship to mRNA level & intergenic-region size.

Practical Backdrop to Integrated Gene Annotation & Interpretation of DNA Arrays

[Figure: panels I-VIII]

137 potential new yeast genes [Snyder]
 • Integrated approach: homology search + transposons + microarrays
 • Small ORFs & anti-sense to existing ORFs

Human DNA Arrays (ongoing)
 • All of chr22 on a chip in ~1 kb chunks; probe for expression and TF binding
 • Need to have a mapped landscape (genes, yGs, repeats, SNPs, &c.) to design
   the chip & interpret results
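
Designing a chr22 tiling chip around a mapped landscape amounts to cutting the chromosome into ~1 kb chunks and recording which annotated features each chunk overlaps. This is a minimal sketch with hypothetical function names and feature labels; the naive overlap scan is fine for illustration but would need an interval index at chromosome scale.

```python
def tile(chrom_length: int, chunk: int = 1000) -> list[tuple[int, int]]:
    """Half-open (start, end) intervals covering the chromosome in `chunk`-bp pieces."""
    return [(s, min(s + chunk, chrom_length))
            for s in range(0, chrom_length, chunk)]


def annotate_tiles(tiles, features):
    """Attach the labels of overlapping features to each tile.

    `features` is a list of half-open (start, end, label) intervals,
    e.g. genes, yG, repeats. Quadratic scan: illustration only.
    """
    annotated = []
    for start, end in tiles:
        labels = sorted({label for f_start, f_end, label in features
                         if f_start < end and f_end > start})
        annotated.append((start, end, labels))
    return annotated
```

Tiles whose label list is empty correspond to unannotated sequence, where expression or TF-binding signals would point to new features.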

GeneCensus.org

[Figure: components of the site: Alignment Database, Alignment Server, Ranks, Trees, Detailed Tables, ORF Query, PDB Query]

PartsList.org

Acknowledgements

Predicting Protein Function on a Genomic Scale
  Jiang Qian, Ronald Jansen, Amar Drawid, Cyrus Wilson, Hedi Hegyi,
  Dov Greenbaum, Haiyuan Yu, Jimmy Lin, Ning Lan, Yuval Kluger
Analysis of Pseudogenes
  Paul Harrison, Zhaolei Zhang, Nathaniel Echols, Suganthi Balasubramanian,
  Nicholas Luscombe, Paul Bertone, Ted Johnson, Patrick McGarvey
Other Projects
  Yang Liu, Jochen Junker, Vadim Alexandrov, Rajdeep Das, Werner Krebs,
  Brad Stenger
Collaborators
  M Snyder (A Kumar, H Zhu, M Bilgin, C Horack …)
  S Weissman (Z Lian, S Yamaga…)
  P Miller (K Cheung), M Schultz
  G Montelione et al.

PartsList.org, GeneCensus.org, nesg.org

Assessing Function Globally

Known associated pairs have the same "cellular role" (according to MIPS, GO, &c.)

Results on Function Prediction

[Figure: odds ratio vs. P-value cutoff for simultaneous and time-shifted expression correlations; C. Functions: odds ratio R vs. score S (P-value cutoff 2.7e-3)]
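
The plots report an odds ratio R but the slides do not define the underlying table. This sketch assumes the standard 2x2 odds ratio, where a gene pair counts as "predicted" when its score reaches a cutoff and is then cross-tabulated against whether the pair shares a function (e.g. the same MIPS or GO category); the function names are hypothetical.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio of a 2x2 table.

    a: predicted & same function      b: predicted & different function
    c: not predicted & same function  d: not predicted & different function
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined when b or c is zero")
    return (a * d) / (b * c)


def odds_ratio_at_cutoff(pairs, cutoff: float) -> float:
    """`pairs` is an iterable of (score, same_function) for gene pairs."""
    a = b = c = d = 0
    for score, same in pairs:
        if score >= cutoff:
            if same:
                a += 1
            else:
                b += 1
        elif same:
            c += 1
        else:
            d += 1
    return odds_ratio(a, b, c, d)
```

Sweeping `cutoff` over the score range traces out a curve of R against S. R > 1 means that high-scoring pairs share a function more often than expected.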

Correlation:
   Always significant
   Sometimes significant (depends on expt.)
   Never significant

Based on distributions, correlation of established functional categories,
computer clusterings

Results on Interaction Prediction

[Figure: B. Interactions: odds ratio R vs. P-value cutoff for simultaneous, time-shifted, inverted, and unmatched expression correlations; C. Functions: odds ratio R vs. score S (P-value cutoff 2.7e-3)]

Protein-Protein Interactions & Expression

Cell Cycle CDC28 expt. (Davis)

[Figure: distributions of correlation between selected expression timecourses for sets of interactions: all pairs (control), pairwise interactions (from MIPS), (Uetz et al.), and strong interactions in permanent complexes (clearly different)]
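
Correlation between two expression timecourses can be computed as a Pearson coefficient. This sketch assumes that "simultaneous" means correlating the series at the same time points and "time-shifted" means correlating one series against the other offset by a fixed lag; the exact definitions used for the plots are not given on the slides, and the function names are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def shifted_correlation(x: list[float], y: list[float], lag: int) -> float:
    """Correlate x[t] with y[t + lag]; lag = 0 is the simultaneous case."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])
```

Computing this for every interacting pair, and for all pairs as a control, gives the distributions compared in the figure.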