

Protein Analysis Workshop 2010

Hung Ta
Bioinformatics group
Institute of Biotechnology
University of Helsinki
   Why are protein-protein interactions (PPIs) so important?
   High-throughput experimental methods for discovering PPIs:
     •   Yeast two-hybrid (Y2H).
     •   Affinity purification followed by mass spectrometry (AP-MS).
   PPI databases: DIP, BioGRID, IntAct, HPRD…
   Computational prediction of PPIs:
     •   Comparative genomic methods
     •   Biological context methods
     •   Integrative methods
     •   STRING (EMBL)
Why are PPIs so important?

Genome: the gene is the basic unit of heredity, and genomes are available.
Proteome: proteins, the working molecules of a cell, carry out many biological activities.
Interactome: proteins function by interacting with other proteins, DNA, RNA and small molecules.
Search for drugs

A pathogen (virus or bacterium) enters the body and produces its own protein, say X.
X interacts with a list of the body's proteins: P1, P2, P3, … PN.
X binds to P1, inhibiting it from its routine activities, and diseases emerge.
Introduce a new molecule, say Y, into the body such that X is more attracted to Y than to P1,
freeing P1 to get back to routine work.

Bringing an effective drug to market can:
   Take 10-15 years
   Cost up to US$800 million
   Require testing up to 30,000 candidates

Databases of molecular interactions or linkages could
     help narrow down the search for drug molecules.
Types of PPIs
   Binary (physical) interactions: the binding between
    two proteins whose residues are in contact at some
    point in time.
   Functional linkages: pairwise relationships between
    proteins that work together (participate in a common
    structural complex or pathway) to carry out
    biological tasks.
Protein physical interactomes and
functional linkage maps are available for

    S. cerevisiae (Uetz et al. 2000; Ito et al. 2001; Ho et al. 2002; Gavin
     et al. 2002, 2006; Krogan et al. 2006; Tarassov et al. 2008; Yu et al.
    E. coli (Butland et al. 2005; Arifuzzaman et al. 2006)
    C. elegans (Li et al. 2004)
    D. melanogaster (Giot et al. 2003)
    Humans (Rual et al. 2005; Stelzl et al. 2005; Ewing et al. 2007)
    …
High throughput experimental methods
for discovering PPIs
   Yeast two-hybrid (Y2H)
       Ito T. et al., 2001; Uetz P. et al., 2000; Yu H. et al., 2008

       Rual et al. 2005; Stelzl et al. 2005

   Affinity purification followed by mass spectrometry (AP-MS)
       Gavin AC et al., 2002, 2006

       Ho Y. et al., 2002

       Krogan NJ et al., 2006
Y2H experiments
   Use a protein of interest as bait to discover proteins
    that physically interact with it; these interacting
    proteins are called prey.

   A transcription factor is split into two pieces, the
    DNA-Binding Domain (BD) and the Activation Domain (AD).
    The bait protein is fused to the BD and the prey
    protein to the AD.

   If bait and prey interact, BD and AD are brought back
    together, reconstituting the transcription factor, and
    transcription of the reporter gene is initiated.

   In high-throughput screens, interactions are tested
    between each bait and a library of prey.
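The high-throughput screening step can be pictured as an all-against-all test of baits against a prey library. The toy simulation below is only a sketch: `reporter_active` is a hypothetical stand-in for the wet-lab reporter readout, and `TRUE_PAIRS` is made-up ground truth used to drive it. Real screens use pooled libraries and repeated tests to control false positives.

```python
from __future__ import annotations

from itertools import product
from typing import Callable, Iterable


def y2h_screen(
    baits: Iterable[str],
    preys: Iterable[str],
    reporter_active: Callable[[str, str], bool],
) -> set[tuple[str, str]]:
    """Test every bait (fused to the BD) against every prey (fused to the AD).

    reporter_active is a hypothetical stand-in for the reporter gene readout:
    it returns True when the bait-prey pair switches the reporter on.
    """
    return {
        (bait, prey)
        for bait, prey in product(baits, preys)
        if reporter_active(bait, prey)
    }


# Hypothetical ground truth, used only to simulate the reporter readout.
TRUE_PAIRS = {frozenset(("P1", "P2")), frozenset(("P2", "P3"))}


def toy_reporter(bait: str, prey: str) -> bool:
    return frozenset((bait, prey)) in TRUE_PAIRS


hits = y2h_screen(["P1", "P2"], ["P1", "P2", "P3", "P4"], toy_reporter)
print(sorted(hits))  # [('P1', 'P2'), ('P2', 'P1'), ('P2', 'P3')]
```

The same interaction can be detected in both orientations (P1 as bait with P2 as prey, and vice versa), which is one way screens check reproducibility.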
AP-MS experiments

   Fuse a TAP tag, consisting of Protein A and a
    calmodulin-binding peptide (CBP) separated by a TEV
    protease cleavage site, to the target protein.

   In the first AP step, Protein A binds to an IgG matrix;
    washing eliminates many contaminants, and the bound
    material is released by TEV protease cleavage.

   In the second AP step, CBP binds tightly to
    calmodulin-coated beads. After washing, which
    removes the remaining contaminants and the TEV
    protease, the bound material is released under
    mild conditions with EGTA.

   Proteins are identified by mass spectrometry.
AP-MS experiments
   For each purification, MS outputs a list
    containing the bait protein and its
    co-purified partners (preys), each
    accompanied by a reliability score.

   A scoring system combining the spoke model
    (bait-prey pairs only) and the matrix model
    (all pairs of co-purified proteins) turns
    these lists into a network of binary PPIs.
    Each interaction has a confidence score.

   Low-scoring links are eliminated to obtain a
    high-confidence network.

   The network is partitioned into densely
    connected regions, which are named …
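The difference between the spoke and matrix models, and the filtering of weakly supported links, can be sketched as below. This is an illustration under simplifying assumptions: "support" here is just the number of purifications in which a pair is seen, whereas the published AP-MS scoring schemes (e.g. Gavin et al. 2006; Krogan et al. 2006) are considerably more elaborate. The purification data are hypothetical.

```python
from itertools import combinations


def spoke_edges(bait, preys):
    """Spoke model: link the bait to each of its co-purified preys only."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_edges(bait, preys):
    """Matrix model: link every pair of proteins found in the purification."""
    members = sorted({bait, *preys})
    return {frozenset(pair) for pair in combinations(members, 2)}


def support_counts(purifications, model):
    """Count how many purifications support each pair under the given model."""
    counts = {}
    for bait, preys in purifications.items():
        for edge in model(bait, preys):
            counts[edge] = counts.get(edge, 0) + 1
    return counts


def filter_edges(counts, min_support):
    """Eliminate low-scoring links to keep a higher-confidence network."""
    return {edge for edge, n in counts.items() if n >= min_support}


# Hypothetical purifications: bait -> list of co-purified preys.
purifications = {"A": ["B", "C"], "B": ["A", "C", "D"]}

spoke = filter_edges(support_counts(purifications, spoke_edges), 2)
matrix = filter_edges(support_counts(purifications, matrix_edges), 2)
print(sorted(sorted(e) for e in spoke))   # [['A', 'B']]
print(sorted(sorted(e) for e in matrix))  # [['A', 'B'], ['A', 'C'], ['B', 'C']]
```

The matrix model produces many more candidate links per purification (six for a four-protein purification versus three for the spoke model), which is why combining both and filtering on a score matters.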
Computational methods of prediction

   Comparative genomic methods
        Gene neighbourhood
        Gene fusion
        Domain-based methods
        Phylogenetic profiles

   Biological context methods
        Co-expression
        GO
        Text mining

   Integrative methods
Gene neighbourhood based method

   Proteins a and b are predicted to interact if their genes are
          located close to each other in several different genomes.

               Dandekar, T. et al. (1998). Conservation of gene order: A fingerprint of proteins that
               physically interact. Trends in Biochemical Sciences, 23(9), 324–328.
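The gene neighbourhood idea can be sketched as counting, for each pair of genes, in how many genomes they sit next to each other. This minimal version assumes genes have already been mapped to orthologous families (the letters are hypothetical labels) and treats each genome as a simple ordered list; real methods also consider strand, intergenic distance and phylogenetic relatedness of the genomes.

```python
from collections import Counter


def neighbour_pairs(genome, max_gap=1):
    """Pairs of gene families at most max_gap positions apart in one genome."""
    pairs = set()
    for i, left in enumerate(genome):
        for right in genome[i + 1 : i + 1 + max_gap]:
            if left != right:
                pairs.add(frozenset((left, right)))
    return pairs


def predict_by_neighbourhood(genomes, min_genomes=2, max_gap=1):
    """Predict a link when two families are neighbours in >= min_genomes genomes."""
    counts = Counter()
    for genome in genomes:
        counts.update(neighbour_pairs(genome, max_gap))  # each pair once per genome
    return {pair for pair, n in counts.items() if n >= min_genomes}


# Hypothetical gene orders (orthologous family labels) in three genomes.
genomes = [
    ["a", "b", "x", "y"],
    ["z", "a", "b", "w"],
    ["b", "q", "a", "x"],
]
print(sorted(sorted(p) for p in predict_by_neighbourhood(genomes)))  # [['a', 'b']]
```

Only a and b are adjacent in more than one genome, so they are the only predicted pair at the default settings.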
Gene fusion (Rosetta stone)

  Proteins a and b are predicted to interact if they are fused
               (combined) into a single protein in another organism.

                   Enright, A. J. et al. (1999). Protein interaction maps for complete genomes based on
                   gene fusion events. Nature, 402(6757), 86–90.
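A minimal sketch of the Rosetta stone rule follows. It assumes that every protein has already been assigned to gene families by a homology search (not shown), that each query protein corresponds to a single family, and that proteins in other genomes are described by the tuple of families they are composed of. All protein and family names are hypothetical.

```python
from itertools import combinations


def rosetta_stone_pairs(query_proteins, other_genomes):
    """Link query proteins whose families appear fused in another genome.

    query_proteins: protein name -> gene family label in the query genome.
    other_genomes:  list of genomes; each genome is a list of proteins, and
                    each protein is the tuple of family labels it contains.
    """
    family_to_proteins = {}
    for name, family in query_proteins.items():
        family_to_proteins.setdefault(family, set()).add(name)

    predicted = set()
    for genome in other_genomes:
        for composite in genome:
            # A fused (composite) protein contains two or more families.
            for f1, f2 in combinations(set(composite), 2):
                for p1 in family_to_proteins.get(f1, ()):
                    for p2 in family_to_proteins.get(f2, ()):
                        predicted.add(frozenset((p1, p2)))
    return predicted


# Hypothetical example: fA and fB are separate in the query but fused elsewhere.
query = {"protA": "fA", "protB": "fB", "protC": "fC"}
others = [[("fA",), ("fA", "fB"), ("fC",)], [("fC", "fD")]]
print(sorted(sorted(p) for p in rosetta_stone_pairs(query, others)))  # [['protA', 'protB']]
```

The fC-fD fusion yields no prediction because the query genome has no protein from family fD.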
   Domain-based methods

[Figure: inferring domain-domain interactions (DDIs) from well-known experimental PPI data, illustrated with Protein A and Protein B; methods compared: AS, MLE, PE]

       Validation of inferred DDIs remains difficult due to the lack of sufficient
        and unbiased benchmark datasets.
       The methods show limited performance at predicting PPIs.
        H.X. Ta, L. Holm, Biochem. Biophys. Res. Commun. (2009)

AS: association; MLE: maximum likelihood estimation; PE: parsimony explanation
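The association (AS) method can be sketched as follows: the score of a domain pair is the fraction of protein pairs containing that domain pair (one domain in each protein) that are known to interact. The second function scores a new protein pair as 1 - prod(1 - s) over its domain pairs, which assumes that domain pairs act independently; that combination rule is borrowed from MLE-style models and is an assumption here, not a description of the cited study. Domain and protein names are hypothetical.

```python
from itertools import combinations, product


def association_scores(domains, interactions):
    """AS score per domain pair:
    (# interacting protein pairs containing it) / (# protein pairs containing it).

    domains: protein -> set of domain labels.
    interactions: set of frozenset({p, q}) experimentally known PPIs.
    """
    seen, interacting = {}, {}
    for p, q in combinations(sorted(domains), 2):
        domain_pairs = {frozenset((d1, d2)) for d1, d2 in product(domains[p], domains[q])}
        for dd in domain_pairs:
            seen[dd] = seen.get(dd, 0) + 1
            if frozenset((p, q)) in interactions:
                interacting[dd] = interacting.get(dd, 0) + 1
    return {dd: interacting.get(dd, 0) / n for dd, n in seen.items()}


def predict_ppi(p_domains, q_domains, scores):
    """Probability that two proteins interact, assuming independent domain pairs."""
    prob_no_contact = 1.0
    for d1, d2 in product(p_domains, q_domains):
        prob_no_contact *= 1.0 - scores.get(frozenset((d1, d2)), 0.0)
    return 1.0 - prob_no_contact


# Hypothetical training data.
domains = {"P1": {"dA"}, "P2": {"dB"}, "P3": {"dA"}, "P4": {"dB", "dC"}}
interactions = {frozenset(("P1", "P2")), frozenset(("P3", "P4"))}

scores = association_scores(domains, interactions)
print(predict_ppi({"dA"}, {"dB", "dC"}, scores))  # 0.75
```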
Phylogeny-based methods

  Proteins a and c are predicted to interact if they have similar
             phylogenetic profiles, i.e. similar patterns of presence and
             absence of homologs across a set of genomes.

               Pellegrini, M. et al. (1999). Assigning protein functions by comparative genome
               analysis: Protein phylogenetic profiles. PNAS, 96(8), 4285–4288.
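Phylogenetic profiles can be compared as binary presence/absence vectors. The sketch below links two proteins when the Hamming distance between their profiles is at most a chosen threshold; the profiles and the threshold are hypothetical, and published methods often use more refined similarity measures.

```python
def hamming(u, v):
    """Number of genomes in which two presence/absence profiles differ."""
    if len(u) != len(v):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(u, v))


def profile_links(profiles, max_distance=0):
    """Link protein pairs whose phylogenetic profiles differ in <= max_distance genomes."""
    names = sorted(profiles)
    links = set()
    for i, a in enumerate(names):
        for b in names[i + 1 :]:
            if hamming(profiles[a], profiles[b]) <= max_distance:
                links.add(frozenset((a, b)))
    return links


# Hypothetical profiles: 1 = homolog present, 0 = absent, in genomes G1..G6.
profiles = {
    "a": (1, 0, 1, 1, 0, 1),
    "b": (1, 1, 0, 1, 0, 0),
    "c": (1, 0, 1, 1, 0, 1),
    "d": (1, 0, 1, 1, 1, 1),
}
print(sorted(sorted(p) for p in profile_links(profiles)))  # [['a', 'c']]
```

With identical profiles required, only a and c are linked, mirroring the a/c example in the text; relaxing the threshold to 1 also links d to both.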
Biological context methods

   Gene expression: two proteins whose genes exhibit very
    similar expression patterns across multiple conditions or
    experiments may be considered candidates for functional
    association and possibly direct physical interaction.
   GO (Gene Ontology) annotations: two interacting proteins
    are likely to share GO term annotations.
   Text mining: extract information about interacting proteins
    from the literature (PubMed, …): "is protein K mentioned with protein I in …"

These techniques are used to validate PPIs discovered by other
    approaches, or are combined with other evidence in integrative approaches.
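Each of the three kinds of biological context evidence can be turned into a simple pairwise score. The sketch below uses Pearson correlation for co-expression, Jaccard overlap for shared GO terms, and naive substring co-occurrence for text mining. The GO term IDs, protein names and abstracts are hypothetical placeholders, and real text-mining systems use proper name recognition rather than substring matching.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two expression profiles measured in the same conditions."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two profiles of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (sx * sy)


def go_overlap(terms_a, terms_b):
    """Jaccard similarity of two GO annotation sets (1.0 = identical)."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def co_mentions(abstracts, name_k, name_i):
    """Number of abstracts mentioning both names (naive substring matching)."""
    return sum(1 for text in abstracts if name_k in text and name_i in text)


print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 6))   # 1.0
print(go_overlap({"GO:A", "GO:B"}, {"GO:B"}))         # 0.5
abstracts = ["ProtK binds ProtI in vitro.", "ProtK is nuclear.", "ProtI and ProtK form a complex."]
print(co_mentions(abstracts, "ProtK", "ProtI"))       # 2
```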
Integrative methods

                 Naive Bayes
                 Random Forest
                 Decision Tree
                 Logistic Regression
                 Support Vector Machines

                                     Jansen R. et al., Science 2003
                                     Bader J.S. et al., Nat Biotech 2004
                                     Lin N. et al., BMC Bioinformatics 2004
                                     Zhang L. et al., BMC Bioinformatics 2004
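As an illustration of the naive Bayes idea in the integrative methods listed above, each evidence type can be converted into a likelihood ratio estimated from gold-standard positive and negative pairs, and the ratios multiplied under the assumption that the features are conditionally independent. The counts and the prior odds of 1:600 below are made-up assumptions for the example, not values from the cited studies.

```python
def likelihood_ratio(value, pos_counts, neg_counts):
    """P(value | interacting) / P(value | non-interacting) from gold-standard counts.

    Assumes the value was observed at least once among the negatives.
    """
    p_pos = pos_counts[value] / sum(pos_counts.values())
    p_neg = neg_counts[value] / sum(neg_counts.values())
    return p_pos / p_neg


def combined_lr(observations, tables):
    """Naive Bayes: multiply per-feature likelihood ratios (independence assumed)."""
    lr = 1.0
    for feature, value in observations.items():
        pos_counts, neg_counts = tables[feature]
        lr *= likelihood_ratio(value, pos_counts, neg_counts)
    return lr


def posterior_probability(lr, prior_odds):
    """Convert prior odds and a likelihood ratio into a posterior probability."""
    odds = prior_odds * lr
    return odds / (1 + odds)


# Hypothetical gold-standard counts: (positive pairs, negative pairs) per feature value.
tables = {
    "coexpressed": ({"yes": 60, "no": 40}, {"yes": 10, "no": 90}),
    "same_go": ({"yes": 70, "no": 30}, {"yes": 20, "no": 80}),
}
observation = {"coexpressed": "yes", "same_go": "yes"}

lr = combined_lr(observation, tables)  # 6.0 * 3.5, about 21
print(round(posterior_probability(lr, prior_odds=1 / 600), 4))  # 0.0338
```

Even a strong combined likelihood ratio gives a modest posterior when interacting pairs are rare, which is why the prior matters in genome-wide prediction.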
Databases of PPIs
   DIP
             71,275 interactions
             23,200 proteins
             372 organisms
   BioGRID
             247,366 non-redundant interactions
             31,254 unique proteins
             17 organisms
   IntAct
             232,793 interactions
             69,335 proteins
   MINT
             89,956 interactions
             31,631 proteins
   SGD (Saccharomyces Genome Database)
   HPRD
             39,194 interactions
             30,047 proteins
   MIPS: interactions, complexes

   STRING: Known and Predicted Protein-Protein Interactions
 Protein function
 Protein-protein relationship
 Evolution of protein-protein interaction
 The network of interacting proteins
 Unknown protein-protein interaction
 The best interaction conditions
DIP: searching for information
Find information about your protein
DIP Node (DIP:1143N)
Graph of PPIs around DIP:1143N

                     Nodes are proteins
                     Edges are PPIs
                     The center node is DIP:1143N
                     Edge width encodes the number
                      of independent experiments
                      identifying the interaction.
                     Green (red) is used to draw core
                      (unverified) interactions.
                     Click on a node (edge) to learn
                      more about the protein (interaction).
List of interacting partners of
STRING: Search Tool for the Retrieval of
Interacting Genes/Proteins
   A database of known and predicted protein interactions
   Direct (physical) and indirect (functional) associations
   The database currently covers 2,590,259 proteins from 630 organisms
   Derived from these sources:

Searching for information

  Query information via protein names or protein sequences.
Graph of PPIs
   Nodes are proteins.
   Each colored line is evidence of an
    interaction between two proteins;
    the color encodes the method
    used to detect the interaction.
   Click on a node to get
    information about the corresponding
    protein.
   Click on an edge to get
    information about the interaction
    between the two proteins.
List of predicted partners

   Partners with description and confidence score.
   Choose different types of views to see more details.
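How per-method evidence might be merged into a single confidence score can be sketched with a naive rule that assumes the evidence channels are independent: the combined score is the probability that not every channel is wrong, 1 - prod(1 - s_i). STRING's documentation describes a combination along these lines together with a correction for a prior probability, which is omitted here, so treat this as a simplification rather than STRING's exact scoring. Channel names and values are hypothetical.

```python
def combine_scores(channel_scores):
    """Naive combined score assuming independent evidence channels.

    Simplified: no prior correction is applied.
    """
    p_all_wrong = 1.0
    for score in channel_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("channel scores must lie in [0, 1]")
        p_all_wrong *= 1.0 - score
    return 1.0 - p_all_wrong


# Hypothetical per-channel scores for one protein pair.
scores = {"neighborhood": 0.5, "coexpression": 0.4, "textmining": 0.2}
print(round(combine_scores(scores.values()), 3))  # 0.76
```

Several weak, independent lines of evidence can add up to a higher combined score than any single channel.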
Neighborhood View

   The red block is the queried protein and the others are its genomic
    neighbors in the different organisms. Click on the blocks to obtain
    information about the corresponding proteins.
   Closely related organisms show similar gene neighborhood patterns.
   Helps identify genes/proteins located close together in a genomic region.
Occurrence View

   Represents the phylogenetic profiles of proteins.
   The color of a box indicates the sequence similarity between the protein
    and its homologous protein in each organism.
   The size of a box shows how many members of the family share the
    reported sequence similarity.
   Click on a box to see the sequence alignment.
Gene Fusion View

   This view shows the individual gene fusion events per species
   Two different colored boxes next to each other indicate a fusion
   Hovering above a region in a gene gives the gene name; clicking on
    a gene gives more detailed information
Why protein-protein interactions (PPIs)?
 PPIs are involved in many biological processes:
        Signal transduction
        Protein complexes or molecular machinery
        Protein carriers
        Protein modifications (phosphorylation)
        …

 PPIs help decipher the molecular mechanisms
 underlying biological functions and improve
 approaches to drug discovery.
Assessment of large-scale datasets of PPIs
    Benchmarking high-throughput interactions:
         Y2H: Uetz et al. 2000; Ito et al. 2001
         AP-MS: Gavin et al. 2006; Krogan et al. 2006
    Binary gold standard (GS): positive reference set (PRS)
     and random reference set (RRS).
    MIPS co-complex gold standard.
    Measure large-scale datasets against the Binary-GS and
     the MIPS co-complex GS.

Yu H, et al. (2008). Science 322: 104-110
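Benchmarking against a binary gold standard can be sketched as comparing detection rates: a useful dataset recovers a clearly larger fraction of the positive reference set (PRS) than of the random reference set (RRS). The pairs below are hypothetical, and this sketch ignores the sampling and statistical corrections a full benchmark would apply.

```python
def pair_set(pairs):
    """Store interactions as unordered pairs so (A, B) and (B, A) match."""
    return {frozenset(p) for p in pairs}


def detection_rate(dataset, reference):
    """Fraction of reference pairs that the dataset recovers."""
    found = sum(1 for pair in reference if pair in dataset)
    return found / len(reference)


# Hypothetical reference sets and a hypothetical high-throughput dataset.
prs = pair_set([("A", "B"), ("C", "D"), ("E", "F"), ("G", "H")])
rrs = pair_set([("A", "H"), ("B", "G"), ("C", "F"), ("D", "E")])
dataset = pair_set([("B", "A"), ("D", "C"), ("A", "H"), ("X", "Y")])

print(detection_rate(dataset, prs))  # 0.5
print(detection_rate(dataset, rrs))  # 0.25
```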
Assessment of large-scale datasets of PPIs


   AP-MS performs well at detecting co-complex associations according to
    the MIPS co-complex GS.
   Y2H performs well at detecting binary interactions according to the Binary-GS.

                                      Yu H, et al. (2008). Science 322: 104-110
