					Protein-Protein Interaction
Network

        Gautam Chaurasia
            08.07.04
    Overview
   Introduction.

   Three Different Models:

       Structure of the protein-protein interaction network.
            Non-power law.

       Evolution of the network.
            Power Law Random Graphs.

       Detection of functional modules from protein interaction
        networks.
            Clustering algorithm



                      Seminar on Protein-Protein Interaction Network   2
    Introduction
   The network is viewed as a graph whose nodes correspond to
    proteins. Two proteins are connected by an edge if they interact.

   The collection of all interactions between the proteins of an
    organism is called the interactome.

   The Y2H (yeast two-hybrid) system is used to produce a
    comprehensive map of the protein-protein interaction network.

   The network resembles a random graph in that it consists of many
    small subnets (groups of proteins that interact with each other but
    do not interact with any other protein) and one large connected
    subnet comprising more than half of all interacting proteins.



    Structure of PPI Network
   Yeast protein
    interaction network.
    (Uetz et al. 2000)

   A: A two-dimensional
    drawing of the entire
    network.

   B: The giant (hub)
    component of this graph
    consists of 466 proteins.

   C: A small section of the
    hub component, with
    gene or open reading
    frame names shown next
    to each node.

Structure of PPI Network

   Degree:
        Described by the connectivity k of the node, which tells us how
         many links the node has to other nodes.


   Degree distribution:
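        The degree distribution P(k) gives the probability that a selected
         node has exactly k links.

        P(k) is obtained by counting the number of nodes N(k) with exactly
         k links and dividing by the total number of nodes N:

         $$P(k) = \frac{N(k)}{N}$$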
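As a minimal sketch of how a degree distribution can be tabulated from an interaction list: the function name `degree_distribution` and the toy protein labels are hypothetical, and the edge list is illustrative rather than taken from any dataset in this seminar.

```python
from collections import Counter

def degree_distribution(edges, nodes=None):
    """Return {k: fraction of nodes with exactly k links}."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Optionally include isolated nodes that appear in no edge.
    all_nodes = set(degree)
    if nodes is not None:
        all_nodes |= set(nodes)
    counts = Counter(degree[node] for node in all_nodes)
    return {k: c / len(all_nodes) for k, c in sorted(counts.items())}

# Toy star-shaped network: protein A interacts with B, C and D.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D")]
toy_distribution = degree_distribution(toy_edges)
```

Passing the full protein list as `nodes` makes proteins without any interaction count toward P(0).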




    ER Random Graph
   ER Random Graphs:
      An ER random graph consists of n nodes and k edges, where any
       pair of nodes is equally likely to be connected by one of the k
       edges.
      In a closely related construction, start with N nodes and connect
       each pair of nodes independently with probability p, which creates
       a graph with approximately pN(N-1)/2 randomly placed links.
   The node degrees follow a Poisson distribution.
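The construction above can be sketched in a few lines. This is an illustrative sketch only: the function names, the seed and the parameter values are assumptions made for the example.

```python
import random
from itertools import combinations

def er_graph(num_nodes, p, seed=0):
    """Connect each pair of nodes independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(num_nodes), 2)
            if rng.random() < p]

def mean_degree(num_nodes, edges):
    """Average number of links per node."""
    return 2 * len(edges) / num_nodes

er_edges = er_graph(200, 0.05, seed=1)
expected_links = 0.05 * 200 * 199 / 2   # pN(N-1)/2
```

The number of links fluctuates around pN(N-1)/2 from run to run, and the mean degree is correspondingly close to p(N-1).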




Scale-Free Network
   Scale-free networks: the rich get richer.
   Scale-free networks are characterized by a power-law degree
    distribution; the probability that a node has x links follows

    $$P(x) \propto x^{-\gamma}$$

   where γ > 0, so that a plot of log(frequency) against log(degree)
    shows a decreasing linear trend.
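To illustrate the linear trend on a log-log plot, the sketch below builds an exact power-law distribution truncated at a maximum degree and fits a least-squares line to log(frequency) against log(degree). The function names, the truncation and the value γ = 2.5 are assumptions chosen for the example.

```python
import math

def power_law_pmf(gamma, max_degree):
    """P(x) proportional to x**(-gamma) for x = 1..max_degree."""
    weights = [x ** -gamma for x in range(1, max_degree + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def loglog_slope(pmf):
    """Least-squares slope of log P(x) against log x."""
    xs = [math.log(x) for x in range(1, len(pmf) + 1)]
    ys = [math.log(p) for p in pmf]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

slope = loglog_slope(power_law_pmf(2.5, 50))
```

For an exact power law the fitted slope equals -γ; a degree distribution that bends away from a straight line on these axes is not a power law.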




Model-I: Non-Power Law-I
   The essence of this model is the observation that parts of proteins,
    called domains, contain sites to which complementary parts of other
    proteins can bind.
   These complementary parts are referred to as the positive and negative
    aspects of a domain.
   Each domain therefore gives rise to a complete bipartite sub-graph: a
    graph comprising two disjoint sets of nodes in which each node in one
    set is connected to every node in the other set.


    Fig 2: A particular domain whose positive form is present in
    three proteins A, B, and C, and whose negative form is present
    in four proteins W, X, Y, and Z.
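The bipartite structure in the figure can be written out explicitly. In this small sketch the function name `domain_interactions` is hypothetical; the protein labels are the ones used in the figure.

```python
from itertools import product

def domain_interactions(positive_proteins, negative_proteins):
    """Every protein carrying the positive form binds every protein
    carrying the negative form of the same domain."""
    return list(product(positive_proteins, negative_proteins))

fig2_edges = domain_interactions(["A", "B", "C"], ["W", "X", "Y", "Z"])
```

Three proteins with the positive form and four with the negative form give 3 x 4 = 12 interactions, and no interactions within either set.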




Model-I: Non-Power Law-II
   We assume that there are n proteins and m domains, each with a
    negative and a positive form.
   A domain may be any of 2m types: 1+, 1-, 2+, 2-, ..., m+, m-.
   Each of the n proteins contains each of the 2m possible domains
    independently with constant probability p.
   Let Xi be the number of domains that the ith protein has. Xi is
    distributed binomially:

    $$P(X_i = x) = \binom{2m}{x} p^x (1-p)^{2m-x}, \quad x = 0, 1, \ldots, 2m$$

   All the Xi are independent and identically distributed.

   Thus, the average number of sites per protein is λ = 2mp.



Model-I: Non-Power Law-III
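   Let Yi be the number of interactions of the ith protein.
   Given Xi = x, any other protein j fails to connect to i only if it
    does not contain any of the x complementary domain aspects:

    $$P(j \text{ not connected to } i \mid X_i = x) = (1-p)^x = q^x$$

   Since there are n-1 such proteins, we have:

    $$P(Y_i = y \mid X_i = x) = \binom{n-1}{y} (1-q^x)^y \, q^{x(n-1-y)}$$

   where q = 1-p. Hence, the unconditional distribution of Yi is a
    binomial mixture of binomials:

    $$f(y) = \sum_{x=0}^{2m} \binom{2m}{x} p^x q^{2m-x} \binom{n-1}{y} (1-q^x)^y \, q^{x(n-1-y)}$$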




Model-I: Non-Power Law-IV
   Using an inclusion-exclusion type expression, we get:

    $$f(y) = \binom{n-1}{y} \sum_{k=0}^{y} (-1)^k \binom{y}{k} \left(q + p\,q^{\,n-1-y+k}\right)^{2m}$$




   Binomial distribution: The distribution of the number of successes in
    an experiment with a fixed number of independent trials, each of which
    has only two possible outcomes and the same success probability.
        For example: Tossing a coin 20 times to see how many tails occur.


   Inclusion-Exclusion: Let A denote a finite set and let P1, ...,Pn be
    any given properties. We want to express the number of elements of A
    which have none of these properties in terms of numbers of elements
    which have some of these properties.
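A small worked instance of the inclusion-exclusion principle may help. The example set (the integers 1 to 100) and the three divisibility properties are chosen purely for illustration, and the function name is hypothetical.

```python
from itertools import combinations

def count_with_none(elements, properties):
    """Count elements having none of the properties, using only counts of
    elements that have all properties in some subset (inclusion-exclusion)."""
    elements = list(elements)
    total = 0
    for size in range(len(properties) + 1):
        for subset in combinations(properties, size):
            having_all = sum(1 for e in elements
                             if all(prop(e) for prop in subset))
            total += (-1) ** size * having_all
    return total

divisibility = [lambda e: e % 2 == 0, lambda e: e % 3 == 0, lambda e: e % 5 == 0]
none_count = count_with_none(range(1, 101), divisibility)
```

By hand: 100 - (50 + 33 + 20) + (16 + 10 + 6) - 3 = 26 integers between 1 and 100 are divisible by none of 2, 3 and 5.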


Log-log plot of the distribution
   f(y) is plotted for n = 6000 proteins, m = 1000 domains and
    λ = 1 and λ = 2.

   The resulting graph shows clear non-linearity.

    Fig: Log–log plot of the distribution
    of vertex degrees in the modelled
    interactome with 6000 proteins,
    1000 domains and an average of 1
    or 2 domains per protein, shown as
    solid and dotted lines respectively
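The distribution plotted here can be evaluated numerically. The sketch below assumes the model as described earlier in the seminar: each protein's domain count is binomial over the 2m domain types with p = λ/(2m), and a protein with x domains connects to each of the other n-1 proteins independently with probability 1 - (1-p)^x. The function names are hypothetical, and the probabilities are computed in log space only to avoid overflow in the binomial coefficients.

```python
import math

def log_binomial_pmf(k, trials, prob):
    """Log of the binomial probability of k successes in `trials` trials."""
    if prob <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if prob >= 1.0:
        return 0.0 if k == trials else -math.inf
    log_choose = (math.lgamma(trials + 1) - math.lgamma(k + 1)
                  - math.lgamma(trials - k + 1))
    return log_choose + k * math.log(prob) + (trials - k) * math.log1p(-prob)

def degree_pmf(y, n, m, lam):
    """f(y): probability that a protein has exactly y interaction partners."""
    p = lam / (2 * m)
    q = 1.0 - p
    total = 0.0
    for x in range(2 * m + 1):
        # Weight: probability that the protein carries exactly x domains.
        weight = math.exp(log_binomial_pmf(x, 2 * m, p))
        # Probability that one given other protein links to it.
        link_prob = 1.0 - q ** x
        total += weight * math.exp(log_binomial_pmf(y, n - 1, link_prob))
    return total

# Small illustrative parameters (not the n = 6000, m = 1000 used in the plot).
small_pmf = [degree_pmf(y, 30, 20, 1.0) for y in range(30)]
```

Calling `degree_pmf(y, 6000, 1000, 1.0)` for a range of y and plotting log f(y) against log y would give a curve of the kind the figure describes.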




   Degree distribution of sampled sub-graphs
     A total of 450 proteins were sampled at random.
     The mean number of neighbors for each protein in this sample
      was 5.
     The resulting graph has approximately the same number of
      vertices and edges as the Uetz datasets.

Fig: The Ito and Uetz datasets are plotted in black and blue,
respectively. A straight line (power law) fit is shown as a dotted
line. The distribution obtained by sampling from this model with
6000 proteins, 1000 domains and an average of 1 domain per protein
is plotted in red.
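Sampling a sub-graph, as done for this comparison, can be sketched as follows. The assumption here is that proteins are drawn uniformly at random and only interactions between two sampled proteins are kept; the function name and the toy network are hypothetical.

```python
import random

def sample_induced_subgraph(nodes, edges, sample_size, seed=0):
    """Pick `sample_size` nodes at random and keep the edges among them."""
    rng = random.Random(seed)
    chosen = set(rng.sample(sorted(nodes), sample_size))
    kept = [(u, v) for u, v in edges if u in chosen and v in chosen]
    return chosen, kept

toy_nodes = range(10)
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (5, 6), (7, 8)]
chosen, kept = sample_induced_subgraph(toy_nodes, toy_edges, 4, seed=3)
```

In the seminar's setting, `nodes` would be the 6000 modelled proteins and `sample_size` would be 450.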


Degree distribution of sampled sub-graphs
    A total of 1500 proteins were sampled at random.
    The resulting graph shows that this model fits the data better
     than a power law does.

    Fig: The DIP dataset is plotted
    in black. The distribution
    obtained by sampling from this
    model with 6000 proteins, 1000
    domains and an average of 2
    domains per protein is plotted
    in red. A straight line (power
    law) fit is shown as a dotted
    line.




Conclusions

   The degree distribution predicted by
    this model fits the data better than
    a power-law distribution does.

   This model also fits the subnet better
    than the power law does.

    Example
   This model can be used to infer the existence of interactions not
    yet detected experimentally, by using the predicted bipartite
    structure of sub-graphs.

    The figure strongly suggests that o-Raf1,
    PLC-, RALGDS, AF-6, RLF and
    SUR-8 contain a motif that interacts
    with a complementary motif in R-Ras,
    Rap1A, KRAS2B, RIN, RIBB, N-Ras
    and H-Ras. This would imply that, for
    instance, RLF and AF-6 should interact
    with Rap1A and R-Ras in order to
    complete the bipartite graph.
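Completing a bipartite sub-graph amounts to listing the pairs across the two sets that have not yet been observed. In the sketch below the function name is hypothetical, and the "observed" interactions are invented placeholders for illustration only; they are not data from the figure.

```python
from itertools import product

def missing_bipartite_edges(left, right, observed):
    """Pairs (left, right) needed to make the bipartite sub-graph complete."""
    observed_set = {frozenset(edge) for edge in observed}
    return [(a, b) for a, b in product(left, right)
            if frozenset((a, b)) not in observed_set]

# Hypothetical observations, used only to demonstrate the idea.
placeholder_observed = [("RLF", "Rap1A"), ("R-Ras", "AF-6")]
predicted = missing_bipartite_edges(["RLF", "AF-6"], ["Rap1A", "R-Ras"],
                                    placeholder_observed)
```

Each returned pair is a candidate interaction that the model predicts but that the (here invented) observations do not yet contain.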



Model-II

The Yeast Protein Interaction Network
Evolves Rapidly and Contains Few
Redundant Duplicate Genes.




Evolution of Function
   Examples:

       Partially redundant duplicates
            CLN1/2/3: Involved in regulating the activity of the yeast
             cyclin-dependent kinase. Ks = 2.4, over 200 Myr.
            TPK1/2/3: Catalytic subunits of the yeast cyclic AMP-dependent
             protein kinase. Ks = 1.31.


       Diverged gene function
            EDN vs. ECP: EDN has high RNase activity and acts as an
             antiretroviral agent, whereas ECP is an antibacterial toxin.
            Dopa decarboxylase and amd: The duplicates are expressed in
             different parts of the cell and therefore have different
             biological functions.



Objective
   Two main questions are addressed in this model:

       At what rate does functional divergence occur after gene
        duplication for a large sample of duplicated genes in a
        genome?

       What effects do the products of the duplicated genes have on
        the protein-protein interaction network?




Data for Analysis
   The protein-protein interaction data come from a large experiment
    (Uetz et al. 2000) using the yeast two-hybrid system (Fields and
    Song 1989).

        985 proteins, 899 interactions.
        45 self-interactions.

   Data for duplicated genes were obtained from the University of
    Oregon and are described using the fraction Ks.

        Ks, the fraction of synonymous (silent) substitutions per
         synonymous site, measures the divergence between two genes.
        Only gene pairs with Ks below a cutoff of 5 were considered for
         further analysis.
        There were 9,059 such pairs among ~6,000 genes.



Power Law Random Graphs-I
   PL random graphs are random graphs whose degree probability
    distribution P(d) is proportional to d^(-t) for some constant t.

   First, n = 6279 isolated nodes were generated, and a random
    integer d > 0 was assigned to a randomly chosen node.

   This random number d was generated in the following way,




   where r is a random real number uniformly distributed in the
    interval (0, 1), and g > 0 is a constant.



Power Law Random Graphs-II
   Second, this number d was accepted with probability d^(-t).
   The resulting distribution of d is a power law with a weighting
    function.




   If d was discarded, a new d was generated according to the same
    prescription, and this process was repeated until a d was accepted.

   Once d was accepted, it was assigned to the randomly chosen
    node.


Power Law Random Graphs-III
   Another node was chosen at random (without replacement of
    the previously chosen node), an integer d was assigned to it in the
    same way, and this process was repeated until the sum S of all
    the integers assigned to the chosen nodes first exceeded 2k,
    where k is the number of edges.

   The integer assigned to each node corresponds to the node's
    degree.

   Nodes were then connected according to their assigned degrees
    until the number of edges was S/2 = k.
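The three steps can be put together as a rough sketch. Two parts are assumptions rather than the method described in the seminar: the formula used to propose d from r and g is not reproduced in these slides, so an exponential-type proposal is substituted here; and the final connection step is sketched as random pairing of degree "stubs", skipping self-links and duplicate links. All function names and the values of g and t are hypothetical.

```python
import math
import random

def sample_degree(rng, g, t):
    """Draw one degree d >= 1 by rejection sampling."""
    while True:
        r = rng.random()
        # ASSUMPTION: exponential-type proposal; the slides' own formula is missing.
        d = math.ceil(-math.log(1.0 - r) / g)
        # Accept the proposed d with probability d**(-t).
        if d >= 1 and rng.random() < d ** (-t):
            return d

def assign_degrees(n, k, g, t, rng):
    """Assign degrees to randomly chosen nodes until their sum exceeds 2k."""
    order = list(range(n))
    rng.shuffle(order)  # nodes are chosen at random without replacement
    degrees, total = {}, 0
    for node in order:
        if total > 2 * k:
            break
        d = sample_degree(rng, g, t)
        degrees[node] = d
        total += d
    return degrees

def connect(degrees, k, rng):
    """ASSUMPTION: pair degree stubs at random until k edges exist."""
    stubs = [node for node, d in degrees.items() for _ in range(d)]
    rng.shuffle(stubs)
    edges = set()
    for a, b in zip(stubs[0::2], stubs[1::2]):
        if len(edges) == k:
            break
        if a != b:
            edges.add((min(a, b), max(a, b)))
    return edges

rng = random.Random(42)
degrees = assign_degrees(6279, 899, 0.5, 1.5, rng)
edges = connect(degrees, 899, rng)
```

Because self-links and repeated links are skipped, the sketch can end with slightly fewer than k edges; nodes that were never chosen remain isolated.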



Interaction Network vs. Random Graph

   Comparison of the protein contact network (n = 985 nodes, k = 899
    edges) with random graphs.
   The PPI network has an excess of proteins with degree 1, but
    fewer proteins with a higher degree than the ER random graph.
   In contrast, the degree distribution of the PPI network is
    consistent with the power-law random graph.




Duplications and Interactions
   This figure illustrates the effect of gene duplication on gene
    products involved in protein interactions.




Divergence of Interactions
   About 20% of duplicate gene pairs with 0.5 < Ks < 1.0 share an
    interaction partner; that is, about 80% of genes have no common
    interaction partner with their duplicates approximately 100 Myr
    after duplication.
   For Ks > 2, the fraction approaches the value expected for randomly
    chosen gene pairs.

   Fig: Histogram of the fraction of duplicate genes whose products
    have at least one interacting protein in common, as a function of
    Ks. Interactions turn over every 200-300 Myr.
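The quantity shown in the histogram can be computed as sketched below: for each Ks bin, the fraction of duplicate pairs whose products share at least one interaction partner. The function name, the data layout and the toy gene names are all hypothetical.

```python
def shared_partner_fraction(pairs, partners, ks_low, ks_high):
    """Fraction of duplicate pairs with ks_low <= Ks < ks_high that share
    at least one interaction partner; None if the bin is empty.

    pairs:    list of (gene_a, gene_b, ks)
    partners: dict mapping a gene to the set of its interaction partners
    """
    in_bin = [(a, b) for a, b, ks in pairs if ks_low <= ks < ks_high]
    if not in_bin:
        return None
    sharing = sum(1 for a, b in in_bin
                  if partners.get(a, set()) & partners.get(b, set()))
    return sharing / len(in_bin)

# Toy data, invented for illustration.
toy_pairs = [("g1", "g2", 0.7), ("g3", "g4", 0.8), ("g5", "g6", 2.5)]
toy_partners = {"g1": {"x", "y"}, "g2": {"y"}, "g3": {"z"},
                "g4": {"w"}, "g5": {"x"}, "g6": {"x"}}
mid_bin = shared_partner_fraction(toy_pairs, toy_partners, 0.5, 1.0)
```

Evaluating the function over successive Ks bins gives one histogram bar per bin.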


Divergence of Interactions
   Only 57% of the most closely related duplicate gene pairs
    (0 < Ks < 0.5) for which both genes interact with other proteins
    share any protein interaction partner in the same subnet.
   For 380 gene pairs with Ks > 0.5, the fraction of duplicates
    with shared interaction partners is below 20%.
   For Ks > 1.5, the fraction is close to the value expected at random.




The Rate of Interaction Loss
   The divergence in protein interaction after gene
    duplication is largely due to interaction loss.

   127 pairs with Ks < 2, where both duplicates engage
    in the protein-protein interaction network.

        920 interactions were present after duplication.
        429 of these have since been lost, at a rate of 2.3 x 10^-3 per Myr.


   Is this estimate low or high?
        Noise in the interaction data leads to overestimates.
        Young pairs and double losses lead to underestimates.
Divergence of Self-interactions

   Loss or gain of interactions between a pair of paralogs
    due to self-interaction.



    Fig: Self-interactions and interactions between products of
    duplicate genes.




Divergence of Self-interactions
   Total of 25 paralogs.

        Only a few conserved self-interactions
         were found.


   New interactions

        13 of 25 are new interactions, arising at a rate of
         2.88 x 10^-6 per pair per Myr (Ks = 1
         corresponds to 100 Myr).




Conclusions
   The protein-protein interaction network shows a power-law degree
    distribution.

   The yeast genome contains a total of 6280 ORFs, giving
    1.97 x 10^7 possible pairwise interactions.

   New interactions form at a slow rate per pair: they evolve at a
    rate of 2.88 x 10^-6 per protein pair per million years.

   Extrapolating the above estimate to the entire yeast proteome
    thus yields (1.97 x 10^7 x 2.88 x 10^-6) = 57 newly evolved
    interactions per million years.
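The extrapolation can be checked with a few lines of arithmetic. The variable and function names are hypothetical; the inputs are the figures quoted on this slide.

```python
import math

def possible_pairs(num_orfs):
    """Number of distinct protein pairs among num_orfs gene products."""
    return math.comb(num_orfs, 2)

def new_interactions_per_myr(num_orfs, rate_per_pair_per_myr):
    """Expected number of newly evolved interactions per million years."""
    return possible_pairs(num_orfs) * rate_per_pair_per_myr

pair_count = possible_pairs(6280)                      # about 1.97 x 10^7
genome_rate = new_interactions_per_myr(6280, 2.88e-6)  # about 57 per Myr
```

With 6280 ORFs there are 6280 x 6279 / 2 = 19,716,060 pairs, and multiplying by 2.88 x 10^-6 gives roughly 56.8, which rounds to the 57 quoted above.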




  Model-III- Cluster Analysis

Detection of Functional Modules from
Protein Interaction Networks of
S. cerevisiae.




Cluster Analysis
   Cluster analysis (CA) is an obvious choice of methodology for the
    extraction of functional modules from protein interaction networks.

   Clustering is defined as the grouping of objects based on their sharing
    discrete, measurable properties.

   In functional genomics, clustering algorithms have been devised for
    multiple tasks, such as mRNA expression analysis and the detection of
    protein families.

   The aim of this model is to detect biologically meaningful patterns in
    the entire known protein interaction network of S. cerevisiae.



Clustering Algorithm
   The protein interaction data were obtained from the DIP database.

   The network of proteins is first transformed into a weighted graph.

   The weight attributed to each interaction reflects the level of
    confidence in it, represented by the number of experiments that
    support the interaction.

   A score of 3.0 was assigned for the first instance of an interaction,
    and increased by 1 if the interaction was supported by another method,
    or by 0.25 if the interaction had already been observed by that method.
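The scoring rule can be expressed as a short function. This is a sketch of the rule as stated on this slide; the function name and the method labels ("y2h", "tap") are hypothetical placeholders.

```python
def interaction_score(methods):
    """Score one interaction from the ordered list of experimental
    methods that reported it."""
    seen = set()
    score = 0.0
    for method in methods:
        if not seen:
            score = 3.0        # first instance of the interaction
        elif method in seen:
            score += 0.25      # repeated observation by the same method
        else:
            score += 1.0       # support from a different method
        seen.add(method)
    return score

single = interaction_score(["y2h"])
two_methods = interaction_score(["y2h", "tap"])
repeated = interaction_score(["y2h", "y2h"])
```

An interaction seen once scores 3.0, one confirmed by a second method scores 4.0, and one seen twice by the same method scores 3.25.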




Clustering Algorithm
   The resulting graph is a weighted network of proteins connected by
    edges.

   This weighted graph is then converted into a line graph L(G), in which
    the interactions (edges of G) become nodes and the proteins (nodes of
    G) become edges linking the interactions they take part in.
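A minimal sketch of the line-graph transformation follows. Each interaction of G becomes a node of L(G), and two such nodes are linked whenever the interactions share a protein. Giving each link the average of the two interaction scores follows the seminar's description of the next step; the function name, data layout and toy scores are assumptions.

```python
from itertools import combinations

def line_graph(weighted_edges):
    """Build L(G) from G.

    weighted_edges: dict mapping (protein_a, protein_b) to a score.
    Returns a dict mapping (interaction_1, interaction_2) to the mean
    of the two interaction scores.
    """
    result = {}
    for e1, e2 in combinations(list(weighted_edges), 2):
        if set(e1) & set(e2):  # the two interactions share a protein
            result[(e1, e2)] = (weighted_edges[e1] + weighted_edges[e2]) / 2
    return result

toy_graph = {("A", "B"): 3.0, ("B", "C"): 4.0, ("D", "E"): 3.0}
toy_line_graph = line_graph(toy_graph)
```

Because a protein with several interactions appears in several nodes of L(G), clustering L(G) lets one protein belong to more than one cluster.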




Clustering Algorithm
   The scores for the original
    constituent interactions are
    then averaged and assigned to
    each edge.

   The TribeMCL software, an
    algorithm for clustering graphs,
    was used to cluster the
    interaction network and
    recover clusters of associated
    interactions.

   These clusters range in size
    from 2 to 292 components
    (average size is 8.05), and
    form a scale-free protein
    network.



Results
   A total of 1046 clusters were obtained.

   In this analysis, each protein was present in 2.1 clusters on average.

   Only 76 interactions and 146 proteins (representing less than 1% of
    the total data), which were weakly connected to the main interaction
    network, were discarded by the clustering method.

   The clusters found were classified under three schemes according to
    the functional involvement of proteins in different mechanisms.
        KEGG: regulatory and metabolic classifications (20%).
        GQFC: Genequiz automatic functional classification (45%).
        MIPS: Cellular localization (48%).
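Since the clusters are clusters of interactions, a protein can appear in several of them. The sketch below shows one way to map interaction clusters back to proteins and compute the average number of clusters per protein; the function names, data layout and toy clusters are hypothetical.

```python
from collections import defaultdict

def clusters_per_protein(interaction_clusters):
    """Map each protein to the set of cluster ids it appears in.

    interaction_clusters: dict mapping a cluster id to a list of
    (protein_a, protein_b) interactions.
    """
    membership = defaultdict(set)
    for cluster_id, interactions in interaction_clusters.items():
        for a, b in interactions:
            membership[a].add(cluster_id)
            membership[b].add(cluster_id)
    return dict(membership)

def mean_clusters_per_protein(membership):
    """Average number of clusters a protein belongs to."""
    return sum(len(ids) for ids in membership.values()) / len(membership)

toy_clusters = {1: [("A", "B"), ("B", "C")], 2: [("C", "D")]}
toy_membership = clusters_per_protein(toy_clusters)
toy_mean = mean_clusters_per_protein(toy_membership)
```

In the toy data protein C takes part in interactions from both clusters, so it belongs to two clusters while A, B and D belong to one each.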



    Validation of the Clustering Method-I
   Scoring the clusters: Clusters are validated by assessing the
    consistency of protein classification within an individual cluster.
   This is measured, for each of the three classification schemes, by
    calculating the redundancy of each cluster j:

    $$R_j = 1 - \frac{-\sum_{s=1}^{n} P_s \log_2 P_s}{\log_2 n}$$

   Rj = redundancy of cluster j,
   n = the number of classes in the classification
    scheme,
   Ps = the relative frequency of class s in cluster j.
   The numerator represents the information content in bits given
    by the entropy (H).
   The denominator is a normalizing factor representing the
    maximum entropy for the cluster j (Hmax).
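A sketch of the redundancy score, under the assumption that it is one minus the ratio of the cluster's entropy H to the maximum entropy Hmax = log2(n), as the definitions above suggest. The function name and the toy class labels are hypothetical.

```python
import math
from collections import Counter

def redundancy(class_labels, num_classes):
    """Redundancy of one cluster: 1 - H / Hmax.

    class_labels: the class assigned to each protein in the cluster.
    num_classes:  number of classes in the classification scheme (>= 2).
    """
    counts = Counter(class_labels)
    total = len(class_labels)
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in counts.values())
    max_entropy = math.log2(num_classes)
    return 1.0 - entropy / max_entropy

homogeneous = redundancy(["transport"] * 5, 4)
mixed = redundancy(["a", "a", "b", "b"], 4)
uniform = redundancy(["a", "b", "c", "d"], 4)
```

A cluster whose proteins all share one class has redundancy 1, while a cluster spread evenly over all classes has redundancy 0, so higher values indicate more consistent clusters.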
Validation of the Clustering Method-II




  Fig.: Module validation using biological classification
  schemes
Validation of the Clustering Method-III




Fig.: Module validation using biological classification schemes

Validation of the Clustering Method-IV




Fig.: Module validation using biological classification schemes

    Example-I Cluster 55
   Here, cluster 55 recovers
    a set of protein interactions
    (inset) that are involved in
    vacuolar transport and
    fusion from the ER via the
    pre-vacuolar compartment.




Example-II Clusters 32 and 86
   Recovery of the signal transduction pathway controlling cell wall
    biogenesis, from the membrane protein (Fks1) to the transcription
    factors activated by this pathway (Swi4, Swi6 and Rlm1). The pathway
    was recovered as a set of two clusters connected by two proteins
    (Pkc1p and Smd3p), showing a one-to-many relationship.




Network of functional modules
   This graph shows the connections among 40 functional
    modules linked by shared proteins.




Conclusions
   This model can be used to place poorly characterized proteins
    into their functional context according to their interacting
    partners within a module.

   The predictive power of this model allows us to examine the
    organization and coordination of multiple complex cellular
    processes and determine how they are organized into pathways.

   One-to-many relationships can be used for pathway discovery.




References
   On the structure of protein–protein interaction networks.
    A. Thomas, R. Cannings, N.A.M. Monk, and C. Cannings.
    Biochemical Society Transactions (2003), Volume 31, part 6.

   The Yeast Protein Interaction Network Evolves Rapidly
    and Contains Few Redundant Duplicate Genes. Andreas
    Wagner. Mol. Biol. Evol. 18(7):1283–1292 (2001).

   Detection of Functional Modules From Protein
    Interaction Networks. Jose B. Pereira-Leal, Anton J.
    Enright, and Christos A. Ouzounis. PROTEINS: Structure,
    Function, and Bioinformatics 54:49–57 (2004).


