Protein-protein interaction networks
Most ideas on biological networks are based on the initial high-throughput (HT) data for
Saccharomyces cerevisiae (Y2H screens of Uetz et al. and Ito et al.) – unicellular, compartmentalized, all
proteins

Program for today:

Further experimental data sets
(1) C. elegans – multicellular, focus on proteins that promote cell-cell interactions

(2) Drosophila melanogaster

(3) Classification of networks

(4) Essentiality of proteins




   8. Lecture WS 2004/05              Bioinformatics III                          1
              Current status of research on real networks




► Many systems show small-world property
► Statistical abundance of „hubs“, p(k) follows a power-law distribution
→ Robustness to damage, vulnerability to attacks

► Realize that most complex networks are the result of a growth process
► View networks as dynamical systems that evolve through the subsequent
addition and deletion of vertices and edges. Requires dynamical theory.

What is left?

                                                          Eur. Phys. J. B 38, 143 (2004)

                                           10 questions
Are there formal ways of classifying the structure of different growing models?

Open questions concerning the universality of some topological properties, the correlations introduced by
the dynamic process and the interplay between clustering, hierarchies and centralities in networks.
Aim for rigorous mathematical analysis.


Are there further statistical distributions (besides p(k), <C>, <L> and the degree-degree
correlation p(k,k')) that can provide insights into the structure and classification of
complex networks?

Why are most networks modular?

Are there universal features of network dynamics?

Networks are not only specified by their topology but also by the dynamics of information or traffic flow
taking place along the links. Aim at mathematical characterization for general principles describing the
networks‘ dynamics.



                                                                      Eur. Phys. J. B 38, 143 (2004)

                                      10 open questions
How do the dynamical processes taking place on a network shape the network
topology?

Dynamics, traffic and underlying topology of networks are mutually correlated.
→ Need to obtain large empirical datasets that simultaneously capture the topology of the network and the
time-resolved dynamics taking place on it.


What are the evolutionary mechanisms that shape the topology of biological networks?

Uncovering evolution is relatively simple in technological and large infrastructure networks, but not in
biological networks. Here, the role of evolution and selection in shaping biological networks is still unclear,
especially if we want a quantitative dynamical implementation of evolutionary principles in network
modeling.




                                                                       Eur. Phys. J. B 38, 143 (2004)

                                          10 questions
How to quantify the interaction between networks of different character (network of
networks)?

Most networks are interconnected with one another, forming networks of networks, e.g. the Internet, energy and
power distribution, or a gene network interconnected with the protein-protein interaction network and with the
metabolic network.
Understanding and characterizing the complicated set of regulatory and feedback mechanisms
connecting various networks is one of the most ambitious tasks in network research.


How to characterize small networks?

Many real-world networks are far from being large-scale objects that are well described by statistical
measures. Scaling behavior or average properties are not well defined for small networks → new concepts
and mathematics are required.


Why are social networks all assortative, while all biological and technological networks
are disassortative?
In social networks, hubs tend to connect to each other („assortative“). In biological and technological networks,
hubs mostly connect to less connected nodes.

                                                                     Eur. Phys. J. B 38, 143 (2004)

         C. elegans protein-protein interaction network
As Y2H baits, select a set of 3024 predicted worm proteins that relate directly or
indirectly to multicellular functions.

Cloned ORFs that do not autoactivate the Y2H GAL1::HIS3 reporter: 1873

Only consider interactions that activate at least 2 out of 3 different Gal4-responsive
promoters.

Divide into 3 confidence classes.
Core-1: 858 interactions, found independently at least 3 times
Core-2: 1299 interactions, found < 3 times, passed retest
Non-Core: 1892 other interactions found in Y2H screen




                                                         Li et al., Science 303, 5657 (2004)

       C. elegans protein-protein interaction network
Coaffinity purification assays. Shown are 10 examples from the Core-1,
Core-2, and Non-Core data sets. The top panels show Myc-tagged prey
expression after affinity purification on glutathione-Sepharose,
demonstrating binding to GST-bait. The middle and bottom panels show
expression of Myc-prey and GST-bait, respectively. The lanes alternate
between extracts expressing GST-bait proteins (+) and GST alone (–).
ORF pairs are identified in table S1 with the lane number corresponding
to the order in which they appear in the table.




                                                     Li et al., Science 303, 5657 (2004)

                        Confidence of interactions




                                                      Li et al., Science 303, 5657 (2004)

         C. elegans protein-protein interaction network
Estimate coverage:

out of 108 known interactions in WormPD („literature data set“)
only 8 Core and 2 Non-Core interactions are found in this benchmark data set

→ coverage is ca. 10% of all interactions
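
The coverage estimate above is a simple ratio. A minimal sketch of the arithmetic, using only the numbers given on this slide (the function name `coverage` is an illustrative choice, not something from Li et al.):

```python
def coverage(found: int, known: int) -> float:
    """Fraction of benchmark interactions recovered by the screen."""
    if known <= 0:
        raise ValueError("the benchmark set must not be empty")
    return found / known


# Numbers from the slide: 8 Core + 2 Non-Core hits among 108 literature pairs.
core_hits, noncore_hits, literature_pairs = 8, 2, 108
print(f"coverage = {coverage(core_hits + noncore_hits, literature_pairs):.1%}")
```

This gives roughly 9.3%, which the slide rounds to ca. 10%.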

In silico searches for potentially conserved interactions („interologs“), i.e. pairs whose
orthologs are known to interact in one or more other species.

High-confidence yeast interaction data → 949 potential worm interologs
+ other data
5534 interactions for worm, connecting 15% of the C. elegans proteome.




                                                          Li et al., Science 303, 5657 (2004)

       Analysis of the C. elegans protein-protein network
(A) Nodes (proteins) are colored according to their phylogenetic class: ancient (red),
multicellular (yellow), and worm (blue).
Giant network component contains 2898 nodes connected by 5460 edges.
The inset highlights a small part of the network.

(B) The proportion of proteins, P(k), with different numbers of interacting partners, k, is shown
for C. elegans proteins used as baits or preys and for S. cerevisiae proteins.
Again, the worm interactome network exhibits small-world and scale-free properties.

(C) Do evolutionarily recent proteins preferentially interact with each other?
„Ancient“: 748 proteins with yeast ortholog
„Multicellular“: 1314 proteins with ortholog in Drosophila, Arabidopsis or human but not in yeast
„Worm“: 836 proteins with no ortholog outside of worm.
The 3 groups connect equally well with each other
→ new cellular functions rely on a combination of evolutionarily new and ancient elements.

                                                          Li et al., Science 303, 5657 (2004)
       Relate interactome with transcriptome and phenome
(D) Overlap with transcriptome in yeast, Pearson correlation coefficients (PCCs) were calculated
and graphed for each pair of proteins in the interaction data sets and their corresponding
randomized data sets. Red area corresponds to interactions that show a significant relationship
to expression profiling data (P < 0.05). 9.5% of interacting core proteins are co-expressed.
Note: 75% of literature pairs (= biologically relevant) do not co-express.

(E) Example of highly connected cluster (Y2H interactions) where both proteins belong to
common C. elegans expression clusters.

(F) Proportion of interaction pairs where both genes are embryonic lethal (P < 10^-7).
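
Panel (D) relies on the Pearson correlation coefficient (PCC) between the expression profiles of two interacting proteins. A minimal sketch of that computation, assuming each profile is a plain list of expression values measured under the same conditions (the function name and the toy profiles are illustrative only):

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("PCC is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


# Toy profiles (hypothetical values): the second is a scaled copy of the first.
print(pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

Comparing the PCC distribution of interacting pairs against that of randomized pairs, as the slide describes, would require calling this function on every pair in both sets.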




                                                          Li et al., Science 303, 5657 (2004)
                Example of highly connected subnetwork
Conclusions:
- Y2H data set provides functional
hypotheses for 1000s of uncharacterized
proteins of C. elegans.

Integration with other functional
genomic data indicates that the
correlation between
transcriptome and interactome
data is lower than found in yeast.

Explanation? Biological processes in
multicellular organisms may occur
differently in the organism, across various
organs, tissues, or single cells.

                                                           Li et al., Science 303, 5657 (2004)

Drosophila is an important model for
human biology.

High-throughput Y2H screen identified
20405 interactions involving 7048
proteins.




Giot et al. Science 302, 1727 (2003)

     Confidence scores for protein-protein interactions
Manual method: expert biologist reviewed list of interactions on the basis of the
names of the proteins in each interaction pair.
High-confidence interactions: those published previously or those involving two
proteins of the same complex.
Low-confidence interactions: unlikely to occur in vivo, e.g. an interaction between
a nuclear and an extracellular protein.

Automated method: compare Drosophila interactions with those in yeast.
Positive examples: interacting proteins whose yeast orthologs interact as well
Negative examples: Drosophila interactions whose yeast orthologs are a distance
of 3 or more protein-protein interaction links apart (a random pair of yeast proteins
has a distance of 2.8 links on average)

Positive training set: 129 examples (70 manual, 65 automated, 6 common)
Negative training set: 196 examples (88 manual, 112 automated, 4 common)
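
The automated labeling rule can be sketched with a breadth-first search on the yeast network. This is an illustration under stated assumptions, not the procedure of Giot et al.: the network is a dict mapping each protein to the set of its partners, ortholog mapping is assumed to have been done already, and pairs at distance 2 or in different components are left unlabeled (the slide does not say how such pairs were treated).

```python
from collections import deque


def shortest_path_length(adj, source, target):
    """Number of links on a shortest path, or None if target is unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def label_pair(adj, a, b, min_negative_distance=3):
    """Label a pair of yeast orthologs as a training example."""
    dist = shortest_path_length(adj, a, b)
    if dist == 1:
        return "positive"        # the yeast orthologs interact directly
    if dist is not None and dist >= min_negative_distance:
        return "negative"        # 3 or more links apart
    return "unlabeled"           # assumption: everything else is not used


# Toy yeast network (hypothetical): a chain A - B - C - D.
yeast = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(label_pair(yeast, "A", "B"), label_pair(yeast, "A", "D"))
```

The training set sizes on the slide follow from merging the manual and automated lists and counting shared examples once: 70 + 65 - 6 = 129 and 88 + 112 - 4 = 196.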

                                                          Giot et al. Science 302, 1727 (2003)

       Confidence scores for protein-protein interactions
Fit a generalized linear model to the training set with the predictors
- number of times each interaction was observed
- number of interaction partners of each protein
- local clustering of the network
- gene region (5' untranslated region, coding sequence, 3' untranslated region)

Find the dividing surface.

Validate selection: the confidence score for an interaction correlates well with
the correlation of gene ontology (GO) annotation (see C)


                                                        Giot et al. Science 302, 1727 (2003)

    Confidence scores for protein-protein interactions

(D) p(k) for all interactions (black circles) and
for the high-confidence interactions (green
circles). Linear behavior in this log-log plot
would indicate a power-law distribution.
Although regions of each distribution appear
linear, neither distribution can be adequately fit
by a single power law. Both can be fit,
however, by a combination of power-law and
exponential decay.
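
The degree distribution p(k) plotted in (D) can be tabulated directly from an edge list. A minimal sketch, assuming the interactions are given as unique, undirected pairs of protein identifiers and that self-interactions are ignored (both are assumptions of this example):

```python
from collections import Counter


def degree_distribution(edges):
    """Return {k: fraction of proteins with exactly k interaction partners}."""
    degree = Counter()
    for u, v in edges:
        if u == v:
            continue             # ignore self-interactions
        degree[u] += 1
        degree[v] += 1
    n_proteins = len(degree)
    counts = Counter(degree.values())
    return {k: c / n_proteins for k, c in sorted(counts.items())}


# Toy network (hypothetical): one hub with three partners.
print(degree_distribution([("hub", "a"), ("hub", "b"), ("hub", "c")]))
```

Plotting log p(k) against log k from this table is what makes a pure power law appear as a straight line; a curve that bends downward at large k is the signature of the additional exponential decay.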

Faster decay of high-confidence interactions
may indicate that highly connected proteins
may be suppressed in biological networks.



                                                          Giot et al. Science 302, 1727 (2003)

       Statistical properties of refined Drosophila PI map
The high-confidence Drosophila protein-protein interactions
form a small-world network with evidence for a hierarchy of
organization. Network properties are presented for the giant
connected component, in which 3659 pairwise interactions
connect 3039 proteins into a single cluster.

(A) The probability distribution for the shortest path between
a pair of proteins in the actual network (green points) peaks
at 9 to 11 links, with a mean of 9.4 links. In contrast, an
ensemble of randomly rewired networks shows a mean
separation of 7.7 links between proteins.

Biological organization may be responsible for flattening the
actual network by enhancing links between proteins that are
already close.




                                                                 Giot et al. Science 302, 1727 (2003)

     Statistical properties of refined Drosophila PI map
(B) Clustering is analyzed quantitatively by
counting the number of closed loops
(triangles, squares, pentagons, etc.) in which
the perimeter is formed by a series of proteins
connected head-to-tail, with no protein
repeated.
The actual network (green points) shows an
enhancement of loops with perimeter up to 10
to 11 relative to the random network (red
points).
In both (A) and (B), the one-level and two-level
models produce nearly indistinguishable fits
for the random networks, indicating the
absence of structured clustering (= a hierarchy
where proteins connect to protein complexes,
and a second level where protein complexes
interact with each other).

                                                              Giot et al. Science 302, 1727 (2003)

           Protein family/human disease ortholog view
                                             Proteins are color-coded according
                                             to protein family as annotated by the
                                             GO hierarchy.

                                             Proteins orthologous to human
                                             disease proteins have a jagged,
                                             starry border.
                                             Interactions were sorted according to
                                             interaction confidence score, and the
                                             top 3000 interactions are shown with
                                             their corresponding 3522 proteins.

                                             This representation is particularly
                                             relevant to understanding human
                                             diseases and potential treatment.

                                                 Giot et al. Science 302, 1727 (2003)

                                Subcellular location view
(B) Subcellular localization view. This
view shows the fly interaction map
with each protein colored by its GO
Cellular Component annotation.
This map has been filtered by only
showing proteins with less than or
equal to 20 interactions and with at
least one GO annotation.
We show proteins for all interactions
with a confidence score of 0.5 or
higher. This results in a map with 2346
proteins and 2268 interactions.

View allows annotation of subcellular
localizations and potential functions of
proteins not annotated so far.
                 Giot et al. Science 302, 1727 (2003)

                        Local interaction networks




                                Local interaction networks involved in
                                A transcription
                                B Splicing
                                C Signal transduction




                                                       Giot et al. Science 302, 1727 (2003)

         Cell cycle regulation: Drosophila Skp pathway
Network surrounding the Skp
protein complex that targets
proteins to ubiquitin-mediated
proteasomal degradation.
Target proteins are recruited to
the Skp complex by F-box
proteins.
Among the Skp proteins, only
SkpA was reported to bind F-box
proteins Morgue and Slmb.




                                                        Giot et al. Science 302, 1727 (2003)

                            Local pathway views

10 examples of local pathway
views identified in the interaction
network




                                                           Giot et al. Science 302, 1727 (2003)

The functional significance of a gene is
defined by its essentiality.

A pair of non-essential genes can be
synthetically lethal (cell death occurs
when both genes are deleted simultaneously).

Hypothesis of ‚marginal benefit‘: many
 non-essential genes make significant
but small contributions to the fitness of the cell although the effects may not be
sufficiently large to be detected by conventional methods.


                                                           Yu et al. Trends Gen 20, 227 (2004)

                           ‚Marginal essentiality‘
Define ‚marginal essentiality‘ (M) as a quantitative measure of the importance of a
non-essential gene to a cell.

Incorporate results from diverse set of 4 large-scale knockout experiments that
examined different aspects of the impact of a protein on the fitness of a yeast cell:
(i) growth rate
(ii) phenotypes under diverse environments
(iii) sporulation efficiency
(iv) sensitivity to small molecules.

Previous findings: hubs tend to be essential (Jeong & Barabasi);
the effect of an individual protein on cell fitness correlates with its number of interaction partners k (Fraser et al.)

Analyze data set: 4743 yeast proteins – 23294 unique interactions



                                                            Yu et al. Trends Gen 20, 227 (2004)

                                Basic analysis


Essential proteins have ca. twice
as many links as non-essential
proteins.




Essential proteins have a
shallower slope → a larger
proportion of them are ‚hubs‘




                                                         Yu et al. Trends Gen 20, 227 (2004)

                            determine hubs that are essential
‚hub‘: a protein that belongs to the upper 25% of proteins with the most links
(a) 43% of the hubs are essential vs. 20% for random proteins
Within the network, essential proteins tend to be more cliquish and tend to be
more closely connected to each other (see mean path length).

                                                                      (b) out-degree: transcription
                                                                      factors with many (> 100)
                                                                      targets are more essential
                                                                      than the other TFs.

                                                                      (c) in-degree: genes regulated
                                                                      by many TFs are less
                                                                      likely to be essential than
                                                                      genes regulated by few TFs.

                                                                      (d) Genes with more
                                                                      functions are more likely to
                                                                      be essential.
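
The hub definition used on this slide (upper 25% of proteins by number of links) and the comparison in (a) can be sketched as follows. The data layout, the function names, and the arbitrary handling of ties at the cut-off are assumptions of this example; the slide does not specify them.

```python
def top_quartile_hubs(degree, fraction=0.25):
    """Return the proteins in the top `fraction` when ranked by number of links."""
    ranked = sorted(degree, key=degree.get, reverse=True)
    n_hubs = max(1, int(len(ranked) * fraction))
    return set(ranked[:n_hubs])      # ties at the cut-off are broken arbitrarily


def essential_fraction(proteins, essential):
    """Fraction of the given proteins that are annotated as essential."""
    proteins = list(proteins)
    if not proteins:
        raise ValueError("no proteins given")
    return sum(p in essential for p in proteins) / len(proteins)


# Toy data (hypothetical degrees and essentiality annotation).
degree = {"p1": 10, "p2": 8, "p3": 2, "p4": 1, "p5": 1, "p6": 1, "p7": 1, "p8": 1}
essential = {"p1", "p3"}
hubs = top_quartile_hubs(degree)
print(essential_fraction(hubs, essential), essential_fraction(degree, essential))
```

On the real data the two printed fractions would correspond to the 43% (hubs) and 20% (all proteins) reported in (a).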

                                                              Yu et al. Trends Gen 20, 227 (2004)
                            properties of essential genes
Most essential genes are ‚house-keeping‘ genes = their expression level is much
higher and the fluctuation of their expression is much lower compared with
non-essential genes.

→ Essential genes tend to be regulated by fewer factors, whereas
non-essential genes often use more regulators to control the expression of gene
products.

Explanation? essential proteins perform the most basic and important functions
within the cell and always need to be switched ‚on‘.
Their expression does not need to be regulated by many factors because this
makes the essential genes dependent on the viability of more regulators, and the
cell less stable.




                                                            Yu et al. Trends Gen 20, 227 (2004)
                             Analysis of non-essential genes




The more marginally essential a protein is, the more likely it is to
- have a large number of interaction partners (a, b)
- be closely connected to other proteins (short characteristic path length) (c)
- be one of the 1061 hubs (d)
                                                              Yu et al. Trends Gen 20, 227 (2004)

      Evolution of Drosophila melanogaster PI network
Various network growth and evolution models have been developed to explain
the properties of real-world networks
- small world (Watts & Strogatz)
- preferential attachment (Barabasi et al.)
- duplication-mutation mechanisms

However, often model parameters can be tuned such that multiple models of
widely varying mechanisms perfectly fit the motivating real network in terms
of single selected features such as the scale-free exponent and the
clustering coefficient.

Aim here: use a discriminative classification technique from machine learning to
classify a given real network as one of many proposed network mechanisms by
enumerating local substructures.



                                   Middendorf et al., DOI: q-bio.QM/0408010, arXiv, 2004/08/15
  Analysis of Drosophila melanogaster protein interaction network
Data set: protein-protein interaction map for Drosophila by Giot et al.

Problem: data set is subject to numerous false positives.
Giot et al. assign a confidence score p ∈ [0,1] to each interaction measuring how
likely the interaction is to occur in vivo.

What threshold p* should be used?

Measure size of the components for all possible values of p*.
Observe: for p*= 0.65, the two largest components are connected
→ use this value as threshold. Edges in the graph correspond to interactions for
which p > p*.

Remove self-interactions and isolated vertices
→ 3359 (4625) nodes with 2795 (4683) edges for p*= 0.65 (0.5)
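
The threshold scan described above can be sketched with a union-find structure: keep the edges with p > p*, drop self-interactions, and report the component sizes. This is a minimal illustration, assuming the scored interactions are available as (protein, protein, p) triples; the function name and the toy scores are hypothetical.

```python
def component_sizes(scored_edges, threshold):
    """Sizes of connected components (largest first) using edges with p > threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    for u, v, p in scored_edges:
        if p > threshold and u != v:         # skip weak edges and self-interactions
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v

    sizes = {}
    for node in list(parent):                # isolated vertices never enter `parent`
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


# Toy data (hypothetical scores): the two components merge once p* drops below 0.6.
edges = [("a", "b", 0.9), ("b", "c", 0.7), ("c", "d", 0.6), ("d", "e", 0.8), ("f", "f", 0.99)]
for p_star in (0.65, 0.5):
    print(p_star, component_sizes(edges, p_star))
```

Running the function over a grid of p* values and watching where the two largest components merge is one way to reproduce the kind of scan the slide describes.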


                                    Middendorf et al., DOI: q-bio.QM/0408010, arXiv, 2004/08/15
                     Network evolution models considered
Duplication-mutation-complementation (DMC) algorithm:
based on a model proposing that most of the duplicate genes observed today have been
preserved by functional complementation. If either the gene or its copy loses one of its
functions (edges), the other becomes essential in assuring the organism's survival.

Algorithm: duplication step is followed by mutations that preserve functional
complementarity.
At every time step choose a node v at random.
A twin vertex vtwin is introduced copying all of v's edges.
For each edge of v, delete with probability qdel either the original edge or its corresponding
edge of vtwin.
Conjoin the twins themselves with independent probability qcon, representing an interaction of a
protein with its own copy.
No edges are created by mutations → DMC algorithm assumes that the probability of
creating new advantageous functions by random mutations is negligible.
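
The DMC steps listed above can be sketched as follows. This is a minimal illustration, not the implementation of Middendorf et al.: the adjacency-dict representation, the integer node labels, the two-node seed graph, and the names `dmc_step`, `grow_dmc`, `q_del` and `q_con` are all assumptions of this example.

```python
import random


def dmc_step(adj, q_del, q_con, rng):
    """Apply one duplication-mutation-complementation step in place."""
    v = rng.choice(sorted(adj))              # node chosen uniformly at random
    twin = max(adj) + 1                      # assumption: nodes are labeled 0, 1, 2, ...
    adj[twin] = set()
    for nb in sorted(adj[v]):                # duplication: twin copies all of v's edges
        adj[twin].add(nb)
        adj[nb].add(twin)
    for nb in sorted(adj[twin]):             # mutation: an edge is lost on one side only
        if rng.random() < q_del:
            loser = rng.choice((v, twin))
            adj[loser].discard(nb)
            adj[nb].discard(loser)
    if rng.random() < q_con:                 # the twins interact with each other
        adj[v].add(twin)
        adj[twin].add(v)
    return twin


def grow_dmc(n_nodes, q_del, q_con, seed=0):
    """Grow a graph from a single edge (assumed seed graph) to n_nodes nodes."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_nodes:
        dmc_step(adj, q_del, q_con, rng)
    return adj


graph = grow_dmc(50, q_del=0.4, q_con=0.1, seed=1)
print(len(graph), sum(len(nbrs) for nbrs in graph.values()) // 2)
```

Because an edge is only ever removed from one of the two twins, each neighbor of v stays connected to at least one copy, which is the complementation idea. With q_del = 0 and q_con = 1 the first step already turns the seed edge into a triangle.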




                                         Middendorf et al., DOI: q-bio.QM/0408010, arXiv, 2004/08/15
                    Network evolution models considered
Variant of DMC: Duplication-random mutations (DMR) algorithm:
Possible interactions between twins are neglected.
Instead, edges between vtwin and the neighbors of v can be removed with probability qdel
and new edges can be created at random between vtwin and any other vertices with
probability qnew/N, where N is the current total number of vertices.
DMR emphasizes the creation of new advantageous functions by mutation.
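
For comparison, the DMR variant can be sketched in the same style. As before this is only an illustration under assumptions: the adjacency-dict representation, integer node labels, and seed graph are choices of this example, N is taken as the number of vertices before the twin is added, and v itself is excluded from the random new edges (one reading of "interactions between twins are neglected").

```python
import random


def dmr_step(adj, q_del, q_new, rng):
    """Apply one duplication / random-mutation step in place."""
    n = len(adj)                             # assumption: N counted before adding the twin
    v = rng.choice(sorted(adj))
    twin = max(adj) + 1                      # assumption: nodes are labeled 0, 1, 2, ...
    adj[twin] = set()
    for nb in sorted(adj[v]):                # copy each edge unless it is removed (q_del)
        if rng.random() >= q_del:
            adj[twin].add(nb)
            adj[nb].add(twin)
    for u in sorted(adj):                    # random new edges with probability q_new / N
        if u in (v, twin) or u in adj[twin]:
            continue
        if rng.random() < q_new / n:
            adj[twin].add(u)
            adj[u].add(twin)
    return twin


def grow_dmr(n_nodes, q_del, q_new, seed=0):
    """Grow a graph from a single edge (assumed seed graph) to n_nodes nodes."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_nodes:
        dmr_step(adj, q_del, q_new, rng)
    return adj


graph = grow_dmr(50, q_del=0.3, q_new=0.5, seed=1)
print(len(graph), sum(len(nbrs) for nbrs in graph.values()) // 2)
```

In this sketch, without random new edges (q_new = 0) and starting from a single edge, the graph stays bipartite and therefore contains no 3-cycles, in contrast to DMC where conjoined twins close triangles with their common neighbors.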

Other models:
- linear preferential attachment (LPA) (Barabasi)
- random static networks (RDS) (Erdős-Rényi)
- random growing networks (RDG – growing graphs where new edges are created randomly
between existing nodes)
- aging vertex networks (AGV – growing graphs modeling citation networks, where the
probability for new edges decreases with the age of the vertex)
- small-world networks (SMW – interpolation between regular ring lattices and randomly
connected graphs).



                                      Middendorf et al., DOI: q-bio.QM/0408010, arXiv, 2004/08/15
                                Training set
Create 1000 graphs as training data for each of the seven different models.

Every graph is generated with the same number of edges and nodes as
measured in Drosophila.

Quantify topology of a network by counting all possible subgraphs up to a given
cut-off, which could be the number of nodes, number of edges, or the length of a
given walk.

Here: count all subgraphs that can be constructed by a walk of length=8 (148 non-
isomorphic subgraphs) or length=7 (130 non-isomorphic subgraphs).

Use these counts as input features for classifier.
Note that the average shortest path between two nodes of the Drosophila
network's giant component is 11.6 (9.4) for p*=0.65 (0.5).
→ Walks of length=8 can traverse large parts of the network.
                                  Middendorf et al., DOI: q-bio.QM/0408010, arXiv, 2004/08/15
          Learning algorithm: Alternating Decision Tree
Rectangles: decision nodes.

A given network‘s subgraph counts
determine paths in the tree dictated
by inequalities specified by the
decision nodes.

For each class, the ADT
outputs a real-valued
prediction score, which is the
sum of all weights over all paths.
The class with the highest score
wins.
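
How an alternating decision tree turns subgraph counts into a prediction score can be sketched as below. This is a simplified illustration, not the trained classifier from the paper: it uses one small tree per class, and every feature name, threshold and weight in it is hypothetical.

```python
def adt_score(node, features):
    """Sum the weights of all prediction nodes reached by the given features."""
    total = node["value"]
    for split in node.get("splits", []):     # every decision node below is evaluated
        branch = "yes" if features[split["feature"]] < split["threshold"] else "no"
        total += adt_score(split[branch], features)
    return total


def classify(trees, features):
    """Return the class with the highest score together with all scores."""
    scores = {label: adt_score(root, features) for label, root in trees.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy trees: weights and thresholds are made up for illustration.
trees = {
    "DMC": {"value": 0.0, "splits": [
        {"feature": "triangles", "threshold": 10,
         "yes": {"value": -1.0}, "no": {"value": 1.5}}]},
    "LPA": {"value": 0.0, "splits": [
        {"feature": "triangles", "threshold": 10,
         "yes": {"value": 0.8, "splits": [
             {"feature": "chains8", "threshold": 5,
              "yes": {"value": -0.5}, "no": {"value": 0.7}}]},
         "no": {"value": -1.2}}]},
}
print(classify(trees, {"triangles": 50, "chains8": 3})[0])
print(classify(trees, {"triangles": 2, "chains8": 20})[0])
```

Unlike an ordinary decision tree, a prediction node may carry several decision nodes, and all of them contribute to the sum; that is why the score is taken over all paths rather than a single root-to-leaf path.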




                                     Middendorf et al., DOI: q-bio.QM/0408010, arXiv, 2004/08/15
                           Performance on training set
The confusion matrix shows truth and prediction for the test sets.
5 out of 7 models have nearly perfect prediction accuracy.

AGV is constructed as an interpolation between LPA and a ring lattice
→ the AGV, LPA and SMW mechanisms are equivalent in specific parameter
regimes and show a non-negligible overlap.




                                    Middendorf et al., DOI: q-bio.QM/0408010, arXiv, 2004/08/15
                           Discriminating similar networks




Ten graphs of two different mechanisms exhibit similar average geodesic lengths
and almost identical degree distribution and clustering coefficients.
(a) cumulative degree distribution p(k>k0), average clustering coefficient <C> and
average geodesic length <L>, all quantities averaged over a set of 10 graphs.
(b) Prediction score for all ten graphs and all five cross-validated ADTs. The two
sets of graphs can be perfectly separated by the classifier.


                                      Middendorf et al., DOI: q-bio.QM/0408010, arXiv, 2004/08/15
          Learning algorithm: Alternating Decision Tree
Figure shows the first few decision
nodes (out of 120) of a resulting ADT.
The prediction scores reveal that a
high count of 3-cycles suggests a
DMC network.

DMC mechanism indeed
facilitates creation of many
3-cycles by allowing 2 copies
to attach to each other, thus
creating 3-cycles with their common
neighbors.

A low count of 3-cycles but a high
count of 8-edge linear chains is a
good predictor for LPA and DMR
networks.
                                     Middendorf et al., DOI: q-bio.QM/0408010, arXiv, 2004/08/15
        Prediction for Drosophila melanogaster network
Now use this classifier (ADT), which has good prediction accuracy, to determine the
network mechanism that best reproduces the Drosophila network (or any network
of the same size).
Prediction scores for the Drosophila protein network for different confidence
thresholds p* and different cut-offs in subgraph size. Drosophila is consistently
classified as a DMC network, with an especially strong prediction for a
confidence threshold of p*=0.65 and independently of the cut-off in subgraph
size.




                                  Middendorf et al., DOI: q-bio.QM/0408010, arXiv, 2004/08/15
                           Visualization of subgraphs
A qualitative and more intuitive
way of interpreting the
classification result is
visualizing the subgraph
profiles.

Subgraphs associated with
Figures 3 and 1.
A representative subset of 50
subgraphs out of 148 is
shown.




                                   Middendorf et al., DOI: q-bio.QM/0408010, arXiv, 2004/08/15
                                 Subgraph profiles
The average subgraph count of the
training data for every mechanism is
shown for the 50 representative
subgraphs S1-S50.
Black lines indicate that this model is
closest to Drosophila based on the
absolute difference between the
subgraph counts.

For 60% of the subgraphs (S1-S30),
the counts for Drosophila
are closest to the DMC model.
All of these subgraphs contain one or
more cycles, including highly
connected subgraphs (S1) and long
linear chains ending in cycles (S16,
S18, S22, S23, S25).

The DMC algorithm is the only
mechanism that produces such
cycles with a high occurrence.


                                          Middendorf et al., DOI: q-bio.QM/0408010, arXiv, 2004/08/15
                                   Robustness against noise
Edges in Drosophila network are randomly
replaced and the network is classified.
Plotted are prediction scores for each of the 7
classes as more and more edges are replaced.
Every point is an average over 200 independent
random replacements.

For high noise level (beyond 80%), the
network is classified as an Erdős-Rényi (RDS)
graph. For low noise (< 30%), the confidence
in the classification as a DMC network is even
higher than in the classification as an RDS
network for high noise. The prediction score
y(c) for class c is related to the estimated
probability p(c) for the tested network to be in
class c by

    p(c) = \frac{e^{2y(c)}}{1 + e^{2y(c)}}

                                          Middendorf et al., DOI: q-bio.QM/0408010, arXiv, 2004/08/15
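
The relation between prediction score and estimated class probability is a logistic function of 2y(c). A minimal sketch that evaluates it in a numerically stable way (the function name is an illustrative choice):

```python
from math import exp


def score_to_probability(y):
    """Map a prediction score y(c) to p(c) = e^(2y) / (1 + e^(2y))."""
    if y >= 0:
        return 1.0 / (1.0 + exp(-2.0 * y))   # avoids overflow for large positive y
    z = exp(2.0 * y)
    return z / (1.0 + z)


for y in (-2.0, 0.0, 2.0):
    print(y, round(score_to_probability(y), 3))
```

A score of 0 corresponds to p = 0.5, and scores of equal magnitude but opposite sign give probabilities that sum to 1.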
                                Conclusions
Very nice (!) method that allows one to infer growth mechanisms for real networks.

Method is robust against noise and data subsampling,
no prior assumption about network features/topology required.

Learning algorithm does not assume any relationships between features
(e.g. orthogonality). Therefore the input space can be augmented with various
features in addition to subgraph counts.

The protein interaction network of Drosophila is confidently classified as DMC
network.




                                   Middendorf et al., DOI: q-bio.QM/0408010, arXiv, 2004/08/15