					Evolutionary core of a protein interaction
network in Plasmodium falciparum

S. Wuchty1, A.-L. Barabasi2, M.T. Ferdig3 and J. H.


1 Northwestern Institute of Complexity, Donald P. Jacobs Center, Northwestern
University, 2001 Sheridan Road, Evanston, IL 60208, 2 Department of Physics,
225 Nieuwland Science Hall, University of Notre Dame, Notre Dame, IN 46556, 3
Department of Biological Sciences, Galvin Life Sciences Building, University of
Notre Dame, Notre Dame, IN 46556.

To whom correspondence should be addressed: E-mail:

Keywords: malaria; Plasmodium falciparum; interactome; gene expression.


Running head: Interactome of malaria parasites
         The topology of biological networks reflects functional and evolutionary
relationships among the constituents of a cell. In particular, protein interaction networks
of different organisms, though distinct, share underlying evolutionary relationships that
allow the theoretical inference of protein interactions. Modern bioinformatic methods
can reconstruct these complex underpinnings of novel network architectures of non-
model organisms, such as the devastating pathogen Plasmodium falciparum. Knowledge
about the architecture of the prime networks of malaria parasites will enable us to
understand the unique ways this important pathogen evolved to evade the hostile
responses of the host’s immune system and to cope with the intracellular environments in
host tissues. Utilizing protein interactions of S. cerevisiae and matching orthologous
proteins in P. falciparum enables us to assess the probability that a link among yeast
proteins has been conserved in malaria parasites by evaluating an interaction’s degree of
local clustering of orthologous proteins. Characterizing the evolutionary core of
interactions, we elucidate the underlying modular structure of the P. falciparum
interactome shared with yeast to find that primary clusters contain core activities
governing gene expression. In particular, we observe that exosome-proteasome functions
are enriched in the inferred core of the P. falciparum interactome, specifying these
pathways as key mechanisms controlling gene expression. These findings emphasize the
role of cis elements and the corresponding trans RNA binding proteins as fundamental
regulatory components for control of P. falciparum gene expression and indicate a
critical role for post-transcriptional mRNA stability. In addition, we find that activities of
the proteins of the core network remain cohesive within developmental stages. Nearly all
functions of the core network occurred in the ring or the schizont phase while,
remarkably, almost no expression activity of evolutionarily conserved interactions is
observed in the trophozoite phase, suggesting that genes expressed during this period of
parasite development result from evolutionary adaptations distinct from the yeast
lineage. The identification of this particular core of interactions potentially can lead to
understanding of pathways and processes of the parasite, illuminate basic biological
functions and, consequently, prioritize targets for therapeutic intervention.
         An important challenge confronting modern biology is whether the wealth of
information accruing from the study of model organisms can be applied to the pervasive,
intractable microbial diseases that plague humankind. In particular, the global burden of
malaria continues to worsen in many developing countries with a devastating impact on
human health and corresponding impediment to economic improvement (Snow et al.
2005). Recent sequencing efforts yielded extensive annotations of the Plasmodium
falciparum genome as well as several other malaria parasites (Gardner et al. 2002a; Hall
et al. 2002; Hyman et al. 2002; Bozdech et al. 2003a; Le Roch et al. 2003). Despite this
abundance of primary genomic and proteomic information, little is known about the web
of the protein interactions that governs the unique biology of malaria parasites,
illustrating the need for novel comparative tools to tap the information hidden in the
genomic sequence. Tools that can facilitate searches of massive genome databases for
evolutionary relatedness are now powerfully employed to explore gene function in a few
model organisms. It is becoming increasingly evident that evolutionary information is
conserved at higher orders of genome organization as clarified by the recent proliferation
of studies of protein network topologies and their biological implications (Wuchty et al.
2003; Han et al. 2004; Li et al. 2004). Since network architecture contains fundamental
information about the complex web of cellular and sub-cellular components, knowledge
of the web of protein interactions facilitates our understanding of an organism’s basic
biology. Knowledge of the P. falciparum interactome will provide a valuable insight into
the global protein interaction network, which underlies parasite-specific organization of
protein-protein interactions and will ultimately help to determine factors that contribute
to the lethally efficient biology of this parasite.
         The challenges of working with this non-model organism led us to find an
informatic approach to determine basic Plasmodium-specific protein interactions.
Comparison of comprehensive interaction data sets and network topologies from a few
model organisms suggests that a small number of organizing principles give rise to
complex protein web structures, in the malaria parasite as well (Barabasi
and Oltvai 2004). Therefore, we wondered if this abundant information about protein
interaction webs from well-studied model organisms could be used to drive the inference
of an evolutionarily conserved core set of protein interactions in P. falciparum. Known
protein interaction networks feature a small number of highly connected proteins that
secure the integrity and connectivity among modules despite the tremendous differences
in the organism’s developmental cycle (Jeong et al. 2001; Han et al. 2004). The critical
role of highly connected nodes is demonstrated by their elevated propensity to be
essential for survival, and by their evolutionary conservation (Jeong et al. 2001; Wuchty
2002, 2004). This focus on linkages among proteins leads to a higher-order network
view in which the presence of conserved modules or bundles of cohesively bound nodes
indicates archetypal patterns of evolutionary building blocks (Wuchty et al. 2003), a
blueprint reinforced by the tendency of the modules to be co-expressed (Ge et al. 2001).
Because evolutionary signal is strongly retained in the network topology, we tapped this
information to better understand the interactome of P. falciparum.
         A possible obstacle to this approach is the severe error-proneness of experimental
methods for the determination of protein interactions, possibly limiting the integrity and
usefulness of existing protein interaction sets. A recent estimate of the accuracy of
protein interactions in S. cerevisiae uncovered a startling false negative rate of 90%, and
a 50% false positive error rate (von Mering et al. 2002). Fortunately, this experimental
noise can be overcome, since the topology of a protein interaction network contains
information to assess an individual link's quality. Using a link-based clustering
coefficient that reflects the degree of clustering of an interaction’s immediate
neighborhood, Goldberg and Roth identified pronounced correlations between local
clustering and the actual presence of a protein interaction in S. cerevisiae (Goldberg and
Roth 2003). Independent observations that well-embedded interactions were highly
reliable and their protein constituents were preferentially conserved allow us to propose
that a significant and direct correlation exists, providing a rational basis to use
orthologous information for the inference of protein interactions in another species.
        The method we introduce here aims to access the information in the network
structure that has been retained between yeast and malaria parasites. Identification of this
cohesion will highlight fundamental, conserved modular units operating in P. falciparum,
which, when overlaid with high-resolution Plasmodium-specific transcriptional profiles
(Bozdech et al. 2003a), will point to basic properties and their prominence at various
stages throughout the Plasmodium life cycle. The availability of transcriptional data is a
third and key dimension (in addition to primary sequence identity and network topology)
that effectively superimposes the dynamic nature of the living cell onto the static inferred
protein network, allowing us to parse out interactions as they relate to biological activity
of the modules in the malaria parasite. Given a comprehensive set of orthologs, a
framework that incorporates network architecture represents a novel means to elucidate
evolutionarily relevant protein interaction networks in a targeted organism. Therefore, we
have developed a method that can identify a core set of protein interactions of organisms
for which extensive genome annotation is available, but no comprehensive protein
interaction data yet exist. These methods could serve as the starting point of future
systematic experimental investigations of the interactomes of other species that lack
direct protein interaction data.

Results and Discussion

Stability of evolutionary signals
        Among the continuously growing pool of organisms for which experimentally
derived protein interactions are currently available, the interactome of S. cerevisiae
remains the most finely characterized and validated. Utilizing a set of high quality
interaction data compiled by Han et al., we pool 1,330 proteins that are involved in a total
of 2,448 manually curated interactions. Despite their phylogenetic differences, S.
cerevisiae and P. falciparum have a considerable number of identifiable orthologous
proteins. Utilizing the InParanoid scripts (see Methods) for the determination of
orthologous groups of proteins, we find 856 yeast proteins with an ortholog in P.
falciparum. Investigating the possibility that highly clustered links also contain an
evolutionary signal, we measure the local clustering around a link’s immediate network
neighborhood by determining the simple hypergeometric clustering coefficient Cvw
(Goldberg and Roth 2003) for every interaction in the yeast protein network (see
Methods). Logarithmically binning the interactions according to their Cvw, we determine
the excess retention (Wuchty 2004) of the orthologous proteins in P. falciparum that
constitute these groups of interaction. If topology did not carry an evolutionary signal,
the propensity of interacting proteins to be evolutionarily conserved would not scale with
the degree of local clustering around a specific interaction. Instead, we observe a
significant and positive correlation between the simple hypergeometric clustering
coefficients and evolutionary excess retention of proteins that constitute the
corresponding interactions (Fig. 1a).
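
The link-based clustering coefficient used above can be illustrated with a short sketch. It assumes the hypergeometric form described by Goldberg and Roth (2003): for an interaction between proteins v and w, Cvw is taken as the negative logarithm of the probability that the two neighborhoods share at least the observed number of common neighbors by chance. The function names, the toy network and the choice of the total number of proteins as the population size are illustrative assumptions, not necessarily the exact implementation described in Methods.

```python
from math import comb, log

def build_adjacency(edges):
    """Turn an undirected edge list into {protein: set of neighbors}."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def hypergeometric_cvw(adjacency, v, w):
    """Return -ln P(shared neighbors >= observed) for the link v-w.

    Assumption: the population size T is the number of proteins in the network.
    """
    total = len(adjacency)
    size_v, size_w = len(adjacency[v]), len(adjacency[w])
    shared = len(adjacency[v] & adjacency[w])
    # Count the ways of drawing size_w neighbors with at least `shared` inside N(v).
    favourable = sum(comb(size_v, i) * comb(total - size_v, size_w - i)
                     for i in range(shared, min(size_v, size_w) + 1))
    return -log(favourable / comb(total, size_w))

# Hypothetical toy network: a four-protein clique (a-d) attached to a chain (d-h).
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
             ("c", "d"), ("d", "e"), ("e", "f"), ("f", "g"), ("g", "h")]
toy_network = build_adjacency(toy_edges)
cvw_clique = hypergeometric_cvw(toy_network, "a", "b")  # well-embedded link
cvw_chain = hypergeometric_cvw(toy_network, "f", "g")   # link without shared neighbors
```

In this toy case the clique link scores higher than the chain link, which is the behavior the analysis relies on: links embedded in a dense neighborhood receive a larger Cvw.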

Co-expression of Conserved Protein Interactions
        Searching for preferential co-expression of these protein pairs, we determine
Pearson's correlation coefficients rP from a comprehensive set of Plasmodium-specific
co-expression data (Bozdech et al. 2003a). Logarithmically binning all protein
interactions in yeast according to their hypergeometric clustering coefficient Cvw and
determining the mean coexpression correlation in each bin, we observe that the
distribution of protein pairs that are fully conserved shows a clear and significant positive
trend. Conversely, the distribution as maintained by interacting proteins without
orthologs in Plasmodium shows no clear trend (Fig. 1b). Because high local
clustering around conserved interacting proteins coincides with a high propensity
to be coexpressed in yeast, we argue that a combination of these
observations can reliably infer protein interactions (Fig. 1c). Since the data set of high
quality interactions we used for the preliminary analyses is rather small, we have inferred
interactions in Plasmodium from a larger protein interaction set in yeast. Utilizing a large
but noisy yeast data set from the DIP database (Xenarios et al. 2002) that combines
3,833 proteins involved in 11,942 interactions, we apply a logistic regression method as a
classification scheme to assess the quality of the newly added interactions. In particular,
we feed the model with a positive and negative training set that features the parameters of
hypergeometric clustering coefficient Cvw and coexpression correlation rP of interactions.
As a classification parameter we label each interaction if its corresponding proteins have
orthologs in Plasmodium. A crucial point in the setup of such a model is the choice of
appropriate training sets. As for the choice of a positive training set we focus on our
observations in Fig. 1a, suggesting that there exists a useable correlation between Cvw and
rP when the corresponding interacting pairs of proteins are both conserved. In turn,
distributions of interactions in which proteins are not conserved fail to provide a clear
correlation. There are many ways to define positive and negative training sets that
enable a reliable classification of interactions. Here, we find the best results by
randomly choosing, as the positive training set, 500 interacting pairs of proteins that both have
orthologs in Plasmodium. As the negative counterpart, we randomly choose
500 interactions from the pool of non-conserved interactions. Applying a leave-one-out strategy to
assess the prediction accuracy, we obtain up to 99% exact predictions. We then apply this
trained model to the set of yeast interactions from the DIP database that are
conserved in Plasmodium, thereby allowing us to pool 377 proteins and 630 interactions.
We observe that the distribution of the interactions that are classified as correct
accumulates significantly around high values of the co-expression coefficient, while
incorrect links are distributed much more evenly (Fig. 1d). Accounting for interactions that
appear between conserved proteins in Plasmodium, we find a strong shift toward
co-expression of the correctly classified interactions in the target organism (Fig. 1d, inset).
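
The classification scheme described above can be sketched as a two-feature logistic regression with leave-one-out evaluation. This is a minimal illustration under stated assumptions: the gradient-descent fit, the learning rate, the number of epochs and the eight toy (Cvw, rP) pairs are all hypothetical and stand in for the 500 positive and 500 negative training interactions; it is not the fitting procedure or software actually used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(samples, labels, rate=0.1, epochs=2000):
    """Fit (bias, weight_cvw, weight_rp) by plain batch gradient descent."""
    weights = [0.0, 0.0, 0.0]
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (cvw, rp), label in zip(samples, labels):
            error = sigmoid(weights[0] + weights[1] * cvw + weights[2] * rp) - label
            grad[0] += error
            grad[1] += error * cvw
            grad[2] += error * rp
        weights = [w - rate * g / n for w, g in zip(weights, grad)]
    return weights

def predict(weights, sample):
    """Return 1 (both partners conserved) or 0 (not conserved)."""
    cvw, rp = sample
    return 1 if sigmoid(weights[0] + weights[1] * cvw + weights[2] * rp) >= 0.5 else 0

def leave_one_out_accuracy(samples, labels):
    """Refit the model with each interaction held out once and score the held-out call."""
    hits = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += predict(fit_logistic(train_x, train_y), samples[i]) == labels[i]
    return hits / len(samples)

# Hypothetical toy training data: (Cvw, rP) per interaction, label 1 = conserved pair.
toy_samples = [(3.0, 0.8), (3.2, 0.6), (3.4, 0.9), (3.6, 0.7),
               (0.2, -0.1), (0.4, 0.1), (0.6, 0.0), (0.8, -0.3)]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_model = fit_logistic(toy_samples, toy_labels)
toy_accuracy = leave_one_out_accuracy(toy_samples, toy_labels)
```

With real data the two features would be computed for every DIP interaction, and the fitted model would then be applied to interactions outside the training set.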

Modular structure
         To determine functional modules in the obtained network, we utilize the Markov
Cluster algorithm (MCL) (Enright et al. 2003), which makes use of the topological fact
that nodes which are embedded in well inter-connected parts of a network share most of
their links with nodes of the same cluster, while a small fraction of links connect remote
clusters. Therefore, we expect that a random walker will predominantly travel within a
cluster and jump to other ones only sporadically. Mathematically, we construct a k x k
matrix M, where Mij = wij, and wij is the weight of the interaction between i
and j. For our case, we applied the function wij = 1 + rij, where rij is the correlation
coefficient of interaction ij, to avoid the occurrence of negative weights. Once normalized, this adjacency
matrix is a stochastic matrix; each entry gives the probability that
a random walker beginning at node i will take edge ij to neighbor j. Alternately,
we (i) expand by matrix multiplication (i.e. matrix squaring) and (ii) renormalize by an
inflation procedure, resulting again in a stochastic matrix (see Methods). This process of
alternating expansion and inflation is repeated until the resulting stochastic matrix takes
the form of a doubly idempotent matrix, i.e. it does not change with further
inflation/expansion cycles, leading to a final matrix composed of several connected
components, i.e. the sought clusters. The MCL algorithm has one tunable parameter, the
inflation exponent r. To assess the clustering’s quality, we
applied a recently introduced functional modulation measure (Marcotte :04)[lit].
Essentially, in each module obtained from the Markov clustering procedure, we count
the occurrence of protein pairs that share the same annotation in the Gene Ontology
(GO), a fraction that is balanced by the size of each cluster (see Methods).
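
The expansion and inflation cycle described above can be sketched in a few lines. This is a minimal illustration, not the MCL program itself: the added self-loops, the convergence tolerance, the threshold for treating an entry as nonzero and the toy correlations are assumptions made for the example, and columns are normalized so that each column holds the transition probabilities out of one node.

```python
def column_normalize(matrix):
    """Scale every column to sum to 1 so that it holds transition probabilities."""
    size = len(matrix)
    sums = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(size)]
            for i in range(size)]

def expand(matrix):
    """Expansion: square the matrix, i.e. follow two-step random walks."""
    size = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(size))
             for j in range(size)] for i in range(size)]

def inflate(matrix, inflation):
    """Inflation: raise every entry to the inflation exponent, then renormalize."""
    return column_normalize([[x ** inflation for x in row] for row in matrix])

def markov_cluster(n_nodes, correlations, inflation=2.0, max_rounds=100, tol=1e-9):
    """correlations maps (i, j) -> r_ij; edge weights are w_ij = 1 + r_ij."""
    m = [[0.0] * n_nodes for _ in range(n_nodes)]
    for (i, j), r_ij in correlations.items():
        m[i][j] = m[j][i] = 1.0 + r_ij
    for i in range(n_nodes):
        m[i][i] = 1.0  # self-loop: an assumption, common in MCL implementations
    m = column_normalize(m)
    for _ in range(max_rounds):
        updated = inflate(expand(m), inflation)
        change = max(abs(a - b) for row_a, row_b in zip(updated, m)
                     for a, b in zip(row_a, row_b))
        m = updated
        if change < tol:
            break
    # Clusters are the connected components of the nonzero pattern of the final matrix.
    parent = list(range(n_nodes))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i in range(n_nodes):
        for j in range(n_nodes):
            if m[i][j] > 1e-6:
                parent[find(i)] = find(j)
    groups = {}
    for node in range(n_nodes):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())

# Hypothetical toy input: a co-expressed triangle (0, 1, 2) and a separate pair (3, 4).
toy_correlations = {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.85, (3, 4): 0.7}
toy_clusters = markov_cluster(5, toy_correlations)
```

Larger inflation values generally produce finer clusters, which is why the exponent is the one parameter worth tuning against the functional measure.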

We find 134 clusters, many of them as disconnected components of the underlying
network. As shown in Fig. 2, most clusters are rather small, while a minority of clusters is
larger in size, indicating the presence of a power-law shaped tail in the frequency
distribution of cluster sizes (data not shown). Notably, we find a variety of larger
clusters that represent functionally consistent accumulations of nodes, as indicated by
shades. Predominantly, we find proteasome components, ribonucleoproteins such as
snRNPs, a large group of translation initiation factors and related proteins, exosome and
spliceosome components, replication factor C, DNA related functions such as DNA
polymerase and helicase subunits, ribosomal proteins and translation factor subunits.
Nearly all of the identified clusters display a relatively high degree of co-expression (Fig.
2), where the co-expression patterns of intramodular links closely reflect the nature of
underlying clusters. For example, the cluster that harbors proteins that play a dominant
role in translation shows a clear tendency for its interactions to be active at distinct points
in the cell cycle. This observation supports earlier findings that varying environmental or
cellular conditions imply changes in the activity of protein-protein interactions (Han et al.
2004), regulation webs (Luscombe et al. 2004), and metabolic paths (Almaas et al. 2004).
We find similar patterns for replication factors and proteins that are involved in the
exosome and chromosome condensation. From a systemic point of view, our network
representation allows us to recover the stage-specific expression of protein complex
constituents. For example, the subunit complements of ATP-synthase and RNA
polymerase are necessarily co-expressed to ensure functional protein complexes. In
contrast, translation comprises a series of distinct protein activities, operations that
occur sequentially. So, the clusters we found in this evolutionary core network either
associate with large protein complexes that consist of simultaneously transcribed,
physically interacting proteins (e.g., exosome) or proteins that belong to a spectrum of
complex multi-component processes (e.g., transcription and translation).

Special clusters of the Interactome Implicated in Control of Gene Expression
        In metazoan organisms transcription factors play a dominant role in the control of
gene expression. A lack of many identifiable transcription factors in the Plasmodium
genome is a conundrum in light of the apparent tight regulation of P. falciparum gene
expression, which is coordinated through a continuous cycle of development (Blair et al.
2002a; Blair et al. 2002b; Le Roch et al. 2002; Bozdech et al. 2003a; Le Roch et al.
2003). Similar to higher eukaryotes, initial steps of gene activation occur through
chromatin remodeling by SWI/SNF complexes and histone acetylation (Aravind et al.
2003; Duraisingh et al. 2005; Freitas-Junior et al. 2005; Ralph et al. 2005), followed by
promoter recognition mediated through a TATA binding protein ortholog. Although it is
possible that Plasmodium organisms do possess elaborate sets of transcription factors that
simply have not been identified because they lack known orthologs, it is unlikely. In the
absence of regulators of initiation of transcription after chromatin remodeling, previous
studies have argued that post-transcriptional regulation may play the dominant role in
controlling gene expression in malaria parasites (Carlton et al. 2002; Gardner et al.
2002b; Gardner et al. 2002a). The exosome is one of the most important protein
complexes for post-transcriptional RNA processing in eukaryotic cells (Butler 2002;
Haile et al. 2003; Raijmakers et al. 2004). In our elucidated network, the exosome scores
high as a cohesive cluster and is identified as a prime component of the P. falciparum
interactome (Fig. 2).
        The components of the exosome include 3’→5’ exonuclease functions in both the
nucleus and in the cytoplasm, along with associated proteins that coordinate exonuclease
activity for specific functions in each of the cellular compartments (Murase et al. 1993;
Koonin et al. 2001; Butler 2002; Mukherjee et al. 2002; Raijmakers et al. 2002; Haile et
al. 2003; Lehner and Sanderson 2004). Functions in the nucleus include trimming the 3’
ends of the rRNA subunits and mRNA surveillance for incorrect splicing or
polyadenylation. RRP6 (PF14_0473) and Mtr4 (PFF0100w; this identifier no longer exists)
are exosome factors necessary to regulate this nuclear-specific processing and
degradation of rRNA and pre-mRNA, respectively. Chaperone proteins that facilitate
movement through nuclear pores control access to the nucleus for the components of the
exosome and most other nuclear proteins, which may be more important in organisms
like P. falciparum that maintain a nuclear envelope throughout their development cycle.
Reflecting this critical role in regulating parasite growth, importin alpha (PF08_0087) is
one of the most highly connected hubs in this network. This protein is the primary
chaperone protein for controlling protein import into the nucleus in association with
importin beta (PF08_0069), RAN (PF11_0183) and NTF2 (PF14_0122), all present in this
network (Fig. 2).
        The unexpectedly high number of P. falciparum proteins with RNA-recognition
motifs (Aravind et al. 2003) further supports post-transcriptional processing as critical for
controlling gene expression. Two surveillance mechanisms, nonsense mediated decay
(NMD) and AU-rich elements (ARE), are generally important for controlling mRNA
stability in the cytoplasm of eukaryotes (Wilusz et al. 2001; Lehner and Sanderson 2004).
NMD is a general decay mechanism that targets aberrant transcripts that are
incorrectly spliced or lack a stop codon, and AREs are gene-specific cis regulatory
elements typically important in stage-specific degradation or stabilization of mRNA.
Only recently has evidence emerged that AREs are potentially important in regulating
gene expression in Plasmodium (Hall et al. 2005), although there is strong evidence that
AREs have an integral role regulating gene expression in other parasitic protozoa (Haile et
al. 2003). In most eukaryotic cells, mRNA degradation begins with 3’ deadenylation
followed by 5’ decapping; degradation is then completed by either 5’→3’ (yeast) or
3’→5’ (metazoans) exonucleolytic pathways (Wilusz et al. 2001). It is not clear how
mRNA degradation proceeds in Plasmodium, since there is no clear homologue to yeast
DCP (pfam06058). Based on our network interactions, the exosome 3’→5’ pathway
emerges as a major exonucleolytic decay mechanism, similar to higher eukaryotes.
Additional RNA decay pathways, such as the LSM complex, may carry out 5’→3’
degradation, although it would be unusual for such a pathway to act independently of DCP. The relative
lack of the typical cellular components attacking the 5’ end of mRNA, together with the retained
3’ exonuclease activity and nonsense mediated decay mechanisms, suggests that
regulation of mRNA stability through poly-A nuclease complexes has the dominant role
in controlling mRNA degradation and gene expression.
         Exonuclease activities are likely to be coordinated through stage-specific
recognition of AU-rich elements in the 3’UTR, such as suggested for gametocyte-specific
gene expression (Golightly et al. 2000; Cann et al. 2004; Shue et al. 2004; Hall et al.
2005). This mechanism is recognized in a number of organisms as an important fast-
response mechanism regulating mRNA degradation or stability in a stage-specific manner
through recognition of RNA binding proteins (Mukherjee et al. 2002; Vasudevan et al.
2002; Duttagupta et al. 2003; Haile et al. 2003). Such coordinated functions of RNA
degradation and protein degradation coupled with translation and transcription,
respectively, reflect the evolutionarily conserved nature of the proteasome-exosome
complexes (Koonin et al. 2001) and corroborate the highly linked hubs in the elucidated
P. falciparum network. When ARE-containing transcripts are stabilized by ARE-binding
proteins, such as HuR, up-regulation of a ubiquitin-dependent proteasome mechanism
leads to rapid destabilization of the ARE-mRNA (Laroia et al. 2002). Although this
mechanism is still poorly characterized, other ARE-binding proteins can exert the
opposite effect (e.g., AUF1) and their abundance is regulated by ubiquitin-dependent
proteasome degradation (D'Orso and Frasch 2002; Moraes et al. 2003; Donnini et al.
2004; Mawji et al. 2004). Based on the prominent role of the proteasome-exosome complexes
in the P. falciparum interactome, we propose that this mechanism is a ubiquitous
regulator of gene expression and not just important in sexual stage gene regulation. For
example, a consensus AU-rich element (WWAUUUAUUUAWW) is evident in the
transcript for EBA175, a merozoite protein whose expression is tightly controlled (Blair
et al. 2002b; Bozdech et al. 2003b; Le Roch et al. 2003). Expressed at high levels only at
the end of intraerythrocytic development, the transcripts of eba175 rapidly disappear
once this stage completes development. Consistent with this interpretation of the
exosome-proteasome in regulating gene expression, AREs also are considered to be
important elements in governing gene expression in the Kinetoplastida, an evolutionarily
distant family of parasitic protozoa (Estevez et al. 2001; D'Orso and Frasch 2002;
Milone et al. 2002; Haile et al. 2003).

Characterisation of Nodes
        As an outcome of the algorithm, the final partition of clusters also allows a
characterization of nodes with respect to their roles in connecting and/or facilitating the
integrity of the network's modules. The intra-modular degree Z of a node i reflects its
well-connectedness among nodes in the same module. In Fig. 3a, we observe that
proteins of the inferred protein interaction network in Plasmodium tend to increase their
Z with ascending degree k. We utilize these measures for a heuristic classification
according to their placement in the k-Z space. As such, we define nodes that have at least
5 neighbors as hubs (Han et al. 2004). In particular, we refine this definition by labeling
hubs with a Z that scores above 0 as party hubs because most of a hub's links accumulate
in its given cluster. In contrast, the remainder represents the set of date hubs, reflecting
these hubs' propensity to connect different clusters.
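
The heuristic classification in the k-Z space can be sketched as follows. The sketch assumes that Z is the within-module degree z-score in the sense of Guimera and Nunes Amaral (2005), i.e. a node's number of links inside its own module, standardized against the other nodes of that module; the text above does not spell out the formula, so this definition, the function names and the toy modules are assumptions for illustration.

```python
from statistics import mean, pstdev

def within_module_z(adjacency, module_of):
    """Return {node: z-score of its number of links inside its own module}."""
    inside = {node: sum(1 for nb in neighbors if module_of[nb] == module_of[node])
              for node, neighbors in adjacency.items()}
    per_module = {}
    for node, module in module_of.items():
        per_module.setdefault(module, []).append(inside[node])
    z = {}
    for node, module in module_of.items():
        spread = pstdev(per_module[module])
        z[node] = (inside[node] - mean(per_module[module])) / spread if spread > 0 else 0.0
    return z

def classify_nodes(adjacency, module_of, min_degree=5):
    """Hubs have at least min_degree neighbors; Z > 0 marks party hubs, else date hubs."""
    z = within_module_z(adjacency, module_of)
    roles = {}
    for node, neighbors in adjacency.items():
        if len(neighbors) < min_degree:
            roles[node] = "non-hub"
        elif z[node] > 0:
            roles[node] = "party hub"
        else:
            roles[node] = "date hub"
    return roles

# Hypothetical toy network: "a" is the center of module A, "d" sits in module B
# but sends most of its links into module A.
toy_edges = [("a", "a1"), ("a", "a2"), ("a", "a3"), ("a", "a4"), ("a", "a5"),
             ("b1", "b2"), ("b1", "b3"), ("b2", "b3"), ("d", "b1"),
             ("d", "a1"), ("d", "a2"), ("d", "a3"), ("d", "a4")]
toy_adjacency = {}
for u, v in toy_edges:
    toy_adjacency.setdefault(u, set()).add(v)
    toy_adjacency.setdefault(v, set()).add(u)
toy_modules = {n: ("A" if n.startswith("a") else "B") for n in toy_adjacency}
toy_roles = classify_nodes(toy_adjacency, toy_modules)
```

Here "a" comes out as a party hub and "d" as a date hub, mirroring the distinction between hubs that concentrate their links in one cluster and hubs that bridge clusters.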

         Determination of the mean co-expression level of all interactions in which a
given node participates (Fig. 3b) suggests that hubs tend to be largely co-expressed with
their interaction partners, as indicated by a strong signal around rP = 0.9. However, for
date hubs, we also find a considerably strong signal around 0.5. Furthermore, we
surprisingly find further peaks around -0.5 and -0.7 for party hubs, which indicate the
presence of an inhomogeneity in the expressed neighborhood. Recently, Han et al.
suggested a distinction of hubs according to the level of co-expression with their
interaction partners (Han et al. 2004). A hub that was predominantly co-expressed with
its neighbors was called a party hub. In contrast, date hubs have an inhomogeneous
expression profile, indicating that they are predominantly active with their neighbors at
different times (Han et al. 2004). In quantitative terms, nodes were distinguished
according to the averaged co-expression correlations with their neighboring nodes,
allowing them to find peaks around 0.5 (party hubs) and 0.1 (date hubs) in frequency
distributions. We do not observe such properties of date and party hubs in our network.
In fact, we find that the transition from a party to a date hub in terms of a node’s
propensity to participate mainly in a cluster and vice versa is surprisingly smooth. In the
same way, we find slight differences in the distribution of mean fractions of shared GO
annotations. While date hubs tend to have a more homogeneous distribution of functional
neighbors we observe a slight deviation toward functionally inhomogeneous interaction
partners of hubs (inset, Fig. 3b), an observation that meets our expectations. Specifically,
we argue that party hubs share similar functionality with their neighborhood since their
simultaneous activity indicates the presence of a functional cluster. In turn, date hubs
whose neighbors have a scattered activity profile might connect and integrate different
functions in a timeframe. To underline this point, we find in a list of the 50 most highly
connected nodes (Table 1) numerous regulatory subunits of the 26S proteasome, which
have been classified as party hubs. These are generally coexpressed, suggesting that the
functionally homogeneous apparatus that facilitates protein degradation acts as a temporally and
spatially integrated unit. In contrast, we find translation initiation factor EIF-2b
(PF08_0009), which is largely anti-correlated in expression with its neighbors (rP = -0.4), though
classified as a party hub, an observation supported by reports that translation factor
related proteins are active in many aspects of the cell cycle. These and other hubs can be
found in Table 1.
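
The per-node quantity discussed in this section, the mean co-expression of a hub with its interaction partners, can be written down compactly. The sketch below assumes that each protein has an expression time series of equal length and that rP is the ordinary Pearson correlation between two such series; the protein names and expression values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

def mean_neighbor_coexpression(node, adjacency, profiles):
    """Average rP between a node and each of its interaction partners."""
    values = [pearson(profiles[node], profiles[nb]) for nb in adjacency[node]]
    return sum(values) / len(values)

# Hypothetical toy data: one partner rises with the hub, the other falls.
toy_profiles = {
    "hub": [1.0, 2.0, 3.0, 4.0],
    "partner_up": [2.0, 4.0, 6.0, 8.0],
    "partner_down": [4.0, 3.0, 2.0, 1.0],
}
toy_links = {"hub": ["partner_up", "partner_down"]}
toy_mean_rp = mean_neighbor_coexpression("hub", toy_links, toy_profiles)
```

A mean near zero, as in this toy case, can hide strongly positive and strongly negative partners, which is one reason the full distribution in Fig. 3b is more informative than the average alone.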

Time-Dependent Protein Interactions
        Malaria parasites’ transcriptional activity has the superficial appearance of a
continuous cascade that masks the coordinated co-expression of functionally linked
protein interactions. We have deduced the pattern of conserved co-expression in our
network of core interactions to discover that the subnet modularity is time-dependent.
Highly interactive proteins representing major hubs in the parasite’s metabolic activity
vary with parasite development. Plasmodium proteins sorted according to their maximum
expression in the parasite’s cell cycle (Fig. 4) group according to their appearance in the
three major stages of asexual development in the erythrocytic stage: ring, trophozoite,
and schizont. Activities of the proteins of the core network remain cohesive within
developmental stages, but somewhat surprisingly the separate clusters are highly
segregated in time during development. As the parasite progresses through its
intraerythrocytic growth cycle the flow of metabolic activity changes course and alters
protein interactions within the network. Nearly all activity of the core network occurs
in the ring or the schizont phase while, remarkably, almost no expression activity is
observed in the trophozoite phase, suggesting that genes expressed during this phase of the
Plasmodium life cycle evolved as a result of evolutionarily recent adaptations. In the ring
stage parasites, the major clusters are involved with gene expression, metabolism and
cellular transport. In the late-stages, the dominant hubs of the cellular network are the
components of the proteasome and chromosome condensation, reflecting the parasite’s
need to discard the trash and pack its genes for travel. The absence of any major
evolutionarily conserved hubs during the trophozoite developmental stage indicates that
Plasmodium genes expressed during this time (i.e., those lacking an orthologue in yeast) encode
metabolic activities that evolved as the organism adopted its parasitic lifestyle.
This provides an important evolutionary insight, indicating that protein clusters involved
in inter-cluster interactions tend to co-evolve.
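
Sorting proteins by the time of their maximum expression, as done for Fig. 4, can be sketched as below. The stage windows (in hours of the intraerythrocytic cycle), the sampling times and the expression values are purely illustrative assumptions; the actual stage boundaries should be taken from the expression data set used.

```python
# Hypothetical stage windows in hours; assumptions for illustration only.
STAGE_WINDOWS = {"ring": (0, 18), "trophozoite": (18, 32), "schizont": (32, 48)}

def peak_stage(profile, hours):
    """Return the stage whose window contains the time of maximum expression."""
    peak_index = max(range(len(profile)), key=lambda i: profile[i])
    peak_hour = hours[peak_index]
    for stage, (start, end) in STAGE_WINDOWS.items():
        if start <= peak_hour < end:
            return stage
    raise ValueError("peak time lies outside the assumed 48 h cycle")

def group_by_stage(profiles, hours):
    """Group protein names by the stage in which their expression peaks."""
    groups = {stage: [] for stage in STAGE_WINDOWS}
    for protein, profile in profiles.items():
        groups[peak_stage(profile, hours)].append(protein)
    return groups

# Hypothetical toy profiles sampled every 8 hours.
toy_hours = [0, 8, 16, 24, 32, 40]
toy_expression = {
    "protein_a": [5, 9, 4, 2, 1, 1],
    "protein_b": [1, 1, 2, 3, 8, 6],
    "protein_c": [1, 2, 3, 7, 4, 2],
}
toy_groups = group_by_stage(toy_expression, toy_hours)
```

Overlaying such stage labels on the inferred clusters is what allows the stage-wise cohesion of the core network to be read off.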

        Here we provide the first biological characterization of a protein interaction
network of Plasmodium that is completely inferred from the model organism yeast
without knowledge of any experimental observation of the interactions in question.
Although our process of obtaining interactions is a theoretical one, our qualitative
analysis indicates that interactions among orthologous proteins are potentially
conserved as well. Our approach recovers evolutionarily and transcriptionally relevant
entities, underlining the prevalence of organizational structures that persisted over
evolutionary time. It is important to note that the absence of an interaction
from our inferred network does not necessarily imply its absence in the parasite. Expected-but-missing
parts of modules or corresponding module interactions may represent unique
divergences that distinguish the network structure of the malaria parasites from yeast.
Such divergences may be especially interesting since it has been shown that the rate of
evolution may be focused at the connections between modules (Guimera and Nunes
Amaral 2005). Therefore, even though the modules themselves are highly conserved
units, unique Plasmodium-specific links between modules may well highlight critical
features of the parasite that can be exploited as therapeutic targets. We expect that
further analysis of the inferred network in Plasmodium will uncover biologically
significant information about novel interactions between the conserved clusters. Another
type of novelty in the parasite interactome will involve absence of conserved clusters and
their interactions. This is seen where known yeast partners are missing their orthologous
partners in the Plasmodium, which represent either a loss of function or acquisition of
novel cluster evolved for the metabolic needs of the malaria parasite.
        Several extensions of our method can be envisioned to benefit from the
proliferation of whole-genome databases that continuously enrich the power of network
inference. Our initial elucidated network for malaria parasites can be strengthened by
deeper searches for orthologs using all Plasmodium species to query the non-redundant
database. Similarly, a concatenated network comprising all known (and validated)
interactions across the tree of life will be an important tool to recognize distant
phylogenetic relationships with yeast and other organisms for which refined network data
exist. More functional relationships can be elucidated within the dimensions of malaria
genome expression by profiling transcription at much higher resolution under a variety
of conditions of cellular life (e.g. perturbations and strain-specific variants). Eventually
we expect to construct a comprehensive phylogenetic scaffold of networks using the
methods developed in this project onto which new protein-protein interaction data can be
placed. Since there is still considerable noise in the protein interaction data available
from current technologies, such as from yeast two-hybrid determinations, a universal
scaffold will be a powerful tool to interpret proposed interaction data.

Materials and Methods
        Protein Interactions: As a reliable source of protein interactions we chose the
manually curated set of yeast interactions presented in Han et al. (2004), comprising
1,330 nodes and 2,448 interactions. Since this network is rather small,
we also utilized yeast protein interaction data from the DIP database (Xenarios et al.,
2002). The current version contains 3,833 proteins involved in 11,942 interactions
derived from combined, non-overlapping data, which are mostly obtained from
high-throughput application of the two-hybrid method.

        Orthologous Data: Utilizing all-versus-all BLASTP searches determined by the
InParanoid script (Remm et al. 2001) in protein sets of two species, sequence pairs with
mutually best scores were selected as central orthologous pairs. Proteins of both species
showing an elevated degree of homology were clustered around these central pairs, a
procedure that forms orthologous groups. The quality of the clustering was then assessed
by a standard bootstrap procedure. The central orthologue sequence pair that provides a
confidence level of 100% was considered as the real orthologous relationship while
proteins with a lower level of confidence were considered as their in-paralogues. In our
study, we selected only the central orthologue sequence pairs of each group, resulting in
856 proteins with orthologues in P. falciparum.
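
The selection of mutually best-scoring sequence pairs can be illustrated with a minimal Python sketch. It is not the InParanoid script itself: it covers only the reciprocal-best-hit step, leaves out the clustering of in-paralogues and the bootstrap, and assumes the BLASTP scores have already been parsed into nested dictionaries. The function name and the input layout are hypothetical.

```python
def reciprocal_best_hits(scores_ab, scores_ba):
    """Return sorted (a, b) pairs that are each other's best-scoring hit.

    scores_ab[a][b] is the score of query a (species A) against subject b
    (species B); scores_ba holds the reverse searches. Both layouts are
    assumptions made for this illustration.
    """
    best_ab = {a: max(hits, key=hits.get) for a, hits in scores_ab.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in scores_ba.items() if hits}
    # Keep a pair only if the best hit of a is b AND the best hit of b is a.
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

Each returned pair would play the role of a central orthologous pair; ties between equally scoring hits are resolved arbitrarily in this sketch.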
        Orthologous Excess Retention: We grouped all proteins into bins of
logarithmically increasing $C$, the hypergeometric clustering coefficient of the
interactions they are involved in. For each group of $N_C$ proteins, the fraction of
proteins that also have an ortholog is defined as $e_{C,o} = n_{C,o}/N_C$, where $n_{C,o}$
is the number of proteins in the group that have an ortholog. In the absence of a
correlation between evolutionary conservation of interacting proteins and their position
in the network, $e_{C,o}$ has the general $C$-independent value $e_o = n_o/N$, where $n_o$
is the total number of yeast proteins having an ortholog, and $N$ is the total number of
yeast proteins in the underlying network. Thus, we define the clustering-dependent excess
retention of such proteins as $ER_{C,o} = e_{C,o}/e_o$, which has the $C$-independent value
$ER_{C,o} = 1$ for a random distribution of orthologous proteins (Wuchty 2004).
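
As a rough illustration of this definition, the following Python sketch computes the excess retention per logarithmic bin. It assumes that each protein has already been assigned a single positive value of C and a flag indicating whether it has an ortholog; the function name, the input dictionaries and the bin width are hypothetical choices, not those of the original analysis.

```python
import math
from collections import defaultdict

def excess_retention(c_values, has_ortholog, bins_per_decade=1):
    """Map each logarithmic bin of C to its excess retention ER = e_C / e_o.

    c_values: protein -> clustering coefficient C (assumed input layout)
    has_ortholog: protein -> True if the protein has an ortholog
    """
    n_total = len(c_values)
    n_orth = sum(1 for p in c_values if has_ortholog.get(p, False))
    if n_orth == 0:
        raise ValueError("no protein has an ortholog, so e_o is zero")
    e_o = n_orth / n_total  # background fraction e_o = n_o / N

    bins = defaultdict(list)
    for protein, c in c_values.items():
        if c > 0:  # logarithmic binning needs positive C
            bins[math.floor(math.log10(c) * bins_per_decade)].append(protein)

    result = {}
    for index, proteins in bins.items():
        e_c = sum(1 for p in proteins if has_ortholog.get(p, False)) / len(proteins)
        result[index] = e_c / e_o
    return result
```

Values above 1 in a bin indicate that proteins with that level of clustering retain orthologs more often than expected at random.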

        Hypergeometric Clustering Coefficient: Recently, a network topology based
approach uncovered a remarkable correlation between enhanced quality of protein
interactions and the degree of clustering of their immediate network neighborhood
(Goldberg and Roth 2003). Considering a protein-protein interaction network with $N$
nodes, we define the hypergeometric clustering coefficient as

$$C_{vw} = -\log \sum_{i=|N(v) \cap N(w)|}^{\min(|N(v)|,\,|N(w)|)} \frac{\binom{|N(v)|}{i} \binom{N-|N(v)|}{|N(w)|-i}}{\binom{N}{|N(w)|}},$$

where $N(x)$ represents the neighborhood of a vertex $x$ and $N$ is the total number of
proteins in the network. Given fixed neighborhood sizes $|N(v)|$ and $|N(w)|$ of proteins
$v$ and $w$, the hypergeometric clustering coefficient increases with elevated overlap
between the proteins' neighborhoods. Provided that the neighborhoods are independent, the
summation can be interpreted as a P-value, reflecting the probability of obtaining a
number of mutual neighbors between proteins $v$ and $w$ at or above the observed number
by chance (Goldberg and Roth 2003).
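
A minimal Python sketch of this coefficient is given below for illustration. It follows the hypergeometric tail sum of Goldberg and Roth (2003), that is, the probability of observing at least the actual number of shared neighbors; the use of the natural logarithm and the function name are assumptions of the sketch.

```python
import math

def hypergeometric_cc(neighbors_v, neighbors_w, n_total):
    """Negative log of the hypergeometric tail probability for shared neighbors.

    neighbors_v, neighbors_w: sets of neighbors of proteins v and w
    n_total: total number of proteins N in the network
    """
    nv, nw = len(neighbors_v), len(neighbors_w)
    shared = len(neighbors_v & neighbors_w)
    denominator = math.comb(n_total, nw)
    # Sum the probabilities of seeing `shared` or more mutual neighbors by chance.
    tail = sum(
        math.comb(nv, i) * math.comb(n_total - nv, nw - i)
        for i in range(shared, min(nv, nw) + 1)
    )
    return -math.log(tail / denominator)
```

For two proteins with three neighbors each in a ten-protein network, sharing two neighbors gives a tail probability of 22/120, whereas sharing all three gives 1/120 and therefore a larger coefficient.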

        Intramodular degree: While the degree $k$ alone is a very local measure of a
node's connectivity, the role of a node in a module can be assessed by its intramodular
degree. If $\kappa_i$ is the number of intramodular links of node $i$, while
$\bar{\kappa}_s$ and $\sigma_{\kappa_s}$ are the mean and standard deviation of the number
of intramodular links in module $s$, we define the intramodular degree as the Z-score of
$\kappa_i$:

$$Z_i = \frac{\kappa_i - \bar{\kappa}_s}{\sigma_{\kappa_s}},$$

reflecting how well connected a node is to the other nodes of its module (Guimera and
Nunes Amaral 2005).
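
The intramodular degree can be sketched in a few lines of Python, shown here for illustration only. The sketch assumes an undirected edge list and a dictionary assigning every node to a module; it uses the population standard deviation and returns 0 for modules without any spread, both of which are choices made for this example.

```python
from statistics import mean, pstdev

def intramodular_z(edges, module_of):
    """Z-score of each node's number of links inside its own module.

    edges: iterable of (u, v) pairs of an undirected network
    module_of: node -> module label (assumed input layout)
    """
    kappa = {node: 0 for node in module_of}
    for u, v in edges:
        if module_of[u] == module_of[v]:  # count intramodular links only
            kappa[u] += 1
            kappa[v] += 1

    by_module = {}
    for node, module in module_of.items():
        by_module.setdefault(module, []).append(kappa[node])

    z = {}
    for node, module in module_of.items():
        spread = pstdev(by_module[module])
        centre = mean(by_module[module])
        z[node] = (kappa[node] - centre) / spread if spread > 0 else 0.0
    return z
```

In a module where one node links to three otherwise unconnected members, that node receives a positive score and the peripheral members receive negative ones.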

         Expression coefficients: Genes with similar expression profiles are likely
to encode interacting proteins. For P. falciparum, we utilized gene expression data,
compiling 4,318 genes over 48 timepoints (Bozdech et al. 2003a). As a gene similarity
metric we calculated Pearson’s correlation coefficient for every protein interaction. For
yeast, we downloaded data sets of 1,051 different experiments from the Stanford
Microarray database (ftp://), and calculated Pearson’s correlation coefficient for every
protein interaction, accordingly.
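
For illustration, the co-expression score of each interaction can be computed as in the following Python sketch. The profile dictionary and the function names are hypothetical, interactions lacking a profile for either partner are skipped, and a constant profile is given a correlation of 0, which is a convention of this sketch rather than of the original analysis.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined for a flat profile; 0 by convention here
    return sxy / math.sqrt(sxx * syy)

def coexpression(interactions, profiles):
    """Map each interaction (u, v) to the correlation of its two profiles."""
    return {
        (u, v): pearson(profiles[u], profiles[v])
        for u, v in interactions
        if u in profiles and v in profiles
    }
```

With the time course described above, each profile would hold one expression value per timepoint.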

        Cell-cycle specific expression data: By determining their maximum expression
in the parasite's cell cycle, we grouped the proteins of P. falciparum according to their
appearance in the three major stages of asexual development in the erythrocytic stage:
ring, trophozoite, and schizont (Bozdech et al. 2003a).

        Logistic regression: In order to estimate an interaction's reliability, we
employed a logistic regression model. According to the logistic regression, the
probability of a true interaction $T_{vw}$ given the two input variables, hypergeometric
clustering coefficient $x_1 = C_{vw}$ and co-expression correlation $x_2 = r_P$, with
$X = (x_1, x_2)$, is

$$\Pr(T_{vw} \mid X) = \frac{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)},$$

where $\beta_n$ are the parameters of the distribution. Given training data, we optimized
the distribution parameters by maximizing the likelihood of the data. Here, we applied the
corresponding routines of the Biopython package, where we randomly chose 500 high-quality
protein interactions from Han et al. that were fully conserved in Plasmodium as positive
examples. Because of the vast abundance of false positives, we randomly selected 500
interactions from the DIP data set as negative examples. We evaluated the prediction
accuracy of our model with a leave-one-out analysis, in which the model is recalculated
from the training data after removing the interaction to be predicted, and achieved an
accuracy of 70%.
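
The model and its leave-one-out evaluation can be sketched in plain Python as follows. This is an illustrative stand-in and not the Biopython routine used in the study: the likelihood is maximized by simple gradient ascent, and the learning rate, the number of steps and all function names are assumptions of the sketch.

```python
import math

def predict(beta, x1, x2):
    """Pr(T_vw | X) for clustering coefficient x1 and co-expression x2."""
    t = beta[0] + beta[1] * x1 + beta[2] * x2
    if t >= 0:  # numerically stable form of exp(t) / (1 + exp(t))
        return 1.0 / (1.0 + math.exp(-t))
    return math.exp(t) / (1.0 + math.exp(t))

def fit(samples, labels, rate=0.1, steps=2000):
    """Maximize the log-likelihood by gradient ascent (assumed optimizer)."""
    beta = [0.0, 0.0, 0.0]
    n = len(samples)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), y in zip(samples, labels):
            err = y - predict(beta, x1, x2)
            grad[0] += err
            grad[1] += err * x1
            grad[2] += err * x2
        beta = [b + rate * g / n for b, g in zip(beta, grad)]
    return beta

def leave_one_out_accuracy(samples, labels):
    """Refit without each interaction in turn and predict the one left out."""
    hits = 0
    for i in range(len(samples)):
        beta = fit(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:])
        predicted = predict(beta, *samples[i]) >= 0.5
        if predicted == bool(labels[i]):
            hits += 1
    return hits / len(samples)
```

Here `samples` would be a list of (C, r) pairs and `labels` a list of 1 for positive and 0 for negative examples.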

        Markov-Cluster-Algorithm (MCL): In order to uncover the community
structure of the inferred core protein interaction network of Plasmodium, we utilized a
Markov-Cluster-Algorithm (MCL) (Enright et al.) designed specifically for
computational graph clustering. Topologically, nodes which are embedded in well
interconnected parts of a network share most of their links with nodes of the same cluster,
while a small fraction of links connect remote clusters. Hence, we expect that a random
walker will predominantly travel within a cluster and jump to other ones only sporadically.
        Mathematically, an undirected network consisting of $k$ nodes can be represented
as a $k \times k$ matrix $M$ with entries $M_{ij} = w_{ij}$, where $w_{ij}$ is the weight
of the interaction between $i$ and $j$. In our case, we applied the function
$w_{ij} = 1 + r_{ij}$, where $r_{ij}$ is the correlation coefficient of interaction $ij$,
in order to avoid the occurrence of negative weights. To ensure convergence of the
algorithm we introduce self-loops on each node, i.e. $M_{ii} = 2$. $M$ turns into a column
stochastic matrix $T$ by normalizing each column sum to unity through the diagonal matrix
$d$, whose entries are $d_{kk} = \sum_i M_{ik}$, giving $T = M d^{-1}$. Thus, the entry
$T_{ij}$ represents the probability for a random walker to jump directly from node $j$ to
node $i$.
        The stochastic matrix $T$ is alternately (i) expanded by matrix multiplication (i.e.
matrix squaring) and (ii) renormalized by an inflation procedure resulting again in a
stochastic matrix. Formally, the inflation operator $\Gamma_r$ is defined as

$$(\Gamma_r T)_{pq} = \frac{(T_{pq})^r}{\sum_{i=1}^{k} (T_{iq})^r}.$$

This process of alternating inflation and expansion is repeated until the resulting
stochastic matrix $T$ takes the form of a doubly idempotent matrix, i.e. it does not change
anymore with further inflation/expansion cycles. The final matrix is composed of several
connected components which take star-like shapes; these are
interpreted as the sought-after clusters.
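
The expansion and inflation cycle can be illustrated with the following pure-Python sketch. It is a simplified reading of the procedure rather than the MCL program itself: it works on dense lists of lists, takes the weights as already computed, and the convergence tolerance, iteration limit and membership threshold are assumptions.

```python
def normalize_columns(m):
    """Rescale every column of a square matrix so that it sums to one."""
    k = len(m)
    sums = [sum(m[i][j] for i in range(k)) for j in range(k)]
    return [[m[i][j] / sums[j] for j in range(k)] for i in range(k)]

def expand(t):
    """Expansion step: square the matrix."""
    k = len(t)
    return [[sum(t[i][x] * t[x][j] for x in range(k)) for j in range(k)]
            for i in range(k)]

def inflate(t, r):
    """Inflation step: raise entries to the power r, then renormalize columns."""
    return normalize_columns([[value ** r for value in row] for row in t])

def mcl(weights, r=2.0, self_loop=2.0, max_iter=100, tol=1e-9):
    """Cluster a weighted undirected network given as a k x k list of lists."""
    k = len(weights)
    m = [[self_loop if i == j else weights[i][j] for j in range(k)]
         for i in range(k)]
    t = normalize_columns(m)
    for _ in range(max_iter):
        updated = inflate(expand(t), r)
        change = max(abs(updated[i][j] - t[i][j])
                     for i in range(k) for j in range(k))
        t = updated
        if change < tol:  # matrix no longer changes between cycles
            break
    # Non-empty rows belong to attractors; their non-zero columns form a cluster.
    clusters = set()
    for i in range(k):
        members = frozenset(j for j in range(k) if t[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters
```

Larger values of `r` sharpen the contrast between strong and weak flows and therefore tend to produce smaller clusters.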
        Modularity of the network: In order to elucidate meaningful partitions of the
network, we applied the MCL algorithm, choosing the inflation parameter r = 1, …, 4 in
steps of 0.25. Evaluating the partitions thus obtained, we defined protein modules by
optimizing the functional coherence and size of the clusters (Marcotte). In particular, the
functional coherence $fc_i$ of cluster $i$ is calculated as the fraction of annotated gene
pairs that share at least one functional annotation, $fp_i$, out of a total of $p_i$
annotated pairs in the $i$th cluster,

$$fc_i = \frac{fp_i}{p_i},$$

a measure that tends to be high for small clusters and diminishes as more proteins are
included. As a source of reliable annotation information we utilized the Gene Ontology
(cite GO). In turn, we balance that trend by maximizing the size of the given clusters,
defining the modulation efficiency $E_M$ as

$$E_M = N^{-1} \sum_{i=1}^{n} fc_i N_i,$$

where $n$ is the number of clusters, $N$ is the total number of proteins and $N_i$ is the
number of proteins in the $i$th cluster. Thus, the partition with the highest modulation
efficiency reflects the best compromise between efficiency of clustering and degree of
functional association between proteins in a cluster.
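
As an illustration, the two quantities can be computed with the short Python sketch below. It assumes that the clusters partition the network, so that N is the sum of the cluster sizes, and that annotations are supplied as sets of terms per protein; a cluster without any annotated pair is given a coherence of 0. These conventions and the function names are assumptions of the sketch.

```python
from itertools import combinations

def functional_coherence(cluster, annotations):
    """Fraction of annotated protein pairs sharing at least one annotation."""
    annotated = [p for p in cluster if annotations.get(p)]
    pairs = list(combinations(annotated, 2))
    if not pairs:
        return 0.0  # no annotated pair in this cluster (convention of the sketch)
    sharing = sum(1 for a, b in pairs if annotations[a] & annotations[b])
    return sharing / len(pairs)

def modulation_efficiency(clusters, annotations):
    """Size-weighted mean functional coherence over all clusters."""
    n_total = sum(len(cluster) for cluster in clusters)
    weighted = sum(functional_coherence(cluster, annotations) * len(cluster)
                   for cluster in clusters)
    return weighted / n_total
```

Given one partition per inflation value, the preferred partition would be the one for which `modulation_efficiency` is largest.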

References
Almaas E, Kovacs B, Vicsek T, Oltvai ZN, Barabasi AL (2004) Global organization of
        metabolic fluxes in the bacterium Escherichia coli. Nature 427(6977): 839-843.
Aravind L, Iyer LM, Wellems TE, Miller LH (2003) Plasmodium biology: genomic
        gleanings. Cell 115(7): 771-785.
Barabasi AL, Oltvai ZN (2004) Network biology: understanding the cell's functional
        organization. Nat Rev Genet 5(2): 101-113.
Blair PL, Kappe SH, Maciel JE, Balu B, Adams JH (2002a) Plasmodium falciparum
        MAEBL is a unique member of the ebl family. Mol Biochem Parasitol 122(1): 35-
Blair PL, Witney A, Haynes JD, Moch JK, Carucci DJ et al. (2002b) Transcripts of
        developmentally regulated Plasmodium falciparum genes quantified by real-time
        RT-PCR. Nucleic Acids Res 30(10): 2224-2231.
Bozdech Z, Llinas M, Pulliam BL, Wong ED, Zhu J et al. (2003a) The Transcriptome of
        the Intraerythrocytic Developmental Cycle of Plasmodium falciparum. PLoS Biol
        1(1): 5.
Bozdech Z, Zhu J, Joachimiak MP, Cohen FE, Pulliam B et al. (2003b) Expression
        profiling of the schizont and trophozoite stages of Plasmodium falciparum with a
        long-oligonucleotide microarray. Genome Biol 4(2): R9.
Butler JS (2002) The yin and yang of the exosome. Trends Cell Biol 12(2): 90-96.
Cann H, Brown SV, Oguariri RM, Golightly LM (2004) 3' UTR signals necessary for
        expression of the Plasmodium gallinaceum ookinete protein, Pgs28, share
        similarities with those of yeast and plants. Mol Biochem Parasitol 137(2): 239-
Carlton JM, Angiuoli SV, Suh BB, Kooij TW, Pertea M et al. (2002) Genome sequence
        and comparative analysis of the model rodent malaria parasite Plasmodium yoelii
        yoelii. Nature 419(6906): 512-519.
D'Orso I, Frasch AC (2002) TcUBP-1, an mRNA destabilizing factor from trypanosomes,
        homodimerizes and interacts with novel AU-rich element- and Poly(A)-binding
        proteins forming a ribonucleoprotein complex. J Biol Chem 277(52): 50520-
Donnini M, Lapucci A, Papucci L, Witort E, Jacquier A et al. (2004) Identification of
        TINO: a new evolutionarily conserved BCL-2 AU-rich element RNA-binding
        protein. J Biol Chem 279(19): 20154-20166.
Duraisingh MT, Voss TS, Marty AJ, Duffy MF, Good RT et al. (2005) Heterochromatin
        silencing and locus repositioning linked to regulation of virulence genes in
        Plasmodium falciparum. Cell 121(1): 13-24.
Duttagupta R, Vasudevan S, Wilusz CJ, Peltz SW (2003) A yeast homologue of Hsp70,
        Ssa1p, regulates turnover of the MFA2 transcript through its AU-rich 3'
        untranslated region. Mol Cell Biol 23(8): 2623-2632.
Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of
        genome-wide expression patterns. Proc Natl Acad Sci U S A 95(25): 14863-
Estevez AM, Kempf T, Clayton C (2001) The exosome of Trypanosoma brucei. Embo J
        20(14): 3831-3839.
Freitas-Junior LH, Hernandez-Rivas R, Ralph SA, Montiel-Condado D, Ruvalcaba-
       Salazar OK et al. (2005) Telomeric heterochromatin propagation and histone
       acetylation control mutually exclusive expression of antigenic variation genes in
       malaria parasites. Cell 121(1): 25-36.
Gardner MJ, Shallom SJ, Carlton JM, Salzberg SL, Nene V et al. (2002a) Sequence of
       Plasmodium falciparum chromosomes 2, 10, 11 and 14. Nature 419(6906): 531-
Gardner MJ, Hall N, Fung E, White O, Berriman M et al. (2002b) Genome sequence of
       the human malaria parasite Plasmodium falciparum. Nature 419(6906): 498-511.
Ge H, Liu Z, Church GM, Vidal M (2001) Correlation between transcriptome and
       interactome mapping data from Saccharomyces cerevisiae. Nat Genet 29(4): 482-
Goldberg DS, Roth FP (2003) Assessing experimentally derived interactions in a small
       world. Proc Natl Acad Sci U S A 100(8): 4372-4376.
Golightly LM, Mbacham W, Daily J, Wirth DF (2000) 3' UTR elements enhance
       expression of Pgs28, an ookinete protein of Plasmodium gallinaceum. Mol
       Biochem Parasitol 105(1): 61-70.
Guimera R, Nunes Amaral LA (2005) Functional cartography of complex metabolic
       networks. Nature 433(7028): 895-900.
Haile S, Estevez AM, Clayton C (2003) A role for the exosome in the in vivo degradation
       of unstable mRNAs. Rna 9(12): 1491-1501.
Hall N, Karras M, Raine JD, Carlton JM, Kooij TW et al. (2005) A comprehensive survey
       of the Plasmodium life cycle by genomic, transcriptomic, and proteomic analyses.
       Science 307(5706): 82-86.
Hall N, Pain A, Berriman M, Churcher C, Harris B et al. (2002) Sequence of
       Plasmodium falciparum chromosomes 1, 3-9 and 13. Nature 419(6906): 527-531.
Han JD, Bertin N, Hao T, Goldberg DS, Berriz GF et al. (2004) Evidence for
       dynamically organized modularity in the yeast protein-protein interaction
       network. Nature 430(6995): 88-93.
Hyman RW, Fung E, Conway A, Kurdi O, Mao J et al. (2002) Sequence of Plasmodium
       falciparum chromosome 12. Nature 419(6906): 534-537.
Jeong H, Mason SP, Barabasi AL, Oltvai ZN (2001) Lethality and centrality in protein
       networks. Nature 411(6833): 41-42.
Koonin EV, Wolf YI, Aravind L (2001) Prediction of the archaeal exosome and its
       connections with the proteasome and the translation and transcription
       machineries by a comparative-genomic approach. Genome Res 11(2): 240-252.
Laroia G, Sarkar B, Schneider RJ (2002) Ubiquitin-dependent mechanism regulates
       rapid turnover of AU-rich cytokine mRNAs. Proc Natl Acad Sci U S A 99(4):
Le Roch KG, Zhou Y, Batalov S, Winzeler EA (2002) Monitoring the chromosome 2
       intraerythrocytic transcriptome of Plasmodium falciparum using oligonucleotide
       arrays. Am J Trop Med Hyg 67(3): 233-243.
Le Roch KG, Zhou Y, Blair PL, Grainger M, Moch JK et al. (2003) Discovery of gene
       function by expression profiling of the malaria parasite life cycle. Science
       301(5639): 1503-1508.
Lehner B, Sanderson CM (2004) A protein interaction framework for human mRNA
       degradation. Genome Res 14(7): 1315-1323.
Li S, Armstrong CM, Bertin N, Ge H, Milstein S et al. (2004) A map of the interactome
        network of the metazoan C. elegans. Science 303(5657): 540-543.
Luscombe NM, Babu MM, Yu H, Snyder M, Teichmann SA et al. (2004) Genomic
        analysis of regulatory network dynamics reveals large topological changes.
        Nature 431(7006): 308-312.
Mawji IA, Robb GB, Tai SC, Marsden PA (2004) Role of the 3'-untranslated region of
        human endothelin-1 in vascular endothelial cells. Contribution to transcript
        lability and the cellular heat shock response. J Biol Chem 279(10): 8655-8667.
Milone J, Wilusz J, Bellofatto V (2002) Identification of mRNA decapping activities and
        an ARE-regulated 3' to 5' exonuclease activity in trypanosome extracts. Nucleic
        Acids Res 30(18): 4040-4050.
Moraes KC, Quaresma AJ, Maehnss K, Kobarg J (2003) Identification and
        characterization of proteins that selectively interact with isoforms of the mRNA
        binding protein AUF1 (hnRNP D). Biol Chem 384(1): 25-37.
Mukherjee D, Gao M, O'Connor JP, Raijmakers R, Pruijn G et al. (2002) The
        mammalian exosome mediates the efficient degradation of mRNAs that contain
        AU-rich elements. Embo J 21(1-2): 165-174.
Murase T, Iwai M, Maede Y (1993) Direct Evidence for Preferential Multiplication of
        Babesia-Gibsoni in Young Erythrocytes. Parasitology Research 79(4): 269-271.
Raijmakers R, Schilders G, Pruijn GJ (2004) The exosome, a molecular machine for
        controlled RNA degradation in both nucleus and cytoplasm. Eur J Cell Biol
        83(5): 175-183.
Raijmakers R, Egberts WV, van Venrooij WJ, Pruijn GJ (2002) Protein-protein
        interactions between human exosome components support the assembly of RNase
        PH-type subunits into a six-membered PNPase-like ring. J Mol Biol 323(4): 653-
Ralph SA, Scheidig-Benatar C, Scherf A (2005) Antigenic variation in Plasmodium
        falciparum is associated with movement of var loci between subnuclear locations.
        Proc Natl Acad Sci U S A 102(15): 5414-5419.
Remm M, Storm CE, Sonnhammer EL (2001) Automatic clustering of orthologs and in-
        paralogs from pairwise species comparisons. J Mol Biol 314(5): 1041-1052.
Shue P, Brown SV, Cann H, Singer EF, Appleby S et al. (2004) The 3' UTR elements of P.
        gallinaceum protein Pgs28 are functionally distinct from those of human cells.
        Mol Biochem Parasitol 137(2): 355-359.
Snow RW, Guerra CA, Noor AM, Myint HY, Hay SI (2005) The global distribution of
        clinical episodes of Plasmodium falciparum malaria. Nature 434(7030): 214-217.
Vasudevan S, Peltz SW, Wilusz CJ (2002) Non-stop decay--a new mRNA surveillance
        pathway. Bioessays 24(9): 785-788.
von Mering C, Krause R, Snel B, Cornell M, Oliver SG et al. (2002) Comparative
        assessment of large-scale data sets of protein-protein interactions. Nature
        417(6887): 399-403.
Wilusz CJ, Wormington M, Peltz SW (2001) The cap-to-tail guide to mRNA turnover. Nat
        Rev Mol Cell Biol 2(4): 237-246.
Wuchty S (2002) Interaction and domain networks of yeast. Proteomics 2(12): 1715-
Wuchty S (2004) Evolution and topology in the yeast protein interaction network.
       Genome Res 14(7): 1310-1314.
Wuchty S, Oltvai ZN, Barabasi AL (2003) Evolutionary conservation of motif constituents
       in the yeast protein interaction network. Nat Genet 35(2): 176-179.
Xenarios I, Salwinski L, Duan XJ, Higney P, Kim SM et al. (2002) DIP, the Database of
       Interacting Proteins: a research tool for studying cellular networks of protein
       interactions. Nucleic Acids Res 30(1): 303-305.

       Roger Guimera's help with the algorithm is gratefully acknowledged. This work
was funded by the National Institutes of Health grants AI33656 (JHA), ###### (ALB), and
AI055035 (MTF).
       Figure 1: (a) Logarithmically binning the interactions according to their $C_{vw}$,
we determined the excess retention ER of orthologous proteins in P. falciparum as a function
of the hypergeometric clustering coefficient $C_{vw}$ of the interactions that constitute
these groups. We find a significant logarithmic correlation between high hypergeometric
clustering coefficients and the conservation of the interacting proteins (ER ~ 0.46 x log
$C_{vw}$, Pearson's r = 0.71, P = 1.2 x 10^-9, Spearman's rank ρ = 0.69, P = 2.5 x 10^-6).
(b) Logarithmically binning protein interactions according to their hypergeometric
clustering coefficient $C_{vw,o}$, we determine the mean expression correlation coefficients
$r_P$ in S. cerevisiae. In particular, we separate the pool of interactions into disjoint
sets of pairs of interacting proteins that are or are not conserved in Plasmodium. We
observe that the distribution of conserved interactions ascends significantly (Pearson's
r = 0.35, P = 3.7 x 10^-49, Spearman's rank ρ = 0.42, P = 9.1 x 10^-64), while the
corresponding trend of non-conserved links is weaker (Pearson's r = 0.04, P = 3.9 x 10^-2,
Spearman's rank ρ = 0.13, P = 5.1 x 10^-14). (c) In conclusion, our results indicate a
coincidence of (i) co-expression of interacting proteins, (ii) an enhanced clustering of
their immediate neighborhood and (iii) their elevated tendency to be evolutionarily
conserved in S. cerevisiae. Since high clustering around a certain protein interaction
coincides well with an elevated degree of reliability, the integration of knowledge about
the yeast protein interaction network, its local clustering and its proteins' tendency to
cluster can be used to elucidate evolutionary cores of protein interaction networks in
organisms for which orthologues can be identified. (d) Training a logistic regression model
with a random set of conserved and non-conserved protein interactions, which are
characterized by their hypergeometric clustering coefficient and co-expression correlation,
we evaluated and classified the actual presence of a yeast interaction. We observe that the
interactions classified as correct accumulate significantly around high values of the
co-expression coefficient, while non-correct links are distributed much more evenly.
Accounting for interactions that appear between conserved proteins in Plasmodium, we find a
strong shift toward co-expression of the correctly classified interactions in the target
organism (inset).
Figure 2: Cartographic representation of the core interaction network of P.
falciparum. Maximizing modularity M, we obtain numerous clusters as indicated by
different colors. Broadly, these clusters represent functionally consistent accumulations
of nodes, which are indicated by shades. The network is dominated by larger
clusters, where we predominantly find proteasome components, ribosomal proteins such
as snRNPs, a large group of translation initiation factors and related proteins, exosome
and spliceosome components, replication factor c, DNA related functions such as DNA
polymerase and helicase subunits, ribosomal proteins and translation factor subunits.
Superimposing the respective co-expression correlation coefficients rP on each
interaction (red: 1.0 to 0.5, yellow: 0.5 to 0.0, green: 0.0 to -0.5, blue: -0.5 to -1.0), we
generally observe that the high cohesiveness in each module coincides with a significant
degree of co-expression of module components.
       Figure 3: (a) The space spanned by the nodes' values of the intramodular degree
Z and degree k allows a heuristic classification of nodes. We define nodes as party hubs if
their intramodular degree Z is above 0 and their degree k is larger than 5. In turn, date
hubs refer to nodes with Z of 0 or below that still have more than 5 neighbors, as
reflected in the classification of Table 1. Shaded areas correspond to the color code in
(b). (b) Co-expression patterns of the hubs' neighborhoods indicate that generally the
majority of nodes in each class prefer to be co-expressed with their immediately
neighboring nodes. However, date hubs show significant peaks around 0.5, indicating that
hubs are also involved in interactions that do not occur simultaneously. In the inset, we
repeat our analysis by determining the mean fraction of hub neighbors that share the same
GO annotation with the node in question. In particular, we observe that date hubs show a
slightly more homogeneous distribution.
       Figure 4: Plasmodium proteins were grouped according to their maximum expression
in the three major stages of asexual development in the erythrocytic stage: ring,
trophozoite and schizont, revealing that the subnet modularity is time-dependent. Nearly
all activity of the core network occurs in the ring or the schizont phase,
while remarkably almost no expression activity is observed in the trophozoite phase. In
the ring stage, active proteins facilitate cellular transport and gene expression. In the
schizont phase, we predominantly find proteasomal functions and components of
chromosome condensation. The absence of any major evolutionarily conserved proteins
during the trophozoite stage indicates that the Plasmodium genes expressed during this
time, which lack an orthologue in yeast, encode metabolic activities that evolved as the
organism adopted its parasitic life style.
 protein      description                                                             k     Z       rP      fA    C
 PF08_0109    hypothetical protein                                                    22    1.02    0.55   0.00   p
 PF08_0130    wd repeat protein, putative                                             20    3.08    0.86   0.40   p
 PF10_0174    26s proteasome subunit p55, putative                                    20    1.23    0.75   0.90   p
 PF11_0305    hypothetical protein                                                    19    1.34    0.84   0.00   p
 PF14_0676    20S proteasome beta 4 subunit, putative                                 19    0.80    0.65   0.79   p
 PF10_0278    hypothetical protein, conserved                                         18    2.52    0.90   0.39   p
 PFD0665c     26s proteasome aaa-ATPase subunit Rpt3                                  18    1.23    0.61   0.89   p
 PF13_0178    translation initiation factor 6, putative                               17    1.34    0.87   0.00   p
 PF13_0063    26S proteasome regulatory subunit 7, putative                           17    1.23    0.65   0.88   p
 PFB0370c     RNA-binding protein, putative                                           16    2.35    0.77   0.00   p
 PF14_0025    proteosome subunit, putative                                            16    1.02    0.67   0.69   p
 PFC0520w     26S proteasome regulatory subunit S14, putative                         16    0.80    0.69   0.81   p
 PF14_0068    fibrillarin, putative                                                   15    1.83    0.85   0.47   p
 PF10_0114    DNA repair protein RAD23, putative                                      15   -0.04    0.09   0.47   d
 PF08_0009    translation initiation factor EIF-2b alpha subunit, putative            14    1.41   -0.39   0.36   p
 PF11_0105    hypothetical protein                                                    14    1.14    0.86   0.00   p
  PFI0630w    26S proteasome regulatory subunit, putative                             14   -0.04    0.67   0.00   d
MAL13P1.343   proteasome regulatory subunit, putative                                 13    0.38    0.75   0.85   p
 PFL2345c     tat-binding protein homolog                                             13    0.17    0.55   0.85   p
 PF11_0090    hypothetical protein                                                    12    1.41    0.85   0.08   p
  PFI0475w    small nuclear ribonucleoprotein (snRNP), putative                       11    1.45    0.38   0.73   p
 PFE1355c     ubiquitin carboxyl-terminal hydrolase, putative                         11   -0.04    0.59   0.64   d
 PF10_0298    26S proteasome subunit, putative                                        11   -0.04    0.49   0.73   d
MAL8P1.128    proteasome subunit alpha type 6                                         10    1.41    0.83   0.80   p
 PF14_0174    hypothetical protein, conserved                                         10    0.00    0.90   0.30   d
 MAL7P1.24    hypothetical protein, conserved                                         10    0.96    0.89   0.00   p
 PF10_0081    26S proteasome regulatory subunit 4, putative                           10   -0.25    0.63   0.80   d
 PFD0180c     CGI-201 protein, short form                                              9    0.27    0.68   0.33   p
 MAL8P1.48    small nuclear ribonucleoprotein polypeptide g, 65009- 64161, putative    9    0.81    0.67   0.67   p
MAL6P1.119    DEAD/DEAH box ATP-dependent RNA helicase, putative                       9   -0.27    0.91   0.00   d
 PF14_0183    RNA helicase, putative                                                   9   -0.55    0.79   0.44   d
 PFB0875c     hypothetical protein                                                     9    1.22    0.59   0.33   p
MAL13P1.190   proteasome regulatory component, putative                                9   -0.47    0.77   0.89   d
 MAL6P1.88    proteasome subunit alpha type 2, putative                                8    1.41    0.74   0.00   p
 PF14_0716    Proteasome subunit alpha type 1, putative                                8   -0.71    0.77   0.63   d
 PFL0335c     eukaryotic translation initiation factor 5, putative                     8    1.01    0.88   0.50   p
 PF11_0445    DNA-directed RNA polymerase I, putative                                  8    2.37    0.93   1.00   p
 PFE0305w     transcription initiation factor TFIID, TATA-binding protein              8    0.00    0.06   0.75   d
 PF14_0587    hypothetical protein                                                     8    1.57    0.79   0.50   p
 PFL1680w     splicing factor 3b, subunit 3, 130kD, putative                           8    0.59    0.67   0.63   p
 PF07_0067    hypothetical protein                                                     8    0.96    0.89   0.00   p
 PF14_0055    hypothetical protein, conserved                                          8   -0.21    0.82   0.13   d
 PF14_0502    hypothetical protein                                                     8    0.59    0.10   0.00   p
 PF13_0033    26S proteasome regulatory subunit, putative                              8   -0.68    0.78   0.88   d
 PF11_0314    26S protease subunit regulatory subunit 6a, putative                     8   -0.68    0.62   1.00   d
 PFD0450c     hypothetical protein, conserved                                          7    0.81    0.67   0.71   p
 PF07_0117    eukaryotic translation initiation factor 2 alpha subunit, putative       7    1.88    0.48   0.86   p
 PF11_0191    hypothetical protein                                                     7   -0.07    0.85   0.00   d
 PF10_0103    eukaryotic translation initiation factor 2, beta, putative               7    0.00    0.58   0.71   d
Table 1: The 50 most highly connected hubs in the inferred protein interaction
network of P. falciparum. Each node is characterized by its degree k, the mean
coexpression correlation rP with its immediate neighbors and the mean fraction fA of
GO annotations shared with them, together with the corresponding standard deviations
and the hub classification (p: party hub; d: date hub).
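
The quantities named in the caption of Table 1 can be illustrated with a minimal Python sketch on a toy network. Everything here is an assumption made for illustration and is not the procedure used to produce the table: the function names, the toy expression profiles and GO terms, the use of a plain Pearson correlation, the reading of fA as the fraction of neighbors sharing at least one GO term with the hub, and the cutoff of 0 that separates party from date hubs.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def hub_statistics(edges, expression, go_terms, cutoff=0.0):
    """Degree, mean neighbor coexpression, shared-GO fraction and hub class.

    cutoff is a hypothetical threshold: nodes whose mean correlation
    exceeds it are labeled party hubs ('p'), all others date hubs ('d').
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    stats = {}
    for node, nbrs in neighbors.items():
        # Mean coexpression correlation with the immediate neighbors.
        r = mean([pearson(expression[node], expression[n]) for n in nbrs])
        # Assumed definition: fraction of neighbors sharing >= 1 GO term.
        f = mean([1.0 if go_terms[node] & go_terms[n] else 0.0 for n in nbrs])
        stats[node] = {
            "k": len(nbrs),
            "r": r,
            "f": f,
            "class": "p" if r > cutoff else "d",
        }
    return stats

# Toy data (hypothetical proteins A-D, not taken from the table).
edges = [("A", "B"), ("A", "C"), ("A", "D")]
expression = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [1, 3, 5, 7],
    "D": [4, 3, 2, 1],
}
go_terms = {
    "A": {"proteolysis"},
    "B": {"proteolysis"},
    "C": {"splicing"},
    "D": {"proteolysis", "splicing"},
}

stats = hub_statistics(edges, expression, go_terms)
for node in sorted(stats, key=lambda n: -stats[n]["k"]):
    print(node, stats[node])
```

In this toy example the hub A has degree 3, is positively coexpressed with two of its three neighbors, and shares a GO term with two of them. Under the assumed cutoff it would be labeled a party hub, whereas D, which is anticorrelated with its only neighbor, would be labeled a date hub.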
