Embed
Email

web_supp_jan19

Document Sample

Shared by: xiagong0815
Categories
Tags
Stats
views:
0
posted:
2/16/2012
language:
pages:
30
Supplementary Material for “Dramatic reduction of dimensionality in large

biochemical networks due to strong pair correlations”









Table of Contents: Page number



1. Supplementary Figures S1-S4 ……………………… 1-3

2. Generation of Random Linear Networks …………….. 4

3. Nonlinear Feedback Reactions in a

Reaction Network with Linear Architecture ………… 5

4. Biological Networks (Fig. S6) ………………………. 7-9

5. Supplementary Figures S7-S8 ……………………...... 9-10

6. Details of the Simulation Method …………………… 11-12

7. Supplementary Figures S9 …………………………... 13

8. Failure of PCA in reducing dimensionality

in a simple network …………………………………... 14-15

9. Analytical Calculations for

the Linear Network …………………………………… 17-19

10. Supplement on Gramians ……………………………. 20-21

11. Studying the role of modifications likely to occur in exp-

eriments ......................................................................... 22-26

12. Numbers of species and timepoints used for the biological

networks …………………………………………........ 27

13. Effect of unit variance scaling (Fig. S12) ................... 28

14. References ................................................................... 29

N=64 Linear Cascade: Top Two Components

100 100



99

80 98



97

Trial Index









60

96



94

40

93



20 92



91



0 90

0 0.2 0.4 0.6 0.8 1

Time







Fig. S1. Time dependence of the percentage explained by the top two eigenvalues of the

covariance matrix calculated for the linear network for N=64 species. Results for100 trials (each

associated with a set of rate constants and initial conditions randomly drawn from a uniform

distribution of a range of 0.1 to 10) are shown and demonstrate that more than 90% of the total

variance is captured by the top two eigenvalues most of the time. All the trials were started with

a non-zero concentration of the last species ( c64 ! 0 ) and zero concentrations for all other

species. The calculation of the covariance matrix included all the 64 species concentrations.









1

(A) (B)

Fig. S2. Minimum percentage explained by the top and the top two eigenvalues of the

covariance matrix for the linear reaction network. (A) Distributions of the minimum of the

percentage explained by the top eigenvalue in the entire time course for any trial for N=8 to

N=512. In each trial a set of rate constants and initial concentrations were randomly drawn from

a uniform distribution between 0.1 and 10. All the 10,000 trials were started with a non-zero

concentration of the last species and zero concentrations for all other species. The calculation of

the covariance matrix included all the species concentrations. (B) Distributions of the minimum

percentage explained by the top two eigenvalues for the same cases as in (A).









(A) (B) (C)

Fig. S3 Effect of initial conditions on the distributions of the minimum percentage

explained by the eigenvalues in the linear network. We compare the distributions of the

minimum percentage explained by the largest eigenvalue of the covariance matrix for two

different types of initial conditions for the linear network with N=64. The two different types

correspond to (i) only the concentration of the last species is non-zero ( c64 ! 0 ) at t=0, whereas,

the rest of the species start with zero concentrations, (ii) all the species concentrations have non-

zero values at t=0. For both the cases, the rate constants and the initial concentrations were

varied in the same manner (as in Fig. S2). (A) Shows distributions of the largest eigenvalue. (B)

Distributions for the top two eigenvalues. (C) Distributions for the top three eigenvalues. The top





2

eigenvalue seem to capture the variances more efficiently when all the species were given non-

zero initial concentrations . The shapes of the distributions start to converge when three

eigenvalues are considered.









(A) (B)

Fig. S4. Comparisons between results of calculations including all the species vs. including

only a set of independent species in the covariance matrix for the linear network. (A)

Distributions of the minimum percentage explained by the largest eigenvalue when the

covariance matrix was calculated using all the species (black line) and only the independent

species concentrations (red line) for N=64. The number of the independent species was

calculated by accounting for the conservation laws (e.g., total concentrations of the all the

N

species remains unchanged in the linear network, ! c (t) = const ) that produce linear

i=1

i





dependence between different species concentrations. The parameters used for the simulation are

the same as described in Fig. S2. (B) Distributions of the minimum percentage explained by the

top three eigenvalues for the same cases as in (A).











3

Generation of Random Linear Networks

1. For a system of N number of species, we choose a subset of n1 ! N species, where n1 is drawn

from a uniform random number distribution ranged from 2 to N. The species in the sub-network,

indexed as, c1,!, cn1 , are connected such that ci reacts with ci+1 with first order forward and

backward reactions for i 2 , then, we choose another subnet-work

of n2 ! N " n1 following the same method as described in step 1. The reactions in the sub-

network are also generated in the same manner. We continue this process as long as, a single

species or a pair of species is left in the system. We follow the next two steps when that situation

is reached.

B. If, N ! n1 = 2 , then we connect the remaining pair of species by first order forward and

backward reactions.

C. If N ! n1 = 1 , then we connect the last species with a randomly chosen species in the last sub-

network by first order forward and backward reactions.





After we generate the network architecture that is consists of a collection of disconnected sub-

networks with branched and/or linearly connected reactions we assign rate constants and initial

conditions to all the reactions from the range [0.1-10] for the rates and [0.1-10] for the initial

concentrations.





At each trial, a new network architecture and a new set of reaction rates and initial conditions are

generated.











4

Nonlinear feedback reactions in a reaction network with linear architecture









The species in the network react with each other as shown above in the schematic figure. The

wavy lines indicate the feedback reactions. In the above network, species i, interacts with species

i and i+1 as following the mass-action kinetics below,

dci

= ki!1ci!1 + k!i ci+1 ! (k!(i!1) + ki )ci ! ki!1ci ci!1 + kif ci ci+1

f



dt



The last two terms in the above ODE denote the feedback reactions.









Fig. S5. Minimum percentage explained by the top four eigenvalues in the linear reaction

networks with non-linear feedback reactions. Distributions of the minimum percentage

explained shown for N=8 to N=512 for 10,000 trials. In each trial a set of rate constants and

initial concentrations were randomly drawn from a uniform distribution in the range of 0.1 to

10). Each trial was started with a non-zero concentration for the last species and zero

concentrations for all other species.











5

(A) Ras Activation Network









(B) EGFR Signaling Network (Kholodenko et al. (1999))











6

(C) EGFR Signaling Network (Blinov et al. (2006))











7

(D) NF-kB Activation Network





Fig. S6. Biological networks. (A) Ras activation network. Ras activation (transition from

RasGDP to RasGTP) is carried out by two enzymes (Guanine nucleotide Exchange Factors or

GEFs) SOS and Rasgrp1. SOS is recruited to the plasma membrane by the T cell receptor (TCR)

signalosome (not shown in the figure) which is formed when TCRs interact with strong affinity

antigens. In addition to a catalytic site (marked by a yellow circle) that mediates Ras activation,

SOS also has an allosteric site which can bind to both RasGDP and RasGTP (1). When the

allosteric site is occupied by RasGTP the rate of Ras activation via SOS becomes almost 75

times faster. This is the positive feedback. Rasgrp1 is recruited to the plasma membrane by

Diacylglycerol (DAG) which is produced by signaling events downstream to TCR triggering.

Unlike SOS, Ras activation by Rasgrp1 does not have a positive feedback. Ras deactivation is

mediated by the enzyme, RasGAP. (B) EFGR signaling network (coarse grained). EGFR

signaling network where multiple receptor activation states are coarse-grained into a single state.

The details regarding the signaling pathway shown here can be found at

http://models.cellml.org/exposure/468382d300fa0027425f72c195afd11a
and Ref. (2). (C)

EFGR signaling network (including detailed receptor activation states). Details of the

signaling network can be found in Ref. (3). The figure above was taken from Ref. (3) with the

permission of the authors. (D) NF-kB activation network. In a resting state NF-κB is bound to

the three isoforms of IκB (α, β, ε) in cytoplasm. Following stimulation, IKK-complexes get







8

activated. IKK phosphorylates and degrades IκB isoforms, releasing NF-κB from its bound state.

Free NF-κB translocates to the nucleus(4). While the transcription factors for the IκBβ and IκBε

are produced at a steady rate, the transcription factor for the IκBα gets positively regulated by

nuclear NF-κB. The nuclear IκB isoforms then bind with their nuclear NF-κB partners and

translocate them back to cytoplasm, where in presence of IKK, the same cycle repeats itself,

giving rise to oscillation in NF-κB’s nuclear localization. The BioNetGen files describing the

reactions and the parameters used can be found at the link

http://planetx.nationwidechildrens.org/~jayajit/pairwise_correlations/



.










Ras−Sos Network: Top Component Ras−Sos Network: Top Two Components

100 100 100

100

99

95

80 98

80 89

97

84

Trial Index









60

Trial Index









60 96

78

94

73 40

40 93

68

20 92

20 62

91

57

0 90

0 52 0 0.2 0.4 0.6 0.8 1

0 0.2 0.4 0.6 0.8 1 Time

Time









Fig. S7 (A) Temporal evolution of the percentage explained by the largest eigenvalue for the

Ras activation network until the steady state is reached for 100 different trials (denoted as trial

index in the y axis). In each trail, the kinetic rates and initial concentrations were varied around

the base parameters (web supplement) by up to 50% from a uniform random distribution. (B)

Time evolution of the percentage explained by the top two eigenvalues for the same trails as in

(A).











9

Minimum Percent Explained by Top Two Components

Ras-Sos Network

100

Stable

Bistable

80

Percent of Trials









60





40





20





0

20 30 40 50 60 70 80 90 100

Percent Explained









(A) (B)









(C)







Fig. S8 Percentage explained by more than one principal components for the nonlinear

networks. (A) Shows the Ras activation network, (B) EGFR signaling and (C) NF-kB activation

networks. The parameters used are the same as in Fig. 2 in the main text.











10

Details of the Simulation Method:

We use the rule based modeling software, BioNetGen(5), to generate time courses for the species

kinetics in a biochemical network. This software produces a set of ODEs corresponding to the

mass-action kinetics describing bio-chemical reactions in the networks. The ODEs are solved

numerically in the software using the CVODE solver. We have used a time step ( dt )

proportional to the inverse of the fastest time scale ( dt = ! fastest / 2 ) associated with the reactions.

For the linear networks we calculate ! fastest from the largest rate constant, ie., ! fastest = 1 / klargest ,

for each trial as all the reactions are first order reactions. For the networks with non-linear

reactions, which consist of second order binding unbinding and first order reactions, we calculate

the largest possible rate ( klargest ) of the reactions for every trial by finding the maximum of the

first order rates and the product of the binding rates with the largest species concentration.

Following the same scheme we calculate the minimum possible rate ( ksmallest ) of a reaction in the

linear ( ksmallest =minimum of all the rates) and the non-linear networks ( ksmallest =minimum of all

the first order rates and the product of binding rates and the largest species concentration). The

BIONETGEN files describing the reactions for all the networks studied can be found at the link

http://planetx.nationwidechildrens.org/~jayajit/pairwise_correlations/. We have checked in

specific cases that the results obtained from BionetGen agree with results from higher accuracy

calculations using Mathematica.





Calculation of the steady state: We calculated the eigenvalues of the correlation matrix, Cij ,

for times until the system reached the steady state. When reaction kinetics is described by ODEs,

the system reaches the steady state in principle at t ! " . However, as the system approaches the

fixed point the relative changes in species concentrations compared to the steady state

concentrations decreases with time. In our simulations, we ended the kinetics when we reached

such a limit. We denote this time as the estimated steady state time (tS ) . For all the networks

(linear and non-linear, except the NF-κB network) tS is calculated from the smallest time scale

(! ksmallest ) in the system. We justified this choice by evolving the system for a longer time to

N N

check if the relative Euclidean distance, i.e., d(t1, t2 ) = " (ci (t1 ) ! ci (t2 ))2

i=1

" c (t = 0) between

i=1

i





the species concentrations is smaller than a tolerance of 10-6 when t1 = tS and t2 = 2tS . The table

below (Table I) shows the values of these variables for the different networks we use. For some

of the networks, the system reached close to steady state concentrations even for times t ti ) is given by, ci (t) = ci 0 (ti )e! ki (t !ti ) .







Case I

First we consider a situation where all the reactions start at the same time, t1 = t 2 = ! = t m = 0 ,

with the same values of initial concentrations, i.e., c10 (0) = ! = cm 0 (0) = c0 . We also assume that

all the species decay at the same rate, therefore, k1 = k2 = ! = km = 1 . With the above

simplifications, all the species follow identical kinetics, where, ci (t) = c0 e!t . Thus, the correlation

matrix, Cij , defined as,



T

1

Cij (T ) = ! dt ci (t)ci (t)

T 0

2 T

c0 c2

T !

= dt e"2t = 0 (1 " e"2T )

0

2T







Note that all the elements of the Cij matrix are equal, and there is only one eigenvalue (

2

mc0

!1 = (1 " e"2T ) ) that is non-zero, and, rest of the eigenvalues vanish, i.e.,

2T

!2 = !3 = ! = !m = 0 . Therefore, the largest eigenvalue captures 100% of the variance! This

behavior is expected, because, all the species started decaying at the same time with the same





14

rate (or “velocity”), therefore, the trajectory of the system in the manifold spanned by the species

concentrations is a straight line and an eigenvector along that direction will capture all the

variations in the trajectory at all times. Now, we will construct a case where we will introduce

each of the reactions at different times and show that a particular arrangement of those times will

create a trajectory that will require all the eigenvectors capture equal amount of variation in the

trajectory.





Case II

Here we consider a situation where the reactions in (1) are introduced sequentially at different

times ( {ti } ), such that, t m > t m !1 > t m ! 2 > ! > t 2 > t1 = 0 . As in case I, we assume all the rates are

set to unity and all the species start with the same initial concentration, c0 . The correlation

matrix, Cij , is given by,



T 2 T 2

1 c0 c0 (ti +t j ) "2ti

Cij (T , ti , t j ) = ! dt ci (t)c j (t) = ! dt e "(t "ti ) "(t "t j )

e = e # e " e"2T %

$ &

T 0 T ti 2T



where, ti > t j and ci (t) = 0 when, t t1 . The corresponding eigenvalues are,



(C11 + C22 ) ± (C11 " C22 )2 + 4C12

2



!1,2 =

2



The % explained by the largest component is given by



1" (C11 ! C22 )2 + 4C12 %

2



$1 + ' ( 100% .

2$ C11 + C22 '

# &







When, T ! t1 ,t 2 , the above expression simplifies to,



&1 !2(t 2 !t1 ) )

( "1 + e

# $ + O(e!2T )+ , 100% . Therefore, when t 2 ! t1 ! 1 , the % explained by the top

%

'2 *

component is very close to 50%, and both the eigenvectors capture the total variance almost

equally.





15

Now we can easily generalize the result for any m > 2 . When, T ! t1 ,",t m ,



1 !(ti !t j )

Cij (T ,ti ,t j ) = e + O(e!2T ), ti > t j

2T



and,

1

Cii (T ,ti ,ti ) = + O(e!2T )

2T



Therefore, when the times ({tt }) for introducing the reactions are chosen such that ti ! t j ! 1,

then the off-diagonal elements of the Cij matrix decreases exponentially fast compared to the

diagonal elements. Thus, in this case we get a set of degenerate eigenvalues all equal to 1/2T in

the leading order and the percentage explained by the any eigenvalue is equal to 1 / m (to the

leading order).











16

Linear Model

In this section we provide analytic arguments to support the numerical result pre-

sented in the text that one or two principal components are sufficient to explain almost

all the variance in an overwhelming number of cases for the linear cascade and its variants.



In general the concentrations of the linear system satisfy the rate equations

dci (t) ∑

= Mij cj (t) (1)

dt j



where the reaction matrix elements Mij are constants that are linear in the the rate con-

stants.



The concentrations that obey the set of reactions shown in Figure 1 in the text yield



the matrix elements Mi,i+1 = k−i , Mi,i−1 = ki−1 and Mii = −ki∑ k−(i−1) for 1 ≤ i ≤ n

if we set k−1 = kn = 0. The system obeys the conservation law j cj (t) is independent

of time.



The eigenvalues and eigenvectors of M determine the time evolution of the system.

One of the eigenvalues say λ0 will be zero reflecting the conservation law and the rest of

them can be proved to be negative. Denote the eigenvalues by −λα (λα ≥ 0) and the

right- and left-eigenvectors by vα,i and uα,j respectively. The solution can be written in

the form



N −1

ci (t) = e−λα t Dα vα,i (2)

α=0



where the coefficients Dα are determined by the initial conditions: Dα = i uα,i ci (t =

0) . The first coefficient D0 yields the sum of the concentrations. We integrate the

solutions from t = 0 up to t = τ to obtain the expression for the elements of the covariance

matrix

∑∑ ( )

1 − e−(λα +λβ )τ (1 − e−λα τ )(1 − e−λβ τ )

Cij (τ ) = Dα Dβ vα,i vβ,j − (3)

α β

(λα + λβ )τ λα τ × λβ τ



Only the terms in which both λα and λβ nonzero contribute in the above sums and so the

covariance matrix is effectively an (N − 1) × (N − 1) matrix.



We write it in the form

∑∑

Cij (τ ) = Dα Dβ vα,i vβ,j × Fα,β (τ )

α β

( )

1 − e−(λα +λβ )τ (1 − e−λα τ )(1 − e−λβ τ )

where Fαβ (τ ) ≡ − . (4)

(λα + λβ )τ λα τ × λβ τ

Since Fαβ (τ ) is independent of initial conditions and depends only on the rate constants

the matrix denoted by F is easier to analyze.









17

k

An extreme limit of the cascade is one with degenerate time scales given by Ci Cj

k

for all pairs i ̸= j. In this case the reactant matrix M has elements Mi,j = k for i ̸= j and

Mii = −(N − 1)k. The eigenvalues are easy to determine: there is one zero eigenvalue

and N − 1 degenerate eigenvalues with value −N k. It is convenient to let the indices run

from 0 to N − 1: λ0 = 0 and λα = −N k for α = 1, · · · (N − 1). It is easy to see that

F0α = 0 for all α. So we consider Fαβ with α and β ranging from 1 to (N − 1). Since all

the λα are identical all the matrix elements of F are the same and we will denote it by f .

It is easy to determine the eigenvalues of F : it has one non-vanishing eigenvalue (N − 1)f

and the rest vanish. Since a single eigenvalue exhausts the trace it is straightforward to

show that PCA works perfectly for all initial conditions. Given that in this extreme case

PCA works exactly, for small deviations one can argue that the performance of PCA will

be very good using matrix perturbation theory.



The other case we consider is one in which the rate constants are randomly distributed

yielding a uniform distribution for the eigenvalues of M . We have verified numerically

that the eigenvalues of M for the case displayed in Figure 1 of the text yields an ap-

proximately uniform distribution with an excess at λα close to zero for large N . We

provide a variational bound for the percentage explained by the first principal compo-

nent of F . Denote the largest eigenvalue of F by Λmax . We use the variational bound



Λmax ≥ α,β vα Fαβ vβ for any normalized vector v.



For a given value of N we consider a model in which the eigenvalues of λ of the reac-

tion matrix are distributed uniformly between 1/N and 1. Since F is a positive-definite

matrix, we know from the Perron-Frobenius theorem that the eigenvector corresponding

to the largest eigenvlaue has all positive elements. This suggests choosing the trial vector

⃗ = √N −1 (1, 1, 1, ·, · · · 1, 1) that is guaranteed to have a positive overlap with the exact

v 1

∑ ∑

eigenvector corresponding to Λmax . We have Λmax > N 1 −1 α β Fαβ . We replace the

sum on the eigenvalues by an integral with a uniform density. Similarly we can evaluate

the trace and find a lower bound on the percentage explained by by the first principal

component. The averaging over the eigenvalue distribution can be evaluated numerically

and the results for N = 1000 are displayed in the Figure.

In the limit N → ∞ the integrals can be evaluated exactly in terms of incomplete

Gamma functions (not displayed) and the results are similar. We emphasize that our

results are for the matrix F and not the covariance matrix itself and therefore, for special

initial conditions this bound may be violated.



We have studied the large τ behavior as follows. Consider the matrix F for τ >>

1/λmin where λmin is the eigenvalue of M with the least magnitude. Recall that F is an

N − 1 by N − 1 matrix with a zero eigenvalue removed. Since all the exponentials are

negligible, to leading order in τ −1 it has the form of a Cauchy matrix with

1 1

Fα,β = .

τ λα + λβ









18

0.9





0.8









explained

0.7





0.6





0.5

1 5 10 50 100 500 1000

Τ





Figure S10: Bound on the percentage explained by the first principal component as a

function of τ . We have chosen τ ≥ 1 since the largest eigenvalue is O(1).



We use the variational bound as before and obtain

1

∑ ∑ 1

Λmax N −1 α β λα +λβ

≥ ∑ 1

T rF α 2λα



Now we write the numerator as a sum of rows and use the fact that the harmonic mean

is less than the arithmetic mean. For row α we have

1 1 ∑

∑ ≤ (λα + λβ ) = µ + λα

1

N −1

1

β λα +λβ N −1 β



where T r M = α λα is the trace of the reaction matrix and µ ≡ 1

N −1

Tr M we obtain

∑ 1

Λmax α λα + µ

≥ ∑ 1

T rF α 2λα



So far this is exact; if we assume that the eigenvalues λα of the reaction matrix are

distributed between 1/N and 1 uniformly for large N we can simplify this further. We

have µ = 1/2 and the numerator can be seen to be approximately N log 3 for large N .

The denominator involves a harmonic sum and is logarithmic: is (1/2) N log N . We have



Λmax 2 log 3

≥ .

T rF log N



Thus the fraction explained by the first component is bounded at least by 1/ log N . At

N = 100, N = 500 and N = 1000 this yields a bound of 0.477, 0.3535 and 0.31. These

bounds are consistent with our numerical results. We have been unable to establish better

analytic bounds.









19

Supplement on Gramians



Consider a set of k vectors {⃗j } (j = 1, · · · , k) in an N -dimensional real vector space. The

v

(symmetric) Gram matrix or Gramian is defined by Gij = ⃗i · ⃗j . Note that the k vectors

v v

are linearly independent if there exist k constants ck (not all of which are zero) such that

∑k ∑n

v

j=1 cj ⃗ j = 0 . This corresponds to the equation j=1 Gij cj = 0 and linear independence

occurs if and only if det G ̸= 0. Thus the Gram determinant arises naturally in determining

linear independence. The Gram determinant is non-negative and it can be shown to be the

squared (hyper)-volume of the k-dimensional parallelepiped (or parallelotope) formed by the

vectors. This is easy to demonstrate in three dimensions.



c

As in the differential geometry of three dimensions we consider a curve ⃗ in N -dimensions

′ , ⃗′′ , ⃗′′′ , · · · , ⃗ (k) } where

and define its properties in terms of the N -dimensional vectors {⃗ c c

c c

the prime denotes differentiation with respect to the arc length s the arc length traced by the

curve from a chosen initial point. The curve is planar (embedded in N -dimensions) if only

the first two of the vectors are linearly independent and so on. Thus the local dimensionality

of the curve can be obtained by computing the above vectors successively and calculating the

corresponding Gram determinant, We have done this for k = 2 and k = 3 for the linear and

nonlinear networks.



For completeness we provide some of the equations used in the program below. Let

...

c, c, c denote the first, second, and third derivatives of the concentration with respect to

˙ ¨ √

˙ ˙

time respectively. The arc length is defined by (ds)2 = d⃗ · d⃗. Therefore, s = ⃗ · ⃗ .

c c ˙ c c



We repeatedly use the fact f ′ (s) ≡ df = f˙ where s = ˙ ˙ ˙

⃗ · ⃗ to obtain the the

c c

ds s˙

derivatives with respect to the arc length in terms of the time derivatives. We have

˙

c



⃗ ′ (s) =

c

˙

s

⃗¨

c ˙



⃗s

⃗ ′′ (s) = 2 − 3

c

˙

s... s ˙

s⃗ − ⃗ s

˙c c˙ ... ¨

s

⃗ ′′′ (s) =

c − 3 2 ⃗ ′′ (s)

c

s4

˙ s˙

c

These can be used to calculate the derivatives of ⃗ with respect to the arc length s knowing

the time derivatives. Now the equation of motion yields ⃗:c˙

∑ ∑

˙

ci = Mij cj (t) + Bijk cj (t)ck (t)

j j,k



for system with a quadratic nonlinearity as in the case of NFκB network, for example. For

c

the linear network B vanishes. We can obtain the higher-order time derivatives of ⃗ by









20

differentiation iteratively. The time derivatives of s are displayed below:



˙

s = ˙ ˙

⃗·⃗

c c

¨ ˙

⃗·⃗

c c

¨

s =

˙

s ...

... ¨·⃗ +⃗ · ⃗

c ¨ ˙

⃗ c c c ¨

s

s = − s. ¨

˙

s ˙

s

Knowing the vectors ⃗′ , ⃗′′ , ⃗′′′ the Gram determinant can be evaluated using the dot

c c c

products numerically as a function of time.









21

Studying the role of modifications likely to occur in experiments.



The EGFR model of Kholodenko et al (Ref. 27 in the main text) was used to address the role of

the effects that are likely to occur in experimental datasets where it is not possible to assay all

molecular species and measure protein activations at very small time intervals in modifying the

ability of the principal components to capture variance in the data sets. In particular, we wanted

to investigate why the variations captured (71%) by the top three components for the

experiments in Gaudet et al. (Ref. 8 in the main text) was significantly less compared to the

variation explained by the top three components for the in-silico networks we studied. We

compared our results with the experiments in Gaudet et al. (Ref. 8 in the main text) where the

EFGR signaling network is one of the cross-reacting signaling networks. All the simulations

were performed using 1200 different parameter sets.



(i) PCA was performed on a sub set of species instead of all the species that are generated by this

signaling network. This is more like to occur in experiments where is it difficult to assay every

molecular species that is generated due to the biochemical reactions. Here, we considered the

species EGF, Grb2, Grb2_Sos, Shc, Sos, and R as described in the BioNetGen file

(http://bionetgen.org/index.php/Egfr_path) for the above network that contains 18 species.



(ii) We performed PCA on data with the concentrations chosen in (i) scaled by their

concentrations at t=0. This probes the effect of using fold changes instead of absolute

concentrations of proteins to measure activation as done in the analysis of experiments by

Gaudet et al. (Ref. 8 in the main text).



(iii) Next we probed the effect of choosing fewer data points in time separated by larger time

intervals than that used for the results in Fig. 2D. There were more than 1000 time points in each

kinetic trajectory used for Fig. 2D. We used 13 time points, 0 t S /1440, 5 t S /1440, 15 t S /1440, 30

t S /1440, 60 t S /1440, 90 t S /1440, 120 t S /1440, 240 t S /1440, 480 t S /1440, 720 t S /1440, 960 t S /1440,

1200/ t S 1440, 1440 t S /1440, where the time t S denotes the time where the system reaches the

steady state. The specific time points were selected to better match the time points in the data

given in Gaudet et al. (Ref. 8 in the main text).



(iv) We added random noise in the species concentrations. This was done to investigate the effect

of having random variability in the data arising from experimental measurements in Gaudet et al.

(Ref. 8 in the main text). In our simulations, for each protein concentration obtained from the

solution of the ODEs, a random number is drawn from either [.001,1] or [1,1000] with equal

probability, and multiplied any concentration.

Conclusion: Our results show that introducing the effects described above can in some cases

reduce the percentage explained by the top eigenvectors. On the other hand, we find that, overall

the above modifications increased the percentage explained by the top eigenvector, e.g. the peak

of the minimum percentage explained distributions increased by about 30%. It is possible that

for a specific set of parameters the minimum percentage explained decreased after the

modifications. However, even in that case, the minimum value of the minimum percentage

explained distributions is around 40% for all the modifications. Therefore, the effects considered

above are unlikely to underlie the rather low value observed in the experimental dataset given in









22

Gaudet et al. (Ref. 8 in the main text) where the top three principal components capture only

71% of the total variance.







Fig. S11: Results showing effects of different modifications in the EGFR network

(Kholodenko et al. )

Minimum Percent Explained by First Component

Kholodenko et. al., 1999

140

Experiment 0





120







100







80

Frequency









60







40







20







0

40 50 60 70 80 90 100

Minimum Percent Explained by Top Component









A. Result of PCA when all the species in the network described by Kholodenko et al were

considered. The time points used are the same as in Fig. 2D. 1200 different parameter sets were

used. This graph represents the ‘control’ in our numerical experiments.

Minimum Percent Explained by Top Component

Kholodenko et. al., 1999 Change in Minimum Percent Explained by Top Component Vs Control

800 Kholodenko et. al., 1999

Experiment 1 120

Experiment 1

700



100

600





500 80

Frequency









Frequency









400

60



300



40

200





100 20





0

60 65 70 75 80 85 90 95 100 0

-30 -20 -10 0 10 20 30 40 50

Percent

Change in Percent









B. (Left) Results from the modification described in (i). (Right) Distribution of the percent

changes for any parameter set as (i) was carried out. A positive change implies increase in the

percent explained when (i) was carried out and vice versa.









23

Minimum Percent Explained by First Component

Kholodenko et. al., 1999

400

Experiment 2



350





300





250









Frequency

200





150





100





50





0

40 50 60 70 80 90 100

Minimum Percent Explained by Top Component





C. Results from the simulations when both the effects described in (i) and (ii) were included.









Minimum Percent Explained by First Component

Kholodenko et. al., 1999

700

Experiment 3





600







500







400

Frequency









300







200







100







0

80 82 84 86 88 90 92 94 96 98

Minimum Percent Explained by Top Component



D. Results from simulations when both the modifications described in (i) and (iii) are included.









24

Minimum Percent Explained by First Component

Kholodenko et. al., 1999

500

Experiment 4

450



400



350



300

Frequency

250



200



150



100



50



0

50 55 60 65 70 75 80 85 90 95 100

Minimum Percent Explained by Top Component









E. Results from simulations when the modifications described in (i), (ii), and (iii) are included.



Minimum Percent Explained by First Component

Kholodenko et. al., 1999

400

Experiment 5



350





300





250

Frequency









200





150





100





50





0

50 55 60 65 70 75 80 85 90 95 100

Minimum Percent Explained by Top Component







F. Results from simulations when the modifications described in (i), (ii), and (iv) are included.









25

Minimum Percent Explained by First Component

Kholodenko et. al., 1999

120

Experiment 6





100







80

Frequency









60







40







20







0

55 60 65 70 75 80 85 90 95 100

Minimum Percent Explained by Top Component









G. Results from simulations when all the four modifications described in (i), (ii), (iv), and (iv)

are included.









26

Table II: Number of species and the time points used in the data matrix for the biological

networks studied in the paper.







Network Number of species

Ras Activation Network 14

EGFR Signaling Network: Smaller version 18

(Kholodenko et al., Ref. 27)

EGFR Signaling Network: Larger version 356

(Blinov et al., Ref. 26)

NF-κB Activation Network (Ref. 29) 26





The smallest time intervals for calculating the species concentrations and the time the system

takes to reach the steady state concentrations were chosen using the largest and smallest time

scales estimated from the rate constants and initial concentrations in a reaction network.

Therefore, we had a large number (>103) of time points in our simulations. See the section on,

“Details of the simulation method” and Table 1 in the supplementary material for more details.









27

Fig. S12: Effect of unit variance scaling on percentage explained by top principal

components in the Ras activation network



Minimum Percent Explained by Top Component Minimum Percent Explained by Top Two Components

Ras-Sos Network Ras-Sos Network

35 25

Unit-Variance Scaling Unit-Variance Scaling



30

(A) 20

(B)

25







20 15

Frequency









Frequency

15

10



10





5

5







0

35 40 45 50 55 60 65 70 75 80 85 0

Percent 70 75 80 85 90 95 100

Percent

Minimum Percent Explained by Top Three Components

Ras-Sos Network

80

Unit-Variance Scaling



70

(C)

60





50

Frequency









40





30





20





10





0

88 90 92 94 96 98

Percent









We considered the kinetics of the molecular species involved in the Ras activation

network (Fig. S6A). Each species concentration at any time point was scaled by the

variance for that particular species calculated by using concentrations of that species at

all the time points (from t=0 to the end time of the simulation) as independent variables.

Then we performed principal component analysis on the scaled kinetic trajectories at all

the time points as described in the main text. We show results for the distributions of the

minimum percentage explained by the top three components calculated over 130 trials.

The minimum percentage explained (shown in (A)) by the top component was peaked at

a smaller value compared to case when the variables were not scaled (see Fig. 2C in the

main text). However, as we included the top two (shown in (B)) and three (shown in (C))

principal components, the principal components in the scaled dataset captured significant

amount (peak at 98% for three components) of variations in the original data set.









28

1.
 Das
J,
et
al.
(2009)
Digital
signaling
and
hysteresis
characterize
ras
activation
in


lymphoid
cells.
Cell
136(2):337‐351.


2.
 Kholodenko
BN,
Demin
OV,
Moehren
G,
&
Hoek
JB
(1999)
Quantification
of
short


term
signaling
by
the
epidermal
growth
factor
receptor.
J
Biol
Chem
274(42):30169‐

30181.


3.
 Blinov
ML,
Faeder
JR,
Goldstein
B,
&
Hlavacek
WS
(2006)
A
network
model
of
early


events
in
epidermal
growth
factor
receptor
signaling
that
accounts
for


combinatorial
complexity.
Biosystems
83(2‐3):136‐151.


4.
 Hoffmann
A,
Levchenko
A,
Scott
ML,
&
Baltimore
D
(2002)
The
IkappaB‐NF‐kappaB


signaling
module:
temporal
control
and
selective
gene
activation.
Science


298(5596):1241‐1245.


5.
 Blinov
ML,
Faeder
JR,
Goldstein
B,
&
Hlavacek
WS
(2004)
BioNetGen:
software
for


rule‐based
modeling
of
signal
transduction
based
on
the
interactions
of
molecular


domains.
Bioinformatics
20(17):3289‐3291.


6.
 Mckay
MD,
Beckman
RJ,
&
Conover
WJ
(1979)
Comparison
of
3
Methods
for


Selecting
Values
of
Input
Variables
in
the
Analysis
of
Output
from
a
Computer
Code.


Technometrics
21(2):239‐245.
















29


Other docs by xiagong0815
ENUS108358
Views: 0  |  Downloads: 0
Fastener 3
Views: 0  |  Downloads: 0
2.2.3
Views: 1  |  Downloads: 0
dartford_e_performance_summary
Views: 0  |  Downloads: 0
bscdeanslist
Views: 0  |  Downloads: 0
efiworkshopzschockejansky_abstract
Views: 0  |  Downloads: 0
pp_chapter_06b
Views: 0  |  Downloads: 0
2011-General Instructions-fr-ExecSec
Views: 0  |  Downloads: 0
Review 2010 Russia
Views: 0  |  Downloads: 0
Replacing an existing network printer
Views: 0  |  Downloads: 0
By registering with docstoc.com you agree to our
privacy policy

You are almost ready to download!

You are almost ready to download!