To assess whether additional experiments could significantly increase the number of identified
proteins, we plotted the total number of distinct protein identifications as a function of the number of
overall high quality peptide identifications for the twelve S2 cell line experiments (red dots, Fig. 2).
To rule out that the size of the experiments has a large effect, the increase of protein identifications
for one hundred random permutations of the experiment order is also shown (gray dots). As expected,
independent of the experiment order, a fast saturation of distinct protein identifications was
observed. To forecast the increase of distinct protein identifications beyond the experimentally
identified proteins, successive Monte Carlo simulations were carried out assuming different protein
identification probabilities. The resulting saturation curves are shown in Figure 2 and include two
extreme scenarios: (i) the probabilities for identification of a protein are all equal (blue curve) and
(ii) the probabilities of those proteins not identified in the 12 experiments are all 0 (purple curve).
The assumption that each of the proteins not yet identified has a small probability of being identified
results in the green curve. Depending on how many genes are not expressed in a given tissue or cell
line (i.e. their encoded proteins have a probability 0 of being detected), the green curve will be
closer to the purple curve. Since the red dots are above the purple curve, the assumption that the
probability for each of the proteins not identified is 0 is wrong. We thus expect that we could
identify more proteins with more experimental trials, but since all simulated curves are saturation-type
curves, the effort required for each additional identification will increase steadily.
Analysis-driven experimentation (ADE) to increase proteome coverage
We developed an iterative feedback strategy which cycles between experimentation, thorough
bioinformatics and statistical analysis, and re-directed experimentation, which we term ADE, for
analysis-driven experimentation. This strategy enables us to optimize experimental conditions and
specifically target those parts of a proteome that are underrepresented in prior data sets. To this end,
a number of physico-chemical and functional protein parameters (length, isoelectric point, favored
codon frequency (FCF) as a measure for protein abundance, transmembrane domains, signal
peptides) were computed for all distinct Drosophila proteins (Supplementary Table 1
online). In a first analysis we compared the length distribution of all proteins with that of 4,455
experimentally identified proteins from five large Kc167 cell line experiments and assessed the
statistical significance of the differences by calculating areas of under- or overrepresentation in a combined histogram
(Supplementary Fig. 2S online). We observed a highly significant underrepresentation of short
proteins below 500 amino acids, which is most pronounced in the length classes up to 300 amino
acids (Fig. 3a). The same length bias was observed for the two other datasets (data not shown),
indicating the need to change the experimental setup so that this highly underrepresented
protein class is covered more fully. Thus gel-filtration experiments on Kc167-cell protein extracts were
performed to enrich for small proteins. The length-distribution of 1,194 proteins identified in these
gel-filtration experiments differed significantly from that of all proteins (data not shown) and the
4,455 previously identified Kc167 cell proteins (Fig. 3b), and revealed an over- rather than an
underrepresentation of the small protein class. Moreover, 15% of the 1,194 proteins were new
protein identifications of which 70% fell into the length classes below 450 amino acids (50 kDa)
(Fig. 3c), demonstrating the high value of this analysis-driven experimentation (ADE) approach.
Similar statistical analyses on the 3rd-instar larvae dataset (not shown) and a larger dataset of almost
8,000 proteins from a range of experiments showed that basic proteins were detected with a lower
frequency than expected, whereas the identification of acidic proteins showed a reciprocal bias (Fig.
3d). While our experimental set-up thus preferentially identified acidic proteins, many more acidic
proteins remain to be identified. For the ADE approach to be generally applicable and lead towards
a more complete proteome coverage, it also has to be able to add new protein identifications in an
area where the experimental set-up works well. We thus targeted acidic proteins by separating larval
proteins by free flow electrophoresis (FFE) in the pH range 4-7 and identified 1,960 proteins.
Statistical analysis of the protein pI distributions revealed that we could indeed achieve an even
more pronounced overrepresentation in the acidic range (Fig. 3e,f). A comparison of unique larval
protein identifications before and after this round of ADE adaptation showed that in addition to
around 330 newly identified acidic proteins, we also identified a significant number of previously
undetected basic proteins (about 45% of the 600 newly identified proteins). This can be attributed to
the fact that we also analysed the protein fractions that concentrate at the boundary of the pH 4-7
gradient, in which the basic proteins were concentrated.
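The comparison of parameter distributions that drives each ADE round can be illustrated with a minimal Python sketch. It bins protein lengths and reports, for each bin, the ratio between the fraction of identified proteins and the fraction of all proteins in that bin; values below 1 hint at underrepresentation. The function name, the bin width and the toy lengths are assumptions made for illustration, and the significance calculation behind Supplementary Fig. 2S is not reproduced here.

```python
from collections import Counter


def representation_by_length(all_lengths, identified_lengths, bin_width=100):
    """Return {bin_start: observed_fraction / expected_fraction} per length bin."""
    all_bins = Counter((n // bin_width) * bin_width for n in all_lengths)
    id_bins = Counter((n // bin_width) * bin_width for n in identified_lengths)
    total_all = len(all_lengths)
    total_id = len(identified_lengths)
    ratios = {}
    for start in sorted(all_bins):
        expected = all_bins[start] / total_all      # share of this bin in the whole proteome
        observed = id_bins.get(start, 0) / total_id  # share of this bin among identified proteins
        ratios[start] = observed / expected
    return ratios


# Hypothetical toy data in which short proteins are rarely identified.
all_lengths = [50, 80, 120, 150, 250, 450, 650, 850]
identified = [150, 450, 650, 850]
ratios = representation_by_length(all_lengths, identified)
for start, ratio in ratios.items():
    print(f"{start}-{start + 99} aa: {ratio:.2f}")
```

The same function could be applied to any other numeric parameter, such as the isoelectric point, by choosing a suitable bin width.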
Cross-comparative searches for proteomics-based genome annotation
In a pilot study, a representative set of seven large experiments (Supplementary
Table 2 online) was searched against both our standard protein database and a six-frame-translated genomic
database, generating two search outputs for every spectrum. The output was subsequently filtered
for spectra with peptide assignments displaying a high PeptideProphet score (p>0.9) in the genomic
database search but a low score (p<0.5) in the protein database search, indicating that the respective
protein sequence entry could be missing in the protein database. Using this concept, 1,138 MS/MS
scans were identified that defined 901 distinct peptides (Supplementary Table 5 online). These
peptides were then searched with BLAST against the non-redundant protein database (NCBI): 349 peptides
remained without any hit, 343 sequences showed some homology, 132 revealed 100% identity to
proteins from other species and 77 sequences identified various transposable elements in
Drosophila, which were not contained in our protein search database. Next, the peptides were
searched against the genomic sequence to determine their exact coordinates. For a subset of
repeatedly-identified, high-scoring peptides, the genomic regions flanking these loci were inspected
manually for the presence of open reading frames (ORFs), using either the ExPASy Translate tool1 or the
GenScan gene-finding tool2.
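The filtering step of the cross-comparative search can be sketched in a few lines of Python. The sketch assumes that both PeptideProphet probabilities have already been collected per spectrum; the `SpectrumHit` record, the function name, the scan identifiers and the peptide sequences are hypothetical and serve only to illustrate the two thresholds quoted above (p>0.9 in the genomic search, p<0.5 in the protein search).

```python
from dataclasses import dataclass


@dataclass
class SpectrumHit:
    scan_id: str
    peptide: str      # peptide assigned in the genomic database search
    p_genomic: float  # PeptideProphet probability, genomic database search
    p_protein: float  # PeptideProphet probability, protein database search


def candidate_novel_hits(hits, min_genomic=0.9, max_protein=0.5):
    """Keep spectra that score high in the genomic search but low in the protein search."""
    return [h for h in hits if h.p_genomic > min_genomic and h.p_protein < max_protein]


# Hypothetical toy input; scan names and peptide sequences are made up.
hits = [
    SpectrumHit("scan_001", "AAAGELK", 0.97, 0.10),
    SpectrumHit("scan_002", "LLSDEVR", 0.95, 0.92),  # already explained by the protein database
    SpectrumHit("scan_003", "AAAGELK", 0.93, 0.30),  # same peptide, second spectrum
    SpectrumHit("scan_004", "GGTPNMK", 0.40, 0.05),  # low confidence in both searches
]
selected = candidate_novel_hits(hits)
distinct_peptides = sorted({h.peptide for h in selected})
print(len(selected), "scans,", len(distinct_peptides), "distinct peptides")
```

Collapsing the selected scans to a set of sequences mirrors the step from MS/MS scans to distinct peptides described in the text.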
Data Processing and Statistical Validation
MS/MS spectra were searched using the Turbo-Sequest algorithm3 against our standard Drosophila
protein database (BDGP release 3.2) with the following initial criteria: requirement for trypsin
digestion, mass tolerance 3 Da, variable modification of the amino acids methionine (+15.9994 Da)
and cysteine (+227.26 Da with a differential modification of +8.9339 Da for the heavy isoform of the
C-ICAT reagent; or +422.22 Da with a +8 Da differential modification for d8-ICAT-labeled cysteine;
Supplementary Table 2 online). The Sequest output was submitted to a suite of software tools
(Trans-Proteomic Pipeline; see www.proteomecenter.org/software.php) including the use of a
statistical model (PeptideProphet) to estimate the accuracy of peptide assignments to MS/MS
spectra4. This combination exhibited excellent sensitivity and selectivity in a recent comparison of
various database search engines5. It extracted high quality information on our dataset, yielding an
average false discovery rate (FDR) of 1.37% in spectrum assignments (average PeptideProphet
P-value of 0.985 for a peptide assignment). Despite a high redundancy in the dataset, some peptides
were identified only once. This set of 29,314 peptides (~5.8%) has to be distinguished from single-hit
identifications in smaller datasets, since the extensive analysis of samples from the six main
developmental stages is expected to generate redundant identifications even for peptides derived
from rare proteins. (A subset of these peptides is expected because very specific
protein samples were also analyzed in this context.) We therefore term these peptides
“one-hit” identifications, and a higher false positive rate is very likely in this class of
peptides. Indeed, the average PeptideProphet P-value for this class of peptides is 0.974 and thus
slightly lower than the average PeptideProphet P-value for all peptides.
In contrast, if we applied other conventionally used but less stringent filtering constraints based on
cross-correlation (Xcorr) scores (Xcorr>1.6 for charge state +1; Xcorr>2.4 for charge state +2; Xcorr>3.2 for
charge state +3), we would reach 75% proteome coverage (11,565 proteins; 102,630 unique peptides) and hit
more than 85% of all Drosophila gene models. However, since one of our key aims was to critically
assess and improve the genome annotation, we emphasized the selection of very stringent criteria.
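For comparison, the less stringent charge-dependent Xcorr filter mentioned above can be written as a small lookup. This is a minimal sketch rather than the filter implemented in any particular software package; the function name and the toy score pairs are hypothetical, and rejecting charge states without a listed cutoff is an assumption.

```python
# Minimum Xcorr per precursor charge state, as quoted in the text.
XCORR_MIN = {1: 1.6, 2: 2.4, 3: 3.2}


def passes_xcorr(charge, xcorr, thresholds=XCORR_MIN):
    """Return True if the Xcorr score exceeds the cutoff for this charge state."""
    cutoff = thresholds.get(charge)
    if cutoff is None:
        return False  # assumption: charge states without a listed cutoff are rejected
    return xcorr > cutoff


# Hypothetical (charge, Xcorr) pairs.
assignments = [(1, 1.7), (2, 2.4), (2, 3.0), (3, 3.1), (4, 5.0)]
accepted = [a for a in assignments if passes_xcorr(*a)]
print(accepted)
```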
Two different types of simulations were carried out (Fig. 2).
1. Experiments of different size (i.e. sum of identification trials) could have an effect on the increase
of distinct protein identifications, depending on the consecutive order of the experiments. The
average number of distinct protein identifications per experiment was 1,640 proteins, ranging from
approximately one hundred to three thousand identified proteins. To rule out that the size of the
experiments has a large effect, we carried out a first simulation as a control. Out of a total of 12!
theoretically possible experiment orders (close to 480 million), 100 orders were randomly selected,
and the respective number of distinct protein identifications over time for these orders is shown by
the gray dots in Figure 2. If, by chance, in one permutation of the experiment order, several large
experiments are among the first ones for that order, the sum of distinct protein identifications at that
stage can be higher compared to that of the actual experiment order (red dots). Importantly, each of
the individual twelve experiments added some unique identifications.
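The control simulation with permuted experiment orders can be illustrated with a minimal Python sketch. Each experiment is represented as a pair of its number of peptide identifications and its set of identified proteins; the function names and the toy data are hypothetical. Note that `random.shuffle` draws each order independently, so the same order could in principle be drawn twice, which is negligible when 100 orders are sampled from 12! possibilities.

```python
import math
import random


def cumulative_distinct(experiments):
    """Return (cumulative peptide identifications, cumulative distinct proteins) after each experiment."""
    seen = set()
    peptides = 0
    curve = []
    for n_peptides, proteins in experiments:
        peptides += n_peptides
        seen |= proteins
        curve.append((peptides, len(seen)))
    return curve


def permuted_curves(experiments, n_orders=100, seed=0):
    """Saturation curves for randomly shuffled experiment orders."""
    rng = random.Random(seed)
    curves = []
    for _ in range(n_orders):
        order = list(experiments)
        rng.shuffle(order)
        curves.append(cumulative_distinct(order))
    return curves


# Hypothetical toy data: three experiments of different size.
experiments = [
    (300, {"A", "B", "C"}),
    (100, {"A"}),
    (200, {"B", "D"}),
]
observed = cumulative_distinct(experiments)      # corresponds to the red dots
curves = permuted_curves(experiments, n_orders=5)  # corresponds to the gray dots
print(observed)
print(math.factorial(12))  # number of possible orders of twelve experiments
```

All permuted curves end at the same point; only the path towards it depends on the order, which is what the gray dots in Figure 2 visualize.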
2. In order to estimate the increase of distinct protein identifications beyond the identified 5,795
proteins for the entire proteome, several additional Monte Carlo simulations were carried out with
the following model:
A single protein identification trial in a single experiment can be regarded as a random trial with
16,743 possible outcomes. Since identification trials can be regarded as independent of each other,
and since the experimental conditions within one experiment can be regarded as constant, each
experiment can be described by a multinomial model. Results, e.g. frequencies of protein
identifications in a single experiment $i$ consisting of $n$ identification trials, can be modelled as a
16,743-dimensional multinomially distributed random vector
$(X_{i,1}, X_{i,2}, \ldots, X_{i,16743}) \sim \mathrm{M}(n, p_{i,1}, p_{i,2}, \ldots, p_{i,16743})$
with parameters $n$ and $p_{i,1}, \ldots, p_{i,16743}$, where $p_{i,j}$ is the probability of identifying protein $j$ in experiment $i$.
For a series of experiments with the aim of identifying the whole proteome of an organism, the random
variable $Y_i(n) = \mathrm{Sign}(X_{i,1}) + \ldots + \mathrm{Sign}(X_{i,16743})$, the count of distinct proteins identified in a series of $n$
identification trials in experiment $i$, is of most interest. The maximum of the possible values of $Y_i(n)$,
the number of identifiable proteins, corresponds to the number of proteins with $p_{i,j} > 0$.
Importantly, in order to be able to carry out the simulations one has to assume a certain probability
of identifying those proteins that were not yet identified. Several simulations were carried out under
the following assumptions: An extreme and very unrealistic case, which assumes an equally high
probability of identifying any of the 16,743 proteins, results in the blue curve (Fig. 2). More
realistically, for the 5,795 experimentally identified proteins, one can use as maximum likelihood
estimators of $p_{i,j}$ their observed relative frequencies $h_{i,j}/n_i$, where $h_{i,j}$ denotes the count of identifications of
protein $j$ in experiment $i$ of size $n_i$ ($h_j$ = number of identifications of protein $j$ in all S2 experiments;
$h_1, \ldots, h_k > 0$ for the $k = 5{,}795$ proteins seen). Finally, the probabilities for proteins not identified by the 12
experiments ($16{,}743 - k = 10{,}948$) were calculated by assuming an overall number of
identifications of either 1 for each of these proteins (green curve, $h_{k+1} = \ldots = h_{16743} = 1$) or 0 (purple curve,
$h_{k+1} = \ldots = h_{16743} = 0$). In the case of the green curve, all proteins can be identified, but this needs many
more identification trials than for the blue curve. The purple curve is of most interest: by
simulating the stochastic process $(Y_i(1), Y_i(2), \ldots, Y_i(n_i))$ for each experiment, we obtained evidence that the
observed values of $Y_i(n_i)$ (red dots) were significantly higher than the simulated values (purple
curve), a result that contradicts the model assumptions. It indicates that several proteins have a
small probability of being identified, and that continued experimentation can identify more proteins.
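The three scenarios can be mimicked with a minimal Monte Carlo sketch in Python. It is a simplification under stated assumptions: all trials are pooled into one series instead of being simulated per experiment, identification probabilities are taken proportional to the observed counts, and unseen proteins receive a pseudo-count of 1 (green scenario) or 0 (purple scenario). The function name and the toy counts are hypothetical; the real model has 16,743 outcomes.

```python
import random


def simulate_distinct(counts, n_unseen, pseudo_count, n_trials, seed=0):
    """Simulate identification trials; return the number of distinct proteins seen after each trial."""
    # Relative weights: observed counts for seen proteins, a pseudo-count for unseen ones.
    weights = list(counts) + [pseudo_count] * n_unseen
    rng = random.Random(seed)
    draws = rng.choices(range(len(weights)), weights=weights, k=n_trials)
    seen = set()
    curve = []
    for protein in draws:
        seen.add(protein)
        curve.append(len(seen))
    return curve


# Hypothetical toy setting: 4 proteins already seen, 6 not yet identified.
counts = [40, 25, 10, 5]
n_unseen = 6
n_total = len(counts) + n_unseen
green = simulate_distinct(counts, n_unseen, pseudo_count=1, n_trials=200)
purple = simulate_distinct(counts, n_unseen, pseudo_count=0, n_trials=200)
blue = simulate_distinct([1] * n_total, 0, pseudo_count=0, n_trials=200)
print("green:", green[-1], "purple:", purple[-1], "blue:", blue[-1])
```

In the purple scenario the curve can never exceed the number of proteins already seen, which is why observed values above that curve argue against the assumption of zero probability for all unseen proteins.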
Data storage (SBEAMS)
SBEAMS is being designed as a framework for collecting, exploring, and exporting data produced
by a variety of biological experiments. The framework is intended to be flexible for use with a
number of different experiments. At its core, SBEAMS is not a single program, but rather a set of
software tools designed to get data in and out of many evolving relational database schemas. The
code is designed around several Perl modules (web-server Common Gateway Interface (CGI) scripts)
which handle most of the communication with the relational database engine and also provide a
consistent Web front end. HTML offers the overwhelming advantage that it is easily accessible to
any computer without software installation. The implemented SBEAMS proteomics module for
Drosophila melanogaster (named FLYCAT) incorporates information about the protein source as
well as the experimental conditions. To analyze and compare datasets from different projects, the
SBEAMS database offers a variety of tools, ranging from simple search options with
links for retrieving further information on specific entries to specialized implementations
that allow rapid comparison or summarization across multiple experiments. SBEAMS allows users to
combine, summarize or compare multiple projects and experiments from different origins, enabling
researchers to validate or share data in a simple, quick and reproducible manner. A
detailed description of SBEAMS and of its implementation is published in Desiere et al. (2005)6.
PeptideAtlas is an expandable resource for the integration of data from many diverse proteomics
experiments. Its main goal is the validation of gene prediction by protein expression data6. This is
achieved by mapping peptides derived from accurate interpretations of protein tandem mass
spectrometry (MS) data back to the corresponding genomic sequence (Ensembl 35.4c).
Identifications in PeptideAtlas have been statistically filtered to retain only those of the highest
quality. The Drosophila build of the PeptideAtlas is available at www.mop.unizh.ch/peptideatlas.
The PeptideAtlas web site offers interfaces for the public to contribute data, explore the database,
and download data.
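Mapping a peptide back to a genomic sequence can be illustrated with a minimal six-frame translation sketch in Python. It is not the PeptideAtlas implementation: it uses the standard genetic code, works on a single contiguous DNA string, ignores peptides that span splice junctions, and the function names and the toy sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Yield (strand, offset, protein) for the three forward and three reverse frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    for strand, seq in (("+", dna), ("-", reverse)):
        for offset in range(3):
            yield strand, offset, translate(seq[offset:])


def map_peptide(peptide, dna):
    """Return (strand, start, end) for each match, 0-based on the forward strand, end exclusive."""
    hits = []
    n = len(dna)
    for strand, offset, protein in six_frames(dna):
        pos = protein.find(peptide)
        while pos != -1:
            start = offset + 3 * pos
            end = start + 3 * len(peptide)
            if strand == "-":
                start, end = n - end, n - start  # convert to forward-strand coordinates
            hits.append((strand, start, end))
            pos = protein.find(peptide, pos + 1)
    return hits


# Hypothetical toy sequence encoding M-A-K in forward frame 2.
print(map_peptide("MAK", "CCATGGCTAAATGA"))
```

Reporting coordinates on the forward strand for both orientations keeps hits from the two strands directly comparable.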
Fractionation of cells/tissues by hypotonic lysis
Schneider S2 cells were grown to high density in 150 cm2 flasks in Schneider’s Drosophila cell
culture medium (Invitrogen) supplemented with 10% FCS. The cells were harvested by gently
tapping the flasks to dislodge the cells, and then transferred to 50 ml Falcon tubes and centrifuged
(1,000xg) for 7 minutes at 4°C. The pelleted cells were washed 3 times with 50 ml of chilled PBS
using the initial centrifugation conditions. To begin the cell fractionation procedure the final washed
cell pellet was resuspended in 5 volumes of fresh ice cold hypotonic cell lysis buffer [10 mM Hepes
(pH 7.90), 1.5 mM MgCl2, 10 mM KCl] supplemented just before use with 0.5 mM DTT and
CompleteTM protease inhibitor cocktail (Roche) and incubated in this solution for 10 minutes on ice.
The swollen cells were dounced 20x or until all cells were visibly lysed. (The efficiency of cell lysis
was assessed by using phase contrast microscopy to monitor the release of intact nuclei). The
resulting lysate was transferred to Corex tubes and centrifuged at 4°C for 7 min at 1,000xg. This
generated a nuclear pellet. The supernatant (combined cytoplasmic and membrane fractions) was
removed and processed as described in the next section.
The pelleted nuclei were recentrifuged at 4°C in an SS-34 rotor (Sorvall) at 20,000xg for 20 minutes
to further remove contaminating cytoplasmic and membrane proteins. The resulting supernatant was
removed and discarded. To extract nuclear proteins, the nuclear pellet was first resuspended in an
ice-cold high-salt protein extraction buffer [20 mM Hepes (pH 7.9), 25% glycerol, 1.5 mM MgCl2,
0.42 M NaCl] supplemented just before use with 0.5 mM DTT, 0.5 mM PMSF, and CompleteTM
protease inhibitor cocktail (Roche). Nuclear proteins were extracted over the next 30 minutes by
stirring this suspension at 4°C. The suspension was then centrifuged at 20,000xg in an SS-34 rotor
(Sorvall) at 4°C for 20 minutes. The resultant supernatant (the nuclear protein fraction) was
carefully removed and placed in Eppendorf tubes. The pellet was discarded; it contained abundant
histone proteins which were selectively retained in the pellet by the choice of the salt concentration
used in the extraction procedure7, 8. The initial hypotonic supernatant (cytoplasmic and membrane
fraction) was ultracentrifuged at 100,000xg (SW 60 rotor, Beckman) at 4°C for 90 minutes to
generate separate membrane and cytoplasmic fractions. The supernatant (cytoplasmic fraction) was
removed and the pellet (membrane fraction) was taken directly without further washing and
dissolved in ICAT Labeling Buffer (see below). The proteins from the cytoplasmic and the nuclear
fractions were each precipitated separately by adding 7 volumes of ice cold acetone followed by
incubation at -20°C for 90 minutes and centrifugation in an Eppendorf tube at full speed for 10 min
at 4°C. The precipitated protein was resuspended in ICAT Labeling Buffer (at a final
concentration of 2 mg/ml) and stored at -80°C. In the case of the membrane fraction, however, the
proteins were directly resuspended in a slightly modified Membrane Labeling Buffer that
contained a higher percentage of SDS and urea (8 M urea, 10 mM TrisCl (pH 8.4) and 0.125% SDS).
The protein concentration of each fraction was measured using the BCA assay (Pierce) or the RC-DC
protein estimation kit from Bio-Rad; bovine serum albumin (BSA) served as a standard.
Total Extract from Adult flies
yw flies were reared in population cages at 25°C and fed on normal yeast agar plates. No distinction
was made between male and female flies for protein extractions. To extract total proteins, flies were
first ground and homogenized using mortar and pestle in fresh ice cold RIPA buffer (150 mM NaCl,
50 mM Tris-HCl, pH 7.5, 500 µM EDTA, 100 µM EGTA, 1.0% Triton X-100, and 1% sodium
deoxycholate) supplemented just before use with CompleteTM protease inhibitor cocktail (Roche).
The homogenate was centrifuged for 5 minutes at 1,000xg at 4°C to remove the insoluble parts. The
supernatant was homogenized further using 20-30 strokes in a glass douncer. The resulting
homogenate was spun at 1,000xg for 10 minutes at 4°C and proteins were precipitated using 7
volumes of ice-cold acetone. The resulting precipitated protein was resuspended again in ICAT
Protein extract from Adult fly heads
Around 3,000 adult yw flies (from a single population cage) were first immobilized at 4°C,
transferred to a 50 ml Falcon tube and immersed in liquid nitrogen for a few seconds. After a
vigorous shake, the heads were separated from the bodies by sieving through 0.8 mm, 0.4 mm and 0.1 mm
sieves, in that order. They were manually separated from other debris under a microscope. The
heads were pooled in a 50 ml Falcon tube and frozen at -80°C. The frozen heads were ground with a cold
mortar and pestle and homogenized in 1 ml of fresh ice-cold hypotonic cell lysis buffer [10 mM Hepes
(pH 7.90), 1.5 mM MgCl2, 10 mM KCl] supplemented just before use with 0.5 mM DTT and
CompleteTM protease inhibitor cocktail (Roche). The homogenate was centrifuged in a pre-cooled
Eppendorf tube at 1,000xg for 10 minutes at 4°C. The resulting supernatant was separated into
cytoplasmic, membrane and nuclear (insoluble) fractions as described for the cells/tissues.
Membrane protein extract from Adult flies
yw flies were reared in population cages at 25°C and fed on normal yeast agar plates. No distinction
was made between male and female flies for protein extractions. To extract total proteins, flies were
first ground and homogenized using mortar and pestle in 10% sucrose supplemented just before use
with CompleteTM protease inhibitor cocktail (Roche). The homogenate was centrifuged for 5 minutes at 600xg
at 4°C to remove the insoluble parts. The supernatant was spun at 8,000xg for 30 minutes at 4°C.
The pellet was washed once using 50 mM Tris-HCl (pH 8.4) and incubated for 60 minutes on ice to
lyse the cells. The cells were homogenized using 20-30 strokes in a glass douncer. The resulting
homogenate was transferred to ultracentrifugation tubes (1 ml/tube), overlaid with 7 ml of 48%
sucrose and 15 ml of 28.5% sucrose, and filled up with 10% sucrose solution. The samples were spun
at 100,000xg for 2 hours at 4°C. The membranes were recovered from the interface between the 48% and
28.5% sucrose layers, and proteins were precipitated using 7 volumes of ice-cold acetone. The resulting
precipitated protein was resuspended in ICAT Labeling Buffer at a final concentration of 2 mg/ml.
Protein extract from Adult and Larval Hemolymph
Adult flies were punctured through the thorax with a sharp pre-cooled needle and transferred quickly
into a 0.5 ml Eppendorf tube containing siliconized glass wool, with the bottom clipped off, placed
inside a 1.5 ml Eppendorf tube. When enough flies had been pooled, the Eppendorf tubes were centrifuged at
1,500xg for 5 minutes at 4°C. The hemolymph was collected from the 1.5 ml Eppendorf tube. The larval
hemolymph was extracted in a similar manner, except that the larvae were initially grabbed with a
pair of forceps and a small piece of the cuticle was torn off, taking care that the internal organs were
not damaged; when enough larvae had been pooled, they were spun at 800xg for 5 minutes at 4°C.
The proteins from the resulting hemolymph from larvae and flies were precipitated using 7 volumes
of ice-cold acetone at -20°C for 90 minutes and spun at full speed in an Eppendorf centrifuge for 10
minutes at 4°C. The resulting precipitated protein was resuspended in ICAT Labeling Buffer at a
concentration of 10 µg/ml.
Protein extract from larval fatbodies
3rd-instar yw larvae were first washed with PBS and homogenized in Balanced Salt Solution (5M
NaCl, 1M KCl, 1M MgSO4, 1M CaCl2, 1M Tricine, 1M Glucose, 1M sucrose, 1g BSA
supplemented just before use with CompleteTM protease inhibitor cocktail (Roche)). The homogenate
was collected and centrifuged for 20 minutes at 3,000xg (4°C) in 250 ml polycarbonate bottles in
an HS-4 rotor (Sorvall). The fat layer floating on the surface was collected and proteins were
precipitated using ice-cold acetone at -20°C for 90 minutes.
Protein extract from yw pupae
Pupae were collected from population cages and homogenized using ice-cold RIPA buffer
supplemented just before use with CompleteTM protease inhibitor cocktail (Roche). The homogenate
was initially centrifuged at 800xg to remove the insoluble fraction and then processed as before
(hypotonic lysis) to obtain a membrane fraction.
Total protein extract from yw Embryos
The embryos (2h AEL) were collected from 100 apple agar plates and dechorionated. They were
homogenized using 8 M urea, 10 mM TrisCl (pH 8.4) and 0.125% SDS. The homogenate was
centrifuged at 800xg and the proteins in the supernatant were precipitated using ice-cold
acetone at -20°C for 90 minutes. The resulting precipitated protein was resuspended again in ICAT
Protein extract from Golgi Membranes
Kc cells were first homogenized by hypotonic lysis as described above. Organelle fractionation
(Golgi) was performed using OptiprepTM (Axis-Shield) by buoyant density in pre-formed
discontinuous iodixanol gradients as per the manufacturer’s instructions.
Protein extract from Autophagosomes
Kc cells were first homogenized as described above for the generation of a membrane fraction.
Organelle fractionation (autophagosomes) was performed using OptiprepTM (Axis-Shield, Norway)
by buoyant density in pre-formed discontinuous iodixanol gradients as per the manufacturer’s
Preparation of Chromatin fraction from Kc cells
The chromatin fraction from Kc cells was extracted using procedures described elsewhere after the
generation of the nuclear fraction as described above9.
Free Flow Electrophoresis (Protein and Peptide)
Total protein extraction from 3rd-instar larvae was performed using RIPA buffer, as described above.
Free flow electrophoresis (FFE) was performed using a FFE-Weber FFETM free-flow electrophoresis
apparatus (FFE-Weber, Munich, Germany). All media contained 8 M urea and 250 mM mannitol.
The anodic stabilization medium contained 100 mM α-hydroxyisobutyric acid (HIBA), 150 mM
DL-2-aminobutyric acid, 100 mM nicotinamide and 15 mM glycyl-glycine; separation medium 1
contained 23% Prolyte mixture (pH 4-7); and the cathodic stabilization medium contained 75 mM
ethanolamine, 75 mM AMPSO, 150 mM TAPS and 30 mM HEPES. For the separation of
Drosophila melanogaster proteins and evaluation of FFE, the fractions were collected into a
deep-well 96-well sample collection plate (maximum volume 2 ml for each well). The corresponding
samples from the same wells of the 96-well plates were pooled, dissolved in 1% (w/v)
RapiGestTM (Waters) in 50 mM Tris-HCl and digested overnight at 37°C [1:200 (w/w)
trypsin:protein]. For peptide FFE, the samples were prepared as described above, with the exception
that they were digested before performing FFE.
Gel filtration was performed on a Superose 6 column (HR 10/30, Amersham-Pharmacia, Switzerland)
controlled by an automatic fast protein liquid chromatography (FPLC) station
(Amersham-Pharmacia) at 20°C. The column was equilibrated with a native buffer containing 150 mM NaCl, 1
mM DTT, 0.2 mM sodium vanadate, 5 mM EDTA, 1 µg/ml aprotinin (Sigma-Aldrich, Germany),
and 50 mM Tris pH 7.2. The column was calibrated with catalase (232 kDa), albumin (67 kDa),
ovalbumin (43 kDa), chymotrypsinogen (25 kDa), ribonuclease A (15.6 kDa; all
Amersham-Pharmacia), and cytidine (Merck, Switzerland). Before loading onto the column, cytosolic fractions
were cleared of nucleic acids by precipitation with polyethyleneimine (Sigma-Aldrich, 0.1% w/v).
200 µl of the cleared cytosolic fraction (1 mg/ml protein) was loaded onto the column and eluted at
a flow rate of 0.3 ml/minute. Fractions of 0.6 ml calculated to be < 25 kDa were collected. The
eluate was precipitated with trichloroacetic acid (50%), the pellet washed twice with 70% acetone, and
redissolved in digestion buffer containing 1 M urea.
Isoelectric focusing (IEF) of peptides
The digested proteins (nuclear extract embryo10) were brought up to 8 M urea in the presence of IPG
buffer 3-10 NL (Amersham) and applied to a 13-cm IPG dry strip, 3-10 NL (Amersham). Using an
IPGphor (Amersham), the following focusing protocol was applied: 16 h at 30 V, 3 h at 500 V, then 8,000 V up
to 30,000 Vh. Alternative settings were used for the S2 cell extract (1 h at 500 V, 1 h at 1,000 V, 24 h at
8,000 V). Excess cover oil was removed, and the gel was scraped off the plastic backing in 20 (17 for
S2) equally sized parts within 3 min to prevent diffusion. Peptides were eluted in three sequential
solutions containing 0%, 50% and 100% acetonitrile in water and 0.1% TFA (twice in 40%
acetonitrile, 0.5% TFA for the S2 cell experiments). Supernatants were combined, dried down and
redissolved in 5% acetic acid. Samples were cleaned of salts and residual oil using STAGE tips 26,
dried, and stored at -80°C before analysis.
Chloroform extraction of proteins (isolation of CHCl3-soluble proteins)
3rd-instar larvae were first washed with PBS and homogenized in 50 mM Tris-HCl, pH 8.4
(supplemented just before use with CompleteTM protease inhibitor cocktail (Roche)), and the
homogenate was sonicated with 5 short pulses of 10 seconds each. The homogenate was extracted with
chloroform (1:1) by vortexing for 2 minutes and a subsequent incubation of 20 minutes on ice
(chemical hood). The phases were separated for 7 minutes at 20°C in a table-top centrifuge at full
speed. The interphase was recovered and stored at -80°C. The chloroform phase was recovered and
recentrifuged as indicated above. The aqueous phase was then completely removed and dried in a
vacuum dryer, generating a yellowish, glassy pellet. The pellet was resuspended in 50 mM Tris-HCl,
pH 8.4, 2% (w/v) RapiGestTM (Waters) and digested using trypsin (1:50).
Size exclusion Chromatography of proteins
S2 cells were washed 5 times in ice cold PBS and lysed in fresh ice cold RIPA buffer (150 mM
NaCl, 50 mM Tris-HCl, pH 7.5, 500 µM EDTA, 100 µM EGTA, 1.0% Triton X-100, and 1%
sodium deoxycholate) supplemented just before use with CompleteTM protease inhibitor cocktail
(Roche), and reduced using 5 mM tributylphosphine (Sigma) for 30 minutes at 37°C. Cysteine
residues were carbamidomethylated using 10 mM iodoacetamide dissolved in 10 mM ammonium
bicarbonate for 2 hours at 20°C. The protein mixture was centrifuged at 4°C at 2,300xg using
Ultrafree-CL high-flow Bionex-PB (Millipore, cutoff 10 kDa). The flowthrough was recovered and
proteins were precipitated using ice-cold acetone at -20°C for 90 minutes.
ICAT-labeling, processing and analyzing protein samples
The general procedures for ICAT-labeling, processing and analyzing protein samples are described
elsewhere11-13. Specific differences from these general procedures that were used in our study are
detailed below. For the cytoplasmic and nuclear fractions, 4 mg of protein that had been dissolved in
ICAT Labeling Buffer was reduced with 5 mM tributylphosphine for 30 minutes at 37°C. The
sample was then split in half (2 mg/ml) and differentially labeled (1:1) with 1.2 mM of isotopically
light or heavy ICAT reagent [ABI, Applied Biosystems] for 90 min in the dark at 20°C using gentle
mixing. After combining the heavy and light labeled fractions, the mixture was diluted 6-fold with
H2O [final conc. of urea (1 M) and SDS (0.008%)] and digested with trypsin [1:50 (w/w)
trypsin:protein] (Promega, Madison, WI) at 37°C for 18 hours. Samples of each labeled fraction and
the trypsinized mixture were then run on polyacrylamide gels and silver-stained to assess whether
the labeling and trypsinization processes were successful.
In the case of the membrane fraction 4 mg of the protein fraction was completely dissolved in
Membrane Labeling Buffer by mixing overnight at a concentration of 2 mg/ml. At this ratio of SDS
to membrane lipid (15/1) all membrane lipid should have been removed from the integral membrane
proteins and the lipid and protein should be present in separate micelles. This membrane protein
sample was then reduced and differentially labeled with the light and heavy ICAT reagent in the
same way as previously described for the cytoplasmic and nuclear fractions. (Separate control
experiments using cysteine-rich non-membrane proteins demonstrated that there was complete
ICAT labeling of the protein when this concentration of urea (8 M) was present along with 1.2%
SDS; unpublished observations, Kregenow and Brunner.) After combining the light and heavy
labeled fractions, the combined mixture was diluted 40-fold with H2O [final conc. of urea (0.2 M) and
SDS (0.03%)] and digested with trypsin [1:10 (w/w) trypsin:protein] at 37°C for 24 h. Samples of
each labeled fraction and the trypsinized mixture were analyzed, as before, on silver-stained
polyacrylamide gels to ascertain that both the labeling and trypsinization procedures were successful.
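The dilution and digestion figures quoted for the two sample types can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original protocol. The function names are illustrative, and it assumes that the full 4 mg of protein is carried through to digestion. The pre-dilution urea and SDS values for the cytoplasmic and nuclear fractions are not stated in the text; they are back-calculated here from the stated final values and should be read as implied, not reported.

```python
import math

def dilute(concentration, fold):
    """Concentration remaining after an n-fold dilution with water."""
    return concentration / fold

def trypsin_mass_mg(protein_mg, ratio):
    """Trypsin mass (mg) for a 1:ratio (w/w) trypsin:protein digest."""
    return protein_mg / ratio

# Membrane fraction: 8 M urea and 1.2% SDS, diluted 40-fold (values from the text).
membrane_urea_final = dilute(8.0, 40)   # expected ~0.2 M
membrane_sds_final = dilute(1.2, 40)    # expected ~0.03 %

# Cytoplasmic/nuclear fractions: only the final values (1 M urea, 0.008% SDS)
# and the 6-fold dilution are stated; the starting values below are implied.
cyto_urea_before = 1.0 * 6              # implied ~6 M urea before dilution
cyto_sds_before = 0.008 * 6             # implied ~0.048 % SDS before dilution

# Trypsin required, ASSUMING the whole 4 mg of protein is digested.
cyto_trypsin = trypsin_mass_mg(4.0, 50)      # 1:50 (w/w)
membrane_trypsin = trypsin_mass_mg(4.0, 10)  # 1:10 (w/w)

assert math.isclose(membrane_urea_final, 0.2)
assert math.isclose(membrane_sds_final, 0.03)
```

Under these assumptions the membrane-fraction figures are internally consistent: a 40-fold dilution of 8 M urea and 1.2% SDS gives the stated 0.2 M and 0.03%.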
Upstream fractionation of peptides
The peptide mixture was fractionated routinely on a cation-exchange chromatography column as the
first step of a three-step chromatography separation process (Supplementary Table 2 online). Fifty to
sixty one-minute fractions were collected from a 2.1 mm x 20 cm Polysulfoethyl A column (Poly LC Inc,
Columbia, MD; 5 µm particles, pore size 300 Å) at a flow rate of 200 µl/min using Buffer A [20 mM
KH2PO4, 25% CH3CN (pH 3.0)] and Buffer B [20 mM KH2PO4, 25% CH3CN and 350 mM KCl (pH
3.0)] with the following gradient: 0-25% Buffer B over 30 min, followed by 25-100%
Buffer B over 20 min.
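The two-segment cation-exchange gradient can be written as a piecewise function of time. The sketch below is illustrative only. It assumes that both segments are linear ramps (the text gives only the end points and durations) and that the KCl concentration scales with the proportion of Buffer B, since KCl is present only in Buffer B. The function names are hypothetical.

```python
def percent_b(t_min):
    """Percent Buffer B at time t_min, assuming linear ramps:
    0-25% over the first 30 min, then 25-100% over the next 20 min."""
    if t_min < 0 or t_min > 50:
        raise ValueError("gradient is defined for 0-50 min only")
    if t_min <= 30:
        return 25.0 * t_min / 30.0
    return 25.0 + 75.0 * (t_min - 30.0) / 20.0

def kcl_mM(t_min):
    """Approximate KCl concentration (mM) delivered at time t_min.
    Buffer B contains 350 mM KCl; Buffer A contains none."""
    return 350.0 * percent_b(t_min) / 100.0

# Print the gradient at a few time points.
for t in (0, 15, 30, 40, 50):
    print(f"{t:>2} min: {percent_b(t):5.1f}% B, {kcl_mM(t):6.1f} mM KCl")
```

Under the linearity assumption, the salt concentration rises slowly to 87.5 mM KCl over the first 30 minutes and then steeply to 350 mM over the final 20 minutes.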
For the second stage of the three-step upstream peptide separation process, adjacent cation-exchange
fractions with a low peptide content (determined by 214 nm absorption measurements) were pooled
so that only approximately 30 cation-exchange fractions were purified further by affinity chromatography
using an ABI monomeric avidin column (see ABI literature for the protocol). The final eluent from
these samples, containing the ICAT-labeled peptides, was dried down and the peptides were resuspended
in 15 µl of Buffer A-1 (see below).
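The pooling of adjacent low-content fractions can be illustrated with a short sketch. The text does not give the pooling rule that was actually used, so this is one possible interpretation, not the authors' procedure. The absorbance threshold, the function name and the example readings are all hypothetical.

```python
def pool_adjacent(absorbances, threshold):
    """Group fraction indices into pools.

    Runs of consecutive fractions whose 214 nm absorbance is below
    `threshold` (a hypothetical cut-off) are merged into a single pool;
    every other fraction is kept as its own pool.
    """
    pools = []
    current = []  # run of adjacent low-content fractions being collected
    for index, value in enumerate(absorbances):
        if value < threshold:
            current.append(index)
        else:
            if current:
                pools.append(current)
                current = []
            pools.append([index])
    if current:
        pools.append(current)
    return pools

# Made-up absorbance readings for eight consecutive fractions.
readings = [0.01, 0.02, 0.50, 0.80, 0.03, 0.04, 0.02, 0.60]
pools = pool_adjacent(readings, threshold=0.1)
print(pools)
```

In practice the cut-off would be chosen so that the 50-60 collected fractions reduce to roughly 30 pools for the avidin affinity step.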
The third step in the separation process involved an online reverse phase (RP) capillary
chromatography column [75 µm x 10 cm column with self-packed Magic C18AQ resin (5 µm, 200 Å)
from Michrom BioResources] positioned directly upstream of the mass spectrometer. Peptides were
eluted from the RP column at a flow rate of approximately 250 nl/min with a 100-minute
gradient of 10% to 35% Buffer B-1 [Buffer A-1: 0.2% formic acid, 5% CH3CN in H2O; Buffer B-1:
0.2% formic acid, in 80% CH3CN and 20% ACN]. The amount of each sample loaded onto the column
(and therefore analyzed in the mass spectrometer) was determined by its peptide content, which had
been estimated from the 214 nm absorption measurement made during cation-exchange
chromatography. Between 25% and 50% of each cytoplasmic and nuclear fraction sample was loaded,
whereas samples from the membrane fraction were generally loaded in their entirety.
1. Gasteiger, E. et al. ExPASy: The proteomics server for in-depth protein knowledge and
analysis. Nucleic Acids Res 31, 3784-3788 (2003).
2. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA.
Journal of Molecular Biology 268, 78-94 (1997).
3. Eng, J., McCormack, A. & Yates III, J. An approach to correlate tandem mass spectral data
of peptides with amino acid sequences in a protein database. Journal of the American Society for
Mass Spectrometry 5, 976-989 (1994).
4. Keller, A., Nesvizhskii, A.I., Kolker, E. & Aebersold, R. Empirical statistical model to
estimate the accuracy of peptide identifications made by MS/MS and database search. Anal
Chem 74, 5383-5392 (2002).
5. Kapp, A. et al. An evaluation, comparison, and accurate benchmarking of several publicly
available MS/MS search algorithms: Sensitivity and specificity analysis. Proteomics 5,
6. Desiere, F. et al. Integration with the human genome of peptide sequences obtained by high-
throughput mass spectrometry. Genome Biol 6, R9 (2005).
7. Dignam, J.D. Preparation of extracts from higher eukaryotes. Methods Enzymol 182, 194-
8. Dignam, J.D., Lebovitz, R.M. & Roeder, R.G. Accurate transcription initiation by RNA
polymerase II in a soluble extract from isolated mammalian nuclei. Nucleic Acids Res 11,
9. Mendez, J. & Stillman, B. Chromatin association of human origin recognition complex,
cdc6, and minichromosome maintenance proteins during the cell cycle: assembly of
prereplication complexes in late mitosis. Mol Cell Biol 20, 8602-8612 (2000).
10. Krijgsveld, J., Gauci, S., Dormeyer, W. & Heck, A.J. In-gel isoelectric focusing of peptides
as a tool for improved protein identification. J Proteome Res 5, 1721-1730 (2006).
11. Han, D.K., Eng, J., Zhou, H. & Aebersold, R. Quantitative profiling of differentiation-
induced microsomal proteins using isotope-coded affinity tags and mass spectrometry. Nat
Biotechnol 19, 946-951 (2001).
12. Smolka, M.B., Zhou, H., Purkayastha, S. & Aebersold, R. Optimization of the isotope-coded
affinity tag-labeling procedure for quantitative proteome analysis. Anal Biochem 297, 25-31
13. Von Haller, P.D. et al. The Application of New Software Tools to Quantitative Protein
Profiling Via Isotope-coded Affinity Tag (ICAT) and Tandem Mass Spectrometry: I.
Statistically Annotated Datasets for Peptide Sequences and Proteins Identified via the
Application of ICAT and Tandem Mass Spectrometry to Proteins Copurifying with T Cell
Lipid Rafts. Mol Cell Proteomics 2, 426-427 (2003).