Irrational Exuberance in Clinical Proteomics
Simon M. Lin and Warren Alden Kibbe

Authors' Affiliation: Robert H. Lurie Comprehensive Cancer Center, Northwestern University,
Chicago, Illinois
Received 8/10/05; accepted 8/10/05.
Requests for reprints: Warren Alden Kibbe, Robert H. Lurie Comprehensive Cancer Center,
Northwestern University, Chicago, IL 60611. Phone: 312-695-1334; Fax: 312-695-1352;
E-mail: wakibbe@
© 2005 American Association for Cancer Research.
doi:10.1158/1078-0432.CCR-05-1744
Clin Cancer Res 2005;11(22) November 15, 2005

   Is the siren call of “omics”-based tests luring researchers, physicians, patients, and
entrepreneurs onto the rocks of unproven and unapprovable diagnostics technology, or will
the winds of technological change enable them to reach the beachhead safely? In the
marketplace, patients and caregivers welcome minimally invasive tests enabling improved
decision making and better outcomes. Thanks to rapid developments in high-throughput
technologies, a single blood draw can be used to assess single nucleotide polymorphisms,
build mRNA expression profiles, and profile protein and metabolite levels simultaneously,
at ever-reduced cost. The hunt for diagnostic biomarkers has also quickly moved from
laborious hypothesis-driven techniques to high-throughput technologies, turning biomarker
development into an exercise in data organization and mining. However, the lure of
lucrative financial return coupled with “early to market” strategies encourages rapid and
risky investments at an early phase in the maturity of these technologies. This
high-octane environment is very similar to previous booms of designing drugs through
rational computation, identifying novel drug targets from genomic sequences, and
validating prostate-specific antigen as a biomarker for prostate cancer. The first two
approaches have been remarkably unsuccessful in producing Investigational New Drug
Applications. Although prostate-specific antigen has been approved by the Food and Drug
Administration as a biomarker for prostate cancer, the debate over its value continues.
The accompanying study by Hayashida et al. (1) represents a recent effort to evaluate the
utility of proteomics in a clinical setting.
   History repeats itself in many different ways. By reflecting on the recent lessons
learned in DNA microarray technologies and the application of microarray technology in the
clinic, we can anticipate some of the problems that will arise in proteomics and perhaps
avoid some pitfalls. Issues of measurement reproducibility, detection limit, small sample
size versus high dimensionality, standardization of data representation, and experimental
design and analysis are quite similar between DNA microarray experiments and proteomic
experiments, and many of the solutions from microarray experiments are being successfully
applied to proteomics.
   To commercialize a clinical test, the measurement must be reproducible across
laboratories and the results must be directly comparable regardless of instrumentation and
personnel. Early microarray studies showed shockingly little concordance for measurements
taken at different locations and platforms even when sample handling was highly
controlled. After many years of technical improvements, a recent study has shown
dramatically improved intralaboratory and interlaboratory reproducibility across
microarray measurements (2). A similar reproducibility study has also been conducted using
the surface-enhanced laser desorption/ionization time-of-flight platform at multiple
laboratories (3). Similar to the early results with microarrays, variability attributable
to instrumentation has been confounded by differences in data processing and analysis
methods (4). Different algorithms, such as baseline subtraction, calibration, denoising,
and peak finding, dramatically affect the interpretation of the raw instrument data.
   Given the current status of measurement reproducibility and the lack of standardization
of calibration and peak calling between instrument vendors, many researchers use
proteomics in a discovery mode. These researchers view proteomics as a rapid screening
tool for generating new hypotheses. Candidate proteins are selected for further evaluation
using more traditional, lower-throughput techniques, such as ELISA (5). However, purifying
and identifying proteins for further characterization from the peaks suggested by
surface-enhanced laser desorption/ionization time of flight can be tricky, as discussed by
Hayashida et al. (1). Moreover, after identifying the protein, there is no equivalent,
universal mechanism for quickly validating protein abundance data similar to reverse
transcription-PCR for measuring mRNA abundance in the DNA microarray world.
   Another important, unresolved issue in the application of mass spectrometry-based
analysis of proteins is the quantification of the detection limit in a standardized,
machine-independent manner. Again, taking the example from microarrays, it was the
Latin-square spike-in data set from Affymetrix that was pivotal in the development of
reproducible analysis algorithms and the establishment of confidence intervals on the
reproducibility and reliability of measurements from Affymetrix chips. Results from
studying this data set indicate that the detection of differentially expressed genes with
low abundance at 0.125 to 0.25 pmol is challenging and defines the current detection limit
(4). Thus far, we have not seen a similar, publicly available, Latin-square design
experiment to characterize a protein mass spectrometry profiling machine. The
establishment of the detection limit is of great interest, particularly in serum
proteomics where the informative biomarkers might be circulating in very low abundance.
   As with the early days of microarray studies, the size of the cohort in proteomics
studies is usually small, due to the cost of the measurements and/or the difficulty in
procuring appropriate patient samples. A typical proteomics study may involve a few dozen
samples and measure tens of thousands or even millions of variables. This ratio of samples
to variables is contrary to the traditional application of multivariate statistics, where
the ratio between sample size and number of variables is suggested to be larger than 30:1.
With such a small sample size and huge variable search space, the probability of finding
associations by random chance is quite high even when
analyzed at what are traditionally statistically stringent conditions. Appropriate
cross-validation methods are necessary to reduce false positives and the false optimism
induced by overly optimistic predictions (6). Hayashida et al. (1) followed the paradigm
of holding out 15 cases as a validation set independent of the 27-case training set.
However, cross-validation alone is not a panacea for small sample size because the
variance of the cross-validated error rate can be large enough to challenge its usefulness
(7) and small studies are subject to other biases that are not understood or characterized
in the individuals studied. There is a correlation between the sample size of the study
and the reproducibility of findings in follow-up validation studies (7, 8). With the
continued and rapid drop in cost of microarray-based experiments, we are now seeing
microarray studies involving hundreds of patients. We expect to see similar trends in
proteomics experiments, ameliorating this particular concern.
   Given the issues above, it is popular to conduct reanalysis or meta-analysis using raw
data coming from other groups. Thus, the desire to share experimental data between
research groups has resulted in the adoption of data standards, such as MIAME (9) and
MIAPE (10). Unfortunately, these standards do not extend into clinical experiments. CDISC
and HL-7 both have clinical data working groups with existing and proposed standards for
clinical data elements, but these efforts are not widely known by the omics research
community. Also, CDISC and HL-7 are focused on data representation, not on defining a
“minimum useful data set” in the way that MIAME and MIAPE have done. Consequently, other
than a verbal description in the Materials and Methods section or a table in the Results
section of articles, no consistently detailed clinical data are captured and reported for
clinical proteomics experiments, limiting the ability of investigators to independently
verify or combine data from multiple experiments. For example, a later clarification of a
recent breast cancer microarray study (11) was published to address the potentially
biasing effect of patient tumor size (size is widely regarded as a primary characteristic
of the tumor in cancer studies), which had not been included in the original publication.
For instance, in the Hayashida study (1), samples are annotated with gender, age, tumor
location, and stage; however, no information is available on smoking history,
Helicobacter pylori infection, and dietary factors that are known risk factors in
esophageal cancer (12). Asymmetrical distributions of these risk factors among the small
patient population in that study could, for instance, result in a remarkably different
set of results and confound the attempt to generalize the results of small studies to
broader populations.
   Omics studies generate data on a scale unprecedented in the traditional domain of
biostatistics. Physician scientists have, therefore, consulted with computer scientists to
handle these large data sets and seek interesting patterns with data mining methods. These
teams have quickly found that building a reasonable classifier is not sufficient to
interpret the data. The effect of study design, patient selection criteria, selection
bias, and clinical utility/benefit must all be evaluated. This complexity warrants a joint
effort by physician scientists, mass spectrometrists, clinical epidemiologists,
biostatisticians, computer scientists, and bioinformaticists to address the complexity of
proteomics.
   Inflated expectations fueled irrational exuberance in the financial market for Internet
companies circa year 2000 and were followed by a bursting of the “Internet bubble.” In
spite of this, the Internet has continued to change the way we conduct research and do
business. Similarly, controversial news releases and debates over publications on clinical
proteomics are challenges to reevaluate both the skepticism and the hype surrounding
proteomics and lead to a better assessment of the clinical utility of proteomics. As
reflected in the “Possible Prediction” portion of the title, the article by Hayashida
et al. (1) represents a timely report and a rational step toward a better assessment of
clinical utility, given the many limitations inherent in a small, single-institution
study. These and similar studies warrant a larger multi-institutional study to examine the
reproducibility and robustness of the predictions using the current state of proteomic
instrumentation and techniques.
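
The concern raised above about small cohorts measured on very many variables can be made concrete with a short simulation. The Python sketch below is illustrative only and is not taken from the Hayashida study: the 27 samples (chosen to echo that study's training-set size), the 5,000 pure-noise variables, the mean-difference feature score, and the nearest-centroid classifier are all hypothetical choices made for this example. It compares leave-one-out cross-validation when features are selected once using all samples against the same procedure with feature selection repeated inside each fold.

```python
import numpy as np


def top_features(X, y, k):
    """Return indices of the k features with the largest absolute
    difference in class means, scaled by the overall standard deviation."""
    m0 = X[y == 0].mean(axis=0)
    m1 = X[y == 1].mean(axis=0)
    s = X.std(axis=0, ddof=1) + 1e-12  # guard against division by zero
    score = np.abs(m1 - m0) / s
    return np.argsort(score)[-k:]


def nearest_centroid_predict(X_train, y_train, x_test):
    """Assign x_test to the class whose training centroid is closer."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = np.sum((x_test - c0) ** 2)
    d1 = np.sum((x_test - c1) ** 2)
    return int(d1 < d0)


def loocv_accuracy(X, y, k, select_inside):
    """Leave-one-out accuracy. If select_inside is False, features are
    chosen once using every sample, including the one later held out."""
    n = len(y)
    global_idx = top_features(X, y, k)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        if select_inside:
            idx = top_features(X[train], y[train], k)
        else:
            idx = global_idx
        pred = nearest_centroid_predict(X[train][:, idx], y[train], X[i, idx])
        correct += int(pred == y[i])
    return correct / n


# Hypothetical study: 27 samples, 5,000 variables, and no real signal at all.
rng = np.random.default_rng(0)
n_samples, n_features, k = 27, 5000, 10
X = rng.normal(size=(n_samples, n_features))
y = np.array([0] * 13 + [1] * 14)  # arbitrary labels, unrelated to X

biased = loocv_accuracy(X, y, k, select_inside=False)
honest = loocv_accuracy(X, y, k, select_inside=True)
print(f"selection outside cross-validation: {biased:.2f}")
print(f"selection inside cross-validation:  {honest:.2f}")
```

Because the simulated data contain no signal, any accuracy well above 50% from the first estimate reflects selection bias rather than biology. The second estimate never lets the held-out sample influence feature selection and should therefore stay near chance, although with so few samples it can still vary considerably from one run to another.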

1. Hayashida Y. Possible prediction of chemoradiosensitivity of esophageal cancer by serum protein profiling, this issue; 2005;11:8042–7.
2. Irizarry RA. Multiple Lab Comparison of Microarray Platforms. Johns Hopkins University, Dept. of Biostatistics Working Paper 71, jhubiostat/paper71 2004.
3. Semmes OJ, Feng Z, Adam BL, et al. Evaluation of serum protein profiling by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry for the detection of prostate cancer: I. Assessment of platform reproducibility. Clin Chem 2005;51:102–12.
4. Cope LM, Irizarry RA, Jaffee HA, Wu Z, Speed TP. A benchmark for Affymetrix GeneChip expression measures. Bioinformatics 2004;20:323–31.
5. Howard BA, Wang MZ, Campa MJ, Corro C, Fitzgerald MC, Patz EF, Jr. Identification and validation of a potential lung cancer serum biomarker detected by matrix-assisted laser desorption/ionization-time of flight spectra analysis. Proteomics 2003;3:1720–4.
6. Simon R, Radmacher MD, Dobbin K, McShane LM. Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer Inst 2003;95:14–8.
7. Ransohoff DF. Rules of evidence for cancer molecular-marker discovery and validation. Nat Rev Cancer 2004;4:309–14.
8. Ntzani EE, Ioannidis JP. Predictive ability of DNA microarrays for cancer outcomes and correlates: an empirical assessment. Lancet 2003;362:1439–44.
9. Ball C, Brazma A, Causton H, et al. An open letter on microarray data from the MGED Society. Microbiology 2004;150:3522–4.
10. Orchard S, Hermjakob H, Julian RK, Jr., et al. Common interchange standards for proteomics data: public availability of tools and schema. Proteomics 2004;4:490–1.
11. Kopans DB. Gene-expression signatures in breast cancer. N Engl J Med 2003;348:1715–7; author reply 1715–7.
12. Lagergren J. Adenocarcinoma of oesophagus: what exactly is the size of the problem and who is at risk? Gut 2005;54 Suppl 1:i1–5.
