									        Gene Arrays and Tissue Arrays for

  Handout for short course at the 91st Annual Meeting of the United States and Canadian
                   Academy of Pathology, March 1, 2002, Chicago IL.

Edward Gabrielson, M.D.
Associate Professor of Pathology and Oncology
Johns Hopkins University School of Medicine

Angelo DeMarzo, M.D., Ph.D.
Assistant Professor of Pathology, Oncology, and Urology
Johns Hopkins University School of Medicine

Ramaswamy Anbazhagan, M.D., Ph.D. is acknowledged for his contributions to this USCAP
course in the year 2001 and to this handout
Helen Fedor is acknowledged for her work in developing tissue microarray techniques and her
contributions to this handout

Course Overview:
       High-throughput gene expression array technologies offer new opportunities to find new
markers for diagnosis and classification of diseases. High-throughput tissue microarrays offer an
effective mechanism to test and validate candidate markers for diagnosis or classification, and to
test the validity of proposed classes of disease. Thus, these two approaches can be used
effectively in combination to accelerate discovery and validation in molecular pathology.
       This short course will provide a broad overview of different gene array technology
platforms and the application of these technologies to the analysis of tissue samples. The depth of
discussion is limited by time constraints, but the course should provide a general background on
this technology to help pathologists who are contemplating the use of arrays in future work. With
reference to gene expression arrays, the course will cover three general topics: 1) array technology
(manufacturing processes, hybridization methods, and data acquisition), 2) data management and
analysis, and 3) specimen requirements.

Gene Expression Array Technology
       The history of cDNA gene expression array technology goes back to at least 1991, when
Lennon and Lehrach developed spotted membrane arrays to analyze yeast transcripts (1). This
technology has been expanded significantly by groups at Stanford (2) and the NHGRI (3), where
two-color fluorescent labeling has been applied to high-density cDNA arrays spotted on glass. In
parallel, technologies for in situ synthesis of oligonucleotides on array surfaces have been
developed by Affymetrix (4, 5). Each of these different platforms continues to be used today and
the relative advantages and disadvantages warrant discussion.

Comparing different gene array platforms - oligonucleotide arrays and
cDNA arrays, fluorescence labeling and radiolabeling:
       The commonly available platforms include oligonucleotide arrays, cDNA arrays spotted on
glass slides, and cDNA arrays spotted on nylon membranes. Oligonucleotide arrays are
currently manufactured by several vendors, including Affymetrix,
Agilent, and Clontech. Affymetrix synthesizes the
oligonucleotides in situ using photolithography, a technology developed for the computer chip
industry. Currently, this method is limited to synthesis of oligonucleotides of about 20 bases in
length. Agilent uses a variation of inkjet technology to synthesize oligonucleotides up to 60 bases
long directly on glass array surfaces. Agilent and other companies will also soon be offering
arrays of oligonucleotides that are spotted or printed by ink jet.

       Intuitively, oligonucleotide arrays would appear to have some important advantages over
cDNA arrays. In particular, specific, highly unique sequences can be selected for each gene and
thus cross-hybridization can be minimized. A major factor that has limited the use of
oligonucleotide arrays is expense, although the cost difference between synthesis of
oligonucleotides and preparation of cDNA clones is becoming relatively small.

       cDNA arrays are constructed by mechanically spotting PCR amplified DNA segments on
glass or nylon filter matrices. In general, glass slides are used for hybridization of
fluorescent-labeled samples and nylon filters are used for hybridization of radiolabeled samples.

       Again, each of these options (nylon filters/radiolabeling or glass slides/fluorescent
labeling) has advantages and disadvantages. Fluorescence can be measured at high resolution and
therefore array features can be densely packed. There is considerable diffusion of radioactivity
(P-32 or P-33) and this limits spatial resolution. For example, 10,000 or more elements can be
resolved on an area the size of a glass slide by fluorescence, but only about 2,000 elements can be
resolved on the same area by P-33 radioactivity. Another important advantage of fluorescence
labeling is that multiple samples, each labeled with a different fluor, can be co-hybridized to a
single array. A common tactic used currently is to label an internal standard (or control) with one
fluor and a test sample with a second fluor, allowing comparisons of multiple different samples by
reference to the standard “control”.

       Using radioactivity to label samples also has some significant merits. Importantly, the
methods for radiolabeling are well-standardized and can be performed efficiently in most
laboratories. In contrast, incorporation of fluors (particularly the Cy3 fluor) can be fickle. The
radiolabeling reagents, including the radioisotope, are inexpensive. Paradoxically, the diffusion of radioactivity
actually results in hybridization images that are more uniform than fluorescent images, simplifying
the issue of data acquisition.

       Radiolabeled samples are generally hybridized to nylon filter arrays using rotating
hybridization chambers. This results in an even distribution of labeled sample over the filter.
Fluorescent samples are generally hybridized to arrays printed on glass slides because glass has
significantly less autofluorescence than nylon and because the cost of the fluorescence reagents
necessitates the use of small reaction volumes.

       Sensitivity of the nylon filter/ radiolabeling platform is becoming less of an issue than in
past years. For radiolabeling, approximately 1 μg of total RNA is required for each array
experiment, in contrast to the 50-100 μg of RNA that is required for direct fluorescent labeling
(i.e., incorporation of labeled nucleotide). Newer labeling methods with a two-step fluorescent
label can obtain strong signals with less than 5 μg of RNA. Two such products are marketed by
Clontech (Atlas™ Powerscript™ Fluorescent Labeling Kit) and Genisphere. In practical terms,
1 or 2 microdissected frozen section slides will
provide approximately 0.2 to 1.0 μg of total RNA.

       To further increase sensitivity, a number of laboratories are using amplification methods to
increase the amount of material available for hybridization (for example, see Wang et al, Nature
Biotechnology 18:457-9, 2000). Several kits are now commercially available for RNA
amplification with highly standardized methods (for example, the MessageAmp aRNA kit).

Array Manufacturing, Sample Preparation, Sample Hybridization, and Data Acquisition
       Most pathologists using gene array technology are not actually involved in the
manufacturing process. However, it is useful to discuss this process in order to improve our
understanding of limitations and potential problems associated with the use of gene expression
arrays. Several protocols are available on-line that cover all aspects of array production, from
amplifying clones to spotting. One recent, and frequently quoted, source is an article published in
Biotechniques by a group from TIGR (Hegde P. et al., Biotechniques 29:548-556, 2000). This
article is available online through a free subscription to Biotechniques. Protocols from a Cold
Spring Harbor Library Microarray course are also available online.

Preparation of cDNA for Array Spotting
       Most laboratories and some commercial vendors making arrays currently use clones of
genes or expressed sequence tags (ESTs) as a source for the DNA. The ESTs are cloned in related
common vectors and can be amplified by PCR using “universal primers”, several of which are in

       ESTs sequenced by the IMAGE Consortium are available from Research Genetics
and ATCC. The original sequencing was high-throughput
and single pass, and a very high percentage of the IMAGE clones do not have the
specified gene inserts. Sets of sequence-verified clones are available from Research Genetics
(40,000 clones) and Incyte (8500 clones). These sets are also far from
perfect; approximately 10% of Research Genetics clones do not have a human gene insert or
produce two bands when amplified. Reportedly, in an additional 10% or so of the clones, the
insert does not have the designated sequence. Furthermore, these clone sets do not represent all
genes, or even all “named” genes. Thus, currently produced cDNA arrays usually lack
representation of many genes already known to be important in disease and misrepresent many
other genes.

       An alternative method for making cDNA is to design gene sequence-specific PCR primers
and amplify the sequence using cDNA as a template. The cost of this method is driven by the cost
of the sequence-specific primers (about $20 per gene), but it is an effective means of adding
specific genes that are not otherwise available in clone sets.

       Within the past few years, several manufacturers have offered sets of synthesized
oligonucleotides to represent large numbers of genes. These synthesized oligonucleotides are
generally 50 to 80 bases in length and sequences are selected to represent the most unique portions
of gene sequences. The shorter sizes of the oligonucleotides (relative to sizes of cDNA segments)
can result in weaker signals, but control of the sequence specificity offers a significant advantage
over cDNAs.

Mechanics of Array Manufacturing
        Several companies now manufacture robotic equipment for mechanically spotting cDNA
 on arrays (see table below). The various robotic arrayers have differences with regard to robotic
 handling of plates and arrays, spotting tips, and overall throughput.

        One issue that is commonly overlooked in consideration of the mechanics of array
 manufacturing is the type of spotting tip used. The tip originally developed by the Pat Brown lab,
 and used in most commercially manufactured arrayers, has a “pen and quill” configuration. This
 tip has a slot, which draws and releases DNA-containing solution through capillary action. These
 tips are subject to wear and to variability across a series of successive hits. An alternative type
 of tip, used by the Genetic Microsystems arrayer, has a ring and pin configuration that is less
 subject to these problems. We use this apparatus in our
 laboratory and we are very satisfied with the high reproducibility of the spotting.

        Manufacturers of Microarray Spotting Robotics
  Cartesian Technologies                  
  Engineering Services                    
  Genetic Microsystems/ Affymetrix        
  Gene Machines                           
  Genomic Solutions                       
  Intelligent Automation Systems          
  Packard Instruments                     

        Another major variable in array production is the array substrate. There are a number of
 different brands of commercially available nylon filters, and, having been used for many years,
 these products are well established. In contrast, there are a wide variety of types of coatings used
 for glass slides in attempts to improve and standardize DNA binding. Poly-L-lysine coated slides
 are most commonly used, but other coatings include aminosilane and epoxy.

        Different laboratories also use a number of different spotting chemistries in attempts to
 provide the best results in hybridizations. DMSO helps to denature DNA and, because it is
 hygroscopic, it also helps slow drying of the plates during the printing process. Many array-
 printing facilities claim that DMSO causes the spotting solution to diffuse to a greater extent than
 desired, however. In our own laboratory, we have found that the PCR reaction product,
 maintained in the reaction buffers, is stable and suitable for arraying directly on to filters.

Sample Labeling and Hybridization
        Most labeling of sample mRNA is performed by incorporating label into a reverse
 transcriptase reaction. This can usually be done with total RNA as the starting point, eliminating
 the need to purify mRNA. For radiolabeling, P-33 or P-32 nucleotides (either CTP or ATP) are
 used in the reaction, and for fluorescence labeling, Cy3 or Cy5 tagged nucleotides are used. An
 effective protocol for radiolabeling samples is available on the Research Genetics Web site,
 and effective protocols for direct labeling with fluorescence are available in
 the previously cited Biotechniques manuscript or on the Pat Brown lab Web page.
 Again, newer labeling methods with a two-step
 fluorescent label (e.g., the Atlas™ Powerscript™ Fluorescent Labeling Kit or products from
 Genisphere) can obtain strong signals with less than 5 μg of RNA.

        It is important for users of arrays to understand some of the technical aspects of
 hybridization. A number of factors can affect the hybridization of probe to template on the array,
 including buffering salts, temperature, and duration of hybridization. The presence of SDS in the
 hybridization buffer helps to minimize non-specific binding and the use of formamide maintains
 the denatured status of the probe molecules, even at low temperatures. Typically, hybridizations
 are carried out at relatively low, non-stringent, temperatures (e.g., 42° C) overnight and then
 washed several times under more stringent conditions to minimize non-specific binding. If one is
 particularly concerned about low-level expressed genes, the stringency of the washes becomes
 more important for differentiating low expression values from background noise. It is also
 important to remember that even though there is usually an excess of immobilized template on the
 glass slide or nylon membrane, only a portion of the labeled molecules from the sample actually
 hybridize to the template. Therefore, it is possible to increase signal by increasing the
 concentration of template to some extent.
          Unfortunately, there is little published data on specificity of hybridizations for specific
 arrayed sequences. Intuitively, sequences that are rich in C and G will have relatively high levels
 of non-specific binding and this will become increasingly important for genes expressed at low
 levels, where non-specific hybridization may exceed specific hybridization.

Array Image Analysis and Data Acquisition
          Data acquisition from array images can be quite complex, particularly for fluorescent
 labeling. There are several manufacturers (see table below) that currently produce scanners
 capable of detecting the Cy3 and Cy5 labels, and most are developing instruments capable of
 detecting other fluors as well. There are two primary manufacturers of phosphorimagers used for
 quantitatively measuring signals from radiolabeled samples, Molecular Dynamics
 and Fuji.

          Fluorescent Slide Readers
 GSI Lumonics                             
 Genetic Microsystems                     
 Genomic Solutions                        

          Image analysis involves several steps. First, spots must be identified. This task is
 simplified to some extent by the fact that the robotic systems produce regularly aligned spots on
 the array. The simplest image analysis software packages use manually aligned grids to direct
 spot-specific data acquisition and more sophisticated packages have “spot-finding” features.
 Irregularities in the array configuration and spurious signals can complicate this task.

          Following spot identification, hybridization intensities of each spot are measured. For
 radioactive decay-generated spots, this is a relatively simple task because the scatter of
 radioactivity produces spots of relatively symmetrical geometry. The PSCAN software measures
 intensity at a single pixel in the center of a manually aligned grid and these readings are
surprisingly reproducible, even after the grid is realigned. In contrast, the high-resolution
fluorescent spots have complex geometry, reflecting all irregularities of the original spot
geometry. In most cases, background levels are also measured (preferably specifically for each
spot) and these values are subtracted from hybridization intensities. The commercially available
fluorescent slide readers usually come equipped with software for complete single-experiment
analysis (i.e., converting an image to a quantitative data file).
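       The measurement and background subtraction steps described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the algorithm of PSCAN or of any scanner package: the image is a small hypothetical grid of pixel values, the spot is a square window around a grid position, and the local background is taken as the median of the ring of pixels around that window. The function name and all numbers are invented for the example.

```python
from statistics import mean, median

def spot_signal(image, row, col, spot_radius=1, ring_width=1):
    """Return (raw, background, corrected) intensity for one spot.

    image: list of rows of pixel intensities (hypothetical data).
    The spot is the square window of +/- spot_radius pixels around
    (row, col); local background is the median of the pixels in the
    surrounding ring of width ring_width.
    """
    outer = spot_radius + ring_width
    spot_pixels, ring_pixels = [], []
    for r in range(row - outer, row + outer + 1):
        for c in range(col - outer, col + outer + 1):
            if not (0 <= r < len(image) and 0 <= c < len(image[0])):
                continue  # skip pixels that fall off the image edge
            inside = abs(r - row) <= spot_radius and abs(c - col) <= spot_radius
            (spot_pixels if inside else ring_pixels).append(image[r][c])
    raw = mean(spot_pixels)
    background = median(ring_pixels)
    # Clamp at zero so a spot dimmer than its surroundings is not negative.
    return raw, background, max(raw - background, 0.0)

# Hypothetical 7x7 image: uniform background of 10 with a bright 3x3 spot.
image = [[10] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        image[r][c] = 110

raw, background, corrected = spot_signal(image, row=3, col=3)
print(raw, background, corrected)
```

Using a spot-specific ring rather than one global background value reflects the preference stated above for background measured separately for each spot.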
       Software for Analysis of Radiolabeled cDNA Arrays
Research Genetics Pathways               
Clontech AtlasImage                      
P Scan (FREE!)                           

Commercial Vendors of Gene Expression Arrays
       Given the large start-up costs and the complexities of cDNA array manufacturing, it is not
unreasonable to consider purchasing arrays from a commercial vendor. Generally, experiences
with different commercially manufactured arrays have been variable and this may reflect the
abilities of the users as much as the quality of the arrays. The table below summarizes some of the
major vendors' products. The product lines are changing constantly and potential users are
encouraged to seek current information from vendors’ Web sites.

Vendor                        Platform Type                Number of Genes Represented      Approximate Cost per Array
Affymetrix                    Oligonucleotide arrays for   Up to 60,000                     $450 for 12K genes
                              human and other species
Agilent                       Glass slides/two-color       12,000                           $2500 per 2-slide kit
                              cyanine dye
Clontech Atlas                Glass, nylon, and plastic    Up to 8,300 human genes,         $500 - $1000
                              arrays using                 many special emphasis arrays
                              oligonucleotides
Perkin Elmer (NEN) Micromax   Glass slide/fluorescence     2400                             $800
Research Genetics             Nylon membrane/              Approximately 5,000 genes        $1000 per array, can
                              radiolabeling                per filter, up to 40,000 total   be used up to 4 times

Data Analysis
          Analyzing expression of thousands of different genes, usually in a relatively small number
of samples, is challenging. Fortunately, software applicable to gene array data analysis and data
visualization is being developed at a rapid pace. No software is a substitute for a fundamental
understanding of the statistical methods being applied, and this usually requires active
involvement of a statistician with expertise in decision-based analysis. However, many
individuals using cDNA arrays are becoming familiar with standard approaches to array data
analysis and can apply software tools for a preliminary investigation of relationships and
visualization of data. Several data analysis software packages are tabulated below.

Silicon Genetics (Genespring)
Stanford University (FREE!!)                      http://rana.Stanford.EDU/software/
(see also the Eisen lab homepage at Lawrence Livermore)

       A first step commonly used to process and analyze image analysis data is normalization.
This is necessary to adjust for differences in quantities (and quality) of starting RNA, differences
in labeling efficiencies, and differences in detection efficiencies (or, in the case of phosphorimager
detection of radiolabeled probes, differences in exposure times). Normalization can be based on a
subset of “housekeeping” genes or the entire set of genes represented on the array.

       For example, arrays hybridized with radiolabeled samples will typically be imaged individually and the data
from multiple arrays will be assembled together in a spreadsheet. Some arrays will have relatively
high values compared to other arrays because the sample had more RNA or a better labeling
reaction. To make reasonable comparisons of gene expression levels across the different samples,
adjustments must be made to normalize the expression for each sample individually.

       All of the array data analysis software packages that have been developed in recent years,
as well as simple spreadsheet software such as Excel, can perform this function. This initial
normalization is generally performed as a linear scaling, with each individual value re-expressed
as a percent of the total or a fraction of the mean. The tables below show a simple example of linear
scaling.

Sample “Raw” Gene Expression Values

             Sample A Sample B Sample C Sample D Sample E
Gene1               8     964        17     491      759
Gene 2             17       69       34       33       79
Gene 3              3       98        5       50     105
Gene 4           542        14    1094         6       15
Gene 5             28       19       57        9       10
Gene 6             26        1       51        1        1
Gene 7             29       53       60       31       66
Gene 8             11       26       22       12       27
Gene 9           480      945      501      996     1021
Gene 10          231      444      244      409      433

Sample Gene Expression Values Scaled to Fraction of Sample Mean

                  Sample A    Sample B   Sample C   Sample D    Sample E
Gene1             0.058182    3.661223   0.081535   2.409225    3.016693
Gene 2            0.123636    0.262058    0.16307   0.161923     0.31399
Gene 3            0.021818    0.372199   0.023981   0.245339    0.417329
Gene 4            3.941818    0.053171   5.247002   0.029441    0.059618
Gene 5            0.203636    0.072161   0.273381   0.044161    0.039746
Gene 6            0.189091    0.003798   0.244604   0.004907    0.003975
Gene 7            0.210909    0.201291    0.28777    0.15211    0.262321
Gene 8                 0.08   0.098747   0.105516   0.058881    0.107313
Gene 9            3.490909    3.589062   2.402878   4.887144    4.058029
Gene 10                1.68   1.686289   1.170264   2.006869    1.720986
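
       The scaled table above can be reproduced from the raw table with a few lines of code. The sketch below divides each value by the mean of its own sample (column); the variable and function names are ours, chosen for illustration only.

```python
# Raw values from the "Raw" table above: one list per sample, genes 1-10.
raw = {
    "A": [8, 17, 3, 542, 28, 26, 29, 11, 480, 231],
    "B": [964, 69, 98, 14, 19, 1, 53, 26, 945, 444],
    "C": [17, 34, 5, 1094, 57, 51, 60, 22, 501, 244],
    "D": [491, 33, 50, 6, 9, 1, 31, 12, 996, 409],
    "E": [759, 79, 105, 15, 10, 1, 66, 27, 1021, 433],
}

def scale_to_mean(values):
    """Linear scaling: divide every value by the mean of its sample."""
    sample_mean = sum(values) / len(values)
    return [v / sample_mean for v in values]

scaled = {name: scale_to_mean(values) for name, values in raw.items()}

# Print one row per sample, genes 1-10 from left to right.
for name, values in scaled.items():
    print(name, " ".join(f"{v:.6f}" for v in values))
```

After this step every sample has a mean of 1.0, so a value of 3.66 for gene 1 in sample B means that the gene is expressed at about 3.7 times the average level in that sample.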

          With arrays co-hybridized to two samples with different fluors, it is necessary to adjust for
different starting quantities of RNA and labeling efficiencies for each of the two samples on the
array. Usually, there is an underlying assumption that the total amounts of RNA labeled with
either Cy3 or Cy5 are equal, and thus the overall Cy5 to Cy3 ratio should be adjusted to 1 for
either the entire set of arrayed genes or a subset of housekeeping genes. Thus, while relative Cy3
or Cy5 intensities will vary from spot to spot, these variations will average out over the thousands
of spots on the array.
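
       A minimal sketch of this two-color adjustment follows. It assumes that the scaling factor is estimated from summed intensities, which is only one of several choices in use (a median of spot ratios is another), and the intensity values are hypothetical. The optional `reference` argument stands in for a housekeeping subset.

```python
def normalize_two_color(cy5, cy3, reference=None):
    """Rescale Cy5 so that the overall Cy5/Cy3 ratio is 1.

    cy5, cy3: background-corrected intensities, one pair per spot.
    reference: optional indices of housekeeping spots; when omitted,
    all spots are used to estimate the scaling factor.
    """
    idx = list(range(len(cy5))) if reference is None else list(reference)
    factor = sum(cy3[i] for i in idx) / sum(cy5[i] for i in idx)
    scaled_cy5 = [v * factor for v in cy5]
    # Per-spot expression ratio after the channels have been balanced.
    ratios = [r / g for r, g in zip(scaled_cy5, cy3)]
    return factor, ratios

# Hypothetical intensities for four spots; the Cy5 channel is brighter overall.
cy3 = [100.0, 200.0, 400.0, 50.0]
cy5 = [200.0, 400.0, 1600.0, 50.0]

factor, ratios = normalize_two_color(cy5, cy3)
print(f"scaling factor {factor:.3f}")
print("normalized Cy5/Cy3 ratios:", [round(r, 3) for r in ratios])
```

Individual spot ratios still differ from 1 after the adjustment; only the overall balance between the two channels is forced to 1.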

          Once expression values have been normalized, expression for individual genes can now be
compared across a series of samples. However, if we wish to examine several different genes in
this manner concurrently, it may be difficult to compare highly expressed genes to low-level
expressed genes. Thus, it is often desirable to normalize expression for each gene across all
samples. For example, in the sample data shown above, genes 4 and 6 are both expressed at
more than 10x higher levels in samples A and C than in samples B and D, even though the values
for gene 4 in samples A and C are roughly 20-fold higher than those for gene 6. Normalization, using a log
scale with offset of 1.0, across the samples will help visualize these relationships when the data is
graphed. Furthermore, when performing quantitative measures of similarity (such as the
correlation coefficient) it may be desirable to have all gene expression differences of the same
magnitude contribute in a reasonably similar manner to the calculation, regardless of their absolute
expression levels.
          A table of the fully normalized sample data and an intensity plot of this data are shown below.

Normalized Sample Data Set
              Sample A   Sample B   Sample C    Sample D   Sample E
 Gene1        0.065894   1.793547   0.091329    1.429085   1.620145
 Gene 2       0.631117   1.260089   0.817861    0.812524   1.478408
 Gene 3       0.116056   1.701378   0.127424    1.179765   1.875377
 Gene 4       2.238621   0.072586       2.567   0.040654   0.081137
 Gene 5       1.600888    0.60181    2.08741    0.373246   0.336646
 Gene 6       2.139939   0.046839   2.703729     0.06048   0.049013
 Gene 7        0.95473   0.914947   1.261749    0.706401   1.162172
 Gene 8       0.893664   1.093495   1.164812    0.664351   1.183678
 Gene 9       0.982493   0.996635   0.801024    1.159568    1.06028
 Gene 10      1.016155   1.018571   0.798695    1.134779      1.0318
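
       The numbers in the table above are consistent with the following calculation, which we infer from the values rather than from a stated formula: each scaled value x is replaced by ln(x + 1), and each gene is then divided by its own mean across the five samples. The sketch below applies this to two rows of the scaled table; the function name is ours.

```python
import math

# Scaled values (fraction of sample mean) for two genes, copied from the
# scaled table above; columns are samples A-E.
scaled = {
    "Gene 1": [0.058182, 3.661223, 0.081535, 2.409225, 3.016693],
    "Gene 9": [3.490909, 3.589062, 2.402878, 4.887144, 4.058029],
}

def normalize_gene(values, offset=1.0):
    """Log-transform with an offset, then divide by the gene's mean."""
    logged = [math.log(v + offset) for v in values]
    gene_mean = sum(logged) / len(logged)
    return [v / gene_mean for v in logged]

normalized = {gene: normalize_gene(values) for gene, values in scaled.items()}

for gene, values in normalized.items():
    print(gene, " ".join(f"{v:.6f}" for v in values))
```

The offset keeps values near zero from producing large negative logarithms, and dividing by the gene mean puts highly expressed genes (gene 9) and variably expressed genes (gene 1) on a common scale centered on 1.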

Intensity Plot of Sample Data

       (red indicates high relative expression and green indicates low relative expression)

       Several questions may be asked when looking at this data. The most obvious question is
which genes are differentially expressed at a significant level between two different situations.
Many published studies have used arbitrary post-normalization cutoff levels (e.g., 2-fold or 3-fold)
to define differential expression, but it is probably more reasonable to use traditional statistical
measures, such as the t-test, to define expression differences. For some genes, expression may be
tightly regulated in a normal state and even a small change in expression may be significant,
whereas for other genes, wide variability of expression levels may be the norm.
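
       The contrast between a fold-change cutoff and a t-test can be illustrated with invented numbers. The sketch below computes the Welch form of the t statistic (one common variant, chosen here as an assumption) for two hypothetical genes measured in five normal and five tumor samples. It stops at the t statistic and degrees of freedom; a p-value would normally come from a statistics package.

```python
import math
from statistics import mean, variance

def welch_t(group1, group2):
    """Welch's t statistic and approximate degrees of freedom."""
    m1, m2 = mean(group1), mean(group2)
    se1 = variance(group1) / len(group1)  # squared standard error
    se2 = variance(group2) / len(group2)
    t = (m2 - m1) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (
        se1 ** 2 / (len(group1) - 1) + se2 ** 2 / (len(group2) - 1)
    )
    return t, df

# Hypothetical normalized expression values, five samples per group.
data = {
    "tightly regulated gene": ([1.00, 1.02, 0.98, 1.01, 0.99],
                               [1.30, 1.28, 1.33, 1.31, 1.29]),
    "variable gene": ([0.2, 1.5, 0.4, 2.9, 0.5],
                      [0.3, 6.0, 0.9, 5.5, 1.0]),
}

results = {}
for name, (normal, tumor) in data.items():
    fold = mean(tumor) / mean(normal)
    t, df = welch_t(normal, tumor)
    results[name] = (fold, t, df)
    print(f"{name}: fold change {fold:.2f}, t = {t:.1f}, df = {df:.1f}")
```

In this invented example the tightly regulated gene changes by only about 1.3-fold yet gives a very large t statistic, while the variable gene exceeds a 2-fold cutoff with a t statistic close to 1. A fixed fold cutoff would select the second gene and miss the first.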

       Much of the excitement over the microarray technology has come from the ability to look
beyond individual genes, and to consider global patterns of gene expression. For this type of
analysis, more powerful decision-based analysis methods are needed. This is discussed below in
the context of tumor classification, but is also applicable to many other situations.

Decision-based analysis methods used for classification of tumors
       One of the most promising applications of gene arrays is in the classification of cancers
(and potentially other diseases) by gene expression profiles. If different classes of neoplasia are
already defined by some meaningful characteristic, such as tissue of origin or distinctive
histopathological features, gene arrays can be used to screen expression levels of many genes to
find those that reliably distinguish well-characterized examples of the different classes of tumors.
The best candidate markers can subsequently be used to correctly classify tumors that are currently
difficult to categorize. Identifying gene expression profiles characteristic of predefined subsets of
cancers is known as “class distinction”.

       In many situations, however, our current classification structures cannot distinguish tumors
that have vastly different clinical behavior and biological phenotypes. Creating entirely new,
clinically meaningful classification systems – or class discovery – represents a far more
challenging problem than class distinction, but is clearly an important goal for pathologists.

       For class discovery, the distinguishing characteristics of the different classes, and
possibly even the number of different classes, are unknown. In this situation, gene expression
data is analyzed to find previously unrecognized subsets of tumors that share gene expression
profiles. The gene expression profiles represent objective measures of the cellular phenotype and,
if properly analyzed and categorized, can lead to an objective classification structure.

       The method most widely used to “discover” new classes of cancers is hierarchical
clustering (6), with basic relationships between samples determined by the Pearson Correlation
Coefficient or Euclidean distances. Going back to our sample data set, we can quickly calculate
correlations as follows:

       Correlation Coefficients
                     Sample A Sample B Sample C Sample D Sample E
         Sample A            1
         Sample B     -0.99496        1
         Sample C      0.97435 -0.97384        1
         Sample D     -0.90669 0.911652 -0.96752        1
         Sample E     -0.98274 0.967683 -0.96358 0.869933        1
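
       These coefficients can be checked directly. The sketch below computes the Pearson correlation between columns of the normalized sample data set for samples A, B, and C; the helper function is ours and uses only the textbook formula.

```python
import math

# Columns of the normalized sample data set (genes 1-10) for three samples.
normalized = {
    "A": [0.065894, 0.631117, 0.116056, 2.238621, 1.600888,
          2.139939, 0.95473, 0.893664, 0.982493, 1.016155],
    "B": [1.793547, 1.260089, 1.701378, 0.072586, 0.60181,
          0.046839, 0.914947, 1.093495, 0.996635, 1.018571],
    "C": [0.091329, 0.817861, 0.127424, 2.567, 2.08741,
          2.703729, 1.261749, 1.164812, 0.801024, 0.798695],
}

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    return covariance / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

r_ab = pearson(normalized["A"], normalized["B"])
r_ac = pearson(normalized["A"], normalized["C"])
print(f"A vs B: {r_ab:.5f}")
print(f"A vs C: {r_ac:.5f}")
```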

       Casual inspection of these correlation coefficients leads to the recognition that samples A
and C are almost identical to one another, as are samples B, D and E to each other. A set of
similar samples can be called a “cluster”. (Note that, in reality, negative correlations are virtually
never seen unless analysis is restricted to a subset of genes that have very different expression
between different samples.) This scheme of hierarchical classifications leads to a graphical
representation known as a dendrogram, which can be used effectively in identifying and
displaying patterns in gene expression data. A dendrogram of the sample data set, with the
intensity plot reorganized according to the relationships among the different samples and genes, is
shown below.

[Figure: dendrogram of the sample data set with the reordered intensity plot. Sample order, left to right: A, C, B, E, D. Gene order, top to bottom: 10, 9, 2, 3, 1, 5, 4, 6, 7, 8.]
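
       The sample ordering in the dendrogram can be recovered with a short agglomerative clustering sketch. Two choices here are our assumptions and are not specified in the text: distance is taken as 1 minus the correlation coefficient, and clusters are joined by average linkage. The correlations are copied from the table of correlation coefficients above.

```python
# Pairwise Pearson correlations copied from the correlation table.
corr = {
    ("A", "B"): -0.99496, ("A", "C"): 0.97435, ("A", "D"): -0.90669,
    ("A", "E"): -0.98274, ("B", "C"): -0.97384, ("B", "D"): 0.911652,
    ("B", "E"): 0.967683, ("C", "D"): -0.96752, ("C", "E"): -0.96358,
    ("D", "E"): 0.869933,
}

def distance(a, b):
    """Correlation distance between two samples: 1 - r."""
    return 1.0 - corr[(a, b) if (a, b) in corr else (b, a)]

def linkage(c1, c2):
    """Average linkage: mean distance over all cross-cluster pairs."""
    return sum(distance(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

def cluster(samples):
    """Agglomerative clustering; returns the list of merges in order."""
    clusters = [(s,) for s in samples]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merges.append((clusters[i], clusters[j],
                       linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

merges = cluster(["A", "B", "C", "D", "E"])
for left, right, dist in merges:
    print(left, "+", right, f"at distance {dist:.3f}")
```

Samples A and C join first, then B and E, then D joins the B/E pair, and the two groups are joined last at a large distance. Reading the merges in order gives the two clusters described above.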

       While this approach has generated much excitement in the research community
(particularly among cancer researchers), there is still insufficient validation of methods to
determine its ultimate impact on pathology. One frequently cited example of class discovery was
the identification of two molecularly distinct forms of diffuse large B-cell lymphoma (DLBCL) by
groups from Stanford and the National Heart, Lung, and Blood Institute (7). In this instance, the
two forms of DLBCL were identified on the basis of gene expression patterns indicative of
different stages of B-cell differentiation. One type expresses genes characteristic of germinal
center B cells (germinal center B-like DLBCL) and the second type expresses genes normally
induced during in vitro activation of peripheral blood B cells (activated B-like DLBCL). This
molecular classification was reported to have prognostic value independent of stratification by the
usual clinical grading, with germinal center B-like DLBCL patients having improved survival
compared to the activated B-like DLBCL patients. However, a more recent study that evaluated a
second series of patients using the markers for germinal center B-like and activated B-like features
failed to find any difference in survival between the two groups (8). This discrepancy highlights
the need to validate the biological meaning of any “discovery” of new classes by gene expression
profiling. A logical manner to conduct such validation studies on a large population is through the
use of tissue microarrays, discussed below.

Data Reduction
       Analysis of gene expression data should include some form of data reduction, either to
reduce the number of variables by eliminating uninteresting ones or to substitute a more
parsimonious representation for the raw gene expression values. Some screening is almost always
appropriate. For example, genes whose expression is constant across samples carry no
discriminating ability. Additional screening can be based on explanatory power or on measures
of marginal association. We favor examining the ratio of within-group variation to between-group
variation, with groups defined according to pathological criteria. This requires coping with
measurement error, that is, the variation in signals that would arise from replicates. Previous studies
using gene expression data to classify cancers (e.g., the DLBCL study) have used such
data reduction methods.
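
The screening described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the cutoff `max_ratio`, and the expression values are all hypothetical, and the sketch does not model measurement error from replicates, which a real analysis would need to address.

```python
from statistics import mean


def within_between_ratio(values, groups):
    """Within-group sum of squares divided by between-group sum of squares."""
    grand_mean = mean(values)
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    within = sum((v - mean(vs)) ** 2 for vs in by_group.values() for v in vs)
    between = sum(len(vs) * (mean(vs) - grand_mean) ** 2 for vs in by_group.values())
    # No separation between group means: treat the ratio as infinitely poor.
    return within / between if between > 0 else float("inf")


def screen_genes(expression, groups, max_ratio=1.0):
    """Keep genes that vary across samples and separate the groups."""
    kept = []
    for gene, values in expression.items():
        if max(values) == min(values):  # constant expression cannot discriminate
            continue
        if within_between_ratio(values, groups) <= max_ratio:
            kept.append(gene)
    return kept


# Made-up expression values for four samples in two pathological groups.
groups = ["benign", "benign", "tumor", "tumor"]
expression = {
    "gene_constant": [5.0, 5.0, 5.0, 5.0],
    "gene_separating": [1.0, 1.2, 5.0, 5.2],
    "gene_noisy": [1.0, 5.0, 1.1, 5.1],
}
selected = screen_genes(expression, groups)
```

In this toy data set only the gene whose values differ between the groups, and vary little within them, passes the screen. The cutoff of 1.0 is arbitrary; in practice it would be chosen with the replicate variation in mind.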

       Parsimonious representations of the data can sometimes be identified when there is
biological knowledge about a pathway; the presence of a pathway (say gene 1 overexpressed; gene
2 underexpressed; gene 3 overexpressed) can then be used to construct new and more highly
explanatory variables. Normally such knowledge is not available and we will probably need to
proceed by applying discovery techniques, which find "centroids" of gene expression levels and
assign states. These methods are in development and have not yet been widely applied.

Other Statistical Approaches to Cluster Analysis
       There is a wide variety of methods that have been applied to decision-based analysis of
array data, and statisticians now take great delight in developing new ones. A few of these
methods are briefly discussed here; they are discussed in greater detail in a recent review article
by John Quackenbush (9).
K-Means Clustering: K-means clustering begins analysis by finding k groups, such that the
distances within groups are minimized. Different algorithms and clustering indices account for the
different possibilities using this technique. Most of the clustering indices are defined numerically
by partitioning the total dispersion in the data into within-cluster and between-cluster components.
This technique has intuitive appeal. It may be useful to derive the appropriate number of clusters
initially by using hierarchical methods. Predefining the number of classes could be an important
limitation of both k-means clustering and self-organizing maps (SOMs, described next). Although
different possible numbers of classes can be tested iteratively, the clustering algorithm will force
every sample into one of the classes and may thus compromise the distinctiveness of a particular
group if assigning its members to other classes provides a better overall solution to the problem.
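
A minimal sketch of k-means clustering is given below to make the procedure concrete. It uses the common assign-then-update iteration (Lloyd's algorithm), which is only one of the algorithms alluded to above; the function names, the starting centers, and the sample values are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(samples, initial_centers, max_iter=100):
    """Assign each sample to its nearest center, then recompute the centers."""
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: squared_distance(s, centers[j]))
            for s in samples
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:  # leave an empty cluster's center where it was
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical two-gene profiles: two tight groups plus one intermediate sample.
samples = [[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9], [3.0, 3.0]]
labels, centers = k_means(samples, initial_centers=[samples[0], samples[2]])
```

With k fixed at 2, the intermediate fifth sample is forced into one of the two classes and pulls that class's center toward it, which is the limitation noted above.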

Self-organizing maps: The self-organizing map (SOM) algorithm is also finding application in
gene expression analysis. As with k-means clustering, SOM requires predefining the number of
classes; the algorithm finds a suitable set of cluster centers around which the data appear to
aggregate and partitions the sample of tumors according to distance from the centers. SOM
classifications lend themselves to interesting visualization techniques such as the Ultsch
representation, which is especially helpful when the number of clusters is moderate or
large. The software GENECLUSTER, which implements a version of SOM tailored to gene
expression data, is available on the Web.
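
The idea of cluster centers that are adjusted together can be sketched with a toy one-dimensional SOM. This is not the GENECLUSTER implementation; the function names, the linearly decaying learning rate and neighborhood radius, and the data are assumptions chosen only to keep the example short.

```python
import math


def nearest_node(sample, nodes):
    """Index of the node (cluster center) closest to the sample."""
    return min(
        range(len(nodes)),
        key=lambda j: sum((x - y) ** 2 for x, y in zip(sample, nodes[j])),
    )


def train_som(samples, nodes, epochs=50, rate0=0.5, radius0=1.0):
    """Train a 1-D chain of nodes; neighbors of the winning node move too."""
    nodes = [list(n) for n in nodes]
    for epoch in range(epochs):
        shrink = 1.0 - epoch / epochs          # decays from 1 toward 0
        rate = rate0 * shrink                  # learning rate
        radius = max(radius0 * shrink, 1e-6)   # neighborhood width
        for sample in samples:
            best = nearest_node(sample, nodes)
            for j, node in enumerate(nodes):
                influence = math.exp(-((j - best) ** 2) / (2 * radius ** 2))
                for d in range(len(node)):
                    node[d] += rate * influence * (sample[d] - node[d])
    return nodes


# Hypothetical two-gene profiles forming two groups, and two starting nodes.
samples = [[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.0, 10.1]]
nodes = train_som(samples, nodes=[[1.0, 1.0], [9.0, 9.0]])
assignment = [nearest_node(s, nodes) for s in samples]
```

Unlike k-means, each update also moves the neighbors of the winning node, so nearby nodes end up representing similar expression patterns. The number of nodes still has to be chosen in advance.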

Projection Methods: Projection methods identify interesting linear combinations of the gene
expression patterns. These can be used for visualization, dimension reduction, and class discovery
by agglomeration of samples around a few interesting linear combinations. One of the oldest and
most popular projection techniques is that of principal components, already used successfully in
small-dimensional gene expression data problems. Staged approaches also allow us to refine our
tools and perform the necessary statistical methods work and validation that would ultimately
provide a satisfactory solution to these problems.
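
As a small illustration of a projection method, the sketch below finds the first principal component by power iteration on the sample covariance matrix. The function name and data are hypothetical, and the approach is practical only for a handful of genes; real array data would call for a numerical library.

```python
def first_principal_component(samples, iterations=200):
    """Return the leading direction of variation and each sample's score on it."""
    n = len(samples)
    dims = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    centered = [[s[d] - means[d] for d in range(dims)] for s in samples]
    cov = [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]
    # Power iteration: repeated multiplication by the covariance matrix converges
    # to its leading eigenvector, unless the start happens to be orthogonal to it.
    vec = [1.0] * dims
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    scores = [sum(c * v for c, v in zip(row, vec)) for row in centered]
    return vec, scores


# Hypothetical data: gene 2 is always expressed at twice the level of gene 1.
samples = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_principal_component(samples)
```

Here the two genes collapse onto a single linear combination, and the one score per sample carries all of the variation in the data.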

Specimen Requirements for Array Analysis
 Microdissection and Purity of Assayed Cell Population
         Many of the studies published to date on the use of gene arrays to analyze tissue samples
 have had minimal input from pathologists. Unfortunately, "grind and bind" assays do not
 faithfully tell what is happening in the particular cells of interest.

         One approach to obtaining pure samples of a particular cell type is microdissection. Laser
 capture microdissection has been applied to frozen tissue samples, but we have actually found
 mechanical microdissection to be easy, inexpensive, and reliable. Microdissection is applicable to
 purifying any type of cell population or tissue structure.

         A very useful approach to obtaining purified samples of common cancers is to make
 smears from scrapings of cut tumor surfaces. Epithelial cells adhere to one another and typically
 are scraped off in clusters. The scraped material can be smeared on a slide and stained, allowing
 visualization and easy microdissection of these clusters.

RNA Quality Issues: Array Analysis using RNA from Paraffin-Embedded Tissues
         There have been highly variable experiences regarding quality requirements for RNA used
 in array analysis. Requirements for Affymetrix oligonucleotide arrays are, for example, reportedly
 very stringent and even modest amounts of RNA degradation are not tolerated. Our experience
 with cDNA arrays has been at the opposite end of the spectrum, and we have even done studies on
 RNA isolated from routinely processed paraffin blocks.

         Pathologists use formalin fixation and paraffin embedding for routine
 examination of tissues. Routinely processed paraffin-embedded tissues are often linked to valuable
 clinical and epidemiological data, and reliable techniques for analyzing gene
 expression by cDNA arrays in these samples could be extremely useful for correlating molecular
 findings with clinical data. Although these fixation procedures may cause significant degradation
 of RNA and other macromolecules, tissues processed in this manner have been used successfully
 for molecular biology experiments, including PCR-based methods, in situ hybridization, and
 measurement of RNA expression levels of individual genes. Our laboratory (R. Anbazhagan, et al.) has
 recently examined the possibility of using RNA from paraffin blocks for gene array analysis. We
 used a commercially available paraffin tissue RNA extraction kit to extract RNA from
 paraffin-embedded tissue sections, and modified the procedure to allow
 microdissection of the cells prior to analysis. Our results show that RNA extracted from
 paraffin-embedded tissues can give results that are surprisingly interpretable. Differences in fixation
 conditions that are commonly encountered in the routine handling of pathology specimens result
 in alterations in gene expression profile, but in a very consistent and potentially predictable
 manner.

       1.      Lennon, G. G. and Lehrach, H. Hybridization analyses of arrayed cDNA libraries,
       Trends Genet. 7: 314-7, 1991.
       2.      Schena, M., Shalon, D., Davis, R. W., and Brown, P. O. Quantitative monitoring of
       gene expression patterns with a complementary DNA microarray, Science. 270: 467-70, 1995.
       3.      Duggan, D. J., Bittner, M., Chen, Y., Meltzer, P., and Trent, J. M. Expression
       profiling using cDNA microarrays, Nat Genet. 21: 10-4, 1999.
       4.      Fodor, S. P., Read, J. L., Pirrung, M. C., Stryer, L., Lu, A. T., and Solas, D. Light-
       directed, spatially addressable parallel chemical synthesis, Science. 251: 767-73, 1991.
       5.      Fodor, S. P., Rava, R. P., Huang, X. C., Pease, A. C., Holmes, C. P., and Adams, C.
       L. Multiplexed biochemical assays with biological chips, Nature. 364: 555-6, 1993.
       6.      Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. Cluster analysis and
       display of genome-wide expression patterns, Proc Natl Acad Sci U S A. 95: 14863-8, 1998.
       7.      Alizadeh, A. A., Eisen, M. B., Davis, R. E., Ma, C., Lossos, I. S., Rosenwald, A.,
       Boldrick, J. C., Sabet, H., Tran, T., Yu, X., Powell, J. I., Yang, L., Marti, G. E., Moore, T.,
       Hudson, J., Jr., Lu, L., Lewis, D. B., Tibshirani, R., Sherlock, G., Chan, W. C., Greiner, T.
       C., Weisenburger, D. D., Armitage, J. O., Warnke, R., Staudt, L. M., et al. Distinct
       types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature.
       403: 503-11, 2000.
       8.      Shipp, M. A., Ross, K. N., Tamayo, P., Weng, A. P., Kutok, J. L., Aguiar, R. C.,
       Gaasenbeek, M., Angelo, M., Reich, M., Pinkus, G. S., Ray, T. S., Koval, M. A., Last, K.
       W., Norton, A., Lister, T. A., Mesirov, J., Neuberg, D. S., Lander, E. S., Aster, J. C., and
       Golub, T. R. Diffuse large B-cell lymphoma outcome prediction by gene-expression
       profiling and supervised machine learning, Nat Med. 8: 68-74, 2002.
       9.      Quackenbush, J. Computational analysis of microarray data, Nat Rev Genet. 2:
       418-27, 2001.

Some useful Web sites that can provide additional information on gene array technology are
described below. The technology, and the approaches used to analyze data, are rapidly evolving, and
these Web sites can provide up-to-date references.
- A Web site developed by Y.F. Leung, a scientist in Hong Kong. There are many useful links
related to the technology and articles using the technology.
- A Web page that is an outline for a biostatistics course at Johns Hopkins. There is a reading
list for contemporary data analysis methods with many links to pdf files of articles.
- A Web page developed by Leming Shi. This site has hundreds of links to product
manufacturers, publicly available array data, articles, etc.

Introduction to Tissue Microarrays
       Genomic approaches such as RNA profiling are providing a powerful new means to
discover disease-related genes. One of the most challenging aspects of high-throughput
gene expression approaches is that they usually generate a large battery of potential targets.
Determination of which genes are truly important for classification in terms of diagnosis,
prognosis, and therapy represents a bottleneck. How does one begin to validate and prioritize
potential targets? Often the first step in attempts to discover the disease relevance of a given gene
is the elucidation of the precise cells that express it in normal and diseased human tissues. This is
even more powerful if it can be done simultaneously with an assessment of clinical significance. A
major limitation, however, is that in situ based molecular analysis is cumbersome and often
limited by the availability of suitable reagents such as high quality antibodies or a robust system
for in situ hybridization. In addition, adequate validation of biomarker expression often requires
large patient cohorts with long-term clinical follow-up. Finally, interpretation of expression
results requires a pathologist. While many of these limitations have yet to be solved, Tissue
Microarrays (TMAs) are emerging as a breakthrough in our ability to rapidly analyze the
expression of existing and new biomarkers using archival pathology specimens.

       Multi-tissue blocks were first introduced by Battifora et al. in the so-called "sausage" or
Multi-Tissue Tumor Block (MTTB), in which up to 100 separate tissues were processed together into
a single paraffin block [1]. Recently, Kononen et al. introduced a new method of combining multiple
tissues into a single paraffin block that uses a novel sampling approach, with tissues of regular size
and shape. This allows many more specimens, precisely arrayed, to be inserted into
the blocks [2]; for reviews see [3-6]. That the production and use of tissue microarrays is taking hold is
demonstrated by the fact that at this 2002 USCAP meeting in Chicago at least 65 abstracts
employed TMA technology [7].

What is a Tissue Microarray and What are its Advantages Over Standard
Tissue Blocks?
       The TMA consists of cylindrical paraffin embedded tissue cores that are acquired from
primary "donor" blocks. The donor block is a standard tissue block that may be from surgical
pathology, autopsy or research material. A morphologically representative area of interest within
the donor block is identified under the microscope using a stained section (usually Hematoxylin
and Eosin stained) on a glass slide as a guide. The tissue cores are removed from the donor and
inserted into a "recipient" paraffin block using a custom patented instrument from Beecher
Instruments. Using a precise spacing pattern, tissues are inserted at high density, with up to 1000
tissue cores in a single paraffin block. Sections from this block that are cut with a microtome are
placed onto standard glass slides that can then be used for in situ analysis. Depending on the
overall depth of tissue remaining in the donor blocks, tissue arrays can generate between 100 and
500 sections. Once constructed, tissue microarrays can be used with a wide range of techniques
including histochemical staining, immunohistochemical/immunofluorescent staining, or in situ
hybridization for either DNA or mRNA.

           Figure 1. Example of prostate TMA containing 400 tissue cores (20 cores x 20 cores, 0.6 mm each).
           Immunohistochemical staining for alpha methyl acyl CoA racemase (AMACR). A low power view
           (upper left) shows the entire array. We typically place a column of assorted control tissues in the 9th
           column and 3 additional columns of assorted control tissues in columns 18-20. A medium power is
           shown in the upper right that corresponds to the boxed area from the low power view. A higher
           power view is shown below that corresponds to the boxed area in the medium power view. This
           shows strong staining in prostate adenocarcinoma and weak/negative staining in normal prostate
           epithelium. Images were obtained with the Bacus Labs BLISS Imaging Workstation.
Since relatively small areas of tissue (down to 0.6 mm in diameter) are obtained from the donor
blocks, this method can help to expand the usefulness of existing archival paraffin blocks by
facilitating the construction of multiple "duplicate" blocks. This significantly expands the capacity
of the tissue samples, meaning that more studies can be performed using limited samples. Other
advantages of TMAs are that they are designed for high throughput screening of expression, while
providing uniform reaction conditions and multiple built-in potential positive and negative
controls. Since only one or a few slides are subjected to the staining procedure, TMAs also allow
one to economize on reagents, which can at times be quite limiting. Since several hundred
cases are now present on one or a few slides, TMAs also cut down on microtome sectioning of
numerous paraffin blocks. It should be pointed out that even after removal of cores from donor
blocks, these donor blocks usually still retain sufficient residual tissue for adequate pathological

Are Tissue Microarrays Valid for Clinico-Pathological Studies?
       The most frequently asked question regarding tissue microarrays is how do they account
for heterogeneity of tissues? Several groups have been addressing this issue. In terms of
published work, Camp et al. examined the number of "disks" or TMA spots required to
adequately represent the expression of three common antigens, estrogen receptor (ER),
progesterone receptor (PR) and the Her2/neu oncogene, in 38 cases of invasive breast carcinoma [8].
They made TMAs containing 10 tissue cores from the same tumor and compared the results of
analysis of staining of the TMA cores to that obtained using a single standard whole tissue section
from which the TMA cores were derived. They found that two spots produced similar results to
the whole tissue in more than 95% of the cases.

       The largest published study to date to address the issue of tissue heterogeneity is that of
Torhorst et al., who examined ER, PR and p53 in breast carcinoma [9]. In this study, the clinical
relevance of staining was examined by comparing immunostaining results using standard sections
versus TMA slides in 553 breast cancer patients. Four high-density TMA blocks were constructed,
each containing 1 core from a different region of the tumor from each patient. For ER, the range of
positive staining from the 4 different blocks was from 78.9 to 80.8%, which compares to that
observed in a large standard section (79.8%). When using a single 0.6 mm sample, loss of ER
correlated inversely with disease specific survival to a similar extent as standard sections. Thus,
very little benefit was obtained using more than one spot for ER. For PR, the addition of multiple
spots analyzed increased the frequency of positive staining towards that obtained with standard
sections (41.1% with 1 spot, 53.1% with 4 spots, compared with 60.3% with conventional
sections). Loss of PR as analyzed on TMA spots was also predictive of poor outcome in a manner
similar to that of the conventional section, even when only one 0.6 mm core was used. For p53, the
frequency of positive staining was less using TMAs than when using standard slides (15.2-20.9%
for single spots, up to 24.1% for all 4 combined, as compared to 42.8% for conventional sections).
However, in terms of prognostic significance the correlation between p53 staining and poor
outcome was stronger using TMA spots, even one spot, than was that of conventional sections.

       In terms of prostate cancer, Rubin et al. used digital image analysis (CAS200, Bacus Labs,
Lombard, IL) on TMAs containing 10 replicate tumor samples from 88 cases of prostate cancer [10]
to evaluate Ki-67 expression for each case. Four cores provided the optimal sampling for TMA
cores using a Cox proportional hazards analysis to determine predictors of time until PSA
recurrence following radical prostatectomy for clinically localized prostate cancer. Fewer TMA
samples significantly increased Ki-67 variability and a larger number did not significantly
improve accuracy.

       Several other studies have also examined the question of the representation of tissues in
TMAs using various markers in different tumor types [11, 12]. In general, although they vary
somewhat in terms of the recommendations for sampling, all studies indicate there is usually
excellent agreement between the use of TMAs and standard tissue sections for clinico-pathological
studies.

       From a theoretical point of view the question of how many samples are required to
adequately perform a study is related to the variability of the parameter being analyzed. Thus, for
homogeneous markers, a single TMA spot per case will be adequate. At our institution we
routinely take 4 cores each from areas of prostate tumors and matched normal tissue (Fig. 1) in
order to maximize the usefulness of the TMA since we do not know what biomarkers we will be
applying in the future. Thus for some studies, this will be "overkill" and for others it may be
barely adequate.

       Another potential difficulty in terms of how many cores to take involves the fact that not
all TMA cores will be present on all TMA slides. Having at least one additional TMA core will
help ensure the presence of the number of cores that one hopes to obtain on the final TMA slide.
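
The value of an extra core can be estimated with a simple binomial calculation. The sketch below assumes, purely for illustration, that each core is independently missing from a given slide with the same probability; the 10% loss rate used in the example is a hypothetical figure, not a measured one.

```python
from math import comb


def prob_at_least(cores_taken, cores_needed, loss_rate):
    """Probability that at least `cores_needed` of `cores_taken` cores are present."""
    keep = 1.0 - loss_rate
    return sum(
        comb(cores_taken, k) * keep ** k * loss_rate ** (cores_taken - k)
        for k in range(cores_needed, cores_taken + 1)
    )


# Hypothetical 10% chance that any one core is missing from a slide.
four_of_four = prob_at_least(cores_taken=4, cores_needed=4, loss_rate=0.10)
four_of_five = prob_at_least(cores_taken=5, cores_needed=4, loss_rate=0.10)
```

Under these assumptions, taking exactly 4 cores leaves all 4 on the slide about 66% of the time, whereas taking 5 leaves at least 4 about 92% of the time.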

Digital Image Acquisition and Analysis
       TMA slides can be viewed under conventional microscopes. In this case a key, usually in
the form of a spreadsheet, corresponding to the x y coordinate system is used and
histopathological diagnoses and interpretations are recorded. The data can be recorded on paper
for later entry into a spreadsheet or database, or, it can be entered directly into the computer. One
of the difficulties with this approach is that, since the array spots do not have their coordinates
printed on the slide, the user is likely to lose track of the x and y coordinates of given
spots and have to become reoriented repeatedly.
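
A key of this kind amounts to a lookup from x y coordinates to the case in each spot. The sketch below shows one hypothetical way to build it; the row-by-row fill order, the 1-based coordinates, and the case identifiers are assumptions, not a description of any particular array layout.

```python
def build_key(case_ids, n_columns):
    """Map (x, y) array coordinates, counted from 1, to case identifiers.

    Assumes cores were placed row by row, left to right.
    """
    key = {}
    for index, case_id in enumerate(case_ids):
        row, col = divmod(index, n_columns)
        key[(col + 1, row + 1)] = case_id
    return key


# Hypothetical array with 3 columns and five cores.
key = build_key(["P01", "P02", "P03", "P04", "P05"], n_columns=3)

# Recording an interpretation for the spot at x = 2, y = 2.
scores = {}
scores[key[(2, 2)]] = {"diagnosis": "adenocarcinoma", "intensity": 2}
```

Entering each interpretation against the coordinate key, rather than against a running count of spots, makes it easier to recover after losing one's place on the slide.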

Prostate SPORE Approach to Imaging
       Several groups have been developing methods to acquire and archive digital images of the
TMA spots for evaluation on a computer monitor such that the data are linked to underlying clinico-
pathological information regarding the array spot. To acquire digital images of TMA spots, a
number of users have been using the Bacus Labs Incorporated Slide Scanner (BLISS, Bacus
Laboratories, Lombard, IL) [13] as previously described [14, 15]. The BLISS imaging system consists of a
Zeiss microscope equipped with a software driven motorized stage, integrated digital video
camera, and a customized personal computer running Microsoft Windows. The Bacus Labs Tracer
software program is designed to scan entire glass microscope slides, or any part of the slide, using
any available microscope objective. The slides can then be viewed using the free downloadable
WebSlide Viewer. Slide images are generally stored on a server running WebSlide Server
software from Bacus Labs.

Figure 2. Screen shot of the current Microsoft Access Database Form used by our group for scoring immunohistochemical staining. This
approach requires a pathologist or highly trained technician to provide a diagnosis as well as a score for each image.

         The system has been adapted to scan TMA spots using customizable features that were
developed in collaboration with the Prostate SPORE Tissue Microarray Working Group
(University of Michigan, Baylor College of Medicine, and Johns Hopkins University School of
Medicine) [14]. The operator then indicates through the software the number and location of the array
spots. All array spots are then automatically scanned at full resolution (in most cases a 20x Zeiss
Plan-Apochromat® objective is used, although other objectives can be used). Each array spot is
imaged as 6 individual 640 x 480 pixel images that the software automatically “tiles” into a single
composite image. The composite images are stored in a file containing the embedded x y
coordinates from the tissue array spot, along with user provided information regarding the TMA
slide that was scanned. The composite image files can be viewed individually using any number of
image viewers, or can be imported into a relational database and related by their x y coordinates to
the specimen from which they were derived (Fig. 2). More recently, Bacus Labs have developed
an Active X plug-in that is designed to facilitate image handling for viewing TMA spots directly
from inside a database application of your choosing. While the Bacus system has been leading in
this field of imaging TMA spots, several other systems that are either being adapted or are
potentially adaptable to imaging TMA spots are shown in Table 1.
 Company/University      Product
 Bacus Labs              BLISS
 Yale                    Spotfinder and Aqua
 Chromavision            ACIS
 MicroBrightField Inc.   Virtual Slice
 CompuCyte               Laser
 TissueInformatics       Quant f(x)
Table 1. Digital microscopic imaging systems that are already adapted for TMAs, are in the process of being adapted, or may be adaptable for TMAs.

       TMA Clinico-Pathological and Image Data Handling
       TMA-based technology has prompted the need for a system that effectively manages the data
generated by this high-throughput approach. A large spreadsheet has been the
standard solution for handling voluminous amounts of data. This approach is useful for one-time
experiments but becomes very cumbersome when analyzing multiple markers on a given
specimen or when multiple observers render diagnoses and scores on a given specimen. As
part of the same ongoing collaborations between the Specialized Programs of Research
Excellence for prostate cancer, the three groups from the University of Michigan, Johns Hopkins
University, and Baylor College of Medicine (Houston, TX) have been developing systems to
manage TMA clinical data and TMA image data [14-16]. In the overall architecture of the system,
the TMA images are examined on a computer screen, where each image is presented within a
database form. The user then enters data regarding the image directly into the form. The type of
data recorded is flexible and can include items such as image quality, diagnosis, and
immunohistochemical scoring. The three original Prostate SPORE groups, as well as members
from a newer set of Prostate SPOREs at several other institutions, are continuing to work on
these efforts and share information. An example of one such database form is shown in Fig. 2.

       Another system, also based on scanning TMA spots using the BLISS system, has been
presented at this USCAP meeting by the Stanford group [17]. The approach was to develop two
software programs to aid in analysis of staining results and rapid retrieval of TMA spot digital
images. The authors designed a program called Deconvoluter that allows for rapid transformation
of immunostaining data, recorded in Microsoft Excel, into a format that can be used for cluster
analysis. The program Cluster is used to group data with regard to tumor staining pattern with
antibodies, analogous to tumors grouped according to RNA expression. The clustered pattern can
be viewed in another freely available software program called Treeview.
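
The kind of transformation involved can be sketched as follows. This is not the Deconvoluter program itself, whose rules are not described here; the function names, the choice of keeping the highest score among replicate cores, and the output layout are all assumptions made for illustration.

```python
def collapse_scores(records):
    """Reduce replicate core scores to one score per (case, antibody).

    `records` holds (case_id, antibody, score) tuples; a score of None marks
    an uninterpretable or missing spot. Keeping the highest score is an
    assumed rule, chosen only for this example.
    """
    best = {}
    for case_id, antibody, score in records:
        if score is None:
            continue
        pair = (case_id, antibody)
        best[pair] = max(best.get(pair, score), score)
    return best


def to_tab_delimited(best):
    """Lay the collapsed scores out as a case-by-antibody table."""
    cases = sorted({case for case, _ in best})
    antibodies = sorted({antibody for _, antibody in best})
    lines = ["\t".join(["case"] + antibodies)]
    for case in cases:
        cells = [str(best[(case, a)]) if (case, a) in best else "" for a in antibodies]
        lines.append("\t".join([case] + cells))
    return "\n".join(lines)


# Hypothetical spot-level scores as they might be typed into a spreadsheet.
records = [
    ("P01", "ER", 2), ("P01", "ER", 3), ("P01", "PR", None),
    ("P02", "ER", 0), ("P02", "PR", 1),
]
table = to_tab_delimited(collapse_scores(records))
```

The resulting table has one row per case and one column per antibody, with empty cells where no interpretable spot was available, which is the general shape of input that clustering software expects.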

       At Yale University, David L. Rimm's group has developed a software program called
Spotfinder that uses an imaging microscope system from Deltavision for scanning TMA slides that
are stained with fluorescent markers [18]. Trotter et al. have presented at this meeting on mapping,
navigation and data management of TMA data [19].

Quantification of Immunohistochemical Staining
Using TMA
       At present no module exists in the Bacus Labs solution to TMAs for quantification of
TMA results, although this is under development. The first commercial system for imaging TMA
slides has been developed by Chromavision as an extension of their Automated Cellular Imaging
System (ACIS) [20]. The system is designed specifically for quantification of immunohistochemical
staining using images obtained by light microscopy. In this system, the user examines a low power
image of the scanned TMA slide and selects the region of interest to view at higher power. Next a
region on the higher power view is selected for further automated analysis of the staining results
such as area of positive staining as well as intensity of staining. The data is then exportable to
spreadsheets or database management systems.
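
The two measurements mentioned, area of positive staining and intensity of staining, can be illustrated with a simple threshold on pixel values. This sketch is not the ACIS algorithm; the function name, the threshold, and the pixel values are hypothetical, and it assumes the image has already been reduced to one stain-intensity value per pixel within the selected region.

```python
def quantify_staining(stain_values, threshold):
    """Return (fraction of pixels at or above threshold, mean intensity of those pixels)."""
    positive = [v for row in stain_values for v in row if v >= threshold]
    total = sum(len(row) for row in stain_values)
    fraction = len(positive) / total if total else 0.0
    mean_intensity = sum(positive) / len(positive) if positive else 0.0
    return fraction, mean_intensity


# Hypothetical 2 x 3 pixel region with stain intensities from 0 to 255.
region = [
    [0, 10, 200],
    [150, 5, 0],
]
fraction_positive, mean_intensity = quantify_staining(region, threshold=100)
```

In this toy region one third of the pixels are called positive, with a mean intensity of 175. A real system would also need to separate the chromogen from the counterstain and to restrict the analysis to the cells of interest.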

       In terms of academic institutions, Yale University's TMA lab has been developing a dedicated
system for quantifying fluorescent TMA spot images called Aqua [18]. The system uses multiple
colors with fluorescent imaging to automatically quantify staining at the sub-cellular level. This is
a particularly intriguing approach since it is being designed to eliminate the need for the
pathologist to interpret each array spot. JY Rao et al. also had a presentation at this meeting on
quantification of protein expression using fluorescence image analysis [21].

       As can be seen, the imaging, display, storage and analysis of TMA data are in a state of flux.
Whether commercial applications will arise from these efforts so that one can purchase a single
package for one-stop shopping, or whether there will be continued piecemeal development, is an open
question. However, at present there are excellent tools emerging for high-throughput solutions to
TMA data analysis. Those interested can plan to attend an upcoming meeting dealing with TMA
infrastructure that will be embedded in the 19th annual Automated Information Management in the
Clinical Laboratory (AIMCL) symposium in Ann Arbor, Michigan on 22-24 May 2002.

Special Array Types
Frozen TMAs
       At least two groups have so far produced TMAs using frozen tissues [22, 23]. The advantages of
frozen tissue arrays are that post-fixation can be tightly controlled, that antibodies that do not
bind to formalin-fixed epitopes can be used, and that the quality of nucleic acids (DNA/RNA) is
generally much higher. Hoos and Cordon-Cardo have developed a simple device, independent of the
Beecher Instruments device, for frozen TMA construction [22], and Fejzo and Slamon have adapted the
Beecher Instruments machine using dry ice to keep the donor and recipient blocks frozen [23].

Cell lines
       One of the most powerful types of controls for immunohistochemical staining and for in
situ hybridization is the use of well-defined cell lines. For example, when one is working up a new
antibody against the retinoblastoma protein (pRB), an excellent negative control would be a
retinoblastoma cell line that was shown to be genetically null for RB alleles. Similar approaches
are useful for p53, etc. In fact, when one knows the status of either the genomic DNA
corresponding to a given gene in a given cell line, or information about the expression at the
mRNA and/or protein level, then suitable positive and negative controls can be obtained for
immunohistochemistry or in situ hybridization for essentially all non-housekeeping genes. Along
these lines, a very useful type of control is to use an isogenic system to induce expression of a
given gene in a cell line that does not normally express it. In this case the untreated cell serves as
the negative control and the treated cell serves as the positive control. We recently used this
approach to develop controls for examining COX-2 expression in prostate cancer cells where we
induced expression of COX-2 using phorbol ester in PC3 cells [24]. We have been isolating cells
grown in culture, fixing them in formalin, embedding them in agarose, and then submitting them
for routine processing into paraffin (see [25]). The advantage of a solid-like gel suspension such as
agarose is that the cells are not lost in processing, which often happens when preparing paraffin
blocks from cell pellets. Christopher A. Moskaluk presented at this USCAP meeting on an
improved method of agarose embedding of cell lines using tubular agarose molds that enhances
histologic results, ease of embedding, tissue core length and cell density [26].

Obtaining TMA Slides
        For those seeking to obtain slides from existing tissue arrays, there is an NIH program
called the Tissue Array Research Program (TARP), from which individual slides are available for
purchase at very reasonable rates [27]. There are also several commercial sources, including Research
Genetics (VastArray™ Tissue Arrays) [28], Zymed Laboratories (MaxArray™) [29], and SuperBioChips [30].

Array Construction Tips and Techniques
        Several sources of information are available for tissue microarray protocols, tips,
techniques, and troubleshooting. These include recent reviews [5, 31], a detailed web site with
protocols [32], and sites developed by the NIH [33, 34].

        Several technical tips from our TMA lab that we have found to be helpful, now that we
have produced 99 TMAs containing over 14,473 tissue cores, include the following:

   1.      The larger needle sizes (1.5 or 2 mm) can cause damage to both the
           donor and recipient blocks. We have found that less damage occurs if
           the blocks are warmed somewhat. This is achieved by placing both the
           donor and recipient blocks under a 25-watt light source for the
           duration of the manual arraying.
   2.      As the embedding medium for recipient TMA blocks, we prefer to use
           Paraplast X-Tra.
   3.      We have found that for 0.6 mm TMAs, 400 cores are ideal, since the
           block does not need to be turned during array production. This is
           essentially due to a limitation of the manual arrayer.
   4.      After completing an array, the block is placed into a 37 °C oven for 30
           minutes face down on a glass microscope slide. The block is then chilled on
           ice before being taken off the slide. When the larger needle sizes are used
           (1.5 or 2.0 mm), it may be helpful to then take a hot slide (80 °C) and touch
           it carefully to the array block surface to help fill in the gaps.
   5.      Sectioning TMAs can be very difficult. We use a state-of-the-art microtome
           that is dedicated to TMAs. We change the blade frequently and have a very
           experienced histotechnologist who cuts the blocks. We have found that in
           most cases the best histology is obtained by simply cutting the blocks as
           with any paraffin block. However, the Paraffin Tape Transfer System from
           Instrumedics Inc. can be helpful when tissues are very soft or very
           hard. In general, we prefer not to use the tape transfer method,
           since it can inadvertently result in loss of TMA spots and the
           residual adhesive can be visually distracting.
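
To illustrate tip 3, a 400-core array can be represented as a simple position map for planning and record keeping. The sketch below is only an illustration under stated assumptions: the 20 x 20 arrangement, the letter/number position labels, the 0.8 mm center-to-center spacing, and the function name tma_grid are all hypothetical and are not taken from our protocol.

```python
from string import ascii_uppercase


def tma_grid(rows=20, cols=20, spacing_mm=0.8):
    """Map a position label (e.g. 'A1') to its (x, y) offset in mm.

    rows, cols, and spacing_mm are illustrative assumptions,
    not values specified in the handout.
    """
    grid = {}
    for r in range(rows):
        for c in range(cols):
            # Rows are lettered A, B, C, ...; columns are numbered from 1.
            label = f"{ascii_uppercase[r]}{c + 1}"
            grid[label] = (round(c * spacing_mm, 3), round(r * spacing_mm, 3))
    return grid


layout = tma_grid()
print(len(layout))                  # number of core positions in the map
print(layout["A1"], layout["T20"])  # offsets of the first and last positions
```

A map of this kind can be filled in with donor block identifiers as cores are placed, so that each spot on a cut section can be traced back to its source tissue.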

Tissue Fixation Issues
       One of the most important issues in constructing tissue microarrays is ensuring the
quality of the tissues used. While no single fixative is best for all applications, the vast
majority of archival specimens have been fixed in 10% neutral buffered formalin, which is
actually about 4% formaldehyde. While many antibodies produce excellent staining on formalin-fixed
tissues, not all tissues are properly fixed after "routine fixation". We have found for p27Kip1
that longer fixation times yield more reliable results35. For construction of our prostate TMAs
we typically use only tissues whose quality of fixation we are certain of: either tissues that
were freshly harvested and sectioned into thin portions (3 mm thick or less) and
fixed in large volumes of fixative, or prostates that were injected with formalin to provide
uniform fixative coverage36. In addition, all blocks are subjected to immunohistochemical
staining prior to selection for a TMA.
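
The relationship between "10% formalin" and "4% formaldehyde" noted above is simple dilution arithmetic. The short sketch below assumes that stock formalin is a roughly 37 to 40% formaldehyde solution, a commonly quoted figure that is not stated in this handout; the function name is hypothetical and is used only for illustration.

```python
def formaldehyde_percent(formalin_percent, stock_formaldehyde_percent=40.0):
    """Return the formaldehyde concentration (%) of a diluted formalin solution.

    stock_formaldehyde_percent is an assumption: stock formalin is commonly
    described as a 37-40% formaldehyde solution.
    """
    return round(formalin_percent / 100.0 * stock_formaldehyde_percent, 2)


# 10% formalin prepared from stock at either end of the assumed range
print(formaldehyde_percent(10, 40.0))
print(formaldehyde_percent(10, 37.0))
```

Under this assumption, 10% formalin works out to between 3.7% and 4% formaldehyde.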

New Target Validation
       Our approach at Johns Hopkins to new target identification and validation has been to
discover genes that are highly overexpressed in prostate cancer and to attempt to validate their
expression using TMAs37. In collaboration with William Isaacs and Jun Luo and with the NIH (Jeff
Trent’s group), we generated a list of candidate biomarkers using this approach and have begun
to validate these candidate markers38. Several groups are taking this exciting approach (e.g., 39).

Other Potential Types of TMAs
       The types of tissues one can use for construction of TMAs are unlimited. For example, TMAs
consisting of human xenograft tumors may greatly extend the ability of many different
investigators to have access to these tissues. In addition, animal tissues as well as xenografts
can be subjected to drug treatments in vivo, and the pattern of gene expression alterations can
then be documented using cDNA arrays and/or TMAs. Similar studies can be performed with human
tissues, which can be arrayed before and after treatment where access to such tissues is
possible, such as in clinical trials. The number of cell lines is also growing rapidly, and a
resource that provides TMAs containing many cell lines would be very valuable.

1.     Battifora H: The multitumor (sausage) tissue block: Novel method for
       immunohistochemical antibody testing. Lab Invest 55:244-248, 1986
2.     Kononen J, Bubendorf L, Kallioniemi A, Barlund M, Schraml P, Leighton S, Torhorst J,
       Mihatsch MJ, Sauter G, Kallioniemi OP: Tissue microarrays for high-throughput
       molecular profiling of tumor specimens. Nat Med 4:844-847, 1998
3.     Moch H, Kononen T, Kallioniemi OP, Sauter G: Tissue microarrays: What will they bring
       to molecular and anatomic pathology? Adv Anat Pathol 8:14-20, 2001
4.     Bubendorf L, Nocito A, Moch H, Sauter G: Tissue microarray (TMA) technology:
       Miniaturized pathology archives for high-throughput in situ studies. J Pathol 195:72-
5.     Rimm DL, Camp RL, Charette LA, Olsen DA, Provost E: Amplification of tissue by
       construction of tissue microarrays. Exp Mol Pathol 70:255-264, 2001
6.     Rimm DL, Camp RL, Charette LA, Costa J, Olsen DA, Reiss M: Tissue microarray: A
       new technology for amplification of tissue resources. Cancer J 7:24-31, 2001
8.     Camp RL, Charette LA, Rimm DL: Validation of tissue microarray technology in breast
       carcinoma. Lab Invest 80:1943-1949, 2000
9.     Torhorst J, Bucher C, Kononen J, Haas P, Zuber M, Kochli OR, Mross F, Dieterich H,
       Moch H, Mihatsch M, Kallioniemi OP, Sauter G: Tissue microarrays for rapid linking of
       molecular changes to clinical endpoints. Am J Pathol 159:2249-2256, 2001
10.    Rubin M, Dunn R, Strawderman M, Pienta KJ: Tissue microarray sampling strategy for
       prostate cancer biomarker analysis. Am J Surg Pathol, in press, 2002
11.    Nocito A, Bubendorf L, Maria Tinner E, Suess K, Wagner U, Forster T, Kononen J, Fijan
       A, Bruderer J, Schmid U, Ackermann D, Maurer R, Alund G, Knonagel H, Rist M,
       Anabitarte M, Hering F, Hardmeier T, Schoenenberger AJ, Flury R, Jager P, Luc Fehr J,
       Schraml P, Moch H, Mihatsch MJ, Gasser T, Sauter G: Microarrays of bladder cancer
       tissue are highly representative of proliferation index and histological grade. J Pathol
12.    Hoos A, Urist MJ, Stojadinovic A, Mastorides S, Dudas ME, Leung DH, Kuo D, Brennan
       MF, Lewis JJ, Cordon-Cardo C: Validation of tissue microarrays for immunohistochemical
       profiling of cancer specimens using the example of human fibroblastic tumors. Am J
       Pathol 158:1245-1251, 2001
14.    Bova GS, Parmigiani G, Epstein JI, Wheeler T, Mucci NR, Rubin MA: Web-based tissue
       microarray image data analysis: Initial validation testing through prostate cancer Gleason
       grading. Hum Pathol 32:417-427, 2001
15.    Manley S, Mucci NR, De Marzo AM, Rubin MA: Relational database structure to manage
       high-density tissue microarray data and images for pathology studies focusing on clinical
       outcome: The Prostate Specialized Program of Research Excellence model. Am J Pathol
       159:837-843, 2001
17.    Liu CL, Natkunam Y, Prapong W, Montgomery K, Botstein D, Brown PO, van de Rijn M:
       Software tools for high-throughput analysis and image retrieval of immunohistochemistry
       stains obtained on tissue microarrays. United States and Canadian Academy of Pathology
       91st Annual Meeting, Chicago IL, Abstract #1421, 2002
19.    Trotter MJ, Demetrick DJ, Ciezar SD: Mapping tissue microarrays: A simplified method
       for microarray navigation and data management. United States and Canadian Academy of
       Pathology 91st Annual Meeting, Chicago IL, Abstract #1440, 2002
21.    Rao JY, Seligson D: Protein expression analysis using quantitative fluorescence image
       analysis. United States and Canadian Academy of Pathology 91st Annual Meeting,
       Chicago IL, Abstract #1431, 2002
22.    Hoos A, Cordon-Cardo C: Tissue microarray profiling of cancer specimens and cell lines:
       Opportunities and limitations. Lab Invest 81:1331-1338, 2001
23.    Fejzo MS, Slamon DJ: Frozen tumor tissue microarray technology for analysis of tumor
       RNA, DNA, and proteins. Am J Pathol 159:1645-1650, 2001
24.    Zha S, Gage WR, Sauvageot J, Saria EA, Putzi MJ, Ewing CM, Faith DA, Nelson WG, De
       Marzo AM, Isaacs WB: Cyclooxygenase-2 is up-regulated in proliferative inflammatory
       atrophy of the prostate, but not in prostate carcinoma. Cancer Res 61:8617-8623, 2001
25.    Meeker AK, Gage WR, Hicks JL, Simon I, Coffman JR, Platz EA, March GE, De Marzo
       AM: Telomere length assessment in human archival tissues: Combined telomere
       fluorescent in situ hybridization and immunostaining. Am J Pathol, in press, 2002
26.    Moskaluk CA: Embedding of cultured cells for tissue microarrays. United States and
       Canadian Academy of Pathology 91st Annual Meeting, Chicago IL, Abstract #1428, 2002
31.    Jensen TA, Hammond MEH: The tissue microarray: A technical guide for histologists. The
       Journal of Histotechnology 24:283-287, 2001
35.    De Marzo AM, Fedor H, Gage WR, Rubin MA: Inadequate formalin fixation reduces
       reliability of p27Kip1 immunohistochemical staining: Probing optimal fixation time using
       high density tissue microarrays. Submitted, 2002
36.    Ruijter ET, Miller GJ, Aalders TW, van de Kaa CA, Schalken JA, Debruyne FM, Boon
       ME: Rapid microwave-stimulated fixation of entire prostatectomy specimens. Biomed-II
       MPC Study Group. J Pathol 183:369-375, 1997
37.    Luo J, Duggan DJ, Chen Y, Sauvageot J, Ewing CM, Bittner ML, Trent JM, Isaacs WB:
       Human prostate cancer and benign prostatic hyperplasia: Molecular dissection by gene
       expression profiling. Cancer Res 61:4683-4688, 2001
38.    Luo J, Gage WR, Hicks JL, Wanders RJ, Trent JM, Isaacs WB, De Marzo AM:
       Overexpression of alpha-methylacyl-CoA racemase (AMACR) in prostate cancer analyzed by
       cDNA microarrays and high density tissue microarrays. United States and Canadian
       Academy of Pathology 91st Annual Meeting, Chicago IL, Abstract #721, 2002
39.    Dhanasekaran SM, Barrette TR, Ghosh D, Shah R, Varambally S, Kurachi K, Pienta KJ,
       Rubin MA, Chinnaiyan AM: Delineation of prognostic biomarkers in prostate cancer.
       Nature 412:822-826, 2001

Acknowledgments: We would like to thank Mark A. Rubin and the members of his laboratory for
demonstrating hands-on techniques that helped us get started in the construction of tissue
microarrays. We also thank Marcella Southerland for diligent construction of TMAs and Gerrun
March for perseverance in scanning TMA slides.

