A comparison of microarray databases

Margaret Gardiner-Garden and Timothy G. Littlejohn
Date received (in revised form): 8th February 2001

Margaret Gardiner-Garden and Timothy Littlejohn are, respectively, Bioinformatics Analyst and Chief Scientific Officer at Entigen Pty Ltd (formerly known as eBioinformatics) in Sydney, Australia. Entigen provides web-based access to bioinformatics and biotechnology-related e-commerce for life scientists in pharmaceutical, biotechnology, agricultural, government and university laboratories. The corporate headquarters is in Sunnyvale, California, USA.

Abstract
Microarray technology has become one of the most important functional genomics technologies. A proliferation of microarray databases has resulted. It can be difficult for researchers exploring this technology to know which bioinformatics systems best meet their requirements. In order to obtain a better understanding of the available systems, a survey and comparative analysis of microarray databases was undertaken. The survey included databases that are currently available, as well as databases that should become available in early 2001. Databases fall into three categories: (i) those that can be installed locally, (ii) those available for public data submission and (iii) those available for public query. Developers of microarray gene-expression databases were asked questions regarding the scope and availability of their database, its system requirements, its future compliance with MGED (Microarray Gene Expression Database) standards, and its associated analytical tools. Participants included AMAD (Stanford/Berkeley/UCSF), ArrayExpress (EBI), ChipDB (MIT/Whitehead), GeneX (NCGR), GeNet (Silicon Genetics), GeneDirector (BioDiscovery), GEO (NCBI), GXD (Jackson Laboratory), mAdb (NCI), maxdSQL (University of Manchester), NOMAD (UCSF), RAD (University of Pennsylvania) and SMD (Stanford University). Other database developers were contacted but data was not available at the time of manuscript preparation. Each database fulfils a different role, reflecting the widely varying needs of microarray users.

Keywords: microarray, gene expression, database

Downloaded from bib.oxfordjournals.org by guest on May 12, 2011

Dr Margaret Gardiner-Garden,
Entigen Pty Ltd,
Bay16/104,
Australian Technology Park,
Eveleigh NSW 1430,

Tel: +61 02 9209 4740
Fax: +61 02 9209 4747
steve.taylor@entigen.com.au

INTRODUCTION
Microarray technology is a high-throughput method for obtaining gene-expression data from thousands of genes simultaneously, allowing biologists potentially to study the transcription of an entire set of genes for a species. Gene-expression microarrays involve the hybridisation of a labelled nucleic acid extract (typically cDNA) to a large array of different DNA segments (usually representing genes).1,2 By varying the origin of the RNA extracts used to generate the cDNA, microarrays can be used to indicate a gene's relative expression in various cells or tissues, in different developmental, temporal or disease states, or following certain treatment conditions. After normalising the data within an array, and across arrays, the intensity of the signal for each gene can be compared.1,3 A subset of 'interesting genes' may be produced by filtering the data to find genes whose signals change intensity under specific condition(s), relative to another condition (such as a control state). Genes that have similar expression profiles may be grouped using various statistical clustering techniques (eg hierarchical clustering, or self-organising maps, SOM).4,5 A comprehensive tutorial on microarray technology and data analysis is available.6

An international consortium, the MGED group,7 is currently developing a standard to represent the minimum amount of information necessary to publish a microarray experiment, so that the experiment can be interpreted and reproduced. The MGED group is also creating a standard for the exchange of information between databases using MAML (microarray mark-up language).

As microarrays are a relatively new technology, many users are still struggling to control the burgeoning data that array experiments can produce. Some scientists are collecting data in software packages

                        & HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 2. NO 2. 143±158. MAY 2001   143

(eg spreadsheets) that are not designed to handle these rich datasets and lack the complex query and visualisation features needed. Researchers are therefore increasingly looking to develop, or acquire, specific databases in order to store their data and distribute their results to colleagues and the scientific community. Many scientists are unsure of which databases are appropriate for them to install at their organisation, or which databases are appropriate for them to access remotely for data deposition or data queries.

To allow a comparison of microarray databases, numerous commercial and non-commercial organisations were approached to participate in a survey of microarray databases. Surveys were distributed to all known developers of microarray databases and, at the time of writing, responses from 11 non-commercial organisations and two commercial organisations had been received (see Table 1). URLs for microarray databases not evaluated in this work can be found at the National Center for Genome Resources, University of California (NCGR) web site.8 The 13 databases discussed here are currently available or will be available in the near future.

Each database was first classified with regard to (a) whether its technology can be obtained for installation on the end-user's site, (b) whether it supports public

Table 1: Microarray databases included in the survey: URLs and contact details

Database      Organisation                                  URL                                               Contacts[a]

AMAD          Stanford University/University of California  http://www.microarrays.org/software.html          Joseph DeRisi (joe@derisilab.ucsf.edu)
              at Berkeley, University of California at San                                                    Paul Spellman (spellman@bdgp.lbl.gov)
              Francisco (UCSF)                                                                                Mike Eisen
                                                                                                              Max Diehn
ArrayExpress  European Bioinformatics Institute (EBI)       http://www.ebi.ac.uk/microarray/ or               Alvis Brazma (brazma@ebi.ac.uk)
ChipDB        Whitehead Institute for Biomedical Research/  http://young39.wi.mit.edu/chipdb_public/          Peter Young (pyoung@wi.mit.edu)
              MIT Centre for Genome Research
GeneX         NCGR                                          http://www.ncgr.org/                              Harry Mangalam (hjm@ncgr.org)
GeneDirector  BioDiscovery                                  http://www.biodiscovery.com/                      Alexander Kuklin
GeNet         Silicon Genetics                              http://www.sigenetics.com/                        Tony Lialin (tlialin@sigenetics.com)
                                                                                                              Anoop Grewal (anoop@sigenetics.com)
GEO           National Center for Biotechnology             http://www.ncbi.nlm.nih.gov/geo/                  Alex Lash (alash@ncbi.nlm.nih.gov)
              Information (NCBI)
GXD           The Jackson Laboratory                        http://www.informatics.jax.org/                   Martin Ringwald
mAdb          National Cancer Institute (NCI)               http://nciarray.nci.nih.gov/ or                   John Powell (jip@helix.nih.gov)
maxdSQL       The University of Manchester                  http://www.bioinf.man.ac.uk/microarray/maxd/      David Hancock (david.hancock@cs.man.ac.uk)
NOMAD         UCSF                                          Not available                                     Joseph DeRisi (joe@derisilab.ucsf.edu)
RAD           University of Pennsylvania                    http://www.cbil.upenn.edu/rad2/servlet            Chris Stoeckert
SMD           Stanford University                           http://genome-www4.stanford.edu/MicroArray/SMD/   Gavin Sherlock (sherlock@genome.stanford.edu)

[a] The people listed here can be contacted for more information about the databases. They are the source of information in this article (with the exception of the AMAD and NOMAD databases, where all information was provided by Paul Spellman, and SMD, where some information was supplied by Mike Cherry).


data deposition and (c) whether it supports public queries. The study then focused on the database's availability, the types of data stored, whether the database is web-based, the developer's intentions for compliance with MGED minimal information standards and MAML standards for data exchange, the data format requirements and the system requirements. The aim of this study was not to provide judgments on the quality of the microarray databases surveyed, but rather to allow microarray users to identify the database(s) that most closely meet their needs and are worthy of further investigation.

GENERAL DESCRIPTION
Contact details for each of the participating organisations and URLs for the microarray databases surveyed are provided in Table 1. Microarray users have three main interests in databases. These needs can be summarised as follows: (1) to install a third party database on their local server to manage their data and those of their colleagues, and/or (2) to publish their own data, in a public database, at a remote location, and/or (3) to search and analyse data that has been electronically published in databases at remote locations. Participants in the survey were asked to specify which of these three functions their database fulfils. The results are shown in Figure 1. Throughout this article, survey answers for databases that fulfil the first function (local installation) are discussed in sections A and those that fulfil the latter two functions (remote data deposition and/or querying) are discussed in sections B.

(A) Database systems for local installation
Researchers wishing to install a database system have varied requirements in terms of the type of gene-expression data that they need to store. Each of the organisations was therefore asked to describe the main types of gene-expression data that can be stored in their database: data from spotted arrays (nylon filters or glass slides),1 Affymetrix data2 and/or Serial Analysis of Gene Expression (SAGE)9 data. The answers for databases that are available for local installation (or will be available soon) are reported in Table 2. All these databases store data for glass microarrays, but the databases vary as to whether they store data for nylon arrays, Affymetrix data or non-array SAGE data.

The developers were asked whether their database supports private user-

[Figure 1: diagram of three overlapping ellipses labelled (1) Local installation, (2) Public data deposition and (3) Public queries. Database labels appearing in the diagram: mAdb*, maxdSQL, SMD*, GeneX*, GeNet*, ChipDB, ArrayExpress, RAD*, GXD.]

Figure 1: Microarray databases, classified according to whether (1) the developers provide public access to the database technology, thereby enabling it to be installed on a local site ('local installation'), (2) the public is able to deposit their data into the database ('public data deposition'), (3) the public may query the database and retrieve data ('public queries'). Databases shown in italics will have the corresponding capability in the near future. Databases marked with an asterisk support private accounts. Databases shown in overlapping ellipses perform the functions of all ellipses
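
The three-way classification in Figure 1 amounts to giving each database a set of functions, with the overlapping ellipses corresponding to databases that hold more than one function. A minimal Python sketch of that idea follows. The `Function` flag names, the `with_functions` helper and the entries `db_one`, `db_two` and `db_three` are hypothetical, chosen only for illustration; they do not reproduce the survey's actual assignments.

```python
from enum import Flag, auto


class Function(Flag):
    """The three database functions distinguished in the survey."""
    LOCAL_INSTALLATION = auto()
    PUBLIC_DATA_DEPOSITION = auto()
    PUBLIC_QUERIES = auto()


# Hypothetical entries for illustration only; not the survey's real data.
databases = {
    "db_one": Function.LOCAL_INSTALLATION,
    "db_two": Function.PUBLIC_DATA_DEPOSITION | Function.PUBLIC_QUERIES,
    "db_three": (Function.LOCAL_INSTALLATION
                 | Function.PUBLIC_DATA_DEPOSITION
                 | Function.PUBLIC_QUERIES),
}


def with_functions(entries, required):
    """Return the sorted names of databases performing every required function."""
    return sorted(name for name, funcs in entries.items()
                  if (funcs & required) == required)


if __name__ == "__main__":
    # Databases lying in the overlap of ellipses (2) and (3).
    overlap = Function.PUBLIC_DATA_DEPOSITION | Function.PUBLIC_QUERIES
    print(with_functions(databases, overlap))
```

Combining flags with `|` mirrors the overlap of ellipses: a database sits in an overlap exactly when its flag value contains every function being asked about.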

                                                                                                                             Table 2: Details for databases which are available (or will soon be available) for local installation

                                                                                                                                 Database       Types of gene-expression data Private   Web interface     Availability                                                       Cost          Comments
                                                                                                                                                that can be stored            accounts
                                                                                                                                                                              supported Data      Data
                                                                                                                                                Spotted    Affymetrix SAGE              input     queries
                                                                                                                                                arrays     arrays

                                                                                                                                 AMAD           Yes (glass,   No          No     No           No           Yes         Can currently be downloaded from web site             Free          Laboratory database only
                                                                                                                                                not nylon)                                                             (see Table 1)
                                                                                                                                 GeneDirector   Yes           Yes         No     Yes          No           No          Developers anticipate that the prototype              Contact       Custom-designed laboratory database. Could be used as an
                                                                                                                                                                                                                       implementation will be completed in 2001              company       international database
                                                                                                                                                                                                                                                                             (Table 1)
                                                                                                                                 GeNet          Yes           Yes         Yes    Yes          No           Yes         Currently available. Demonstration version            Contact       Can scale from a laboratory to international database.
                                                                                                                                                                                                                                                                                                                                                              Gardiner-Garden and Littlejohn

                                                                                                                                                                                                                       can be downloaded from web sitea                      company       GeNet can be used by organisations with research facilities
                                                                                                                                                                                                                                                                             (Table 1)     in different countries to share data and could be used by a
                                                                                                                                                                                                                                                                                           company providing array services to distribute the analysed
                                                                                                                                                                                                                                                                                           data to their customer
                                                                                                                                 GeneX          Yes           Yes         Yes    Yes          Yes b        Yes         Can currently be downloaded from http://              Free          Can scale from a laboratory to international database. At the
                                                                                                                                                                                                                       genex.sourceforge.net or the NCGR website (see                      moment it stores microarray data for NCGR and their
                                                                                                                                                                                                                       Table 1)c                                                           collaborators only. It is envisaged that GeneX will primarily be
                                                                                                                                                                                                                                                                                           implemented as a laboratory database with GEML-based
                                                                                                                                                                                                                                                                                           communication to other instances of GeneX and other gene
                                                                                                                                                                                                                                                                                           expression databases
                                                                                                                                 mAdb           Yes (glass,   No, but     No     Yes          Yes          Yes         Currently available. However, a more robust           Free          Laboratory database only. Was developed to support users
                                                                                                                                                not nylon)    currently                                                second generation database with                                     of the NCI array Centre. There are 207 registered users
                                                                                                                                                              exploring                                                documentation is expected to be available                           and about 2,800 data sets stored to date
                                                                                                                                                              this                                                     early in 2001
                                                                                                                                 maxdSQL        Yes           Yes         No     Nod          Noe          Noe         Can currently be downloaded from web site             Free for   Can scale from a laboratory to international database
                                                                                                                                                                                                                       (see Table 1)f                                        non-
                                                                                                                                 NOMAD          Yes (glass,   No          No     Yes          Yes          Yes         Not currently available. Should be released in        Free          Laboratory only. Will be able to scale to 5000 experiments
                                                                                                                                                not nylon)                                                             2001 (see URL Table 1)
                                                                                                                                 SMD            Yes (glass,   No          No     Yes          Parameters Yes           Can currently be downloaded from website (see         Free for non- Can scale from a laboratory to international database. Stores
                                                                                                                                                not nylon)                                    are speci®ed             Table 1)                                              commercial the data of about 200 of Stanford University's international
                                                                                                                                                                                              in web form,                                                                   use           collaborators and is currently much larger than any other
                                                                                                                                                                                              but data ®les                                                                                academic database
                                                                                                                                                                                              must be
                                                                                                                                                                                              using ftp, to

& HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 2. NO 2. 143±158. MAY 2001
                                                                                                                                                                                              the server

Databases described in italics will be available for local installation in the near future.
a GeNet demonstration version can be downloaded.10
b GeneX uses web browser for SSL-encrypted file transfer to server. The developers note that a secure connection should soon be possible without calling the browser, using the OpenSSL library.11
c NCGR has released code for GeneX via SourceForge.net12 using the Free Software Foundation's LGPL licence.13 Initially everyone will be able to download code, but only NCGR developers will have code commit privileges. Once people start submitting code back to NCGR, they will allocate privileges on an as-needed basis.
d Although maxdSQL does not support separate accounts, an organisation may choose to have separate maxdSQL databases where information is exchanged, without any loss of structure or information, using XML.
e Up-loading and searches of maxdSQL are performed through GUI (graphical user interface) applications written in Java.
f The maxdSQL licence agreement can be viewed.14


accounts (Table 2). Many users wish to install a database that can be shared between groups, but allows each group to retain control over their data. All databases with the exception of AMAD and maxdSQL support private user accounts (Table 2). Databases that support private accounts are marked with an asterisk in Figure 1.

Each organisation was asked whether they use a web browser for data uploading and/or data queries (Table 2). Some users prefer, or require, that the microarray database has a web interface. A web interface enables the database to be readily accessed by others, often without the need for specific client software, as most users have a web browser installed. Some developers, however, believe that superior functionality can be obtained by developing specific (non-web browser) client interfaces to their database. Only GeneX, mAdb and NOMAD allow all data to be uploaded using a web browser. Note that uploading data to GeneX does require a client-side curation tool supplied by NCGR. All, except GeneDirector and maxdSQL, use a web browser for data queries (Table 2).

Participants were questioned about the main purpose of their database. Those designated as `laboratory' databases fulfil the needs of individual laboratories or organisations, and those designated as `international' databases are designed to store data from numerous facilities worldwide (see `Comments' column in Table 2). Naturally, all the databases available for local installation can be used as a `laboratory' database, but some of these can be scaled up for `international' purposes.

Developers were asked about the availability of their database and its cost. AMAD, GeneX, GeNet, mAdb, maxdSQL and SMD are currently available for public distribution. The others will be available in the near future (Table 2). Most databases surveyed are free (or will be free when released) although the developers of maxdSQL and SMD have indicated that their databases are only free for non-commercial use (Table 2). For information on the cost of the commercial databases, GeneDirector and GeNet, please contact the suppliers (Table 1).

(B) Databases for public data deposition and/or queries
Frequently, researchers want either to publish their own data in public data repositories or to query existing gene-expression data. Therefore the same questions as those in section A were asked of the developers of databases that are available (or will be available) for public data deposition and/or queries. The survey answers are reported in Table 3.

Most of the databases described in Table 3 are available (or will be available) for public deposition of microarray data. Exceptions to this are ChipDB, RAD and SMD. NCGR may not have the capacity to store all the public data submitted to GeneX, so only selected data will be accepted. All the databases in Table 3 are available (or will be available) for public queries. ArrayExpress will become available for public microarray data deposition and queries in the near future. GXD is presently available for deposition and queries of non-array expression data for mouse, and array data are currently being added.

Almost all the databases store both spotted array and Affymetrix data. Exceptions to this are SMD, which specialises in data from glass arrays, and ChipDB, which specialises in Affymetrix data. The databases vary considerably with regard to storage of non-array SAGE data (Table 3).

Some users do not want to maintain a database on their site, but wish to store their data in a private account in an external database. According to the survey answers, GeNet and GeneX could both be used for this purpose (Table 3, Figure 1). Other users wish to publish their data in a public database and have no requirement for a private account. They may, therefore, also consider using GEO, GXD or ArrayExpress (Table 3, Figure 1).

Table 3: Details for databases which are available (or will soon be available) for public data deposition and/or public queries

Database      Types of gene-expression data   Private    Web interface         Availability                    Cost for searches/
              that can be stored              accounts   Data        Data                                      data deposition
              Spotted      Affymetrix   SAGE  supported  input       queries
              arrays       arrays

ArrayExpress  Yes          Yes          No    No         No info.    Yes       Prototype implementation will   Free
                                                                               be completed in 2001
ChipDB        No           Yes          No    No         N/A a       Yes       Available now                   Free
GeNet         Yes          Yes          Yes   Yes        No          Yes       Available now                   Free (for this
GeneX         Yes          Yes          Yes   Yes        Yes         Yes       Available now                   Free
GEO           Yes          Yes          Yes   No         Yes         Yes       Available now                   Free
GXD           Yes          Yes          No    No         Possibly b  Yes       2001                            Free
              (in future)  (in future)
RAD           Yes          Yes          Yes   Yes        N/A a       Yes       Available now                   Free
SMD           Yes          No           No    Yes        N/A a       Yes       Available now                   Free (non-

Comments
ArrayExpress: Designed to be implemented as an international microarray database
ChipDB: Laboratory database. Developers aim to allow external loading and distribution, and to expand the database to include all available data sets in the future
GeNet: See Table 2
GeneX: See Table 2. NCGR reserves the right to request more information and annotation of the submission. Private accounts are supported in the schema, but a charge may apply for this service (except for collaborators). It is likely that NCGR will maintain both a public database and a private database
GEO: International database. Goal is to be the `GenBank' of gene-expression data. There is a limited `hold-until date' facility in order that results can be published before data are publicly
GXD: International database which stores and integrates published gene-expression data from the laboratory mouse. GXD currently stores RNA in situ hybridisation, immunohistochemistry, Northern blot, Western blot, RT-PCR and RNAse protection data. The database is being expanded to include array data in the future. Filter arrays and spotted glass arrays will be added first, with additional assay types being added later
RAD: Data are from groups on the University of Pennsylvania campus, collaborators at other institutions, as well as published data (in areas of specific interest, ie haematopoiesis, stem cells, diabetes/pancreatic development and Plasmodium falciparum)
SMD: See Table 2

Databases described in italics will be open for public microarray data deposition and/or searches in the near future.
No info: No information available.
a Not applicable because public data deposition is not allowed.
b GXD will definitely support data upload without using the web, and may also support data upload using a web interface.
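
The choice described in the text between an external private account and plain public deposition can be sketched as a simple filter over Table 3. The following Python fragment is only an illustrative sketch: the class name, field names and the function `candidates` are hypothetical, and the values are transcribed from the `Private accounts supported' column of Table 3 and from the text's statement that ChipDB, RAD and SMD do not accept public data deposition.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PublicDatabase:
    """One row of Table 3, reduced to the two properties used below (hypothetical model)."""
    name: str
    private_accounts: bool   # 'Private accounts supported' column of Table 3
    public_deposition: bool  # False for ChipDB, RAD and SMD, per the text


# Values transcribed from Table 3 and the accompanying text.
TABLE_3 = [
    PublicDatabase("ArrayExpress", private_accounts=False, public_deposition=True),
    PublicDatabase("ChipDB", private_accounts=False, public_deposition=False),
    PublicDatabase("GeNet", private_accounts=True, public_deposition=True),
    PublicDatabase("GeneX", private_accounts=True, public_deposition=True),
    PublicDatabase("GEO", private_accounts=False, public_deposition=True),
    PublicDatabase("GXD", private_accounts=False, public_deposition=True),
    PublicDatabase("RAD", private_accounts=True, public_deposition=False),
    PublicDatabase("SMD", private_accounts=True, public_deposition=False),
]


def candidates(need_private_account: bool) -> List[str]:
    """Return the databases an external user could deposit data in.

    If a private account is required, only databases that both accept
    external data and support private accounts qualify; otherwise any
    database that accepts public deposition qualifies.
    """
    return [
        db.name
        for db in TABLE_3
        if db.public_deposition and (db.private_accounts or not need_private_account)
    ]


if __name__ == "__main__":
    print("Private account needed:", candidates(True))
    print("Public deposition only:", candidates(False))
```

Under these assumptions the filter reproduces the two lists given in the text: GeNet and GeneX when a private account is required, and those two plus ArrayExpress, GEO and GXD when it is not. Availability dates and possible charges (see the `Comments' entries) are not modelled.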


GEO accepts microarray data from all species, as will ArrayExpress when implemented. GXD will accept microarray data from the laboratory mouse in the future.

Most of the databases use a web browser for data input and all use a web browser for data queries. Those marked not applicable (N/A) in Table 3 do not accept public data.

All the databases surveyed indicated that there is no cost for public data deposition or queries.

DATA STORED AND ANALYSIS CAPABILITIES
(A) Database systems for local installation
Every user has specific requirements for the type of data they wish to store. Therefore, each organisation supplying a database for local installation was asked to provide a URL for their database schema. They were also asked whether their database stores primary microarray data (such as intensity values for individual elements, background values, ratios of intensities (for glass slides) or the avgDiff change and absCall values (for Affymetrix data2)). They were asked if their database stores sample conditions (eg description of organism, cell type, treatment conditions of the sample from which RNA was extracted) and/or experimental conditions (eg hybridisation conditions) and whether the database will follow the MGED minimum information guidelines when they are finalised (Table 4).

From the results of the survey, all the databases store primary data and almost all store descriptions of the sample and experimental conditions. The one exception is AMAD, which does not store experimental conditions and stores sample conditions to a limited extent only (P. Spellman, personal communication). Developers of all the databases, with the exception of AMAD, plan to implement the MGED minimum information guidelines when they become finalised (Table 4). Note that when GeNet is installed on a local site, the user has the option of omitting some MGED-specified minimal information.

Developers were asked to indicate if they link to or store image data, and to specify the format of the image data. Access to a TIFF image has the advantage that the image can be reanalysed using different methods. With the exception of AMAD, all the databases provide access to an image: GeneDirector, GeneX, mAdb, maxdSQL and SMD provide access to a TIFF image whereas GeNet and NOMAD provide access to a JPEG image.

Additional questions were designed to compare each database's capabilities beyond the MGED guidelines. Developers were asked if their database (a) stores annotations from searches of external databases (such as results of BLAST searches15 or genetic/physical mapping information), (b) stores the results of analyses (such as lists of genes which are up- or down-regulated, or clusters of genes) or (c) is associated with analytical tools (Table 4).

All the databases store annotations to some extent, and GeneDirector, GeNet and GeneX store the results of analyses (Table 4). All of the databases are associated with analytical tools, to some extent (Table 4). Some databases are integrated with the tools so that the database can be accessed from the interface of the analytical package, or vice versa. Other databases export data in a format that can be directly uploaded into an analytical package. Each developer has provided a brief description of the associated analytical package (see below).

AMAD
AMAD can export data in a form suitable for uploading into Cluster16 (P. Spellman, personal communication). Cluster performs hierarchical clustering, SOM, K-means clustering and principal components analysis (PCA).

GeneDirector
GeneDirector will be accessible from the interfaces of the analytical packages


Table 4: Data stored and associated analytical capabilities of databases which are (or will be) available for local installation

    Database                  Stores   Stores      Stores      Will follow  Stores array                Stores         Stores results     Associated
    (URL for schema)          primary  sample      experiment  MGED         image                       annotations    of analyses (eg    with
                              data     conditions  conditions  minimum                                  (eg BLAST      lists of           analytical
                                                               information                              results, map   interesting        tools (a)
                                                               guidelines                               positions)     genes or gene
                                                                                                                       clusters)

    AMAD                      Yes      Yes         No          No           No                          Limited        No                 Yes
    (Schema URL not provided)
    GeneDirector (Schema      Yes      Yes         Yes         Yes          Stores TIFF images (c)      Yes (d)        Yes (e)            Yes
    URL not provided)
    GeNet (Schema URL         Yes      Yes         Yes         Yes (f)      Links to JPEG or GIF        Yes (g)        Yes (h)            Yes
    not provided)                                                           images
    GeneX (http://            Yes      Yes         Yes         Yes          Links to TIFF image (i)     Yes (j)        Yes (k)            Yes
    GeneX_ schema.pdf)
    mAdb: (Schema URL         Yes      Yes         Yes         Yes          Links to TIFF images,       Yes            In development (n) Yes
    not provided)                                                           composite JPEG images
                                                                            and archive files
                                                                            containing all the
                                                                            individual spot images (l)
    maxdSQL: (http://         Yes      Yes         Yes         Yes          Stores URLs for the         Yes            In development (n) Yes
    www.bioinf.man.ac.uk/                                                   images (any format) (m)
    NOMAD: (Schema URL        Yes      Yes         Yes         Yes          Stores and links to JPEG    Yes            No                 Yes
    not provided)                                                           images
    SMD (http://genome-       Yes      Yes         Yes         Yes          Links to TIFF images but    Yes            No                 Yes
    www4.stanford.edu/                                                      on-line access is to GIF
    Microarray/SMD/doc/                                                     images only (o)

Databases described in italics will be available for local installation in the near future.
(a) See main text for a description of the associated analytical tools.
(b) Details of the type of data stored in AMAD can be found in the Help documentation once the database is installed.
(c) For GeneDirector, the TIFF images can be downloaded from the database.
(d) GeneDirector stores function information in any format derived from a known public database (NCBI, SWISS-PROT, etc.).
(e) For GeneDirector, genes that are up-regulated or down-regulated can be stored as a subset, and genes can be partitioned into clusters.
(f) If GeNet is used for local installation, investigators are not required to provide MGED-specified minimal information.
(g) For GeNet, if data are available in a standard format, eg keyword, protein product, map position, as for GenBank, LocusLink and UniGene, then this
annotation may be retrieved by GeneSpring and updated in a spreadsheet. This information is displayed when a user calls up information specific for a
(h) For GeNet, results presented as visuals and lists of interesting genes can be stored as reports and accessed by clicking on an icon in the result manager
browser window. Written reports can be uploaded as an attachment to, for example, the icon to a particular list of genes. For K-means, clustering and
SOM, clusters are stored as lists of genes. For PCA, the correlation coefficient indicating the similarity of each gene to a component is stored.
(i) GeneX supports download, but not viewing, of the TIFF image. The developers may store lower resolution images for display purposes in the future.
(j) The GeneX schema allows each sequence to be associated with a small number of BLAST hits. A more complete integration of BLAST and other
sequence-based analyses has been coded for and will be integrated in the future. The curation tool queries the data supplier for many annotations and
supporting metadata.
(k) GeneX stores `hotspots' of genes that change more than some proportion in an experiment.
(l) For mAdb, the uploaded image is processed and converted to a JPEG image and an archive file containing each individual spot image. The user can
download the original uploaded image and can view/save the processed JPEG image. The user can view, but not download, the archive file of individual spot
(m) For maxdSQL, the images can be accessed from outside the database, but the user cannot view or download images from within the database interface.
(n) The schema of maxdSQL allows for the storage of analyses (such as sets of interesting genes) but there is no user interface to this yet.
(o) For SMD, TIFF images and primary data are archived via HSM (hierarchical storage management) to tape and magneto-optical disks. The data are then
linked and retrieved as if they were on a `normal' disk, perhaps with a longer retrieval time.
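
The GeneX footnote above describes `hotspots' as genes that change by more than some proportion in an experiment. The survey does not say how GeneX computes them, so the following is only a minimal sketch of the general idea, assuming that each gene has a single positive expression ratio (for example test/reference) and that a symmetric fold-change cut-off is used. The function name `find_hotspots`, the parameter `min_fold` and the example values are hypothetical and are not taken from GeneX.

```python
from typing import Dict, List


def find_hotspots(ratios: Dict[str, float], min_fold: float = 2.0) -> List[str]:
    """Return the genes whose expression ratio changes by at least min_fold.

    A gene qualifies if its ratio is >= min_fold (up-regulated) or
    <= 1 / min_fold (down-regulated). This is an illustrative rule only,
    not the rule used by any of the surveyed databases.
    """
    if min_fold <= 1.0:
        raise ValueError("min_fold must be greater than 1")
    hotspots = []
    for gene, ratio in ratios.items():
        if ratio <= 0:
            # A ratio must be positive to be interpretable; skip unusable spots.
            continue
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            hotspots.append(gene)
    return sorted(hotspots)


# Hypothetical ratios for five spots on one array.
example_ratios = {"geneA": 3.1, "geneB": 0.9, "geneC": 0.4, "geneD": 1.8, "geneE": 0.0}
print(find_hotspots(example_ratios))
```

A list produced in this way corresponds to the kind of result that the `Stores results of analyses' column of Table 4 refers to: a named subset of genes that can be saved and retrieved later.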


CloneTracker (clone tracking and experimental design), ImaGene (image analysis) and GeneSight
(data mining). These analytical tools provide customised normalisation/transformation of data,
histograms, K-means clustering, SOMs, hierarchical clustering, PCA, scatterplots, genepies, time
series analysis and measures of statistical confidence (at spot and microarray level). The package
also offers a replicate analysis based on ANOVA (analysis of variance) which can indicate up- or
down-regulated genes based on a given confidence level chosen by the user17 (Alexander Kuklin,
personal communication).

GeNet
The analytical package `GeneSpring' is fully integrated with GeNet. Tools within GeneSpring include
correlations between different expression profiles, clustering techniques (hierarchical, K-means,
PCA and SOM) and filtering by a factor by which a gene is up- or down-regulated. GeneSpring also
has tools for statistical group comparisons (2-sample t-test/ANOVA), class prediction and error
models.18

GeneX
The analysis package integrated with GeneX currently supports hierarchical clustering and
permutation-based validation of this clustering, although the latter is quite memory intensive.
GeneX is also distributed with the free Java application J-Express,19 which provides hierarchical
and K-means clustering, SOM, PCA, profile filtering and more. NCGR supplies a t-test variant called
CyberT20 that includes Bonferroni correction for repeated tests and a Bayesian estimator for
under-replicated samples. The CyberT analysis spawns an xgobi window21 to examine the relationships
of the variables in three dimensions and can perform PCA on the data. NCGR intends to integrate
ORCA22 as well as other clustering techniques from the R statistical language cluster libraries.23
The developers are currently integrating xgobi as a primary visualisation engine and its sister
application, xgvis, for multidimensional scaling.24 Rcluster, CyberT, Xcluster and xgobi can be
accessed from within the GeneX database interface. GeneX can export data in a format that can be
loaded directly into J-Express and Treeview (H. Mangalam, personal communication).

mAdb
NCI provides an ad hoc query tool to allow individual genes or sets of genes to be extracted from
mAdb. Venn logic can be applied to extracted groups of data. The extracted data can be retrieved as
an Excel file, as a file formatted for direct input into Cluster,25 or as a tab-delimited file. NCI
has also implemented web-based clustering and viewing based on Xcluster.26 NCI supports a version
of the Java applets, microarray viewer and multiple array viewer developed by NHGRI, and a scatter
plot applet developed by NCI/BRB.27 The developers are working on providing additional statistical
capabilities using the R statistical package as a backend. Data sets can be retrieved and formatted
for specific application suites: MAExplorer,28 BRB Arraytools29 and J-Express. Export to
GeneSpring18 and PARTEK30 will be supported in the future (J. Powell, personal communication).

maxdSQL
maxdSQL is associated with maxdView, a Java suite of analysis and visualisation tools designed for
rapid prototyping and to help integrate existing software. maxdSQL can be accessed from the
interface of this analytical package. This suite, which includes filtering, two-dimensional
correlation plots, multidimensional web plots and distribution histograms, can also be used
independently of the database. The maxdView software provides an environment for developing tools
and/or integrating existing tools. Clustering algorithms from an external package are supplied as a
demonstration of how the integration is performed. A greater


number of algorithms are planned to be added over time using the modular `plug-in' architecture.
There is support for annotation retrieval and searching in the analysis suite (D. Hancock, personal
communication).

NOMAD
NOMAD can export data in a form suitable for uploading into Cluster16 (see AMAD) (P. Spellman,
personal communication).

SMD
SMD is associated with numerous clustering and analysis features. Users can access the data in SMD
from the web interface of the analytical package. The database server provides UNIX versions of the
XCluster software package26 for hierarchical and K-means clustering and SOM, and a number of
scripts for statistical analysis of an experiment and its duplicates to address variance. XCluster
is freely available to academics in either source or binary form, for multiple platforms
(G. Sherlock, M. Cherry, personal communication).

(B) Databases for public data deposition and/or queries
The same questions were asked of organisations that currently host (or will soon host) databases
for public data deposition and/or queries (Table 5). All these databases store primary data, sample
conditions and experiment conditions and all will follow the MGED minimum information guidelines.
Only ChipDB does not provide access to image data. ArrayExpress, GeneX, RAD and SMD provide access
to TIFF images and GEO, GeNet and GXD provide access to JPEG and/or GIF images. The databases vary
considerably regarding whether they store annotations such as results of BLAST searches or
chromosomal map positions. All, except ArrayExpress, GXD and SMD, store the results of analyses,
such as lists of interesting genes or clusters of genes. All, but GEO and RAD, are associated with
analysis packages. RAD stores data as ordered, unordered and time-series sets and the developers
plan to add an analysis package including PaGE (developed in-house), t-test and other statistical
methods in the future (Chris Stoeckert, personal communication).

ArrayExpress
ArrayExpress will be associated with an analytical package, called Expression Profiler35
(A. Brazma, personal communication). Expression Profiler allows the user to generate a subset of
interesting genes by filtering for keywords and/or specific intensity ratios and enables gene
clustering (hierarchical, K-means).

ChipDB
The analysis package integrated with ChipDB supports analyses of single experiments using a
rule-based approach, and experiment comparisons using hierarchical clustering and multidimensional
scaling. It also supports set comparisons between experiments, or between experiments and
categorisations of genes such as MIPS categories36 or chromosomes. See links within ref. 37 for
information on the statistical techniques used. The results of analyses are stored in the database
for search and retrieval. The package uses the R statistical language to perform analyses and
provides an interface for loading data to/from GeneCluster SOM software38 (P. Young, personal
communication).

GeNet
(see Section A)

GeneX
(see Section A)

GXD
In addition to the web-based analysis and display tools that are currently available, the
developers plan to develop matrix views that allow interactive aggregation and sorting via the
different query parameters available, as well as comparison of matrices. These tools will be
combined with statistical analysis


Table 5: Data stored and associated analytical capabilities: databases which are (or will be) available for public data
deposition and/or public queries

    Database                        Stores   Stores      Stores      Will follow  Stores array              Stores            Stores         Associated
    (URL for schema)                primary  sample      experiment  MGED         image                     annotations       results of     with
                                    data     conditions  conditions  minimum                                (eg BLAST         analyses       analytical
                                                                     information                            results, map      (eg lists of   tools (a)
                                                                     guidelines                             positions)        interesting
                                                                                                                              genes or gene
                                                                                                                              clusters)

    ArrayExpress: (http://          Yes      Yes         Yes         Yes          Links to TIFF image (b)   No                No             Yes
    ChipDB: (http://                Yes      Yes         Yes         Yes          No                        No                Yes (c)        Yes
    restricted access)
    GEO: (http://                   Yes      Yes         Yes         Yes          Provides access to a      Yes (d)           Yes (e)        No
    www.ncbi.nlm.nih.gov/geo/                                                     <100 kB `reference'
    info/geo.tbl)                                                                 image in JPEG/GIF
    GeNet (Schema URL not           Yes      Yes         Yes         Yes          Links to JPEG or GIF      Yes               Yes            Yes
    provided)                                                                     images                    (see Table 4)     (see Table 4)
    GeneX (http://www.ncgr.org/     Yes      Yes         Yes         Yes          Links to TIFF image       Yes               Yes            Yes
    GXD: (http://                   Yes      Yes         Yes         Yes          Links to JPEG image. (g)  Yes (h)           No             Yes
    www.informatics.jax.org/tools/                                                May provide access to
    schema (f)                                                                    TIFF images in future
    RAD: (http://                   Yes      Yes         Yes         Yes          Links to TIFF image       Not directly (j)  Yes (k)        Yes (in future)
    www.cbil.upenn.edu/cgi-bin/                                                   (when available). For
    RAD2/schemaBrowser.pl)                                                        display purposes, a
                                                                                  GIF is used (i)
    SMD (http://                    Yes      Yes         Yes         Yes          Links to TIFF image       Yes               No             Yes
    genomewww4.stanford.edu/                                                      but on-line access is
    Microarray/ SMD/doc/                                                          to GIF image only
    db_specifications.html)                                                       (see Table 4)

Databases described in italics will be open for public microarray data deposition and/or searches in the near future.
(a) See main text for a description of the analytical packages.
(b) The developers of ArrayExpress note that the link to a TIFF image may be re-evaluated in the long term if it becomes too expensive or if image analysis
becomes reliable.
(c) ChipDB allows filtering results to find a certain gene and all the information contained about it. A gene is found only when the gene is in a result file, ie is
up or down in expression levels. The results of all analyses are stored.
(d) For GEO, some annotations will be done automatically, such as matching GenBank accessions to NCBI's UniGene Clusters, LocusLink IDs and RefSeq
accessions. Annotations not derived in-house can be stored in comments in the platform data table but these annotations may not necessarily be able to
be queried.
(e) For GEO, results of external analyses can be stored as up to two tab-delimited tables. Submitted analyses can be retrieved but cannot be queried. No
images of cluster results can be stored, but the user may store the table that represents the images.
(f) For GXD, a schema for the array data is not yet publicly available.
(g) JPEG images can be downloaded from GXD; however, the developers note that there may be copyright issues.
(h) GXD is integrated with Jackson Laboratory's Mouse Genome Database (MGD) to form a system called MGI (Mouse Genome Informatics).31 When
GXD expands to microarray data, it will add value by integrating the array data with other types of expression data and with genotype, genomic and
phenotype data for mouse strains and mutants. This will occur via integration with MGD, establishing new classification schemes for genes and gene
products (as a member of the Gene Ontology Consortium) and maintaining links to external community databases.
(i) No TIFF images have been stored in RAD, to date, but the developers plan to make TIFF images accessible. The GIF image can be downloaded.
(j) RAD obtains extra information from links to a sister database called GUS (Genomics Unified Schema) that contains genomic sequences, ESTs, mRNAs
and protein sequences from GenBank/EMBL/DDBJ, UCSC and SWISS-PROT among others. The sequences are integrated through concepts of genes,
RNAs and proteins. ESTs and mRNAs are clustered and assembled to generate consensus sequences. Gene predictions are made of genomic sequence.
Predictions of gene function and cellular role are also made. A human and mouse view of GUS is available.33 A Plasmodium falciparum view is available.34
(k) RAD can be queried to answer questions such as `What gene is expressed in the top 20% in B lymphocytes and maps to Chromosome 19?'
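
The example RAD question above combines an expression ranking with a map position. The following is a minimal in-memory sketch of that kind of query, not RAD's schema or query interface: the record type `GeneRecord`, the function `top_percent_on_chromosome` and all gene names and values are hypothetical, and `top 20%' is assumed to mean the highest-ranked fifth of the genes measured in one sample.

```python
from typing import List, NamedTuple


class GeneRecord(NamedTuple):
    gene: str          # hypothetical gene identifier
    chromosome: str    # chromosome to which the gene maps
    expression: float  # hypothetical expression level in one sample


def top_percent_on_chromosome(
    records: List[GeneRecord], chromosome: str, top_percent: int = 20
) -> List[str]:
    """Return genes that rank in the top `top_percent` by expression
    and that map to the given chromosome."""
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be between 1 and 100")
    if not records:
        return []
    # Rank all genes by expression, highest first.
    ranked = sorted(records, key=lambda r: r.expression, reverse=True)
    # Integer arithmetic avoids floating-point rounding; keep at least one gene.
    n_top = max(1, len(ranked) * top_percent // 100)
    return [r.gene for r in ranked[:n_top] if r.chromosome == chromosome]


# Ten hypothetical genes measured in a single (e.g. B lymphocyte) sample.
sample = [
    GeneRecord("g1", "19", 950.0),
    GeneRecord("g2", "2", 900.0),
    GeneRecord("g3", "19", 400.0),
    GeneRecord("g4", "7", 300.0),
    GeneRecord("g5", "19", 120.0),
    GeneRecord("g6", "1", 110.0),
    GeneRecord("g7", "X", 90.0),
    GeneRecord("g8", "4", 60.0),
    GeneRecord("g9", "11", 30.0),
    GeneRecord("g10", "19", 10.0),
]
print(top_percent_on_chromosome(sample, "19"))
```

In a relational database the same question would normally be expressed as a join between expression data and gene annotation; the sketch only illustrates why a database needs to hold both kinds of information for such a query to be answerable.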


methods that are being developed by collaborators as well as existing software. These analytical
tools will be accessible from the database interface (M. Ringwald, personal communication).

SMD
(see Section A)

Cross-references to other databases
Often microarray users wish to relate their microarray results to published non-array information
on their genes of interest: for example, information on DNA sequence, function, genetic and
physical map locations or homologous genes in other species. Developers were therefore asked if
their microarray database supports links to external databases containing such information. The
survey results showed that each database cross-references to various external databases. Some of
the developers did not indicate a specific site to which their database is linked as their
database is customisable. We have

Table 6: Databases to which the microarray databases have cross-references. These databases provide extra information
such as DNA sequences, functional information and chromosomal map positions

 Cross-reference                URL





 ArkDB                       http://bos.cvm.tamu.edu/cgi-bin/arkdb/browsers/                                                                                     [
 BOVMAP                      http://locus.jouy.inra.fr/                                                                                                          [
 dbEST                       http://www.tigr.org/                                                                                                                    [                      [
 DDBJ                        http://www.ddbj.nig.ac.jp                                                                            [                              [                      [
 EcoCyc                      http://ecocyc.panbio.com/ecocyc/ecocyc.html                                                                  [
 EMBL                        http://www.ncbi.nlm.nih.gov/                                                                         [                              [                      [
 Ensemble                    http://www.ensembl.org                                                                               [
 GDB                         http://gdbwww.gdb.org/                                                                                                              [
 GeneAtlas                   http://www.citi2.fr/GENATLAS/welcome.html                                                                                                                      [
 GenBank                     http://www.ebi.ac.uk/                                                                                [                              [ [                    [
 GeneCards                   http://bio-www.ba.cnr.it:8000/GeneCards/                                                                                              [                        [
 GeneMap99                   http://www.ncbi.nlm.nih.gov/genemap/                                                                                                                           [
 GSDB                        http://www.ncgr.org/research/sequence/                                                                                              [
 GUS (human & mouse)         http://www.allgenes.org                                                                                                                                    [
 GUS (Plasmodium falciparum) http://www.plasmodb.org                                                                                                                                    [



 KEGG                        http://www.genome.ad.jp/kegg/                                                                        [ [ [                              [
 LocusLink                   http://www.ncbi.nlm.nih.gov/LocusLink/refseq.html                                                                               [                              [
 Medminer                    http://www.tigr.org/                                                                                                                  [
 MGD                         http://www.informatics.jax.org/mgihome/MGD/aboutMGD.shtml                                                                           [ [
 MIPS                        http://www.mips.biochem.mpg.de/                                                                          [ [
 OMIM                        http://gdbwww.dkfz-heidelberg.de/omim/omim_search.html                                                                              [                          [
 PathDB                      http://www.ncgr.org/software/pathdb/                                                                         [
 PubMed                      http://www.nlm.nih.gov/databases/freemedl.html                                                       [ [                            [
 RATMAP                      http://ratmap.gen.gu.se/                                                                                                            [
 RefSeq accessions           http://www.ncbi.nlm.nih.gov/LocusLink/refseq.html                                                                               [
 RHdb                        http://www.ebi.ac.uk/RHdb/                                                                                                                                   [
 SGD                         http://genome-www.stanford.edu/Saccharomyces/                                                                                                                [
 SWISS-PROT                  http://www.ebi.ac.uk/swissprot/                                                                      [                              [                      [ [
 Taxonomy                    http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/                                              [
 TIGR                        http://www.tigr.org/                                                                                                                    [
 UCSC                        http://genome.ucsc.edu/                                                                                                                                    [
 Unigene                     http://www.ncbi.nlm.nih.gov/UniGene/                                                                                            [ [ [                      [ [
 WormPD                      http://www.proteome.com/                                                                                                                                     [
 YPD                         http://www.proteome.com/                                                                                 [                                                   [

[: Developer has indicated that their microarray database links to this external database. The absence of a tick does not necessarily imply that a link does
not exist.
  NCI maintains an up-to-date copy of the Hs, Mm and Rn UniGene data, as well as a partial mirror of dbEST, within mAdb. NCI is also a mirror site for

© HENRY STEWART PUBLICATIONS 1467-5463. BRIEFINGS IN BIOINFORMATICS. VOL 2. NO 2. 143–158. MAY 2001
Table 7: Comparison of data and system requirements of microarray databases that are available (or will soon be available) for local installation

AMAD
  Operating system (OS):             All (a)
  Database management system (DBMS): Flat file database written in Perl (b)
  Web-server used:                   Apache (c)
  Other technologies used:           Perl, Web Browser
  Clients:                           Web Browser
  Will support MAML standard:        No
  Upload format:                     Output from GenePix (d) (Axon), ScanAlyze (e) (Stanford University)
  Download format:                   Tab delimited; probably XML in future

GeneDirector
  Operating system (OS):             Any platform that has a JVM (f) installed
  Database management system (DBMS): Oracle (g)
  Web-server used:                   None
  Other technologies used:           100% Java (h)
  Clients:                           Java application (with customised SQL queries capability)
  Will support MAML standard:        Yes
  Upload format:                     Any tab-delimited text (imported using GeneSight). Output from Imagene & Autogene. Slide design data imported from
  Download format:                   Tab-delimited; XML

GeNet
  Operating system (OS):             Unix, Windows (i), Mac OS (j)
  Database management system (DBMS): Oracle
  Web-server used:                   Apache
  Other technologies used:           GeNet is a servlet programmed in
  Clients:                           Web Browser
  Will support MAML standard:        Probably
  Upload format:                     From GeneSpring (k)
  Download format:                   HTML (very similar to XML), tab delimited

GeneX
  Operating system (OS):             Linux (l)/Intel (m) & Solaris (n)/SPARC (o)
  Database management system (DBMS): PostgreSQL (p) (Linux/Intel); Sybase (q) (Solaris/SPARC). The developers are attempting to keep the schema and implementation DBMS-independent
  Web-server used:                   Apache
  Other technologies used:           Perl CGI and DBI/DBD; Perl for other purposes
  Clients:                           Web Browser. Curation Tool is a Java 1.3 client-side application not an applet; Command line applications and X windows visualisation applications (xgobi/xgvis) on server (see text)
  Will support MAML standard:        Yes
  Upload format:                     The Curation tool reads tab-delimited files from almost any scanning package, and encodes them as GEML/XML (r) to upload to database
  Download format:                   Tab delimited; Values formatted for direct reading into xgobi; Values formatted for reading into R statistical program; HDF5 (s) in future; GEML

mAdb
  Operating system (OS):             Not specified
  Database management system (DBMS): Sybase
  Web-server used:                   Type not specified
  Other technologies used:           Perl CGI and DBI
  Clients:                           Web Browser
  Will support MAML standard:        Yes
  Upload format:                     Output from deArray (t) (NHGRI Arraysuite) or GenePix (Axon)
  Download format:                   Tab delimited; analysis results are available in HTML, MS-Excel, tab delimited or pre-formatted as input for Cluster (u)

maxdSQL
  Operating system (OS):             All (a)
  Database management system (DBMS): Oracle, PostgreSQL, MySQL (v)
  Web-server used:                   None
  Other technologies used:           100% Java (using Swing 1.2.2 or
  Clients:                           maxdView and maxdLoad applications (Java)
  Will support MAML standard:        Yes
  Upload format:                     Any tab-delimited text
  Download format:                   Native XML format and 'neutral' format; plain text formats

NOMAD
  Operating system (OS):             Linux tested
  Database management system (DBMS): MySQL
  Web-server used:                   Apache
  Other technologies used:           PHP (w), Perl
  Clients:                           Web Browser
  Will support MAML standard:        Yes
  Upload format:                     Output from GenePix (Axon)
  Download format:                   Tab delimited; XML planned

SMD
  Operating system (OS):             Solaris
  Database management system (DBMS): Oracle 8.06
  Web-server used:                   Apache
  Other technologies used:           Perl CGI and DBI
  Clients:                           Web Browser
  Will support MAML standard:        Yes
  Upload format:                     Output from ScanAlyze (Stanford University), GenePix (Axon)
  Download format:                   CDT file, similar to ScanAlyze output format; Tab delimited

Databases described in italics will be available for local installation in near future.
(a) The developers believe there are no restrictions in operating systems.
(b) http://www-cgi.cs.cmu.edu/htbin/perl-man/
(c) http://www.apache.org/
(d) http://www.axon.com/GN_GenePixSoftware.html
(e) http://rana.stanford.edu/software/
(f) Java Virtual Machine.
(g) http://www.oracle.com/
(h) http://java.sun.com/
(i) http://www.microsoft.com/
(j) http://www.apple.com
(k) http://www.sigenetics.com/Products/GeneSpring/index.html
(l) http://www.linux.org/
(m) http://www.intel.com/
(n) http://www.sun.com/solaris/
(o) http://www.sparc.com/
(p) http://www.postgresql.org/index.html
(q) http://www.sybase.com/
(r) http://genex.ncgr.org/
(s) https://hdf.ncsa.uiuc.edu/HDF5/
(t) http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/img_analysis.html
(u) http://rana.lbl.gov/
(v) http://www.mysql.com/
(w) http://www.php.net/
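
As a rough illustration of how Table 7 might be used when shortlisting a locally installable database, the sketch below encodes three of its columns (DBMS, web server and MAML support) as a small Python structure and filters on the DBMS a site already runs. The class name `LocalDatabase`, the function `candidates` and the simplified column values (for example, SMD's 'Oracle 8.06' recorded as 'Oracle') are assumptions made for this example and are not part of any surveyed system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalDatabase:
    """One row of Table 7, reduced to three columns (hypothetical structure)."""
    name: str
    dbms: tuple       # DBMS options listed in Table 7
    web_server: str   # "Apache", "None" or "Not specified"
    maml: str         # "Yes", "No" or "Probably"


# Values transcribed by hand from Table 7; version details are simplified.
TABLE_7 = [
    LocalDatabase("AMAD", ("Flat file (Perl)",), "Apache", "No"),
    LocalDatabase("GeneDirector", ("Oracle",), "None", "Yes"),
    LocalDatabase("GeNet", ("Oracle",), "Apache", "Probably"),
    LocalDatabase("GeneX", ("PostgreSQL", "Sybase"), "Apache", "Yes"),
    LocalDatabase("mAdb", ("Sybase",), "Not specified", "Yes"),
    LocalDatabase("maxdSQL", ("Oracle", "PostgreSQL", "MySQL"), "None", "Yes"),
    LocalDatabase("NOMAD", ("MySQL",), "Apache", "Yes"),
    LocalDatabase("SMD", ("Oracle",), "Apache", "Yes"),
]


def candidates(dbms_available, require_maml=True):
    """Return names of databases that run on a DBMS the site already has.

    With require_maml=True, only databases whose developers answered
    "Yes" to MAML support are kept ("Probably" is excluded).
    """
    available = set(dbms_available)
    return [
        db.name
        for db in TABLE_7
        if available & set(db.dbms) and (db.maml == "Yes" or not require_maml)
    ]


print(candidates({"PostgreSQL", "MySQL"}))  # prints ['GeneX', 'maxdSQL', 'NOMAD']
```

Treating "Probably" as a non-match is a deliberately strict choice; passing `require_maml=False` relaxes it.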


supplied URLs for all these cross-referenced databases (Table 6).

SYSTEM AND DATA REQUIREMENTS
When acquiring a microarray database for their site, users require information about the extent
to which a certain database is compatible with their existing systems (scanning software or
computer systems), and the software/hardware that they would need to install. Developers who
indicated that their database was (or will be) available for distribution were, therefore,
questioned about the database management system (DBMS) and operating systems on which their
database is known to function, the main development technologies used, the formats used for
uploading and downloading data, and whether they intend to support the MAML standard when it is
finalised. Almost all developers surveyed indicated that they would support this standard. The
survey answers are reported in Table 7.

The upload and download formats for databases that are available for public data deposition
and/or queries are described in Table 8. All of these developers indicated that their databases
will support the MAML exchange format. Access to all these databases requires a web browser.
GeneX also requires (and supplies) a client-side curation tool, written in Java 1.3 (H. Mangalam,
personal communication).

CONCLUSIONS
Each of the databases surveyed is suitable for different members of the microarray community.
Scientists who wish to acquire database technology for their organisation may wish to investigate
the flat file database, AMAD (UCSF), or the more comprehensive databases GeNet (Silicon
Genetics), GeneX (NCGR), maxdSQL (University of Manchester) and SMD (Stanford University). Other
users may wish to wait to evaluate GeneDirector (BioDiscovery), NOMAD (UCSF), SMD (Stanford
University) or the 'more robust' version of mAdb (NCI) (all of which should be released in early
2001). All of the above databases, with the exception of GeneDirector and maxdSQL, use a web
browser for querying the data, but they differ greatly

Table 8: Comparison of upload and download formats of microarray databases available for public data deposition and/or

 Database      Clients                                     Will support    Upload format                            Download format
                                                           MAML standard

 ArrayExpress  Web Browser                                 Yes             MAML format (a)                          XML
 ChipDB        Web Browser                                 Yes             N/A                                      Tab delimited
 GeNet         Web Browser                                 Yes             From GeneSpring                          HTML, tab delimited
 GeneX         Web Browser. Curation Tool is a Java 1.3    Yes             The Curation tool reads tab-delimited    Tab delimited; Values formatted for direct
               client-side application not an applet                       files from almost any scanning           reading into xgobi; Values formatted for
                                                                           package, and encodes them as             reading into R statistical program; HDF5 in
                                                                           GEML/XML to upload to database           future; GEML
 GEO           Web Browser                                 Yes             Imports tab-delimited text in ad         Tab delimited; XML planned.
                                                                           hoc format. XML planned.
 GXD           Web Browser                                 Yes             MAML (ideally) (a)                       Tab delimited; XML planned
 RAD           Web Browser                                 Yes             N/A                                      Tab delimited; HTML; XML planned.
 SMD           Web Browser                                 Yes             N/A                                      CDT file, similar to ScanAlyze output
                                                                                                                    format; tab delimited

Databases described in italics will be open for public microarray data deposition and/or searches in near future.
N/A: Not applicable because no public data deposition.
(a) Will import from other databases rather than directly from scanning package.
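
Several of the databases in Table 8 accept tab-delimited text and deliver, or plan to deliver, XML. The following minimal sketch shows the general idea of encoding tab-delimited rows as XML using only the Python standard library. The element names (`experiment`, `spot`, `value`) and the sample column names are hypothetical; they are not the MAML or GEML vocabularies, and the sketch does not reproduce the GeneX curation tool's actual procedure.

```python
import csv
import io
import xml.etree.ElementTree as ET


def tab_to_xml(text, root_tag="experiment", row_tag="spot"):
    """Encode tab-delimited text (first row = column names) as an XML string.

    Column names are stored in a 'name' attribute rather than used as tag
    names, so headers containing spaces or symbols cannot produce invalid
    XML tags. Rows are assumed to have no more fields than the header.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    root = ET.Element(root_tag)
    for row in reader:
        spot = ET.SubElement(root, row_tag)
        for column, value in row.items():
            child = ET.SubElement(spot, "value", name=column)
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical two-channel scanner output with three columns.
sample = "spot_id\tch1_intensity\tch2_intensity\n1\t1520\t980\n2\t300\t450\n"
print(tab_to_xml(sample))
```

A real exchange format would also carry array design, sample and protocol annotation, which is the kind of information the MAML standard is intended to capture.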


             by the method by which data are                                             display of genome-wide expression patterns',
             deposited.                                                                  Proc. Natl Acad. Sci. USA, Vol. 95, pp. 14863±
                For those wishing to publish
             microarray data, the following public                                  5.   Tamayo, P., Slonim, D., Mesirov, J. et al.
                                                                                         (1999), `Interpreting patterns of gene
             databases will be of interest. GEO (NCBI)                                   expression with self-organizing maps: Methods
             is an international site for the publication                                and application to hematopoietic
             of general microarray data and                                              differentiation', Proc. Natl Acad. Sci. USA, Vol.
                                                                                         96, pp. 2907±2912.
             ArrayExpress (EBI), if implemented fully,
             will ful®l a similar international function.                           6.   URL: http://www.umanitoba.ca/faculties/
             GXD (the Jackson Laboratory) is an                                          bioinformatics/lec12/lec12.1.html
             international database, for gene-                                      7.   URL: http://www.mged.org
             expression data from mouse, which
                                                                                    8.   URL: http://www.ncgr.org/research/genex/
             currently provides various forms of non-                                    other_tools.html
             array gene-expression data and will store
                                                                                    9.   Velculescu, V. E., Zhang, L., Vogelstein, B.
             microarray data in the future. Microarray                                   and Kinzler, K. W. (1995), `Serial analysis of
             users may also choose to deposit data into                                  gene expression', Science, Vol. 270, pp. 368±
             GeneX (NCGR), which will accept                                             369.
             selected public data, or GeNet (Silicon                                10. URL: http://www.sigenetics.com/
             Genetics). Both GeneX and GeNet                                            Downloads/index.html
             support private user-accounts. All of these                            11. URL: http://www.openssl.org/
             databases can be queried to provide                                    12. URL: http://www.sourceforgenet/

                                                                                                                                             Downloaded from bib.oxfordjournals.org by guest on May 12, 2011
             information to the public. Three other                                 13. URL: http://www.gnu.org/copyleft/
             databases surveyed can also be queried by                                  lgpl.html
             the public: ChipDB which specialises in                                14. URL: http://www.bioinf.man.ac.uk/
             Affymetrix data from MIT/Whitehead,                                        microarray/maxd/LICENCE
             RAD which specialises in data on                                       15. URL: http://www.ncbi.nlm.nih.gov/
             pancreatic development and malaria, and                                    BLAST/
             SMD which contains data for Stanford                                   16. URL: http://rana.lbl.gov/
             University and their collaborators.                                    17. URL: http://www.biodiscovery.com/
                                                                                    18. URL: http://www.sigenetics.com/Products/
             Acknowledgements                                                           GeneSpring/index.html
             We thank the people that participated in this
                                                                                    19. URL: http://www.ii.iub.no/~bjarted/
             survey: Paul Spellman, Alvis Brazma, Peter Young,                          jexpress/
             Harry Mangalam, Alexander Kuklin, Tony Lialin,
             Anoop Grewal, Alex Lash, Martin Ringwald, John                         20. URL: http://genex.ncgr.org/
             Powell, David Hancock, Chris Stoeckert, Mike                           21. URL: http://www.research.att.com/areas/
             Cherry and Gavin Sherlock. We are grateful to                              stat/xgobi/
             Andrew Hamilton and Matthew Hobbs for helpful                          22. URL: http://pyrite.cfas.washington.edu/orca/
                                                                                    23. URL: http://cran.r-project.org/
                                                                                    24. URL: http://www.research.att.com/
             References                                                                 ~andreas/xgobi/
             1.    Duggan, D. J., Bittner, M., Chen, Y. et al.                      25. URL: http://rana.stanford.edu/software/
                   (1999), `Expression profiling using cDNA
                                                                                    26. URL: http://genome-www.stanford.edu/
                   microarrays', Nature Genet., Vol. 21
                   supplement, pp. 10–14.
                                                                                    27. URL: http://linus.nci.nih.gov/~brb/
             2.    Lipshutz, R. J., Fodor, S. P. A., Gingeras, T.
                   R. and Lockhart, D. J. (1999), `High density
                   synthetic oligonucleotide arrays', Nature                        28. URL: http://www.lecb.ncifcrf.gov/
                   Genet., Vol. 21 supplement, pp. 20–24.                               MAExplorer/
             3.    Schuchhardt, J., Beule, D., Malik, A. et al.                     29. URL: http://linus.nci.nih.gov/~brb/tool.htm
                   (2000), `Normalization strategies for cDNA
                                                                                    30. URL: http://www.partek.com/
                   microarrays', Nucleic Acids Res., Vol. 28, p. e47.
                                                                                    31. URL: http://www.informatics.jax.org/
             4.    Eisen, M. B., Spellman, P. T., Brown, P. O.
                   and Botstein, D. (1998), `Cluster analysis and                   32. URL: http://www.geneontology.org/

© HENRY STEWART PUBLICATIONS 1467-5463. BRIEFINGS IN BIOINFORMATICS. VOL 2. NO 2. 143–158. MAY 2001

                             33. URL: http://www.allgenes.org/                                      37. URL: http://young39.wi.mit.edu/
                             34. URL: http://www.plasmodb.org/
                                                                                                    38. URL: http://www-genome.wi.mit.edu/
                             35. URL: http://ep.ebi.ac.uk
                             36. URL: http://www.mips.biochem.mpg.de/


