EBI Hinxton

EMBL-Bank a.k.a. The EMBL Nucleotide Sequence Database
Team leader:                Günter Stösser
Coordinator:                Mary Ann Tuli
Senior database curators:   Wendy Baker, Maria Garcia Pastor, Tamara Kulikova
Database curators:          Philippe Aldebert, Nicola Althorpe*, Paul Browne, Guy Cochrane*, Ruth Eberhardt*,
                            Nadeem Faruque*, Michelle McHale*, Sandra van den Broek, Robert Vaughan
Data assistants:            Yvonne Allen, Karyn Duggan
                            *Indicates part of the year only




Scope
The EMBL Nucleotide Sequence Database (Stösser et al., 1999) incorporates, organises and distributes nucleotide sequences from all available public sources. The database is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK.
Databases operated at the EBI include the EMBL Nucleotide Sequence Database (a.k.a. EMBL-Bank), the protein databases SWISS-PROT & TrEMBL (Bairoch et al., 2000), InterPro (Apweiler et al., 2001), the Macromolecular Structure Database (E-MSD), ArrayExpress for gene expression data (Brazma et al., 2002) and ENSEMBL (Hubbard et al., 2002) for automatic genome annotation, plus several other databases, many of which are produced in collaboration with external groups.
In an international collaboration with DDBJ (Japan) (Tateno et al., 2000) and GenBank (USA) (Benson et al., 2000), data are exchanged amongst the collaborating databases on a daily basis to achieve optimal synchronisation. Webin is the preferred web-based submission system for individual submitters, while automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, email and World Wide Web interfaces. EBI's Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases plus many other specialised molecular biology databases. For sequence similarity searching, a variety of tools (e.g. Fasta, BLAST) allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.

Growth
EMBL-Bank Release 73 (Dec 2002) contains more than 20 million sequence entries comprising 28 Gigabases. Over the last 12 months the database has nearly doubled in size (compare Dec 2001: 15 Gigabases). Between Release 68 (September 2001) and Release 73 (December 2002), the database grew from about 12.9 million entries comprising 13.8 Gigabases to 20 million entries and over 28 Gigabases. This growth has been a direct consequence of ongoing collaborations with sequencing projects like the Mouse Genome Sequencing Consortium (MGSC), the International Anopheles Genome Project and a growing number of other genome sequencing groups producing large quantities of new sequence data. During the same period the number of organisms represented in the database has risen by ~25% to over 100,000 species.

Figure 1.

EMBL Research Reports 2002

Submissions to EMBL-Bank
Three major sources contribute to the EMBL database: individual scientists, who submit data directly to the database; genome project groups, which produce very large volumes of nucleotide sequence data over an extended period of time, including bulk submissions of ESTs, STSs, GSSs or large genomic records (high-throughput and finished data); and the patent literature, whose sequence data arrive via the European Patent Office (EPO). Researchers submitting new sequences directly to the EMBL database use either the Internet (WEBIN) or a stand-alone software tool (SEQUIN). WEBIN is an Internet-based system developed at the EBI for submission of nucleotide sequences to EMBL-Bank. Detailed information for submitters is available from the EBI web pages (URL: http://www.ebi.ac.uk/Submissions/index.html) or the reference card 'Quick Guide to Sequence Submissions' edited by EMBNET.

Webin
'Webin' is EMBL's preferred submission system for nucleotide sequences and biological annotation information. Webin is designed to allow fast submission of single, multiple or very large numbers of sequences (bulk submissions) and is available at URL: http://www.ebi.ac.uk/embl/Submission/webin.html

Genome Project Submissions
EBI staff work closely with sequencing centres to ensure timely incorporation of new data into EMBL-Bank for public release. Database entries produced at the research site are deposited and updated directly by the genome project submitters using FTP or email. Groups producing large volumes of genome sequence data over an extended period of time are encouraged to submit to an EMBL submission account and are advised to contact the database at: datasubs@ebi.ac.uk.

Alignment Submissions
'Webin-Align' (Lombard et al., 2002) is a web-based tool for the submission of multiple sequence alignments in all common alignment formats. 'Webin-Align' is available at http://www.ebi.ac.uk/embl/Submission/align_top.html. EMBL-Align is a public dataset of multiple sequence alignments and can be queried from the EBI SRS (Sequence Retrieval System) server.

Why is it essential to submit new sequences and annotations?
Just try to imagine where molecular biology and genome research would be today without free access to the comprehensive collection of nucleotide sequences and biological annotations that EMBL-Bank, in collaboration with DDBJ and GenBank, has provided for nearly two decades now. The up-to-date repository of primary nucleotide sequences is an essential requirement for further computational analysis and genome research, and today's molecular biologists depend on free access to all nucleotide sequence data available worldwide. Discovery of novel genes, identification of homologous genes, analysis of alternative splicing and detection of polymorphisms are only some of the uses of the database in the context of biomedical research, and these uses will only increase as large-scale sequencing efforts keep depositing more high-throughput genome sequence data (HTG) and as more finished genomes are added to the database.

The International Nucleotide Sequence Database Collaboration
DDBJ (Japan), GenBank (USA) and EMBL exchange new and updated data on a daily basis to achieve optimal synchronisation. The three databases adhere to a set of documented guidelines (The DDBJ/EMBL/GenBank Feature Table Definition) which regulate the content and syntax of the database entries. These guidelines ensure that the data continue to be made available in a format that can be exchanged efficiently between the databases, is compatible with current bioinformatics software and reflects developments in the fields of molecular and general biology.
This strong and successful collaboration is based on daily interactions between database staff as well as working meetings amongst the databases. Annual meetings of database representatives provide a forum for technical discussions concerning detailed aspects of the work. In May 2002, the International Nucleotide Sequence Databases Collaborative Meeting was held in Mishima (Japan), followed by the International Advisors Meeting. Discussions between database staff and the International Advisors to DDBJ/EMBL/GenBank resulted in the publication of the 'Nucleotide Sequence Database Policies' (Brunak et al., 2002).
The EMBL/DDBJ/GenBank Feature Table is an ongoing collaborative project. The format and definitions of the biological annotation used in common by the International Nucleotide Sequence Database Collaboration in the creation and maintenance of DDBJ, EMBL and GenBank sequence records are constantly revised and updated. Annual versions of the International Feature Table Definition Document are produced and maintained at the EBI. Version 5.0 of the Feature Table documentation was made available from the EBI WWW and FTP servers in December 2002.
The comprehensive sequence-based collaborative taxonomy includes over 100,000 different species and aims to centralise the classification of all organisms appearing in the nucleotide sequence database. Entries are not released into the public domain until the sequenced organism is classified. Organisms are identified at the species level in the taxonomy database.
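The DDBJ/EMBL/GenBank Feature Table Definition discussed above also fixes the syntax of feature locations, for example the CDS location join(889..1029,1783..2034,2466..2573,2688..3329) in the human FOS entry HSFOS. As a minimal sketch in Python, assuming only plain start..end spans inside an optional join(...) (complement(), partial-end markers such as < and >, and references to other entries are deliberately rejected), the hypothetical helpers below split such a location into its base spans and add up their lengths:

```python
import re

# One plain "start..end" base span (1-based, inclusive).
SPAN = re.compile(r"(\d+)\.\.(\d+)")


def parse_join(location: str) -> list[tuple[int, int]]:
    """Split a simple location such as join(1..10,20..30) into (start, end) pairs.

    Hypothetical helper: complement(), partial-end markers (<, >) and
    references to other entries are not supported and raise ValueError.
    """
    loc = location.replace(" ", "")
    if loc.startswith("join(") and loc.endswith(")"):
        loc = loc[len("join("):-1]
    spans = []
    for part in loc.split(","):
        match = SPAN.fullmatch(part)
        if match is None:
            raise ValueError(f"unsupported location element: {part!r}")
        start, end = int(match.group(1)), int(match.group(2))
        if start > end:
            raise ValueError(f"start after end in {part!r}")
        spans.append((start, end))
    return spans


def location_length(location: str) -> int:
    # Both end points belong to the span, hence end - start + 1.
    return sum(end - start + 1 for start, end in parse_join(location))


cds = "join(889..1029,1783..2034,2466..2573,2688..3329)"
print(parse_join(cds))       # [(889, 1029), (1783, 2034), (2466, 2573), (2688, 3329)]
print(location_length(cds))  # 1143
```

Because feature-table spans are 1-based and include both end points, each span contributes end - start + 1 bases; for this CDS the four spans total 1143 bases, a whole number (381) of codons.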




Integration with other databases
Interconnectivity between biomolecular databases has become essential for utilising the wealth of information becoming available. Where appropriate, EMBL-Bank entries are cross-referenced to other databases such as the Eukaryotic Promoter Database (EPD), TRANSFAC, FlyBase, TrEMBL and SWISS-PROT. SWISS-PROT itself is linked to many different databases, thus providing a focal point for database interconnectivity. Cross-references to external databases are represented in the EMBL flat file line type 'DR' and, where appropriate, at the feature level via the feature qualifier /db_xref. These cross-references allow access to additional information that is more appropriately stored in other dedicated databases.
Example: /db_xref="SWISS-PROT:P28763"

In December 2002 EMBL-Bank included a total of 4,784,533 links to other databases.

New Cross-References to GOA (GO Annotation)
GOA (http://www.ebi.ac.uk/GOA/) is a project at the European Bioinformatics Institute to provide assignments of gene products to the Gene Ontology (GO) resource. The goal of the Gene Ontology Consortium is to produce a dynamic controlled vocabulary that can be applied to all organisms. In the GOA project, this vocabulary will be applied to a non-redundant set of proteins described in the SWISS-PROT, TrEMBL and Ensembl databases, which collectively provide complete proteomes for Homo sapiens and other organisms.
EMBL Release 73 includes more than 730,000 new cross-references to GOA (GO Annotation).
   Example:
   ID   HSFOS      standard; DNA; HUM; 6210 BP.
   XX
   AC   K00650; M16287;
   ...
   DR   EPD; EP11145; HS_FOS.
   DR   GDB; 119917; FOS.
   DR   GOA; P01100; P01100.
   DR   SWISS-PROT; P01100; FOS_HUMAN.
   ...
   FT   CDS             join(889..1029,1783..2034,2466..2573,2688..3329)
   FT                   /codon_start=1
   FT                   /db_xref="GOA:P01100"
   FT                   /db_xref="SWISS-PROT:P01100"
   ...

Biological annotation and database curation
In the era of high-throughput genome project data, the importance of careful curation of individual sequences directly submitted by individual researchers and discussed in the scientific literature is obvious. It is these sequences that have often been the subject of experimental research elucidating features and function, whereas genome project submissions in most cases 'only' include preliminary gene annotations based on gene prediction programs. Sequence annotation is an essential part of EMBL sequence records, and current database policy is to reject submissions for which no sequence annotation has been provided, unless they describe expressed sequence tags (ESTs) or unfinished high-throughput genome sequences (HTGs). In particular, it is essential to provide locations of coding regions, even when partial or preliminary, to allow inclusion of the corresponding translated protein sequence in the protein databases (TrEMBL and SWISS-PROT).
Biological curators and database specialists manage the collection and distribution of the EMBL nucleotide sequence database, regardless of whether sequence data and associated biological information are submitted via the WWW submission system 'Webin' or via one of the specialised submission procedures for genome project data. While genome project submissions are processed in large numbers using semi-automated systems, all other types of sequence records are processed manually to ensure biological integrity and internal consistency with the annotation rules established by the International Nucleotide Sequence Database Collaboration DDBJ/EMBL/GenBank. During 2002, EMBL curators processed more than 85,000 sequences (not including genome project submissions) directly submitted by individual scientists via Webin.

Protein translations
Translations of all protein coding regions in EMBL entries (represented by CDS features) are automatically added to the TrEMBL protein database. SWISS-PROT curators draw from this pool to subsequently create the SWISS-PROT database. EMBL nucleotide entries are cross-referenced (via the /db_xref qualifier) to the TrEMBL and SWISS-PROT databases.

Patent sequence data
The EMBL database continues to collaborate with the European Patent Office to capture patent sequences from patent applications. American and Japanese patent sequence data are integrated from NCBI (USA) and DDBJ (Japan). Release 73 (Dec 2002) includes more than 899,500 patent sequence entries (compared to 429,000 in Dec 2001).

Major new developments in 2002:

New CON(struct) Division
The new CON database division represents Con(structed) or Con(tig) sequences of chromosomes, genomes and other long sequences constructed from segment entries. Nucleotide sequence records in EMBL (as well as in DDBJ and GenBank) currently have a size restriction of 350,000 bases. Sequences >350 kb are split into smaller segment entries prior to inclusion in the database. Segment entries include sequence data, are assigned individual accession numbers and are distributed in the appropriate taxonomic divisions. In contrast, CON division entries do not contain sequence data per se, but rather the assembly information, including all accession.versions and sequence locations relevant in building the contig sequence. CON entries are included in the daily data exchange between DDBJ/EMBL/GenBank.
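The relationship between segment entries and a CON entry can be illustrated with a small Python sketch. It assumes a toy sequence and invented accession.version identifiers (SEG00001.1 and so on are hypothetical, not real EMBL accessions), cuts the sequence into non-overlapping pieces of at most 350,000 bases, keeps only accession.version and base spans as the assembly information, and rebuilds the full sequence from them. Real segment entries come from sequencing projects and may overlap, and the join(...) string printed here only approximates how a CON entry records its construct; neither detail is taken from the EMBL implementation.

```python
from dataclasses import dataclass

MAX_ENTRY_LENGTH = 350_000  # size restriction on a single sequence record (bases)


@dataclass(frozen=True)
class Piece:
    """One element of the assembly: a base span of a segment entry (1-based, inclusive)."""
    accession_version: str
    start: int
    end: int


def split_into_segments(seq: str, prefix: str = "SEG", max_len: int = MAX_ENTRY_LENGTH):
    """Cut seq into segment entries of at most max_len bases.

    Returns (segments, assembly): segments maps hypothetical accession.version
    identifiers to their sequence; assembly lists the spans that rebuild seq.
    """
    segments: dict[str, str] = {}
    assembly: list[Piece] = []
    for number, offset in enumerate(range(0, len(seq), max_len), start=1):
        chunk = seq[offset:offset + max_len]
        acc_ver = f"{prefix}{number:05d}.1"  # hypothetical identifier, version 1
        segments[acc_ver] = chunk
        assembly.append(Piece(acc_ver, 1, len(chunk)))
    return segments, assembly


def build_contig(assembly: list[Piece], segments: dict[str, str]) -> str:
    # The construct holds no sequence of its own: the contig is rebuilt by
    # concatenating the referenced spans of the segment entries in order.
    return "".join(segments[p.accession_version][p.start - 1:p.end] for p in assembly)


def as_join(assembly: list[Piece]) -> str:
    # join(...)-style summary of the assembly information (illustrative format only).
    return "join(" + ",".join(f"{p.accession_version}:{p.start}..{p.end}" for p in assembly) + ")"


genome = "ACGT" * 200_000  # toy 800,000-base sequence
segments, assembly = split_into_segments(genome)
print([len(s) for s in segments.values()])  # [350000, 350000, 100000]
print(as_join(assembly))
assert build_contig(assembly, segments) == genome
```

Keeping the construct as a list of references rather than a copy of the sequence mirrors the design described above: the segments stay the primary records, and the long sequence can always be regenerated from them.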





Figure 2. Relevant sections of the Bacillus subtilis CON entry providing construct information for the assembly of the Bacillus subtilis bacterial genome (4.2 Mbases) from segment entries (<350 kb).


CON entries are available in the EMBL.con file at ftp://ftp.ebi.ac.uk/pub/databases/embl/release/, from the EBI Genome Web server at URL http://www.ebi.ac.uk/genomes/ and also in the FTP Genomes directory at ftp://ftp.ebi.ac.uk/pub/databases/embl/genomes/, where CON entries are complemented by EMBL flat files including the complete DNA sequence and biological annotation. For an example see ftp://ftp.ebi.ac.uk/pub/databases/embl/genomes/Bacteria/bsubtilis/AL009126.embl.
Furthermore, the complete CON sequence is made available in Fasta format. Underlying segment entries are searchable and retrievable via SRS and available for BLAST and FASTA homology searching.

Whole Genome Shotgun Sequences (WGS)
Whole genome shotgun methods are now used to obtain a large amount of genome coverage for an organism. A complete list of WGS data currently available is maintained at http://www.ebi.ac.uk/genomes/wgs.html.
Example of WGS accession-number format: AAAA01000001.1
WGS accession-numbers consist of
              <4 letters><2 digits><6 digits>.<version>
with          <4 letters> = set_id
              <2 digits> = set_version
              <6 digits> = contig_id
              <version> = sequence_version
Oryza sativa L. ssp. indica Whole Genome Shotgun sequences have been assigned the project accession-number AAAA00000000. The first version of the project has the accession number AAAA01000000, and the first sequence in that set has acc#.version AAAA01000001.1. When a given WGS project is updated, ALL contigs from the first version are replaced by ALL contigs from the second version.
WGS data are available via EBI's Sequence Retrieval System (SRS) and from EBI's FTP server at ftp://ftp.ebi.ac.uk/pub/databases/embl/wgs.

Third Party Annotation Dataset (TPA)
Following a decision taken at the 2002 Collaborative Meeting, DDBJ/EMBL/GenBank are in the process of creating a Third Party Annotation dataset (TPA). Until now, the collaborative databases have built a comprehensive database of primary nucleotide sequences, resulting from direct sequencing of cDNAs, ESTs, genomic DNAs etc. Primary data are defined as data for which the submitting group has done the sequencing and biological annotation and, as 'owner' of these data, has privileges to submit updates/corrections etc. In contrast, non-primary sequences are defined as sequences which a) consist exclusively of DNA from one or several already existing entries 'owned' by other groups or b) consist of a mixture of new primary & already existing sequences.
Categories of data submissions accepted for TPA include:
a. re-annotation/analysis of sequence(s) from DDBJ/EMBL/GenBank
b. mixed primary/non-primary TPA sequence including regions of new and existing sequence (e.g. filling the gaps with HTG or EST or newly sequenced data)
c. TPA sequences based on trace archive data
d. TPA sequences based on Whole Genome Shotgun (WGS) sequences
Information on contributing sequences will appear in the EMBL TPA record in the new flat-file line types AH/AS, as in this example:




AH   TPA-SPAN    PRIMARY_IDENTIFIER    PRIMARY_SPAN    COMP
AS   1-426       AC004528.1            18665-19090
AS   427-526     AC001234.2            1-100           c

The AH (Assembly Header) line provides column headings for the assembly information. The AS (ASsembly) lines describe the composition of the TPA sequence by listing base span(s) of the TPA sequence together with the identifiers and base spans of the contributing sequences.

TPA submission system
EBI's submission system WEBIN has been customised to allow submissions of TPA sequences to the EMBL Nucleotide Sequence Database. WEBIN is available at URL http://www.ebi.ac.uk/embl/Submission/webin.html

TPA release policy
To ensure that the sequence annotation is of high quality, we will require that the study has been published in a peer-reviewed journal before we release the data to the public.

TPA data exchange
TPA sequences are exchanged amongst the DDBJ/EMBL/GenBank collaborating databases on a daily basis, as a complement to the existing primary DDBJ/EMBL/GenBank database.

TPA data access
The TPA data collection is available via the EBI FTP server at ftp://ftp.ebi.ac.uk/pub/databases/embl/tpa and also via EBI's Sequence Retrieval System (SRS) at http://srs.ebi.ac.uk.

EMBL Sequence Version Archive (SVA)
To provide access to previous versions of database records, the EMBL database has established a Sequence Version Archive (SVA). Data in the EMBL nucleotide sequence database change over time for a number of reasons, e.g. updates/corrections or extensions based on new findings from more recent experiments. Each time data in an entry are modified, the entry is assigned a new entry version number.

Querying the EMBL Sequence Version Archive
The EMBL Sequence Version Archive is available from the EBI web servers at URL http://www.ebi.ac.uk/embl/sva/.
Entries can be retrieved by accession number, nucleotide sequence identifier or protein identifier. Query result options allow users to
a. show the complete history for an entry, i.e. all recorded flat files matching the query criterion in chronological order, or
b. show a snapshot of the entry at a particular date. Query results will be presented in a table; listed EMBL entries can be 'Viewed on Screen' or 'Saved to File'.
c. show differences between entry versions

Database releases
During 2002 the following EMBL-Bank releases were produced:
• Release 70, March 2002
• Release 71, June 2002
• Release 72, September 2002
• Release 73, December 2002
The corresponding Release Notes are available from the EBI URL: http://www.ebi.ac.uk/embl/Documentation/release_info.html

EBI network services
Network services allow access to the most up-to-date data collection via the Internet. Sequence data at the EBI can also be accessed by email, using the netserver, or interactively via the WWW, where the main service is an SRS server. Additionally, databases as well as software can be downloaded from the EBI's FTP server.

Sequence Retrieval System (SRS)
The SRS server at the EBI integrates and links a comprehensive collection of specialised databanks along with the main nucleotide and protein databases. The SRS system allows the databases to be searched using, for example, sequence, annotations, keywords or author names. Complex, cross-database queries can also be executed; users should refer to the detailed instructions which are available online.

Sequence Searching
The EBI provides a comprehensive set of sequence database searching algorithms that can be accessed both interactively from the EMBL-EBI WWW site (URL: http://www2.ebi.ac.uk/) and by email. EMBL may be searched as a whole or by individual taxonomic division. The most commonly used algorithms available are Fasta3 and NCBI-Blast2. Fasta3 will find a single high-scoring gapped alignment between the query nucleotide sequence and database sequences. Comparisons between a nucleotide sequence and the protein databases can be made using fastx/y3, whilst tfastx/y3 allows comparisons between a protein sequence and the translated DNA databank. Ssearch3, the generic implementation of the Smith & Waterman algorithm for nucleotide and protein database searches, is provided as part of the fasta3 package. BLITZ (Bic_SW) facilitates more sensitive searches using the Smith and Waterman algorithm. WU-Blast2 and NCBI-Blast2 are fast sequence searching algorithms that allow gaps, and they may report more than one match to a database sequence if multiple domains exist.

Sequence analysis
Specialised sequence analysis programs are available from the EBI. Such services include multiple sequence alignment and inference of phylogenies using CLUSTALW, gene prediction using GeneMark, pattern searching and discovery using PRATT, as well as applications which have been developed in-house for various projects.
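Ssearch3 and BLITZ, mentioned above under Sequence Searching, are both based on the Smith & Waterman local alignment algorithm. The following is a minimal textbook version in Python, not the EBI code: it uses a linear gap penalty and arbitrary example scores (match +2, mismatch -1, gap -2), whereas production tools use substitution matrices, affine gap penalties and heavily optimised implementations.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Textbook Smith & Waterman with a linear gap penalty; the scoring
    parameters are illustrative defaults, not those of any EBI service.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero, so an alignment
            # can start afresh anywhere.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])  # gap in b
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")  # gap in a
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTTT", "GGACGGG"))  # (6, 'ACG', 'ACG')
```

The full score matrix makes the method quadratic in time and memory, which is why heuristic tools such as Fasta3 and BLAST are preferred for routine database searches, with Smith & Waterman reserved for more sensitive searches.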




TITLE                                                                 URL
General
EMBL-EBI Home Page                                                    www.ebi.ac.uk/
EMBL Nucleotide Sequence Database                                     www.ebi.ac.uk/embl/
Documentation
EMBL Database Documentation Page                                      www.ebi.ac.uk/embl/Documentation/
Database User manual                                                  www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html
Database Release Notes                                                www.ebi.ac.uk/embl/Documentation/Release_notes/current/relnotes.html
The DDBJ/EMBL/GenBank Feature Table Document                          www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
Taxonomy Database                                                     www3.ncbi.nlm.nih.gov/Taxonomy/tax.html
Example Database Entry                                                www.ebi.ac.uk/cgi-bin/emblfetch?x64011
WEBALIGN: Sequence Alignment Submissions                              www.ebi.ac.uk/embl/Submission/alignment.html
WebFeat: feature table keys/qualifiers definitions                    www3.ebi.ac.uk/Services/WebFeat/
Annotation Examples: EMBL entry examples                              www3.ebi.ac.uk/Services/Standards/web/
DE line Standards: guidelines for entry definitions                   www.ebi.ac.uk/embl/Documentation/de_line_standards.html
Submissions
Submission of Nucleotide Sequence Data                                www.ebi.ac.uk/embl/Submission/
Information for Submitters Document                                   www.ebi.ac.uk/embl/Documentation/information_for_submitters.html
Vector Scanning prior to submission                                   www.ebi.ac.uk/blastall/vectors.html
WEBIN: web-based sequence submission system                           www.ebi.ac.uk/embl/Submission/
SEQUIN: stand-alone sequence submission tool                          www.ebi.ac.uk/Services/Sequin/
Genome Project Submission Account guidelines                          www3.ebi.ac.uk/Services/GenomeSubm/
WEBUP: sequence update form                                           www.ebi.ac.uk/Services/webin/update/update.html
Access
Access to servers, query tools and data archives                      www.ebi.ac.uk/embl/Access/
Sequence Retrieval Service (SRS)                                      http://srs.ebi.ac.uk/
FTP server                                                            ftp://ftp.ebi.ac.uk/pub/databases/embl/
Current Database Release                                              ftp://ftp.ebi.ac.uk/pub/databases/embl/release/
Sequence Tagged Sites (STS) resources                                 www.ebi.ac.uk/embl/Access/sts.html
Expressed Sequence Tag (EST) resources                                www.ebi.ac.uk/embl/Access/est.html
Third Party Annotation Database (TPA)                                 ftp://ftp.ebi.ac.uk/pub/databases/embl/tpa/
Sequences from the patent literature                                  ftp://ftp.ebi.ac.uk/pub/databases/embl/patent/
Genome Data
Completed Genomes Web Server                                          www.ebi.ac.uk/genomes/
Genome FTP server                                                     ftp://ftp.ebi.ac.uk/pub/databases/embl/genomes/
Whole Genome Shotgun Dataset                                          ftp://ftp.ebi.ac.uk/pub/databases/embl/wgs/
EnsEMBL: automated analysis of genome data                            http://ensembl.ebi.ac.uk/
Genome MOT: status of genome projects                                 www.ebi.ac.uk/Databases/Genome_MOT/genome_mot.html
Database Searching, Browsing and Analysis Tools
Access to searching, browsing and analysis tools                      www.ebi.ac.uk/Tools/


Table 1. EMBL-Bank internet-based resources, including detailed information on submissions, data access and genome data, as well as database searching and analysis tools.







Publications during the year

Brunak, S., Danchin, A., Hattori, M., Nakamura, H., Shinozaki, K., Matise, T. & Preuss, D. (2002). Nucleotide sequence database policies. Science, 298, 1333.

Lombard, V., Camon, E.B., Parkinson, H.E., Hingamp, P., Stösser, G. & Redaschi, N. (2002). EMBL-Align: a new public nucleotide and amino acid multiple sequence alignment database. Bioinformatics, 18, 763-764.

Stösser, G., Baker, W., van den Broek, A., Camon, E., Garcia-Pastor, M., Kanz, C., Kulikova, T., Leinonen, R., Lin, Q., Lombard, V., Lopez, R., Redaschi, N., Stoehr, P., Tuli, M.A., Tzouvara, K. & Vaughan, R. (2002). The EMBL Nucleotide Sequence Database. Nucleic Acids Res., 30, 21-26.

Other references

Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. (2001). The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res., 29, 37-40.

Bairoch, A. & Apweiler, R. (2000). The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45-48.

Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., Rapp, B.A. & Wheeler, D.L. (2000). GenBank. Nucleic Acids Res., 28, 15-18.

Brazma, A., Sarkans, U., Robinson, A., Vilo, J., Vingron, M., Hoheisel, J. & Fellenberg, K. (2002). Microarray data representation, annotation and storage. Adv. Biochem. Eng. Biotechnol., 77, 113-139.

Hubbard, T., Barker, D., Birney, E., Cameron, G., Chen, Y., Clark, L., Cox, T., Cuff, J., Curwen, V., Down, T., et al. (2002). The Ensembl genome database project. Nucleic Acids Res., 30, 38-41.

Tateno, Y., Miyazaki, S., Ota, M., Sugawara, H. & Gojobori, T. (2000). DNA data bank of Japan (DDBJ) in collaboration with mass sequencing teams. Nucleic Acids Res., 28, 24-26.

Zdobnov, E.M., Lopez, R., Apweiler, R. & Etzold, T. (2002). The EBI SRS server-new features. Bioinformatics, 18, 1149-1150.



