					ABBIF Proposal: Architecture
(Draft March 14, 2006)




Index

Introduction
     DarwinCore
     ABCD – Access to Biological Collection Data
  Protocols for Data Exchange
     DiGIR
     BioCASe
     TAPIR
  GBIF Architecture
  Network Infrastructure
     Latin America
     Brazil
Analysis
  Peru
     Questionnaires
     Information System
  Venezuela
  Bolivia
  Colombia
     Questionnaire
     Information System
  French Guyana
  Ecuador
  Brazil
     Collections
     Information Systems
     Strategic Plan
Strategy: Proposed Network
  Elements of the Architecture
     ABBIF coordination
     Data Providers
     Portal
     Resource Registry & Discovery
     Tools
     Data archive
Proposal
  Participants (to be confirmed)
  Workshop Program
Annex 1: Answers from Collections of Colombia

Introduction
There are a number of possibilities to design an information system when its data is actually
produced and shared by different parties. Basically, a system can be centralized, distributed, or
combined (mixed), with a number of variations.
A centralized system (figure 1) is recommended when data providers do not have the necessary
infrastructure (hardware, software, connectivity) or expertise or even when data will only be
produced for that particular system.




Figure 1. Diagram of a Centralized Information System (data providers feed a central system, which
is queried by the user)
By adopting this architecture, data providers do not need to store any data locally; they usually
manage everything remotely through an administrative interface. They also have to agree on a
common format and content to be implemented in the central database. The great advantages are
the low informatics demand placed on data providers and the tightly controlled system that
developers have to work with. The challenge is to keep data providers actively validating and
updating their data.
A distributed architecture is a system where the data is distributed but the query is centralized
(figure 2) or where both data and query are distributed (figure 3).



Figure 2. Distributed data: centralized query (collections Col 1 to Col 5 feed a central repository,
which is queried through a program and interface)
Figure 3. Distributed data and query (the program and interface send each query directly to
collections Col 1 to Col 5)
Advantages of a distributed architecture include “real time” updating, clarity as to who the data
provider is, and the possibility of closer interaction between data providers and users.
Disadvantages include the greater demand on the infrastructure and expertise of each data provider
and the complexity of developing and maintaining a distributed system.
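The difference between the two distributed variants can be illustrated with a minimal sketch. It is one possible reading of figures 2 and 3, not a description of any existing system: the `Collection` class, the record fields, and the sample records are all invented for illustration. The example shows why a distributed query sees updates in “real time”, while a central repository only reflects the data as of its last refresh.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """A hypothetical data provider that keeps and searches its own records."""
    name: str
    records: list = field(default_factory=list)

    def search(self, genus):
        # Each provider searches only its own data.
        return [r for r in self.records if r["genus"] == genus]


def build_central_repository(collections):
    """Figure 2 reading: copy every provider's records into one central store."""
    repository = []
    for col in collections:
        for record in col.records:
            repository.append({**record, "provider": col.name})
    return repository


def query_central(repository, genus):
    # The query runs against the central copy only.
    return [r for r in repository if r["genus"] == genus]


def query_distributed(collections, genus):
    """Figure 3 reading: send the query to every provider and merge the answers."""
    results = []
    for col in collections:
        for record in col.search(genus):
            results.append({**record, "provider": col.name})
    return results


# Invented sample data.
collections = [
    Collection("Col 1", [{"catalog": "A-1", "genus": "Inga"}]),
    Collection("Col 2", [{"catalog": "B-7", "genus": "Cedrela"},
                         {"catalog": "B-9", "genus": "Inga"}]),
]

repository = build_central_repository(collections)

# A record added after the repository was built is visible only to the
# distributed query until the repository is refreshed.
collections[0].records.append({"catalog": "A-2", "genus": "Inga"})

central_hits = query_central(repository, "Inga")
distributed_hits = query_distributed(collections, "Inga")
print(len(central_hits), len(distributed_hits))  # prints: 2 3
```

The price of the fresher answer is that every provider must be online and able to answer queries, which is the infrastructure demand mentioned above.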
The proposal is that ABBIF focuses on species and specimen data. Data will include specimen
records in biological collections, observation data of field surveys, and taxonomic names. A
strategy for each data component must be established.
The choice of the best architecture depends on the existing infrastructure and expertise of each
data provider and custodian. In addition, biological collections hold their data using different
software on different operating systems, in different formats, and recording different data elements
(figure 4).


Figure 4. Diagram showing the complexity of integrating data from biological collections
(collections Col 1 to Col 5 run different platforms, among them Windows 2000 with BRAHMS,
Windows 98 with Access, Linux with MySQL, Windows 98 with Biota, and FreeBSD with PostgreSQL; a
shared data model and communication protocol connect them to the program, interface, and query)
To integrate these systems, data providers must agree to use a common data exchange model.
To determine the best architecture for the ABBIF network, it is important to study:
   • What standards and protocols are available;
   • What standards and protocols the existing networks of direct interest to ABBIF are adopting;
     and
   • What the situation of local data providers and custodians is concerning infrastructure and
     expertise.


Standards
The adoption of standards and protocols for the exchange of data and information about
biodiversity is fundamental for the development of interoperable systems. In general, one can
define a standard as “something established by authority, custom, or general consent as a model
or example” [1]. A communication protocol can be defined as a formal description of the rules and
message formats that two systems must adopt to communicate and interact. Perhaps the most
important and best-known protocols are TCP/IP (Transmission Control Protocol / Internet Protocol),
SMTP (Simple Mail Transfer Protocol), POP (Post Office Protocol), and IMAP (Internet Message
Access Protocol). TCP/IP is the basis for data transmission through the Internet, and SMTP, POP,
and IMAP carry electronic mail on top of it. Standard languages such as HTML (HyperText Markup
Language) and XML (eXtensible Markup Language) are also important, as they define rules for
formatting the vast majority of documents on the Internet.

[1] Merriam-Webster Online Dictionary (www.webster.com)

An important group that is discussing and developing standards and protocols for data on species
and specimens is TDWG (International Working Group on Taxonomic Databases) [2].
TDWG’s mission is:
   • To provide an international forum for biological data projects;
   • To develop and promote the use of standards; and
   • To facilitate data exchange.
A number of working groups have been established within TDWG to develop and promote the use
of standards and protocols. Of immediate interest to ABBIF are: DarwinCore; ABCD –
Access to Biological Collection Data; DiGIR; BioCASe; and TAPIR.

DarwinCore [3]
DarwinCore (DwC) is a standard whose development began within the Species Analyst network, based
at the University of Kansas Natural History Museum and Biodiversity Research Center. The idea was
to define data fields common to all taxonomic groups and in this way standardize the integration
of primary data from biological collections. The standard uses XML (defined by an XML Schema) and
is used by most networks, such as GBIF [4], MaNIS (Mammal Networked Information System) [5],
OBIS (Ocean Biogeographic Information System) [6], and speciesLink [7] in Brazil, among others.
It is based on a non-hierarchical set of data elements, which include: InstitutionCode,
CollectionCode, CatalogNumber, ScientificName, BasisOfRecord, Kingdom, Phylum, Class, Order,
Family, Genus, Species, Subspecies, ScientificNameAuthor, IdentifiedBy, YearIdentified,
MonthIdentified, DayIdentified, TypeStatus, CollectorNumber, FieldNumber, Collector,
YearCollected, MonthCollected, DayCollected, JulianDay, TimeOfDay, ContinentOcean, Country,
StateProvince, County, Locality, Longitude, Latitude, CoordinatePrecision, BoundingBox,
MinimumElevation, MaximumElevation, MinimumDepth, MaximumDepth, Sex, PreparationType,
IndividualCount, PreviousCatalogNumber, RelatedCatalogNumber, RelatedCatalogItem,
RelationshipType, Notes, DateLastModified. The standard accepts extensions, which have been
proposed for geospatial, curatorial, paleontology, microbial, and observation data [8].
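To make the idea of a flat, shared set of elements concrete, the following sketch maps a provider's local column names onto DarwinCore element names and serializes the result as XML. It is only an illustration under stated assumptions: the namespace URI, the local column names, and the sample record are invented, and the output is not validated against the actual DarwinCore XML Schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for this sketch only; it is NOT the official
# DarwinCore schema location.
DWC_NS = "http://example.org/darwincore"
ET.register_namespace("dwc", DWC_NS)

# Invented local column names mapped onto DarwinCore element names.
FIELD_MAP = {
    "instituicao": "InstitutionCode",
    "colecao": "CollectionCode",
    "numero": "CatalogNumber",
    "genero": "Genus",
    "pais": "Country",
}


def to_darwincore_xml(local_record, field_map):
    """Build a flat (non-hierarchical) record from a provider's local row."""
    record = ET.Element(f"{{{DWC_NS}}}record")
    for local_name, dwc_name in field_map.items():
        value = local_record.get(local_name)
        if value is None:
            continue  # elements without a local value are simply omitted
        child = ET.SubElement(record, f"{{{DWC_NS}}}{dwc_name}")
        child.text = str(value)
    return record


# Invented sample row; the last column has no mapping and is not exported.
local_row = {
    "instituicao": "INST",
    "colecao": "HERB",
    "numero": 42,
    "genero": "Inga",
    "pais": "Brazil",
    "observacao_interna": "not shared",
}

record = to_darwincore_xml(local_row, FIELD_MAP)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Each provider keeps its own database structure and only needs to maintain a mapping such as `FIELD_MAP`; this is what allows collections with different software and formats to join the same network.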

ABCD – Access to Biological Collection Data [9]
ABCD is a highly structured standard for data about objects in biological collections. Its objective
is the same as that of DarwinCore, but it is far more detailed, with around 500 elements against
about 50 in DarwinCore. There are specific elements for observational data sets and for the
following types of collections:
   • Herbaria and Botanical Gardens
   • Zoological Collections
   • Culture Collections
   • Mycological Collections
   • Plant Genetic Resources
   • Paleontological Collections
This data model is used by the Biological Collection Access Service for Europe, BioCASE [10].
Like DarwinCore, it uses XML (defined through an XML Schema). ABCD version 2.06 [11] was
recommended by the TDWG meeting in St. Petersburg as the adopted version of the standard and
has since been ratified by TDWG members.

[2] http://www.tdwg.org/
[3] http://darwincore.calacademy.org
[4] http://www.gbif.net
[5] http://elib.cs.berkeley.edu/manis/
[6] http://www.iobis.org/
[7] http://splink.cria.org.br
[8] http://darwincore.calacademy.org/Extensions/
[9] http://www.codata.org/taskgroups/TGbiocollection/


Protocols for Data Exchange
Besides using a standard data model (such as DarwinCore or ABCD), networks that serve data from
biological collections also require a protocol for transferring data.

DiGIR [12]
One of the first networks of biological collections to be developed as a distributed system was The
Species Analyst (TSA), at the end of the 1990s. TSA used the ANSI/NISO Z39.50 protocol, which
was first adopted in 1988 and was used to interconnect libraries. It defines a standard for
communication between computers to retrieve information. An important characteristic is that it
supports a client-server environment, which allows the user interface to be separated from the
data server. Z39.50 has also been implemented on a range of platforms. Although Z39.50 was an
effective solution, some issues with the protocol convinced the Species Analyst network developers
to look for another solution. The protocol was found to have a complicated specification, which
meant a very steep learning curve for developers; conceptual schemas were not defined with a
formal language such as XML Schema; and, at the time, there was limited support for XML and
Unicode.
To address these issues, developers of the Species Analyst network and a number of people
involved with TDWG [13] held a small workshop in Santa Barbara to start discussing a replacement
for Z39.50 for the biodiversity informatics community. The goal was to develop a protocol based
entirely on XML documents for messaging between clients and data providers, with a data transport
mechanism based predominantly on HTTP. DiGIR (Distributed Generic Information Retrieval) was
designed to offer the same capabilities as Z39.50 while using simpler technologies and a more
formal specification for the description of information resources. The result is a distributed
information retrieval solution that provides an easy entry for participation in distributed
information networks.
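The general pattern of XML messaging between a client and a data provider can be sketched as follows. This is a simplified illustration, not the DiGIR message format: the element names (`request`, `search`, `equals`, `response`, `record`) and the sample records are invented, and the HTTP transport is left out so that the example runs without a network.

```python
import xml.etree.ElementTree as ET

# Invented records held by a hypothetical provider.
RECORDS = [
    {"CatalogNumber": "101", "Genus": "Inga", "Country": "Peru"},
    {"CatalogNumber": "102", "Genus": "Cedrela", "Country": "Brazil"},
    {"CatalogNumber": "103", "Genus": "Inga", "Country": "Brazil"},
]


def build_request(element, value):
    """Client side: express the query as an XML document."""
    request = ET.Element("request")
    search = ET.SubElement(request, "search")
    equals = ET.SubElement(search, "equals", {"element": element})
    equals.text = value
    return ET.tostring(request, encoding="unicode")


def handle_request(request_xml, records):
    """Provider side: parse the XML query and answer with an XML document."""
    request = ET.fromstring(request_xml)
    equals = request.find("./search/equals")
    element, value = equals.get("element"), equals.text
    response = ET.Element("response")
    for rec in records:
        if rec.get(element) == value:
            node = ET.SubElement(response, "record")
            for name, text in rec.items():
                ET.SubElement(node, name).text = text
    return ET.tostring(response, encoding="unicode")


def parse_response(response_xml):
    """Client side: turn the XML answer back into dictionaries."""
    response = ET.fromstring(response_xml)
    return [{child.tag: child.text for child in node}
            for node in response.findall("record")]


# In a real network the request would travel over HTTP; here the provider
# function is called directly.
request_xml = build_request("Genus", "Inga")
hits = parse_response(handle_request(request_xml, RECORDS))
print([h["CatalogNumber"] for h in hits])  # prints: ['101', '103']
```

Because both sides exchange only XML documents, the client does not need to know which database or operating system the provider uses.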
DiGIR became operational in 2003 and was adopted by a number of networks such as The
Mammal Networked Information System (MaNIS), the Ocean Biogeographic Information System
(OBIS), the Global Biodiversity Information Facility (GBIF), and the speciesLink Network in Brazil.

BioCASe [14]
The Biological Collection Access Service for Europe (BioCASE), a network of biological collections,
adopted ABCD as its concept schema and modified the DiGIR protocol to meet its needs. This
modified protocol is known as the BioCASE data transmission protocol or simply BioCASE. It is
based on the DiGIR protocol but had to incorporate some BioCASE-specific changes that
unfortunately make the two incompatible.

[10] http://www.biocase.org/
[11] http://www.bgbm.org/TDWG/CODATA/Schema/
[12] http://www.digir.net/
[13] www.tdwg.org/
[14] http://www.biocase.org/dev/protocol/index.shtml

TAPIR [15]
In 2004 GBIF promoted a study to develop a new, merged protocol that would meet the needs of
both the DiGIR and BioCASE networks (Döring & Giovanni, 2004). This protocol was named TAPIR
(TDWG Access Protocol for Information Retrieval) and is to be tested in 2006. Both BioCASE and
the networks that have adopted DiGIR are expected to migrate to the new protocol. It is being
tested by implementing it in two data provider software packages, one for each of the existing
network communities: BioCASe (the BioCASe PyWrapper software) and DiGIR (a new Java provider
package currently named DiGIR2). A detailed TAPIR specification document is also being developed.


GBIF Architecture
We have discussed possible architectures (centralized, distributed, and combined or mixed) and
the standards and protocols that are being adopted internationally. Another important part of this
analysis is to observe what GBIF, which openly serves species and specimen data on the Internet,
is using. GBIF plays a fundamental role, as it is the global initiative that is integrating
species and specimen data worldwide. Whatever architecture and strategy ABBIF adopts must be
compatible with this initiative.
In 2003 GBIF established its “architecture fundamentals”, which are important and relevant when
designing an information facility (see GBIF Biodiversity Data Architecture, 2003 [16]). The basic
principle was not to impose any specific software or technology but to keep access to
biodiversity data as the key goal.
The document presents as basic principles:
   • Free access to data: any restrictions must be applied at the data provider level; the system
     itself would not control user access to data;
   • Support for global users: the idea is to enable the implementation of different human
     languages in presentation services;
   • Consider human and machine users: the system would be implemented to be accessed by
     web browsers and web services;
   • Consider structured and unstructured data: the document acknowledges the importance of
     defining both structure and content of data (fundamental for interoperability and machine
     analysis) but adds that it is also important to make unstructured data available;
   • Reusable, replaceable, and redundant components: the idea is to develop a framework
     where new data providers can be rapidly added; to promote the maintenance of persistent
     data sources, as opposed to databases whose lifetimes are tied to a project; to plan for
     redundancy, replicating working components to different locations across the globe; and
     to adopt an open technology framework, where operating systems, database management
     systems, web servers, programming languages, and other tools are a choice to be made by
     each participant according to existing needs and skills.
GBIF has developed a network based on nodes (figure 5).




[15] http://ww3.bgbm.org/tapir
[16] http://circa.gbif.net/irc/DownLoad/kjeFA-J1mmGHrfOtAyTZ74s8jUwq9HoJ/p6hpeSGHkYZQWMiF42pMFYPs7fCtNHv-/GBIFBiodiversityDataArchitecture-v0.7-draft.pdf

Figure 5. GBIF Network: major classes of nodes
GBIF is responsible for running the network, establishing standards, and developing tools. The
portal is the hub for the development of any service that must be centralized such as the registry of
metadata and for serving data from the biodiversity data index to the end user. GBIF participants’
nodes are established to share biodiversity data. They may be gateways to data nodes or data
nodes themselves. They may also provide services such as mapping, analysis, and hosting of
orphaned data sets. Data nodes are primary providers of data.
When GBIF was first designed, key elements of the Portal were the Biodiversity Data Index and
the Taxonomic Name service (figure 6).




                                Figure 6. Diagram of the GBIF portal




The Biodiversity Data Index holds a subset of the data held by the data nodes and includes
specimen identifiers associated with identification, geospatial and temporal information.
Centralization of these subsets of data supports a much more rapid response to user queries,
minimizing network traffic. Although taxonomic names provide the primary organizational structure
for biodiversity data, no complete catalogue of names is available today. This is an ever evolving
task which requires international collaboration. GBIF is also involved in a number of initiatives to
create web services such as mapping, georeferencing, and data cleaning. The portal is now much
more complex; figure 7 presents a diagram of how the future portal is expected to operate.

Figure 7. GBIF’s data portal deployment model
The central column represents functions which should be executed centrally (marked as GBIF
Secretariat). The components involved in delivery of services to end users and portals are shown
as replicated to a number of mirror sites. The Master Data Store needs to be implemented in a
single location (and should at least be associated with a "Master" instance of the Despatcher
component), but the Crawler and Validation Chain components could also be mirrored for
efficiency.
The existing GBIF UDDI registry would need significant enhancement before it could properly
support the process illustrated here.
The Schema Repository should be developed in close conjunction with the TDWG Technical
Architecture Group and can initially be represented by a small stub implementation that offers
equivalent function to the rest of the Data Portal.
The Crawler corresponds largely to the Indexer component of the existing prototype Data Portal. It
includes a scheduler which identifies data resources which should be indexed or checked for
updates and develops an appropriate strategy in each case for accessing modified data. It should
maintain a "map" monitoring the progress made in indexing any resource so that the process can
be interrupted and restarted, and also so that data providers can be notified of any records from
their resource which could not be accessed for any reason. The data offered by the Service
Registry will provide the basis for the Crawler's activity (including endpoint URLs, protocols and
data standards supported, acceptable times and days for crawling each provider's data, any
agreements made with providers as to how much data the Data Portal should cache in the Master
Data Store, etc.). The Crawler should process the data retrieved by placing an object into the
Validation Chain for each record found (new and modified records; also objects indicating the
completion of an indexer operation for a given provider to allow for clean-up of obsolete records,
etc.).
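The resumable behaviour described for the Crawler can be sketched as follows. This is a minimal illustration under stated assumptions, not GBIF's implementation: the `Resource` and `Crawler` classes, the paging scheme, and the use of `None` to mark a record that could not be accessed are all invented for the example, and provider access is simulated with an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Stand-in for a registered data resource (all values hypothetical)."""
    endpoint: str
    records: list  # None marks a record that could not be accessed


@dataclass
class Crawler:
    validation_chain: list = field(default_factory=list)  # objects awaiting validation
    progress: dict = field(default_factory=dict)          # endpoint -> next offset to read
    failures: dict = field(default_factory=dict)          # endpoint -> unreadable positions

    def crawl(self, resource, page_size=2, max_pages=None):
        """Index a resource page by page; return True once it is complete."""
        offset = self.progress.get(resource.endpoint, 0)  # resume where we stopped
        pages = 0
        while offset < len(resource.records):
            if max_pages is not None and pages >= max_pages:
                return False  # interrupted; the progress map keeps our place
            page = resource.records[offset:offset + page_size]
            for position, record in enumerate(page, start=offset):
                if record is None:
                    self.failures.setdefault(resource.endpoint, []).append(position)
                else:
                    self.validation_chain.append(
                        {"kind": "record", "source": resource.endpoint, "data": record})
            offset += len(page)
            self.progress[resource.endpoint] = offset
            pages += 1
        # Completion marker so that obsolete records can be cleaned up later.
        self.validation_chain.append({"kind": "completed", "source": resource.endpoint})
        return True


resource = Resource("http://example.org/col1", ["r0", "r1", None, "r3", "r4"])
crawler = Crawler()
first_pass = crawler.crawl(resource, max_pages=1)  # stops after one page
second_pass = crawler.crawl(resource)              # resumes at offset 2
print(first_pass, second_pass, crawler.failures)
# prints: False True {'http://example.org/col1': [2]}
```

The `failures` map holds the information that would be reported back to the data provider, and the `completed` object corresponds to the end-of-indexing marker mentioned in the text.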
The Validation Chain corresponds largely to the Data Validation Services described in the GBIF
Data Portal Strategy, but also includes some other function from the Indexer component of the
current prototype Data Portal. This is a configurable workflow component that allows a range of
processing steps to be applied to each object placed into the chain. The exact steps will vary
according to the nature of the record concerned. It will include the generation of a series of
annotations to the object based on routines to validate or interpret the data in the record. The aim
is to reach the end of the Validation Chain with a clear understanding of what the record represents
in as much detail as possible, including an evaluation whether there are ambiguities or problems
with any of the data elements. By the end of the chain, all objects should be in a form that can
readily be stored in the Master Data Store.
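A configurable chain of this kind can be sketched as a list of steps selected by record type, each step adding an annotation when it finds a problem. The step functions, the record types, and the annotation format below are assumptions made for illustration; they are not taken from the GBIF Data Portal design.

```python
def check_scientific_name(obj):
    """Return a problem description, or None if the name is present."""
    name = (obj["data"].get("ScientificName") or "").strip()
    return None if name else "scientific name missing"


def check_coordinates(obj):
    """Return a problem description, or None if the coordinates are plausible."""
    lat = obj["data"].get("Latitude")
    lon = obj["data"].get("Longitude")
    if lat is None or lon is None:
        return "coordinates missing"
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return "coordinates out of range"
    return None


# The steps applied depend on the nature of the record (hypothetical types).
STEPS_BY_TYPE = {
    "specimen": [check_scientific_name, check_coordinates],
    "name": [check_scientific_name],
}


def run_chain(obj, steps_by_type=STEPS_BY_TYPE):
    """Apply each configured step and collect annotations on a copy of the object."""
    annotated = {**obj, "annotations": []}
    for step in steps_by_type.get(obj["record_type"], []):
        problem = step(annotated)
        if problem:
            annotated["annotations"].append(
                {"step": step.__name__, "problem": problem})
    return annotated


# Invented record with an impossible latitude.
result = run_chain({
    "record_type": "specimen",
    "data": {"ScientificName": "Inga edulis", "Latitude": -95.0, "Longitude": -60.0},
})
print(result["annotations"])
# prints: [{'step': 'check_coordinates', 'problem': 'coordinates out of range'}]
```

Problems are recorded rather than used to reject the record, so every object still reaches the end of the chain in a form that can be stored together with its annotations.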
The Despatcher is a new addition to the model to ensure the greatest possible flexibility in how the
Data Portal may operate. The key role of this component is to forward the objects from the
Validation Chain into the Master Data Store. It will, however, be the natural point to process information
which should be included in a report to each data provider at the end of each visit to index their
data. Upon further review and discussion with GBIF stakeholders (including data providers) a
range of other notification services could be implemented at this point (e.g. forwarding objects or
notifications to thematic and regional portals whenever records appear which are of interest to
those portals; management of notifications to users of the addition of data relating to their taxa of
interest). Such extensions would be a future option, but the development of a generic Despatcher
will make this easy.
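A generic Despatcher of the kind described might look like the sketch below. It is an assumption-laden illustration: the class, its method names, and the object layout (`source`, `data`, `annotations`) are invented, and the Master Data Store is represented by a plain list.

```python
from collections import defaultdict


class Despatcher:
    """Forwards validated objects to the store and to interested listeners."""

    def __init__(self, master_store):
        self.master_store = master_store
        self.listeners = []  # (predicate, callback) pairs for optional notifications
        self._report = defaultdict(lambda: {"stored": 0, "flagged": 0})

    def subscribe(self, predicate, callback):
        # For example, a regional portal interested in records from one country.
        self.listeners.append((predicate, callback))

    def despatch(self, obj):
        self.master_store.append(obj)  # key role: forward into the store
        counts = self._report[obj["source"]]
        counts["stored"] += 1
        if obj.get("annotations"):
            counts["flagged"] += 1  # the record carries validation problems
        for predicate, callback in self.listeners:
            if predicate(obj):
                callback(obj)

    def report_for(self, source):
        """Summary to send to a data provider at the end of an indexing visit."""
        return dict(self._report[source])


store = []
regional_portal_inbox = []
despatcher = Despatcher(store)
despatcher.subscribe(lambda o: o["data"].get("Country") == "Peru",
                     regional_portal_inbox.append)

# Invented objects as they might leave the Validation Chain.
despatcher.despatch({"source": "col1", "data": {"Country": "Peru"}, "annotations": []})
despatcher.despatch({"source": "col1", "data": {"Country": "Brazil"},
                     "annotations": [{"problem": "coordinates missing"}]})

print(despatcher.report_for("col1"))  # prints: {'stored': 2, 'flagged': 1}
print(len(regional_portal_inbox))     # prints: 1
```

Keeping notifications behind a generic `subscribe` hook is what would make later extensions, such as thematic or regional portals, simple to add without changing the forwarding logic.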
The Master Data Store (Data Index) is implemented as a database used solely for managing the
best possible overview of the data in the GBIF network and does not itself support requests from
users or remote portals. All such requests will be made against Slave Data Stores maintained by
MySQL replication.
The Access Portal is a layered application making use of Hibernate to access data from a Slave
Data Store and including a Service Layer implementing all logic associated with the Data Portal's
processing of data for display. Axis will be used to provide an XML access interface to the methods
offered by the Service Layer. These methods will be those required to develop an HTML User
Portal based on the GBIF data. Axis will allow these same methods to be exposed easily as SOAP
web services for use by other portals. This interface will represent a "GBIF Native Portal Interface"
which will not always map directly to TDWG standards (since frequently only a tiny number of data
elements are needed and these should be combined in different ways from the standards).
Additional access interfaces (TAPIR, WFS, etc.) can also be implemented and exposed from the
Service Layer. The Data Portal's own HTML User Portal and User Services will be implemented by
a JSP layer based on the XML Data Services (the "GBIF Native Portal Interface").
Mirroring will be implemented by a combination of multiple DNS records and Apache redirection.
But GBIF is more than just a portal. Figure 8 shows GBIF’s data-exchange architecture.




                       Figure 8. Diagram of GBIF’s data exchange architecture
This diagram emphasizes GBIF’s basic layers, as follows (from the bottom up):
   • Resources – There is an increasingly large number of digital resources relating to
     biodiversity. These may be in just about any format (various databases with all kinds of data
     models; human readable text documents in various formats; images; etc.) and may or may
     not yet be connected to the Internet.
   • Access – To make these resources accessible in a practical way, it is important to select a
     limited number of agreed transfer protocols and formats to expose them on the Internet.
     GBIF has adopted various TDWG data standards and protocols for this purpose
     (DiGIR/BioCASe/TAPIR, Darwin Core, ABCD, Taxon Concept Schema) and also expects to
     handle access through plain URLs where appropriate or via Globally Unique Identifier (GUID)
     resolution services as these are agreed and implemented.
   • Discovery – Once these resources are available on the Internet, it is important to advertise
     them to potential users. GBIF has established a (UDDI) registry for this purpose to store
     information describing the content and access interfaces for resources and to allow GBIF
     and others to find resources of interest for various purposes. GBIF has been operating its
     registry for over two years and plans soon to replace the existing implementation with one
     that offers richer function for describing resources and for searching for resources of interest.
     Other registries may be developed to meet the interests of different networks and
     communities. These may still benefit from the use of the same protocols and data standards
     adopted by the GBIF network (access to reusable software components; ability for some
     resources to be part of both networks; etc.).




   • Indexing – Within a large distributed network, it is important to maintain a dynamic map of
     the content of the network at a finer level of detail than is possible with the metadata stored
     in a service registry. GBIF is therefore developing a central index of biodiversity data by
     crawling the contents of resources registered in the UDDI registry. This index will itself be
     exposed through a range of web services to allow users to get rapid answers to many basic
     questions and to provide pointers to relevant data records throughout the network. Again it is
     likely that other groups may develop their own special-purpose indexes based on the
     underlying infrastructure, benefiting from the common core of standard access mechanisms
     and discovery services.
   • Presentation & analysis – These underlying layers should provide a common set of core
     services suitable for GBIF and others to build a wide range of applications and portals. GBIF
     will continue to develop a central portal for rapid discovery of basic information, but other
     groups may develop more specialized portals which integrate information from the central
     GBIF index and all the underlying network resources with other information managed by the
     groups concerned. Since the interfaces to the GBIF index and other resources will be
     exposed as web services, it will also be possible to include these data within workflow
     applications of various kinds.
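How the Discovery and Indexing layers build on each other can be shown with a small sketch. Everything in it is hypothetical: the registry entries, endpoints, and records are invented, the registry is a plain list rather than a UDDI service, and provider access is simulated with a dictionary instead of network calls.

```python
from collections import defaultdict

# Discovery layer: metadata describing each resource and how to reach it.
REGISTRY = [
    {"resource": "Col 1", "endpoint": "http://example.org/col1",
     "protocol": "DiGIR", "schema": "DarwinCore"},
    {"resource": "Col 2", "endpoint": "http://example.org/col2",
     "protocol": "BioCASe", "schema": "ABCD"},
    {"resource": "Col 3", "endpoint": "http://example.org/col3",
     "protocol": "DiGIR", "schema": "DarwinCore"},
]

# Stand-in for what each provider would return when crawled.
PROVIDER_DATA = {
    "http://example.org/col1": [{"CatalogNumber": "101", "Genus": "Inga"}],
    "http://example.org/col2": [{"CatalogNumber": "55", "Genus": "Inga"}],
    "http://example.org/col3": [{"CatalogNumber": "7", "Genus": "Inga"},
                                {"CatalogNumber": "8", "Genus": "Cedrela"}],
}


def discover(registry, schema):
    """Find registered resources that serve data in a given schema."""
    return [entry for entry in registry if entry["schema"] == schema]


def build_index(entries, provider_data):
    """Indexing layer: map each genus to pointers (endpoint, catalog number)."""
    index = defaultdict(list)
    for entry in entries:
        for record in provider_data[entry["endpoint"]]:
            index[record["Genus"]].append(
                (entry["endpoint"], record["CatalogNumber"]))
    return dict(index)


darwincore_resources = discover(REGISTRY, "DarwinCore")
index = build_index(darwincore_resources, PROVIDER_DATA)
print(index["Inga"])
# prints: [('http://example.org/col1', '101'), ('http://example.org/col3', '7')]
```

The index stores only pointers and a small subset of each record, so basic questions can be answered quickly while the full data remain with the providers.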
In general, GBIF expects ultimately to see increasing diversification at the higher levels in this
diagram, but strongly encourages the shared use of as many of the lower layers as makes sense
in each case. Its goal is to support the replication of the GBIF data services on a regional basis to
ensure that the information from the GBIF registry and index is available for inclusion within local
applications and portals.
We believe that ABBIF must follow GBIF’s general concept of a Network Portal with data nodes
or data providers and participant nodes that encourage local participation and may themselves
act as data nodes. It is important to analyze the answers to the questionnaires to identify local
institutions that already are GBIF nodes or that may contribute to the network.


Network Infrastructure
Another important element that helps define the architecture is the existing or potential
communication infrastructure. The present analysis is based on the document Redes Nacionais de
Educação e Pesquisa: Situação no Brasil e América Latina17, written to provide input to Brazil’s
national strategy for biological collections.

Latin America
The digital divide is a concern for scientific research because of our ever-increasing
dependency on network infrastructure and on information and communication services that are
often not available in developing countries. Latin American countries present a very
heterogeneous and fragile situation, especially when compared to more developed regions. Ten
Latin American countries have operational academic networks, the best being located in México,
Brazil, and Chile. In Colombia, El Salvador, Guatemala, Honduras, Nicaragua, Panama, Paraguay
and Peru the academic networks are still being organized.
The Americas Path (AMPATH) project led by Florida International University in 2001 established a
high performance exchange point in Miami, Florida to facilitate peering between U.S. and
international research and education networks (figure 9).




17
     http://www.cria.org.br/cgee/documentos/redesALC310505.doc


Figure 9. Diagram of the international, high-performance research connection point in Miami, Florida
                                              (AMPATH)
In 2004, with the support of the European Commission (@LIS program), another network,
RedCLARA, began to operate; it will include 18 countries. This is a milestone in Internet
connectivity in Latin America. Besides facilitating the development of new networks, it is
an opportunity to build common research agendas of regional and global interest.
Figure 10 presents a diagram of the network.




                            Figure 10. ALICE Project and Red CLARA
The topology for RedCLARA includes connections of 155 Mbps with the main national networks
(Argentina, Chile, Brazil and Mexico) and of 10 to 45 Mbps to the other South American countries.
Peru and Uruguay were recently connected and the next will be Costa Rica, El Salvador,
Nicaragua, Guatemala, Panama and Ecuador. Connections to Bolivia, Ecuador and Colombia are
planned. A connection of 622 Mbps leaves Brazil and interconnects RedCLARA to the GÉANT
network of research and education in Europe.
Tables 1 and 2 present a synthesis of the situation of national research and education networks
(NRENs). Comparing the more developed countries with the developing ones makes it clear that
Latin America lags well behind, especially when one considers new developments and
applications that are adequate in environments with good infrastructure but may be prohibitive in
less developed countries. Table 2 also shows what each country will gain with RedCLARA.




Table 1. National Research and Education Networks of some countries
Country   Organization          Status      Backbone connectivity   External capacity                                   Connected institutions
Germany   G-WIN                 operating   2.5 to 10 Gbps          US - 2 x 2.5 Gbps; EU - 5 Gbps                      550
Korea     KREONET2 / KOREN      operating   2.5 to 10 Gbps          US - 2 Gbps; EU - 155 Mbps; Japan - 2 Gbps          277
Holland   SURFnet5              operating   10 Gbps                 US - 10 Gbps; EU - 10 Gbps                          150
Poland    PIONER                operating   10 Gbps                 US - 2 Gbps; EU - 10 Gbps                           21 Metropolitan Networks; 5 High Performance Computing Centers
France    RENATER               operating   2.5 Gbps                EU - 10 Gbps; US - 4 x 2.5 Gbps; CA - 2 x 1 Gbps    50
USA       Internet2 / Abilene   operating   155 Mbps to 10 Gbps     EU - 10 Gbps; Asia - 10 Gbps                        220
Source: ICFA SCIC Report – Networking for High Energy and Nuclear Physics, February, 2004

Table 2. National Research and Education Networks of Latin America
Country       Organization          Status             Backbone connectivity                            External capacity             Connected
                                                       2004                 2005                        Before CLARA  After CLARA     institutions
Argentina     RETINA                operating          256 Kbps to 34 Mbps  45 Mbps                     59 Mbps       +45 Mbps        56
Bolivia                             under development  64 to 128 Kbps                                   1.5 Mbps      In negotiation  18
Brazil        RNP                   operating          34 to 622 Mbps       Up to 10 Gbps to 10 states  555 Mbps      +155 Mbps       220
Chile (*)     REUNA                 operating          155 Mbps             1 Gbps                      45 Mbps       90 Mbps         14
Colombia      Universidad de Cauca  under development  2 Mbps - 34 Mbps                                 34 Mbps       In negotiation  43
Costa Rica    CR2 Net               operating          45 Mbps              45 Mbps                     8 Mbps        45 Mbps         8
Cuba          REDUNIV               under development  64 Kbps - 2 Mbps                                 6 Mbps        In negotiation  23
Ecuador       REICYT                operating          128 Kbps - 5 Mbps    45 Mbps                     8 Mbps        16 Mbps         20
El Salvador   RAICES                planning phase                                                                    10 Mbps         9
Guatemala     RAGIE                 planning phase                                                                    In negotiation  7
Honduras      RHUTA                 planning phase                                                                    In negotiation  -
México        CUDI                  operating          2 to 155 Mbps        2*1 Gbps                    3*155 Mbps    45 Mbps         60
Nicaragua     RENIE                 planning phase                                                                    In negotiation  8
Panama        REDCYT                operating          2-5 Mbps                                         45 Mbps       +10 Mbps        8
Paraguay      ARANDU                planning phase     128 Kbps - 10 Mbps   Up to 155 Mbps to 2 sites   2 Mbps        12 Mbps         37
Peru          RAAP                  operating          10 Mbps              45 Mbps                     45 Mbps       45 Mbps         8
Uruguay       RAU                   operating          64 Kbps to 1 Mbps    Up to 100 Mbps to 12 sites  6 Mbps        18 Mbps         46
Venezuela     REACCIUN              operating          26 Mbps                                          53 Mbps       +45 Mbps        78
Source: CLARA.

It is important to observe that, in the global scenario, NRENs are constantly evolving and achieving
higher levels of connectivity to meet the requirements of new applications developed by research
and education institutes worldwide. Countries that are catching up, such as Korea, have their
backbones at the Gbps level. In these countries one can also see a greater level of investment to
guarantee good end-to-end connectivity at the edges of the network. Latin American
countries are still at the Mbps level (one thousand times less), with the exception of Brazil, Mexico
and Chile. Compared to developed countries, Latin America is in the
situation those countries were in 5 years ago. This is certainly a constraint on international
cooperation in the field of science and technology.
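To make the gap between the Kbps, Mbps and Gbps levels concrete, the sketch below estimates how long it would take to move a data set over links of different capacities. The 1 GB data set size is an assumption chosen for illustration, the link capacities are of the order of the backbone figures in Tables 1 and 2, and protocol overhead and congestion are ignored, so real transfers would be slower.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Ideal transfer time: payload bits divided by link capacity."""
    return size_bytes * 8 / link_bits_per_second


# Assumption: a 1 GB data set (decimal units), chosen only for illustration.
DATASET_BYTES = 1_000_000_000

# Link capacities of the order of those reported for national backbones.
LINKS_BPS = {
    "128 Kbps": 128_000,
    "10 Mbps": 10_000_000,
    "10 Gbps": 10_000_000_000,
}

for label, bps in LINKS_BPS.items():
    seconds = transfer_seconds(DATASET_BYTES, bps)
    print(f"{label:>9}: {seconds:>10.1f} s ({seconds / 3600:.2f} h)")
```

Under these assumptions the same transfer takes more than 17 hours at 128 Kbps, about 13 minutes at 10 Mbps, and under a second at 10 Gbps.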

Brazil
Brazil’s national research and education network (RNP) has been in operation since 1989. RNP integrates all 26
Brazilian states and the Federal District (Brasília) through a backbone of up to 10 gigabits per second. São Paulo, Rio
de Janeiro, Minas Gerais and Brasília are on a 10 Gbps backbone, while Rio Grande do Sul,
Santa Catarina, Paraná, Bahia, Pernambuco and Ceará are connected at 2.5 Gbps. The rest of the states are
connected through links of up to 34 Mbps. It is expected that the whole network will be operating at
gigabit speeds by the year 2007 (figure 11).




                                         Figure 11. RNP Backbone
The national network (RNP) links about 300 universities, research institutions and federal
agencies. The state networks are integrated with the national network and distribute connectivity from
RNP’s point of presence in each state. The most important state networks are those of Santa Catarina, Paraná,
Rio de Janeiro, and São Paulo. In the case of São Paulo, there is also a Research and
Development Program (TIDIA)18 in different areas of information and communications technology,
telecommunications and computer networks, associated with the advanced internet.




18
     Tecnologia da Informação no Desenvolvimento da Internet Avançada (www.tidia.fapesp.br/portal)




Analysis
The questionnaire that was sent out to evaluate the situation of data providers and custodians of
the region included questions of relevance for the definition of the best architecture, such as:
    Standards used
        Data model: DarwinCore, ABCD, CABRI, others (specify)
        Protocol: DiGIR, BioCASe, Z39.50, http, xml, others (specify)
    Existing infrastructure: hardware and software
    Staff: adequate, insufficient (specify)
    Internet access
        Type of Internet access: none, modem, dedicated line
    Data and information access policy: unrestricted access, restricted access
    Willingness to participate in this project
All questionnaires from collections in countries located in the Amazonian region were analyzed.
Collections that do not wish to participate in ABBIF or that do not want to share their data were not
included. Institutions that do not have specimen data were also not included.
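The inclusion rule above can be written down as a small filter over questionnaire answers. This is only a sketch: the record structure, the field names and the sample answers are assumptions made for illustration and do not reproduce the actual questionnaire form or any real response.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QuestionnaireAnswer:
    """One returned questionnaire; field names are illustrative assumptions."""
    collection: str
    data_models: List[str] = field(default_factory=list)  # e.g. DarwinCore, ABCD
    protocols: List[str] = field(default_factory=list)    # e.g. DiGIR, http
    internet_access: str = "none"                         # none, modem, dedicated line
    access_policy: str = "unrestricted"                   # unrestricted or restricted
    willing_to_participate: bool = True
    shares_data: bool = True
    has_specimen_data: bool = True


def is_included(answer: QuestionnaireAnswer) -> bool:
    """Apply the three exclusion criteria stated in the text."""
    return (answer.willing_to_participate
            and answer.shares_data
            and answer.has_specimen_data)


def select_for_analysis(answers):
    """Keep only the answers that pass every criterion."""
    return [a for a in answers if is_included(a)]


# Invented sample answers, not taken from the survey.
SAMPLE = [
    QuestionnaireAnswer("Collection A", ["DarwinCore"], ["DiGIR", "http"], "dedicated line"),
    QuestionnaireAnswer("Collection B", willing_to_participate=False),
    QuestionnaireAnswer("Collection C", shares_data=False),
    QuestionnaireAnswer("Institution D", has_specimen_data=False),
]

SELECTED = [a.collection for a in select_for_analysis(SAMPLE)]
print(SELECTED)
```

Keeping the access policy as a separate field from the sharing flag reflects that a restricted-access policy is not, by itself, a reason for exclusion.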


Peru
Questionnaires
CRIA sent out the ABBIF questionnaires to 5 institutions and 15 individuals in Peru and received 7
answers. The answers from 6 institutions were sent by Siamazonia and 1 answer was sent from a
private collection. Table 3 shows the result of the questionnaire concerning standards, protocols,
infrastructure, Internet access and information policy.

                       Table 3. Answers to the questionnaire from institutions in Peru
Collection                                                  All records                                             Records from the Amazon
                                                            Total       Digitized           Georeferenced           Total       Digitized           Georeferenced
                                                                        (no., % of total)   (no., % of digitized)               (no., % of total)   (no., % of digitized)
Siamazonia                                                  60.000      60.000              30.000                  60.000      60.000              30.000
Herbário MOL-FCF                                            11.428      6.857                                       5.000
Herbário Amazonense                                         130.000     32.500              22.750
Herbário Regional de Ucayali
Herbário Herrerensi                                         6.000       3.300               2.640                   5.000       3.200               0 - 2500
UNMSM (11 collections)                                      1.500.000   200.000                                     400.000     40.000
Personal collection of leaf beetles and their host plants   100.000     5.000               5.000                   5.000
Total (without Siamazonia)                                  1.747.428   247.657 (14%)       30.390 (12%)            415.000     100.000 (24%)       30.000 (30%)




                         Standards &        Infrastructure                         Internet    Information    observation
                         Protocols                                                 access      Policy
Siamazonia               DarwinCore,        Sufficient hardware, require           512 Kbps    unrestricted   GBIF node
                         DiGIR, http, xml   software for data analysis,
                                            mirroring, Arc IMS, sufficient staff
Herbário MOL-FCF                            Require more disk space, sufficient    dedicated   unrestricted
                                            staff                                  line
Herbário Amazonense      DarwinCore,        computers, disk space, camera,         256 Kbps    unrestricted
                         http, xml          require staff for the collection
Herbário Regional de                                                                                          No answers
Ucayali
Herbário Herrerensi      DarwinCore,        computers, disk space, camera,         512 Kbps    unrestricted
                         http               scanner and software for collection
                                            management, insufficient staff
UNMSM (11                                   computers, camera, scanner,            dedicated   unrestricted   the museum has
collections)                                memory (servers and software for       line                       not adopted
                                            image editing), require people for                                standards
                                            digitization
Personal collection of
leaf beetles and their
host plants


Information System
Peru is in a very good situation as it has developed Siamazonia19, the information system for
biological and environmental diversity of the Peruvian Amazon (Sistema de Información de la
Diversidad Biológica y Ambiental de la Amazonía Peruana). Siamazonia was created in 2001
through the BIODAMAZ project (Proyecto Diversidad Biológica de la Amazonía Peruana), an
agreement between Peru and Finland, and was developed by the Instituto de Investigaciones de la
Amazonía Peruana (IIAP). IIAP is a GBIF node and therefore is a natural partner of the ABBIF
network.
Its structure is based on nodes, similar to GBIF. The following diagram was taken from its website:




                                  Figure 12. Structure of the Siamazonia Network




19
     www.siamazonia.org.pe/


In the diagram, the facilitating node is IIAP, which has committed itself to the long-term development and
maintenance of the secretarial, technical, and administrative tasks of the system. Principal nodes are
universities or their museums, research institutes, and other institutions with valuable information
resources and an interest in participating in the development of the system. Representatives of
IIAP and the principal nodes constitute the Steering Committee, which is the main decision-making body of
the system. Additional nodes may include a broad category of institutions of interest to the network
that do not fulfill the requirements of principal nodes.
IIAP produced a technical document presenting an overview of the architecture for the planned
Peruvian Amazonian Biodiversity and Environmental Information System (IIAP, 2004)20. This
document is a result of five regional workshops held during the months of March and April, 2001.
Besides its proposed node structure, the document also presents a diagram of the information
system (figure 13) where it includes a linkage to GBIF.




                    Figure 13. General Structure of the Information System (IIAP, 2004)
The document also states that databases in general are freely accessible.
Siamazonia is already serving data to GBIF using DiGIR. One resource is Observations of flora y
fauna of Peruvian Amazon by BIODAMAZ project, with 477 records and 112 taxa, and the other
resource is Information of Flora and Fauna in Varzeas (Peruvian Amazon), with 11.009 records and
3.218 taxa.


20
  Sistema de Información de la Diversidad Biológica y Ambiental de la Amazonía Peruana (SIAMAZONIA),
  Serie IIAP-BIODAMAZ, ISBN N° 9972-667-10-3, 2004.
  http://www.iiap.org.pe/biodamaz/faseii/download/literatura_gris/2.pdf


Venezuela
Questionnaires were sent to 21 institutions and 28 individuals, and 11 answers were received.
Tables 4 and 5 present the answers to the questionnaire as to available data, digitization, facilities,
and policy.

Table 4. Answers to the questionnaire concerning no. of records and digitization

Acronym                 Checklists                Group                      No. records   Georef.  Digitized  No. rec.   Georef.    Digit.   Software
                                                                                 (total)                         Amazon    Amazon    Amazon
COP                                               birds                           80.000    80.000     80.000                                 BIOTA database
EBRG                    Reports to MARN on        vertebrates                     61.529     6.000      6.000     7.404       740     7.404   Excel
                        diverse vertebrates
PORT (BioCentro)        Checklists of Amazonian   phanerogams                    100.000                7.500                                 Visual Basic, Access
                        phanerogams
ecoSIG - interested in  Databases on Amazonian    maps, lists of species                                                                      Sistema de Información Geográfica, Arc-View
participating as data   amphibians, birds and     (amphibian, birds and
custodians              mammals                   mammals)
GUYN                                              phanerogams, cryptogams         18.625    13.000                                            Access
BioCentro - Museo de                              fish                            53.000    40.000     53.000    10.000    10.000    10.000   Specify
Zoologia
VEN                                               plants, fungi, algae           350.000    35.000    113.000    27.200              27.200   Access
MHNLS                                             phanerogams, vertebrates,      190.000   190.000    190.000     5.399     5.399     5.399   WinISIS
                                                  invertebrates
MBUCV                                             terrestrial vertebrates         14.898    10.000     14.898     1.628     1.628     1.628   Excel
MIZA                                              insecta                      2.500.000     5.289      5.289   500.000       768       768   PostgreSQL, PHP, Excel
ULABG                                             amphibian, reptiles                200         0          0       200
Total                                                                          3.368.252   379.289    469.687   551.831    18.535    52.399
%                                                                                 100,0%     11,3%      13,9%     16,4%      3,4%      9,5%

An important feature is that, with the exception of one institution, all are in the process of digitizing
their holdings. 14% of the over 3 million specimens are digitized, and the number of georeferenced
records is high (equivalent to over 80% of the digitized records). If one leaves out MIZA’s collection of
insects (2.5 million specimens, of which about 0.2% are digitized), the remaining collections hold over 850 thousand
specimens, of which more than 460 thousand records (more than 50%) are digitized.
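The percentages quoted above can be checked directly against the totals of Table 4. The short calculation below is a sketch of that check; the variable names are ours, and the figures are the Table 4 totals and the MIZA row written without thousands separators.

```python
# Totals from Table 4 (all Venezuelan collections that answered).
TOTAL_RECORDS = 3_368_252
GEOREFERENCED = 379_289
DIGITIZED = 469_687

# MIZA row from Table 4 (insect collection).
MIZA_RECORDS = 2_500_000
MIZA_DIGITIZED = 5_289


def pct(part, whole):
    """Percentage of `part` relative to `whole`."""
    return 100.0 * part / whole


digitized_share = pct(DIGITIZED, TOTAL_RECORDS)        # share of all records digitized
georef_vs_digitized = pct(GEOREFERENCED, DIGITIZED)    # georeferenced relative to digitized
miza_share = pct(MIZA_DIGITIZED, MIZA_RECORDS)         # share of MIZA digitized

# Same figures with MIZA left out.
rest_records = TOTAL_RECORDS - MIZA_RECORDS
rest_digitized = DIGITIZED - MIZA_DIGITIZED
rest_share = pct(rest_digitized, rest_records)

print(f"digitized: {digitized_share:.1f}%")
print(f"georeferenced vs digitized: {georef_vs_digitized:.1f}%")
print(f"MIZA digitized: {miza_share:.2f}%")
print(f"without MIZA: {rest_digitized} of {rest_records} ({rest_share:.1f}%)")
```

Note that the georeferenced figure is a ratio of two column totals; Table 4 lists at least one collection (GUYN) with georeferenced records but no digitized count, so it should not be read as a strict subset of the digitized records.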




Table 5. Answers to the questionnaire concerning on-line data
Acronym                 Available online                Standards &        Adequate    Adequate  Internet        Restricted  Willingness to participate
                                                        protocols          hardware &  staff     access          access
                                                                           software
COP                     no                                                 yes         yes       ABA 256         yes         not before knowing ABBIF conditions and aims
EBRG                    no                                                 no          no        none                        yes
PORT (BioCentro)        no                                                 no          no        none            yes         yes
ecoSIG - interested in  http://ecosig.ivic.ve                              yes         yes       100 Mbps        no          yes
participating as data
custodians
GUYN                    no                                                 no          no        dedicated line  yes         yes
BioCentro - Museo de    no                              DiGIR              yes         no        none            yes         yes
Zoologia
VEN                     no                                                 no          no                        no          yes
MHNLS                   no                                                 yes         yes       192/128         no          yes
MBUCV                   no                                                 no          no        none                        yes
MIZA                    http://www.miza-fpolar.info.ve  Darwin Core, http  no          no        none            yes         yes
ULABG                   no                                                 no          no        dedicated line  yes         yes, if of mutual benefit

Although the digitizing process seems to be in place, the same does not apply to the on-line availability
of data. Venezuela, unlike Peru, does not have a GBIF node or a local organization working
on an information system to integrate biodiversity data. Two institutions indicated that they have
data on-line: ecoSIG, a geographic information system; and the Museo del Instituto de Zoología
Agrícola (MIZA). MIZA is the collection with the largest holding (2.5 million specimens, all insects), of
which only 0.2% is digitized.
Venezuela shows a need for resources to digitize data and also to develop a system to make
biodiversity data available on the Internet. There is a project that was recently approved on the
development of an integrated information system for vertebrate collections in Venezuela. The
project involves the following institutions: Museo de Historia Natural La Salle (MHNLS); Museo de
Biología de la Universidad Central de Venezuela (MBUCV); Museo Estación Biológica Rancho
Grande (EBRG); and the Colección Ornitológica Phelps (COP), all of which answered the ABBIF
questionnaire and together are responsible for 350 thousand specimens. The focal point of this
project is MHNLS. We believe that this project can learn from experiences such as GBIF, CRIA
and Siamazonia and use all open source developments that are available.


Bolivia
The questionnaire was sent to 14 institutions and 13 individuals and CRIA received answers from
one herbarium and two zoological collections all from the Museo de Historia Natural Noel Kempff
Mercado (MHNNKM)21.




21
     http://www.museonoelkempff.org


Collection                      No. specimens  Software  Data online  Protocol  Staff, software & hardware  Internet  Information  Willingness to participate
Herbario del Oriente Boliviano  65.000         Excel     no           http      Inadequate                  ADSL      Restricted   yes
Zoological collection           92.970         Excel     no           http      Inadequate                  ADSL      yes          yes
Entomological collection        500.000        Excel     no           http      Inadequate                  ADSL      yes          yes
Total                           657.970

There was no information on the percentage of digitized or georeferenced data. The Museum’s
website holds information on research that is being carried out and on maps, and also presents lists of
species from the project of the Fundación para la Conservación del Bosque Seco Chiquitano,
Cerrado y Pantanal Boliviano (FCBC). Data on the collections’ holdings are not available on-line.


Colombia
Questionnaire
Based on the web survey of possible data providers that was carried out at the beginning of the
project, 124 questionnaires were sent out to Colombian institutions and 133 to individuals. Only 9
answers were received but, as in the case of Peru, Colombia has a GBIF node, the Alexander von
Humboldt Biological Research Institute. The Institute was contacted directly and carried out a very good
survey of 93 institutions involving 29 herbaria, 60 zoological and 9 microbial collections. These
institutions together hold a total of 4.071.632 records, more than 50% digitized and about 10% of
the digitized records georeferenced. Only approximately 2% of the records are from the Amazon
region, but this is undoubtedly an important initiative to be sponsored (tables in annex 1).

Information System
The Alexander von Humboldt Biological Research Institute is a natural partner of ABBIF as it
is a GBIF node and is responsible for the Biodiversity Information System SIB (Sistema de
Información sobre Biodiversidad22) that is being developed in Colombia.
SIB is being implemented as a distributed network. Humboldt is the leading institution, and is
officially23 responsible for its design, implementation and general coordination. The structure
includes a Technical Committee that is responsible for:
     defining general aspects of a national policy for biodiversity data and information
       management;
     validating technical elements and providing recommendations as to the implementation of
       SIB at a local, regional, and national level;
     establishing a line of capacity building, replicating the SIB model and promoting expertise in
       other entities; and,
     facilitating the articulation of SIB with other information initiatives at the national, regional,
       and global level.
The technical committee today is composed of members of the following institutions:
   Instituto Amazónico de Investigaciones Científicas – SINCHI
   Instituto Alexander von Humboldt
   Instituto de Hidrología Meteorología y Estudios Ambientales – IDEAM
   Instituto de Investigaciones Marinas y Costeras José Benito Vives de Andréis – INVEMAR
   Instituto de Investigaciones Ambientales del Pacífico – IIAP


22
     http://www.siac.net.co/Home.php
23
     Ley 99 de 1993 y los decretos reglamentarios 1600 y 1603 de 1994


      Instituto de Ciencias Naturales de la Universidad Nacional – ICN
      Ministerio de Ambiente, Vivienda y Desarrollo Territorial
SIB is also composed of regional and thematic networks.
The data model in use is the DarwinCore V2 standard (as the minimum acceptable
content) and the Estándar para intercambiar información al nivel de organismo designed by the
project team24. The interface for distributed searches is not available. The SIB communication protocol
was being developed concurrently with DiGIR, but it seems clear now that in order to share data
with other initiatives it is important to use a common protocol. GBIF recommended that SIB
use TAPIR, which should be ready for testing in the near future.
As of December 2005, SIB had four datasets publicly available: Butterflies of the Schmidt-Mumm
Biological Collection (Humboldt), Pteridophyta of the FMB biological collection (Humboldt),
Leguminosae of the FMB biological collection (Humboldt) and Selected records from the National
Herbarium of Colombia (ICN-UN). According to Ángela M. Suárez-Mayorga25, standardization and
proper documentation of biological data and/or metadata are ongoing in nearly 30 organizations and
four networks of data administrators: Red Nacional de Observadores de Aves, Red Nacional de
Jardines Botánicos, Red de Colecciones Biológicas de los Andes and SIRAP-Eje Cafetero. All are
also documenting metadata for biological datasets, following the "Estándar para la documentación
de metadatos de conjuntos de datos relacionados con biodiversidad"26. One problem that Humboldt
faces, as do many other data custodians, is convincing data providers to openly share their data
on the Internet.


French Guiana
Although French Guiana is an overseas department of France and is therefore politically part of
Europe, it is located in South America and for this reason is included in this report as an
“Amazonian country”.
The “Herbier de Guyane (CAY)”, a center of the Institut de Recherche pour le Développement
(IRD) in Cayenne, answered the questionnaire. The herbarium houses approximately 160,000
vascular plant, bryophyte, and fungal specimens collected in the Guianas, mainly in French
Guiana. Of these, more than 125,600 (78.5%) are digitized in the AUBLET2 database (2,500 with
digitized images), about 90,000 (56%) are georeferenced, and 452 are nomenclatural types. IRD
is a member of the “Flora of the Guianas” consortium. The herbarium also contributes to the
“Flora Neotropica” program (New York Botanical Garden) and to the “Checklist of the vascular
plants of the Guyana Shield” (Smithsonian Institution, Washington DC).
The database software used is Oracle and the data model is RIHA (Réseau Informatique des
Herbiers Africains), which is compatible with ABCD. BioCASe is used as the communication
protocol.
The data is freely available on-line at http://www.cayenne.ird.fr/aublet2/ and the herbarium also
serves data through GBIF (123,634 records).
The herbarium requires scanning equipment for digitizing images of the specimens, especially the
types. It is also interested in establishing a digitization program for the non-vascular plants and
the Guiana Shield collections, which would require additional staff.
The herbarium is open to new partnerships with the Amazonian countries to share experience, and
its data is available without restrictions. This data could therefore be made available to ABBIF
immediately.



24
     http://www.siac.net.co/sib_descargas.php
25
     personal email January 10, 2006
26
     http://www.siac.net.co/sib_descargas.php


Ecuador
The questionnaire was sent to 16 institutions and 23 individuals, based on the Internet survey.
Three answers were received, two of them from collections interested in participating in ABBIF.
Name of collection                Acronym        Checklists                  No. of      Digitized   No. of records   Software
                                                                             specimens               (Amazon)
Unión Mundial para la             UICN (IUCN)    UICN databases                                                       Access, Cold Fusion
Naturaleza
Escuela Superior Politécnica      CHEP           TROPICOS database               8.700       5.000            1.500   TROPICOS (Pick)
de Chimborazo (ESPOCH),
Herbarium
Pontificia Universidad Católica   Herbario QCA   Checklists for Ecuadorian     250.000      30.000           10.000   Filemaker Pro
del Ecuador                                      Angiosperms and
                                                 Pteridophytes



Acronym        Data available      Standards &    Hardware &   Staff        Access to         Information       Willingness to
               online? URL         Protocols      Software                  Internet          policy (access)   participate
IUCN           www.sur.iucn.org    http           Adequate     Inadequate   dedicated 512     Restricted        Yes
CHEP           no                  Tropicos       Inadequate   Inadequate   Modem             Unrestricted      Yes
                                   (Pick)
Herbario QCA   no                                 Inadequate   Inadequate   Ethernet, 54kbp   Unrestricted      Yes




These collections require support to digitize their holdings and to make data available on the
Internet. IUCN, although it leads the Conservation Commons initiative27, does not have a clear data
sharing policy in Ecuador, nor does it have species data readily available.
The herbarium of the Pontificia Universidad Católica del Ecuador collaborates with the Missouri
Botanical Garden, the Herbario Nacional at the Museo Ecuatoriano de Ciencias Naturales, and the
Department of Systematic Botany of Aarhus University in the Catalogue of the Vascular Plants of
Ecuador project28. A possible strategy is to establish a regional server with a portal at QCA to
begin structuring local data and to serve data to the ABBIF network.


Brazil
Collections
Brazil has two very important projects underway that are of direct interest to ABBIF: the
speciesLink network29 and PPBio – MCT30, the biodiversity research program of the Ministry of
Science and Technology.
The speciesLink network involves 40 collections, one centralized information system of observation
data from São Paulo State (SinBiota31) and one centralized network with 9 microbial collections
(SICol32):

27
     http://www.conservationcommons.org
28
     http://www.mobot.org/mobot/research/ecuador/welcome.shtml
29
     http://splink.cria.org.br/
30
     http://ppbio.inpa.gov.br/
31
     http://sinbiota.cria.org.br/atlas/


Collections                      no. of records    Digitized (no. and % of total)   Georeferenced (no. and % of digitized)
Plants (herbaria, algae, wood)       1.430.250                    289.487 (20%)                              70.544 (24%)
Zoological collections                 935.523                    374.208 (40%)                             191.101 (51%)
Microbial Collections                    8.724                     8.724 (100%)                                    0 (0%)
Observation Data                        71.866                    71.866 (100%)                             71.866 (100%)
Total records                        2.446.363                    744.285 (30%)                             333.511 (45%)
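
The two percentage columns in the table above use different bases: digitized records are a share of all records, while georeferenced records are a share of the digitized ones. The short sketch below recomputes them from the counts in the table; the names ROWS, pct and summarize are illustrative only and not part of any speciesLink tool.

```python
# Counts copied from the speciesLink summary table above:
# (total records, digitized records, georeferenced records)
ROWS = {
    "Plants (herbaria, algae, wood)": (1_430_250, 289_487, 70_544),
    "Zoological collections": (935_523, 374_208, 191_101),
    "Microbial collections": (8_724, 8_724, 0),
    "Observation data": (71_866, 71_866, 71_866),
}


def pct(part: int, whole: int) -> int:
    """Whole-number percentage; 0 when the base is empty."""
    return round(100 * part / whole) if whole else 0


def summarize(rows):
    """Return {name: (% digitized of total, % georeferenced of digitized)}."""
    summary = {
        name: (pct(digitized, total), pct(georef, digitized))
        for name, (total, digitized, georef) in rows.items()
    }
    # The total row is recomputed from the column sums, not by averaging percentages.
    total, digitized, georef = (sum(column) for column in zip(*rows.values()))
    summary["Total records"] = (pct(digitized, total), pct(georef, digitized))
    return summary


for name, (digitized_pct, georef_pct) in summarize(ROWS).items():
    print(f"{name}: {digitized_pct}% digitized, {georef_pct}% of digitized georeferenced")
```

Recomputing the total row from the column sums matters because the collections differ greatly in size, so an average of the four percentages would not match the table.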

The total number of records from the Amazon region in Brazil in the speciesLink network is
127.473 and from other Amazon Basin countries 122.624. Of the total (250.097), 128.097 are
georeferenced. It is worth stressing that some very specialized collections in São Paulo have
important holdings from the Amazon region, such as:
         the fish collection of the São Paulo State University Museum (MZUSP); and,
         the bee collection (RPSP) of the biology department FFCLRP/USP.
PPBio in its first phase is concentrating on the Amazon and the semi-arid regions. Partner
institutions include INPA – Instituto Nacional de Pesquisas da Amazônia; MPEG – Museu
Paraense Emílio Goeldi; and INSA-CF – Instituto Nacional do Semi-Árido Celso Furtado. These
institutions will receive support to digitize their data and make it available on-line.
The tables that follow present a summary of the status of the institutions that answered the
questionnaire.
Collection                 Plants    Animals      Micro.   total          Digitized (no. and % of   Georeferenced (no. and % of
                                                           records        total)                    digitized)
DZSJRP - pisces                            1                      7.500                     7.500                            4.684
UNIR - Fish                                1                     23.229                    23.229                           23.229
UNIR - Mammals                             1
(CRM)
MEFEIS                                     1                    10.200                      1.000
UFRR                             1                               2.751                      2.751
UFAM Coleção                               1                   191.992                          0                                0
Zoológica
MIRR                             1                               5.914                     4.666
INPA - Peixes                              1                    24.536                    17.000
INPA - CMIM                                           1          7.459                     7.459                                 0
INPA - Mammals                             1                     4.819                     4.819
INPA - invert.                             1                   303.015                     1.022
INPA - Amphi                               1                    13.500                    13.500                              6.750
INPA - Herbaria                  1                             215.000                   200.000                             86.000
INPA - Aves                                1                       631                       631                                400
JBRJ                             1                             410.000                    40.000
SPF - USP                        1                             145.000                    15.000                              7.000
Herbário MG                      1                             174.000                   165.000
Instituto Butantan -                       1                     9.298                     2.295                               593
IBSP
HRCB                             1                              40.000                     5.000                                 0
MPEG - Invert                              1                 2.000.000                    20.000
MPEG - Fish                                1                    11.000                     8.500
MPEG - Herp.                               1                    60.000                    58.000                              2.000
IPT - BCTw (xiloteca)            1                              19.500                     7.600                                  0
MPEG - Masto                               1                    34.000                    16.000                              1.000
MPEG - Coleção                             1                    74.965                    71.200
Ornitológica
INPA - xiloteca                  1                              10.392                      3.100
Total                            9        16          1      3.798.701              695.272 (18%)                    131.656 (19%)




32
     http://sicol.cria.org.br/cv/


Records from the Amazon Basin:
Collection           Plants   Animals     Micro.   total           Digitized (no.    Georeferenced (no.          on-line
                                                   records         and % of total)   and % of digitized)
                                                   amazon
DZSJRP - pisces                      1                     390                 309                      343         splink.cria.org.br
UNIR - Fish                          1                  23.229              23.229                   23.229                        no
UNIR - Mammals                       1                                                                           no (100% digitized -
(CRM)                                                                                                                           Word)
MEFEIS                               1                                                                              splink.cria.org.br
UFRR                      1                              2.751               2.751                                                 no
UFAM Coleção                         1                 191.992                   0                          0                      no
Zoológica
MIRR                      1                                                                                                       no
INPA - Peixes                        1                                                                                            no
INPA - CMIM                                   1                                                                                   no
INPA - Mammals                       1                   4.819               4.819                                                no
INPA - invert.                       1                                                                                            no
INPA - Amphi                         1                  13.500              13.500                    6.750                       no
INPA - Herbaria           1                                                                                                       no
INPA - Aves                          1                       631               631                         400
JBRJ                      1                                                                                         splink.cria.org.br
SPF - USP                 1                                                                                         splink.cria.org.br
Herbário MG               1                                                                                                        no
Instituto Butantan                   1                                                                              splink.cria.org.br
- IBSP
HRCB                      1                                                                                         splink.cria.org.br
MPEG - Invert                        1                                                                                             no
MPEG - Fish                          1                   8.000               8.000                                                 no
MPEG - Herp.                         1                  59.000              58.000                    2.000                        no
IPT - BCTw                1                                                                                         splink.cria.org.br
(xiloteca)
MPEG - Masto                         1                  32.000              16.000                                                 no
MPEG - Coleção                       1                  63.720              60.500                                                 no
Ornitológica
INPA - xiloteca           1                              8.300                                                                     no
Total                     9         16        1        408.332      187.739 (46%)              32.722 (17%)


Information Systems
Brazil has been actively involved in the discussions of the clearing-house mechanism of the
Convention on Biological Diversity and in the discussion of the establishment of biodiversity
information systems. Although the country is not a GBIF member, CRIA has been participating in a
number of international initiatives collaborating in the establishment of standards, protocols and
information systems. This experience and the products of this work will certainly contribute to the
establishment of ABBIF.
CRIA developed speciesLink, an information network that links data from biological collections
located in the State of São Paulo, and SinBiota, a centralized information system that receives
data from surveys carried out by researchers financed through the Biota/Fapesp Program33. Both
developments are project based and were financed by the State of São Paulo Research
Foundation (Fapesp). Another system developed by CRIA is SICol, a centralized information
system with data from microbial collections of biotechnological interest.

SinBiota
SinBiota34 adopted a centralized model (figure 14). Data providers are individual researchers or
groups working in the field. It is unrealistic to expect every researcher to maintain his/her data
in a private information system on the Internet that is interoperable with the databases of other
researchers. The natural strategy was therefore to develop a centralized database that each
researcher could feed through a password-controlled web interface. A common format for field
records was adopted for all taxa, and all researchers use the same web interface to enter, alter,
or delete data from the database. Each field record has an associated list of species. The


33
     www.biota.org.br
34
     http://sinbiota.cria.org.br/atlas/


web server, besides freely and openly serving data to any internet user, integrates the database
with maps through the mapCRIA35 web service.



[Figure: User; Researcher of the Biota/Fapesp Program; Web Interface; Web server; Map Service and
Maps; Database (surveys and associated lists of species)]
                                        Figure 14. Diagram of SinBiota’s architecture
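
The centralized model described above can be illustrated with a minimal sketch: one shared database, a write path that requires a researcher's password, and a read path open to anyone. This is not SinBiota's actual code. The table names, column names and function names are assumptions made for the example, and SQLite stands in for whatever database the real system uses.

```python
import hashlib
import hmac
import os
import sqlite3

# Hypothetical schema: one common field-record format for all taxa,
# with a species list attached to each record.
SCHEMA = """
CREATE TABLE researcher (
    username      TEXT PRIMARY KEY,
    salt          BLOB NOT NULL,
    password_hash BLOB NOT NULL
);
CREATE TABLE field_record (
    id        INTEGER PRIMARY KEY,
    username  TEXT NOT NULL REFERENCES researcher(username),
    locality  TEXT NOT NULL,
    latitude  REAL,
    longitude REAL
);
CREATE TABLE species_entry (
    field_record_id INTEGER NOT NULL REFERENCES field_record(id),
    scientific_name TEXT NOT NULL
);
"""


def _hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def open_database() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def register(conn, username: str, password: str) -> None:
    salt = os.urandom(16)
    conn.execute("INSERT INTO researcher VALUES (?, ?, ?)",
                 (username, salt, _hash(password, salt)))


def authenticate(conn, username: str, password: str) -> bool:
    row = conn.execute(
        "SELECT salt, password_hash FROM researcher WHERE username = ?",
        (username,)).fetchone()
    return row is not None and hmac.compare_digest(row[1], _hash(password, row[0]))


def add_field_record(conn, username, password, locality, latitude, longitude, species):
    """Write path: only an authenticated researcher may add a record."""
    if not authenticate(conn, username, password):
        raise PermissionError("unknown researcher or wrong password")
    with conn:  # one transaction for the record and its species list
        cur = conn.execute(
            "INSERT INTO field_record (username, locality, latitude, longitude) "
            "VALUES (?, ?, ?, ?)", (username, locality, latitude, longitude))
        conn.executemany(
            "INSERT INTO species_entry VALUES (?, ?)",
            [(cur.lastrowid, name) for name in species])
    return cur.lastrowid


def species_for_locality(conn, locality: str) -> list:
    """Read path: open to any user, no credentials required."""
    rows = conn.execute(
        "SELECT s.scientific_name FROM species_entry s "
        "JOIN field_record f ON f.id = s.field_record_id "
        "WHERE f.locality = ? ORDER BY s.scientific_name", (locality,))
    return [name for (name,) in rows]
```

The point of the design is that the researcher needs nothing but a browser and a password: storage, interoperability and public access are handled once, on the central server.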

SICol
Before defining the architecture for the microbial culture collection information system, a survey
was carried out to determine what infrastructure and expertise were available. Holdings of
microbial collections are small when compared to herbaria and most zoological collections. The
survey showed that most collections use spreadsheets or text files to organize their data and have
problems such as lack of local expertise in informatics and inadequate Internet access. For these
reasons, a centralized system (figure 15) was developed, in which collections could “deposit”
their data. This system was named SICol36.




35
     http://www.cria.org.br/mapcria/doc/
36
     http://sicol.cria.org.br/


[Figure: Data providers (culture collections); administrative interface (updates); users; virtual
catalog (queries, HTTP); Perl & Apache; SQL; relational database (PostgreSQL)]
                                       Figure 15. Diagram of SICol’s Architecture
A user-friendly interface was developed that allows curators to simply upload a flat file with the
data of their holdings. An extension of DarwinCore for microbial data was developed, based on the
CABRI37 guidelines for minimum, recommended and full data sets for catalogue production.
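
The flat-file upload can be sketched as follows. This is only an illustration under stated assumptions: the field names in REQUIRED_FIELDS are hypothetical placeholders (the real minimum data set is defined by the CABRI-based extension and is not reproduced here), the file is assumed to be tab-delimited, and SQLite stands in for the PostgreSQL database shown in figure 15.

```python
import csv
import io
import sqlite3

# Hypothetical minimum data set, for illustration only.
REQUIRED_FIELDS = ("catalog_number", "scientific_name")


def parse_flat_file(text: str):
    """Split a tab-delimited upload into accepted records and rejected lines."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    accepted, rejected = [], []
    for row in reader:
        # Short rows yield None values; surplus cells are stored under the key None.
        clean = {k: (v or "").strip() for k, v in row.items() if k is not None}
        missing = [f for f in REQUIRED_FIELDS if not clean.get(f)]
        if missing:
            rejected.append((reader.line_num, missing))
        else:
            accepted.append(clean)
    return accepted, rejected


def open_catalog() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE holding (collection TEXT, catalog_number TEXT, "
        "scientific_name TEXT, isolation_source TEXT, "
        "PRIMARY KEY (collection, catalog_number))")
    return conn


def deposit(conn, collection: str, records) -> int:
    """Replace one collection's holdings with the newly uploaded file."""
    with conn:  # delete and insert succeed or fail together
        conn.execute("DELETE FROM holding WHERE collection = ?", (collection,))
        conn.executemany(
            "INSERT INTO holding VALUES (?, ?, ?, ?)",
            [(collection, r["catalog_number"], r["scientific_name"],
              r.get("isolation_source", "")) for r in records])
    return len(records)


SAMPLE = ("catalog_number\tscientific_name\tisolation_source\n"
          "CC-001\tBacillus subtilis\tsoil\n"
          "CC-002\t\twater\n")

accepted, rejected = parse_flat_file(SAMPLE)
catalog = open_catalog()
deposit(catalog, "Collection X", accepted)
print(f"{len(accepted)} record(s) deposited, rejected lines: {rejected}")
```

Replacing a collection's holdings as a whole keeps the curator's side simple: the spreadsheet stays the master copy and every upload is a complete snapshot.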

speciesLink
The third system developed by CRIA was speciesLink38. The aim was to integrate data from
biological collections located in the State of São Paulo that were willing to share their data. The
system also had to be interoperable with SinBiota and The Species Analyst Network. This clearly
called for a distributed architecture, but one that also acknowledged problems such as lack of
expertise and poor Internet connectivity.
speciesLink is based on a DiGIR network, which typically involves 3 components:
    Presentation layer: the software that interacts with the user offering a friendly interface for
       queries and presentation of the results. This layer also interacts with the next layer, the
       portal.
    Portal: the portal is responsible for the distribution of messages. It is the software
       responsible for receiving queries from the presentation layer and distributing them to each
       data provider connected to the network. Communication with the providers is carried out
       using the DiGIR protocol.
    Provider: the software responsible for receiving queries from the portal and translating
       them to the query language used by the local database. The translation process includes
       mapping of the local fields according to the conceptual schema used by the network.
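
The division of work between portal and provider can be sketched in a few lines. This is not the DiGIR software and it does not use the DiGIR message format: plain method calls stand in for the HTTP/XML exchange, SQLite stands in for each collection's database, and the class, table and column names are assumptions. The concept names ScientificName and CatalogNumber are used only as examples of a shared conceptual schema.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Provider:
    """Translates network-level concepts into queries on one local database."""
    name: str
    conn: sqlite3.Connection
    table: str       # local table name (trusted configuration, not user input)
    field_map: dict  # network concept -> local column name

    def search(self, concept: str, value: str) -> list:
        column = self.field_map[concept]  # KeyError if the concept is not mapped
        concepts = list(self.field_map)
        select = ", ".join(self.field_map[c] for c in concepts)
        sql = f"SELECT {select} FROM {self.table} WHERE {column} = ?"
        rows = self.conn.execute(sql, (value,)).fetchall()
        # Answer in the network's vocabulary, not the local one.
        return [dict(zip(concepts, row), provider=self.name) for row in rows]


class Portal:
    """Distributes one query to every provider and merges the answers."""

    def __init__(self, providers):
        self.providers = list(providers)

    def search(self, concept: str, value: str):
        results, failures = [], []
        for provider in self.providers:
            try:
                results.extend(provider.search(concept, value))
            except (KeyError, sqlite3.Error) as exc:
                # One failing provider must not break the whole search.
                failures.append((provider.name, repr(exc)))
        return results, failures


def make_collection(ddl: str, insert: str, rows) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert, rows)
    return conn


# Two collections with different local schemas, mapped to the same concepts.
herbarium = Provider(
    "Herbarium",
    make_collection("CREATE TABLE specimens (barcode TEXT, taxon TEXT)",
                    "INSERT INTO specimens VALUES (?, ?)",
                    [("H-1", "Bertholletia excelsa"), ("H-2", "Hevea brasiliensis")]),
    "specimens", {"CatalogNumber": "barcode", "ScientificName": "taxon"})
museum = Provider(
    "Museum",
    make_collection("CREATE TABLE lots (lot_no TEXT, species_name TEXT)",
                    "INSERT INTO lots VALUES (?, ?)",
                    [("M-7", "Bertholletia excelsa")]),
    "lots", {"CatalogNumber": "lot_no", "ScientificName": "species_name"})

portal = Portal([herbarium, museum])
results, failures = portal.search("ScientificName", "Bertholletia excelsa")
for record in results:
    print(record)
```

Because the mapping lives in each provider, a collection can keep its own database structure and still answer queries phrased in the network's common schema.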
The original idea was to connect each collection directly to the portal through this protocol.
However, because many collections lacked good connectivity, infrastructure and/or expertise, the
solution found was to develop regional servers that mirror the data held by these collections
(figure 16).




37
     CABRI (Common Access to Biological Resources and Information) – www.cabri.org
38
     http://splink.cria.org.br/


[Figure: users; queries; virtual catalog; DiGIR Portal; HTTP / XML; DiGIR Providers; Regional Server
(with its own DiGIR Provider); SOAP; Collections (data providers)]
                                           Figure 16. Diagram of a combined system
For this architecture, other interfaces were developed to read records and update the databases
held at the regional server. Filters were also developed that allow the curator to omit sensitive
data and to keep full control over the data he/she wishes to make freely available. Figure 17
presents the diagram of the architecture adopted by speciesLink.
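
A minimal sketch of such a curator-controlled filter, applied before records are copied to the regional mirror, is shown below. It is not the speciesLink software: the function names, the record fields and the two filter rules (withholding whole records by taxon, masking individual fields) are assumptions chosen for illustration, and a plain dictionary stands in for the mirror database.

```python
from typing import Iterable, Optional


def public_view(record: dict, withheld_taxa: set, masked_fields: set) -> Optional[dict]:
    """Return the publishable version of a record, or None to withhold it entirely."""
    if record["scientific_name"] in withheld_taxa:
        return None
    return {k: v for k, v in record.items() if k not in masked_fields}


def sync_mirror(mirror: dict, local_records: Iterable[dict],
                withheld_taxa: set, masked_fields: set) -> int:
    """Rebuild the mirror so that it holds exactly what the curator allows."""
    published = {}
    for record in local_records:
        view = public_view(record, withheld_taxa, masked_fields)
        if view is not None:
            published[record["catalog_number"]] = view
    # Replacing the content also removes records the curator has since withheld.
    mirror.clear()
    mirror.update(published)
    return len(published)


local = [
    {"catalog_number": "A-1", "scientific_name": "Apis mellifera",
     "locality": "Ribeirão Preto", "latitude": -21.17, "longitude": -47.81},
    {"catalog_number": "A-2", "scientific_name": "Rare species",
     "locality": "Undisclosed", "latitude": -3.10, "longitude": -60.02},
]
mirror = {}
sync_mirror(mirror, local, withheld_taxa={"Rare species"},
            masked_fields={"latitude", "longitude"})
print(mirror)
```

The filter runs on the collection's side, so sensitive data never leaves the institution; the mirror only ever receives the public view.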

[Figure: speciesLink site (Presentation Layer, Lib DiGIR, DiGIR Portal). Fast and stable connectivity:
Collection A (Collection Management System, Data, SQL, Provider in PHP) and Regional Server (Mirror,
SOAP server, Data in Postgres, SQL, Provider in PHP). Slow or unstable connectivity: Collection B and
Collection C (Collection Management System, Data, SQL, spLinker in Java, Data Repository).]
                                  Figure 17. Diagram of the speciesLink Architecture




Another important feature of the speciesLink network was the development of a number of tools for
mapping, monitoring and data cleaning39. This increased the interaction between participating
collections and CRIA’s staff.
Figure 18 presents a diagram of the data cleaning process which is initiated every night. The
system identifies collections that have updated their databases and then runs the process. A report
for each collection is generated and made available on the web. Suspect records for names and
lat/long are highlighted and a number of diagrams and charts with the collection’s profile and data
cleaning progress are presented. All information is made publicly available40 so that users can also
evaluate the quality of each collection’s data.
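
The nightly cycle described above can be sketched as follows. The checks themselves are assumptions: the source only says that suspect names and lat/long values are flagged, so the name pattern and the coordinate rules below are simple stand-ins, and the data structures and function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed rule: a capitalized genus followed by up to two lowercase epithets.
NAME_PATTERN = re.compile(r"^[A-Z][a-z]+( [a-z-]+){0,2}$")


def suspect_reasons(record: dict) -> list:
    """Return the reasons (if any) why a record should be shown to the curator."""
    reasons = []
    if not NAME_PATTERN.match((record.get("scientific_name") or "").strip()):
        reasons.append("name")
    lat, lon = record.get("latitude"), record.get("longitude")
    if lat is not None and lon is not None:
        out_of_range = not (-90 <= lat <= 90 and -180 <= lon <= 180)
        if out_of_range or (lat == 0 and lon == 0):
            reasons.append("lat/long")
    return reasons


def nightly_run(collections: dict, last_run: datetime) -> dict:
    """Check only the collections updated since the previous run."""
    reports = {}
    for name, collection in collections.items():
        if collection["updated_at"] <= last_run:
            continue  # unchanged since the previous run
        suspects = []
        for record in collection["records"]:
            reasons = suspect_reasons(record)
            if reasons:
                suspects.append((record["catalog_number"], reasons))
        reports[name] = {"records": len(collection["records"]), "suspects": suspects}
    return reports


collections = {
    "Col 1": {
        "updated_at": datetime(2006, 3, 13, 18, 0),
        "records": [
            {"catalog_number": "A1", "scientific_name": "Apis mellifera",
             "latitude": -21.17, "longitude": -47.81},
            {"catalog_number": "A2", "scientific_name": "apis melifera",
             "latitude": -21.20, "longitude": -47.80},
            {"catalog_number": "A3", "scientific_name": "Melipona quadrifasciata",
             "latitude": -121.0, "longitude": -47.80},
        ],
    },
    "Col 2": {"updated_at": datetime(2006, 3, 10, 9, 0), "records": []},
}

reports = nightly_run(collections, last_run=datetime(2006, 3, 13, 2, 0))
for name, report in reports.items():
    print(name, report)
```

Flagging records rather than correcting them leaves the decision with the curator, which is consistent with the principle that data providers keep full control of their data.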
[Figure, labelled "out/2004": Collections of São Paulo (Col 1 ... Col n) and International Collections
(Col 1 ... Col n); spLink Portal (Java); daily import of updates; Local database (PostgreSQL) with
dc_tax and dc_geo; Suspect records (Perl); Tables with suspect records (PostgreSQL); Preparation of
diagrams & profiles (chart.pm, Perl); Web]
                                    Figure 18. Diagram of the data cleaning process
All systems developed by CRIA run on Intel hardware and use free and open source software and open
protocols: the Red Hat Linux operating system; the Apache web server; the Perl, PHP and Java
programming languages; and the HTTP, SOAP, XML and DiGIR protocols.

Strategic Plan
Another important study of relevance to this project, contracted by the Brazilian Ministry of Science
and Technology through the CGEE (Center for Strategic Management and Studies on Science,
Technology and Innovation) was the definition of a national strategy for the modernization of
Brazilian biological collections and the development of an integrated information system about
biodiversity.
The Brazilian Societies of Botany, Zoology, and Microbiology were invited to coordinate this
process together with CRIA. A number of documents were produced by specialists and were


39
     See the speciesLink data & tools page at http://splink.cria.org.br/tools
40
     http://splink.cria.org.br/dc


presented and discussed at a workshop held in June and the proposed strategy was presented at
a workshop held in July with approximately 80 participants, including visiting specialists from
abroad41. All documents are available on-line42 and present the state of the art of biological
collections, information systems, and the Internet in Brazil.
The strategy for the establishment of a program for the next 10 years was presented and is being
discussed within the Ministry. There already are some concrete results of this work, such as:
    A call for proposals sent out by the Ministry of Science and Technology for biological
      collections that includes setting up on-line information systems, with a total budget of R$ 5
      million for 2 years.
    A Taxonomy Program established by the National Council for Scientific and Technological
      Development (CNPq).
Another interesting development is the replication of the speciesLink experience in other states.
The following networks are to be developed in 2006 using the same standards, protocols, and
decentralized architecture, all integrated with speciesLink:
   Paraná Network of Biological Collections: this is one of the 8 projects approved in the recent
     call for proposals of the Ministry of Science and Technology and involves 8 collections from
     Paraná State;
   Biota do Espírito Santo: with 16 collections from 3 institutions (Universidade Federal do
     Espírito Santo, Museu de Biologia Mello Leitão, and INCAPER – Instituto Capixaba de
     Pesquisa, Assistência Técnica e Extensão Rural).
Discussions have also begun with the state of Bahia and with collections of the semi-arid region
of the Northeast of Brazil.




41
     http://www.cria.org.br/cgee/col
42
     http://www.cria.org.br/cgee/col/documentos


Strategy: Proposed Network
The analysis of existing experiences (GBIF, Siamazonia, Humboldt, and CRIA) with standards,
protocols, tools, and architecture indicates that there is no universal solution for all situations.
Technology for a truly distributed system exists and the speed of the Internet is increasing, but the
choice of architecture (centralized, distributed, or combined) for each situation will depend on an
evaluation of the data provider and user, and of the available resources (expertise, hardware,
software, communication). Two requirements are independent of the architecture: the data provider
must have full control over his/her data/information, and target users must have complete access to
the information they require in a format that they can use.
It is clear that the proposed architecture for ABBIF must reflect the aims of the project which
include:
     • the establishment of an integrated regional information system for the Amazonian region,
       based on free and open access to taxonomic information and specimen data;
     • the development of a system where each data/information provider or custodian will be fully
       responsible for his/her own data/information;
     • the development of a system where each provider can undertake frequent updating;
     • the development of a system that will help promote data validation;
     • the development of a system where full attribution to data/information sources is given;
     • strengthening of local stakeholders – biological collections and data custodians;
     • strengthening and integration of existing information systems at local, national, and regional
       levels; and,
     • integration of ABBIF with GBIF.
In order to propose a strategy it is important to think about the different actors that will compose the
network. The actors of ABBIF are:
    • data providers;
    • data custodians;
    • users; and
    • financing agencies.
Data providers can be biological collections, researchers carrying out inventories, taxonomic
studies, etc., and researchers of other fields with complementary data such as climate, vegetation,
satellite images, etc. They have a series of responsibilities within the network, which include following
certain standards in registering data and metadata and attesting to the quality of their data. Biological
collections as data providers must also have a clear data and information policy, allowing free and
open access to data that is not confidential or sensitive.
Data custodians or administrators of databases and/or information systems have an important
role to play. Developing, running, and maintaining information systems is a highly professional
activity. It is not for amateurs. Data custodians therefore must be trustworthy and competent and
must participate in the development or at least adopt internationally accepted standards. They
have an important role to play in offering support to data providers as to the use of standards and
must promote the interoperability and integration of systems. Data custodians must guarantee data
integrity and respect any restrictions indicated by each data provider, protecting property rights,
confidentiality and other restrictions if necessary or pertinent. Data custodians are also responsible
for system backup, migration to new technologies, and maintenance in general. It is desirable that
they have a team highly specialized in databases and qualified to develop tools of interest to data
providers and users.
Users also have an important role to play. They must adhere to adopted standards and respect
restrictions and limits of data use, acknowledging authorship and credits. They must also offer
feedback to authors and to custodians, indicating possible errors and discussing the possibility of
implementing new services.




Financing agencies must also prepare themselves for this new digital age and have a clear data
and information policy, especially for data that is born digital. Public funding of activities of
public interest should generate systems that provide free and open access to data and information
that are not confidential or sensitive. There must also be a policy to digitize historical data, such as
biological collection records, and a long term policy to maintain information systems. In a regional
information facility such as ABBIF, it is also important that an inter-agency policy be established to
maximize resources and better integrate activities.


Elements of the Architecture
The aim is the establishment of a data infrastructure open to all interested parties, where the data
provider has complete control over his/her data.

ABBIF coordination
A distributed coordinating effort is perhaps the greatest challenge to be faced. The whole concept
proposed is the strengthening of local data providers, offering the necessary infrastructure for open
and free dissemination of data, but without their losing control and responsibility for the data.
In the case of ABBIF we believe that local data custodians such as Siamazonia, Humboldt and
CRIA have a significant role to play. It is important that the project strengthens these initiatives at
the country level and, at the same time, is able to use these capacities at a regional level. Country
data custodians should act as facilitating nodes and should be part of an ABBIF development
council together with GBIF. At the same time it is important that there is a “secretariat” in place,
responsible for the network, monitoring activities and promoting ABBIF, identifying new country or
regional partners.
An ABBIF secretariat should work on coordinating and strengthening these efforts and capacities
and, at the same time, offer services to countries and institutions that want to share data but
don’t have the necessary local expertise and infrastructure.
The coordination structure should be further discussed at a workshop with country representation.

Data Providers
We think it is important to determine the target data providers of ABBIF’s initial phase. In our opinion
focus should be given to specimen and species data, so biological collections and
observation data (inventories) would be our first targets. We also believe that the organization of
data providers must be country driven, meaning that the articulation and involvement of different
providers will be carried out nationally.
Biological collections, due to the nature of their activities, are information centers. They must have
sufficient infrastructure and expertise to set up their own information system for internal purposes.
Those that also have the necessary infrastructure and expertise to hold an internet information
system available 24 hrs a day can serve their data directly to the network. Those that don’t have or
don’t want to maintain dynamic links should have a mechanism to submit, alter, and delete their
data at a regional server (or cache node).
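The submit, alter, and delete mechanism for collections without dynamic links could look roughly like the following minimal Python sketch. The class `RegionalCache`, its method names, the record fields, and the sample values are hypothetical illustrations and are not taken from any existing ABBIF, speciesLink, or GBIF software. The point it shows is that every record is keyed by its owning collection, so a provider changes only its own data.

```python
from dataclasses import dataclass, field


@dataclass
class RegionalCache:
    """Hypothetical in-memory stand-in for a regional server (cache node)."""
    # Records are keyed by (collection identifier, catalogue number).
    records: dict = field(default_factory=dict)

    def submit(self, collection: str, catalog_no: str, data: dict) -> None:
        """Add a new record, or alter an existing one, owned by `collection`."""
        self.records[(collection, catalog_no)] = dict(data)

    def delete(self, collection: str, catalog_no: str) -> bool:
        """Remove a record; return False if the collection has no such record."""
        return self.records.pop((collection, catalog_no), None) is not None

    def holdings(self, collection: str) -> list:
        """List the catalogue numbers currently mirrored for one collection."""
        return sorted(no for (coll, no) in self.records if coll == collection)


# Invented sample data for illustration only.
cache = RegionalCache()
cache.submit("COLL-A", "000123", {"scientificName": "Euterpe precatoria"})
cache.submit("COLL-A", "000123", {"scientificName": "Euterpe precatoria Mart."})  # alter
cache.submit("COLL-B", "000001", {"scientificName": "Inga edulis"})
cache.delete("COLL-B", "000001")
```

A real regional server would also need authentication of the submitting collection and persistent storage, both of which are left out of this sketch.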
Figure 19 shows a diagram of the network.




Figure 19. Component data provider: biological collections
(Diagram labels: ABBIF Portal; Regional Servers; Collections with dynamic links; Collections
mirroring their data in regional servers.)
Collections with dynamic links and regional servers must adopt compatible standards and
protocols and must be held in institutions capable of maintaining the system and serving data
through fast Internet connections.
Observation data and taxonomic descriptions represent two other groups of data providers,
made up of individuals or research groups. In this case, data custodians must offer facilities
where researchers may deposit their data for full and open access on the Internet. This is not a
task for amateurs. There must be a highly specialized staff whose main activity is the
development and maintenance of information systems that guarantee the preservation and
dissemination of data.
Based on the concept of open and free access to data, this element of the network will be called the
digital data commons space [43]. The network may have more than one server guaranteeing the
necessary infrastructure for preservation, maintenance, retrieval, and dissemination of the
data. Internet connectivity must be stable and fast (figure 20).




[43] See National Science Board. Draft Report: Long-Lived Digital Data Collections: Enabling Research and
Education in the 21st Century. NSB-05-40. March 30, 2005.
http://www.nsf.gov/nsb/meetings/2005/LLDDC_draftreport.pdf


Figure 20. Architecture element: digital data commons space
(Diagram labels: portal; Internet 2; Data commons space, shown three times; Observation data;
Taxonomic data; Other data.)
We believe that this element could involve the conservation community, which holds important
observation data that are normally disseminated through books and reports.

Portal
GBIF today has a data index that serves data to the system. A subset of over 85 million records,
with name and locality data, is harvested from 152 data providers and maintained in a centralized
database. This makes the basic search system much quicker and solves problems such as slow or
unstable connectivity. After carrying out the basic search, the user obtains a list of providers with
the number of records found. Users can then display the list of records corresponding to each
provider and download the selected records. At this point the user may choose whether to
download the data directly from the data providers or from the GBIF index (faster), and may choose
the format of the downloaded file. A map illustrating the distribution of the requested records
can also be produced dynamically.
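The index-based search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GBIF's actual implementation: the function names `harvest` and `search_index`, the field names, and the sample records are all hypothetical.

```python
from collections import Counter

# Fields kept in the central index; the full records stay with the providers.
INDEX_FIELDS = ("name", "locality")


def harvest(providers: dict) -> list:
    """Copy only the indexed subset of each provider's records into one list."""
    index = []
    for provider_id, records in providers.items():
        for rec in records:
            entry = {f: rec.get(f, "") for f in INDEX_FIELDS}
            entry["provider"] = provider_id
            index.append(entry)
    return index


def search_index(index: list, name: str) -> Counter:
    """Return the number of matching records per provider (case-insensitive)."""
    wanted = name.lower()
    return Counter(e["provider"] for e in index if e["name"].lower() == wanted)


# Invented sample data for illustration only.
providers = {
    "provider-a": [
        {"name": "Inga edulis", "locality": "Leticia", "collector": "X"},
        {"name": "Inga edulis", "locality": "Manaus", "collector": "Y"},
    ],
    "provider-b": [
        {"name": "Inga edulis", "locality": "Iquitos", "collector": "Z"},
        {"name": "Euterpe precatoria", "locality": "Iquitos", "collector": "Z"},
    ],
}
index = harvest(providers)
hits = search_index(index, "inga edulis")
```

Because the search touches only the harvested copy, its speed does not depend on the connectivity of any individual provider, which is the advantage the text attributes to the index.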
CRIA developed a fully distributed system. When a query is processed, it is sent out to the
providers, which search their databases and dynamically send back the results. At the moment, the
speciesLink Network has 6 regional servers (mirroring data from 38 collections), 2 collections with
dynamic links, one centralized database with observation data (at CRIA), and one centralized
information system of microbial collections (with 9 collections). This architecture is interesting for
advanced users, who can search any field and retrieve the full data set as a file. Speed and the
“fragility” of the network are disadvantages: if a server is offline for any reason, that “branch” of the
network will be unavailable. Maps are also produced dynamically.
CRIA also developed an indexing service for a subset of the data, which is used for data cleaning.
At the moment CRIA is considering letting users search this index of the data subset, which would
provide faster results and a more stable system. The distributed search system will continue to be
offered, however, as we believe it is very powerful and important to advanced users.
Based on the Internet connectivity study that was carried out for Latin America (RedCLARA), one
can see that some links of Amazonian countries are still not in place. At the same time, it is
important to develop a truly distributed network, helping countries “catch up” with both the
technology and the infrastructure. For this reason we believe that the ABBIF portal should have both
an index system that will harvest data from all regional providers and a distributed search service.




The index system will be used to quickly serve a data subset to users and for data cleaning. The
dynamic search system will be available for advanced users.
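A minimal sketch of how a portal could offer both search modes follows. It is an illustration under assumptions, not a specification of the ABBIF portal: the `Offline` exception, the stand-in provider endpoints, the function names, and the sample data are hypothetical. The distributed search queries each provider live and reports which branches did not answer, while the indexed search reads only the harvested copy.

```python
class Offline(Exception):
    """Raised by a (hypothetical) provider endpoint that cannot be reached."""


def distributed_search(providers: dict, query: str):
    """Query every provider live; return (results, unavailable providers)."""
    results, unavailable = {}, []
    for provider_id, endpoint in providers.items():
        try:
            results[provider_id] = endpoint(query)
        except Offline:
            # One offline server removes only its own branch of the network.
            unavailable.append(provider_id)
    return results, unavailable


def indexed_search(index: dict, query: str) -> dict:
    """Answer from the harvested copy, even when a provider is offline."""
    return {pid: [r for r in recs if query in r] for pid, recs in index.items()}


def make_provider(records, online=True):
    """Build a stand-in provider endpoint over an in-memory record list."""
    def endpoint(query):
        if not online:
            raise Offline()
        return [r for r in records if query in r]
    return endpoint


# Invented sample data for illustration only.
data = {"node-1": ["Inga edulis", "Inga alba"], "node-2": ["Inga edulis"]}
providers = {
    "node-1": make_provider(data["node-1"]),
    "node-2": make_provider(data["node-2"], online=False),
}
live, missing = distributed_search(providers, "edulis")
cached = indexed_search(data, "edulis")
```

In this example the live search loses the records of `node-2` while the indexed search still returns them, at the cost of possibly serving data that is older than what the provider currently holds.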

Resource Registry & Discovery
In a distributed environment with many data providers it is desirable to have a central registry
defining at the software level who are the network participants and how to interact with them
(species-level services might use different protocols from specimen-level services, and even
specimen-level services could potentially use different protocols among them or different protocol
versions). As the number of data providers may increase over time, a means for automatic
discovery will certainly be necessary. GBIF’s UDDI registry seems to be the most reasonable
alternative since it is already available for the whole biodiversity community and ABBIF resources
should also be integrated with the GBIF network. UDDI offers a simple mechanism to enable
configuration of thematic networks (through service categories) that could easily be used to
distinguish ABBIF participants from other resources.
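The role of such a registry can be illustrated with a small Python sketch. It does not use the UDDI API and makes no claim about how GBIF's registry is structured: the `Resource` fields, the category label `"ABBIF"`, the resource names, and the placeholder URLs are assumptions made for illustration. It shows the two things the text asks of a registry: filtering participants by thematic category and recording which protocol each one speaks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One registered service: where it is and which protocol it speaks."""
    name: str
    access_point: str
    protocol: str          # e.g. "DiGIR", "BioCASe", "TAPIR"
    categories: frozenset  # thematic networks the resource belongs to


def discover(registry: list, category: str) -> list:
    """Return the resources tagged with the given thematic category."""
    return [r for r in registry if category in r.categories]


def by_protocol(resources: list) -> dict:
    """Group resource names by protocol so a portal can pick a matching client."""
    groups = {}
    for r in resources:
        groups.setdefault(r.protocol, []).append(r.name)
    return groups


# Invented entries for illustration only; the URLs are placeholders.
registry = [
    Resource("collection-a", "http://example.org/digir", "DiGIR",
             frozenset({"ABBIF", "specimen"})),
    Resource("collection-b", "http://example.org/tapir", "TAPIR",
             frozenset({"ABBIF", "species"})),
    Resource("collection-c", "http://example.net/digir", "DiGIR",
             frozenset({"specimen"})),
]
abbif = discover(registry, "ABBIF")
protocols = by_protocol(abbif)
```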

Tools
Another important activity is the development of tools for data providers and users. These tools
should preferably be developed as web services so that they can be used more freely at all levels:
local, country, and regional.

Data archive
As a last element of the network, it would also be important to address the problem of long term
data archiving. This may also be a task for country data custodians or their partners. It is important
that the scientific council discusses this issue to determine priorities as to what data should be
added to a permanent archive and identify an institution or a pool of institutions responsible for this
activity.
Figure 21 below presents a diagram of the system.

Figure 21. Diagram of the system
(Diagram labels: Web services (maps, modeling, data cleaning, automatic georeferencing, other
services); portal; regional server; data commons space, shown twice; biological collections;
observation data; taxonomic data; long term data archive.)




Annex 1: Answers from Collections of Colombia
Collection name               Acronym   Institution             General Group    No. total   % Georeferenced          % Digitized        No. records    % Georeferenced          % Digitized
                                                                                 records                                                 Amazon         Amazon                   Amazon
Herbario Amazónico            COAH      Instituto Amazónico     Plants           107.150                              1                  160.725                                 1
Colombiano COAH                         de Investigaciones
                                        Científicas - SINCHI
CALT                          CAL                               Animals          18.464
Instituto Alexander von       IAvH      Instituto de            Animals          349.054     Birds and butterflies:   Birds and          Birds: 1817;   Birds and butterflies:   Birds and
Humboldt                                Investigación de                                     100%; other insecta:     butterflies:       butterflies:   100%; other insecta:     butterflies:
                                        Recursos Biológicos                                  50%; mammals: 10%;       100%; other        550; fishes:   50%; mammals: 10%;       100%; other
                                        "Alexander von                                       amphibia and reptilia:   insecta: 50%;      730 coll.;     amphibia and reptilia:   insecta: 50%;
                                        Humboldt"                                            0%                       mammals:           amphibia,      0%                       mammals:
                                                                                                                      10%; amphibia      mammals and                             10%; amphibia
                                                                                                                      and reptilia: 0%   reptilia:                               and reptilia: 0%
                                                                                                                                         pending
Herbario Federico Medem       FMB       Instituto de            Plants           159.000     0,85                     0,5                8.742          100                      100
IAvH                                    Investigación de
                                        Recursos Biológicos
                                        "Alexander von
                                        Humboldt"
Colección de Zoología         ICN       Instituto de Ciencias   Animals          948.368                              Birds: 100%;       Amphibia:
                                        Naturales                                                                     amphibia: 80%;     4000
                                        Universidad Nacional                                                          general: 60%
                                        de Colombia
Herbario Nacional             COL       Instituto de Ciencias   Plants           1.180.000   0,1                      0,2                Many spec.
Colombiano                              Naturales                                                                                        Quantity not
                                        Universidad Nacional                                                                             specified
                                        de Colombia
Museo Micológico - Hongos     MMUNM     Universidad Nacional    Microorganisms   2.970
fitoparásitos                           de Colombia
Museo Entomológico            MEFLG     Universidad Nacional    Animals          342.340                              0,1
"Francisco Luis Gallego"                de Colombia sede
                                        Medellín
Herbario Pontificia           HPUJ      Pontificia              Plants           71.000                               0,375
Universidad Javeriana                   Universidad
                                        Javeriana
Museo Javeriano de Historia   MPUJ      Pontificia              Animals          2.864.808   72.5% of the             1
Natural Lorenzo Uribe s.j               Universidad                                          amphibia, 92.2% of
                                        Javeriana                                            the fishes, 94.7% of
                                                                                             the reptilia, 96.7% of
                                                                                             the birds, 80.7% of
                                                                                             mammals and a low -
                                                                                             not specified-
                                                                                             percentage of the
                                                                                             Orthoptera
Herbario Nacional de          HNM       Corporación             Plants           1.420                                0,05
Malezas                                 Colombiana de
                                        Investigación
                                        Agropecuaria
                                        Corpoica

Herbario Gabriel Gutierrez     MEDEL      Universidad Nacional    Plants,           100.960                        0,5
Villegas (MEDEL)                          de Colombia sede        Microorganisms
                                          Medellín
Herbario de la Orinoquía       Llanos     Universidad de los      Plants            19.320                         0
Colombiana                                Llanos
Herbario Ciat                  CIAT       Centro Internacional    Plants            32.018                         0,98
                                          de Agricultura
                                          Tropical - CIAT
Jardín Botánico José           JBJCM      Jardín Botánico de      Plants            7.036                          100% of the
Celestino Mutis                           Bogotá J.C.M.                                                            Herbarium and
                                                                                                                   20% of fruit
                                                                                                                   collection
Herbario Forestal              UDBC       Universidad Distrital   Plants            35.000                         0,9
Universidad Distrital                     Francisco José de
Francisco José de Caldas                  Caldas
Herbario Universidad de        HUA        Universidad de          Plants            138.000     0,5                0,38            20% aprox.
Antioquia                                 Antioquia
Herbario José Cuatrecasas      VALLE      Universidad Nacional    Plants            37.438                         1
Arumi (VALLE)                             de Colombia Sede
                                          Palmira
Herbario Jardín Botánico       JAUM       Fundación Jardín        Plants            118.276     0,5                33772           Unknown
"Joaquín Antonio Uribe"                   Botánico Joaquín
                                          Antonio Uribe
Herbario CUVC                  CUVC       Universidad del Valle   Plants            106.800                        0,25
Colección Laboratorio de       CLUA       Laboratorio de          Animals           5.692                          0,5
Limnología Universidad de                 Limnología -
Antioquia                                 Universidad de
                                          Antioquia
Colección Entomológica         CEUA       Laboratorio             Animals           86.498                         1
Universidad de Antioquia                  Colecciones
                                          Entomológicas -
                                          Universidad de
                                          Antioquia
Vectores y Huéspedes           VHET       Universidad de          Animals           25.470                         0,5
Intermediarios de                         Antioquia
Enfermedades Tropicales
Museo del laboratorio de       MENT-UT    Universidad del         Animals           24.646
Entomología                               Tolima
Laboratorio de Investigación   LABUN      Universidad Nacional    Animals           40.080                         0,15
de Abejas - Labun                         de Colombia
Museo de Historia Natural      MHN-UC     Universidad del         Plants            83.742                         0,6
Universidad del Cauca                     Cauca
Entomológica Forestal          EF-UDFJC   Universidad Distrital   Animals           6.200                          1
Universidad Distrital                     Francisco José de
Francisco José de Caldas                  Caldas
Museo Historia Natural         MUD        Universidad Distrital   Plants, Animals   3.274                          1
Universidad Distrital                     Proyecto Curricular
                                          Licenciatura Biología
Colección de Artrópodos de     UVS        Universidad del Valle,  Animals           10.720      0                  0               2.000         0                  0
Importancia Médica                        Facultad de Salud
Museo de Historia Natural      MHNUPN     Universidad             Animals         58.378
Universidad Pedagógica                    Pedagógica Nacional
Nacional
Colección Biológica U.D.C.A.   UDCA       Corporación             Plants          22.018                         1
                                          Universitaria de
                                          Ciencias Aplicadas y
                                          Ambientales,
                                          U.D.C.A.
Colección Zoológica de         IMCN       Instituto para la       Animals         40.448                         0,85
Referencia Científica "IMCN"              Investigación y
                                          Preservación del
                                          Patrimonio Cultural y
                                          Natural del Valle del
                                          Cauca- INCIVA
Colección de Insectos          CIACIB     Corporación para        Animals         29.246
Acuáticos de Colombia CIB                 Investigaciones
                                          Biológicas (CIB)
Colección de Mosquitos de      CMCCIB     Corporación para        Animals         8.882
Colombia (CIB)                            Investigaciones
                                          Biológicas (CIB)
Instituto Nacional de Salud    INS        Instituto Nacional de   Animals         21.088                         0,5
                                          Salud
Colección Taxonómica           CTNI       Corporación             Animals         37.568                         0,2
Nacional de Insectos "Luis                Colombiana de
María Murillo"                            Investigación
                                          Agropecuaria
                                          Corpoica
Museo Entomológico             ME"MB"     Federación Nacional     Animals         17.070                         0,9
"Marcial Benavides"                       de Cafeteros-
                                          Cenicafé
Colección Familia              CFC                                Animals         9.000                          0
Constantino-CFC
Colección Entomología:         PC                                 Animals         104
Hemiptera Acuáticos
Hans W. Dahners                PETALUDA                           Animals         14.200                         1
Colección Piéridos de          CPCRTN                             Animals         1.834
Colombia Rodrigo Torres
Nuñez
Colección Jean Francois le     JFLC                               Animals         60.000
Crom
Colección Personal Angela      CPAA                               Animals         8.200
Amarillo
Colección Personal Carlos      CPCS                               Animals         3.200
Sarmiento
Colección "Da Ros"             C"DR"      Fundación Ciencia       Animals         4.980                          0,4
                                          Ecología, Arte e
                                          Historia Fundación
                                          C.E.A.H. (Museo
                                          Vittoriano)
Colección Entomológica        CEUNP     Universidad Nacional    Animals          57.400                         1
Universidad Nacional Sede               de Colombia sede
Palmira                                 Palmira
Colección de Insectos         CONIF     Corporacion             Animals          62.800                         0,9
asociados a plantaciones                Nacional de
forestales de Colombia                  Investigación y
                                        Fomento Forestal -
                                        CONIF
Herbario TULV - Jardín        TULV      Instituto para la       Plants           34.000      0,45               1
Botánico Juan María                     Investigación y
Céspedes                                Preservación del
                                        Patrimonio Cultural y
                                        Natural del Valle del
                                        Cauca- INCIVA
Serpentario de la             SUA       Universidad de          Animals          5.064
Universidad de Antioquia                Antioquia
Museo de Historia Natural     UPTC      Universidad             Animals          2.202
"Luis Gonzalo Andrade"                  Pedagógica y
                                        Tecnológica de
                                        Colombia, Facultad
                                        de Ciencias, Escuela
                                        de Ciencias
                                        Biológicas
Museo de Entomología de la    MUSENUV   Universidad del Valle   Animals          101.864                                                      occasional
Universidad del Valle
Vertebrados-Aves                        Universidad del Valle   Animals          12.914                         1                             occasional
                                        - Biología
Museo Entomológico            UNAB      Universidad Nacional    Animals          200
Facultad de Agronomía                   de Colombia
Colección de vertebrados,     UV-C      Universidad del Valle   Animals          28.774                                                       occasional
anfibios y reptiles                     - Biología
Herbario "Armando Dugand      DUGAND    Universidad del         Plants           6.070
Gnecco"                                 Atlántico
Colección Efraín Henao        CEH                               Animals          10.760                         0,1
Colección de Vertebrados e    MHN-Uca   Universidad de          Animals          13.416
Invertebrados                           Caldas
Colección Familia Pardo       CFPL                              Animals          11.100
Locarno
Vertebrados e Invertebrados   MHNCC     Comunidad               Animals          10.360                         0,9
                                        Hermanos Maristas
Banco de Cepas y Genes,       IBUN      Instituto de            Plants,          3.689                          0,48
Instituto de Biotecnología,             Biotecnología,          Microorganisms
Universidad Nacional de                 Universidad Nacional
Colombia                                de Colombia
Colección Microorganismos     CEN       Federación Nacional     Microorganisms   689                            0,5
de CENICAFE                             de Cafeteros -
                                        Centro Nacional de
                                        Investigaciones de


                                         Café - CENICAFE
Colección de Referencia       CRM-UV     Universidad del Valle   Animals          13.060
(Moluscos)                               - Biología
Procedencias de               FCISSPA    Fundación Centro        Plants           98                             0,9
Trichanthera Gigantea (H. &              para la Investigación
B.) Nees                                 en Sistemas
                                         Sostenibles de
                                         Producción
                                         Agropecuaria- CIPAV
Jardín Botánico Juan María    JBJMC      Instituto para la       Plants           8.824                          0,5
Céspedes                                 Investigación y
                                         Preservación del
                                         Patrimonio Cultural y
                                         Natural del Valle del
                                         Cauca- INCIVA
Fundación Zoológico           FZS        Fundación Zoológico     Animals          728
Santacruz                                Santacruz
Piscilago Zoo                 PZ         Caja Colombiana de      Animals          1.558                          0,5
                                         Subsidio Familiar -
                                         Colsubsidio
Jardín Botánico "Alejandro    JBAVH      Universidad del         Plants           1.346                          0,1
von Humboldt"                            Tolima
Hongos, Univalle              UV-mico    Universidad del Valle   Microorganisms   3.740
                                         - Facultad de Salud
Colección de                  M-UBCB     Corporación para        Microorganisms   4.069                          1
Microorganismos                          Investigaciones
                                         Biológicas (CIB)
Secretaria de Agricultura -   SA.A       Departamento de         Animals          9.854                          0,2
Antioquia                                Antioquia
Parque Zoológico Santa Fe     PZSF       Sociedad de Mejoras     Animals          3.046                          0,5
                                         Públicas de Medellín
Jardín Botánico "Joaquín      JAUM -JB   Fundación Jardín        Plants           19.000                         0,9
Antonio Uribe"                           Botánico "Joaquín
                                         Antonio Uribe"
Colección de Ciencias         MUA        Universidad de          Animals          33.224                         0,3
Naturales                                Antioquia - Museo
                                         Universitario
Xiloteca                      X-UNCM     Universidad Nacional    Plants           5.836                          1
                                         de Colombia sede
                                         Medellín
Zoológico de Barranquilla     ZOOBAQ     Fundación Botánica      Animals          848                            1
                                         y Zoológica de
                                         Barranquilla
Banco de Germoplasma de       CCoM       Corporación             Microorganisms   4.555
Microorganismos de Interés               Colombiana de
en Agricultura                           Investigación
                                         Agropecuaria
                                         Corpoica
Jardín Botánico José          JBJCM      Jardín Botánico de      Plants           31.200                         0,57


Celestino Mutis                                    Bogotá J.C.M.
Colección de Microbiología -    CIMIC              Universidad de los     Microorganisms    278                            0,7
CIMIC                                              Andes- Centro de
                                                   Investigaciones
                                                   Biológicas - CIMIC
Museo de la Salle               M.L.S. EN          Congregación           Plants, Animals   164.370                        0,25
                                ZOOLOGIA -         Hermanos Escuelas
                                B.O.G EN           Cristianas
                                BOTANICA
Jardín Botánico de Popayan      JBP                Fundación              Plants            2.028                          0,3
                                                   Universitaria de
                                                   Popayán
Cepario Corpogen                CG                 Corporación Corpogen   Microorganisms    3.634                          0,07
Colección Malacofaunica         UMNG-MT            Universidad Militar    Animals           5.480                          1
Terrestre de la Facultad de                        Nueva Granada
Ciencias de la UMNG
Colección Entomológica de       UMNG-Ins           Universidad Militar    Animals           6.148                          1
la Facultad de Ciencias de la                      Nueva Granada
Universidad Militar Nueva
Granada
Zoólogico de Cali               FZC                Fundación Zoológica    Animals           3.818                          1
                                                   de Cali
Fundación Centro de             FCP                Fundación Centro de    Animals           682                            1
Primates                                           Primates, FUCEP
Museo Entomológico Piedras      MEPB               Caja de                Animals           16.000                         1
Blancas                                            Compensación
                                                   Familiar -
                                                   COMFENALCO
                                                   ANTIOQUIA
Colección Viva Programa de      COLVIOFAR          Universidad de         Animals           520                            1
Ofidismo/Aracnidismo                               Antioquia
Universidad de Antioquia:
Ofidios - Reptiles
Colección Viva Programa de                                                Animals           236                            1
Ofidismo/Aracnidismo
Universidad de Antioquia:
Escorpiones - Artropodos
Jardín Botánico de Plantas      JB-Medicinales -   Corpoamazonia          Plants            1.028                          0,97
Medicinales del C.E.A.          CEA
Museo de Historia Natural       UAM                Universidad de la      Animals           396                            0,3
Universidad de la Amazonía                         Amazonía
Colección de Insectos           ICQ                Universidad del        Animals           6.010                          0,3
Universidad del Quindio                            Quindío
Colección Zoológica Viva        CEBTRF             Centro Estación de     Animals           616                            1
Centro Estación de Biología                        Biología Tropical
Tropical "Roberto Franco"                          "Roberto Franco"
CEBTRF                                             Facultad de
                                                   Ciencias,
                                                   Universidad Nacional


Collection name        Acronym   Observational      Data                 Data available Online?           Standards      Hardware         Staff          Internet Access   Access to
                                 records            systematization      URL                              &              and Software                                      Information
                                                                                                          Protocols
Herbario Amazónico     COAH                         Access 97,           http://www.sinchi.org.co/herb    Compatible
Colombiano COAH                                     ARC-VIEW 3.2         ario.php?page=servicios&op       with
                                                    CDS-ISIS ver.        cion=herbario&subopcion=c        standard
                                                    3.07                 oleccion                         RRBB SIB
CALT                   CAL                          Biota
Instituto Alexander    IAvH      Systematized in    Access-VBA,          Butterflies:                     Standard                        insufficient   dedicated         Unrestricted
von Humboldt                     the same database  SQL Server.          http://www.siac.net.co/sib_d     for                                                              (except for
                                                    Fishes and           escargas.php?ArchivoDespl        documentation                                                    sensitive data
                                                    insects (except      egado=635                        of                                                               for
                                                    butterflies) still                                    biological                                                       endangered
                                                    in Excel and                                          records,                                                         spp.)
                                                    may be                                                version 5.0,
                                                    incorporated                                          XML
                                                    soon
Herbario Federico      FMB                          Access-VBA,          Not yet. Soon at                 Standard                        insufficient   dedicated         Unrestricted
Medem IAvH                                          SQL Server           www.siac.net.co/sib              for                                                              (except for
                                                                                                          documentation                                                    sensitive data
                                                                                                          of                                                               for
                                                                                                          biological                                                       endangered
                                                                                                          records,                                                         spp.)
                                                                                                          version 5.0,
                                                                                                          XML
Colección de           ICN                          Spica                                                                                                dedicated         restricted
Zoología
Herbario Nacional      COL                          Spica                http://aplicaciones.virtual.un   Compatible                      Understaffed.  dedicated         unrestricted
Colombiano                                                               al.edu.co/colecciones/datos/     with                            Lacking
                                                                         herbario/consultasHerbario.j     standard                        professionals
                                                                         sp                               RRBB SIB                        and
                                                                                                                                          assistants to
                                                                                                                                          process the
                                                                                                                                          data.
                                                                                                                                          Investments
                                                                                                                                          needed
Museo Micológico -     MMUNM                        Data not
Hongos fitoparásitos                                systematized
Museo Entomológico     MEFLG                        Specify              http://www.unalmed.edu.co/
"Francisco Luis                                                          %7Ementomol/
Gallego"
Herbario Pontificia    HPUJ                         Excel                                                                Require tools    Scarce                           restricted
Universidad                                                                                                              for              economic
Javeriana                                                                                                                systematization  resources.
                                                                                                                         not defined      Curators
                                                                                                                         as yet           needing
                                                                                                                                          more time
Museo Javeriano de     MPUJ                         Excel and
Historia Natural                                    ArcView
Lorenzo Uribe s.j


Herbario Nacional de    HNM        Not specified
Malezas
Herbario Gabriel        MEDEL      FoxPro 4.0;
Gutierrez Villegas                 BRAHMS 5 to
(MEDEL)                            be implemented
Herbario de la          Llanos
Orinoquía
Colombiana
Herbario Ciat           CIAT       Oracle
Jardín Botánico José    JBJCM      Access             http://www.jbb.gov.co/web/h
Celestino Mutis                                       ome.php?pag=products
Herbario Forestal       UDBC       Access platform
Universidad Distrital              50% and ArcView
Francisco José de                  3.2 (12000
Caldas                             spec. in 6
                                   books)
Herbario Universidad    HUA        Excel                                            Compatible   adequate   insufficient
de Antioquia                                                                        with
                                                                                    standard
                                                                                    RRBB SIB
Herbario José           VALLE      Access
Cuatrecasas Arumi
(VALLE)
Herbario Jardín         JAUM       Arkas, Excel,                                                 adequate   insufficient   100 Kbps
Botánico "Joaquín                  Biotica
Antonio Uribe"
Herbario CUVC           CUVC       Excel
Colección Laboratorio   CLUA       Excel
de Limnología
Universidad de
Antioquia
Colección               CEUA       Excel
Entomológica
Universidad de
Antioquia
Vectores y              VHET       Excel, Word
Huéspedes
Intermediarios de
Enfermedades
Tropicales
Museo del laboratorio   MENT-UT    In progress
de Entomología
Laboratorio de          LABUN      FileMaker,
Investigación de                   Excel
Abejas - Labun
Museo de Historia       MHN-UC     Excel
Natural Universidad
del Cauca
Entomológica            EF-UDFJC   Excel
Forestal Universidad
Distrital Francisco


José de Caldas
Museo Historia          MUD                             Excel
Natural Universidad
Distrital
Colección de            UVS        Birds:                                  No              PC with        Staff to       unrestricted
Artrópodos de                      Threskiornithidae:                                      adequate       digitalize
Importancia Médica                 Mesembrinibis                                           programs       the data
                                   cayennensis in
                                   Putumayo
Museo de Historia       MHNUPN                          The manager
Natural Universidad                                     collection
Pedagógica Nacional
Colección Biológica     UDCA                            Excel
U.D.C.A.
Colección Zoológica     IMCN                            Excel                   Standard   insufficient   insufficient   unrestricted
de Referencia                                                                   RRBB SIB
Científica "IMCN"
Colección de Insectos   CIACIB                          FileMaker
Acuáticos de                                            (MAC)
Colombia CIB
Colección de            CMCCIB                          FileMaker
Mosquitos de                                            (MAC)
Colombia (CIB)
Instituto Nacional de   INS                             Access
Salud
Colección               CTNI                            FoxPro
Taxonómica Nacional
de Insectos "Luis
María Murillo"
Museo Entomológico      ME"MB"                          Excel
"Marcial Benavides"
Colección Familia       CFC                             Catalog, listing
Constantino-CFC                                         by families
Colección               PC
Entomología:
Hemiptera Acuáticos
Hans W. Dahners         PETALUDA                        Excel
Colección Piéridos de   CPCRTN                          The manager
Colombia Rodrigo                                        collection
Torres Nuñez
Colección Jean          JFLC
Francois le Crom
Colección Personal      CPAA
Angela Amarillo
Colección Personal      CPCS
Carlos Sarmiento
Colección "Da Ros"      C"DR"                           Access, Word,
                                                        Excel
Colección               CEUNP                           Access
Entomológica
Universidad Nacional


Sede Palmira
Colección de Insectos   CONIF     Shared
asociados a                       database
plantaciones
forestales de
Colombia
Herbario TULV -         TULV      Access
Jardín Botánico Juan
María Céspedes
Serpentario de la       SUA       Access 100%
Universidad de
Antioquia
Museo de Historia       UPTC
Natural "Luis Gonzalo
Andrade"
Museo de                MUSENUV   Arkas 2000
Entomología de la
Universidad del Valle
Vertebrados-Aves                  Access
Museo Entomológico      UNAB
Facultad de
Agronomía
Colección de            UV-C      Arkas, amphibia
vertebrados, anfibios             70%, Reptilia
y reptiles                        40%
Herbario "Armando       DUGAND    Access
Dugand Gnecco"
Colección Efraín        CEH       WinISIS
Henao
Colección de            MHN-Uca
Vertebrados e
Invertebrados
Colección Familia       CFPL
Pardo Locarno
Vertebrados e           MHNCC     Excel
Invertebrados
Banco de Cepas y        IBUN      Access
Genes, Instituto de
Biotecnología,
Universidad Nacional
de Colombia
Colección               CEN       Excel
Microorganismos de
CENICAFE
Colección de            CRM-UV    Excel
Referencia
(Moluscos)
Procedencias de         FCISSPA   Excel
Trichanthera
Gigantea (H. & B.)
Nees


Jardín Botánico Juan     JBJMC        BGRecorder
María Céspedes
Fundación Zoológico      FZS          ICOZOO
Santacruz                             (clinical histories)
Piscilago Zoo            PZ           Excel, ICOZOO
Jardín Botánico          JBAVH        Access
"Alejandro von
Humboldt"
Hongos, Univalle         UV-mico
Colección de             M-UBCB       Access
Microorganismos
Secretaria de            SA.A         Excel
Agricultura -
Antioquia
Parque Zoológico         PZSF         Excel 40%,
Santa Fe                              Word 10%
Jardín Botánico          JAUM -JB     BGRecorder
"Joaquín Antonio
Uribe"
Colección de             MUA          Not specified
Ciencias Naturales
Xiloteca                 X-UNCM       Access
Zoológico de             ZOOBAQ       ARKS 4.0-ISIS:
Barranquilla                          records;
                                      ZOOTRITION:
                                      nutritional
                                      information;
                                      ICOZOO:
                                      medical reports
Banco de                 CCoM         Excel and
Germoplasma de                        Access
Microorganismos de
Interés en Agricultura
Jardín Botánico José     JBJCM        BGRecorder
Celestino Mutis
Colección de             CIMIC        Word
Microbiología - CIMIC
Museo de la Salle        M.L.S. EN    Access
                         ZOOLOGIA -
                         B.O.G EN
                         BOTANICA
Jardín Botánico de       JBP          Excel, Access,
Popayan                               BG recorder 2.
                                      J.B.P.
Cepario Corpogen         CG           Filemaker
Colección                UMNG-MT      Excel
Malacofaunica
Terrestre de la
Facultad de Ciencias
de la UMNG
Colección                UMNG-Ins     Excel


Entomológica de la
Facultad de Ciencias
de la Universidad
Militar Nueva
Granada
Zoólogico de Cali       FZC           ARKS-ISIS and
                                      others to be
                                      implemented
Fundación Centro de     FCP           Excel
Primates
Museo Entomológico      MEPB          Excel
Piedras Blancas
Colección Viva          COLVIOFAR
Programa de
Ofidismo/Aracnidismo
Universidad de
Antioquia: Ofidios -
Reptiles
Colección Viva
Programa de
Ofidismo/Aracnidismo
Universidad de
Antioquia:
Escorpiones -
Artropodos
Jardín Botánico de      JB-           BGRecorder
Plantas Medicinales     Medicinales
del C.E.A.              -CEA
Museo de Historia       UAM           Access, Excel
Natural Universidad
de la Amazonía
Colección de Insectos   ICQ           Access
Universidad del
Quindio
Colección Zoológica     CEBTRF        Tracker Ce
Viva Centro Estación                  Brain Id
de Biología Tropical                  (includes a tool
"Roberto Franco"                      for automatic
CEBTRF                                identification)
                                      100%





				