ABBIF Proposal: Architecture
(Draft March 14, 2006)
Index
Introduction
DarwinCore
ABCD – Access to Biological Collection Data
Protocols for Data Exchange
DiGIR
BioCASe
TAPIR
GBIF Architecture
Network Infrastructure
Latin America
Brazil
Analysis
Peru
Questionnaires
Information System
Venezuela
Bolivia
Colombia
Questionnaire
Information System
French Guyana
Ecuador
Brazil
Collections
Information Systems
Strategic Plan
Strategy: Proposed Network
Elements of the Architecture
ABBIF coordination
Data Providers
Portal
Resource Registry & Discovery
Tools
Data archive
Proposal
Participants (to be confirmed)
Workshop Program
Annex 1: Answers from Collections of Colombia
Introduction
There are a number of possibilities to design an information system when its data is actually
produced and shared by different parties. Basically, a system can be centralized, distributed, or
combined (mixed), with a number of variations.
A centralized system (figure 1) is recommended when data providers do not have the necessary
infrastructure (hardware, software, connectivity) or expertise or even when data will only be
produced for that particular system.
Figure 1. Diagram of a Centralized Information System (elements shown: User, Central System, Data Providers)
With this architecture, data providers do not need to store any data locally; they usually manage
everything remotely through an administrative interface. They must also agree on a common format
and content to be implemented in the central database. The great advantages are the low
informatics demand placed on data providers and the tightly controlled system that developers
get to work on. The challenge is to keep data providers actively validating and updating their
data.
A distributed architecture is a system where the data is distributed but the query is centralized
(figure 2) or where both data and query are distributed (figure 3).
Figure 2. Distributed data: centralized query (elements shown: collections Col 1 to Col 5, Central Repository, program interface, query)
Figure 3. Distributed data and query (elements shown: collections Col 1 to Col 5, program interface, query)
Advantages include “real time” updating, clarity as to who the data provider is, and the possibility of
a closer interaction between data providers and users. Disadvantages include the greater demand
on infrastructure and expertise of each data provider and the complexity of developing and
maintaining a distributed system.
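The difference between a centralized query and a distributed query can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the providers are in-memory functions rather than remote services, and the names make_provider, centralized_query and distributed_query are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Provider = Callable[[str], List[Record]]


def make_provider(records: List[Record]) -> Provider:
    """Wrap a collection's own records in a search function (stands in for a remote service)."""
    def search(genus: str) -> List[Record]:
        return [r for r in records if r["Genus"] == genus]
    return search


def centralized_query(central_store: List[Record], genus: str) -> List[Record]:
    """Centralized query: every record was copied into one store beforehand."""
    return [r for r in central_store if r["Genus"] == genus]


def distributed_query(providers: Dict[str, Provider], genus: str) -> List[Record]:
    """Distributed query: ask each provider in turn and tag results with their origin."""
    results: List[Record] = []
    for name, search in providers.items():
        for rec in search(genus):
            results.append({**rec, "Provider": name})
    return results


col1 = [{"Genus": "Puma", "CatalogNumber": "1"}, {"Genus": "Tapirus", "CatalogNumber": "2"}]
col2 = [{"Genus": "Puma", "CatalogNumber": "77"}]
providers = {"Col 1": make_provider(col1), "Col 2": make_provider(col2)}

central_hits = centralized_query(col1 + col2, "Puma")
distributed_hits = distributed_query(providers, "Puma")
```

In the distributed case each answer carries the name of the provider that returned it, which reflects the "clarity as to who the data provider is" mentioned above; the price is that every provider must be reachable and able to answer at query time.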
The proposal is that ABBIF focuses on species and specimen data. Data will include specimen
records in biological collections, observation data of field surveys, and taxonomic names. A
strategy for each data component must be established.
The choice of the best architecture depends on the existing infrastructure and expertise of each
data provider and custodian. Besides that, biological collections hold their data using different
software on different operating systems, in different formats, and recording different data elements
(figure 4).
Figure 4. Diagram showing the complexity of integrating data from biological collections (elements shown: collections Col 1 to Col 5; software Biota, Brahms, Access, MySQL, PostgreSQL; operating systems Win98, Win2000, Linux, FreeBSD; Data Model; Communication Protocol; program interface; search)
In order to integrate these systems it is necessary that data providers agree to use a common data
exchange model.
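As a rough illustration of what agreeing on a common data exchange model means in practice, the sketch below maps records from two collections with different local field names onto shared element names. The local field names and the FIELD_MAPS table are hypothetical; only the target element names (ScientificName, Country, CatalogNumber) come from the DarwinCore list later in this document.

```python
from typing import Dict

# Hypothetical per-collection maps: local column name -> common element name.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "col1": {"especie": "ScientificName", "pais": "Country", "num": "CatalogNumber"},
    "col2": {"taxon": "ScientificName", "country": "Country", "id": "CatalogNumber"},
}


def to_common_model(collection: str, local_record: Dict[str, str]) -> Dict[str, str]:
    """Rename the mapped local fields; fields without a mapping are not exchanged."""
    mapping = FIELD_MAPS[collection]
    return {mapping[k]: v for k, v in local_record.items() if k in mapping}


a = to_common_model("col1", {"especie": "Puma concolor", "pais": "Peru", "num": "42", "obs": "x"})
b = to_common_model("col2", {"taxon": "Puma concolor", "country": "Brazil", "id": "7"})
```

Each collection keeps its own software and internal structure; only the mapping has to be written once per provider.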
To determine the best architecture to be proposed for the ABBIF network, it is important to study:
What standards and protocols are available;
What standards and protocols the existing networks of direct interest to ABBIF are
adopting; and,
What the situation of local data providers and custodians is concerning infrastructure and
expertise.
Standards
The adoption of standards and protocols for the exchange of data and information about
biodiversity is fundamental for the development of interoperable systems. In general, one can
define a standard as “something established by authority, custom, or general consent as a model
or example”1. A communication protocol can be defined as a formal description of the rules and
message formats that two systems must adopt to communicate and interact. Perhaps the most
important and best-known protocols are TCP/IP (Transmission Control Protocol / Internet Protocol),
SMTP (Simple Mail Transfer Protocol), POP (Post Office Protocol) and IMAP (Internet Message
Access Protocol). TCP/IP is the basis for data transmission through the Internet, and the other
three support electronic mail.
Standard languages such as HTML (HyperText Markup Language) and XML (eXtensible Markup
Language) are also important as they define rules for formatting the vast majority of documents
exchanged through the Internet.
1 Merriam-Webster Online Dictionary (www.webster.com)
An important group that is discussing and developing standards and protocols for data on species
and specimens is TDWG (International Working Group on Taxonomic Databases)2.
TDWG’s mission is:
To provide an international forum for biological data projects;
To develop and promote the use of standards; and
To facilitate data exchange.
A number of working groups have been established within TDWG to develop and promote the use
of standards and protocols. Of immediate interest to ABBIF are: DarwinCore; ABCD –
Access to Biological Collection Data; DiGIR; BioCASe; and TAPIR.
DarwinCore3
DarwinCore (DwC) is a standard that began to be developed within the scope of the Species
Analyst network based at the University of Kansas Natural History Museum and Biodiversity
Research Center. The idea was to define data fields common to all taxonomic groups and in this
way standardize the integration of primary data from biological collections. The standard uses XML
(defined by an XML Schema) and is used by most networks, such as GBIF4, MaNIS
(Mammal Networked Information System)5, OBIS (Ocean Biogeographic Information System)6 and
speciesLink7 in Brazil, among others.
It is based on a non-hierarchical set of data elements which include: InstitutionCode,
CollectionCode, CatalogNumber, ScientificName, BasisOfRecord, Kingdom, Phylum, Class, Order,
Family, Genus, Species, Subspecies, ScientificNameAuthor, IdentifiedBy, YearIdentified,
MonthIdentified, DayIdentified, TypeStatus, CollectorNumber, FieldNumber, Collector,
YearCollected, MonthCollected, DayCollected, JulianDay, TimeOfDay, ContinentOcean, Country,
StateProvince, County, Locality, Longitude, Latitude, CoordinatePrecision, BoundingBox,
MinimumElevation, MaximumElevation, MinimumDepth, MaximumDepth, Sex, PreparationType,
IndividualCount, PreviousCatalogNumber, RelatedCatalogNumber, RelatedCatalogItem,
RelationshipType, Notes, DateLastModified. The standard accepts extensions that have been
proposed for geospatial, curatorial, paleontology, microbial, and observation data8.
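Because the element set is flat (non-hierarchical), a single specimen record can be represented as a simple list of name and value pairs. The sketch below serializes such a record to XML with Python's standard library. The element names come from the list above; the wrapper element name "record", the absence of an XML namespace, and the sample values are assumptions made for illustration and do not reproduce the actual DarwinCore schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def record_to_xml(record: Dict[str, str]) -> str:
    """Serialize a flat record: one child element per data element, no nesting."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in record.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text: str) -> Dict[str, str]:
    """Read the flat record back into a dictionary."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "") for child in root}


sample = {
    "InstitutionCode": "XYZ",      # made-up values for illustration
    "CollectionCode": "Mammals",
    "CatalogNumber": "12345",
    "ScientificName": "Puma concolor",
    "Country": "Peru",
    "YearCollected": "1998",
}
xml_doc = record_to_xml(sample)
round_trip = xml_to_record(xml_doc)
```

The flat structure is what keeps the mapping from a local database to the exchange format simple: each local column corresponds to at most one element.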
ABCD – Access to Biological Collection Data9
ABCD is a highly structured standard for data about objects in biological collections. Its objective is
the same as that of DarwinCore, but it is far more detailed: it has around 500 elements against the
roughly 50 elements of DarwinCore. There are specific elements for observational data sets and for the
following types of collections:
Herbaria and Botanical Gardens
Zoological Collections
Culture Collections
Mycological Collections
Plant Genetic Resources
Paleontological Collections
2 http://www.tdwg.org/
3 http://darwincore.calacademy.org
4 http://www.gbif.net
5 http://elib.cs.berkeley.edu/manis/
6 http://www.iobis.org/
7 http://splink.cria.org.br
8 http://darwincore.calacademy.org/Extensions/
9 http://www.codata.org/taskgroups/TGbiocollection/
This data model is being used by the Biological Collection Access Service for Europe, BioCASE10.
Like DarwinCore, it uses XML (defined through an XML Schema). ABCD version 2.06 (see note 11) was
recommended by the TDWG meeting in St. Petersburg as the adopted version of the standard and
has since been ratified by TDWG members.
Protocols for Data Exchange
Networks that serve data from biological collections, besides using a standard data model (such as
DarwinCore and ABCD) also require a protocol for transferring data.
DiGIR12
One of the first networks of biological collections to be developed as a distributed system was The
Species Analyst (TSA), in the late 1990s. TSA used the ANSI/NISO Z39.50 protocol, which
was first adopted in 1988 and was used to interconnect libraries. It defines a communication
standard for retrieving information between computers. An important characteristic is that it
supports a client-server environment, which allows the user interface to be separated from the
data server. Z39.50 has also been implemented on a range of platforms. Whilst Z39.50 was an
effective solution, some issues with the protocol convinced the Species Analyst
network developers to look for another solution. At the time, the protocol was found to have a
complicated specification, which meant a very steep learning curve for developers; conceptual
schemas were not defined with a formal language such as XML Schema; and there
was limited support for XML and Unicode.
In order to address these issues, developers of the Species Analyst network and a number of
people involved with the TDWG13 held a small workshop in Santa Barbara to start discussing a
solution to replace Z39.50 for the biodiversity informatics community. The goal was to develop a
protocol that was based entirely on the use of XML documents for messaging between clients and
data providers, with a data transport mechanism that was predominantly based on HTTP. DiGIR
was designed to offer the same capabilities as Z39.50 except using simpler technologies and a
more formal specification for description of information resources. The result is a distributed
information retrieval solution that provides an easy entry for participation in distributed information
networks.
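The general idea of XML messages carried over HTTP can be sketched as follows. This is not the DiGIR message format: the element names (request, filter, equals, response, record), the query parameter name "doc" and the endpoint URL are all hypothetical, chosen only to show a client building an XML request, encoding it into an HTTP GET URL, and parsing an XML response. No network call is made.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlparse


def build_request(field: str, value: str) -> str:
    """Build a small XML search request (hypothetical element names)."""
    req = ET.Element("request")
    flt = ET.SubElement(req, "filter")
    eq = ET.SubElement(flt, "equals", {"field": field})
    eq.text = value
    return ET.tostring(req, encoding="unicode")


def request_url(endpoint: str, xml_doc: str) -> str:
    """Carry the XML document as a URL-encoded HTTP GET parameter."""
    return endpoint + "?" + urlencode({"doc": xml_doc})


def parse_response(xml_text: str) -> List[Dict[str, str]]:
    """Turn each <record> child of the response into a dictionary."""
    root = ET.fromstring(xml_text)
    return [{c.tag: (c.text or "") for c in rec} for rec in root.findall("record")]


xml_request = build_request("Genus", "Puma")
url = request_url("http://provider.example.org/service", xml_request)
records = parse_response(
    "<response><record><Genus>Puma</Genus><Country>Peru</Country></record></response>"
)
```

Since both the request and the response are plain XML over HTTP, a provider only needs a web server and an XML library, which is the kind of "easy entry" described above.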
DiGIR became operational in 2003 and was adopted by a number of networks such as The
Mammal Networked Information System (MaNIS), the Ocean Biogeographic Information System
(OBIS), the Global Biodiversity Information Facility (GBIF), and the speciesLink Network in Brazil.
BioCASe14
The Biological Collection Access Service for Europe (BioCASE), a network of biological collections,
adopted ABCD as the concept schema, and for this purpose modified the DiGIR protocol to meet
its needs. This modified protocol is known as the BioCASE data transmission protocol or
simply BioCASE. The protocol is based on the DiGIR protocol, but had to incorporate some
BioCASE-specific changes that unfortunately make the two incompatible.
10 http://www.biocase.org/
11 http://www.bgbm.org/TDWG/CODATA/Schema/
12 http://www.digir.net/
13 www.tdwg.org/
14 http://www.biocase.org/dev/protocol/index.shtml
TAPIR15
In 2004 GBIF promoted a study to develop a new, merged protocol that would meet the needs of
both the DiGIR and BioCASE networks (Döring & Giovanni, 2004). This protocol was named TAPIR
(TDWG Access Protocol for Information Retrieval) and is to be tested in 2006. Both networks,
BioCASE and those that have adopted DiGIR, are expected to migrate to the new protocol. The
new protocol is being tested by implementing it in two data provider software packages, one for
each of the existing network communities: BioCASe (the BioCASe PyWrapper
software) and DiGIR (a new Java provider package currently named DiGIR2). A detailed TAPIR
specification document is also being developed.
GBIF Architecture
We have discussed possible architectures (centralized, distributed, and combined or mixed) and
standards and protocols that are being adopted internationally. Another important part of this
analysis is to observe what GBIF, which openly serves species and specimen data on the
Internet, is using. GBIF plays a fundamental role as the global initiative that is integrating
species and specimen data worldwide. Whatever architecture and strategy ABBIF adopts
must be compatible with this initiative.
In 2003 GBIF established its “architecture fundamentals”, which are important and relevant when
designing an information facility (see GBIF Biodiversity Data Architecture, 2003; note 16). The basic
principle was not to impose any specific software or technology, but to keep access to
biodiversity data as the key goal.
The document presents the following basic principles:
Free access to data: this implies that any restrictions must be carried out at the data provider
level, the system would not control user access to data;
Support for global users: the idea is to enable the implementation of different human
languages in presentation services;
Consider human and machine users: the system would be implemented to be accessed by
web browsers and web services;
Consider structured and unstructured data: the document acknowledges the importance of
defining both structure and content of data (fundamental for interoperability and machine
analysis) but also includes that it is important to make unstructured data available;
Reusable, replaceable, and redundant components: the idea is to develop a framework
where new data providers can be rapidly added; promote the maintenance of persistent data
sources, as opposed to databases where their lifetimes are tied to a project; planning for
redundancy, replicating working components to different locations across the globe; and
adopting an open technology framework, where operating systems, database management
systems, web servers, programming languages, and other tools are a choice to be made by
each participant according to existing needs and skills.
GBIF has developed a network based on nodes (figure 5).
15 http://ww3.bgbm.org/tapir
16 http://circa.gbif.net/irc/DownLoad/kjeFA-J1mmGHrfOtAyTZ74s8jUwq9HoJ/p6hpeSGHkYZQWMiF42pMFYPs7fCtNHv-/GBIFBiodiversityDataArchitecture-v0.7-draft.pdf
Figure 5. GBIF Network: major classes of nodes
GBIF is responsible for running the network, establishing standards, and developing tools. The
portal is the hub for the development of any service that must be centralized such as the registry of
metadata and for serving data from the biodiversity data index to the end user. GBIF participants’
nodes are established to share biodiversity data. They may be gateways to data nodes or data
nodes themselves. They may also provide services such as mapping, analysis, and hosting of
orphaned data sets. Data nodes are primary providers of data.
When GBIF was first designed, key elements of the Portal were the Biodiversity Data Index and
the Taxonomic Name service (figure 6).
Figure 6. Diagram of the GBIF portal
The Biodiversity Data Index holds a subset of the data held by the data nodes and includes
specimen identifiers associated with identification, geospatial and temporal information.
Centralization of these subsets of data supports a much more rapid response to user queries,
minimizing network traffic. Although taxonomic names provide the primary organizational structure
for biodiversity data, no complete catalogue of names is available today. This is an ever evolving
task which requires international collaboration. GBIF is also involved in a number of initiatives to
create web services such as mapping, georeferencing, and data cleaning. The portal is now
much more complex; figure 7 presents a diagram of how the future portal is expected to
operate.
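The role of the Biodiversity Data Index can be illustrated with a small sketch: the index keeps only a subset of each record's elements, answers queries centrally, and points back to the data node that holds the full record. The particular choice of indexed fields and the function names below are assumptions for illustration, not GBIF's actual index design.

```python
from typing import Dict, List

Record = Dict[str, str]

# Assumed subset: identifiers plus identification, geospatial and temporal elements.
INDEX_FIELDS = (
    "InstitutionCode", "CollectionCode", "CatalogNumber",
    "ScientificName", "Country", "Latitude", "Longitude", "YearCollected",
)


def index_entry(node: str, record: Record) -> Record:
    """Keep only the indexed subset and remember which data node holds the full record."""
    entry = {f: record[f] for f in INDEX_FIELDS if f in record}
    entry["DataNode"] = node
    return entry


def build_index(nodes: Dict[str, List[Record]]) -> List[Record]:
    """Collect one index entry per record from every data node."""
    return [index_entry(node, rec) for node, recs in nodes.items() for rec in recs]


def search_index(index: List[Record], scientific_name: str) -> List[Record]:
    """Answer a query from the central index alone, without contacting data nodes."""
    return [e for e in index if e.get("ScientificName") == scientific_name]


nodes = {
    "node-a": [{"CatalogNumber": "1", "ScientificName": "Puma concolor",
                "Country": "Peru", "Notes": "long free text kept only at the node"}],
    "node-b": [{"CatalogNumber": "9", "ScientificName": "Tapirus terrestris",
                "Country": "Brazil"}],
}
index = build_index(nodes)
hits = search_index(index, "Puma concolor")
```

A query touches only the central index; the DataNode value in each hit tells the user where to go for the complete record.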
Figure 7. GBIF’s data portal deployment model
The central column represents functions which should be executed centrally (marked as GBIF
Secretariat). The components involved in delivery of services to end users and portals are shown
as replicated to a number of mirror sites. The Master Data Store needs to be implemented in a
single location (and should at least be associated with a "Master" instance of the Despatcher
component), but the Crawler and Validation Chain components could also be mirrored for
efficiency.
The existing GBIF UDDI registry would need significant enhancement before it could properly
support the process illustrated here.
The Schema Repository should be developed in close conjunction with the TDWG Technical
Architecture Group and can initially be represented by a small stub implementation that offers
equivalent function to the rest of the Data Portal.
The Crawler corresponds largely to the Indexer component of the existing prototype Data Portal. It
includes a scheduler which identifies data resources which should be indexed or checked for
updates and develops an appropriate strategy in each case for accessing modified data. It should
maintain a "map" monitoring the progress made in indexing any resource so that the process can
be interrupted and restarted, and also so that data providers can be notified of any records from
their resource which could not be accessed for any reason. The data offered by the Service
Registry will provide the basis for the Crawler's activity (including endpoint URLs, protocols and
data standards supported, acceptable times and days for crawling each provider's data, any
agreements made with providers as to how much data the Data Portal should cache in the Master
Data Store, etc.). The Crawler should process the data retrieved by placing an object into the
Validation Chain for each record found (new and modified records; also objects indicating the
completion of an indexer operation for a given provider to allow for clean-up of obsolete records,
etc.).
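The resumable behaviour described for the Crawler can be sketched as a paging loop that records its position in a progress map. This is a minimal sketch under stated assumptions: the function name crawl, the fetch_page callback, the dictionary-based progress map and the list standing in for the Validation Chain are all hypothetical, and scheduling and error reporting are left out.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def crawl(resource_id: str,
          fetch_page: Callable[[int, int], List[Record]],
          progress: Dict[str, int],
          chain: List[dict],
          page_size: int = 2) -> None:
    """Page through one resource, resuming from the offset stored in the progress map."""
    offset = progress.get(resource_id, 0)
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break
        for record in page:
            # One object per record found goes into the validation chain.
            chain.append({"resource": resource_id, "record": record})
        offset += len(page)
        progress[resource_id] = offset  # saved after each page so a restart can resume
    # Marker object: indexing of this resource finished, obsolete records can be cleaned up.
    chain.append({"resource": resource_id, "completed": True})


records = [{"CatalogNumber": str(i)} for i in range(5)]
fetch = lambda offset, size: records[offset:offset + size]

progress = {"res-1": 2}   # pretend an earlier run was interrupted after two records
chain: List[dict] = []
crawl("res-1", fetch, progress, chain)
```

Because the offset is stored after every page, an interrupted run picks up where it stopped instead of re-reading the whole resource.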
The Validation Chain corresponds largely to the Data Validation Services described in the GBIF
Data Portal Strategy, but also includes some other functions from the Indexer component of the
current prototype Data Portal. This is a configurable workflow component that allows a range of
processing steps to be applied to each object placed into the chain. The exact steps will vary
according to the nature of the record concerned. It will include the generation of a series of
annotations to the object based on routines to validate or interpret the data in the record. The aim
is to reach the end of the Validation Chain with a clear understanding of what the record represents
in as much detail as possible, including an evaluation of whether there are ambiguities or problems
with any of the data elements. By the end of the chain, all objects should be in a form that can
readily be stored in the Master Data Store.
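A configurable workflow of this kind can be sketched as a list of step functions applied in order, each adding annotations to the object it receives. The two checks shown (coordinate range and presence of a scientific name) and all function names are illustrative assumptions; the actual validation routines are not specified in this document.

```python
from typing import Callable, Dict, List

Obj = Dict[str, object]
Step = Callable[[Obj], Obj]


def check_coordinates(obj: Obj) -> Obj:
    """Annotate records whose coordinates are missing, non-numeric or out of range."""
    rec = obj["record"]
    try:
        lat = float(rec["Latitude"])
        lon = float(rec["Longitude"])
    except (KeyError, TypeError, ValueError):
        obj["annotations"].append("coordinates missing or not numeric")
        return obj
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        obj["annotations"].append("coordinates out of range")
    return obj


def check_name(obj: Obj) -> Obj:
    """Annotate records without a scientific name."""
    if not obj["record"].get("ScientificName"):
        obj["annotations"].append("scientific name missing")
    return obj


def run_chain(record: Dict[str, str], steps: List[Step]) -> Obj:
    """Apply each configured step in order; the steps can differ per record type."""
    obj: Obj = {"record": record, "annotations": []}
    for step in steps:
        obj = step(obj)
    return obj


good = run_chain({"ScientificName": "Puma concolor", "Latitude": "-12.0", "Longitude": "-77.0"},
                 [check_name, check_coordinates])
bad = run_chain({"Latitude": "120", "Longitude": "-77.0"},
                [check_name, check_coordinates])
```

Records are annotated rather than rejected, so ambiguities and problems travel with the object to the end of the chain.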
The Despatcher is a new addition to the model to ensure the greatest possible flexibility in how the
Data Portal may operate. The key role of this component is to forward the objects from the
Validation Chain into the Master Data Store. It will however be the natural point to process information
which should be included in a report to each data provider at the end of each visit to index their
data. Upon further review and discussion with GBIF stakeholders (including data providers) a
range of other notification services could be implemented at this point (e.g. forwarding objects or
notifications to thematic and regional portals whenever records appear which are of interest to
those portals; management of notifications to users of the addition of data relating to their taxa of
interest). Such extensions would be a future option, but the development of a generic Despatcher
will make this easy.
The Master Data Store (Data Index) is implemented as a database used solely for managing the
best possible overview of the data in the GBIF network and does not itself support requests from
users or remote portals. All such requests will be made against Slave Data Stores maintained by
MySQL replication.
The Access Portal is a layered application making use of Hibernate to access data from a Slave
Data Store and including a Service Layer implementing all logic associated with the Data Portal's
processing of data for display. Axis will be used to provide an XML access interface to the methods
offered by the Service Layer. These methods will be those required to develop an HTML User
Portal based on the GBIF data. Axis will allow these same methods to be exposed easily as SOAP
web services for use by other portals. This interface will represent a "GBIF Native Portal Interface"
which will not always map directly to TDWG standards (since frequently only a tiny number of data
elements are needed and these should be combined in different ways from the standards).
Additional access interfaces (TAPIR, WFS, etc.) can also be implemented and exposed from the
Service Layer. The Data Portal's own HTML User Portal and User Services will be implemented by
a JSP layer based on the XML Data Services (the "GBIF Native Portal Interface").
Mirroring will be implemented by a combination of multiple DNS records and Apache redirection.
But GBIF is more than just a portal. Figure 8 shows GBIF’s data-exchange architecture.
Figure 8. Diagram of GBIF’s data exchange architecture
This diagram emphasizes GBIF’s basic layers, as follows (from the bottom up):
Resources - There is an increasingly large number of digital resources relating to
biodiversity. These may be in just about any format (various databases with all kinds of data
models; human readable text documents in various formats; images; etc.) and may or may
not yet be connected to the Internet.
Access – To make these resources accessible in a practical way, it is important to select a
limited number of agreed transfer protocols and formats to expose them on the Internet.
GBIF has adopted various TDWG data standards and protocols for this purpose
(DiGIR/BioCASe/TAPIR, Darwin Core, ABCD, Taxon Concept Schema) and also expects to
handle access through plain URLs where appropriate or via Globally Unique Identifier (GUID)
resolution services as these are agreed and implemented.
Discovery – Once these resources are available on the Internet, it is important to advertise
them to potential users. GBIF has established a (UDDI) registry for this purpose to store
information describing the content and access interfaces for resources and to allow GBIF
and others to find resources of interest for various purposes. GBIF has been operating its
registry for over two years and plans soon to replace the existing implementation with one
that offers richer function for describing resources and for searching for resources of interest.
Other registries may be developed to meet the interests of different networks and
communities. These may still benefit from the use of the same protocols and data standards
adopted by the GBIF network (access to reusable software components; ability for some
resources to be part of both networks; etc.).
Indexing – Within a large distributed network, it is important to maintain a dynamic map of
the content of the network at a finer level of detail than is possible with the metadata stored
in a service registry. GBIF is therefore developing a central index of biodiversity data by
crawling the contents of resources registered in the UDDI registry. This index will itself be
exposed through a range of web services to allow users to get rapid answers to many basic
questions and to provide pointers to relevant data records throughout the network. Again it is
likely that other groups may develop their own special-purpose indexes based on the
underlying infrastructure, benefiting from the common core of standard access mechanisms
and discovery services.
Presentation & analysis – These underlying layers should provide a common set of core
services suitable for GBIF and others to build a wide range of applications and portals. GBIF
will continue to develop a central portal for rapid discovery of basic information, but other
groups may develop more specialized portals which integrate information from the central
GBIF index and all the underlying network resources with other information managed by the
groups concerned. Since the interfaces to the GBIF index and other resources will be
exposed as web services, it will also be possible to include these data within workflow
applications of various kinds.
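The Discovery layer described above can be pictured as a small registry that stores, for each resource, how it can be reached and which standards it supports. The sketch below is a plain in-memory list, not a UDDI implementation; the function names, field names and example URLs are hypothetical.

```python
from typing import Dict, List, Optional

Entry = Dict[str, str]


def register(registry: List[Entry], name: str, endpoint: str,
             protocol: str, standard: str) -> None:
    """Advertise a resource: where it is, which protocol and data standard it supports."""
    registry.append({"name": name, "endpoint": endpoint,
                     "protocol": protocol, "standard": standard})


def discover(registry: List[Entry], protocol: Optional[str] = None,
             standard: Optional[str] = None) -> List[Entry]:
    """Find resources matching the requested protocol and/or data standard."""
    return [e for e in registry
            if (protocol is None or e["protocol"] == protocol)
            and (standard is None or e["standard"] == standard)]


registry: List[Entry] = []
register(registry, "Herbarium A", "http://a.example.org/provider", "DiGIR", "DarwinCore")
register(registry, "Museum B", "http://b.example.org/provider", "BioCASe", "ABCD")
register(registry, "Collection C", "http://c.example.org/provider", "DiGIR", "DarwinCore")

digir_resources = discover(registry, protocol="DiGIR")
abcd_resources = discover(registry, standard="ABCD")
```

An indexer or a thematic portal would start from a lookup like this to decide which endpoints to contact and which protocol to speak to each of them.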
In general, GBIF expects ultimately to see increasing diversification at the higher levels in this
diagram, but strongly encourages the shared use of as many of the lower layers as makes sense
in each case. Its goal is to support the replication of the GBIF data services on a regional basis to
ensure that the information from the GBIF registry and index are available for inclusion within local
applications and portals.
We believe that ABBIF must follow GBIF’s general concept of a Network Portal with data nodes
or data providers and participant nodes that encourage local participation and may themselves
act as data nodes. It is important to analyze the answers to the questionnaires to identify local
institutions that already are GBIF nodes or that may contribute to the network.
Network Infrastructure
Another important element that helps define the architecture is the existing or potential
communication infrastructure. The present analysis is based on the document Redes Nacionais de
Educação e Pesquisa: Situação no Brasil e América Latina17 (National Research and Education
Networks: Situation in Brazil and Latin America), written to inform Brazil’s
national strategy for biological collections.
Latin America
The digital divide is a concern for scientific research because of our ever-increasing
dependency on network infrastructure and on information and communication services that are often
not available in developing countries. Latin American countries present a very
heterogeneous and fragile situation, especially when compared to more developed regions. Ten
Latin American countries have operational academic networks, the best being located in México,
Brazil, and Chile. In Colombia, El Salvador, Guatemala, Honduras, Nicaragua, Panama, Paraguay
and Peru the academic networks are still being organized.
The Americas Path (AMPATH) project led by Florida International University in 2001 established a
high performance exchange point in Miami, Florida to facilitate peering between U.S. and
international research and education networks (figure 9).
Figure 9. Diagram of the international, high-performance research connection point in Miami, Florida (AMPATH)
17 http://www.cria.org.br/cgee/documentos/redesALC310505.doc
In 2004, with the support of the European Commission (@LIS program), another network,
RedCLARA, began to operate; it will include 18 countries. This is certainly a milestone in Internet
connectivity in Latin America. Besides facilitating the development of new networks, it is
an opportunity to build common research agendas of regional and global interest.
Figure 10 presents a diagram of the network.
Figure 10. ALICE Project and Red CLARA
The topology for RedCLARA includes connections of 155 Mbps with the main national networks
(Argentina, Chile, Brazil and Mexico) and of 10 to 45 Mbps to the other South American countries.
Peru and Uruguay were recently connected and the next will be Costa Rica, El Salvador,
Nicaragua, Guatemala, Panama and Ecuador. Connections to Bolivia, Ecuador and Colombia are
planned. A connection of 622 Mbps leaves Brazil and interconnects RedCLARA to the GÉANT
network of research and education in Europe.
Tables 1 and 2 present a synthesis of the situation of national research and education networks
(NRENs). Comparing data from more developed countries with data from developing countries makes
it clear that the situation of Latin America is not good, especially when one considers new
developments and applications that work well in environments with good infrastructure but may be
prohibitive in less developed countries. Table 2 also shows what each country will gain with RedCLARA.
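A rough calculation shows why applications designed for good infrastructure may be prohibitive elsewhere. The sketch below estimates the ideal transfer time of a data set over links of different capacity. The 100 MB data set size is a hypothetical figure chosen for illustration; the link rates (64 Kbps, 10 Mbps, 155 Mbps) are values that appear in the text and in Table 2. Protocol overhead and shared use of the link are ignored, so real transfers would be slower.

```python
def transfer_seconds(size_megabytes: int, rate_bits_per_second: int) -> float:
    """Ideal transfer time: size in bits divided by link rate (1 MB = 8,000,000 bits)."""
    return size_megabytes * 8_000_000 / rate_bits_per_second


DATASET_MB = 100  # hypothetical data set size

links = {
    "64 Kbps": 64_000,
    "10 Mbps": 10_000_000,
    "155 Mbps": 155_000_000,
}

times = {name: transfer_seconds(DATASET_MB, rate) for name, rate in links.items()}

for name, seconds in times.items():
    print(f"{name}: {seconds:.1f} s ({seconds / 3600:.2f} h)")
```

Under these assumptions the same data set needs about five seconds at 155 Mbps, 80 seconds at 10 Mbps, and roughly three and a half hours at 64 Kbps.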
Table 1. National Research and Education Networks of some countries
Country | Organization | Status | Connectivity Backbone | External Capacity | Connected Institutions
Germany | G-WIN | operating | 2.5 – 10 Gbps | US - 2 x 2.5 Gbps; EU - 5 Gbps | 550
Korea | KREONET2 / KOREN | operating | 2.5 – 10 Gbps | US - 2 Gbps; EU - 155 Mbps; Japan - 2 Gbps | 277
Holland | SURFnet5 | operating | 10 Gbps | US - 10 Gbps; EU - 10 Gbps | 150
Poland | PIONER | operating | 10 Gbps | US - 2 Gbps; EU - 10 Gbps | 21 Metropolitan Network; 5 High Performance Computing Centers
France | RENATER | operating | 2.5 Gbps | EU – 10 Gbps; US – 4 x 2.5 Gbps; CA – 2 x 1 Gbps | 50
USA | Internet2 / Abilene | operating | 155 Mbps – 10 Gbps | EU – 10 Gbps; Asia – 10 Gbps | 220
Source: ICFA SCIC Report – Networking for High Energy and Nuclear Physics, February, 2004
Table 2. National Research and Education Networks of Latin America
Columns: Country; Organization; Situation; Backbone connectivity (2004, 2005); External capacity (before CLARA, after CLARA); Connected institutions
Argentina RETINA operating 256 Kbps a 45M 59 Mbps +45 Mbps 56
34 Mbps
Bolivia under 64 a 128 1.5 Mbps In 18
development Kbps negotiation
Brazil RNP operating 34M a 622 Up to 10 Gbps 555Mbps +155 Mbps 220
Mbps to 10 states
Chile (*) REUNA operating 155 Mbps 1 Gbps 45 Mbps 90 Mbps 14
Colombia Universidad de Under 2 Mbps- 34 Mbps In 43
Cauca development 34Mbps negotiation
Costa Rica CR2 Net operating 45 Mbps 45 Mbps 8 Mbps 45 Mbps 8
Cuba REDUNIV Under 64Kbps-2 6 Mbps In 23
development Mbps negotiation
Ecuador REICYT operating 128 Kbps- 45 Mbps 8 Mbps 16 Mbps 20
5Mbps
El RAICES Planning phase 10 Mbps 9
Salvador
Guatemala RAGIE Planning phase In 7
negotiation
Honduras RHUTA Planning phase In -
negotiation
México CUDI operating 2Mbps a 155 2*1 Gbps 3*155 45 Mbps 60
Mbps Mbps
Nicaragua RENIE Planning phase In 8
negotiation
Panama REDCYT operating 2-5 Mbps 45 Mbps +10 Mbps 8
Paraguay ARANDU Planning phase 128Kbps- Up to 155 2 Mbps 12 Mbps 37
10Mbps Mbps to 2
sites
Peru RAAP operating 10 Mbps 45 Mbps 45 Mbps 45Mbps 8
Uruguay RAU operating 64 Kbps a 1 Up to 100 6 Mbps 18 Mbps 46
Mbps Mbps to 12
sites
Venezuela REACCIUN operating 26 Mbps 53 Mbps + 45 Mbps 78
Source: CLARA.
In the global scenario, NRENs are constantly evolving and reaching higher levels of connectivity to
meet the requirements of new applications developed by research and education institutes
worldwide. Countries that are catching up, such as Korea, already run their backbones at Gbps
speeds, and they also invest more in guaranteeing good end-to-end connectivity at the edges of the
network. With the exception of Brazil, Mexico and Chile, Latin American countries are still at the
Mbps level (one thousand times less). Compared with developed countries, Latin America is where
those countries were 5 years ago. This is certainly a constraint on international cooperation in
science and technology.
Brazil
Brazil has had a national research and education network (RNP) since 1989. RNP integrates all 26
Brazilian states and the federal capital through a backbone of up to 10 gigabits per second. São
Paulo, Rio de Janeiro, Minas Gerais and Brasília are on a 10 Gbps backbone, while Rio Grande do Sul,
Santa Catarina, Paraná, Bahia, Pernambuco and Ceará are connected at 2.5 Gbps. The remaining states
are connected through links of up to 34 Mbps. The whole network is expected to operate at gigabit
speeds by 2007 (figure 11).
Figure 11. RNP Backbone
The national network (RNP) links about 300 universities, research institutions and federal
agencies. State networks are integrated with it and distribute connectivity from RNP's point of
presence in each state. The most important state networks are those of Santa Catarina, Paraná,
Rio de Janeiro and São Paulo. São Paulo also has a research and development program (TIDIA)18
covering different areas of information and communications technology, telecommunications and
computer networks associated with the advanced Internet.
18
Tecnologia da Informação no Desenvolvimento da Internet Avançada (www.tidia.fapesp.br/portal)
Analysis
The questionnaire that was sent out to evaluate the situation of data providers and custodians of
the region included questions of relevance for the definition of the best architecture, such as:
Standards Used
Data Model: Darwin core, ABCD, CABRI, Others (specify)
Protocol: DiGIR, BioCASE, Z39.50, http, xml, Others (specify)
Existing infra-structure: Hardware and software:
Staff: Adequate, Insufficient (specify)
Internet Access:
Type of Internet Access: None, Modem, dedicated line.
Data and Information Access Policy: Unrestricted access, Restricted access
Willingness to participate in this project.
All questionnaires from collections from countries located in the Amazonian region were analyzed.
Collections that don’t wish to participate in ABBIF or that don’t want to share their data were not
included. Institutions that don’t have specimen data were also not included.
Peru
Questionnaires
CRIA sent out the ABBIF questionnaires to 5 institutions and 15 individuals in Peru and received 7
answers. The answers from 6 institutions were sent by Siamazonia and 1 answer was sent from a
private collection. Table 3 shows the result of the questionnaire concerning standards, protocols,
infrastructure, Internet access and information policy.
Table 3. Answers to the questionnaire from institutions in Peru
Columns: Collection; total records; Digitized (no. & % of total); Georeferenced (no. & % of digitized); total records Amazon; Digitized (no. and % of total); Georeferenced (no. and % of digitized)
Siamazonia 60.000 60.000 30.000 60.000 60.000 30.000
Herbário MOL-FCF 11.428 6.857 5.000
Herbário Amazonense 130.000 32.500 22.750
Herbário Regional de Ucayali
Herbário Herrerensi 6.000 3.300 2.640 5.000 3.200 0 - 2500
UNMSM (11 collections) 1.500.000 200.000 400.000 40.000
Personal collection of leaf beetles and their host plants 100.000 5.000 5.000 5.000
Total (without Siamazonia) 1.747.428 247.657 (14%) 30.390 (12%) 415.000 100.000 (24%) 30.000 (30%)
Collection | Standards & Protocols | Infrastructure | Internet access | Information Policy | Observation
Siamazonia | DarwinCore, DiGIR, http, xml | Sufficient hardware, require software for data analysis, mirroring, Arc IMS, sufficient staff | 512 Kbps | unrestricted | GBIF node
Herbário MOL-FCF | | Require more disk space, sufficient staff | dedicated line | unrestricted |
Herbário Amazonense | DarwinCore, http, xml | computers, disk space, camera, require staff for the collection | 256 Kbps | unrestricted |
Herbário Regional de Ucayali | No answers
Herbário Herrerensi | DarwinCore, http | computers, disk space, camera, scanner and software for collection management, insufficient staff | 512 Kbps | unrestricted |
UNMSM (11 collections) | | computers, camera, scanner, memory (servers and software for image editing), require people for digitization | dedicated line | unrestricted | the museum has not adopted standards
Personal collection of leaf beetles and their host plants | | | | |
Information System
Peru is in a very good situation as it has developed Siamazonia19, the information system for
biological and environmental diversity of the Peruvian Amazon (Sistema de Información de la
Diversidad Biológica y Ambiental de la Amazonía Peruana). Siamazonia was created in 2001
through the BIODAMAZ project (Proyecto Diversidad Biológica de la Amazonía Peruana), an
agreement between Peru and Finland, and was developed by the Instituto de Investigaciones de la
Amazonía Peruana (IIAP). IIAP is a GBIF node and therefore is a natural partner of the ABBIF
network.
Its structure is based on nodes, similar to GBIF. The following diagram was taken from its website:
Figure 12. Structure of the Siamazonia Network
19
www.siamazonia.org.pe/
In the diagram, the facilitating node is IIAP, which has committed itself to the long-term
development and maintenance of the system's secretarial, technical, and administrative tasks.
Principal nodes are universities or their museums, research institutes, and other institutions
with valuable information resources and an interest in participating in the development of the
system. Representatives of IIAP and the principal nodes constitute the Steering Committee, which
is the main decision-making body of the system. Additional nodes may include a broad category of
institutions that are of interest to the network but do not fulfill the requirements of principal nodes.
IIAP produced a technical document presenting an overview of the architecture for the planned
Peruvian Amazonian Biodiversity and Environmental Information System (IIAP, 2004)20. This
document is a result of five regional workshops held during the months of March and April, 2001.
Besides its proposed node structure, the document also presents a diagram of the information
system (figure 13) where it includes a linkage to GBIF.
Figure 13. General Structure of the Information System (IIAP, 2004)
The document also states that databases are, in general, freely accessible.
Siamazonia is already serving data to GBIF using DiGIR. One resource is “Observations of flora y
fauna of Peruvian Amazon by BIODAMAZ project”, with 477 records and 112 taxa, and the other is
“Information of Flora and Fauna in Varzeas (Peruvian Amazon)”, with 11.009 records and
3.218 taxa.
20
Sistema de Información de la Diversidad Biológica y Ambiental de la Amazonía Peruana (SIAMAZONIA),
http://www.iiap.org.pe/biodamaz/faseii/download/literatura_gris/2.pdf
Serie IIAP-BIODAMAZ, ISBN N° 9972-667-10-3, 2004.
Venezuela
Questionnaires were sent to 21 institutions and 28 individuals and 11 answers were received.
Table 4 and 5 present the answers to the questionnaire as to available data, digitization, facilities,
and policy.
Table 4. Answers to the questionnaire concerning no. of records and digitization
Acronym | Checklists | Group | No. records (total) | No. georef. | No. rec. digitized | Amazon | Georef. Amazon | Digit. Amazon | Software
COP | | birds | 80.000 | 80.000 | 80.000 | | | | BIOTA database
EBRG | Reports to MARN on diverse vertebrates | vertebrates | 61.529 | 6.000 | 6.000 | 7.404 | 740 | 7.404 | Excel
PORT (BioCentro) | Checklists of Amazonian phanerogams | phanerogams | 100.000 | | 7.500 | | | | Visual Basic, Access
ecoSIG - interested in participating as data custodians | Databases on Amazonian amphibians, birds and mammals | maps, lists of species (amphibian, birds and mammals) | | | | | | | Geographic information system (Sistema de Información Geográfica), Arc-View
GUYN | | phanerogams, cryptogams | 18.625 | 13.000 | | | | | Access
BioCentro - Museo de Zoologia | | fish | 53.000 | 40.000 | 53.000 | 10.000 | 10.000 | 10.000 | Specify
VEN | | plants, fungi, algae | 350.000 | 35.000 | 113.000 | 27.200 | | 27.200 | Access
MHNLS | | phanerogams, vertebrates, invertebrates | 190.000 | 190.000 | 190.000 | 5.399 | 5.399 | 5.399 | WinISIS
MBUCV | | terrestrial vertebrates | 14.898 | 10.000 | 14.898 | 1.628 | 1.628 | 1.628 | Excel
MIZA | | insecta | 2.500.000 | 5.289 | 5.289 | 500.000 | 768 | 768 | PostgreSQL, PHP, Excel
ULABG | | amphibian, reptiles | 200 | 0 | 0 | 200 | | |
Total | | | 3.368.252 | 379.289 | 469.687 | 551.831 | 18.535 | 52.399 |
% | | | 100,0% | 11,3% | 13,9% | 16,4% | 3,4% | 9,5% |
An important feature is that, with the exception of one institution, all are in the process of
digitizing their holdings. About 14% of the more than 3 million specimens are digitized, and the
number of georeferenced records is high (over 80% of the number of digitized records). Leaving
aside MIZA’s insect collection (2.5 million specimens, of which about 0.2% are digitized), the
remaining collections hold over 850 thousand specimens, of which more than 460 thousand (over
50%) are digitized.
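The percentages quoted above can be re-derived from the totals of Table 4. The short Python sketch below is only an arithmetic check; the variable names are illustrative and the figures are copied from the "Total" and MIZA rows of the table.

```python
# Figures copied from Table 4 (Venezuela): the "Total" row and the MIZA row.
total_records = 3_368_252
total_georef = 379_289
total_digitized = 469_687
miza_records = 2_500_000
miza_digitized = 5_289


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


digitized_share = pct(total_digitized, total_records)  # share of all records digitized
georef_share = pct(total_georef, total_digitized)      # georeferenced vs. digitized
miza_share = pct(miza_digitized, miza_records)         # MIZA's digitized share

# The same calculation with MIZA's insect collection left out.
rest_records = total_records - miza_records
rest_digitized = total_digitized - miza_digitized
rest_share = pct(rest_digitized, rest_records)

print(digitized_share, georef_share, miza_share, rest_records, rest_digitized, rest_share)
```

Without MIZA, the remaining collections hold 868,252 specimens, of which 464,398 are digitized, which is where the "over 850 thousand" and "more than 460 thousand" figures come from.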
Table 5. Answers to the questionnaire concerning on-line data
Columns: Acronym; available online; Standards & Protocols; Adequate Hardware & Software; Adequate Staff; Internet Access; Restricted access; Willingness to participate
COP no yes yes ABA 256 yes not before knowing ABBIF conditions and aims
EBRG no no no none yes
PORT (BioCentro) no no no none yes yes
ecoSIG - interested in participating as data custodians http://ecosig.ivic.ve yes yes 100 Mbps no yes
GUYN no no no dedicated line yes yes
BioCentro - Museo de Zoologia no DIGIR yes no none yes yes
VEN no no no no yes
MHNLS no yes yes 192/128 no yes
MBUCV no no no none yes
MIZA http://www.miza-fpolar.info.ve Darwin Core, http no no none yes yes
ULABG no no no dedicated line yes yes, if of mutual benefit
Although digitization seems to be under way, the same does not apply to the on-line availability
of data. Venezuela, unlike Peru, does not have a GBIF node or a local organization working on an
information system to integrate biodiversity data. Two institutions indicated that they have
data on-line: Ecosig, a geographic information system; and the Museo del Instituto de Zoología
Agrícola (MIZA). MIZA is the collection with the largest holding (2.5 million specimens, insects),
of which only about 0.2% is digitized.
Venezuela needs resources to digitize data and to develop a system that makes biodiversity data
available on the Internet. A project was recently approved to develop an integrated information
system for vertebrate collections in Venezuela. It involves the following institutions: Museo de
Historia Natural La Salle (MHNLS); Museo de Biología de la Universidad Central de Venezuela
(MBUCV); Museo Estación Biológica Rancho Grande (EBRG); and the Colección Ornitológica Phelps
(COP), all of which answered the ABBIF questionnaire and together are responsible for 350 thousand
specimens. The focal point of this project is MHNLS. We believe that this project can learn from
experiences such as GBIF, CRIA and Siamazonia and use the open source developments that are
available.
Bolivia
The questionnaire was sent to 14 institutions and 13 individuals and CRIA received answers from
one herbarium and two zoological collections all from the Museo de Historia Natural Noel Kempff
Mercado (MHNNKM)21.
21
http://www.museonoelkempff.org
Collection | No. specimens | Software | Data online | Protocol | Staff, Software & Hardware | Internet | Information | Willingness to participate
Herbario del Oriente Boliviano | 65.000 | Excel | no | http | Inadequate | ADSL | Restricted | yes
Zoological collection | 92.970 | Excel | no | http | Inadequate | ADSL | yes | yes
Entomological collection | 500.000 | Excel | no | http | Inadequate | ADSL | yes | yes
Total | 657.970
There was no information on the percentage of digitized or georeferenced data. The Museum’s
website holds information on research that is being carried out, on maps and also presents lists of
species from the project of the Fundación para la Conservación del Bosque Seco Chiquitano,
Cerrado y Pantanal Boliviano (FCBC). Data of the collections’ holdings are not available on-line.
Colombia
Questionnaire
Based on the web survey of possible data providers carried out at the beginning of the
project, 124 questionnaires were sent to Colombian institutions and 133 to individuals. Only 9
answers were received but, as in the case of Peru, Colombia has a GBIF node, the Alexander von
Humboldt Biological Research Institute. The Institute was contacted directly and carried out a very
good survey of 93 institutions involving 29 herbaria, 60 zoological and 9 microbial collections.
Together these institutions hold a total of 4.071.632 records, more than 50% of them digitized and
about 10% of the digitized records georeferenced. Only approximately 2% of the records are from the
Amazon region, but this is undoubtedly an important initiative to be sponsored (tables in annex 1).
Information System
The Alexander von Humboldt Biological Research Institute is a natural partner of ABBIF, as it is
a GBIF node and is responsible for the Biodiversity Information System SIB (Sistema de
Información sobre Biodiversidad22) that is being developed in Colombia.
SIB is being implemented as a distributed network. Humboldt is the leading institution, and is
officially23 responsible for its design, implementation and general coordination. The structure
includes a Technical Committee that is responsible for:
defining general aspects of a national policy for biodiversity data and information
management;
validating technical elements and providing recommendations as to the implementation of
SIB at a local, regional, and national level;
establishing a line of capacity building, replicating the SIB model and promoting expertise in
other entities; and,
facilitating the articulation of SIB with other information initiatives at the national, regional,
and global level.
The technical committee today is composed of members of the following institutions:
Instituto Amazónico de Investigaciones Científicas – SINCHI
Instituto Alexander von Humboldt
Instituto de Hidrología Meteorología y Estudios Ambientales – IDEAM
Instituto de Investigaciones Marinas y Costeras José Benito Vives de Andréis – INVEMAR
Instituto de Investigaciones Ambientales del Pacífico – IIAP
22
http://www.siac.net.co/Home.php
23
Ley 99 de 1993 y los decretos reglamentarios 1600 y 1603 de 1994
Instituto de Ciencias Naturales de la Universidad Nacional – ICN
Ministerio de Ambiente, Vivienda y Desarrollo Territorial
SIB is also composed of regional and thematic networks.
The data model in use combines the DarwinCore V2 standard (as the minimum acceptable
content) with the Estándar para intercambiar información al nivel de organismo (standard for
exchanging organism-level information) designed by the project team24. The interface for
distributed searches is not available. SIB's communication protocol was being developed
concurrently with DiGIR, but it now seems clear that sharing data with other initiatives requires
a common protocol. GBIF recommended that SIB use TAPIR, which should be ready for testing in the
near future.
SIB (Dec 2005) has four datasets publicly available: Butterflies of the Schmidt-Mumm Biological
Collection (Humboldt), Pteridophyta of the FMB biological collection (Humboldt), Leguminosae of
the FMB biological collection (Humboldt) and Selected records from the National Herbarium of
Colombia (ICN-UN). According to Ángela M. Suárez-Mayorga25, standardization and proper
documentation of biological data and/or metadata are ongoing in nearly 30 organizations and four
networks of data administrators: Red Nacional de Observadores de Aves, Red Nacional de
Jardines Botánicos, Red de Colecciones Biológicas de los Andes and SIRAP-Eje Cafetero. All are
also documenting metadata for biological datasets, following the "Estándar para la documentación
de metadatos de conjuntos de datos relacionados con biodiversidad"26. One problem that Humboldt
faces, as do many other data custodians, is convincing data providers to openly share their data
on the Internet.
French Guiana
Although French Guiana is an overseas department of France and, consequently, is politically a
part of Europe, it is located in South America and for this reason will be included in this report as
an “Amazonian country”.
The “Herbier de Guyane (CAY)”, a center of the Institut de Recherche pour le Développement
(IRD) in Cayenne, answered the questionnaire. The herbarium houses approximately 160,000
vascular plant, bryophyte, and fungal specimens collected in the Guianas, mainly in French
Guiana. More than 125,600 of them are digitized in the AUBLET2 database (2,500 with digitized
images) and ca. 90,000 are georeferenced. Of the 160,000 specimens, 452 are nomenclatural types.
IRD is a member of the “Flora of the Guianas” consortium. Moreover, the herbarium contributes to
the “Flora Neotropica” program (New York Botanical Garden) and to the “Checklist of the vascular
plants of the Guyana Shield” (Smithsonian Institution, Washington DC).
In percentage terms, the digitized specimens correspond to 78.5% of the holdings and the
georeferenced ones to 56%. The database software used is Oracle and the data model is RIHA
(Réseau Informatique des Herbiers Africains), which is compatible with ABCD. BioCASe is used as
the communication protocol.
The data are freely available on-line at http://www.cayenne.ird.fr/aublet2/ and the herbarium also
serves data through GBIF (123,634 records).
The herbarium requires scanning equipment for digitizing images of the specimens, especially the
types. It is also interested in establishing a digitizing program for the non-vascular plants and
the Guiana Shield collections, which would require additional staff.
The herbarium is open to new partnerships with the Amazonian countries to share experience, and
its data are available without restrictions. These data could therefore be immediately
available to ABBIF.
24
http://www.siac.net.co/sib_descargas.php
25
personal email January 10, 2006
26
http://www.siac.net.co/sib_descargas.php
Ecuador
The questionnaire was sent to 16 institutions and 23 individuals, based on the Internet survey.
Three answers were received, including 2 collections that have interest in participating in ABBIF.
Name coll. | Acronym | Checklists | No. Specimens | % Digitized | No. records Amazon | Software
Unión Mundial para la Naturaleza | UICN | UICN databases | | | | Access, Cold Fusion
Escuela Superior Politecnica de Chimborazo (ESPOCH), Herbarium | CHEP | TROPICOS database | 8.700 | 5.000 | 1.500 | TROPICOS (pick)
Pontificia Universidad Católica del Ecuador | Herbario QCA | Checklists for Ecuadorian Angiosperms and Pteridophytes | 250.000 | 30.000 | 10.000 | Filemaker Pro

Acronym | Data available online? URL | Standards & Protocols | Hardware & Software | Staff | Access to Internet | Information access policy | Willingness to participate
IUCN | www.sur.iucn.org | http | Adequate | Inadequate | dedicated 512 | Restricted | Yes
CHEP | no | Tropicos (Pick) | Inadequate | Inadequate | Modem | Unrestricted | Yes
Herbario QCA | no | | Inadequate | Inadequate | Ethernet, 54kbp | Unrestricted | Yes
These collections require support to digitize their holdings and to make data available on the
Internet. IUCN, although leading the Conservation Commons initiative27, does not have a clear data
sharing policy in Ecuador, nor does it have species data readily available.
The herbarium of the Pontificia Universidad Católica del Ecuador collaborates with the Missouri
Botanical Garden, the Herbario Nacional at the Museo Ecuatoriano de Ciencias Naturales and the
Department of Systematic Botany of Aarhus University in the Catalogue of the Vascular Plants of
Ecuador project28. A possible strategy is to establish a regional server with a portal at QCA to
begin structuring local data and to serve data to the ABBIF network.
Brazil
Collections
Brazil has two very important projects underway that are of direct interest to ABBIF: the
speciesLink network29 and PPBio – MCT30, the biodiversity research program of the Ministry of
Science and Technology.
The speciesLink network involves 40 collections, one centralized information system of observation
data from São Paulo State (SinBiota31) and one centralized network with 9 microbial collections
(SICol32):
27
http://www.conservationcommons.org
28
http://www.mobot.org/mobot/research/ecuador/welcome.shtml
29
http://splink.cria.org.br/
30
http://ppbio.inpa.gov.br/
31
http://sinbiota.cria.org.br/atlas/
Collections no. of records Digitized (no. and % of total) Georeferenced (no. and % of digitized)
Plants (herbaria, algae, wood) 1.430.250 289.487 (20%) 70.544 (24%)
Zoological collections 935.523 374.208 (40%) 191.101 (51%)
Microbial Collections 8.724 8.724 (100%) 0 (0%)
Observation Data 71.866 71.866 (100%) 71.866 (100%)
Total records 2.446.363 744.285 (30%) 333.511 (45%)
The speciesLink network holds 127.473 records from the Amazon region in Brazil and 122.624 from
other Amazon Basin countries. Of this total (250.097), 128.097 are georeferenced. It is important
to stress that some very specialized collections in São Paulo have important holdings from the
Amazon region, such as:
the fish collection of the Museum of Zoology of the University of São Paulo (MZUSP); and
the bee collection (RPSP) of the biology department of FFCLRP/USP.
PPBio in its first phase is concentrating on the Amazon and the semi-arid regions. Partner
institutions include INPA – Instituto Nacional de Pesquisas da Amazônia; MPEG – Museu
Paraense Emílio Goeldi; and INSA-CF – Instituto Nacional do Semi-Árido Celso Furtado. These
institutions will have support to digitize and make their data available on-line.
The tables that follow present a summary of the status of the institutions that answered the
questionnaire.
Collection Plants Animals Micro. total Digitized (no. and % of Georeferenced (no. and % of
records total) digitized)
DZSJRP - pisces 1 7.500 7.500 4.684
UNIR - Fish 1 23.229 23.229 23.229
UNIR - Mammals 1
(CRM)
MEFEIS 1 10.200 1.000
UFRR 1 2.751 2.751
UFAM Coleção 1 191.992 0 0
Zoológica
MIRR 1 5.914 4.666
INPA - Peixes 1 24.536 17.000
INPA - CMIM 1 7.459 7.459 0
INPA - Mammals 1 4.819 4.819
INPA - invert. 1 303.015 1.022
INPA - Amphi 1 13.500 13.500 6.750
INPA - Herbaria 1 215.000 200.000 86.000
INPA - Aves 1 631 631 400
JBRJ 1 410.000 40.000
SPF - USP 1 145.000 15.000 7.000
Herbário MG 1 174.000 165.000
Instituto Butantan - 1 9.298 2.295 593
IBSP
HRCB 1 40.000 5.000 0
MPEG - Invert 1 2.000.000 20.000
MPEG - Fish 1 11.000 8.500
MPEG - Herp. 1 60.000 58.000 2.000
IPT - BCTw (xiloteca) 1 19.500 7.600 0
MPEG - Masto 1 34.000 16.000 1.000
MPEG - Coleção 1 74.965 71.200
Ornitológica
INPA - xiloteca 1 10.392 3.100
Total 9 16 1 3.798.701 695.272 (18%) 131.656 (19%)
32
http://sicol.cria.org.br/cv/
Records from the Amazon Basin:
Collection Plants Animals Micro. total Digitized (no. Georeferenced (no. on-line
records and % of total) and % of digitized)
amazon
DZSJRP - pisces 1 390 309 343 splink.cria.org.br
UNIR - Fish 1 23.229 23.229 23.229 no
UNIR - Mammals 1 no (100% digitized -
(CRM) Word)
MEFEIS 1 splink.cria.org.br
UFRR 1 2.751 2.751 no
UFAM Coleção 1 191.992 0 0 no
Zoológica
MIRR 1 no
INPA - Peixes 1 no
INPA - CMIM 1 no
INPA - Mammals 1 4.819 4.819 no
INPA - invert. 1 no
INPA - Amphi 1 13.500 13.500 6.750 no
INPA - Herbaria 1 no
INPA - Aves 1 631 631 400
JBRJ 1 splink.cria.org.br
SPF - USP 1 splink.cria.org.br
Herbário MG 1 no
Instituto Butantan 1 splink.cria.org.br
- IBSP
HRCB 1 splink.cria.org.br
MPEG - Invert 1 no
MPEG - Fish 1 8.000 8.000 no
MPEG - Herp. 1 59.000 58.000 2.000 no
IPT - BCTw 1 splink.cria.org.br
(xiloteca)
MPEG - Masto 1 32.000 16.000 no
MPEG - Coleção 1 63.720 60.500 no
Ornitológica
INPA - xiloteca 1 8.300 no
Total 9 16 1 408.332 187.739 (46%) 32.722 (17%)
Information Systems
Brazil has been actively involved in the discussions of the clearing-house mechanism of the
Convention on Biological Diversity and in the discussion of the establishment of biodiversity
information systems. Although the country is not a GBIF member, CRIA has been participating in a
number of international initiatives collaborating in the establishment of standards, protocols and
information systems. This experience and the products of this work will certainly contribute to the
establishment of ABBIF.
CRIA developed an information network to link data from biological collections located in the State
of São Paulo called speciesLink and a centralized information system called SinBiota to receive
data from surveys carried out by researchers financed through the Biota/Fapesp Program33. Both
of these developments are project based and were financed by The State of São Paulo Research
Foundation (Fapesp). Another system developed by CRIA is SICol, a centralized information
system with data from microbial collections of biotechnological interest.
SinBiota
SinBiota34 adopted a centralized model (figure 14). Data providers are individual researchers or
groups working in the field, and it is not realistic to expect each researcher to maintain his/her
data in a private information system on the Internet that is interoperable with the databases of
other researchers. The natural strategy was therefore to develop a centralized database that each
researcher feeds through a password-controlled web interface. A common format for field records
was adopted for all taxa, and all researchers use the same web interface to enter, alter, or delete
data. A list of species is associated with each field record. The web server, besides freely and
openly serving data to any Internet user, integrates the database with maps through the
mapCRIA35 web service.
33
www.biota.org.br
34
http://sinbiota.cria.org.br/atlas/
[Diagram labels: User; Web server; Map Service; Maps; Database of surveys and associated lists of species; Web Interface; Researcher of the Biota/Fapesp Program]
Figure 14. Diagram of SinBiota’s architecture
SICol
Before defining the architecture for the microbial culture collection information system, a survey
was carried out to determine what infrastructure and expertise were available. Holdings of
microbial collections are small compared with herbaria and most zoological collections. The survey
showed that most collections use spreadsheets or text files to organize their data and face
problems such as a lack of local expertise in informatics and inadequate Internet access. For these
reasons a centralized system (figure 15) was developed where collections could “deposit” their
data. This system was named SICol36.
35
http://www.cria.org.br/mapcria/doc/
36
http://sicol.cria.org.br/
[Diagram labels: Data providers (culture collections); users; HTTP; administrative interface; Perl & Apache; virtual catalog; updates; queries; SQL; relational database (PostgreSQL)]
Figure 15. Diagram of SICol’s Architecture
A user friendly interface was developed to allow curators to simply upload a flat file with the data of
their holdings. An extension of DarwinCore for microbial data was developed based on CABRI37
guidelines for minimum, recommended and full data sets for catalogue production.
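As an illustration of the flat-file upload, the Python sketch below reads a tab-delimited file and separates rows that carry a minimum data set from rows that do not. It is only a sketch under stated assumptions: the field names in MINIMUM_FIELDS are hypothetical (this report does not list the CABRI-based fields), and the function is not SICol's actual code, which is written in Perl.

```python
import csv
import io

# Hypothetical minimum data set; the real CABRI-based field list is not given in this report.
MINIMUM_FIELDS = ("collection_code", "catalog_number", "scientific_name")


def load_flat_file(text, delimiter="\t"):
    """Split an uploaded flat file into accepted rows and (line number, missing fields) pairs."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    accepted, rejected = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 holds the column names
        missing = [f for f in MINIMUM_FIELDS if not (row.get(f) or "").strip()]
        if missing:
            rejected.append((line_no, missing))
        else:
            accepted.append(row)
    return accepted, rejected


# Illustrative upload: the second record lacks a scientific name.
sample = (
    "collection_code\tcatalog_number\tscientific_name\n"
    "XYZ\t0001\tBacillus subtilis\n"
    "XYZ\t0002\t\n"
)
accepted, rejected = load_flat_file(sample)
print(len(accepted), rejected)
```

Reporting rejected rows by line number lets a curator fix the spreadsheet and upload it again without needing local informatics support.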
speciesLink
The third system developed by CRIA was speciesLink38. The aim was to integrate data from
biological collections located in the State of São Paulo that were willing to share their data. The
system also had to be interoperable with SinBiota and The Species Analyst Network. This clearly
called for a distributed architecture, but one that also acknowledged problems such as lack of
expertise and poor Internet connectivity.
speciesLink is based on a DiGIR network, which typically involves 3 components:
Presentation layer: the software that interacts with the user offering a friendly interface for
queries and presentation of the results. This layer also interacts with the next layer, the
portal.
Portal: the portal is responsible for the distribution of messages. It is the software
responsible for receiving queries from the presentation layer and distributing them to each
data provider connected to the network. Communication with the providers is carried out
using the DiGIR protocol.
Provider: is the software responsible for receiving queries from the portal and translating
them to the query language used by the local database. The translation process includes
mapping of the local fields according to the conceptual schema used by the network.
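The division of labour between the three components can be sketched in a few lines of Python. This is a simplified in-memory model, not the DiGIR software: real DiGIR messages are XML documents exchanged over HTTP, whereas here the portal calls the providers directly. The class names, the concept names and the sample records are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Translates shared concepts to local field names and answers queries."""
    name: str
    concept_to_local: dict  # shared concept name -> local field name
    rows: list              # local records, keyed by local field names

    def search(self, concept, value):
        local_field = self.concept_to_local.get(concept)
        if local_field is None:  # this collection does not map the concept
            return []
        hits = [r for r in self.rows if r.get(local_field) == value]
        # Translate each hit back into the shared conceptual schema.
        return [{c: r.get(l) for c, l in self.concept_to_local.items()} for r in hits]


@dataclass
class Portal:
    """Distributes one query to every provider and merges the answers."""
    providers: list = field(default_factory=list)

    def search(self, concept, value):
        results = []
        for provider in self.providers:
            for record in provider.search(concept, value):
                results.append({"provider": provider.name, **record})
        return results


def present(results):
    """Presentation layer: format the merged results for the user."""
    return [f'{r["provider"]}: {r["ScientificName"]} ({r["CatalogNumber"]})' for r in results]


# Two hypothetical collections with different local field names.
herb_a = Provider("HerbA", {"ScientificName": "species", "CatalogNumber": "id"},
                  [{"species": "Inga edulis", "id": "A-1"},
                   {"species": "Cedrela odorata", "id": "A-2"}])
herb_b = Provider("HerbB", {"ScientificName": "taxon", "CatalogNumber": "num"},
                  [{"taxon": "Inga edulis", "num": "B-9"}])

lines = present(Portal([herb_a, herb_b]).search("ScientificName", "Inga edulis"))
print(lines)
```

The point of the sketch is the mapping step: each provider keeps its own field names, and only the mapping to the shared conceptual schema has to be agreed across the network.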
The original idea was to connect each collection directly to the portal through this protocol.
However, because of poor connectivity, infrastructure and/or expertise, the solution adopted was
to develop regional servers that mirror the data held by these collections (figure 16).
37
CABRI (Common Access to Biological Resources and Information) – www.cabri.org
38
http://splink.cria.org.br/
[Diagram labels: users; queries; virtual catalog; HTTP / XML; DiGIR Portal; DiGIR Providers; Regional Server; SOAP; Collections (data providers)]
Figure 16. Diagram of a combined system
For this architecture other interfaces were developed to read records and update the databases
held at the regional server. Filters that allow the curator to omit sensitive data and have full control
over the data he/she wishes to make freely available were also developed. Figure 17 presents the
diagram of the architecture adopted by speciesLink.
[Diagram labels: speciesLink site (Presentation Layer, DiGIR Portal, DiGIR Lib). Fast and stable connectivity: Collection A (Collection Management System, SQL, Data, PHP Provider); Regional Server (Postgres, Data, PHP Provider, Mirror, SOAP server). Slow or unstable connectivity: Collection B and Collection C (Collection Management System, SQL, Data, Java spLinker, Data Repository)]
Figure 17. Diagram of the speciesLink Architecture
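The curator-controlled filters mentioned above can be pictured as a step applied to local records before they are copied to the regional server. The Python sketch below is an assumption about how such a filter could work, not a description of the spLinker tool; the function name, field names and sample records are hypothetical.

```python
def apply_curator_filter(records, hidden_fields=(), withheld=lambda rec: False):
    """Return the copy of `records` that may be sent to the regional mirror.

    hidden_fields: field names removed from every published record.
    withheld: predicate; records for which it returns True are not published.
    """
    public = []
    for rec in records:
        if withheld(rec):
            continue
        public.append({k: v for k, v in rec.items() if k not in hidden_fields})
    return public


# Hypothetical local records; "restricted" is a flag set by the curator.
local_records = [
    {"id": "A-1", "species": "Inga edulis", "collector_notes": "private", "restricted": False},
    {"id": "A-2", "species": "Swietenia macrophylla", "collector_notes": "", "restricted": True},
]
public_records = apply_curator_filter(
    local_records,
    hidden_fields={"collector_notes", "restricted"},
    withheld=lambda rec: rec["restricted"],
)
print(public_records)
```

The local database is left untouched; only the filtered copy leaves the collection, which keeps control over what is shared in the curator's hands.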
Another important feature of the speciesLink network was the development of a number of tools for
mapping, monitoring and data cleaning39. This increased the interaction between participating
collections and CRIA’s staff.
Figure 18 presents a diagram of the data cleaning process which is initiated every night. The
system identifies collections that have updated their databases and then runs the process. A report
for each collection is generated and made available on the web. Suspect records for names and
lat/long are highlighted and a number of diagrams and charts with the collection’s profile and data
cleaning progress are presented. All information is made publicly available40 so that users can also
evaluate the quality of each collection’s data.
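The nightly process can be sketched in a few lines. The actual checks used by speciesLink are not described in this document, so the coordinate range test below is only an assumed example of a "suspect lat/long" rule, and the function names and record layout are hypothetical.

```python
def flag_suspect_coordinates(records):
    """Return the ids of records whose lat/long are missing or out of range."""
    suspects = []
    for rec in records:
        lat, lon = rec.get("lat"), rec.get("lon")
        if lat is None or lon is None or not (-90 <= lat <= 90) or not (-180 <= lon <= 180):
            suspects.append(rec["id"])
    return suspects


def nightly_run(collections, last_run):
    """Check only the collections updated since the previous run; one report per collection."""
    return {
        name: flag_suspect_coordinates(info["records"])
        for name, info in collections.items()
        if info["updated"] > last_run
    }
```

Skipping collections that have not changed keeps the nightly job proportional to the amount of new data rather than to the size of the whole network.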
[Figure 18 diagram, dated Oct/2004. Labels: Collections of São Paulo (Col 1 ... Col n); International
Collections (Col 1 ... Col n); spLink Portal (Java); daily import of updates; local database; dc_tax
and dc_geo (Perl); suspect records; tables with suspect records (PostgreSQL); chart.pm (Perl);
preparation of diagrams & profiles; Web]
Figure 18. Diagram of the data cleaning process
All developments carried out by CRIA run on Intel hardware and use free and open source software
and open protocols (Red Hat Linux operating system; Apache web server; Perl, PHP and Java
programming languages; and HTTP, SOAP, XML and DiGIR protocols).
Strategic Plan
Another important study of relevance to this project, contracted by the Brazilian Ministry of Science
and Technology through the CGEE (Center for Strategic Management and Studies on Science,
Technology and Innovation) was the definition of a national strategy for the modernization of
Brazilian biological collections and the development of an integrated information system about
biodiversity.
The Brazilian Societies of Botany, Zoology, and Microbiology were invited to coordinate this
process together with CRIA. A number of documents were produced by specialists and were
presented and discussed at a workshop held in June, and the proposed strategy was presented at
a workshop held in July with approximately 80 participants, including visiting specialists from
abroad41. All documents are available on-line42 and present the state of the art of biological
collections, information systems, and the Internet in Brazil.
39 See the speciesLink data & tools page at http://splink.cria.org.br/tools
40 http://splink.cria.org.br/dc
The strategy for the establishment of a program for the next 10 years was presented and is being
discussed within the Ministry. There are already some concrete results of this work, such as:
• A call for proposals sent out by the Ministry of Science and Technology for biological
collections that includes setting up on-line information systems, with a total budget of R$ 5
million for 2 years.
• A Taxonomy Program established by the National Council for Scientific and Technological
Development (CNPq).
Another interesting development is the replication of the speciesLink experience in other states.
The following networks are to be developed in 2006 using the same standards, protocols, and
decentralized architecture, all integrated with speciesLink:
• Paraná Network of Biological Collections: this is one of the 8 projects approved in the recent
call for proposals of the Ministry of Science and Technology and involves 8 collections from
Paraná State;
• Biota do Espírito Santo: with 16 collections from 3 institutions (Universidade Federal do
Espírito Santo, Museu de Biologia Mello Leitão, and INCAPER – Instituto Capixaba de
Pesquisa, Assistência Técnica e Extensão Rural).
Discussions have also begun with the state of Bahia and collections of the semi-arid region of the
Northeast of Brazil.
41 http://www.cria.org.br/cgee/col
42 http://www.cria.org.br/cgee/col/documentos
Strategy: Proposed Network
The analysis of existing experiences (GBIF, Siamazonia, Humboldt, and CRIA) with standards,
protocols, tools, and architecture indicates that there is no universal solution for all situations.
Technology for a truly distributed system exists and the speed of the Internet is increasing, but the
decision as to the architecture to be adopted (centralized, distributed, or combined) for each
situation will depend on an evaluation of the data provider and user, and on the available
resources (expertise, hardware, software, communication). Two requirements are independent of the
architecture: the data provider must have full control over his/her data/information, and target
users must have complete access to the information they require in a format that they can use.
It is clear that the proposed architecture for ABBIF must reflect the aims of the project, which
include:
• the establishment of an integrated regional information system for the Amazonian region,
based on free and open access to taxonomic information and specimen data;
• the development of a system where each data/information provider or custodian will be fully
responsible for his/her own data/information;
• the development of a system where each provider can undertake frequent updating;
• the development of a system that will help promote data validation;
• the development of a system where full attribution to data/information sources is given;
• strengthening of local stakeholders – biological collections and data custodians;
• strengthening and integration of existing information systems at local, national, and regional
levels; and,
• integration of ABBIF with GBIF.
In order to propose a strategy it is important to think about the different actors that will compose the
network. The actors of ABBIF are:
• data providers;
• data custodians;
• users; and
• financing agencies.
Data providers can be biological collections, researchers carrying out inventories, taxonomic
studies, etc., and researchers in other fields with complementary data such as climate, vegetation,
satellite images, etc. They have a series of responsibilities within the network, which include
following certain standards in registering data and metadata and attesting to the quality of their
data. Biological collections as data providers must also have a clear data and information policy,
allowing free and open access to data that is not confidential or sensitive.
Data custodians or administrators of databases and/or information systems have an important
role to play. Developing, running, and maintaining information systems is a highly professional
activity. It is not for amateurs. Data custodians therefore must be trustworthy and competent and
must participate in the development of internationally accepted standards, or at least adopt them.
They have an important role to play in supporting data providers in the use of standards and
must promote the interoperability and integration of systems. Data custodians must guarantee data
integrity and respect any restrictions indicated by each data provider, protecting property rights,
confidentiality and other restrictions if necessary or pertinent. Data custodians are also responsible
for system backup, migration to new technologies and maintenance in general. It is desirable that
they have a team highly specialized in databases, qualified to develop tools of interest to data
providers and users.
Users also have an important role to play. They must adhere to adopted standards and respect
restrictions and limits of data use, acknowledging authorship and credits. They must also offer
feedback to authors and to custodians, indicating possible errors and discussing the possibility of
implementing new services.
Financing Agencies must also prepare themselves for this new digital age and have a clear data
and information policy especially for data that is already born digital. Public funding in activities of
public interest should generate systems that provide free and open access to data and information
that are not confidential or sensitive. There must also be a policy to digitize historical data, such as
biological collection records, and a long term policy to maintain information systems. In a regional
information facility such as ABBIF, it is also important that an inter-agency policy be established to
maximize resources and better integrate activities.
Elements of the Architecture
The aim is the establishment of a data infrastructure open to all interested, where the data provider
has complete control over his/her data.
ABBIF coordination
A distributed coordinating effort is perhaps the greatest challenge to be faced. The whole concept
proposed is the strengthening of local data providers, offering the necessary infrastructure for open
and free dissemination of data, but without their losing control of and responsibility for the data.
In the case of ABBIF we believe that local data custodians such as Siamazonia, Humboldt and
CRIA have a significant role to play. It is important that the project strengthens these initiatives at
the country level and, at the same time, is able to use these capacities at a regional level. Country
data custodians should act as facilitating nodes and should be part of an ABBIF development
council together with GBIF. At the same time it is important that there is a “secretariat” in place,
responsible for the network, monitoring activities and promoting ABBIF, identifying new country or
regional partners.
An ABBIF secretariat should work on coordinating and strengthening these efforts and capacities
while, at the same time, offering services to countries and institutions that want to share data but
don’t have the necessary local expertise and infrastructure.
The coordination structure should be further discussed at a workshop with country representation.
Data Providers
We think it is important to determine the target data providers for ABBIF’s initial phase. In our
opinion, the focus should be on specimen and species data, so biological collections and
observation data (inventories) would be our first targets. We also believe that the organization of
data providers must be country driven, meaning that the articulation and involvement of different
providers will be carried out nationally.
Biological collections, due to the nature of their activities, are information centers. They must have
sufficient infrastructure and expertise to set up their own information system for internal purposes.
Those that also have the necessary infrastructure and expertise to hold an internet information
system available 24 hrs a day can serve their data directly to the network. Those that don’t have or
don’t want to maintain dynamic links should have a mechanism to submit, alter, and delete their
data at a regional server (or cache node).
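The submit, alter, and delete operations mentioned above can be pictured with a small in-memory stand-in for a regional server. The class and method names are hypothetical, and no protocol (such as SOAP) or persistent storage is modeled; the sketch only shows that each collection acts on its own records in the mirror.

```python
class RegionalMirror:
    """In-memory stand-in for the records a regional server holds for each collection."""

    def __init__(self):
        self._records = {}  # (collection, record_id) -> record data

    def submit(self, collection, record_id, data):
        self._records[(collection, record_id)] = dict(data)

    def alter(self, collection, record_id, changes):
        # Raises KeyError if the record was never submitted.
        self._records[(collection, record_id)].update(changes)

    def delete(self, collection, record_id):
        del self._records[(collection, record_id)]

    def records_of(self, collection):
        return {rid: rec for (col, rid), rec in self._records.items() if col == collection}
```

Keying every record by its collection is one simple way to keep each provider in control of its own data on a shared server.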
Figure 19 shows a diagram of the network.
[Figure 19 diagram. Labels: ABBIF Portal; Regional Servers; collections with dynamic links;
collections mirroring their data in regional servers]
Figure 19. Component data provider: biological collections
Collections with dynamic links and regional servers must adopt compatible standards and
protocols and must be held in institutions capable of maintaining the system and serving data
through fast Internet connections.
Observation data and taxonomic descriptions represent two other groups of data providers:
individuals or research groups. In this case, data custodians must offer facilities where
researchers may deposit their data for full and open access on the Internet. This is not a
task for amateurs. There must be a highly specialized staff whose main activity is the
development and maintenance of information systems that guarantee the preservation and
dissemination of data.
Based on the concept of open and free access to data, this element of the network will be called the
digital data commons space43. The network may have more than one server guaranteeing the
necessary infrastructure for preservation, maintenance, retrieval, and dissemination of the
data. Internet connectivity must be stable and fast (figure 20).
43 See National Science Board. Draft Report: Long-Lived Digital Data Collections: Enabling Research and
Education in the 21st Century. NSB-05-40. March 30, 2005.
http://www.nsf.gov/nsb/meetings/2005/LLDDC_draftreport.pdf
[Figure 20 diagram. Labels: portal; Internet 2; data commons spaces; observation data; taxonomic
data; other data]
Figure 20. Architecture element: digital data commons space
We believe that this element could involve the conservation community, which holds important
observation data that are normally disseminated through books and reports.
Portal
GBIF today has a data index that serves data to the system. A subset of over 85 million records,
with name and locality data, is harvested from 152 data providers and maintained in a centralized
database. This makes the basic search much quicker and avoids problems such as slow or
unstable connectivity. After carrying out the basic search the user obtains a list of providers with
the number of records found. Users can then display the list of records corresponding to each
provider and download the selected records. At this point the user may choose whether to
download the data directly from the data providers or from the GBIF index (faster), and the format
of the downloaded file. A map illustrating the distribution of the requested records can also be
produced dynamically.
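The basic search over a central index can be sketched as below. This is not GBIF's implementation: the index is modeled as a plain list of harvested rows, and the field names and the function search_index are assumptions made for illustration.

```python
from collections import Counter


def search_index(index, name):
    """Return {provider: number of matching records} for a scientific name."""
    wanted = name.lower()
    # The index holds only a subset of each record (here: provider and name),
    # so the search never has to contact the providers themselves.
    return dict(Counter(row["provider"] for row in index if row["name"].lower() == wanted))
```

Because the counts come from the local index, the first answer does not depend on the connectivity of any individual provider.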
CRIA developed a fully distributed system. When a query is processed it is sent out to the
providers, which search their databases and dynamically return the results. At the moment, the
speciesLink Network has 6 regional servers (mirroring data from 38 collections), 2 collections with
dynamic links, one centralized database with observation data (at CRIA), and one centralized
information system of microbial collections (with 9 collections). This architecture is interesting for
advanced users, who can search any field and retrieve the full data set as a file. Its disadvantages
are speed and the “fragility” of the network: if a server is off line for any reason, that “branch” of
the network will be unavailable. Maps are also produced dynamically.
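The behavior of a distributed search, including an unavailable "branch", can be sketched as follows. Providers are modeled as plain Python callables rather than DiGIR endpoints, and the function name and return shape are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def distributed_search(providers, query):
    """Send the query to every provider; return (results, unavailable provider names)."""
    results, unavailable = {}, []
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        # Submit all queries first so that the providers are searched in parallel.
        futures = {name: pool.submit(search, query) for name, search in providers.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                # An off-line provider only removes its own branch from the answer.
                unavailable.append(name)
    return results, unavailable
```

Reporting the unavailable providers alongside the results lets the user see that an answer is incomplete rather than assume the missing collections hold no matching records.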
CRIA also developed an indexing service for a subset of the data, which is used for data cleaning.
At the moment CRIA is considering letting users search this index of the data subset in order to
provide faster results and a more stable system. The distributed search system will nevertheless
continue to be offered, as we believe it is very powerful and important to advanced users.
Based on the Internet connectivity study that was carried out for Latin America (RedCLARA), one
can see that some links of Amazonian countries are still not in place. At the same time, it is
important to develop a truly distributed network, helping countries “catch up” with both the
technology and the infrastructure. For this reason we believe that the ABBIF portal should have both
an index system that will harvest data from all regional providers and a distributed search service.
The index system will be used to quickly serve a data subset to users and for data cleaning. The
dynamic search system will be available for advanced users.
Resource Registry & Discovery
In a distributed environment with many data providers it is desirable to have a central registry
defining at the software level who are the network participants and how to interact with them
(species-level services might use different protocols from specimen-level services, and even
specimen-level services could potentially use different protocols among them or different protocol
versions). As the number of data providers may increase over time, a means for automatic
discovery will certainly be necessary. GBIF’s UDDI registry seems to be the most reasonable
alternative since it is already available for the whole biodiversity community and ABBIF resources
should also be integrated with the GBIF network. UDDI offers a simple mechanism to enable
configuration of thematic networks (through service categories) that could easily be used to
distinguish ABBIF participants from other resources.
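The idea of selecting a thematic network through service categories can be illustrated with a small in-memory registry. This is not the UDDI API and does not reflect GBIF's registry schema: the ServiceEntry fields, the category labels, and the function find_services are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    name: str
    access_point: str      # where the service can be reached
    protocol: str          # e.g. "DiGIR" or "TAPIR"
    categories: frozenset  # thematic labels, e.g. {"ABBIF", "specimen"}


def find_services(registry, category, protocol=None):
    """Return the entries in a category, optionally restricted to one protocol."""
    return [
        entry for entry in registry
        if category in entry.categories and (protocol is None or entry.protocol == protocol)
    ]
```

Recording the protocol next to each access point is what lets portal software decide how to talk to each participant without manual configuration.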
Tools
Another important activity is the development of tools for data providers and users. These tools
should preferably be developed as web services so that they can be used more freely at all levels:
local, country, and regional.
Data archive
As a last element of the network, it would also be important to address the problem of long term
data archiving. This may also be a task for country data custodians or their partners. It is important
that the scientific council discusses this issue to determine priorities as to what data should be
added to a permanent archive and identify an institution or a pool of institutions responsible for this
activity.
Figure 21 below presents a diagram of the system.
[Figure 21 diagram. Labels: web services (maps, modeling, data cleaning, automatic georeferencing,
other services); portal; regional server; data commons spaces; biological collections; observation
data; taxonomic data; long term data archive]
Figure 21. Diagram of the system
Annex 1: Answers from Collections of Colombia
Columns: Collection name, Acronym, Institution, General Group, No. total records, % Georeferenced,
% Digitalized, No. records (Amazon), % Georeferenced (Amazon), % Digitalized (Amazon)
Herbario Amazónico COAH Instituto Amazónico Plants 107.150 1 160.725 1
Colombiano COAH de Investigaciones
Científicas - SINCHI
CALT CAL Animals 18.464
Instituto Alexander von IAvH Instituto de Animals 349.054 Birds and butterflies: Birds and Birds: 1817; Birds and butterflies: Birds and
Humboldt Investigación de 100%; other insecta: butterflies: butterflies: 100%; other insecta: butterflies:
Recursos Biológicos 50%; mammals: 10%; 100%; other 550; fishes: 50%; mammals: 10%; 100%; other
"Alexander von amphibia and reptilia: insecta: 50%; 730 coll.; amphibia and reptilia: insecta: 50%;
Humboldt" 0% mammals: amphibia, 0% mammals:
10%; amphibia mammals and 10%; amphibia
and reptilia: 0% reptilia: and reptilia: 0%
pending
Herbario Federico Medem FMB Instituto de Plants 159.000 0,85 0,5 8.742 100 100
IAvH Investigación de
Recursos Biológicos
"Alexander von
Humboldt"
Colección de Zoología ICN Instituto de Ciencias Animals 948.368 Birds: 100%; Amphibia:
Naturales amphibia: 80%; 4000
Universidad Nacional general: 60%
de Colombia
Herbario Nacional COL Instituto de Ciencias Plants 1.180.000 0,1 0,2 Many spec.
Colombiano Naturales Quantity not
Universidad Nacional specified
de Colombia
Museo Micológico - Hongos MMUNM Universidad Nacional Microorganisms 2.970
fitoparásitos de Colombia
Museo Entomológico MEFLG Universidad Nacional Animals 342.340 0,1
"Francisco Luis Gallego" de Colombia sede
Medellín
Herbario Pontificia HPUJ Pontificia Plants 71.000 0,375
Universidad Javeriana Universidad
Javeriana
Museo Javeriano de Historia MPUJ Pontificia Animals 2.864.808 72.5% of the 1
Natural Lorenzo Uribe s.j Universidad amphibia, 92.2% of
Javeriana the fishes, 94.7% of
the reptilia, 96.7% of
the birds, 80.7% of
mammals and a low -
not specified-
percentage of the
Orthoptera
Herbario Nacional de HNM Corporación Plants 1.420 0,05
Malezas Colombiana de
Investigación
Agropecuaria
Corpoica
Herbario Gabriel Gutierrez MEDEL Universidad Nacional Plants, 100.960 0,5
Villegas (MEDEL) de Colombia sede Microorganisms
Medellín
Herbario de la Orinoquía Llanos Universidad de los Plants 19.320 0
Colombiana Llanos
Herbario Ciat CIAT Centro Internacional Plants 32.018 0,98
de Agricultura
Tropical - CIAT
Jardín Botánico José JBJCM Jardín Botánico de Plants 7.036 100% of the
Celestino Mutis Bogotá J.C.M. Herbarium and
20% of fruit
collection
Herbario Forestal UDBC Universidad Distrital Plants 35.000 0,9
Universidad Distrital Francisco José de
Francisco José de Caldas Caldas
Herbario Universidad de HUA Universidad de Plants 138.000 0,5 0,38 20% aprox.
Antioquia Antioquia
Herbario José Cuatrecasas VALLE Universidad Nacional Plants 37.438 1
Arumi (VALLE) de Colombia Sede
Palmira
Herbario Jardín Botánico JAUM Fundación Jardín Plants 118.276 0,5 33772 Unknown
"Joaquín Antonio Uribe" Botánico Joaquín
Antonio Uribe
Herbario CUVC CUVC Universidad del Valle Plants 106.800 0,25
Colección Laboratorio de CLUA Laboratorio de Animals 5.692 0,5
Limnología Universidad de Limnología -
Antioquia Universidad de
Antioquia
Colección Entomológica CEUA Laboratorio Animals 86.498 1
Universidad de Antioquia Colecciones
Entomológicas -
Universidad de
Antioquia
Vectores y Huéspedes VHET Universidad de Animals 25.470 0,5
Intermediarios de Antioquia
Enfermedades Tropicales
Museo del laboratorio de MENT-UT Universidad del Animals 24.646
Entomología Tolima
Laboratorio de Investigación LABUN Universidad Nacional Animals 40.080 0,15
de Abejas - Labun de Colombia
Museo de Historia Natural MHN-UC Universidad del Plants 83.742 0,6
Universidad del Cauca Cauca
Entomológica Forestal EF-UDFJC Universidad Distrital Animals 6.200 1
Universidad Distrital Francisco José de
Francisco José de Caldas Caldas
Museo Historia Natural MUD Universidad Distrital Plants, Animals 3.274 1
Universidad Distrital Proyecto Curricular
Licenciatura Biología
Colección de Artrópodos de UVS Universidad del Valle Animals 10.720 0 0 2.000 0 0
Importancia Médica , Facultad de Salud
Museo de Historia Natural MHNUPN Universidad Animals 58.378
Universidad Pedagógica Pedagógica Nacional
Nacional
Colección Biológica U.D.C.A. UDCA Corporación Plants 22.018 1
Universitaria de
Ciencias Aplicadas y
Ambientales,
U.D.C.A.
Colección Zoológica de IMCN Instituto para la Animals 40.448 0,85
Referencia Científica "IMCN" Investigación y
Preservación del
Patrimonio Cultural y
Natural del Valle del
Cauca- INCIVA
Colección de Insectos CIACIB Corporación para Animals 29.246
Acuáticos de Colombia CIB Investigaciones
Biológicas (CIB)
Colección de Mosquitos de CMCCIB Corporación para Animals 8.882
Colombia (CIB) Investigaciones
Biológicas (CIB)
Instituto Nacional de Salud INS Instituto Nacional de Animals 21.088 0,5
Salud
Colección Taxonómica CTNI Corporación Animals 37.568 0,2
Nacional de Insectos "Luis Colombiana de
María Murillo" Investigación
Agropecuaria
Corpoica
Museo Entomológico ME"MB" Federación Nacional Animals 17.070 0,9
"Marcial Benavides" de Cafeteros-
Cenicafé
Colección Familia CFC Animals 9.000 0
Constantino-CFC
Colección Entomología: PC Animals 104
Hemiptera Acuáticos
Hans W. Dahners PETALUDA Animals 14.200 1
Colección Piéridos de CPCRTN Animals 1.834
Colombia Rodrigo Torres
Nuñez
Colección Jean Francois le JFLC Animals 60.000
Crom
Colección Personal Angela CPAA Animals 8.200
Amarillo
Colección Personal Carlos CPCS Animals 3.200
Sarmiento
Colección "Da Ros" C"DR" Fundación Ciencia Animals 4.980 0,4
Ecología, Arte e
Historia Fundación
C.E.A.H. (Museo
Vittoriano)
Colección Entomológica CEUNP Universidad Nacional Animals 57.400 1
Universidad Nacional Sede de Colombia sede
Palmira Palmira
Colección de Insectos CONIF Corporacion Animals 62.800 0,9
asociados a plantaciones Nacional de
forestales de Colombia Investigación y
Fomento Forestal -
CONIF
Herbario TULV - Jardín TULV Instituto para la Plants 34.000 0,45 1
Botánico Juan María Investigación y
Céspedes Preservación del
Patrimonio Cultural y
Natural del Valle del
Cauca- INCIVA
Serpentario de la SUA Universidad de Animals 5.064
Universidad de Antioquia Antioquia
Museo de Historia Natural UPTC Universidad Animals 2.202
"Luis Gonzalo Andrade" Pedagógica y
Tecnológica de
Colombia, Facultad
de Ciencias, Escuela
de Ciencias
Biológicas
Museo de Entomología de la MUSENUV Universidad del Valle Animals 101.864 occasional
Universidad del Valle
Vertebrados-Aves Universidad del Valle Animals 12.914 1 occasional
- Biología
Museo Entomológico UNAB Universidad Nacional Animals 200
Facultad de Agronomía de Colombia
Colección de vertebrados, UV-C Universidad del Valle Animals 28.774 occasional
anfibios y reptiles - Biología
Herbario "Armando Dugand DUGAND Universidad del Plants 6.070
Gnecco" Atlántico
Colección Efraín Henao CEH Animals 10.760 0,1
Colección de Vertebrados e MHN-Uca Universidad de Animals 13.416
Invertebrados Caldas
Colección Familia Pardo CFPL Animals 11.100
Locarno
Vertebrados e Invertebrados MHNCC Comunidad Animals 10.360 0,9
Hermanos Maristas
Banco de Cepas y Genes, IBUN Instituto de Plants, 3.689 0,48
Instituto de Biotecnología, Biotecnología, Microorganisms
Universidad Nacional de Universidad Nacional
Colombia de Colombia
Colección Microorganismos CEN Federación Nacional Microorganisms 689 0,5
de CENICAFE de Cafeteros -
Centro Nacional de
Investigaciones de
Café - CENICAFE
Colección de Referencia CRM-UV Universidad del Valle Animals 13.060
(Moluscos) - Biología
Procedencias de FCISSPA Fundación Centro Plants 98 0,9
Trichanthera Gigantea (H. & para la Investigación
B.) Nees en Sistemas
Sostenibles de
Producción
Agropecuaria- CIPAV
Jardín Botánico Juan María JBJMC Instituto para la Plants 8.824 0,5
Cespedes Investigación y
Preservación del
Patrimonio Cultural y
Natural del Valle del
Cauca- INCIVA
Fundación Zoológico FZS Fundación Zoológico Animals 728
Santacruz Santacruz
Piscilago Zoo PZ Caja Colombiana de Animals 1.558 0,5
Subsidio Familiar -
Colsubsidio
Jardín Botánico "Alejandro JBAVH Universidad del Plants 1.346 0,1
von Humboldt" Tolima
Hongos, Univalle UV-mico Universidad del Valle Microorganisms 3.740
- Facultad de Salud
Colección de M-UBCB Corporación para Microorganisms 4.069 1
Microorganismos Investigaciones
Biológicas (CIB)
Secretaria de Agricultura - SA.A Departamento de Animals 9.854 0,2
Antioquia Antioquia
Parque Zoológico Santa Fe PZSF Sociedad de Mejoras Animals 3.046 0,5
Públicas de Medellín
Jardín Botánico "Joaquín JAUM -JB Fundación Jardín Plants 19.000 0,9
Antonio Uribe" Botánico "Joaquín
Antonio Uribe"
Colección de Ciencias MUA Universidad de Animals 33.224 0,3
Naturales Antioquia - Museo
Universitario
Xiloteca X-UNCM Universidad Nacional Plants 5.836 1
de Colombia sede
Medellín
Zoológico de Barranquilla ZOOBAQ Fundación Botánica Animals 848 1
y Zoológica de
Barranquilla
Banco de Germoplasma de CCoM Corporación Microorganisms 4.555
Microorganismos de Interés Colombiana de
en Agricultura Investigación
Agropecuaria
Corpoica
Jardín Botánico José JBJCM Jardín Botánico de Plants 31.200 0,57
Celestino Mutis Bogotá J.C.M.
Colección de Microbiología - CIMIC Universidad de los Microorganisms 278 0,7
CIMIC Andes- Centro de
Investigaciones
Biológicas - CIMIC
Museo de la Salle M.L.S. EN Congregación Plants, Animals 164.370 0,25
ZOOLOGIA - Hermanos Escuelas
B.O.G EN Cristianas
BOTANICA
Jardín Botánico de Popayan JBP Fundación Plants 2.028 0,3
Universitaria de
Popayán
Cepario Corpogen CG Corporción Corpogen Microorganisms 3.634 0,07
Colección Malacofaunica UMNG-MT Universidad Militar Animals 5.480 1
Terrestre de la Facultad de Nueva Granada
Ciencias de la UNMG
Colección Entomológica de UMNG-Ins Universidad Militar Animals 6.148 1
la Facultad de Ciencias de la Nueva Granada
Universidad Militar Nueva
Granada
Zoólogico de Cali FZC Fundación Zoológica Animals 3.818 1
de Cali
Fundación Centro de FCP Fundación Centro de Animals 682 1
Primates Primates, FUCEP
Museo Entomológico Piedras MEPB Caja de Animals 16.000 1
Blancas Compensación
Familiar -
COMFENALCO
ANTIOQUIA
Colección Viva Programa de COLVIOFAR Universidad de Animals 520 1
Ofidismo/Aracnidismo Antioquia
Universidad de Antioquía:
Ofidios - Reptiles
Colección Viva Programa de Animals 236 1
Ofidismo/Aracnidismo
Universidad de Antioquía:
Escorpiones - Artropodos
Jardín Botánico de Plantas JB-Medicinales - Corpoamazonia Plants 1.028 0,97
Medicinales del C.E.A. CEA
Museo de Historia Natural UAM Universidad de la Animals 396 0,3
Universidad de la Amazonía Amazonía
Colección de Insectos ICQ Universidad del Animals 6.010 0,3
Universidad del Quindio Quindío
Colección Zoológica Viva CEBTRF Centro Estación de Animals 616 1
Centro Estación de Biología Biología Tropical
Tropical "Roberto Franco" "Roberto Franco"
CEBTRF Facultad de
Ciencias,
Universidad Nacional
Columns: Collection name, Acronym, Observational Data records, Data systematization, Available
Online? (URL), Standards & Protocols, Hardware and Software, Staff, Internet Access, Access to
Information
Herbario Amazónico COAH ACCES 97, http://www.sinchi.org.co/herb Compatible
Colombiano COAH ARC-VIEW 3.2 ario.php?page=servicios&op con
CDS-ISIS ver. cion=herbario&subopcion=c estándar
3.07 oleccion RRBB SIB
CALT CAL Biota
Instituto Alexander IAvH Sistematized at Access-VBA , Butterflies: Standard insufficient dedicated Unrestricted
von Humboldt the same databse SQLServer. http://www.siac.net.co/sib_d for (except for
Fishes and escargas.php?ArchivoDespl documment sensible data
insecta (except egado=635 ation of for
butterflies) still biological endangered
in Excel and records, spp.)
may be version 5.0,
incorporated XML
soon
Herbario Federico FMB Access-VBA, Not yet. Soon at Standard insufficient dedicated Unrestricted
Medem IAvH SQL Server www.siac.net.co/sib for (except for
documment sensible data
ation of for
biological endangered
records, spp.)
version 5.0,
XML
Colección de ICN Spica dedicated restricted
Zoología
Herbario Nacional COL Spica http://aplicaciones.virtual.un Compatible Understaffe dedicated unrestricted
Colombiano al.edu.co/colecciones/datos/ with d. Lacking
herbario/consultasHerbario.j standard professional
sp RRBB SIB s and
auxiliars to
process the
data.
Investments
needed
Museo Micológico - MMUNM Data not
Hongos fitoparásitos systematized
Museo Entomológico MEFLG Specify http://www.unalmed.edu.co/
"Francisco Luis %7Ementomol/
Gallego"
Herbario Pontificia HPUJ Excel Require tools Scarce restricted
Universidad for economic
Javeriana systematizatio resources.
n not defined Curators
as yet needing
more time
Museo Javeriano de MPUJ Excel and
Historia Natural ArcView
Lorenzo Uribe s.j
Herbario Nacional de HNM Not specified
Malezas
Herbario Gabriel MEDEL FoxPro 4.0;
Gutierrez Villegas BRAHMS 5 to
(MEDEL) be implemented
Herbario de la Llanos
Orinoquía
Colombiana
Herbario Ciat CIAT Oracle
Jardín Botánico José JBJCM Access http://www.jbb.gov.co/web/h
Celestino Mutis ome.php?pag=products
Herbario Forestal UDBC Acces Plattform
Universidad Distrital 50% and Arc
Francisco José de View 3.2 (12000
Caldas spec. in 6
books)
Herbario Universidad HUA Excel Compatible adequate insufficient
de Antioquia with
standard
RRBB SIB
Herbario José VALLE Access
Cuatrecasas Arumi
(VALLE)
Herbario Jardín JAUM Arkas, Excel, adequate insufficient 100 Kbps
Botánico "Joaquín Biotica
Antonio Uribe"
Herbario CUVC CUVC Excel
Colección Laboratorio CLUA Excel
de Limnología
Universidad de
Antioquia
Colección CEUA Excel
Entomológica
Universidad de
Antioquia
Vectores y VHET Excel, Word
Huéspedes
Intermediarios de
Enfermedades
Tropicales
Museo del laboratorio MENT-UT In progress
de Entomología
Laboratorio de LABUN FileMaker,
Investigación de Excel
Abejas - Labun
Museo de Historia MHN-UC Excel
Natural Universidad
del Cauca
Entomológica EF-UDFJC Excel
Forestal Universidad
Distrital Francisco
José de Caldas
Museo Historia MUD Excel
Natural Universidad
Distrital
Colección de Artrópodos de Importancia Médica | UVS | Birds: Threskiornithidae: Mesembrinibis cayennensis in Putumayo | No PC with adequate programs | Staff to digitize the data | unrestricted
Museo de Historia Natural Universidad Pedagógica Nacional | MHNUPN | The manager collection
Colección Biológica U.D.C.A. | UDCA | Excel
Colección Zoológica de Referencia Científica "IMCN" | IMCN | Excel | Standard RRBB SIB | insufficient | insufficient | unrestricted
Colección de Insectos Acuáticos de Colombia CIB | CIACIB | FileMaker (MAC)
Colección de Mosquitos de Colombia (CIB) | CMCCIB | FileMaker (MAC)
Instituto Nacional de Salud | INS | Access
Colección Taxonómica Nacional de Insectos "Luis María Murillo" | CTNI | FoxPro
Museo Entomológico "Marcial Benavides" | ME"MB" | Excel
Colección Familia Constantino-CFC | CFC | Catalog, listing by families
Colección Entomología: Hemiptera Acuáticos | PC
Hans W. Dahners | PETALUDA | Excel
Colección Piéridos de Colombia Rodrigo Torres Nuñez | CPCRTN | The manager collection
Colección Jean Francois le Crom | JFLC
Colección Personal Angela Amarillo | CPAA
Colección Personal Carlos Sarmiento | CPCS
Colección "Da Ros" | C"DR" | Access, Word, Excel
Colección Entomológica Universidad Nacional Sede Palmira | CEUNP | Access
Colección de Insectos asociados a plantaciones forestales de Colombia | CONIF | Shared database
Herbario TULV - Jardín Botánico Juan María Céspedes | TULV | Access
Serpentario de la Universidad de Antioquia | SUA | Access 100%
Museo de Historia Natural "Luis Gonzalo Andrade" | UPTC
Museo de Entomología de la Universidad del Valle | MUSENUV | Arkas 2000
Vertebrados-Aves | | Access
Museo Entomológico Facultad de Agronomía | UNAB
Colección de vertebrados, anfibios y reptiles | UV-C | Arkas, Amphibia 70%, Reptilia 40%
Herbario "Armando Dugand Gnecco" | DUGAND | Access
Colección Efraín Henao | CEH | WinISIS
Colección de Vertebrados e Invertebrados | MHN-Uca
Colección Familia Pardo Locarno | CFPL
Vertebrados e Invertebrados | MHNCC | Excel
Banco de Cepas y Genes, Instituto de Biotecnología, Universidad Nacional de Colombia | IBUN | Access
Colección Microorganismos de CENICAFE | CEN | Excel
Colección de Referencia (Moluscos) | CRM-UV | Excel
Procedencias de Trichanthera Gigantea (H. & B.) Nees | FCISSPA | Excel
Jardín Botánico Juan María Céspedes | JBJMC | BGRecorder
Fundación Zoológico Santacruz | FZS | ICOZOO (clinical histories)
Piscilago Zoo | PZ | Excel, ICOZOO
Jardín Botánico "Alejandro von Humboldt" | JBAVH | Access
Hongos, Univalle | UV-mico
Colección de Microorganismos | M-UBCB | Access
Secretaria de Agricultura - Antioquia | SA.A | Excel
Parque Zoológico Santa Fe | PZSF | Excel 40%, Word 10%
Jardín Botánico "Joaquín Antonio Uribe" | JAUM-JB | BGRecorder
Colección de Ciencias Naturales | MUA | Not specified
Xiloteca | X-UNCM | Access
Zoológico de Barranquilla | ZOOBAQ | ARKS 4.0-ISIS: records; ZOOTRITION: nutritional information; ICOZOO: medical reports
Banco de Germoplasma de Microorganismos de Interés en Agricultura | CCoM | Excel and Access
Jardín Botánico José Celestino Mutis | JBJCM | BGRecorder
Colección de Microbiología - CIMIC | CIMIC | Word
Museo de la Salle | M.L.S. in zoology - B.O.G in botany | Access
Jardín Botánico de Popayan J.B.P. | JBP | Excel, Access, BGRecorder 2
Cepario Corpogen | CG | FileMaker
Colección Malacofaunica Terrestre de la Facultad de Ciencias de la UMNG | UMNG-MT | Excel
Colección Entomológica de la Facultad de Ciencias de la Universidad Militar Nueva Granada | UMNG-Ins | Excel
Zoológico de Cali | FZC | ARKS-ISIS and others to be implemented
Fundación Centro de Primates | FCP | Excel
Museo Entomológico Piedras Blancas | MEPB | Excel
Colección Viva Programa de Ofidismo/Aracnidismo Universidad de Antioquia: Ofidios - Reptiles | COLVIOFAR
Colección Viva Programa de Ofidismo/Aracnidismo Universidad de Antioquia: Escorpiones - Artrópodos
Jardín Botánico de Plantas Medicinales del C.E.A. | JB-Medicinales-CEA | BGRecorder
Museo de Historia Natural Universidad de la Amazonía | UAM | Access, Excel
Colección de Insectos Universidad del Quindio | ICQ | Access
Colección Zoológica Viva Centro Estación de Biología Tropical "Roberto Franco" CEBTRF | CEBTRF | Tracker Ce Brain Id (includes a tool for automatic identification) 100%