GISIN background document: summary of online e-discussions. Edited and reorganized in an attempt to group relevant points that have been covered.

DATABASE FIELDS: A STANDARD SET?
NIS-BASE AS A MODEL
I3N AND GISD
GBIF'S ROLE
WEB SERVICES AND SOAP
DARWIN CORE AND SPECIES ANALYST
DATABASE SOFTWARE PACKAGES
LATITUDE/LONGITUDE OR GAZETTEER?
DEFINITION OF INVASIVE SPECIES
REGULATORY FRAMEWORK NEEDED
WHO WILL USE GISIN DATA AND WHAT FOR?
CDs, LOW BANDWIDTH, AND ACCESSIBILITY ISSUES
TAXONOMIC AUTHORITY(IES)
LANGUAGE STANDARDS

DATABASE FIELDS: A STANDARD SET?

LIZ SELLERS: In an article published in the journal BioScience (2000) 50(3): 239-244, Ricciardi et al. outlined a list of key information (fields) that should be included in a standardized IAS database. In 2002, the Conference of the Parties to the Convention on Biological Diversity discussed "possible formats, protocols and standards for improved exchange of biodiversity-related data." In order to collect quality global IAS data, provide global data access to professionals, and enable effective database linkage and collaboration, it is widely agreed that a minimum group of standard data fields must be identified. These fields must then be populated with data in a form that is species- and database-independent, and that does not exclude contributors or users from participating in the global IAS information system. What basic database fields should be included in an IAS database standard set? Ricciardi et al. provided "key information" covering Diagnostics, Distribution, Basic Biology, Dispersal, Impacts, Biotic Associations, Modes of Dispersal, Control Methods, Bibliographies and Expert Contact Information. Geospatial data has risen to the forefront of data collection as predictive and modeling technologies gain popularity and discover new applications in the natural world--especially in relation to IAS population and invasion ecology.
Should we therefore include geographic coordinates in the standard set of IAS database fields? MICHAEL BROWNE: Yes, at the meeting we should focus on the minimum data required for the different data types. The document "GISD Database Elements" describes the minimum data required for lists of species, species fact sheets, distribution data, eradication projects, and pathways and dispersal data. Please bear in mind that the GISD's goals and data requirements differ somewhat from those of collection databases. CHARLOTTE CAUSTON: It would be particularly useful if the IAS database fields were compatible with the questions and criteria used in predictive methodologies/risk assessments that determine the potential of a species to be introduced into a country, the potential invasiveness of a species once it has established, and the feasibility of control/containment. Weed and pest RAs [risk assessments] from Australia, NZ and Galapagos are useful sources for this type of information as they also consider the impact of the species on the environment. [See Table, "Desirable Database Elements" fields comparison] BRIAN STEVES: As a next step, assuming we can agree about the use of XML as an exchange protocol, I propose we consider Bob Morris' earlier comment that "Indeed the problem is reduced to writing an XML Schema" and try to work towards that. In this case, however, we might need to consider a wide variety of schemas for various IAS topics (species summaries, regional species lists, species observations, management efforts, etc.). In the case of species occurrence data for IAS, we should consider whether we can use the DiGIR protocol and an extension/subset of the Darwin Core. The addition of a few key fields indicating whether a species at a particular location is considered native or not, its potential pathway, and whether or not the observation represents a viable population would make a DiGIR system a very useful tool for most of us.
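[Editor's illustration: a minimal sketch of the kind of extended occurrence record Brian describes. The element names (OccurrenceRecord, NativeStatus, Pathway, ViablePopulation) are invented for illustration and are not the actual Darwin Core or DiGIR terms, which would come from the agreed schema.]

```python
import xml.etree.ElementTree as ET

def build_occurrence(core_fields, ias_fields):
    """Serialize one species occurrence as XML: Darwin Core-like core
    elements plus the three IAS extension fields discussed above.
    All element names here are hypothetical."""
    rec = ET.Element("OccurrenceRecord")
    for name, value in {**core_fields, **ias_fields}.items():
        ET.SubElement(rec, name).text = value
    return ET.tostring(rec, encoding="unicode")

record_xml = build_occurrence(
    {"ScientificName": "Dreissena polymorpha",
     "Locality": "Lake St. Clair",
     "Country": "USA"},
    {"NativeStatus": "Alien",        # native vs. alien at this location
     "Pathway": "Ballast water",     # potential introduction pathway
     "ViablePopulation": "true"})    # does the observation represent a viable population?
```

The point of the sketch is only that the IAS-specific fields ride alongside the ordinary collection fields in the same record, so an unmodified Darwin Core consumer could ignore them.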
DiGIR is also flexible enough that we could develop our own federated schemas for some of the other topics beyond occurrence data. MICHAEL BROWNE: As Brian points out, a variety of different types of data are created depending on the intentions, topics and resources of the agency or person doing the collecting. The most basic IAS data collected are the names of invasive species in a country, region or location. I believe that a record should contain the following elements as a minimum: species name, location, biostatus (see below) and documentation. Should status (native/alien) be a core element? Additional requirements should be handled at 'lower' levels because there is so much variation. If we can agree on a minimum record, the next steps are to propose good ways of recording these elements and, as Charlotte Causton says, to agree on definitions in order to avoid confusion. Then on to XML if that is the preferred option.

Occurrence
1.0 Absent
1.1 Recorded in error
1.2 Extinct
1.3 Eradicated
1.4 Border intercept
2.0 Reported
2.1 Established
2.2 Established and expanding
2.3 Established and stable
2.4 In captivity/cultivated
2.5 Sometimes present
2.6 Present/controlled
3.0 Uncertain

Status
1.0 Alien
2.0 Native
2.1 Native - Endemic
2.2 Native - Non-endemic
3.0 Not specified
4.0 Biostatus uncertain

Invasiveness
1.0 Invasive
2.0 Not invasive
3.0 Not specified
4.0 Uncertain

BRIAN STEVES: I like the apparent hierarchy of the vocabularies Michael has presented here. It will allow us to compare between those who report at a coarse level (e.g. Occurrence = "Absent") and those who report at a fine level (e.g. Occurrence = "Eradicated") by knowing the relationship between the two terms. Where would these standardized vocabularies reside? Is there a centralized location where they can be posted? We can also incorporate these lists into our XML schemas as they are created to promote their use.
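[Editor's illustration: the numeric codes in Michael's vocabulary already encode the hierarchy Brian refers to: "1.3 Eradicated" is a child of "1.0 Absent", so a fine-grained report can be rolled up and compared with a coarse one. A minimal sketch of that roll-up logic:]

```python
# Michael's Occurrence vocabulary, keyed by its hierarchical codes.
OCCURRENCE = {
    "1.0": "Absent", "1.1": "Recorded in error", "1.2": "Extinct",
    "1.3": "Eradicated", "1.4": "Border intercept",
    "2.0": "Reported", "2.1": "Established",
    "2.2": "Established and expanding", "2.3": "Established and stable",
    "2.4": "In captivity/cultivated", "2.5": "Sometimes present",
    "2.6": "Present/controlled",
    "3.0": "Uncertain",
}

def parent(code):
    """Roll a fine-grained code up to its top-level term (x.y -> x.0)."""
    return code.split(".")[0] + ".0"

def compatible(code_a, code_b):
    """Two reports agree at the coarse level if they share a top-level parent."""
    return parent(code_a) == parent(code_b)

# A fine-grained "Eradicated" report is consistent with a coarse "Absent" report:
same = compatible("1.3", "1.0")
```

The same roll-up works unchanged for the Status and Invasiveness vocabularies, since they use the same code scheme.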
ANGELA SUAREZ-MAYORGA: I understand we agree on 'the minimum', i.e. who is where (and maybe when). But I am in doubt about whether we can define minimal contents for a global database in the way we are doing (maybe that way is too weak to support data from multiple sources, as Bob says). Talking about other core elements, I too think that the way to define contents (no matter if they are the minimum or not) is much like the one Michael used. Just for example: our idea in the information system I represent is to obtain the terms from authority files (controlled vocabularies) or a thesaurus. The thesaurus is, at the same time, a reference data set for the system, where we can know the relationships between terms just by following the hierarchical structure they bear. It makes it possible to describe every field of interest in as much detail as is required. A very useful tool also is to link the thesaurus with a glossary, to be sure we all understand the same meaning of a term. By the way, for plants we defined a level called 'naturalized' that identifies well-established invasive species that have viable wild populations (level 2.3 of Occurrence in Michael's schema). However, I think that level (same as 2.1 and 2.2) is describing status, not occurrence. BRIAN STEVES: If we keep the minimal required set of elements short enough and the controlled vocabularies with general enough options, we should be able to do better than just defining "what, where, and when". If this is all we can agree upon, why don't we just adopt an unmodified Darwin Core as our standard? What we really want is to develop something that's a bit more specific to IAS. A nice thing about Michael's authority files is that he gives an easy solution for people who don't have the answers to the extra IAS-specific fields: they can simply put down "Uncertain". I would however agree that combined terms like "Established and stable" can be better represented by separate fields.
In the case of the datasets I work with, occurrence terms like "Established" and "Absent" are in one field while range status terms like "Stable", "Expanding", and "Declining" are in another. One concern with this sort of splitting is that it introduces a new potential error into our system: records with incompatible values in separate fields. PANKAJ OUDHIA: If possible, a 'visitor comments' section may be added at the end of the standard set. I think it can add new information automatically. While visiting many databases, when I want to add some additional information, in general I find no such section. The visitor comments may be approved by moderators. Under the general impact part, a subheading 'Allelopathic impact' may be added, because impact can be described more scientifically through allelopathy. To make the database more useful, emphasis must be given to local names, as the language or the name of a plant changes every mile.

NIS-BASE AS A MODEL

LIZ SELLERS: The newly developed "NISbase" online at http://invasions.si.edu/nemesis/merge/SpSearch.jsp is a distributed database system or portal developed by the Smithsonian Environmental Research Center (SERC) to provide "simultaneous search access to multiple invasive species databases". There are currently 5 databases included in the system. In this forum, NISbase will be examined as an example of a distributed database system for IAS information databases. ANNIE SIMPSON: If participants choose to accept the NISbase model and make their database compliant with that system, these are the requirements (according to a draft document posted by Brian Steves, accessible in the Data Standards & Formats / Documents / NISbase Documentation):

NISbase information provider requirements
1. A web server with the ability to dynamically create HTML and XML through some scripting language (PHP, ASP, Perl, JSP, ColdFusion, etc.)
2. A database that the dynamic web pages can draw data from (MS SQL, Access, MySQL, Oracle, Postgres, etc.)
3. Ability to query the database with the current NISbase search criteria:
   a. Taxonomic Group
   b. Genus
   c. Species
   d. Common Name
4. Ability to limit the returned result set based on the record limit parameter.
5. Ability to return query results in XML following the NISbase format (see http://invasions.si.edu/nemesis/SpQueryResults.dtd).
6. A static IP address on the server.
7. Creation of metadata for your database provider.
8. Acceptance by the NISbase charter members:
   a. Verification that the other provider requirements have been successfully met
   b. Acceptable server response time
   c. Acceptable content

This is one model that has been suggested as an integrated database solution, where you have your own database that you continue to manage and populate, but it would be cross-searchable through the integrative interface of NISbase. STEVE CITRON-POUSTY: It seems like these fields are aimed at a general description of invasives, not at field or herbarium specimens. Is there a proposed standard for a minimum record here? IPANE data could provide some of this data, but it's too general for most of our data, which is about occurrences. Other minimum fields for field records might be: genus, species, abundance, certainty of identification, certifying authority, and data set name (which references a data set record). We are currently working on this minimum set for our DB so that we can accept and import records from other DBs containing invasives records in New England. I will report back as we get closer on our minimum set. It might help discussion and we would love the feedback. Just had a thought that perhaps there could be different "levels" of this species occurrence data. We have different standards for herbarium specimens and for field forms. Perhaps we might want a field, or a different profile of XML, for each. BOB MORRIS: NISbase may be too weak to support integration of data from multiple sources.
For example, it would be impossible for an application to determine whether two "Factsheet" elements represent the same or different concepts. With reference to the Steves document and the "basic" results described by its return schema SpQueryResults.dtd: beyond its simple taxonomic data, NISbase appears content to return URLs to other web sites, rather than actual XML, or protocols and parameters by which those sites can be induced to return XML. To this extent it is dedicated to human-centric applications such as the nemesis application at http://invasions.si.edu/nemesis. With the data returned per SpQueryResults.dtd, one cannot expect that applications can integrate the returns from several NISbase providers, since there is not(?) any provision for discovering the schema for the results of the embedded URLs (if they even return XML at all, which is probably not guaranteed). Is there some documentation of the Advanced Provider Implementation mentioned in the document? The Steves document suggests that NISbase may adopt DiGIR. May we know the status of that consideration, and---in technical terms (perhaps best discussed in a different thread)---what is the proposed integration schema? BRIAN STEVES: While I'll admit that NISbase is fairly simple in its current design, I prefer to dwell on the strengths of what it can do for us now, rather than on what it doesn't do for us yet. For one, it is here today, it is running, and it does work. It allows us to select and search multiple databases located around the world from a single portal and return a single species list of results with links to further information. After attending many of the previous workshops held on IAS databases and hearing about the promise of XML, I'm happy to report that here is a system that is actually using it. Its current simplicity can also be considered one of its strengths in that it allows for rapid integration of new data providers.
Like myself, most of the IAS web/data managers I know are biologists first and programmers second. With this in mind, I tried to design NISbase so that it could be quickly implemented using existing technical skills and data sets. Many of my colleagues with databases on the web are capable of scripting pages (using ASP, JSP, PHP, Perl, ColdFusion, etc.) that can search their database and return a table of results. These same skills can easily be used to modify their scripts to output a simple but standardized XML result set. For NISbase, this result set is the list of species from their database that matches the search parameters, and it includes URLs to the species summaries (fact sheets) and collections records that they've already placed on their websites, either dynamically or statically. As for the advanced NISbase implementation I mentioned in my documentation, that's still under development at this time. I will tell you that I've been working on creating an XML schema for a standardized species summary page, adding more searchable parameters to the current system, and attempting to modify and use DiGIR for IAS information. At the moment, DiGIR seems to be my most promising path to NISbase improvement. Currently I've managed to generate drafts of two DiGIR conceptual schemas for IAS information. The first draft schema is an extension of Darwin Core that adds IAS-specific fields (pathway, status, and occurrence). The second schema is designed to handle information similar to the current implementation of NISbase (returning a species list with links to further existing information). I have even managed to create a few DiGIR test providers and set up a DiGIR portal to use both of these schemas. Right now this system is running locally on my desktop computer, but it seems to work fairly well.
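[Editor's illustration: a toy sketch of the portal side of the NISbase model Brian describes: fan a search out to several providers and merge the returned species lists into one result. Real providers are remote scripts returning XML over HTTP; here they are plain functions so the merging logic stands alone. Provider names and record fields are invented.]

```python
def portal_search(providers, genus):
    """Query every registered provider and merge the results into a
    single species list, tagging each record with its source so the
    link back to further information is preserved."""
    merged = []
    for name, search in providers.items():
        for rec in search(genus):
            rec["provider"] = name   # remember where the record came from
            merged.append(rec)
    # One combined list, sorted by name; duplicates across providers
    # are kept, since each carries its own provider's links.
    return sorted(merged, key=lambda r: (r["genus"], r["species"]))

# Two stand-in "providers" (hypothetical; real ones would be remote databases).
nemesis = lambda g: [{"genus": g, "species": "maenas"}] if g == "Carcinus" else []
gisd    = lambda g: [{"genus": g, "species": "maenas"}] if g == "Carcinus" else []

hits = portal_search({"NEMESIS": nemesis, "GISD": gisd}, "Carcinus")
```

Note that the portal never needs to know how each provider stores its data, only that each returns records in the shared result format.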
LIZ SELLERS: What are the technology (hardware/software) and connectivity issues to be considered when planning and maintaining a distributed IAS database system with the goal of providing online-accessible information about IAS? Can we build on the NISbase model? V. PANOV: In general the online distributed database system is a good concept, and the NISbase model can be considered one of the approaches during development of the regional and subregional information hubs. I believe that this approach will be most effective at the subregional level (in a region such as Europe this could be the Nordic/Baltic, Ponto-Caspian, Mediterranean and other subregions). In the Baltic Sea area we are working on developing this approach in the framework of the relevant HELCOM project. I do not believe in the effectiveness of one global distributed database system: it should be a network of such systems, or maybe even a network of networks. I think that a developing European information system on IAS will be a network of subregional distributed database systems. BRIAN STEVES: While I agree that we shouldn't have a single global distributed database system (DDS), a hierarchy of networks might pose a few problems. One such problem is how to give proper acknowledgement to each network as it passes the data up the system. If we use a piece of information for a species observation from such a system, do we then have to acknowledge the original observer for the species, the data provider (the database that compiled the information), the subregional DDS, the regional DDS, as well as the global DDS? This seems a bit excessive to me. Another concern is whether such a system of multiple DDSs would slow down the system. A distributed search on distributed searches of distributed searches could potentially be quite slow. I think a better system would be one in which regional and subregional portals exist alongside other thematic/regional portals to information from a wide array of data providers.
Ideally these regional/thematic portals would assist the development of new providers and submit pertinent metadata information to a global registry of such providers. In turn, regional/thematic portals could search this registry for new providers that they might wish to add to their own portal. Proper acknowledgement to portals would be limited to any value-added products they develop (maps, reports, modeling outputs, etc.), but not for passing data from one DDS to the next.

I3N AND GISD

MICHAEL BROWNE: Database integration: It seems to me that we can make rapid progress if we use NISbase as a model for the GISIN (see Brian Steves' document). Are there other models that we should examine? Capacity building: The I3N Cataloguer model can be used to develop a data capture tool with XML output for those at the early stages of database development. What other capacity-building products will make the GISIN more useful, and where are the models? Synthesis & outreach: I think that it is reasonable that the GISD (Global Invasive Species Database) be examined as a model to facilitate broad access to IAS information, and the place where one may go to download fact sheets, view images, get assistance with identifications of potential IAS, etc. The GISD synthesizes and presents on the Internet (and via a CD-ROM proposed for 2004) information that is currently widely dispersed and difficult to access, presenting a global picture. It addresses gaps (e.g. providing data about IAS that impact regions which currently have little expertise and information available). LIZ SELLERS: IABIN's I3N Project offers a cataloguing tool and associated documentation as an assistive technology/resource for those with IAS databases that they wish to serve online, or for those who have IAS information that they wish to migrate to a database format to eventually be served online.
For reference, the I3N Project can be found online at http://www.iabin-us.org/projects/i3n/i3n_project.html BADRUL AMIN BHUIYA: In Bangladesh, BRGB is preparing a species checklist of all living organisms recorded so far from this region, although we have very little expertise and so very scanty information available. Information collected by BRGB will be presented on the internet through our own site, but I suggest that information about IAS of Bangladesh can be digitized by the GISD to form part of the GISIN.

GBIF'S ROLE

BOB MORRIS: GBIF has deployed its prototype portal http://www.gbif.net. See http://www.gbif.org/Stories/STORY1076335311 for a brief explanation. The current protocols are the DiGIR protocols of TDWG, the Taxonomic Databases Working Group, with the Darwin Core metadata standard for collection records, and later with TDWG's ABCD (Access to Biological Collection Data). For descriptive data, TDWG has just released a draft XML Schema for the Structure of Descriptive Data (SDD). See http://184.108.40.206/Projects/TDWG-SDD/ In general, tdwg.org has a long history of making data exchange standards, recently with attention to distributed databases. HANNU SAARENMAA: A few words about GBIF's possible role. GBIF is trying to provide an information infrastructure for biodiversity data. This infrastructure has components such as the portal, providers, and registry. The providers currently only provide "primary biodiversity data" in the Darwin Core format. Darwin Core is good for expressing things like "a species has been found in a certain place at a certain time". In future, other formats and types of data will be included. All these types of data and protocols are defined in the registry. The providers advertise their data and services there, with entries like "I provide type A data with protocol B". The registry is open for anybody to register their provider, and for any portal and search engine to discover the right providers. Now how does that fit with GISIN?
If we look at the data types required (Diagnostics, Distribution, Basic Biology, Dispersal, Impacts, Biotic Associations, Modes of Dispersal, Control Methods, Bibliographies and Expert Contact Information), we see that only Distribution can today be implemented with the existing Darwin Core. Diagnostics can soon be covered with SDD. For the others we need to select or write a data exchange format and protocol, and a data provider application. Technically speaking, GBIF could include all these information/provider types in its registry. This would be a kind of global phonebook of available IAS data and information. However, as we are not talking here only about biodiversity data but also pest control etc., it might be more appropriate that another registry, similar to GBIF's, is established for IAS. The registries, if made using compatible technology approaches like UDDI, can share their information where needed, such as data on distribution and diagnostics. So, I hesitate to include all these information types in GBIF, but I think GBIF provides a model for an infrastructure that works, and the linkages/data flows between GBIF and GISIN should be strong. The GBIF architecture is described at http://circa.gbif.net/irc/DownLoad/kjeFA-J1mmGHrfOtAyTZ74s8jUwq9HoJ/p6hpeSGHkYZQWMiF42pMFYPs7fCtNHv-/GBIFBiodiversityDataArchitecture-v0.7-draft.pdf BOB MORRIS: Some ideas for a technical discussion of exchange protocols, metadata schemas and content schemas. This is meant to center not on what should be represented, but rather on how. The discussion is likely to bore anyone who is more inclined to google owl+species than owl+ontology. Some sample topics: DiGIR vs SOAP? Does Z39.50 matter? Should the ADL Digital Gazetteer protocol be adopted? Is it a technical or a social issue to suggest that GISIN should just be a component of GBIF?

WEB SERVICES AND SOAP

STEVE CITRON-POUSTY: I am not sure if those are high requirements or not.
It seems to me that this DB interoperability is a perfect scenario for web services. Perhaps we should try to derive a web service profile for the exchange of invasives data. In this way most people would not have to write the code to write the XML by hand. Just a thought... BOB MORRIS: If you mean Web Services in the sense of WSDL, certainly you are correct, and this has the biggest chance of cross-platform success. However, more generally, nobody should ever have to write code to generate XML. That is what databinding frameworks like Castor do. See http://www.castor.org/ [Which states: "Castor is an open source data binding framework for Java[tm]. It's basically the shortest path between Java objects, XML documents and SQL tables. Castor provides Java to XML binding, Java to SQL persistence, and then some more." Also available for download at this URL.] That said, Web Services frameworks like the open source Apache Axis, and the proprietary Microsoft .NET, leave even less work to do to exchange data with SOAP (if you make your .NET play nice, which is not actually its default). Indeed the problem is reduced to writing an XML Schema, which is where most communities trying to address this issue nowadays focus their attention. For non-trivial domains, that is a non-trivial task, but one based on standards. STEVE CITRON-POUSTY: SOAP messaging is exactly what I was talking about. I mean, this is even a case where UDDI actually makes sense. =) So perhaps at the meeting we should focus on schemas for the minimum data required for the different data types (field records, data set records, species records, expert records...?). In this way we don't have to worry about what people implement on the backend, as we can all send data to each other and do with it what we will. BOB MORRIS: Once a server can emit XML and has a published XML Schema, open source tools make it easy to deploy Web Services based on SOAP.
For example, the Apache Axis framework will manage all the infrastructure in ways that allow applications to simply call APIs to make connections and fetch data. It's particularly easy to write wrappers around any such service. A set of demonstrations I did for a course I teach is described in the READ.ME.txt at http://www.cs.umb.edu/efg/gisin It tells both how to invoke the demos and what their architecture is. The code invoking the Axis APIs is also in that directory. This all means that, once again, the big deal is getting the schema right. Plus, the really interesting thing is: what do you do with the XML once you have it? BRIAN STEVES: When it comes to using Web Services for sharing our IAS information, I'm a little concerned that we might not be able to get many (if any) IAS data managers to develop, deploy, and consume them at this time. However, I agree this is probably where this group should be headed in the future, so it's good that we seem to have a few experts on the topic attending GISIN (maybe we'll discover that Web Services will be easier to implement than I think). Until then, I still consider NISbase a positive step forward in achieving our goals of a distributed network of IAS information.

DARWIN CORE AND SPECIES ANALYST

LIZ SELLERS: Formats recommended by the Conference of the Parties to the Convention on Biological Diversity at its 6th meeting, held in April 2002, included: the Dublin Core Metadata Initiative; the Federal Geographic Data Committee (FGDC) - ISO 19115; the FGDC Biological Data Profile; BIB-1; XML as a description language; and HTML 3.1 as a presentation language. How will adherence to these formats affect decisions on collecting, providing and maintaining IAS information in online databases and a Global Invasive Species Information Network?
Reference documents loaded in this project: "Scientific and Technical Cooperation and the Clearing-House Mechanism (cop-06-inf-18-en).pdf" and "COP6 Recommended Formats.pdf" BOB MORRIS: The document at the Species Analyst site http://tsadev.speciesanalyst.net/documentation/ow.asp?DarwinCoreV2#h2 is pretty easy to read, but is about a year out of date. It is a good place for biologists to start and end, and a good place for informaticists to start. See also the corresponding thing at MaNIS, http://elib.cs.berkeley.edu/manis/darwin2ConceptInfo030315jrw.htm To stay current on Darwin Core and other DiGIR issues, it is better to track http://digir.sourceforge.net and follow on from there. The current DC2 is 1.24. The changes are mostly technical, and you have to look at the DiGIR developers' mailing list archive to understand why there is now a Darwin Core and a Darwin Mantle. It doesn't matter much except to people trying to implement DiGIR, and the Species Analyst concept document is a good introduction. To me, using DC2 or ABCD as the specimen record schema seems a no-brainer, but it is probably missing a lot of what is important about invasive species, as distinct from just species. One important thing is that in its present form, DC2 is rather bound to the DiGIR protocol schema (which describes how DiGIR queries are made). I find that regrettable, but probably not fatal. When GBIF gets around to adding a SOAP interface---which is a stated aim of Donald Hobern's GBIF Data Access architecture program---I guess there will need to be a cleaner separation. Also, since the BioCASE ABCD gang have a small technical problem with this tie, maybe the separation will come sooner. DAVE VIEGLAIS: A SOAP interface to DiGIR / Darwin Core is currently in development as part of the SEEK project (http://seek.ecoinformatics.org) and currently exists in prototype form. BOB MORRIS: Dublin Core was designed for the kinds of things that concern library collection management.
It is a bad choice for metadata about organisms and the things that describe them. Googling "FGDC" provokes 234,000 responses; googling "XML" provokes 32,500,000 responses. Perhaps more importantly: FGDC+middleware gives 643 googles; XML+middleware gives 291,000. Is there something more to say? ANNIE SIMPSON: Vishwas Chavan gave a presentation at a Regional Meeting in Asia concerning the link between taxonomy and invasives, as a representative of the GISP Informatics Working Group. I have uploaded both audio and non-audio versions of his PowerPoint presentation into the general documents section of the "Data Standards & Formats" project. On his slide 13, XML (as a descriptive language), Dublin Core, FGDC, and even BIB-1 are all mentioned as recommended formats (and HTML 3.1 as a presentation language). These recommendations came out of a global meeting held in Montreal by GISP and the CBD in February of 2002, which several of our group attended. Though Dublin Core was created by "library types" and FGDC by "geography types," both have been modified/expanded for biological use on the web. Is using XML to describe these formats an inclusive and satisfactory solution? BOB MORRIS: This depends on the modifications and expansions. For Dublin Core, the major ones I know of are the Darwin Core and the BioCASE/TDWG Access to Biological Collection Data (ABCD), both of which focus on specimen data. An effort at the Cornell Lab of Ornithology is contemplating extending ABCD to deal with observations. Are there references to other extensions of Dublin Core that we can look at to form an opinion? Ditto for FGDC? Most of the content on the web site http://biology.usgs.gov/fgdc.bio/index.html/wpln2000.html appears to have been last updated in 2000. Is there more recent activity somewhere that we should look at? At a glance, it appears that the FGDC Biological Data Profile extensions comprise taxonomy, methodology, and geologic age stuff.
If I'm correct, I would have thought it is of minimal help to GISIN. It would be most excellent if this work continued somewhere, since among its contributors are people in the SEEK project and other more modern ways of doing distributed data on the web.

DATABASE SOFTWARE PACKAGES

LIZ SELLERS: So you've got data, and you want to put it into a database. Or you've got a database already. What software package should you use to enter/manage your data/database? Will the software you choose make it difficult for you to share your data with others? What are the key components and functions we should look for in this type of software package? (e.g. export/import functionality that allows data to be shared with others in a delimited or spreadsheet format?) ANDREA GROSSE: Take a look at the I3N Cataloguer, downloadable from http://www.iabin-us.org/projects/i3n/i3n_tools/download_cataloguer.html. This tool is used in several countries in the Western Hemisphere. MAC-Caribbean Yahoo Group listserv: The main focus of these discussions is indeed the development of Invasive Alien Species (IAS) databases and distributed database systems. However, type specimen collections, herbaria and baseline/taxonomic collections are a valuable resource often referenced in support of IAS databases. Caribbean museums and the UK are reported to be using "MODES" (Museum Object Data Entry System) to manage and exchange specimen/collection data (http://www.modes.org.uk/), which is purported to be a cheaper alternative to U.S. software packages. A similar application called "PAST PERFECT" was also mentioned. To support successful exchange of data between museum databases, any software application must have the ability to export/import data in Excel, Access or CSV format. However, XML is quickly becoming the popular markup language for data exchange. A good example of the use of XML is the I3N Cataloguer.
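[Editor's illustration: the listserv's two exchange formats sit side by side naturally: a CSV export is the lowest common denominator most packages can produce, and the same table can be re-expressed as XML for exchange. A minimal sketch; column and element names are invented, not those of MODES or the I3N Cataloguer.]

```python
import csv
import io
import xml.etree.ElementTree as ET

# A specimen table as a museum database might export it (hypothetical columns).
CSV_EXPORT = "Genus,Species,Locality\nCarcinus,maenas,San Francisco Bay\n"

def csv_to_xml(text):
    """Re-express a CSV specimen export as XML, one element per column,
    so the same data can travel in either format."""
    root = ET.Element("Specimens")
    for row in csv.DictReader(io.StringIO(text)):
        rec = ET.SubElement(root, "Specimen")
        for col, val in row.items():
            ET.SubElement(rec, col).text = val
    return ET.tostring(root, encoding="unicode")

xml_out = csv_to_xml(CSV_EXPORT)
```

The CSV side carries no field semantics beyond the header row; the XML side can later be validated against an agreed schema, which is what makes it the stronger exchange format.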
Another issue highlighted by Bruce Potter (listserv member) is the difficulty of reaching agreement [e.g. between database developers/owners] on the meaning/use of individual data fields. For example, does the “Address” field refer to postal address, street address, or both? [Submitted with permission: Bruce Potter, MAC-Caribbean Yahoo Group, Island Resources Foundation, firstname.lastname@example.org] ANNIE SIMPSON: I'm a splitter, not a lumper, so I tend to think “address” should be “address1”, “address2”, etc. Agreeing to disagree can also work, if programmers can create code to make translations between all the differing fields of the interoperating databases. At least that is the way I think it should be able to work. BOB MORRIS: In this context, normally this is the reason to have an agreed-upon “integration schema”. It then becomes the task of the data provider to map their internal field names and data organization into those in the integration schema and respond to queries in a way that is described by that schema. When “schema” means XML schema, this is usually a straightforward task, aided by the DBMS itself nowadays. The main problems come if the db has stuff that cannot be expressed by the schema, or vice versa. Then some information will be lost when the query is answered. STEVE CITRON-POUSTY: Just a note to say I agree with Bob. Send standard fields across the wire and let people decide where they go on either end. So if the address data is split into components on sending and receiving, it's trivial to lump it together to put in your db, while the inverse is not always true unless there are consistent delimiters in the address information.
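Bob's integration schema and Steve's lumping/splitting asymmetry can be sketched in a few lines (the schema and field names below are hypothetical, not a proposed GISIN standard): each provider declares how its internal field names map onto the shared names, and split address fields are trivially joined on the receiving end, while the reverse requires consistent delimiters.

```python
# Hypothetical integration schema: the shared field names all providers map onto.
INTEGRATION_SCHEMA = ["contact_name", "address1", "address2"]

# Each provider declares how its internal fields map to the integration schema.
PROVIDER_A_MAP = {"name": "contact_name", "street": "address1", "city": "address2"}

def to_integration(record, field_map):
    """Rename a provider's internal fields to the integration schema's names."""
    return {field_map[k]: v for k, v in record.items() if k in field_map}

shared = to_integration(
    {"name": "B. Potter", "street": "6 Bay Rd", "city": "St. Thomas"},
    PROVIDER_A_MAP,
)

# Lumping split fields on the receiving end is trivial...
postal_address = ", ".join(shared[f] for f in INTEGRATION_SCHEMA[1:])
# ...but splitting a lumped "address" field back apart is not, unless
# consistent delimiters were used when the data was entered.
```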
ROB EMERY: The group may be interested in the approach used in Australia to bring a dozen or so disparate collection databases together to form the Australian Plant Pest Database at http://www.planthealthaustralia.com.au/our_projects/display_project.asp?category=4&ID=1 The query form will load but can only be submitted by collaborators, but I'm sure you will get the idea of how it works. All of the collaborating collections are working taxonomic collections, so only records of actual specimen data labels are returned. The first page returned lists the collections by name and the number of specimens held at each collection. There are links to an Australian distribution map as well as a specimen details link which returns: Family, Genus, Species, Common name, host (scientific and common names), location, lat/long, Collector, ID method and Stage. There are quite a few different databases used at the collections; CSIRO's BioLink is used at several, as well as (I think) Texpress. My organization developed an Access database in-house about 10 years ago, and this has been used by a couple of collaborators as well. Our database holds about 130,000 specimen records. I think there is even an Excel spreadsheet out there that is part of the APPD. CSIRO Maths and Information developed the “Internet Marketplaces” software, which involved Apache, broker and gateway software being installed on each collaborator's webserver, along with a schema which maps the different field names. I'm sorry if my description does not do this extensive project justice. STEVE CITRON-POUSTY: I talked about this over in the other discussion, but maybe rather than going down a proprietary route (and by this I mean someone's custom format that uses only their custom software) we should think about using web services to exchange data. In this way all we have to agree on is the contents of the envelope, not how we make the letter or how we send and receive it.
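Steve's envelope metaphor might look something like this in practice: a minimal sketch (the element names are invented for illustration, not an agreed standard) that wraps a species record in a SOAP-style XML envelope, leaving each partner free to produce or consume it with whatever software they prefer.

```python
import xml.etree.ElementTree as ET

def wrap_in_envelope(payload_fields):
    """Wrap a record in a minimal SOAP-style envelope (illustrative names only)."""
    env = ET.Element("Envelope")
    body = ET.SubElement(env, "Body")
    rec = ET.SubElement(body, "SpeciesRecord")
    for name, value in payload_fields.items():
        ET.SubElement(rec, name).text = value
    return ET.tostring(env, encoding="unicode")

# A hypothetical record travelling between two collaborating databases.
message = wrap_in_envelope({"scientificName": "Axis axis", "status": "invasive"})
```

Only the envelope's structure needs agreement; how each end generates or stores the record remains its own business.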
This would also allow us to be software-, platform-, and language-agnostic. BOB MORRIS: Works for me. I prefer my web services with capital W and capital S, but not all the good ones use WSDL. Sigh. STEVE CITRON-POUSTY: Yup, I mean Web Services. And as far as playing nice, I think that's the responsibility of the providers. There must be a set of technology tests (put out by OASIS or somebody) that you would have to comply with to publish. The other big win for WS is that they work much better in low-bandwidth situations, and we could also use them for both push and pull of data. BOB MORRIS: I started a new Discussion named “Exchange Protocols” since this current Discussion seems mostly intended to discuss end-user data management issues, not software architecture. Showing Web Services access to IPANE would be great and I think we have time to pull it off. Let's take the discussion off line. Send me email at email@example.com You presumably know that we have recently submitted an NSF ITR proposal to build a toolkit for generating image-aware, spatially referenced observation systems like eBird.org. Our partners are the Cornell Lab of Ornithology, IPANE, and the MIT Sea Grant, with invasive species monitoring as the main proof of concept. It's an ambitious project with payoff several years out, and there were 1500 ITR proposals submitted, so we will look at other funding opportunities too. LATITUDE/LONGITUDE OR GAZETTEER? ANGELA SUAREZ-MAYORGA: The majority of the standards for biological records are too general – even ours, probably because biodiversity is bio-complexity, and data models cannot go to the specific items if they seek to record any biological unit. However, as standardization is a must, minimal fields should be established. For me it is difficult to conceive of a biological record (at the individual level, species level or community level) without considering a spatial and temporal reference.
Consequently, geographic coordinates are very useful to set the spatial reference, but sometimes it may be better to use a different way to do so. If we deal with taxa of non-restricted geographic ranges (i.e. widely distributed species), probably a descriptive way is more helpful than recording many pairs of coordinates. If you want to take a look, the biological standard (in Spanish) of the Biodiversity Information System of Colombia is available at http://www.humboldt.org.co/sib/content.jsp?doc=documentos [one copy of this document will be available at the meeting for reference purposes] BUDDHISRILAL MARAMBE: Inclusion of geographic coordinates would no doubt be of significant importance with respect to future reference, monitoring the invasive behaviour of species, etc. STEVE CITRON-POUSTY: I think pairs of coordinates in lat/long WGS84 would be great. Or at least pairs of coordinates with a definition of projection and datum. This geographic information will certainly be vital for field records. I think the benefit of coordinates is that we then don't have to worry about maintaining a “global” gazetteer that we all match our records against. In our database alone (which is based on a subset of the GNIS), covering only 6 states in the U.S., there are 32.5k records. Using names would also force a level of conformity for entering field data that is rarely achieved. Perhaps names could work at the data set and expert level. DEFINITION OF INVASIVE SPECIES CHARLOTTE CAUSTON: It is also necessary that the fields distinguish between those species that are human and agricultural pests and species that are invasive according to the IUCN definition (i.e. environmental pests). In many cases the definition of the term invasive is confused. REGULATORY FRAMEWORK NEEDED RAUF ALI: The subject of legal issues while dealing with IAS appears to be important. I worked in the Andaman Islands, which are an Indian territory where India's laws apply, including the wildlife laws.
The greatest invasive on the islands is the spotted deer or chital (Axis axis); I believe this is a problem in Hawaii as well. I've been documenting the vegetation changes caused by it, and have come to conclude (insofar as one can conclude anything) that major degradation of vegetation is taking place due to this. The issue is, these are protected in mainland India, being endemic there. The law therefore prevents any culling on the islands! Ministry of Environment and Forest officials are reluctant to set any precedent, because the issue of invasives has never come up before. However, it seems that we have a legal obligation under Article 8(h) of the CBD to eradicate these. Some nations probably lack any regulatory framework to eradicate invasives, and in some, as in the case given above, existing legislation actually hampers efforts to control invasives. It would appear that a database of case studies dealing with problems in the regulatory framework may help decision makers in considering alternatives or changes to laws. I'm not sure about how one would set up network-driven solutions to this. BOB IKIN: I am not sure how a database of case studies would assist countries in drafting legislation that gives effective control of organisms that have become a threat to the environment. A database of information would be very complex and difficult to use. The example you quote is an interesting one, as the issue seems to fall between a number of pieces of legislation. This includes the Environment Protection Act (1986), the Wildlife Act (2003), the Biodiversity Act (2002) and even the draft legislation on quarantine. All are so specific that they do not identify this particular scenario. What is needed is a guideline for a framework that identifies all the activities that are required to cover the management of organisms that can have a deleterious effect on agriculture, the environment and possibly lifestyle.
I have been involved in the development of such a framework that builds upon the already existing legislative powers and capability of quarantine services, and develops linkages with the national authorities (such as environmental, wildlife, marine and others) so that all issues with introduction, management/contingency planning and eradication can be dealt with. The need is for the recognition of the various powers to control organisms at all stages, and as a consequence the requirement for consultation at inter-departmental level for this to be achieved. As an example that illustrates this problem, consider the case of a plant that has been brought into a country as an ornamental and, because of a particular set of circumstances, becomes a weed in an aquatic environment. Departments of agriculture, environment, wildlife, fisheries, irrigation, etc. would all have a stake in its control/eradication. In the particular case you outline, the legislation would have to be able to identify the special status of the islands (which is already possible under some legislation) and the capacity to eradicate the animal (after identifying it as a pest/invasive) by killing it (wildlife legislation in India only permits control by relocation). LIZ SELLERS: Perhaps a source of information about examples of other national regulatory frameworks that have successfully defined and addressed the IAS problem would be useful to researchers/professionals that are consulting with governments/policy makers that are just beginning to develop or build on their IAS regulatory framework – where existing IAS regulations are only partially applicable (with respect to IAS), or where they do not exist at all. Of course, the first step to regulating anything is defining it… and defining any species as an IAS (and one that requires control or close attention) can be a complex task, and one that must be addressed by each nation according to its needs and priorities… e.g.
conserving/protecting the national economy can mean regulating agriculture-related species, biodiversity-related species (e.g. in a tourism-based economy), or both, or conserving other economic sources affected by IAS. One of several organizations addressing this issue, the IUCN has produced several references addressing the issue of law development and application with respect to IAS. I must admit that I have not read these documents, but I think that perhaps a collection including these and similar reference material, along with case studies of national attempts to implement a framework of IAS regulation, may be a useful component of an IAS Legal Toolkit – available online or on CD? These references are located online at http://www.issg.org/publications.html * Shine, C., N. Williams and L. Gündling (2000). A Guide to Designing Legal and Institutional Frameworks on Alien Invasive Species. IUCN, Gland, Switzerland, Cambridge and Bonn. xvi + 138pp (English version only). * Legal and Institutional Dimensions of Alien Invasive Species Introduction and Control. Proceedings of the Workshop on the Legal and Institutional Dimensions of Alien Invasive Species Introduction and Control, held at the IUCN Environmental Law Centre, Godesberger Allee 108-112, Bonn, Germany, 10-11 December 1999. BOB IKIN: In addition to the publications listed by Liz, the following – www.biodiv.org/doc/publications/cbd-ts-02.pdf, Review of the Efficiency and Efficacy of Existing Legal Instruments Applicable to Invasive Alien Species, Technical Review No. 2, Secretariat of the CBD, Montreal, 2001 – not only lists the scope of the many legal instruments, but also identifies areas of convergence in their application/understanding. Note para. 124 – Capacity to address environmental, economic and social challenges posed by invasive alien species is not remotely sufficient.
From the legal and institutional perspective, this paper has highlighted the complexity of existing regimes as well as strengths, gaps and inconsistencies. And para. 126 – The task facing policy-makers is how to strengthen capacity to protect native biodiversity against invasion impacts without adding extra complexity or duplicating what already exists… I am currently looking at working with the FAO Legal Office on an update of global phytosanitary guidelines, so would it be useful to explore the possibility of addressing the inconsistencies identified in this publication? WHO WILL USE GISIN DATA AND WHAT FOR? BOB IKIN: Stakeholders (users) of a biological information database that enables assessments to be made on the capacity of species to enter through pathways and become invasive have varied responsibilities and backgrounds. Regulatory authorities who use this type of information include environmental scientists and administrators, as well as quarantine authorities (National Plant Protection Organisations) who maintain the point-of-entry barriers at which regulatory action is taken. Those involved in making risk assessments also include research scientists and universities. The aim of any database is to serve its clients, to be inclusive, and to promote cooperation between those concerned with conservation and environmental impact and those concerned with the impact of incursions on agriculture and related fields. In most developing countries the distinction between these two areas is almost non-existent. BOB MORRIS: Traditional database design principles assert that before anything else, you should find out how your data will be used. That tradition is a death trap. YOU CANNOT KNOW HOW YOUR DATA WILL BE USED and should design without assumptions. Nowadays, that often means being prepared to use ontology mechanisms to map whatever concepts you built into your database to those that meet the needs of some other audience than the builders.
See the NSF SPIRE and SEEK projects, http://spire.umbc.edu/ and http://seek.ecoinformatics.org/ respectively. Jim Quinn is one of the PIs on SPIRE and hopefully there will be mention of its work at the meeting. MICHAEL BROWNE: This topic is about the role of the GISIN – what difference will the GISIN make and how will it do so? As Bob says, it will only make a difference in most developing countries if it provides access to data about species with both agricultural and biodiversity impacts. If we stick to this inclusive principle, it will serve the “data rich” just as well as the “data poor” regions (e.g. how do we get data on species that are only invasive in central Africa?), and it will meet research requirements as well as those of land managers and quarantine agencies. It will also serve those who have minimal access to the Internet or none at all. IAS are a global problem. Is the GISIN to be inclusive or exclusive? JAMALIENS [SUZANNE DAVIS?]: Consider the question “What are the objectives of a GISIN?” The answers to that question will greatly determine how inclusive or exclusive the GISIN is. Possible answers are: 1. To provide direct and easy access to global online IAS resources. [I see metadata being instrumental here.] Dealing with metadata would result in a high level of inclusivity. 2. To facilitate making existing databases interoperable and accessible via the Internet. The more technical persons could address this in more detail, but clearly the levels of technology used, e.g. Internet accessibility and communication, types of database software and their compatibilities, etc., would probably limit the participation of some contributors to the GISIN. CDs, LOW BANDWIDTH, AND ACCESSIBILITY ISSUES MINGGUANG LI: Low bandwidth and lack of technical maintenance in some countries might hinder the accessibility of data from these areas. A mirror set up on a host providing high bandwidth is thus recommended.
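The metadata approach in option 1 is deliberately low-tech: each contributing resource publishes a small discovery record that even a low-bandwidth participant can produce. A sketch using the real Dublin Core element set but an invented example resource:

```python
import xml.etree.ElementTree as ET

# Dublin Core element set namespace (real); the example resource is invented.
DC_NS = "http://purl.org/dc/elements/1.1/"

def dc_record(title, creator, subject, identifier):
    """Build a minimal Dublin Core discovery record for an IAS resource."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for elem, value in [("title", title), ("creator", creator),
                        ("subject", subject), ("identifier", identifier)]:
        ET.SubElement(root, f"{{{DC_NS}}}{elem}").text = value
    return ET.tostring(root, encoding="unicode")

record = dc_record("Example IAS Database",          # hypothetical resource
                   "Example Institute",
                   "invasive alien species",
                   "http://example.org/ias-db")
```

A registry of such records lets users discover resources without requiring every contributor to run interoperable database software.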
BOB IKIN: To which can be added the Internet access costs in developing countries. Having worked for the last 4 years in Asia, the Pacific, Africa and the Caribbean, I believe it is essential that users also have access to data in CD-ROM format. CD readers are now widely available, CD production is now simple and cheap, and by distributing data, say, annually, access by Internet for updates would not be too difficult or expensive. LIZ SELLERS: So perhaps we should refocus our initial efforts on encouraging the development of CD-ROM versions first, followed by online versions. Or a “preferred accessibility requirement” of developing a companion CD-ROM version for any/all online IAS databases. This would fall more in line with a typical Decision Support System approach, where IAS database CDs can be incorporated into the existing suite of software and data tools already being used by clients. Which leads me to question whether a short list of “preferred accessibility requirements” for IAS databases might be a good product to develop during the GISIN pre-meeting discussions/experts meeting? Another option to add to the list, as Mingguang Li states, is a requirement for providing low- and high-bandwidth viewing options for online tools – such as those provided for viewing this online community (see “Portal Settings” at the top of your GISIN community page). MICHAEL BROWNE: Mingguang Li's recommendation to provide a high-bandwidth mirror will resolve problems related to his server bandwidth and GISIN users accessing his data. Liz said we should provide low- and high-bandwidth viewing options at GISIN – should contributing databases be asked to offer low-bandwidth access options (minimal graphics, short pages, text only) for users with low access bandwidth? I agree with Bob that annual CD-ROMs (“a companion CD-ROM version for any/all online IAS databases”) would be a good way to deliver information where there is no Internet or poor/expensive access.
The question is what content to include on the CD-ROM and how to present it effectively. ISSG is planning to produce a CD-ROM containing 300 invasive species profiles in late 2004, so we will soon be dealing with these issues. Liz has suggested we focus on some GISIN guidelines for accessibility, so let's do that. We can use Brian Steves' Provider Requirements as a basis for discussion (see NISbase Information for Developers.pdf under “Documents”). BADRUL AMIN BHUIYA: Since the discussion started I have had the same problem of low bandwidth, and as of today I could not access the discussions. At home I use a dial-up connection with a 56k modem. This condition is similar in most places in Bangladesh. As a result, maintaining satisfactory access to the GISIN will be a challenge. Now at Chittagong University we are using 128 kbps bandwidth through VSAT, and in the office we do not have this problem. So provisions for low- as well as high-bandwidth facilities are recommended for the GISIN site. BOB MORRIS: Most such users would be well served by optical media PLUS a subscription-based notification system that would tell them when a particular record has been updated or added, along with a mechanism for updating records of interest. CDs are rapidly becoming obsolete in favor of DVDs, which, however, are not yet fully standardized as to their encoding. MICHAEL BROWNE: What proportion of potential users of the GISIN, and what proportion of potential participating databases, face problems associated with low bandwidth (or no Internet access at all)? I found a July 1999 map of global Internet access at http://mappa.mundi.net/maps/maps_007/ Has anyone got more recent data? BOB MORRIS: This map originally comes from Matrix Map Quarterly, http://www.mids.org/mmq/index.html, which stopped publishing them in 2001. It shows the number of Internet users, and doesn't offer much insight into the bandwidth of end-user computers.
For Africa, see http://www3.sn.apc.org/africa/afstat.htm and other stuff at the same site. The problems with looking for this stuff on the web are several: (1) Most data is about transborder bandwidth, not end-user bandwidth. (2) Most end-user bandwidth data is gathered as market research for retail e-commerce, so it is about households. (3) Most GISIN clientele are probably investing in bandwidth much faster than households. (4) 4G wireless has data bandwidth up to 20 Mbps, and many developing countries are leapfrogging developed countries by installing advanced wireless telecommunications faster (per capita) than many developed countries. Even 3G wireless can go to 2 Mbps and is rapidly being deployed at data rates faster than ISDN. See http://www.infodev.org/symp2003/publications/wired.pdf MICHAEL BROWNE: Thanks for sharing the African report, Bob. It states that 5 million out of 800 million Africans use the Internet, and that the rate of growth is slowing due to cost. It complements the 1999 Global Internet Access map at http://mappa.mundi.net/maps/maps_007 which shows that more than half the planet is in a similar situation. This map shows the geographic locations of the Internet hardware (networked computers, known as hosts). The number of hosts is aggregated for major cities and countries and then represented on the map by coloured circles. This is not a map of the number of Internet users. If we knew that most developing countries will soon have advanced wireless telecommunications, and that providers and users of data in those countries would have reasonably unrestricted and low-cost access to the Internet, we could spend less time ensuring that the GISIN also caters for the needs of the Internet “have nots”. How do we find out if this is the case? In simple terms, GISIN clientele are made up of potential providers and users of data.
It would be helpful to understand who these providers and users are and what their chances of participating in, and benefiting from, the GISIN in the near future are. For example, the telecommunications infrastructure in most of the Pacific region is such that people working there do not expect significant improvements in the next 5 years. Some contributors to these discussions on the portal have already described their less-than-optimal access to the Internet. Perhaps they could offer a prognosis for their regions over the next 5 years. BOB IKIN: Having visited most of the Pacific countries in the last four years on assignments to undertake training in risk analysis, I would support Michael's observation that little is likely to change with regard to Internet access by the likely users of GISIN. Although national data might paint a picture of availability within specific parameters, those in whom we have an interest are likely to be less well served. Issues include the effect of local infrastructure on local line delivery (I am not aware of any special treatment of technical government persons above that of the general public); the excessive cost of access, which is seen as a government revenue earner; and limited access to the service even within official authorities (limited to senior staff, due to perceived capacity to surf non-work-related sites). As a mechanism for exchanging technical information, the Internet is not a stable environment for these countries at the moment, and I have had to rely on other mechanisms. Hence my earlier comments on the need to continue to support exchange of information on CDs, with updates to be provided through Internet access. I have had similar but limited experiences with access in countries in Africa, where cost and local service reliability are limiting factors, particularly in locations remote from the capital and main towns.
PHILIP THOMAS: My contention is (and always has been) that resources for which part of the audience is the underserved population without Internet access should be developed in such a way that they ARE accessible to “internet-free” zones (e.g. via CD). If the product is developed to work well on CD, it seems superfluous and a waste of valuable resources to spend much (if any) time developing other interfaces that serve the same data. That time would likely be better spent improving the CD-version interface (which – as has been mentioned earlier – should simultaneously be available on the web). The Pacific Island Ecosystems at Risk project (PIER; http://www.hear.org/pier/) (Jim Space, US Forest Service) seems to be the perfect example of how things “should be done.” His CD and website are identical, and he has given thought to how best to include appropriate information on the CD. This information is, of course, also available online. Virtually no modifications are made between the CD and web-based versions. The information is (to some extent, and soon to a much greater extent) automatically produced based on information in a database, but since the product is completely HTML (and PDF)-based, no special server-side software/maintenance is required (therefore it works nicely as a CD product). (The raw PIER database will soon be available online as XML and/or SQL server, so other interfaces can be created based on the data. In fact, some of the data is already being used via http://pbin.nbii.gov:8080/NISbase/SpSearchpbin.jsp.) However, my point is that such auxiliary interfaces for those with higher-end capabilities should be viewed as PURELY OPTIONAL, and that the primary interfaces should be designed to work from CDs. If well designed, products from this type of approach will (by definition) be as useful as other interfaces that preclude important audience segments.
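The PIER approach Philip describes, generating plain HTML from a database so that identical files serve both the CD and the website, can be sketched roughly as follows (the record fields, page template, and file layout are hypothetical):

```python
import pathlib

# Hypothetical species records as they might come out of a database query.
SPECIES = [
    {"name": "Axis axis", "common": "chital", "status": "invasive"},
    {"name": "Miconia calvescens", "common": "miconia", "status": "invasive"},
]

# A plain-HTML page template: no server-side software needed to view it.
PAGE = """<html><head><title>{name}</title></head>
<body><h1>{name} ({common})</h1><p>Status: {status}</p></body></html>"""

def build_site(records, outdir):
    """Write one static HTML page per record; the result runs from CD or web."""
    out = pathlib.Path(outdir)
    out.mkdir(exist_ok=True)
    for rec in records:
        fname = rec["name"].lower().replace(" ", "_") + ".html"
        (out / fname).write_text(PAGE.format(**rec))
    return sorted(p.name for p in out.glob("*.html"))

pages = build_site(SPECIES, "site")
```

Because the output is pure HTML, burning the generated directory to CD and uploading it to a web server are the same publication step.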
ROB EMERY: We released our entomology website on CD for farmers without Internet access and it was very well received; in fact it was self-propagating, as people burned copies (with our permission) for neighbours and so on. The cost of CD production was minimal for large quantities. One of the problems we had was that many of our webpages which would be of interest to farmers (e.g. information data sheets) were database-driven and therefore needed to be individually saved as HTML pages. Also, I wish we had put in some sort of “expiry header” so that people working with old CDs would know that the information is out of date. We also purchased a licence for a CD search tool, the name of which I can't remember, which improved CD navigation. Our plan was to maintain the website as the primary information source and press CDs at regular intervals. TAXONOMIC AUTHORITY(IES) LIZ SELLERS: The Integrated Taxonomic Information System (ITIS) provides “authoritative taxonomic information on plants, animals, fungi, and microbes of North America and the world.” Are there other taxonomic systems out there that should be considered as a chosen taxonomic authority for use in IAS information systems? For reference: In the Davis Declaration, which resulted from the 2001 “Workshop on Development of Regional Invasive Alien Species Information Hubs, Including Requisite Taxonomic Services, In North America and Southern Africa”, participants called upon ITIS, the Global Biodiversity Information Facility (GBIF), BioNET-INTERNATIONAL and the Global Taxonomy Initiative (GTI) to “make IAS a priority, establish global standards for IAS taxonomic classification, and improve the availability of accurate IAS taxonomic information.” See the Davis Declaration included in this project (Davis Declaration (February 2001).pdf). GORDON RODDA: For reptiles outside of North America, I don't find ITIS very complete.
However, the EMBL site (www.embl-heidelberg.de/~uetz/LivingReptiles.html) is not only complete and up to date, but also well linked to useful sites. It has a large number of corporate and NGO sponsors, including HL and SSAR, the leading North American scientific societies of relevance. It is unfortunate that there is no single site of preference for all taxa, but I prefer authority (EMBL is the only definitive site for global reptiles) over convenience. BOB MEESE: For vertebrates, the situation is pretty well in hand. As noted, the EMBL database is best for reptiles. For fishes, it would be Eschmeyer's Catalog of Fishes (http://www.calacademy.org/research/ichthyology/catalog/fishcatsearch.html) or FishBase (nomenclature supplied by Eschmeyer); for amphibians it's the AMNH (http://research.amnh.org/cgi-bin/herpetology/amphibia); and for mammals probably the SI (http://www.nmnh.si.edu/cgi-bin/wdb/msw/names/form). For birds one could use Zoonomen (http://www.zoonomen.net/avtax/frame.html), but this lists current names only, while the others have synonyms. The sites for fishes, amphibians, and reptiles have complete listings including authors and references, but birds and mammals lack such completeness. For invertebrates, the situation is much more difficult, as there are primarily regional lists with narrow taxonomic and geographic focus. And for plants, there really is nothing approaching a global standard, but IPNI (http://www.us.ipni.org/ipni/query_ipni.html) is a useful name-checking resource which links several large databases. BOB MORRIS: The problem with most such resources is less their coverage – that's a matter of time and slogging – than whether they are accessible by software. ITIS has a pretty good XML story, which in particular means it is possible to write applications against it; this does not presently seem to be the case for most of the other databases mentioned.
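Being "accessible by software" means a client can parse a service's XML response directly rather than a person downloading files by hand. A sketch of such a client (the element names and values below are invented for illustration, not actual ITIS output):

```python
import xml.etree.ElementTree as ET

# Invented sample response; a real service's element names and values differ.
SAMPLE_RESPONSE = """
<taxonResult>
  <scientificName>Axis axis</scientificName>
  <taxonomicSerialNumber>123456</taxonomicSerialNumber>
  <nameUsage>valid</nameUsage>
</taxonResult>
"""

def parse_taxon(xml_text):
    """Pull the fields an IAS database would want out of a taxonomic response."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

taxon = parse_taxon(SAMPLE_RESPONSE)
```

An IAS database could run such a check at data-entry time, so names are validated against the authority continuously instead of going stale after a one-off download.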
Whenever I see the words “upload” or “download” uttered by end users of data, I know they are getting poor service, pretty much guaranteed to be out of date as soon as they have finished their *load. ANGELA SUAREZ-MAYORGA: I agree with the list and with the problems – invertebrates are difficult. At the Biodiversity Information System in Colombia, we recommend to our users the RBG database for vascular plants (http://www.rbgkew.org.uk/data/vascplnt.html) and W3MOST for non-vascular plants (http://mobot.mobot.org/W3T/Search/most.html). Since in Colombia we have to deal with many species, we have started to build our own authority files. Soon (four months from now) we will have online taxonomic authority files for Carabidae and Cicindelidae (Colombian species) plus Formicidae (to the genus level for the Neotropical region). Anyway, maybe the point here is not the completeness of the database but the quality of the information that the database provides for our purposes. Probably we don't need many names, but THE names (I mean, verified databases). MICHAEL BROWNE: I too prefer authority over convenience. It is worth stating that we were able to use ITIS as the taxonomic authority for 87% of the 300+ IAS that will be in the Global Invasive Species Database by July 2004. Perhaps out of this discussion and our meeting we can encourage the various sources of taxonomic information to cooperate and give IAS a high priority. We should be able to indicate to them where the gaps are. BOB IKIN: In terms of practicality, many countries are already using the datasheets and taxonomic information contained in the Commonwealth Agricultural Bureau International Crop Protection Compendium (CABI CPC) to make decisions on the invasiveness of a wide range of organisms. Initially designed for decisions in crop protection, the system has evolved into a dataset that can be used for invasiveness decisions on plants, pathogens and many vertebrates and invertebrates. It is truly global.
Each datasheet, produced by a technical expert on the organism, contains fully referenced information on the taxonomy, distribution and biology of the organism. Decisions can therefore be made on the likelihood of entry, establishment and spread for each. In making the case for harmonisation of systems, I would like to emphasise the need for agreement on terminology. In the plant protection area, the Glossary of Phytosanitary Terms (produced in five languages by FAO) has done much to assist with the common understanding of invasive concepts, but this is not the case with the AIS agreements (which define phrases, not words). As an example, the process of movement of an organism to a new area is considered to be covered by entry, establishment and spread (introduction is entry and spread). I understand that recently a meeting was held between phytosanitary experts from FAO and representatives of the CBD to come to a common understanding of terminology. This is essential, since the phytosanitary/biosecurity services of countries are often the only regulatory authority at a point of entry who are able/permitted to make decisions on the import and export of AIS (pests). LIZ SELLERS: For reference: the Food and Agriculture Organization's Glossary of Phytosanitary Terms may be reviewed online at: http://www.fao.org/docrep/W3587E/w3587e00.htm. I have also loaded two PDF versions of the FAO's Glossary of Phytosanitary Terms into the Food and Agriculture Organization (FAO) Folder. BOB IKIN: I have uploaded the 2002 version of the Glossary of Phytosanitary Terms to the FAO documents folder. The glossary is revised every year as new terms and words are incorporated into the phytosanitary international standards. SOETIKNO SASTAROUTOMO: Additional information for CABI-CPC. 
The 2004 revised version (to be available in July) will also include information on: “invasive pests of economic and environmental importance, focusing on alien species affecting agricultural and plantation crops and rangelands”. They are currently busy compiling the content for this edition, which will add at least another 150 new full data sheets on invasive plants, plus around 50 on other new invasive crop pests and some new data for selected existing data sheets. LANGUAGE STANDARDS LIZ SELLERS: Many online databases are presented on the Internet using the English language. How can we provide quality IAS information to customers who speak other languages? CHRISTINE CASAL: FishBase is able to provide language translation for some of the fields in the SpeciesSummary page. The language translation was achieved by: 1. Creating a table of all labels, headers, and notes in the 3 pages with corresponding translations into different languages, which is then accessed from a database and displayed on user browsers. 2. Utilizing the web service offered by Systran. Systran is the engine behind the translation routines in Google, AOL, AltaVista and others. FishBase used Systran to translate the 3 fields Diagnosis, Distribution, and Biology in the SpeciesSummary page. For Systran to provide better-quality translations, FishBase is currently doing the following: 1. Bernd U. (University of Kiel) is building up a dictionary that will later be sent to Systran. The Systran engine should then detect if the request comes from FishBase, and should translate it according to our dictionary. For example, the word “Order” should not mean “Command or Instruction” but rather “Fish Order”. 2. Rainer Froese (also of the University of Kiel) is now directing the FishBase encoders to write simple, complete sentences when entering data for Diagnosis, Distribution and Biology. When achieved, Systran will give better translations of these fields. LIZ SELLERS: You raise an interesting point. 
I think there are two issues here: 1) context-sensitive translation of English to other languages (as in your example of the word 'Order') and 2) selection of a basic set of languages to support. How many languages does Systran support with respect to context-sensitive translation (as in 'Order')? Did you choose a basic set of languages to translate that way? My online research shows that the top 5 spoken languages in the world are (1) Mandarin, (2) English, (3) Hindustani, (4) Spanish and (5) Portuguese. However, a chart of web content by language (online at http://global-reach.biz/globstats/refs.php3) indicates a ranking of (1) English, (2) Japanese, (3) German, (4) Chinese and (5) French. Should participation in the GISIN 'lightly' require (perhaps the word is 'request') support of a basic set of spoken languages (assuming support is also provided to help participants meet the requirement)? If yes, then which ones should we choose?
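The label-translation table CHRISTINE CASAL describes above (interface labels stored once per language, looked up when the page is rendered, with English as the fallback) can be sketched as follows. The labels and translations here are illustrative examples, not FishBase's actual table.

```python
# Sketch of a label-translation table: one row per interface label,
# one column per language, with English as the fallback. Entries are
# illustrative only, not taken from FishBase.
LABEL_TABLE = {
    "Order":        {"fr": "Ordre",        "de": "Ordnung",     "es": "Orden"},
    "Distribution": {"fr": "Distribution", "de": "Verbreitung", "es": "Distribución"},
    "Biology":      {"fr": "Biologie",     "de": "Biologie",    "es": "Biología"},
}

def translate_label(label, lang):
    """Return the label in the requested language, falling back to English."""
    return LABEL_TABLE.get(label, {}).get(lang, label)

print(translate_label("Order", "de"))      # Ordnung
print(translate_label("Biology", "es"))    # Biología
print(translate_label("Diagnosis", "fr"))  # not in the table: falls back to "Diagnosis"
```

Note how this also answers the context-sensitivity problem for labels: because "Order" is stored as a fixed interface string with a human-supplied translation, a machine translator never gets the chance to misread it as "Command or Instruction". Machine translation (Systran, in FishBase's case) is then needed only for free-text field contents.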