Major “Validated” Databases Covering Chemical Compounds and Their Biological Properties
David J. Newman, Natural Products Branch, NCI, 9 February 2006

By “validated”, I mean databases whose entries link to published articles, the majority in peer-reviewed journals. Where a reference is “anecdotal”, this is usually indicated. In addition to these databases, there is a series of papers by a Dutch pharmacist, Peter A. G. M. De Smet, that should be read. They thoroughly document the problems of poor reproducibility, uncontrolled diagnoses and impure preparations used and/or reported in “clinical trials” of herbal preparations. Use De Smet, PAGM, Health risks of herbal remedies: an update. Clin. Pharmacol. Ther., 2004, 76, 1-17 as your reference. The list below is neither complete nor in priority order, but it covers the major databases, both public and commercial. Academic discounts are available for a number of these. One item I would recommend obtaining (if you do not already have it) is an academic copy of ChemDraw Office. It is often the “gateway” into many of the chemical databases, because “ChemDraw” structures are the lingua franca of the chemistry community for searching.

AntiBase 2005. From Wiley USA; Hartmut Laatsch. ISBN: 0-471-74892-7. Software, May 2005, US $8,250.00. AntiBase 2005 is a comprehensive database of 31,022 natural compounds from microorganisms and higher fungi. The data in AntiBase have been collected from the primary and secondary literature and then carefully checked and validated. AntiBase includes descriptive data (molecular formula and mass, elemental composition, CAS registry number); physico-chemical data (melting point, optical rotation); spectroscopic data (UV, 13C-NMR, IR and mass spectra); biological data (pharmacological activity, toxicity); information on origin and isolation; and a summary of literature sources. A unique feature of AntiBase is the use of predicted 13C-NMR spectra for compounds where no measured spectra are available. These spectra have been produced using the spectrum prediction program SpecInfo.

The CRC Dictionary of Natural Products is also available on CD. It is updated every six months. I understand from talking to the Editor in Chief (whom I have known since she was a graduate student) that for an extra $100 the CD does not expire. Otherwise you have to renew every year, and the CDs are not usable after an expiry date.

In addition to the above, Web of Science (Thomson ISI) is a superb search tool for seeing who is working and publishing on what. It is very expensive and is based upon the ISI Science Citation Index concept.

Beilstein's CrossFire: an excellent search tool in chemistry that does a thorough job on chemical structures. It can easily perform substructure searches, which permits looking at derivatives of compounds that may have been modified from the natural product structure.

SciFinder: the Chemical Abstracts search tool. It is very powerful, but it should really be operated by an expert, as online search time is expensive. It can run in batch mode.

PubMed and PubChem: free access via the National Library of Medicine.

NCCAM and the Office of Dietary Supplements: databases under these groups are being upgraded regularly and are worth looking at. You may have to browse around a bit, since, as with all government websites, they “improve” them regularly.
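PubChem, mentioned above, can also be queried programmatically through NCBI's PUG REST web service, which this list does not otherwise cover. The sketch below uses only the Python standard library. It builds a PUG REST URL that looks up properties for a compound by name and can optionally fetch the JSON result. The helper names `property_url` and `fetch_properties` are hypothetical. The URL layout (`/compound/name/<name>/property/<list>/JSON`) follows the pattern in the PUG REST documentation, but check the current PubChem documentation before relying on it.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base address of PubChem's PUG REST service.
PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name: str, properties: list[str]) -> str:
    """Build a PUG REST URL that looks up compound properties by name (hypothetical helper)."""
    # Compound names may contain spaces or punctuation, so percent-encode them fully.
    encoded = quote(name, safe="")
    return f"{PUG_BASE}/compound/name/{encoded}/property/{','.join(properties)}/JSON"


def fetch_properties(name: str, properties: list[str]) -> list[dict]:
    """Fetch the property table over the network (requires internet access)."""
    with urlopen(property_url(name, properties), timeout=30) as resp:
        data = json.load(resp)
    # PUG REST wraps property results as PropertyTable -> Properties (one dict per CID).
    return data["PropertyTable"]["Properties"]
```

For example, `fetch_properties("quercetin", ["MolecularFormula", "MolecularWeight"])` would return a list with one dictionary per matching PubChem compound. Building the URL separately from fetching it makes the query easy to inspect, or to paste into a browser, before any network call is made.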

Jerald J. Baronofsky, Ph.D.

The chemical information explosion continues at full speed. Staying on top of new developments and identifying trends are not simple tasks. One new development is subscription access to online databases. These web-server subscriptions give scientists immediate access to the latest information available. There are subscription servers devoted to reaction databases, organic and inorganic compounds, chemical catalogs, publications, patents, bibliographies, and even educational testing.


For companies providing browser programs, like ChemFinder, the information providers need to supply their data in a form compatible with the browser, so that the databases become the equivalent of add-on packages to it. This enhances the acceptance of the browser and standardizes the industry, facilitating the exchange of data between different parties.

Few Online Information Sources Offered

Paradoxically, while the information content keeps growing, the number of sources offering it is not. This is due to several consolidations in the past year resulting from acquisitions and mergers in the business world. This is somewhat surprising for what is a relatively new, emerging field. However, normal business dynamics don't seem to apply to this field because of certain development constraints.

The first development constraint is that most chemical information is freely (but not conveniently) available to the public. This limits what individuals are willing to pay to get the information in a convenient format, versus retrieving it themselves. Fortunately, many scientists recognize the value of saving time and are willing to subscribe to the available services.

Another development constraint is that converting the information into a convenient format is a tedious, time-consuming, and painstaking process. Verifying the integrity of the information is absolutely necessary and has the same drawback. The upside is that once this major undertaking is over, it never has to be repeated, and adding to the information is much easier.

Thirdly, developing a system to view and utilize the information that will please all, or even most, users is an almost impossible proposition. Standardization on a few formats is likely to occur, and those formats will likely become universally accepted.

Finally, chemical browser technology is a slave to general, all-purpose browser technology, and has to live with all its bugs and shortcomings.

Thus, the fact that the number of participants has remained small, and is shrinking rather than expanding, can be interpreted as a strengthening of the industry around the survivors, similar to how the US auto industry was reduced to the Big Three. It will be interesting to monitor this situation over the next few years.

Now on to the Data Purveyors!

CambridgeSoft Corporation

ChemInfo continues to grow in content, both for the boxed product and for ChemFinder WebServer. In the past year, the ChemACX catalog database has more than doubled the number of compounds listed, to over 250,000 chemical products. The list of participating vendors has grown by a factor of five. ChemINDEX now sports close to a quarter million products and is constantly growing. ChemACX and ChemINDEX represent only half of the components of ChemInfo. The others are ChemRXN, a pair of reaction databases with close to 30 thousand entries, and ChemMSDS, a database of Material Safety Data Sheets. These sheets are increasingly becoming a necessity for every chemistry laboratory. See table of ChemFinder Chemical Databases.

Chemical Concepts/Wiley Publishing

SpecInfo, described in the last article as a 500 thousand spectra database, is now available as SpecInfo Online.
Currently, more than 100,000 NMR spectra are available, together with more than 18,000 IR and about 60,000 MS spectra. Recent enhancements include crosslinks to the CAS Registry and the Beilstein database.

ChemSW

ChemSW offers eight specialized databases for several search engines. The databases are very reasonably priced and can be bought bundled together through CS ChemStore as a ChemFinder add-on. ChemSW also sells a Database of Physical Properties for over 20 thousand chemicals, with up to 19 different physical properties. CIS Pro is a chemical inventory program for storing information on your in-house chemicals (see Managing Chemical Inventory with CIS Pro on page 8).

CRC Press

CRC Press has several CD-ROM based products. Properties of Organic Compounds V.5 (POC) contains physical and spectral data for more than 27,500 organic compounds. Properties of Organic Solvents covers 564 of the most common solvents. Properties of Inorganic Compounds V.2 provides property and crystallographic data for 4,000 inorganic compounds. The first two items are available in ChemStore, and a ChemFinder-compatible version of POC is in production.

Derwent

The Derwent Journal of Synthetic Methods is a structure-searchable chemical reactions database designed on the principles originally set out by Theilheimer. The database contains over 60 thousand records. The Derwent Drug Registry covers approximately 25,000 known drugs. Each record includes the full name, activities, CAS registry number, Derwent Registry Name, chemical substructure and chemical ring codes.

ESM Software

TAPP is a Windows/Macintosh database of thermophysical properties of over 34,000 inorganic and organic compound phases: solids (18,000), liquids (7,400), and gases (8,800). A simple graphic interface allows rapid selection of compounds based on name, chemistry, structure and/or thermophysical properties. Properties are displayed in a spreadsheet format that allows calculation of temperature-dependent properties at user-selected temperatures. Users can add new compounds and edit or supplement the data supplied with TAPP. Over 1,800 temperature-composition phase diagrams for metal, oxide and halide systems are also included. Available properties include density, thermal expansion, elasticity, heat capacity, enthalpy, entropy, free energy, viscosity, electrical conductivity, thermal conductivity, vapor pressure, and surface energy. TAPP is available in ChemStore.

ISI

ISI is home to several well-known chemical information databases. Index Chemicus is a huge chemical compound database that is available on CD-ROM.
The data are available as a ChemFinder add-on. For reactions, ISI offers Current Chemical Reactions (CCR) and the Reaction Citation Index (RCI). CCR is a monthly publication of new chemical reactions found in the literature, while RCI is a compilation of over 300 thousand reactions with bibliographic information. Later this year, at the behest of several large pharmaceutical and biotech companies, both CCR and RCI will be available at the desktop for the first time, in ChemFinder format for Windows NT.

Marinlit

Marinlit is a database featuring information, including structures, on 10 thousand compounds of marine origin. The database is unique to the ChemFinder search engine.

Summary

There really are only a handful of good chemical browser programs available at this time. These browsers will continue to evolve with enhanced capabilities. Development of new browsers may be curtailed by high entry costs, which stem from the need to have the data converted to a format compatible with the browser. New and updated databases will have to be developed with the accepted browser formats in mind, so that browser developers and information peddlers can create the symbiosis both need to succeed.
Marinlit is a database of the marine natural products literature. In addition to the usual bibliographic data, the database contains an extensive collection of keywords, trivial names, compound information including structures, formulae, molecular mass, numbers of various functional groups, and UV data. All of these items can be searched for either individually or in various logical combinations. Taxonomic data are also included, permitting the exploration of relationships at various levels such as genus, family or order. Bibliographic data can be exported in a format suitable for entry into EndNote.

Marinlit also has all of the structures available in ChemFinder™ files. By using your own copy of CS ChemOffice, these files can be searched for substructures from within Marinlit. An optional version of Marinlit contains a mix of calculated and actual 13C and 1H NMR chemical shift data for all compounds in Marinlit. The 13C data can be searched for patterns of shifts from within Marinlit. These data have been generated through collaboration with ACD/Labs. If you choose this option, you will also be able to access the IUPAC name for each compound in Marinlit.
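Searching “for patterns of shifts” can be illustrated with a simple tolerance match. This is not Marinlit's actual algorithm, which is not described here. The sketch assumes each compound is stored as a plain list of 13C shifts in ppm. It reports a compound as a hit when every queried shift can be paired with a different observed shift within a chosen tolerance. The function names, compound names and shift values are hypothetical, invented for illustration.

```python
# Hypothetical 13C shift-pattern search; not Marinlit's actual algorithm.


def matches_pattern(observed, query, tol=1.0):
    """Return True if each query shift (ppm) pairs with a distinct observed shift within tol."""
    remaining = sorted(observed)
    # Processing queries in ascending order and taking the lowest unused observed
    # shift inside each window finds a one-to-one pairing whenever one exists.
    for q in sorted(query):
        hit = next((s for s in remaining if abs(s - q) <= tol), None)
        if hit is None:
            return False
        remaining.remove(hit)  # each observed shift may explain only one query shift
    return True


def search_by_shifts(library, query, tol=1.0):
    """Return the names of compounds whose shift lists contain the query pattern."""
    return sorted(name for name, shifts in library.items()
                  if matches_pattern(shifts, query, tol))


# Toy library: compound names and shift values are invented for illustration only.
LIBRARY = {
    "compound_A": [170.2, 128.5, 77.1, 21.0],
    "compound_B": [210.0, 55.3, 30.1],
    "compound_C": [171.0, 169.8, 20.7],
}

if __name__ == "__main__":
    print(search_by_shifts(LIBRARY, [170.0, 21.3], tol=0.5))
```

Requiring distinct partners means a query of two carbonyl-region shifts will not be satisfied by a compound with only one carbonyl signal. Widening `tol` trades precision for tolerance of calculated-versus-measured differences.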

You will also receive the ACD/Labs 13C and 1H data files (.CUD and .HUD) separately. In conjunction with the CNMR and HNMR Predictor packages (available directly from ACD/Labs) you will be able to visualise the spectra for each compound, along with the assignment of shifts to each position in the structure. You will also be able to view the calculated physical properties of each compound. There is a link between Marinlit and the CNMR and HNMR Predictors, so that the relevant data and features in the ACD packages can be accessed from within Marinlit. The ACD Predictors are only available as Windows versions. For Macintosh users of Marinlit, we recommend the use of Virtual PC so that all of these features are available on one computer.

Marinlit holds 15,500 references from 980 journals and books, with data for 15,100 compounds. These compounds were derived from 3,088 species in 1,865 genera from 23 phyla. Annual updates to the Marinlit files are available for a small charge.

Marinlit has been developed as a stand-alone application written in Microsoft FoxPro™. It requires either a Macintosh computer (System 7.1 or later) with at least 120 MB of free disk space, or a PC running any version of Windows with 200 MB of free hard disk space.

The Marinlit files are shipped on a CD, for installation on your hard disk. An Instruction Manual is included, in addition to the on-line help available from within the program.

NAPRALERT℠, an acronym for NAtural PRoducts ALERT, is the largest relational database of world literature describing the ethnomedical or traditional uses, chemistry, and pharmacology of plant, microbial and animal (primarily marine) extracts. In addition, NAPRALERT contains considerable data on the chemistry and pharmacology (including human studies) of secondary metabolites of known structure derived from natural sources. NAPRALERT currently contains information extracted from over 150,000 scientific research articles dating from 1650 A.D. to the present. Approximately 80% of the file is a systematic survey of the literature from 1975 to the present. The remaining records were obtained by selective retrospective indexing dating back to 1650. These articles contain information on more than 151,000 plant, marine, microbial or animal species, and the database holds more than 1.5 million records that associate these records with biological activity. NAPRALERT grows by approximately 600 articles per month, covering the natural products literature in over 700 journals. Data retrieval is much more sophisticated than with either abstracting services or citation listings. Retrievable data include standard three-part profiles, which contain all ethnomedical, pharmacological and phytochemical information on the plant, marine, animal or microbial species in question. The database was specifically designed to be of value in the drug discovery and development processes. NAPRALERT searches can enhance research efforts in this area through the database's ability to rank-order organisms by their probability of having a specific biological activity. Searches generated by the NAPRALERT database are useful to:

Scientists in both academia and industry, with regard to:
Reduction of time and costs of literature research.
Facilitation of natural products drug discovery and development.
Establishment of research priorities based on already published scientific literature.
Analytical methodologies for natural products research: what works, what doesn't.
Analysis of quality, safety and efficacy of herbal products.
Agricultural chemistry and biology.

Drug Regulatory Authorities, with regard to:
Assessment of quality, safety and efficacy of botanical products.
Food safety and toxicology concerns.

Health-Care Professionals, with regard to:
Specific information on the use of herbal medicines and dosages.
Drug interactions and contraindications of botanicals with prescription drugs or food.
Information concerning the pharmacokinetics and toxicity of botanical products.

Health Food Industry, with regard to:
Standardization of botanical products, based on the active constituents of the herb in question.
Which analytical methodologies are best suited for standardization.
Current status of the scientific literature supporting structure-function claims.
Establishment of new product lines.
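The description of NAPRALERT above mentions rank-ordering organisms by their probability of having a specific biological activity, but does not say how that ranking is computed. As a purely illustrative sketch, assume each literature record reduces to a hypothetical (organism, activity, positive/negative) triple. Organisms can then be ranked by a smoothed fraction of positive reports. The `Report` type, its fields and the scoring rule are assumptions for illustration, not NAPRALERT's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One hypothetical literature record: an organism tested for one activity."""
    organism: str
    activity: str
    positive: bool


def rank_organisms(reports, activity):
    """Rank organisms by a smoothed fraction of positive reports for `activity`.

    Returns (organism, score, n_reports) tuples, best first. The score is
    (positives + 1) / (reports + 2), i.e. Laplace's rule of succession, so a
    single positive report does not automatically outrank many positive reports.
    """
    tallies = {}
    for r in reports:
        if r.activity != activity:
            continue  # ignore records about other activities
        pos, total = tallies.get(r.organism, (0, 0))
        tallies[r.organism] = (pos + int(r.positive), total + 1)
    scored = [(org, (pos + 1) / (total + 2), total)
              for org, (pos, total) in tallies.items()]
    # Highest score first; break ties by more evidence, then alphabetically.
    return sorted(scored, key=lambda t: (-t[1], -t[2], t[0]))
```

With this rule, an organism with three positive reports out of three (score 0.8) ranks above one with a single positive report (score about 0.67). A real system would also need to weigh study type and quality, which this sketch ignores.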

Data retrieval from NAPRALERT is easy, and a wide range of information can be retrieved. The most common queries are three-part profiles of a given organism, which contain all the ethnomedical, pharmacological and phytochemical information in the database on that organism, along with full literature citations. Alternatively, a retrieval may focus on specifics, such as the chemical constituents, geographic locations or biological activity of any given family, genus or species of organism. Data on marine organisms can be retrieved selectively. The NAPRALERT database is fully accessible, for a reasonable fee, from your office or home via BITNET, the Internet, Prodigy and CompuServe. It may also be accessed through the European network EARN. For those who do not have access to any of these networks, NAPRALERT can provide off-line searches at competitive rates. NAPRALERT is also available through the Scientific and Technical Information Network (STN), where data retrieval is even easier using standard STN phrases. For further information please

The NAPRALERT database is maintained by the Program for Collaborative Research in the Pharmacological Sciences (PCRPS), College of Pharmacy, University of Illinois at Chicago, 833 South Wood Street (M/C 877), Chicago, IL 60612, USA. PCRPS is a World Health Organization Collaborating Centre for Traditional Medicines.

Dr. Duke's Phytochemical and Ethnobotanical Databases

Specific Queries of the Phytochemical Database
Plant Searches
Chemicals and activities in a particular plant.
High concentration chemicals.
Chemicals with one activity.
Ethnobotanical uses.
List chemicals and activities for a plant.

Chemical Searches
Plants with a chosen chemical.
Activities of a chosen chemical.
List activities and plants for a chemical.
List common activities (synergies) for a list of chemicals.

Activity Searches
Plants with a specific activity.
Search for plants with several activities.
Chemicals with a specific activity.
Lethal dose (LD) information for a chemical.
Search for plants/chemicals with one or more activities.
Search for plants/chemicals with a superactivity.

Ethnobotany Searches
Ethnobotanical uses for a particular plant.
Plants with a particular ethnobotanical use.

Database References
Reference citations.
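Two of the chemical queries listed above, “Plants with a chosen chemical” and “List common activities (synergies) for a list of chemicals”, reduce to simple set operations over plant-to-chemical and chemical-to-activity mappings. The sketch below interprets “common” as shared by every chemical in the list, which is an assumption. The toy data and function names are hypothetical and are not taken from Dr. Duke's database.

```python
from functools import reduce

# Toy data: chemical and plant names are placeholders, not database content.
CHEM_ACTIVITIES = {
    "chemical_1": {"Antioxidant", "Antiinflammatory", "Antiviral"},
    "chemical_2": {"Antioxidant", "Antiviral"},
    "chemical_3": {"Antioxidant", "Sedative"},
}
PLANT_CHEMICALS = {
    "plant_X": {"chemical_1", "chemical_2"},
    "plant_Y": {"chemical_2", "chemical_3"},
}


def common_activities(chem_activities, chemicals):
    """Activities shared by every chemical in `chemicals` (set intersection)."""
    sets = [set(chem_activities.get(c, ())) for c in chemicals]
    # An empty query list has no shared activities.
    return reduce(set.intersection, sets) if sets else set()


def plants_with_chemical(plant_chemicals, chemical):
    """Plants whose constituent list includes `chemical`, sorted by name."""
    return sorted(p for p, chems in plant_chemicals.items() if chemical in chems)
```

For example, `common_activities(CHEM_ACTIVITIES, ["chemical_1", "chemical_2"])` gives `{"Antioxidant", "Antiviral"}`. Adding `chemical_3` narrows the result to `{"Antioxidant"}`.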

Use this URL http://tcm.cz3.nus.edu.sg/group/tcm-id/tcmid.asp for access to an excellent TCM database organized by the University of Shanghai and the University of Singapore. Again “validated”.

Traditional Chinese Medicine Information Database

A database that provides information on all aspects of TCM, including formulation, herbal composition, chemical composition, molecular structure and functional properties, therapeutic and toxicity effects, clinical indication and application, and related literature. This database currently contains entries for 1197 formulae, 1098 medicinal herbs and 9852 herbal ingredients.
