E-Chemistry and Web 2.0
Marlon Pierce
mpierce@cs.indiana.edu
Community Grids Lab
Indiana University
One Talk, Two Projects
NIH funded Chemical Informatics and Cyberinfrastructure Collaboratory (CICC) @ IU.
Geoffrey Fox
Gary Wiggins
Rajarshi Guha
David Wild
Mookie Baik
Kevin Gilbert
And others
Proposed Microsoft-Funded Project: E-Chemistry
Carl Lagoze (Cornell),
Lee Giles (PSU),
Steve Bryant (NIH),
Jeremy Frey (Soton),
Peter Murray-Rust (Cambridge),
Herbert Van de Sompel (Los Alamos),
Geoffrey Fox (Indiana)
And others
CICC Infrastructure Vision
Chemical Informatics: drug discovery and other academic chemistry,
pharmacology, and bioinformatics research will be aided by powerful,
modern, open, information technology.
NIH PubChem and PubMed provide unprecedented open, free data and
information.
We need a corresponding open service architecture (i.e. avoid stove-piped
applications)
CICC set up as distributed cyberinfrastructure in eScience model
Web clients (user interfaces) to distributed databases, results of high
throughput screening instruments, results of computational chemical
simulations and other analyses.
Composed of clients to open service APIs (mash-ups)
Aggregated into portals
Web services manipulate this data and are combined into workflows.
So our main agenda items: create interesting databases and build lots of
Web services and clients.
CICC Databases
Most of our databases aim to add value to
PubChem or link into PubChem
1D (SMILES) and 2D structures
3D structures (MMFF94)
Searchable by CID, SMARTS, 3D similarity
Docked ligands (FRED, Autodock)
906K drug-like compounds into 7 ligands
Will eventually cover ~2000 targets
Philosophy: we have big computers, so let's
calculate everything ahead of time and put the
results in a DB.
Building Up the Infrastructure
Our SOA philosophy: use standard Web services.
Mostly stateless
Some cluster, HPC work needed but these populate
databases
Services are aggregate-able into different
workflows.
Taverna, Pipeline Pilot, …
You can also build lots of Web clients.
See http://www.chembiogrid.org/wiki/index.php/CICC_Web_Resources
for links and details.
Not so far from Web 2.0…
Sample Services
Type | Service | Functionality | Source | License
Database | Docking | Provides access to the results of docking a subset of PubChem into a set of ligands. Searchable by 2D structure and docking score | Indiana University | Freely accessible
Database | 3D Structure | Provides access to 3D structure generated for most of PubChem | Indiana University | Freely accessible
Cheminformatics | OSCAR3 | Extract chemical structures from text | Cambridge University | Freely accessible
Cheminformatics | InChiGoogle | Uses Google to search for an InChI | Cambridge University | Freely accessible
Cheminformatics | CMLRSSServer | Generates a CMLRSS feed from CML data | Cambridge University | Freely accessible
Cheminformatics | OpenBabel | Converts chemical file formats | Cambridge University | Freely accessible
Cheminformatics | ToxTreeServer | Obtains toxicity hazard predictions | Indiana University & European Chemical Bureau | Freely accessible
Cheminformatics | DBUtil | Generates 166 bit MACCS keys | Indiana University & gNova Consulting | Freely accessible
Cheminformatics | Molecular Similarity | Evaluates 2D/3D similarity and evaluates distance moments for 3D similarity calculations | Indiana University & CDK | Freely accessible
Cheminformatics | Molecular Descriptors | Generates various descriptors including TPSA, XLogP, surface areas | Indiana University & CDK | Freely accessible
Cheminformatics | 2D Structure Diagrams | Generates 2D structure diagrams from SMILES | Indiana University & CDK | Freely accessible
Cheminformatics | Druglikeness Methods | Evaluates measures of druglikeness | Indiana University & CDK | Freely accessible
Cheminformatics | Utility Methods | Generates hashed fingerprints, 2D coordinate generation etc. | Indiana University & CDK | Freely accessible
Statistics | Sampling Distributions | Samples from several distributions (normal, uniform, Weibull etc) | Indiana University | Freely accessible
Statistics | Linear Regression | Builds linear regression models | Indiana University | Freely accessible
Statistics | CNN Regression | Builds neural network regression models | Indiana University | Freely accessible
Statistics | RF Regression | Builds random forest regression models | Indiana University | Freely accessible
Statistics | LDA | Builds linear discriminant analysis models | Indiana University | Freely accessible
Statistics | K-Means | Performs K-means clustering | Indiana University | Freely accessible
Statistics | Feature Selection | Performs feature selection using stepwise regression | Indiana University | Freely accessible
Statistics | XY Plots | Generates 2D scatter plots | Indiana University | Freely accessible
Statistics | Histogram Plots | Generates histograms | Indiana University | Freely accessible
Data Exchange | TabToVOTables | Converts tab delimited files to VOTables | Indiana University | Freely accessible
Data Exchange | VOTablesToTab | Converts VOTables to tab delimited files | Indiana University | Freely accessible
Data Exchange | VOTablesToXLS | Converts VOTables to Excel spreadsheet | Indiana University | Freely accessible
Data Exchange | VOTable Retrieve | Retrieves field names and data types from a VOTables document | Indiana University | Freely accessible
Data Exchange | VOTableExtract | Extracts columns from a VOTables document | Indiana University | Freely accessible
Computational Chemistry | Varuna File Format | Handles file formats for QM/MM packages | Indiana University | Freely accessible
Computational Chemistry | Varuna Analysis | Performs analysis of results from Jaguar and ADF | Indiana University | Freely accessible
Computational Chemistry | Varuna Query | Searches the Varuna database | Indiana University | Freely accessible
Computational Chemistry | Varuna Submit | Submits input data for calculation on a local cluster | Indiana University | Freely accessible
Application | Fred | Performs docking | Openeye Software | Commercial
Application | Filter | Property calculation and filtering | Openeye Software | Commercial
Application | Omega | Generates 3D conformers | Openeye Software | Commercial
Application | BCI Fingerprint | Generates 1052 BCI structural keys | Digital Chemistry | Commercial
Application | BCI Clustering | Performs divisive k-means clustering | Digital Chemistry | Commercial
Application | PkCell | Evaluates pharmacokinetic parameters for druglike molecules | Indiana University & University of Michigan | Freely accessible
Application | Scripps MLSCN Toxicity | Gets toxicity predictions for RF models built using MLSCN cell-line data | Indiana University & Scripps, FL. | Freely accessible
Application | NTP DTP Anti-cancer activity | Gets anti-cancer activity predictions for the 60 NCI cell lines | Indiana University | Freely accessible
Application | Ames Mutagenicity | Gets mutagenicity predictions | Indiana University | Freely accessible
Web Client Interfaces
Name | Functionality | Type | Links
PubDock | Interface to the docking database | Web | http://www.chembiogrid.org/cheminfo/dock/
Pub3D | Interface to the 3D structure database | Web | http://www.chembiogrid.org/cheminfo/p3d/
Frequent Hitters | Identify compounds that occur in multiple assays, with links to individual assays | Web | http://www.chembiogrid.org/cheminfo/freqhit/fh
MLSCN Toxicity Predictions | Predict whether a compound will be toxic or not | Web and Pipeline Pilot | http://www.chembiogrid.org/cheminfo/rws/scripps
ToxTree | Predict toxicity hazard class | Web | http://cheminfo.informatics.indiana.edu/~rguha/code/java/cdkws/cdkws.html#tox
DTP Anti-Cancer Predictions | Predict whether a compound exhibits anti-cancer activity against the 60 NCI cell lines | Web | http://www.chembiogrid.org/cheminfo/ncidtp/dtp
Ames Mutagenicity Predictions | Predict whether a compound is mutagenic or not in the Ames test | Web | http://www.chembiogrid.org/cheminfo/rws/ames
PkCell | Evaluate pharmacokinetic parameters | Web | http://www.chembiogrid.org/cheminfo/pkcell/
Kemo | Natural language interface to PubChem | Web | http://cheminfo.informatics.indiana.edu:8080/kemo/
RSS Feeds | Generate RSS feeds for various PubChem related queries | Web and RSS feed | http://www.chembiogrid.org/cheminfo/rssint.html
Statistical Model Download | Download statistical models as R binary files | Web | http://www.chembiogrid.org/cheminfo/rws/mlist
Cheminformatics | Miscellaneous functions such as structure diagrams, similarity etc. | Web | http://cheminfo.informatics.indiana.edu/~rguha/code/java/cdkws/cdkws.html
Varuna | File operations and result analysis | Web | http://129.79.139.29/filecon/Default.aspx and http://129.79.139.29/utilityclient/Default.aspx
VOTables | Plotting data using VOTables as well as using Excel files via VOTables | Web | http://gf1.ucs.indiana.edu:9080/axis/VOTables.html and http://www.chembiogrid.org/cheminfo/rws/xlsvor
PubChemSR | .Net interface to PubChem | Desktop application | http://darwin.informatics.indiana.edu/juhur/Tools/PubChemSR/
rpubchem and rcdk | R packages to interface with the CDK and access PubChem | Desktop application | http://cran.r-project.org/src/contrib/Descriptions/rcdk.html and http://cran.r-project.org/src/contrib/Descriptions/rpubchem.html
Chimera plugin | A plugin to allow Chimera to utilize the PubDock database | Desktop application (requires Chimera) | http://poincare.uits.iupui.edu/~heiland/cicc/code/
PubChem 3D View | A Greasemonkey script that shows 3D structures when viewing PubChem pages | Web (requires Firefox and Greasemonkey) | http://rna.informatics.indiana.edu/hgopalak/3DStructView.user.js
Example: PubDock
Database of approximately 1
million PubChem structures (the
most drug-like) docked into
proteins taken from the PDB
Available as a web service, so
structures can be accessed in
your own programs, or using
workflow tools like Pipeline Pilot
Several interfaces developed,
including one based on Chimera
(right) which integrates the
database with the PDB to allow
browsing of compounds in
different targets, or different
compounds in the same target
Can be used as a tool to help
understand molecular basis of
activity in cellular or image
based assays
Example: R Statistics applied to
PubChem data
By exposing the R statistical package, and the Chemistry Development Kit
(CDK) toolkit as web services and integrating them with PubChem, we can
quickly and easily perform statistical analysis and virtual screening of
PubChem assay data
Predictive models for particular screens are exposed as web services, and
can be used either as simple web tools or integrated into other applications
Example uses DTP Tumor Cell Line screens - a predictive model using
Random Forests in R makes predictions of probability of activity across
multiple cell lines.
Example assay screening workflow: finding cell-protein relationships
A protein implicated in tumor growth with known ligand is selected (in this case HSP90 taken from the PDB 1Y4 complex)
The screening data from a cellular HTS assay is similarity searched for compounds with similar 2D structures to the ligand.
Similar structures to the ligand can be browsed using client portlets.
Similar structures are filtered for drugability, are converted to 3D, and are automatically passed to the OpenEye FRED docking program for docking into the target protein.
Once docking is complete, the user visualizes the high-scoring docked structures in a portlet using the JMOL applet.
Docking results and activity patterns fed into R services for building of activity models and correlations (Least Squares Regression, Random Forests, Neural Nets)
Relevance to Web 2.0
Some Web 2.0 Key Features
REST Services
Use of RSS/Atom feeds
Client interfaces are “mashups”
Gadgets, widgets for portals aggregate clients
So…
We provide RSS as an alternative WS format.
We have experimented with RSS feeds, using Yahoo
Pipes to manipulate multiple feeds.
CICC Web interfaces can be easily wrapped as
universal gadgets in iGoogle, Netvibes.
Alternative to classic science gateways.
RSS Feeds/REST Services
Provide access to DB's via RSS feeds
Feeds include 2D/3D structures in CML
Viewable in Bioclipse, Jmol as well as Sage etc.
Two feeds currently available
SynSearch – get structures based on full or partial
chemical names
DockSearch – get best N structures for a target
Really hampered by size of DB and Postgres
performance.
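A client can read such a feed with any XML parser. The sketch below works on an inline sample rather than a live feed: the item layout, the item title, and the placement of the CML molecule inside each item are assumptions made for illustration, not the actual SynSearch or DockSearch output.

```python
import xml.etree.ElementTree as ET

# CML namespace; the feed layout below is hypothetical.
CML_NS = "http://www.xml-cml.org/schema"
SAMPLE_FEED = f"""<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>DockSearch sample (hypothetical)</title>
    <item>
      <title>example-1</title>
      <molecule xmlns="{CML_NS}" id="m1">
        <atomArray>
          <atom id="a1" elementType="C"/>
          <atom id="a2" elementType="O"/>
        </atomArray>
      </molecule>
    </item>
  </channel>
</rss>"""


def read_feed(feed_xml):
    """Return (item title, list of element symbols) for each feed item."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title")
        # CML elements live in their own namespace inside the RSS item.
        atoms = [atom.get("elementType")
                 for atom in item.iter(f"{{{CML_NS}}}atom")]
        results.append((title, atoms))
    return results


entries = read_feed(SAMPLE_FEED)
```

A real client would fetch the feed over HTTP and hand the CML fragment to a viewer such as Jmol instead of listing element symbols.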
Tools and mashups based on web service
infrastructure
http://www.chembiogrid.org/projects/proj_tools.html
Mining information from journal
articles
Until now SciFinder / CAS only chemistry-aware portal
into journal information
We can access full text of journal articles online (with
subscription)
ACS does not make full text available … but there are
ways round that!
RSC is now marking up with SMILES and GO/Goldbook
terms!
www.projectprospect.org
Having SMILES or InChI means that we can build a
similarity/structure searchable database of papers: e.g.
“find me all the papers published since 2000 which
contain a structure with >90% similarity to this one”
In the absence of full text, we can at least use the abstract
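The ">90% similarity" query can be sketched with a fingerprint comparison. The slides do not say which similarity measure is used; the Tanimoto coefficient on bit fingerprints is assumed here because it is a common choice. The fingerprints and DOIs below are made up, and the publication-date filter is left out.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similar_papers(query_fp, paper_index, threshold=0.9):
    """Return DOIs of papers that contain a structure above the threshold."""
    hits = []
    for doi, fingerprints in paper_index.items():
        if any(tanimoto(query_fp, fp) > threshold for fp in fingerprints):
            hits.append(doi)
    return sorted(hits)


# Placeholder index: DOI -> fingerprints of the structures found in the paper.
paper_index = {
    "doi:example/1": [frozenset(range(0, 20))],
    "doi:example/2": [frozenset(range(10, 30))],
}
query = frozenset(range(0, 19))
hits = similar_papers(query, paper_index)
```

In a database-backed version the comparison would run inside the database rather than in a Python loop.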
Text Mining: OSCAR
A tool for shallow, chemistry-specific natural
language parsing of chemical documents (e.g. journal
articles).
It identifies (or attempts to identify):
Chemical names: singular nouns, plurals, verbs etc., also
formulae and acronyms.
Chemical data: Spectra, melting/boiling point, yield etc. in
experimental sections.
Other entities: Things like N(5)-C(3) and so on.
Part of the larger SciBorg effort
See
http://www.cl.cam.ac.uk/~aac10/escience/sciborg.html
http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/Oscar3
Create a database containing the text of all recent PubMed abstracts (2006-2007 = ~500,000)
Use OSCAR to extract all of the chemical names referred to in the abstracts and convert to SMILES
DATABASE SERVICE + DOCKING SERVICE
Convert molecules to 3D and dock into a protein of interest
Visualize top docked molecules in a Google-like interface
Mash-Up: What published compounds might bind to this protein?
E-Chemistry and Digital
Libraries
We can't wait to get started….
E-Chemistry and Digital Libraries
Key problem with our SOA-based e-Science is
information management.
Where is the service that I need?
What does it do?
We may consider our data-centric services to be
digital libraries.
Data is diverse
Documents
Not just computational information like structures.
Another point of view: how can I link together
publications, results, workflows, etc?
That is, I need to manage digital documents.
Digital Libraries
Open Archives Initiative Object Reuse and Exchange
Project (OAI-ORE)
Developing standardized, interoperable, and machine-
readable mechanisms to express information about
compound information objects on the web.
Graph-based representations of connected digital objects.
Objects may be encoded in (for example) RDF or XML.
Retrievable via repositories with REST service interfaces
(cf. Atom Publishing Protocol)
Obtain, harvest, and register
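As a rough illustration of the graph-based view, a compound object can be written as a handful of subject-predicate-object triples. The sketch below uses the `describes` and `aggregates` terms from the ORE vocabulary; every example.org URI is a placeholder, and a real system would use an RDF library and a repository rather than a Python list.

```python
ORE = "http://www.openarchives.org/ore/terms/"

# Placeholder URIs for a resource map and the aggregation it describes.
REM = "http://example.org/rem/experiment-1"
AGG = "http://example.org/aggregation/experiment-1"

triples = [
    (REM, ORE + "describes", AGG),
    (AGG, ORE + "aggregates", "http://example.org/papers/conference-paper.pdf"),
    (AGG, ORE + "aggregates", "http://example.org/workflows/docking.xml"),
    (AGG, ORE + "aggregates", "http://example.org/data/results.cml"),
]


def aggregated_resources(graph, resource_map):
    """Follow describes -> aggregates edges starting from a resource map URI."""
    aggregations = {o for s, p, o in graph
                    if s == resource_map and p == ORE + "describes"}
    return sorted(o for s, p, o in graph
                  if s in aggregations and p == ORE + "aggregates")


members = aggregated_resources(triples, REM)
```

Here one aggregation ties a paper, a workflow, and a result file together, which is the kind of link between publications and data that the slides ask about.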
Challenges for E-Chemistry
Can digital library principles be applied to data as
well as documents?
Can you link your workflow to your conference paper?
Can we engineer a publishing framework and
message formats around Web 2.0 principles?
REST, Atom Publishing Protocol, Atom Syndication
Format, JSON, Microformats
Can we do this securely?
Access control, provenance, identity federation are key
problems.
Institution | Project Focus
Cambridge | Retrospective Data Extraction; Searching and Indexing; Data Models/Ontologies; Tools and Applications
Cornell | Data Models; Interoperability infrastructure; Project Management; Publicity and outreach
Indiana | Infrastructure Integration; Trust and Provenance; Tools and Applications
LANL | Data Models; Interoperability infrastructure
PubChem | Chemical Structure Archive; Results of Experimental Biological Activity Testing; Cross References to BioMedical Databases
Penn State | Retrospective Data Extraction; Searching and Indexing; Analysis
Southampton | Prospective & Retrospective Data Provision; Tools and Applications; In-process capture of eChemistry data; Data Linking in analysis and publication
More Information
Project Web Site: www.chembiogrid.org
Project Wiki: www.chembiogrid.org/wiki
Contact me: mpierce@cs.indiana.edu
Chemical Informatics and Cyberinfrastructure Collaboratory (CICC)
Funded by the National Institutes of Health
www.chembiogrid.org
CICC Combines Grid Computing with Chemical Informatics
Large Scale Computing Challenges
Chemical Informatics is non-traditional area of high performance computing, but many new, challenging problems may be investigated.
Chemical informatics text analysis programs can process 100,000's of abstracts of online journal articles to extract chemical signatures of potential drugs.
OSCAR-mined molecular signatures can be clustered, filtered for toxicity, and docked onto larger proteins. These are classic “pleasingly parallel” tasks. Top-ranking docked molecules can be further examined for drug potential.
Big Red (and the TeraGrid) will also enable us to perform time consuming, multi-stepped Quantum Chemistry calculations on all of PubMed. Results go back to public databases that are freely accessible by the scientific community.
[Diagram labels: NIH PubMed DataBase; OSCAR Text Analysis; Cluster Grouping; Toxicity Filtering; Docking; Initial 3D Structure Calculation; Molecular Mechanics Calculations; Quantum Mechanics Calculations; NIH PubChem DataBase; POVRay Parallel Rendering; IU's Varuna DataBase]
Science and Cyberinfrastructure
CICC is an NIH funded project to support chemical informatics needs of High Throughput Cancer Screening Centers. The NIH is creating a data deluge of publicly available data on potential new drugs.
CICC supports the NIH mission by combining state of the art chemical informatics techniques with
• World class high performance computing
• National-scale computing resources (TeraGrid)
• Internet-standard web services
• International activities for service orchestration
• Open distributed computing infrastructure for scientists world wide
Indiana University Department of Chemistry, School of Informatics, and Pervasive Technology Laboratories
MLSCN Post-HTS Biology Decision Support
PROCESS
Percent Inhibition or IC50 data is retrieved from HTS
Question: Was this screen successful?
Question: What should the active/inactive cutoffs be?
Question: What can we learn about the target protein or cell line from this screen?
Compounds submitted to PubChem
CHEMINFORMATICS
Workflows encoding plate & control well statistics, distribution analysis, etc
Workflows encoding distribution analysis of screening results
Workflows encoding statistical comparison of results to similar screens, docking of compounds into proteins to correlate binding with activity, literature search of active compounds, etc
GRIDS
Grids can link data analysis (e.g. image processing developed in existing Grids), traditional Chem-informatics tools, as well as annotation tools (Semantic Web, del.icio.us) and enhance lead ID and SAR analysis
A Grid of Grids linking collections of services at PubChem, ECCR centers, MLSCN centers
R Web Services
Why?
Need access to math and stat
functionality
Did not want to recode algorithms
Wanted latest methods
Needed a distributed approach to
computation
Keep computation on a powerful machine
Access it from a smaller machine
Why R?
Free, open-source
Many cutting edge methods available
Flexible programming language
Interfaces with many languages
Python
Perl
Java
C
The R Server
R can be run as a remote compute
server
Requires the Rserve package
Allows authenticated access over
TCP/IP
Connections can maintain state
Client libraries for Java & C
R as a Web Service
On its own the R server is not a web
service
We provide Java frontends to specific
functionalities
The frontend classes are hosted in a
Tomcat web container
Accessible via SOAP
Full Javadocs for all available WS's
Flowchart
Functionality
Two classes of functionality
General functions
Allows you to supply data and build a
predictive model
Sample from various distributions
Obtain scatter plots and histograms
Model development functions use a Java front-
end to encapsulate model specific information
Functionality
Two classes of functionality
Model deployment
Allows you to build a model outside of the
infrastructure
Place the final model in the infrastructure
Becomes available as a web service
Each model deployed requires its own front
end class
In general, these classes are identical - could
be autogenerated
Available Functionality
Predictive models - OLS, RF, CNN,
LDA
Clustering - k-means
Statistical distributions
XY plot and scatter plots
Model deployment for single model
types and ensemble model types
Deployed Models
Since deployed models are visible as
web services we can build a simple web
front end for them
Examples
NCI anti-cancer predictions
Ames mutagenicity predictions
Applications
The R WS is not restricted to 'atomic'
functionality
Can write a whole R program
Load it on the R compute server
Provide a Java WS frontend
Examples
Feature selection
Automated model generation
Pharmacokinetic parameter calculation
Data Input/Output
Most modeling applications require data
matrices
Depending on client language we can
use
SOAP array of arrays (2D matrices)
SOAP array (1D vector form of a 2D
matrix)
VOTables
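The "1D vector form of a 2D matrix" option can be sketched as a flatten and rebuild pair. The slides do not state the element order, so row-major order is assumed here (R itself stores matrices column by column); the function names are hypothetical.

```python
def flatten(matrix):
    """Flatten a 2D matrix row by row; return the vector plus its shape."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    if any(len(row) != n_cols for row in matrix):
        raise ValueError("all rows must have the same length")
    vector = [value for row in matrix for value in row]
    return vector, n_rows, n_cols


def unflatten(vector, n_rows, n_cols):
    """Rebuild the 2D matrix from a row-major vector and its shape."""
    if len(vector) != n_rows * n_cols:
        raise ValueError("vector length does not match the given shape")
    return [vector[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


matrix = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
vector, n_rows, n_cols = flatten(matrix)
restored = unflatten(vector, n_rows, n_cols)
```

The shape has to travel with the vector, otherwise the receiving side cannot tell a 2 x 3 matrix from a 3 x 2 one.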
Data Input/Output
Some R web services can take a URL
to a VOTables document
Conversion to R or Java matrices is done
by a local VOTables Java library
R also has basic support for VOTables
directly
Ignores binary data streams
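A minimal sketch of the VOTables-to-matrix conversion, written in Python rather than the project's Java library. It reads only the TABLEDATA serialization, so binary streams are skipped, which mirrors the limitation noted above. The sample document, its field names, and the missing XML namespace are simplifying assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-column table; real documents usually declare a namespace.
SAMPLE_VOTABLE = """<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE name="descriptors">
      <FIELD name="TPSA" datatype="double"/>
      <FIELD name="XLogP" datatype="double"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>63.6</TD><TD>1.2</TD></TR>
          <TR><TD>20.2</TD><TD>3.4</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def votable_to_matrix(xml_text):
    """Return (column names, rows of floats) from the first TABLE element."""
    root = ET.fromstring(xml_text)
    table = root.find("./RESOURCE/TABLE")
    names = [field.get("name") for field in table.findall("FIELD")]
    rows = [[float(cell.text) for cell in row.findall("TD")]
            for row in table.findall("./DATA/TABLEDATA/TR")]
    return names, rows


names, rows = votable_to_matrix(SAMPLE_VOTABLE)
```

The FIELD elements carry the names and data types that the VOTable Retrieve service reports; this sketch assumes every column is numeric.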
Interacting With R WS's
Traditional WS's do not maintain state
Predictive models are different
A model is built at one time
May be used for prediction at another time
Need to maintain state
State is maintained by serialization to R
binary files on the compute server
Clients deal with model IDs
Interacting with R WS's
Protocol
Send data to model WS
Get back model ID
Get various information via model ID
Fitted values
Training statistics
New predictions
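The protocol can be mimicked locally to show how a model ID stands in for server-side state. This is only a sketch: `ModelStore` is a hypothetical stand-in for the compute server, Python's `pickle` replaces R binary files, and a one-variable least-squares fit replaces the real R models.

```python
import os
import pickle
import tempfile
import uuid


class ModelStore:
    """Stand-in for the compute server: models live on disk, clients hold IDs."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, model_id):
        return os.path.join(self.directory, model_id + ".bin")

    def _load(self, model_id):
        with open(self._path(model_id), "rb") as handle:
            return pickle.load(handle)

    def build(self, xs, ys):
        """Fit y = intercept + slope * x, serialize the model, return its ID."""
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        model = {
            "intercept": intercept,
            "slope": slope,
            "fitted": [intercept + slope * x for x in xs],
        }
        model_id = uuid.uuid4().hex
        with open(self._path(model_id), "wb") as handle:
            pickle.dump(model, handle)
        return model_id

    def fitted_values(self, model_id):
        return self._load(model_id)["fitted"]

    def predict(self, model_id, new_xs):
        model = self._load(model_id)
        return [model["intercept"] + model["slope"] * x for x in new_xs]


store = ModelStore(tempfile.mkdtemp())
model_id = store.build([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
fitted = store.fitted_values(model_id)
predictions = store.predict(model_id, [4.0])
```

Each call is stateless from the service's point of view: everything it needs is recovered from the file named by the model ID, so building and predicting can happen at different times.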
Cheminformatics at Indiana
University School of Informatics
David J. Wild
djwild@indiana.edu
Associate Director of Chemical Informatics &
Assistant Professor
Indiana University School of Informatics,
Bloomington
http://djwild.info
Cheminformatics education at
Indiana
M.S. in Chemical Informatics
2 years, 36 semester hours
Includes a 6-hour capstone / research project
Opportunity to work in Laboratory Informatics (IUPUI) or
closely with Bioinformatics (IUB)
Currently 9 students enrolled
Ph.D. in Informatics, Cheminformatics Specialty
90 credit hours, including 30 hours dissertation research.
Usually 4 years.
Research rotations expose students to research in related
areas
Currently 4 students enrolled
Graduate Certificate
4 courses, all available by Distance Education
Distance Education for
Cheminformatics
Uses Breeze + teleconference for live sharing
of classes: all that is required is a P.C. and a
telephone. Optional Polycom
videoconferencing.
Lectures are recorded for easy playback
through a web browser
Wiki or similar webpage for dissemination of
course materials
Also participate in CIC courseshare to give
class at University of Michigan
Of 75 students taking our courses since fall
Current research in the Wild
lab
Integration of cheminformatics tools and data
sources
A web service infrastructure for cheminformatics
Compound information & aggregation web service
and interface (“by the way box”)
An enhanced chatbot for exploiting chemical
information & web services
A semantically-aware workflow tool for
cheminformatics
Data mining the NIH DTP tumor cell line database
PubDock: a docking database for PubChem
Current research in the Guha
lab
Predictive Modeling
Interpretation, validation, domain applicability
Generalization to other 'models' such as docking,
pharmacophore etc
Integration of multiple data types
Addressing imbalanced and noisy datasets
Analysis of Chemical Spaces
Quantify distributions in spaces
Investigation of density approaches
Applications to lead hopping, model domains
Methods to summarize & compare data
Applications to HTS and smaller lead series type
Cheminformatics web service infrastructure
Database Services
PostgreSQL + gNova
PubChem mirror (augmented)
Pub3D - 3D structures for PubChem
PubDock - Bound 3D structures
Compound-indexed journal article DB
NIH Human Tumor Cell Line
Local PubChem mirror
Cheminformatics services
Docking (FRED)
3D structure generation (OMEGA)
Filtering (FRED, etc)
OSCAR3
Fingerprints (BCI, CDK)
Clustering (BCI)
Toxicity prediction (ToxTree)
R-based predictive models
Similarity calculations (CDK)
Descriptor calculation (CDK)
2D structure diagrams (CDK)
Xiao Dong, Kevin E. Gilbert, Rajarshi Guha, Randy Heiland, Jungkee Kim, Marlon E. Pierce, Geoffrey C. Fox and
David J. Wild, Web service infrastructure for chemoinformatics, Journal of Chemical Information and Modeling, 2007;
47(4) pp 1303-1307
RSC Project Prospect - what
can we do with the
information?
www.projectprospect.org
>100 papers marked up with SMILES/InChI
(using OSCAR3), plus Gene Ontology and
Goldbook Ontology terms
Created similarity searchable PostgreSQL /
gNova database with paper DOIs, SMILES,
and ontology terms
Web service and simple HTML interfaces for
searching … “which papers reference
compounds similar to this one in the scope of
these ontological terms?”
Greasemonkey / OSCAR
script
http://cheminfo.informatics.indiana.edu:8080/ChemGM/index.jsp
By the way… annotation (mock-up!)
By the way…
This compound is very similar to a
prescription drug, Tamoxifen.
This compound is referenced in 20 journal
articles published in the last 5 years
Similar compounds are associated with the
words “toxic” and “death” in 280 web pages
It appears to be covered under 3 patents
It has been shown to be active in 5 screens
Computer models predict it to show some
activity against 8 protein targets
Here are some comments on this
compound:
David Wild: don't take any notice of the
computational models - they are rubbish
Cheminformatics aware simple lab notebook (mock up!)
Plug-in allows structures to be drawn with the pen and cleaned up
Web service interface provides access to computation and searching. Page is marked up by what is possible
Free text input can be converted to machine readable form by electrovaya
Automatic detection of data fields (yield, etc) where possible
[Mock-up notebook page: "Some useful chemical reactions: Iodoacetate a Iodoacetamide I-CH4COO- ICH2CONH2" with a hand-drawn reaction scheme, a "FIND INFO ABOUT THIS REACTION" button, and the note "This may also react, chem favored by alkaline pH ….”]
Automatic workflow generation and natural language queries
Develop service ontology using OWL-S or similar language
Allows service interoperability, replacement and
input/output compatibility
We can then use generic reasoning and
network analysis tools to find paths from
inputs to desired outputs
Natural language can be parsed to inputs and
desired outputs
Smart Clients <--> Agents <--> Services
Possible “supercharged life science Google?”
[Diagram: service graph with nodes 2D structures, 3D structures, 3D protein structure, 3D structures & complexes, result; services similarity search, 2D -> 3D, 3D search, structure crawler, P'phore search, dock; annotations "2D structures are compounds", "3D structures are compounds", "dock = bind"]
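Finding a path from inputs to desired outputs can be sketched as a graph search over services described by their input and output types. The catalogue below is hypothetical and a plain dictionary stands in for an OWL-S ontology; breadth-first search is one simple choice of path finder, not necessarily the reasoning tool the project would use.

```python
from collections import deque

# Hypothetical service catalogue: name -> (input type, output type).
SERVICES = {
    "name_to_2d": ("chemical name", "2D structure"),
    "similarity_search": ("2D structure", "2D structure set"),
    "convert_2d_to_3d": ("2D structure", "3D structure"),
    "dock": ("3D structure", "docked complex"),
}


def find_workflow(services, have, want):
    """Breadth-first search for the shortest service chain from have to want."""
    queue = deque([(have, [])])
    seen = {have}
    while queue:
        data_type, path = queue.popleft()
        if data_type == want:
            return path
        for name, (needs, gives) in services.items():
            if needs == data_type and gives not in seen:
                seen.add(gives)
                queue.append((gives, path + [name]))
    return None  # no chain of services produces the wanted type


workflow = find_workflow(SERVICES, "chemical name", "docked complex")
```

A natural language front end would only need to map a query onto the `have` and `want` types; the chain of services in between is then derived rather than hand-built.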