Towards a Semantic Web for Bioinformatics

REWERSE Working Group A2

Abstract

With the explosion of online-accessible bioinformatics data and tools, systems integration has become very important for further progress. Currently, bioinformatics relies heavily on the Web. But the Web is geared towards human interaction rather than automated processing. The vision of a Semantic Web facilitates this automation by annotating web content and by providing adequate reasoning languages.

Mission

Biologists wish to understand protein function from sequence and structure (Fig. 1). The bioinformatics group of REWERSE brings together 9 European groups, which have a strong background in the use of rules and reasoning for biological data on sequence, structure, and function. They deploy rules and reasoning for ontologies and text mining, gene expression data analysis, metabolic pathways, structure prediction and protein interaction. Enabling these tools on the Web sets the foundation for a Semantic Web for bioinformatics.

Fig. 1: Sequence-Structure-Function. Biologists wish to go from DNA and protein sequences to structural information to understand the proteins’ function.

Use Scenarios

Consider a biologist working on osteoporosis, a major bone disease affecting millions of people. To target osteoporosis, it is important to understand the balance between cells that produce bone substance and cells called osteoclasts (Fig. 2), which consume it. Imagine that a scientist has measured the gene expression levels of osteoclasts during cell differentiation. Dozens of genes appear interesting and require further annotation to reveal any details. Numerous PhD students spend days in front of the computer using the internet to collect information on these genes. They wish to learn whether the proteins resulting from the expressed genes are similar to any known proteins with known function, which domains the proteins consist of, which interaction partners these proteins have, which metabolic pathways they are involved in, which cellular locations the proteins mostly occur in, which articles on these proteins are most suitable, etc. All this information, taken together, may lead to insights into which proteins may be suitable targets to treat bone-related diseases.

Much of the needed information and many of the analysis tools are accessible over the Web. However, they are designed for low-throughput human use and not for high-throughput automated use. The vision of a Semantic Web for bioinformatics transparently integrates some of these resources through the use of mark-up languages, ontologies, and rules. In such a bioinformatics Semantic Web, the biologist provides the identifiers of the experimentally determined genes to a rule-based workflow, which integrates various information sources.

A service for protein threading will determine the structural domains for the genes. Among others, this service identifies a number of overexpressed small GTPases, which act as molecular switches in signalling. A service for protein interactions determines potential interaction partners of the GTPases.
It highlights, for example, the proteins activating or deactivating the GTPases. Some of the experimentally determined proteins play a role in the MAPK signalling pathway. A rule-based service allows the biologist to ask queries over the signalling pathway to determine whether the expressed proteins are essential or not. Another service determines the relevant literature for the overexpressed genes and uses an ontology to annotate the function, processes, and cellular locations of the genes. The biologist uses rules to reason over these ontologies. He or she uses the ontology to retrieve scientific articles which are relevant for osteoporosis, but not postmenopausal osteoporosis, and which mention any positive regulation of protein kinase activity such as MAPK or JNK.

Fig. 2: Osteoclasts are cells which digest bone substance and are therefore important for bone-related diseases such as osteoporosis. Figure a shows a large cell with multiple, separately identifiable nuclei. To understand osteoclasts, various experimental and in-silico techniques are used. Gene expression data captures the activity of genes. Figure b shows the result of a screen of many genes over a period of time. Red indicates overexpression. The genes are grouped according to the similarity of their expression profiles. For the overexpressed gene products, structural domains and their interaction partners are found (figure c). Among others, small GTPases, which act as molecular switches, are found. The inset in figure c shows the superposition of two GTPases, one of which is switched on, the other off. The difference can be seen in the different conformations of the chains in the front. Some of the overexpressed gene products are relevant for the human MAPK signalling pathway according to KEGG, depicted in figure d. Finally, ontologies are used to annotate the data. Figure e shows the definition of “negative regulation of osteoclast differentiation” according to the Gene Ontology.

Description of Research

The bioinformatics group of REWERSE will build prototype applications to demonstrate the idea of a rule-based Web for bioinformatics. It brings together groups which have used a host of rule-based techniques, such as constraints for structure prediction (Jena, Lisbon), logic programming for systems integration, for reasoning over metabolic pathways and for classifying expression data (Dresden, Paris, Bucharest, Edinburgh), and ontologies for annotating expression data and for text mining (Dresden, Bucharest, Edinburgh, Manchester, Linköping). The working group will build on some of these existing tools and integrate them with rule-based Web technologies to demonstrate the idea of a Semantic Web for bioinformatics.

Tools & Technologies

All of the groups involved in the bioinformatics group have developed rule- and Web-based bioinformatics tools. Dresden has developed an ontology-based literature search engine and a database to determine protein structure interactions. Jena has developed tools using constraints for structure prediction. Lisbon has used constraints for structure prediction and has developed the docking package Bigger. Paris has developed a rule-based reasoning engine for metabolic pathways. Edinburgh has developed an ontology for tissues and linked it to in-situ gene expression data. Bucharest has used rules and inductive logic programming for characterizing differentially expressed genes in microarray data. Manchester has developed a host of tools to analyse, maintain and deploy biomedical ontologies. Linköping has developed tools to merge and maintain ontologies.

Contact Person

Dr. Michael Schröder, Professor
Biotec/Dept. of Computing
TU Dresden
Tatzberg 47-51
01307 Dresden, DE
Phone: +49 351 463 40062

Members

Michael Schröder (Dresden);
Rolf Backofen (Jena);
François Fages (Paris);
Werner Nutt (Edinburgh);
Carole Goble (Manchester);
Pedro Barahona (Lisbon);
Patrick Lambrix (Linköping);
Liviu Badea (Bucharest);
Mikael Berndtsson (Skövde)

Imprint

webXcerpt Software GmbH
REWERSE Technology Transfer
Aurbacherstr. 2, D-81541 Munich
Contact: Andrea Kulas
Phone: +49 89 54 80 88 48

Responsible for the content:
Prof. Michael Schröder
Dept. of Computing
TU Dresden
Tatzberg 47-51, D-01307 Dresden
Phone: +49 351 463 40060