A European Model for Bioinformatics Research and Community
Network of Excellence
Life Sciences, Genomics and Biotechnology for Health
2.1.23 Public standard interface to HitKeeper services
Due date of deliverable: 31.1.2008
Actual submission date: 29.1.2008
Start date of project: 1.2.2005 Duration: 60 months
Organisation name of lead contractor for this deliverable: EMBL-EBI
Del 2.1.23. Public standard interface to HitKeeper services
Authors of the deliverable: Heinz Stockinger and Marco Pagni.
HitKeeper is the core database system behind the MyHits web site. HitKeeper
takes care of incremental updates of data stored in a relational database and provides
advanced query functionalities. The software package is tightly integrated with the
MyHits web site, but it can also be used as a stand-alone application and offers several
public interfaces, including command line and application programming interfaces. The
new version of the service now also provides a Web Services interface. Since the
package is available via SourceForge, it can be deployed at different sites to
serve sequence annotation data.
HitKeeper is a software package that controls the fully automatic handling of multiple
biological databases (protein sequences, protein motifs and domains, taxonomic
classifications) and of hit list calculations (sequence versus motif) on a large scale, using
a relational database management system in the back-end. The software implements an
asynchronous update system that introduces updates and computes hits as soon as new
data become available. A query interface enables the user to search sequences by
specifying constraints, such as retrieving sequences that contain specific motifs or a
defined arrangement of motifs (“metamotifs”), or filtering based on the taxonomic
classification of a sequence. Overall, HitKeeper provides a generic and modular
framework to handle the redundancy and incremental updates of biological databases,
and an original query language. Simply put, it interprets queries, translates them into
SQL, executes them and finally reformats the results (cf. Figure 1).
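The interpret, translate, execute and reformat steps described above can be illustrated with a minimal sketch. It is not HitKeeper's implementation: the `sequence` table, its columns and the `CONDITIONS` mapping are hypothetical, and an in-memory SQLite database stands in for the real back-end. Only the parameter names `seq_source` and `desc_text` come from the command line example in this document.

```python
import sqlite3

# Hypothetical mapping from named query parameters to SQL conditions.
CONDITIONS = {
    "seq_source": "source = ?",
    "desc_text": "description LIKE '%' || ? || '%'",
}

def compile_query(params):
    """Translate named constraints into a parameterised SQL statement."""
    clauses, args = [], []
    for name, value in params.items():
        if name not in CONDITIONS:
            raise KeyError(f"unknown parameter: {name}")
        clauses.append(CONDITIONS[name])
        args.append(value)
    sql = "SELECT id, description FROM sequence"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY id", args

def run_query(conn, params):
    """Execute the compiled query and reformat the rows as tab-separated text."""
    sql, args = compile_query(params)
    rows = conn.execute(sql, args).fetchall()
    return "\n".join(f"{seq_id}\t{desc}" for seq_id, desc in rows)

# Toy data in a hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequence (id TEXT, source TEXT, description TEXT)")
conn.executemany("INSERT INTO sequence VALUES (?, ?, ?)", [
    ("S1", "sw", "glucokinase"),
    ("S2", "sw", "hexokinase"),
    ("S3", "other", "putative glucokinase"),
])
print(run_query(conn, {"seq_source": "sw", "desc_text": "glucokinase"}))
```

Passing values as bound parameters rather than pasting them into the SQL string keeps the translation step safe against malformed input, whichever interface the query arrives through.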
Early versions of HitKeeper provided only command line interfaces (CLI). The command
line syntax consists of more than 50 methods that accept named parameters as optional
arguments. For example, the following command can be used to retrieve all glucokinases:
seq_query seq_source=sw desc_text=glucokinase
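To make the syntax concrete, the sketch below splits such a command into a method name and a dictionary of named parameters. The function `parse_command` is a hypothetical helper written for illustration; it is not part of the HitKeeper distribution, and the real parser may treat quoting and validation differently.

```python
import shlex

def parse_command(line):
    """Split a 'method name=value ...' line into (method, params)."""
    tokens = shlex.split(line)  # honours shell-style quoting of values
    if not tokens:
        raise ValueError("empty command")
    method, params = tokens[0], {}
    for token in tokens[1:]:
        name, sep, value = token.partition("=")
        if not sep or not name:
            raise ValueError(f"expected name=value, got {token!r}")
        params[name] = value
    return method, params

method, params = parse_command("seq_query seq_source=sw desc_text=glucokinase")
print(method, params)
```

Because every argument is named and optional, the same method name and parameter dictionary could in principle be produced from any other front end and handed to the same back-end code.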
In this deliverable, we added a Web services interface to the existing service.
The main design principle behind our service is to offer several ways of accessing the
HitKeeper (Web) service and to auto-generate the client and server components, which
simplifies Web service creation and access. From a simple “configuration file” that
describes the service interface, the new HitKeeper package also auto-generates
several other useful components and interfaces:
- WSDL file - WS-I Basic Profile 1.1 compliant.
- Perl documentation (perldoc) and on-line documentation in HTML.
- HTML server - This includes an HTML-form based interface as depicted in Figure 2.
- REST server - The service provides a pure HTTP-based Web interface and auto-generated clients that can be called via the HitKeeper Web site.
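The idea of deriving several interfaces from one service description can be sketched as follows. This is an illustration under stated assumptions, not the format or code used by HitKeeper: the `SERVICE` dictionary stands in for the configuration file, whose actual syntax is not shown in this document, and the base URL `http://example.org/hitkeeper` is a placeholder.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical service description; the real configuration file format may differ.
SERVICE = {
    "method": "seq_query",
    "params": ["seq_source", "desc_text"],
}

def cli_usage(service):
    """Derive a command line usage string from the description."""
    args = " ".join(f"[{p}=<value>]" for p in service["params"])
    return f"{service['method']} {args}"

def rest_url(service, base_url, values):
    """Derive an HTTP GET URL, rejecting parameters the description does not list."""
    unknown = set(values) - set(service["params"])
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return f"{base_url}/{service['method']}?{urlencode(values)}"

def html_form(service, base_url):
    """Derive an HTML form with one text input per declared parameter."""
    action = escape(f"{base_url}/{service['method']}", quote=True)
    fields = "\n".join(
        f'  <label>{escape(p)} <input name="{escape(p, quote=True)}"></label>'
        for p in service["params"]
    )
    return (f'<form action="{action}" method="get">\n{fields}\n'
            '  <input type="submit">\n</form>')

base = "http://example.org/hitkeeper"  # placeholder, not the real service address
print(cli_usage(SERVICE))
print(rest_url(SERVICE, base, {"seq_source": "sw", "desc_text": "glucokinase"}))
print(html_form(SERVICE, base))
```

Keeping the method and parameter names in one place means that every generated interface stays consistent when the service description changes.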
As a result, the HitKeeper service can be accessed via Web Services (SOAP and REST), a
command line interface and HTML forms, as depicted in Figure 3.
The complete HitKeeper distribution, including exhaustive documentation, is available
under the GNU public license from the HitKeeper Web site. An article describing the
Web Service layer has recently been submitted for publication. This interface will be
put into production in the back-end of the MyHits web site in the course of 2008.
Source code and documentation:
J. Hau, M. Muller, M. Pagni. HitKeeper, a Generic Software Package for Hit List
Management. Source Code Biol Med. 2:2, 2007.
M. Pagni, V. Ioannidis, L. Cerutti, M. Zahn-Zabal, C.V. Jongeneel, J. Hau, O.
Martin, D. Kuznetsov, L. Falquet. MyHits: improvements to an interactive resource
for analyzing protein sequences. Nucleic Acids Res. 2007 Jul 1;35(Web Server
M. Pagni, J. Hau, H. Stockinger. A Multi-Protocol Bioinformatics Web Service: Use
SOAP, Take a REST or Go With HTML. Submitted for publication in Nov. 2007.
Figure 1. Software organisation of HitKeeper's core. When a request arrives, it is interpreted
within the context of previous requests (query stack). After the query is prepared, it is
“compiled” into SQL and then executed or stored on the query stack. The results
are then translated into one of several optional formats via built-in drivers.
Figure 2. HTML form interface to HitKeeper.
[Figure 3 labels: SOAP Interface, REST Interface, Command Line, HTML Form]
Figure 3. The HitKeeper service can be accessed in several ways and via several
protocols (SOAP, REST, HTML form (cf. Figure 2) and via a CLI).