					Genome Browsing and AJAX:
 Advancing GMOD’s GBrowse
      to the Next Level

         by Andrew Uzilov
   for Holmes Lab group meeting
          October 13, 2006
Genome browsers are not just
a good idea... they are The Way
 Necessary for visualizing and understanding
 large amounts of genomic information
 – genome organization (including synteny)
 – multiple splicing
 – comparing predictions against known data
 – some insights may be more obvious visually
   than through flat files, database queries, or
   writing custom programs for data analysis
What else are they good for?
• Retrieving information
  – point-and-click on features of interest – better
    interface for exploring
  – BLAST and other database searches get you a
    visual of the genomic context, not just text
• Prepare pretty pictures for publications
  – annotation upload feature is a must for this
• Better interface for community annotation
  (genome wiki)
• Genome feature WYSIWYG editor?
What are the problems with
current genome browsers? (1)
 Most Web-based genome browsers are
 static HTML pages – the entire page is
 refreshed (HTML generated anew by server)
 anytime the user navigates, changes layout, etc.
 – Delay incurred while page reloads – annoying
 – Vertical scroll position lost – also annoying
 – Sometimes, JavaScript or Flash is used to
   provide some dynamic content (you can change
   certain things without triggering reload), but
   usually navigation still causes reloads
What are the problems with
current genome browsers? (2)
 Most (all?) Web-based genome browsers
 rely on the “server renders graphics from
 scratch upon client request” model
 – Images for genome views are rendered on
   demand, after user navigates, changes layout,
   etc., making the user wait
 – Rendered images aren’t reused or reusable –
   not saved or cached, rendered anew each time
  – There are difficulties in preparing pre-rendered images
Pre-rendering difficulties
  It would be ideal to have all images rendered
  ahead of time, then just serve them up,
  requiring no “live” rendering overhead/delay
   – obstacle is that pixel width of genome views is
     quite, quite, quite large – can’t render view as
     single image, will run out of memory
   – can’t render in small parts either, as
     BioPerl/GBrowse* will not produce parts that
     concatenate into a nicely contiguous genome view

* and probably other rendering frameworks
The Insight
• Make BioPerl think it is rendering a massively wide
  single image, but instead intercept all rendering
  calls to the graphics library (i.e. the graphics
  primitives) and store them in a database
• Now, we can query the database for only a
  manageable subset of primitives (i.e. only those
  required for a single tile – the basic unit out of
  which the total genome view is constructed) and
  render only them, producing a reasonably-sized
  tile image
  – primitives’ coordinates are offset if they start in tiles prior
    to (left of) the current one
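A minimal JavaScript sketch of the per-tile query and offset, assuming each stored primitive carries its bounding box (x1, y1, x2, y2) in the coordinates of the virtual full-width image. An in-memory array stands in for the database; primitivesForTile and the 1000-pixel tile width are hypothetical:

```javascript
// Hypothetical tile width, in pixels of the virtual full-width genome image.
const TILE_WIDTH = 1000;

// Select the primitives that overlap tile `tileIndex` and shift their x
// coordinates so the tile's left edge becomes x = 0. A primitive that starts
// in an earlier tile ends up with a negative x1; the graphics library is
// assumed to clip whatever falls outside the tile image.
function primitivesForTile(primitives, tileIndex, tileWidth = TILE_WIDTH) {
  const tileLeft = tileIndex * tileWidth;
  const tileRight = tileLeft + tileWidth - 1; // inclusive right edge
  return primitives
    .filter((p) => p.x1 <= tileRight && p.x2 >= tileLeft)
    .map((p) => ({ ...p, x1: p.x1 - tileLeft, x2: p.x2 - tileLeft }));
}
```

For example, a box spanning x = 950 to 1200 belongs to tiles 0 and 1; in tile 1 it is drawn from x = -50 to 200.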
Basic philosophy
    – the client is an application
         • maintains internal state (no longer a static page)
         • knows how to render itself (old way: server
           generates the whole page’s HTML for you)
         • knows how to change itself dynamically (old way:
           server generates new HTML for you)
    – the server is a… well, literally, a server
         • pre-processes as much as possible to reduce
           session-time delays/overhead
         • off-loads as much work as possible onto the client
         • all this reduces server load, speeding up session

* less trite name under review
So how does it work?
• Based on GMOD’s GBrowse framework
• The server-side GBrowse Perl code for
  rendering genome views (i.e. the
  gbrowse_img script for the CGI) was
  hacked apart and back together to be a
  standalone pre-rendering program that
  uses BioPerl and GD libraries in the
  same way as GBrowse
  – except that it intercepts the graphics calls in
    order to cache primitives and render tiles
• The client was written in JavaScript
  from scratch
     Server side - the original way

The GBrowse framework
(from Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE,
Harris TW, Arva A, Lewis S (2002). The generic genome browser: a building block for
a model organism system database. Genome Research 12(10), 1599-1610.)
           Server side - the new way*

* or at least one current proposed new way (subject to change)
Server side – features currently
implemented (1)
• MySQL database
• tile rendering Perl:
  – intercepts Bio::Graphics::Panel calls to
    GD::Image (using AUTOLOAD) and stores
    them in the database, keyed on the bounding box to
    which they apply
  – now, if we want to know which GD primitives
    need to be rendered for some tile, we just
    search the database for all primitives
    overlapping with the tile bounding box
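The real interception is done in Perl with AUTOLOAD. As a rough JavaScript analogue (a swapped-in technique, not the project's code), a Proxy can play the same catch-all role: every method call on a fake image is recorded with a bounding box instead of being drawn. The bounding-box rule only covers calls whose first four arguments are corner coordinates, as in GD::Image's line, rectangle and filledRectangle; recordingImage is a hypothetical name:

```javascript
// Calls whose first four arguments are (x1, y1, x2, y2), as in GD::Image's
// line(), rectangle() and filledRectangle(). Other calls (text, arcs, ...)
// would need their own bounding-box rules; here they are stored without one.
const BOX_METHODS = new Set(["line", "rectangle", "filledRectangle"]);

// Return a stand-in image object that records every method call in `store`
// instead of drawing, similar in spirit to a Perl AUTOLOAD catch-all.
function recordingImage(store) {
  return new Proxy({}, {
    get(_target, method) {
      if (typeof method !== "string") return undefined; // ignore symbol lookups
      return (...args) => {
        let box = null;
        if (BOX_METHODS.has(method)) {
          const [xa, ya, xb, yb] = args;
          box = {
            x1: Math.min(xa, xb), y1: Math.min(ya, yb),
            x2: Math.max(xa, xb), y2: Math.max(ya, yb),
          };
        }
        store.push({ method, args, box });
      };
    },
  });
}
```

The stored boxes are what the overlap search for a tile's bounding box would run against.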
Server side – features currently
implemented (2)
• tile rendering Perl:
  – uses to:
     •   fill MySQL database with graphics primitives
     •   render tiles from a given database of primitives
     •   generate XML containing client config info
     •   do any combination of the above, including on
         subsets of tiles (this allows rendering to be broken down
         into jobs, suitable for running on multiple CPUs)
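Breaking the rendering into jobs can be as simple as dividing the tile numbers into contiguous ranges, one per CPU. A sketch with a hypothetical helper (splitIntoJobs is not part of the actual script):

```javascript
// Divide tiles 0 .. numTiles-1 into at most numJobs contiguous ranges whose
// sizes differ by at most one tile (hypothetical helper).
function splitIntoJobs(numTiles, numJobs) {
  if (numJobs < 1) throw new RangeError("numJobs must be at least 1");
  const base = Math.floor(numTiles / numJobs);
  let extra = numTiles % numJobs; // the first `extra` jobs get one more tile
  const jobs = [];
  let start = 0;
  for (let j = 0; j < numJobs && start < numTiles; j++) {
    const size = base + (extra > 0 ? 1 : 0);
    if (extra > 0) extra--;
    jobs.push({ firstTile: start, lastTile: start + size - 1 });
    start += size;
  }
  return jobs;
}
```

For example, splitIntoJobs(10, 3) yields tiles 0 to 3, 4 to 6 and 7 to 9.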
Server side – features to be
implemented, short-term (1)
• tile serving module
  – pre-fill the database with all primitives, but only render
    them selectively as users request tiles
  – store already rendered tiles to prevent re-rendering
  – maybe idle server CPU cycles can be used to render
    arbitrary tiles, always filling the “tile space”
• should process externally
  rendered tiles, e.g.:
  – dotplot tracks
  – histograms
  – supporting material for features such as pictures of
    fluorescent gene expression profiles, physiological
    changes due to gene knockout experiments, etc.
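The render-on-request and caching behaviour of the tile serving module might look roughly like this, assuming primitives are already in the database and a hypothetical renderTile function turns one tile's primitives into an image. Keying tiles by track, zoom level and tile number is also an assumption, and an in-memory Map stands in for stored tiles:

```javascript
// Hypothetical tile-serving core: a tile is rendered the first time it is
// requested and then served from the cache instead of being re-rendered.
class TileServer {
  constructor(renderTile) {
    this.renderTile = renderTile; // (track, zoom, tile) => rendered image
    this.cache = new Map();       // "track:zoom:tile" -> rendered image
  }

  key(track, zoom, tile) {
    return `${track}:${zoom}:${tile}`;
  }

  getTile(track, zoom, tile) {
    const k = this.key(track, zoom, tile);
    if (!this.cache.has(k)) {
      this.cache.set(k, this.renderTile(track, zoom, tile));
    }
    return this.cache.get(k);
  }

  // Idle-time filling of the "tile space": render up to `budget` of the
  // candidate tiles that are not cached yet; returns how many were rendered.
  prerender(candidates, budget) {
    let rendered = 0;
    for (const [track, zoom, tile] of candidates) {
      if (rendered >= budget) break;
      if (!this.cache.has(this.key(track, zoom, tile))) {
        this.getTile(track, zoom, tile);
        rendered++;
      }
    }
    return rendered;
  }
}
```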
Server side – features to be
implemented, short-term (2)
• database optimizations
  – querying for primitives is the slow step; rendering takes
    much longer than loading the database
  – key on tile number (1 key), not bounding box (4 keys)
     • [go to whiteboard]
  – gridlines account for >50% of the primitives, but are the
    same for every tile
     • maybe load gridlines for just one tile, and return them for every tile
• GUI to wrap
  – should be built into Web interface for annotation upload
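A sketch of keying on tile number with shared gridlines, under two assumptions: a primitive spanning several tiles is simply stored once per tile it touches, and gridlines are kept in tile-local coordinates because they are identical in every tile. buildTileIndex and the isGridline flag are hypothetical:

```javascript
// Index primitives by tile number (one key) rather than by bounding box
// (four keys). A primitive spanning several tiles is stored under each tile
// it touches. Gridlines are assumed identical in every tile and given in
// tile-local coordinates, so they are stored once and shared.
function buildTileIndex(primitives, tileWidth) {
  const byTile = new Map(); // tile number -> primitives touching that tile
  const gridlines = [];
  for (const p of primitives) {
    if (p.isGridline) {
      gridlines.push(p);
      continue;
    }
    const firstTile = Math.floor(p.x1 / tileWidth);
    const lastTile = Math.floor(p.x2 / tileWidth);
    for (let t = firstTile; t <= lastTile; t++) {
      if (!byTile.has(t)) byTile.set(t, []);
      byTile.get(t).push(p);
    }
  }
  return {
    // One equality lookup per tile, plus the shared gridlines.
    lookup(tile) {
      return gridlines.concat(byTile.get(tile) || []);
    },
  };
}
```

A lookup then becomes a single equality match on the tile number instead of four range comparisons on the bounding box.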
Server side – features to be
implemented, long-term (1)
• how to serve up feature info?
  – short-term solution
     • have produce an XML file with
       feature data (bounding boxes, etc.) for each tile,
       since it has easy access to that info
     • client loads and parses an XML file for each tile
  – more robust solution
     • need a database of features (but what kind?)
        – necessary to support efficient search for features
        – necessary for community annotation, because people will
          be changing the feature info constantly
Server side – features to be
implemented, long-term (2)
• community annotation
  – concurrency is an issue (updating changes,
    notifying client of updates since start of session,
    locking features for editing, etc.)
  – feature upload seems (to me) to be a special
    case of community annotation and should use
    its framework
  – quality control (registration, security)
  – are there existing database schemas or other
    frameworks that can serve this purpose?
Client side – features currently implemented
• Dragging works, but with bugs when large
  views are involved (fix is non-trivial, in
• Also working: jumping, centering, zooming,
  dynamic resize
• Tracks can be toggled hidden/visible
• Hovering labels (either all on, or pop up
  on mouseover), with adjustable
Brief aside: what is AJAX?
• Asynchronous JavaScript and XML
• A combination of technologies to make
  clients behave more like applications
  – JavaScript client code that uses
    XMLHttpRequest to asynchronously query
    the server for things
  – Implies XHTML (well-formed HTML) and
    DHTML (DOM manipulation), use of CSS
Why I am avoiding existing AJAX libraries
• Useful for flashy graphics effects, but don’t help
  with the engine of the client (except maybe
  Prototype and Google Web Toolkit)
   – but, GWT is closed source and an early version – even
     the online demo has bugs
• None support dragging, track management, tile
  caching, etc… so that needs to be done ourselves
  (and has, so far, consumed most of the effort)
• But… I’m willing to consider them for:
   – adding graphics effects after engine is more developed
   – for asynchronous communication with the server
DOM: from XHTML to a tree
This is from a W3 page, so you know it’s right:
[Figure: an XHTML table (cells such as “Shady Grove” and “Over the River, …”) and the DOM tree it corresponds to]
Client: the nitty-gritty (1)
• Code is broken down into multiple JavaScript
   – by which I mean just separate .js files, most of which
     are object instances that provide:
      • “class” functions and methods
      • namespaces
      • modularity, organization
• “Static classes” (standalone file, no instance)
   – Other.js – misc. helpers
   – Load.js – loads XML; when it is loaded, instantiates all
     objects in the correct order
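For readers new to JavaScript of this vintage, the two patterns look roughly like this: a “static class” is a single object literal used as a namespace, and a “class” is a constructor function plus methods on its prototype. The clamp helper and the GenomeRange class are hypothetical:

```javascript
// A "static class" in the style of Other.js: one object literal that works
// as a namespace for helper functions (clamp is a hypothetical helper).
var Other = {
  clamp: function (value, min, max) {
    return Math.min(Math.max(value, min), max);
  },
};

// A "class" before ES2015: a constructor function, with methods shared
// through its prototype (GenomeRange is hypothetical).
function GenomeRange(start, end) {
  this.start = start;
  this.end = end;
}
GenomeRange.prototype.size = function () {
  return this.end - this.start + 1; // inclusive coordinates
};
```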
Client: the nitty-gritty (2)
The Component system
• An attempt to bring order to chaos
• Each discrete UI element (e.g. main view,
  navigation panel, panel with track control
  buttons, etc.) is a Component
  – code for each component in its own file
• Components are
  – instantiated by Load.js
  – connected through ComponentInterface.js
     • should not modify other Component properties
       directly (although JavaScript allows this), but rather
       use ComponentInterface.js for sanity!
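A minimal sketch of the mediator idea, assuming Components register under a name and expose plain properties. The register method, the viewer name, the center property and the redraw() method are all hypothetical, not the actual contents of ComponentInterface.js:

```javascript
// Mediator in the spirit of ComponentInterface.js: Components never touch
// each other's properties; they go through accessors/modifiers defined here.
var ComponentInterface = {
  components: {},

  register: function (name, component) {
    this.components[name] = component;
  },

  // Accessor for the viewer's center position (in bases).
  getViewCenter: function () {
    return this.components.viewer.center;
  },

  // Modifier: change the value and let the owning Component update its
  // display, something a direct property write would silently skip.
  setViewCenter: function (bp) {
    var viewer = this.components.viewer;
    viewer.center = bp;
    viewer.redraw();
  },
};
```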
Client: the nitty-gritty (3)
• Each Component must define:
  – constructor
  – renderComponent()
    • returns the DOM node for this Component
    • will (eventually) be called by Load.js, which will then
      take the DOM node and append it to document
    • once fully implemented, there will be no need for
      content in <body> of XHTML – JavaScript will render
      everything dynamically
       – which allows for possibility of having a server-side config
         file specifying client-side layout, thus further removing
         users from the necessity of doing any programming
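A sketch of a Component meeting this contract, plus a Load.js-style loop that appends each Component's DOM node. It assumes the constructor receives the document object to build nodes with, so the code can also be exercised with a stub outside a browser; ExampleComponent and loadComponents are hypothetical:

```javascript
// Hypothetical Component: `doc` is the browser's `document` in real use.
function ExampleComponent(doc) {
  this.doc = doc;
  this.zoomLevel = 0;
}

// Build and return this Component's DOM node; the caller decides where it goes.
ExampleComponent.prototype.renderComponent = function () {
  var node = this.doc.createElement("div");
  node.className = "example-component";
  return node;
};

ExampleComponent.prototype.getState = function () {
  return { zoomLevel: this.zoomLevel };
};

ExampleComponent.prototype.setState = function (state) {
  this.zoomLevel = state.zoomLevel;
};

// Load.js-style start-up: instantiate Components in a fixed order and append
// each one's DOM node to the page, so <body> needs no static content.
function loadComponents(doc, body, constructors) {
  var components = [];
  for (var i = 0; i < constructors.length; i++) {
    var component = new constructors[i](doc);
    body.appendChild(component.renderComponent());
    components.push(component);
  }
  return components;
}
```

In a browser this would be called as loadComponents(document, document.body, [...]).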
Client: the nitty-gritty (4)
• Each Component must also define:
  – getState()
     • for setting bookmarks/history
  – setState()
     • for restoring bookmarked/history points
  – some bookmarking object will eventually use
    the above methods to store/load bookmarked
    states by polling all Components
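Such a bookmarking object needs to know nothing about the Components except that each implements getState() and setState(). A hypothetical sketch (Bookmarker and the component names are made up):

```javascript
// Hypothetical bookmarking object; `components` maps a name to a Component.
function Bookmarker(components) {
  this.components = components;
}

// Poll every Component for its state and collect the results by name.
Bookmarker.prototype.snapshot = function () {
  var state = {};
  for (var name in this.components) {
    state[name] = this.components[name].getState();
  }
  return state;
};

// Hand each saved state back to the Component it came from.
Bookmarker.prototype.restore = function (state) {
  for (var name in state) {
    if (this.components.hasOwnProperty(name)) {
      this.components[name].setState(state[name]);
    }
  }
};
```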
Client: the nitty-gritty (5)
•    If a programmer writes a new Component, they
     have to
    1) add accessors/modifiers for its object properties to
    2) add calls to constructor and renderComponent() to
•    However, eventually, accessor/modifier
     construction will be done automatically by
     ComponentInterface.js (in theory, it’s possible)
    –   this means that a Component programmer never has
        to look outside their own Component code, using the
        API for the other Components to access/modify them
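One way the automatic generation could work (an assumption, not the project's implementation): given a Component and the names of its public properties, attach a get/set pair for each to the interface object. addAccessors and the get/set plus component name plus property naming scheme are hypothetical:

```javascript
// Attach get<Name><Prop>() / set<Name><Prop>(value) to `iface` for each
// listed property of `component`, so they need not be written by hand.
function addAccessors(iface, componentName, component, propertyNames) {
  propertyNames.forEach(function (prop) {
    var suffix = componentName + prop.charAt(0).toUpperCase() + prop.slice(1);
    iface["get" + suffix] = function () {
      return component[prop];
    };
    iface["set" + suffix] = function (value) {
      component[prop] = value;
    };
  });
}
```

For example, addAccessors(iface, "Viewer", viewer, ["center"]) would create getViewerCenter() and setViewerCenter().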
Gods below! Was it really
necessary to take 5 slides for this?
 Yes, because “object-oriented” programming
 in JavaScript requires discipline, and it’s
 important to work these things out early on
  – with multiple people working on this code, it
    needs to be compartmentalized somehow
  – otherwise, debugging may cause blood pressure
    to rise to dangerous levels (although Venkman’s
    debugger will alleviate that)
  – see ComponentTemplate.js in SVN for a
    template, with guidelines on how to write a
    component of your own
Client: the nitty-gritty (6)
•   Current components
    –   ViewerComponent.js
    –   NavigationComponent.js
    –   TrackControlComponent.js
    –   DebugComponent.js
•   Other “classes”
    –   View.js
        •   stores limited information about current view
        •   intended to be the class that manages feature info fetching,
            caching, etc.
    –   TracksAndZooms.js
        •   just a data structure to hold config info from the XML file and
            current state info about what zoom level we’re at, and what
            tracks are hidden/visible
    –   These should really be prototypes for other objects
Client: the nitty-gritty (7)
• Dragging and genome view events
  – brace yourself, this is going to be ugly
• [go to whiteboard]
• Ideally, no one should have to deal with this
  after it’s been programmed, as it will be
  wrapped up in ViewerComponent.js, and
  navigation can be accomplished by using
  accessors to move view around
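Without the whiteboard, the core bookkeeping can at least be sketched: track the view's left edge as a pixel offset into the full-width genome image, update it on drag, and recompute which tiles must be on screen. The function names are hypothetical, and real code would also have to manage the tile DOM elements and zooming:

```javascript
// Tiles that must be on screen when the view's left edge is at `leftPx`
// (a pixel offset into the full-width genome image).
function visibleTiles(leftPx, viewWidthPx, tileWidthPx, numTiles) {
  var first = Math.max(0, Math.floor(leftPx / tileWidthPx));
  var last = Math.min(numTiles - 1,
                      Math.floor((leftPx + viewWidthPx - 1) / tileWidthPx));
  var tiles = [];
  for (var t = first; t <= last; t++) tiles.push(t);
  return tiles;
}

// Dragging the image right by dxPx pixels reveals what lies to the left, so
// the left edge moves to a lower offset; keep it inside the image.
function dragTo(leftPx, dxPx, viewWidthPx, totalWidthPx) {
  var maxLeft = Math.max(0, totalWidthPx - viewWidthPx);
  return Math.min(Math.max(leftPx - dxPx, 0), maxLeft);
}
```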
Client side – features to be
implemented (1)
• Client has no idea what the information on
  the tiles actually means (no knowledge of
  where and what the features are)
  – must be made aware of what it is displaying:
    short-term solution is to load this from an XML file for
    each tile (remember the server-side “to do”?)
  – the client JavaScript “class” for doing this can be
    later replaced with something more
    sophisticated, e.g. an XSL transformational
    grammar and XHR for fetching feature info from
    database… there are many possibilities
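Once per-tile feature data (a name plus a pixel bounding box) is available, finding the feature under the mouse is a simple hit test. A sketch assuming that data has already been parsed into plain objects in tile coordinates; the field names are hypothetical, and features listed later are assumed to be drawn on top:

```javascript
// Return the topmost feature whose bounding box contains (x, y), or null.
function featureAt(features, x, y) {
  for (var i = features.length - 1; i >= 0; i--) { // later = drawn on top
    var f = features[i];
    if (x >= f.x1 && x <= f.x2 && y >= f.y1 && y <= f.y2) {
      return f;
    }
  }
  return null;
}
```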
Client side – features to be
implemented (2)
• How can the user actually see the
  information about features?
  – pop-up menu on mouseover?
     • would have option to pop up details in separate
       window, manage annotation, etc.
  – displayed in a sidebar a la Google Local?
• There is no one True Answer, so maybe we
  can build all of the above and provide options
  to toggle between things
Client side – features to be
implemented (3)
• Feature search
  – by feature, keyword, regular expression, etc.
  – search results display:
    • pop open a table (load Component) displaying results;
      clicking on results in table will center the view on them
    • multiple views can open up stacked on one another
       – can be used to display synteny – link them all to a single
         horizontal dragging ruler
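A sketch of the regular-expression search and the "click to center" step, assuming features carry a name plus start and end coordinates; searchFeatures and centerOf are hypothetical names:

```javascript
// Case-insensitive regular-expression search over feature names; an invalid
// pattern makes the RegExp constructor throw a SyntaxError.
function searchFeatures(features, pattern) {
  var re = new RegExp(pattern, "i");
  return features.filter(function (f) {
    return re.test(f.name);
  });
}

// Genome coordinate on which to center the view when a result is clicked.
function centerOf(feature) {
  return Math.floor((feature.start + feature.end) / 2);
}
```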
Client side – features to be
implemented (4)
• Posting things to the server (what protocol? XML?)
  – community annotation
  – feature upload
  – automated bug reporting system
• Needs to check for changes in server-side
  database, tiles rendered, etc., since
  community annotation may change contents
  that you are looking at
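One simple scheme for noticing server-side changes (an assumption; the slides leave the protocol open): the client remembers the last revision number it saw, asks the server what changed since then, and drops the affected tiles from its cache so they are fetched again. The response format and applyChanges are hypothetical:

```javascript
// `tileCache` is a Map from tile key to image; `response` is assumed to look
// like { revision: 12, changedTiles: ["genes:2:7", ...] }.
function applyChanges(tileCache, lastRevision, response) {
  if (response.revision <= lastRevision) {
    return lastRevision; // nothing new since the last check
  }
  response.changedTiles.forEach(function (key) {
    tileCache.delete(key); // stale: re-fetch on next display
  });
  return response.revision;
}
```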
Client side – features to be
implemented (5)
• Bookmarking
  – entire state of browser encoded in URL
     • can use Web browser bookmarking to save
  – have internal tracking of history
     • internal back/forward buttons, log of what you did
  – every Component must have getState() and
    setState() defined to implement this
     • JSON would be perfect for this, no?
• Output current view to image (PNG, SVG,
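For the bookmarking plan, since each Component's state is a plain object, JSON does fit well: the combined state can be serialized into the URL fragment for Web-browser bookmarks and kept in an internal list for back/forward. A sketch with hypothetical names (stateToUrl, urlToState, ViewHistory):

```javascript
// Encode the combined state of all Components in the URL fragment so that
// the Web browser's own bookmarks can save it.
function stateToUrl(baseUrl, state) {
  return baseUrl + "#" + encodeURIComponent(JSON.stringify(state));
}

// Recover the state from a bookmarked URL, or null if it carries none.
function urlToState(url) {
  var hash = url.indexOf("#");
  if (hash < 0 || hash === url.length - 1) return null;
  return JSON.parse(decodeURIComponent(url.slice(hash + 1)));
}

// Internal history backing back/forward buttons.
function ViewHistory() {
  this.states = [];
  this.pos = -1;
}
ViewHistory.prototype.push = function (state) {
  this.states = this.states.slice(0, this.pos + 1); // forget "forward" entries
  this.states.push(state);
  this.pos = this.states.length - 1;
};
ViewHistory.prototype.back = function () {
  if (this.pos > 0) this.pos--;
  return this.states[this.pos];
};
ViewHistory.prototype.forward = function () {
  if (this.pos < this.states.length - 1) this.pos++;
  return this.states[this.pos];
};
```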
Client side – features to be
implemented (6)
• The genome browser as a plug-in
  – runs in a little box on someone else’s website to
    show an example
This was written to the sounds of…

•   Tortoise – Standards
•   Jazz History Vol. 5 – Now As Then-Revival
•   Tosca – Suzuki
•   Aphex Twin - …I Care Because You Do
•   Squarepusher - Ultravisitor
