Document Sample
GSoC-2010-ProposalByNirmalFernando Powered By Docstoc
					 Google Summer of Code 2010 - Project Proposal

                Derby-4587- Add tools for improved analysis and understanding of query
                                   plans and execution statistics [1]
                C. S. Nirmal J. Fernando.
IM              nirmal070125 (Google Talk)

Apache Derby, an Apache DB subproject, is an open source relational database implemented
entirely in Java and available under the Apache License, Version 2.0, and is a very interesting
piece of software. Derby is based on Java, JDBC and SQL. Quite frequently, users of
Derby have troubles in comprehending

         (a) how their query is being translated into a query plan by the optimizer

         (b) what is the execution-time, resource usage at the various steps of a query plan

The objective of this project is to provide visual displays to help people understand the way
their query is being run.

Derby is known as a system which is easy to install, deploy and maintain in production
environments. Latest official release of Derby is, and was released in year 2009.
Apache Derby (here after mention as Derby) has an active developers’ community behind the
wall, who are dedicating, hours of their time to improve/ fix bugs that are raised by the users.
The Derby developers’ community is really helpful and they are easy to work with.

Users, of an open source project like Derby are the most important group that will take the
software an extra mile from what it is at a particular instance. That’s why the developers’ pay
a great attention for the issues or improvements that are raised by users of Derby. This project
addresses such an improvement that users of Derby are expecting. Quite frequently, they
have troubles in comprehending the steps followed by the optimizer when it translates a
query into a query plan, and details such as time to execute, resource utilization at each step
of the query plan. There are low-level features in Derby which capture this information and
record it, such as logQueryPlan, and the XPLAIN tables, so this project will lead to a
development of a new tool which can process the query plan and execution statistics
information and present it in a more comprehensible fashion i.e. a graphical interpretation of
the query plan.
After went through some possible ways of implementing this tool which were suggested by
my mentor for this project Mr. Bryan Pendleton, we came across the following idea which
will display the query plan in a web browser.

                             Query plan

                 Emits                                      XSLT Compliant Browser
 Query Plan                   Raw XML

                              XSLT Style
                                Sheet                        XHTML page displayed

                      Graphical display template

                  Figure1: Process View of the tool

XSLT Style Sheet:

First I do a research on Derby to recognize all the possible instances that can be occur in a
query such as name of the table, sort, hash-join etc. Next I planned to write the template, as it
iteratively looks for the instance occurred at that particular step and displays the image +
details (execution time, resource usage), for all the steps that were followed when executing
the query. Here, image and details can be obtained through the XML file emitted by the query

Raw XML Document:

The XML document should contain a predefined set of tags for all possible instances of a
query as mentioned above and each tag (Child) should contain sub-children which will
describe the image + details. And this will create at the run time in order of the execution

Next I will link Raw XML document and the XSL style sheet. This step is fairly easy; I just
need to provide the reference to the XSL style sheet inside the XML document.

After that XSLT compliant browser will transform XML document generated by query plan
into a XHTML page, which will show the graphical query plan + details.

We planned to approach the complete graphical query explainer, incrementally, starting from
very basic queries.
Things Done So Far
     Checked out Derby trunk from the SVN
     Setup my development and testing environment
     Built Derby from the source and got familiar with the folder structure, functionalities
     Searched and went through the explain query plans of other well-known open source
      database management systems such as PostgreSQL-PgAdmin tool, MySQL.
      PgAdmin provides a good graphical explain of a query and support for many
      functionalities than MySQL. So, checked the possibility of using PgAdmin tool with
      Derby, and found that PgAdmin is not allowing, using other database management
     I have read a query execution plan for a simple query executed by Derby and
      manually wrote XML document for that execution plan, created a sample XSLT style
      sheet to prototype the “Derby Query Explainer”, and these are attached under [1].
     Currently I am assigned myself to the issue Derby-4406-Wrong order when using
      ORDER BY on non-deterministic function [2], and hope to fix it soon.

     Test cases required to verify the functionality of the graphical query explain tool.
     Code of the tool that emits the data in XML tagged format.
     Code of the tool that format XML data into visual information in a browser.
     XSLT style sheet which templates the graphical output.
     Documentation of new tools and screen-shots of the graphical output for sample

Development Schedule
     10th May 2010 – Complete reading the documentation and analysing the existing
      Derby code base while paying higher attention on the relevant areas to the project.
     15th May 2010 – Finalize the process view after getting the comments from the
      developers’ community and from my mentor and finalize the way that a user might
      view the graphical explainer.
     23rd May 2010 – Recognize all the possible instances that can be occur in a query and
      complete developing test cases required to test the tool.
     6th of July 2010 – Build a tool (a small Java program using standard JDBC) which
      can read the query execution data for a SQL statement from the Derby XPLAIN
      tables and emit the data in XML tagged format. This involves specifying the XML
      schema for the data, writing the program to produce the data, and building tests for it.
     12th of July 2010 – Submit workings for mid-term evaluations.
     31st of July 2010 – Build a tool to format the XML-formatted data into visual
      information in a browser, using an XSLT style sheet. This involves conceptualizing
       the visual display, designing and writing the XSLT style sheet, and building tests for
      9th August 2010 – Complete testing the new tool and create documentation for it and
       update Derby documentation.
      16th August 2010 – Submit workings for final evaluations.

About Me
I am Nirmal Fernando, 3rd year undergraduate at Department of Computer Science and
Engineering at University of Moratuwa, Sri Lanka. I am very competent at Java
programming language, OOP, relational database schemas, SQL, XML, XSL and data
structures and algorithms. I had gone through a course module in Database Management
Systems, few months’ back that will help me to understand the underlying concepts fairly

I had created an extension for as my level-3-semester-I project [3].

I am looking forward to have an exciting summer with Apache Derby, and I consider this is a
great opportunity to me, to apply my knowledge and skills into a real world application which
benefits many people.

Community Interaction
Apache Derby developers’ community is the main community behind this project and I
highly appreciated comments/ ideas of the expert developers of Derby and consider those as a
great opportunity to learn and contribute more and more to the improvement of Derby.

I already got subscribed to the derby-dev mailing list and getting the continuous feed backs
and ideas regarding the project as well as the other issues of Derby.

[1] Project Idea on Apache JIRA

[2] Derby-4406-Involvements

[3] Open extension

Shared By:
Tags: GSoC