Implementation of One Stop Search by XSLT
By Dave Low
University of Hong Kong
9-Dec-2003
Agenda
• Flow of One Stop Search
• Reasons to use Extensible Stylesheet Language
  Transformations (XSLT)
• Difficulties in implementing One Stop
  Search with XSLT
• Our solution
• Our implementation
• Summary
Flow of One Stop Search
1. Capture the search keyword
2. Issue the search to different search engines
3. Collect the results, clicking the next button until
   all the records have been retrieved
4. Compile the search results from different
   search engines
5. Present the result to the user
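
The five steps above can be sketched in Java roughly as follows. This is only an illustration of the flow, not the original system: the names OneStopSearchSketch, Page, SearchEngine and stubEngine are hypothetical, and the stub engines return canned results instead of contacting real search services.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the one stop search flow; not the original code. */
final class OneStopSearchSketch {

    /** One page of results; nextUrl is null on the last page. */
    static final class Page {
        final List<String> records;
        final String nextUrl;

        Page(List<String> records, String nextUrl) {
            this.records = records;
            this.nextUrl = nextUrl;
        }
    }

    /** Assumed abstraction over one target search engine. */
    interface SearchEngine {
        Page search(String keyword);   // step 2: issue the search
        Page fetch(String nextUrl);    // step 3: follow the "next" button
    }

    /** Steps 2 to 4: query each engine, page through its results, merge them. */
    static List<String> searchAll(String keyword, List<SearchEngine> engines) {
        List<String> compiled = new ArrayList<>();
        for (SearchEngine engine : engines) {
            Page page = engine.search(keyword);
            compiled.addAll(page.records);
            while (page.nextUrl != null) {
                page = engine.fetch(page.nextUrl);
                compiled.addAll(page.records);
            }
        }
        return compiled;
    }

    /** In-memory stand-in for a real engine: two pages of canned results. */
    static SearchEngine stubEngine(String name) {
        return new SearchEngine() {
            public Page search(String keyword) {
                return new Page(List.of(name + ": " + keyword + " #1"), "page2");
            }

            public Page fetch(String nextUrl) {
                return new Page(List.of(name + ": result #2"), null);
            }
        };
    }

    public static void main(String[] args) {
        // Steps 1 and 5 are reduced to a fixed keyword and console output.
        List<SearchEngine> engines = List.of(stubEngine("A"), stubEngine("B"));
        searchAll("xslt", engines).forEach(System.out::println);
    }
}
```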
Flow of One Stop Search
(Diagram: One Stop Search captures the keyword, runs "search and next" against ProQuest, Science Direct and Kluwer Online, compiles the results, and presents the result.)
Reason to use XSL
• Simple
  – XSL is plain text
• Multiplatform
  – Can run on any machine with an XSLT engine
• Easy to maintain
  – When the output layout of a target search engine
    changes
     • Just change the content of the XSL file
     • No recompilation is needed
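
As a rough illustration of the "no recompilation" point, the sketch below uses the JDK's built-in javax.xml.transform API to apply a stylesheet that is held as plain text. The class name StylesheetSwapSketch, the sample page and the stylesheet itself (which assumes result titles sit in td elements with class "title") are made up for this example and are not taken from the original system.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical example: the extraction rules live in the stylesheet text, not in Java. */
final class StylesheetSwapSketch {

    /** Applies an XSLT stylesheet, supplied as plain text, to a well-formed XML page. */
    static String transform(String xml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    // Assumed layout: every result title is a <td class="title"> cell.
    // If the target site changes its layout, only this text needs editing.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select=\"//td[@class='title']\">"
        + "<xsl:value-of select='.'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    public static void main(String[] args) {
        String page = "<html><body><table>"
                + "<tr><td class='title'>XSLT Basics</td></tr>"
                + "<tr><td class='title'>XML in Practice</td></tr>"
                + "</table></body></html>";
        System.out.print(transform(page, XSLT));
    }
}
```

In a deployment the stylesheet would more likely be read from a file, so a layout change on the target site means editing that file only.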
Two main problems when using XSL
1. An XSLT engine requires well-formed XML
   files as input
  –   Web-based search engines output HTML only
  –   HTML is often not well-formed XML
      •   HTML allows some tags to appear without a closing tag
      •   E.g. <br>
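
The problem can be reproduced with the XML parser that ships with the JDK. The sketch below is only a demonstration: the class name WellFormedCheck and the sample markup are invented for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: reports whether markup can be parsed as XML. */
final class WellFormedCheck {

    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Raised when the input is not well-formed, e.g. an unclosed <br>.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not expected for in-memory input.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p>one<br>two</p>"));   // unclosed br
        System.out.println(isWellFormed("<p>one<br/>two</p>"));  // self-closed br
    }
}
```

Here isWellFormed returns false for the first input, where the br tag is left open, and true for the second, where it is self-closed.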
Solution
1. Use HTML Tidy (http://tidy.sourceforge.net/) to
   convert HTML to well-formed XML
   –   “A HTML syntax checker and pretty printer. It can be used
       as a tool for cleaning up malformed and faulty HTML. In
       addition, it provides a DOM interface to the document that
       is being processed, which effectively makes you able to use
       it as a DOM parser for real-world HTML”
   –   It is open source
   –   It is available for many languages, such as Java, Perl and Python
Solution
• Sample code in Java
  // Parse the HTML string with Tidy and return the result as a DOM document
  StringReader strReader = new StringReader(html);
  Tidy tidy = new Tidy();
  return tidy.parseDOM(strReader, null);

• HTML => XML
Two main problems when using XSL
2. There is no browse function in XSL
  –   In one stop search, we need to click the next
      button several times to collect all the results
  –   We need to tell the program to find the next button
      and then issue a browse request based on the URL
      of the next button
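
Locating the next button can be sketched with the XPath support in the JDK. This is a minimal sketch under stated assumptions, not the original implementation: it assumes the page has already been converted to well-formed XML and that the next button is a link whose text is exactly "Next". The class name NextLinkFinder and the example.org URLs are hypothetical.

```java
import java.io.StringReader;
import java.net.URI;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper: finds the URL behind a "Next" link on a result page. */
final class NextLinkFinder {

    /** Returns the absolute URL of the first link labelled "Next", or null if there is none. */
    static String findNextUrl(String pageUrl, String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            // evaluate() returns the string value of the first match, or "" if nothing matches.
            String href = XPathFactory.newInstance().newXPath()
                    .evaluate("//a[normalize-space(.)='Next']/@href", doc);
            // Links are often relative, so resolve them against the page's own URL.
            return href.isEmpty() ? null : URI.create(pageUrl).resolve(href).toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse or query the page", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><a href='results?page=2'>Next</a></body></html>";
        System.out.println(findNextUrl("http://example.org/search?q=xslt", page));
    }
}
```

The returned URL is what the browse request would then be issued against; a null result means the last page has been reached.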
Solution
2. Add a browse function to XSL through an XSL
   extension
  –   XSLT allows two kinds of extension: extension
      elements and extension functions
  –   The kinds of extension supported depend on the XSLT
      implementation
  –   Details can be found at
      http://www.w3.org/TR/xslt#extension
Solution
• Our implementation
  – Select a Java-based XSLT engine
  – Write the function in Java
  – Compile it into classes and package them in a jar
  – Add the jar file to the classpath of the XSLT
    engine
  – Run it
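
To give an idea of what the Java side of such an extension might look like, here is a minimal sketch. It is not the original Browser class: the name BrowserSketch, the two constructors and the pluggable fetcher are assumptions made for illustration, and the page is assumed to be well-formed already (the point where HTML Tidy would run is marked in a comment). How constructors and methods are bound to XSLT function calls depends on the XSLT engine in use.

```java
import java.io.InputStream;
import java.io.StringReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/**
 * Hypothetical stand-in for an extension class; not the original code.
 * A real extension class would be declared public so the XSLT engine can load it.
 */
final class BrowserSketch {
    private final URI base;
    private final Function<String, String> fetcher;

    /** Constructor that a stylesheet could reach through a constructor call. */
    BrowserSketch(String startUrl) {
        this(startUrl, BrowserSketch::httpGet);
    }

    /** Test-friendly constructor: the fetcher maps an absolute URL to page markup. */
    BrowserSketch(String startUrl, Function<String, String> fetcher) {
        this.base = URI.create(startUrl);
        this.fetcher = fetcher;
    }

    /** Fetches a page (the URL may be relative to the start URL) and returns it as a DOM tree. */
    Document browse(String url) {
        String markup = fetcher.apply(base.resolve(url).toString());
        try {
            // A real implementation would run HTML Tidy here; this sketch
            // assumes the markup is already well-formed XML.
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse " + url, e);
        }
    }

    /** Plain HTTP GET using only the JDK; assumes the page is UTF-8 encoded. */
    private static String httpGet(String url) {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException("cannot fetch " + url, e);
        }
    }
}
```

Returning a DOM document lets the stylesheet apply templates to the fetched page in the same way as to its ordinary input.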
Sample code on XSL extension
<?xml version="1.0" encoding="UTF-8"?>
<!-- Define the class to be used (HKUL namespace) -->
<xsl:stylesheet version="1.1"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:HKUL="http://www.lib.hku.hk/java/hkul.apps.web.Browser"
exclude-result-prefixes="HKUL">

 <xsl:template match="/">
  <xsl:variable name="url">http://www.lib.hku.hk/</xsl:variable>
  <!-- Create it -->
  <xsl:variable name="browser" select="HKUL:new($url)" />
  <!-- Call the browse function -->
  <xsl:variable name="content" select="HKUL:browse($browser,$url)" />
  <xsl:apply-templates select="$content/html/*" />
 </xsl:template>
</xsl:stylesheet>
Our Implementation
(Diagram: a cycle of Browse, Tidy and Parse, repeated via Next, producing the Result.)
Our Implementation
• Both client and server programs are written in
  Java
• Client and server programs communicate over
  HTTP
• The connection uses a wireless network
Our Implementation (Client side)
• Palm OS
  – Sun’s Java 2 Platform, Micro Edition (J2ME)
    http://java.sun.com/j2me/
  – Mobile Information Device Profile (MIDP)
    http://java.sun.com/products/midp
Our Implementation (Server side)
• Application Server (Running on Sun Solaris with
  JDK1.4)
   – Jakarta Tomcat (http://jakarta.apache.org/tomcat)
   – Jakarta Struts Framework (http://jakarta.apache.org/struts)
• Xerces XSLT Engine (http://xml.apache.org/#xerces)
• MySQL database (http://www.mysql.com)
Summary
• One stop search implemented with XSLT
  – Simple
  – Multiplatform
  – Easy to maintain
• Two problems
  – HTML is not well-formed XML
  – No browse function in XSL
Summary
• Solutions
  – HTML Tidy
  – XSL Extension
• Implementation
  –   J2ME
  –   Jakarta Tomcat + Struts
  –   Xerces
  –   MySQL
Questions?
• Thank you

				