Proof-of-concept for XML table format
These files constitute a browser-based demonstration of the application of XML to
cross-tab formatting. The demonstration requires no installed software other than
Internet Explorer 6 with MSXML level 4, a normal configuration on most business
computers. If MS Office (2002/XP or newer) is installed, tables can also be prepared
for use with Word or Excel.
The demonstration was developed by enhancing a sample XML/HTML application
from Microsoft. It has been tested with Windows 2000 and XP, and both Office 2002
and 2003, though not in all combinations.
This document describes how to use the demonstration and reports some lessons
learned about the feasibility of browser-based processing.
Installation notes:
      In Windows XP under service pack 2, you may run into difficulties due to the
       very restrictive default Internet Explorer security settings for local files. If so,
       you may find the notes in the accompanying file doc\IFRAME ERROR
       IE6_SP2.doc helpful.
      The XSLT transformations rely on MSXML4. This is included by default, we
       believe, in the .NET framework 1.1 that PW clients are encouraged to install.
       MSXML may be installed independently from a Microsoft download:

The p.o.c. contains the following in its top level directory:
      readme.doc: this file
      xml: subfolder containing sample XtabML files:
           o Holiday2.xml: XtabML rendition of a Pulsar Web demo table
             (formatted values)
           o Holiday2-can.xml: XtabML rendition of the same table (canonical
             values)
      xsl: subfolder containing xsl scripts implementing the demonstration:
           o XtabML2toc.xsl: converts XtabML to HTML for viewing and printing.
             Creates a table of contents if there is more than one table in the
             XtabML document (all_XtabML.doc).
           o XtabML2XMLSS.xsl: converts XtabML to XMLSS (Excel XML
             format)
           o rawXML.xsl: shows the contents of an XML file
      index.hta: the page to open to start the demonstration
After opening index.hta there is a control window on the left and an output window
on the right.
The control window shows a selection of XML and XSL files. The output
window shows IE's interpretation of the result of applying the current XSL file to the
current XML file.
The output window contents are only interesting if the XSL is either the ‘raw XML’
script (rawXML.xsl), which makes any XML file intelligible, or an XSL that generates
HTML output, such as XtabML2toc.xsl. Otherwise it shows just the text content of the
result of the XSLT; however, by right-clicking on the lower part of the right-hand
window you may view (and save) the XSLT output. You might want to do this, for
example, to see the XML created by XtabML2XMLSS.
The transformation will be re-executed each time you select a new XML or XSL file,
or click the Reformat button.

XSLT parameters
There is a capability to provide parameters to the XSLT transformations. After
specifying parameter names and values click Reformat to see the effect.
The following parameters are implemented (case is significant):
Name    Interpretation                                         Default                                     Applies to
bcw     Data column width (points/pixels)                      60                                          XtabML2toc, XtabML2XMLSS
lcw     Label column width                                     160                                         XtabML2toc, XtabML2XMLSS
style   Select either Pulsar Web or ‘classical’ table layout   PW (XtabML2toc), Classical (XtabML2XMLSS)   XtabML2toc only
bg      Colour of table border                                 Navy. Use white to suppress the border      XtabML2toc only
dark    Colour of group and summary                            "#ACD3FF" (as in PW)                        XtabML2toc only
light   Colour of element headings                             "#E4F0FF" (as in PW)                        XtabML2toc only
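
The way supplied parameters combine with the defaults in the table can be sketched in a few lines of Python. This is only an illustration, not the demonstration's own script: the function name resolve_params is hypothetical, and it assumes that a name matching no declared parameter (for example because of wrong case) is silently ignored, which is the usual behaviour of XSLT processors but has not been checked against MSXML here.

```python
# Defaults copied from the parameter table above. The "style" entries follow
# the Default column, which lists a value for each stylesheet.
DEFAULTS = {
    "XtabML2toc": {"bcw": "60", "lcw": "160", "style": "PW",
                   "bg": "Navy", "dark": "#ACD3FF", "light": "#E4F0FF"},
    "XtabML2XMLSS": {"bcw": "60", "lcw": "160", "style": "Classical"},
}


def resolve_params(stylesheet, supplied):
    """Return the effective parameters for one stylesheet.

    Names are matched exactly, so case is significant. Assumption: a
    supplied name that matches no declared parameter is ignored rather
    than reported as an error.
    """
    effective = dict(DEFAULTS[stylesheet])
    for name, value in supplied.items():
        if name in effective:
            effective[name] = value
    return effective


# "bcw" overrides the default; "BG" has the wrong case and is ignored.
params = resolve_params("XtabML2toc", {"bcw": "80", "BG": "white"})
```

In this sketch the border stays Navy because BG does not match bg; supplying bg with the value white would suppress it.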
The simplest way to get a presentable printed document is to save the output as .htm,
open the resulting file table.htm in IE, set page to landscape, and print.

MS Office
The output from the XSLT may be saved as a .htm (HTML), .doc (Word) or .xls
(Excel) file. Clicking the appropriate button saves your current output in table.htm
etcetera. Note that this affects ONLY the saved file type; the contents of the file are
whatever the XSLT generated. Both Word and Excel import HTML reasonably well
if that is what they find in the file they are given.


If you use the XtabML2toc transformation and save the result as .xls, Excel does load
the table or tables. However, the interpretation of CSS styles is not very satisfactory
and all tables appear in one worksheet.
The XtabML2XMLSS XSLT script generates an XMLSS document, which is Excel's
native XML format. This allows more flexibility in the generated document; for
example, each table is placed in its own worksheet.
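
To make the one-worksheet-per-table structure concrete, here is a minimal Python sketch that builds the skeleton of an XMLSS workbook. It is not the XtabML2XMLSS stylesheet and does not read XtabML; the function names and the input shape (a mapping from worksheet name to rows of cell text) are assumptions made for the illustration. The element names, the namespace and the mso-application processing instruction are those of the Excel XML Spreadsheet format.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
ET.register_namespace("ss", SS)


def q(name):
    """Qualify a name with the XMLSS namespace."""
    return "{%s}%s" % (SS, name)


def build_workbook(tables):
    """Build a Workbook element with one Worksheet per table.

    tables maps a worksheet name to a list of rows, each row a list of
    cell texts (a hypothetical input shape). Every cell is written as a
    string here; typing the cells is a separate question.
    """
    workbook = ET.Element(q("Workbook"))
    for name, rows in tables.items():
        sheet = ET.SubElement(workbook, q("Worksheet"), {q("Name"): name})
        table = ET.SubElement(sheet, q("Table"))
        for row in rows:
            row_el = ET.SubElement(table, q("Row"))
            for text in row:
                cell = ET.SubElement(row_el, q("Cell"))
                data = ET.SubElement(cell, q("Data"), {q("Type"): "String"})
                data.text = text
    return workbook


def serialise(workbook):
    """Return the document text, prefixed so that Windows associates it with Excel."""
    prolog = '<?xml version="1.0"?>\n<?mso-application progid="Excel.Sheet"?>\n'
    return prolog + ET.tostring(workbook, encoding="unicode")


doc = serialise(build_workbook({
    "Table 1": [["Base", "100"]],
    "Table 2": [["Base", "250"]],
}))
```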
Note that the Excel sheet is achieved with some ‘cheating’. Strictly speaking, the data
values can only be interpreted as text. If you import the cells as text, Excel
decorates each text cell containing a number with a green tick and, worse, you lose
the capability to do spreadsheet arithmetic on the content. So the XSLT
enters the data cells as numeric or percentage on the basis of inspecting the value, and
is therefore locale-dependent. For a truly robust conversion to Excel, the XtabML
format should include canonical values.
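
The kind of value inspection described above, and why it is locale-dependent, can be illustrated with a short Python sketch. It is not the logic of the XSLT script: the function name classify_cell is hypothetical, and the sketch assumes an English-style format with "." as the decimal separator and "," as the thousands separator.

```python
import re

# Assumption: English-style numbers, e.g. 1,234.5 or -12.75
_NUMBER = re.compile(r"^-?\d{1,3}(,\d{3})*(\.\d+)?$|^-?\d+(\.\d+)?$")


def classify_cell(text):
    """Guess an Excel cell type from formatted text.

    Returns (kind, value) where kind is "Number", "Percent" or "String".
    Percentages are returned as fractions, which is how Excel stores them.
    """
    s = text.strip()
    is_percent = s.endswith("%")
    if is_percent:
        s = s[:-1].strip()
    if not _NUMBER.match(s):
        return ("String", text)
    value = float(s.replace(",", ""))
    if is_percent:
        return ("Percent", value / 100)
    return ("Number", value)
```

Under these assumptions "1,234" is read as the number 1234, while a value formatted for a locale that uses a decimal comma, such as "1.234,5", falls through to text. A table that carried canonical values alongside the formatted ones would make this guesswork unnecessary.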


You may save the output of XtabML2toc as a .doc file and Word will import it. The
interpretation of CSS styles in the HTML is reasonably complete, and the appearance
is satisfactory, but it takes a long time to load. However, this is a feature of Word in
general: if you save the document as a regular Word .doc file the load times do not
change significantly and the document size becomes several times larger than the
The resulting document may be saved in regular Word .doc format. Note that print or
print preview on larger documents seems to put Word in a loop. The cause is not yet
fully understood.
Word, like Excel, has its own native XML format (WordML). Generating this from an
XSLT is a potential alternative route from XtabML to Word.
The XSLT scripts in the xsl folder are themselves just XML files and may be
edited with a text editor. There are two standard ‘include’ files used by the
   XtabML_library.xsl: generally useful templates for XML and XtabML
   XtabML_include.xsl: standard useful material for any stylesheet transforming
The easiest customisation to start with is to edit the CSS styles found in
Note that the XtabML2toc and XtabML2XMLSS XSLT scripts are both about 400
lines. It does not take a lot of code to get useful output by transforming XtabML.

The browser-based XML table viewing experience
It is clear from use of the demonstration that it is perfectly feasible to download and
transform an XML table. Single tables transform almost instantaneously, and even
larger documents with many tables transform in a few seconds. Download time
should not be an issue, as the XtabML source adds very little overhead to a plain text
file containing the tables.
This is very good news as far as generating alternative table formats online in
Pulsar Web is concerned.
However, there are problems if conversion to non-HTML formats is attempted in the
browser rather than the server, and the proof-of-concept reveals them.

Incomplete CSS support
       IE does not implement all CSS features, in particular print media features, so
       it is not possible to force landscape mode from within the document.
       One might ask why we generate HTML/CSS rather than Office formats. The
       attraction of using HTML and CSS, especially for printed tables, is that if you
       don’t use them you have to re-invent them. Users will want to choose fonts,
       text style, alignment and rotation, backgrounds, borders, colours and so on. CSS
       provides an established, comprehensive, well-documented and widely-known
       mechanism to do this. It also provides a mechanism to separate presentation
       from content with style sheets. This is important for minimising document size
       and facilitating customisation.

Lack of control of output interpretation
       Although XSLT scripts can generate output documents that invoke Excel or
       Word when opened by Windows Explorer after saving to file, it does not seem
       to be possible to generate them within a browser window and have the same
       effect. Hence we see a stream of text in the output window rather than an
       invocation of Excel or Word.
Both of these problems can be addressed by scripting. A script can save the output in
a temporary file, start an Office application, load the document into it and use the
Office object model to fill in the parts that aren’t reached by the various Microsoft
implementations of CSS.
Unfortunately, use of sophisticated scripting features runs into browser security and
virus checking limitations. As a result, the operational issues of supporting
browser-based export to Office without saving to an intermediate file in a user dialog
are similar to those of requiring download of a standalone viewer program.

Alternatives to the ‘pure’ browser
Server-based transformations
If the conversion to (say) XMLSS for Excel is done on the server, then HTTP comes
to the rescue by providing content-type and content-disposition headers. These cause
IE to do automatically what we seem otherwise to have to do with script: save the
document in a temporary file and open the appropriate application on the file in a
browser window. The drawback, of course, is the load on the server running the XSLT.
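
As a rough illustration of the headers involved, the following Python sketch is a minimal WSGI application that returns an XMLSS document. It is not part of the demonstration or of Pulsar Web: the function names, the file name table.xls and the fixed placeholder document are all assumptions, and a real server would run the XSLT where the placeholder is.

```python
def transform_to_xmlss():
    """Placeholder for the server-side XSLT step (hypothetical).

    Returns a fixed, minimal XMLSS document so that the sketch runs
    on its own.
    """
    return (
        '<?xml version="1.0"?>\n'
        '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"/>'
    )


def xmlss_app(environ, start_response):
    """Serve the transformed table with headers that hand it to Excel."""
    body = transform_to_xmlss().encode("utf-8")
    headers = [
        # Tells the browser which application the content belongs to.
        ("Content-Type", "application/vnd.ms-excel"),
        # Suggests a file name; "inline" asks for display rather than a
        # save dialog. The name table.xls is an assumption.
        ("Content-Disposition", 'inline; filename="table.xls"'),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

For a quick local trial the application could be served with wsgiref.simple_server.make_server from the Python standard library.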

Office add-ins
It is perfectly feasible to prepare Excel and Word macros to interface with Pulsar Web
to download an XML table document and then transform it, much the same as the
Quantum conversion in Leap. So we might create an add-in for Excel to make Excel
into a table viewer, and another for Word to make Word into a table publisher.

.NET viewer application
Many of the security issues relating to scripting are addressed quite elegantly in .NET.
It is very likely that we could implement a simple table viewer application for users of
the .NET framework. This has not been investigated.

InfoPath looks to be a very suitable platform to develop an XML table viewer that can
be offered to users of Office 2003 Professional. This has not been investigated.

Conclusions and next steps
We believe the proof-of-concept demonstrates that an XML format for tables can be
an efficient and flexible delivery mechanism for cross-tabulations.
Assuming that products such as Pulsar Web and STAR are modified to generate XML
tables, we think the simplest way forward to create a production table publication
system is by using MS Office automation to extend Excel and Word, and to create a
generic table pagination tool.

MS Office integration
The drawbacks of the browser-based solution may be overcome by using automation
in Office. Comparatively little script is required because the heavy lifting is done by
XSLT called from the script.
An Excel add-in may populate a spreadsheet with the contents of an XML table and
add features such as print settings and scrolling regions. It could also provide some
user customisation, e.g. selection or reordering of statistics.
A Word template could also import an XML table, convert to HTML (or WordML)
and save the resulting document with appropriate print settings and document
As has been noted, Word seems to fail when importing large tables. In any case, none
of the applications (Internet Explorer, Word or Excel) that might be used to prepare a
print document can paginate a large table satisfactorily. It is not enough simply to
reprint a few rows from the top of the table. It is important to preserve row (and
column) grouping and, in some markets, to present total rows at the bottom rather than
the top of the table.
There are more sophisticated models of page content such as XSL Formatting
Objects. These still do not support all the pagination requirements and have some
serious drawbacks:
      there is no support from major software vendors
      open-source implementations (e.g. Apache FOP) are incomplete and very slow
So even if we take the Formatting Objects route it is still necessary to paginate the
document. An attractive approach is a ‘filter’ that would take an XML table and add
attributes to its edge descriptions at suitable page breakpoints.
This way we can leverage all the expressive power of HTML/CSS.
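
The ‘filter’ idea can be sketched in Python as follows. Everything about the XML layout here is hypothetical, since this document does not define the XtabML element names: the sketch assumes an edge element holding group children, each holding element children, and invents a pagebreak attribute. It keeps each group together on a page but does not handle a group longer than a page, column edges, or total rows.

```python
import xml.etree.ElementTree as ET


def add_page_breaks(edge, rows_per_page):
    """Mark page breakpoints on a row edge without splitting groups.

    Hypothetical layout: <edge> holds <group> children, each holding
    <element> children. A group occupies one heading row plus one row
    per element. A group that does not fit in the space left on the
    current page gets pagebreak="before" (a made-up attribute name).
    """
    used = 0
    for group in edge.findall("group"):
        needed = 1 + len(group.findall("element"))
        if used and used + needed > rows_per_page:
            group.set("pagebreak", "before")
            used = 0
        used += needed
    return edge


edge = ET.fromstring(
    '<edge>'
    '<group name="A"><element/><element/></group>'
    '<group name="B"><element/><element/></group>'
    '<group name="C"><element/></group>'
    '</edge>'
)
add_page_breaks(edge, rows_per_page=5)
```

A later stylesheet could then translate the marked groups into CSS page-break rules, so the pagination decision stays separate from the presentation.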

XSLT library
The demonstration files show how a library of common templates might be
implemented. For production use these should be reorganised and documented so that
they are a good platform for authors of new scripts.
