Proof-of-concept for XML table format

These files constitute a browser-based demonstration of the application of XML to cross-tab formatting. The demonstration requires no installed software other than Internet Explorer 6 with MSXML level 4, a normal configuration on most business computers. If MS Office (2002/XP or newer) is installed, tables can also be prepared for use with Word or Excel.

The demonstration was developed by enhancing a sample XML/HTML application from Microsoft. It has been tested with Windows 2000 and XP, and with both Office 2002 and 2003, though not in all combinations.

This document describes how to use the demonstration and reports some lessons learned about the feasibility of browser-based processing.

Installation notes:

In Windows XP under service pack 2, you may run into difficulties due to the very restrictive default Internet Explorer security settings for local files. If so, you may find the notes in the accompanying file doc\IFRAME ERROR IE6_SP2.doc helpful.

The XSLT transformations rely on MSXML4. This is included by default, we believe, in the .NET framework 1.1 that PW clients are encouraged to install. MSXML may also be installed independently from a Microsoft download:
http://www.microsoft.com/downloads/details.aspx?FamilyID=3144b72b-b4f2-46da-b4b6-c5d7485f2b42&DisplayLang=en

Contents

The p.o.c. contains the following in its top level directory:

readme.doc: this file
xml: subfolder containing sample XtabML files:
o Holiday2.xml: XtabML rendition of a Pulsar Web demo table (formatted values)
o Holiday2-can.xml: XtabML rendition of the same table (canonical values)
xsl: subfolder containing xsl scripts implementing the demonstration:
o XtabML2toc.xsl: converts XtabML to HTML for viewing and printing. Creates a table of contents if there is more than one table in the XtabML document (all_XtabML.doc).
o XtabML2XMLSS.xsl: converts XtabML to XMLSS (Excel XML format)
o rawXML.xsl: shows the contents of an XML file
index.hta: the page to open to start the demonstration

Usage

After opening index.hta there is a control window on the left and an output window on the right. The control window shows a selection of XML and XSL files. The output window shows IE’s interpretation of the result of applying the current XSL file to the current XML file.

The output window contents are only interesting if the XSL is either the ‘raw XML’ script, which makes any XML file intelligible, or an XSL that generates HTML output, such as XtabML2toc.xsl. Otherwise it shows just the text content of the result of the XSLT; however, by right-clicking on the lower part of the right-hand window you may view (and save) the XSLT output. You might want to do this, for example, to see the XML created by XtabML2XMLSS.

The transformation is re-executed each time you select a new XML or XSL file, or click the Reformat button.

XSLT parameters

Parameters can be passed to the XSLT transformations. After specifying parameter names and values, click Reformat to see the effect. The following parameters are implemented (case is significant):

Name   Interpretation                                        Default                                    Applies to
bcw    Data column width (points/pixels)                     60                                         XtabML2toc, XtabML2XMLSS
lcw    Label column width                                    160                                        XtabML2toc, XtabML2XMLSS
style  Select either Pulsar Web or ‘classical’ table layout  PW (XtabML2toc); Classical (XtabML2XMLSS)  XtabML2toc only
bg     Colour of table border                                Navy. Use white to suppress the border     XtabML2toc only
dark   Colour of group and summary headings                  "#ACD3FF" (as in PW)                       XtabML2toc only
light  Colour of element headings                            "#E4F0FF" (as in PW)                       XtabML2toc only

Print

The simplest way to get a presentable printed document is to save the output as .htm, open the resulting file table.htm in IE, set the page to landscape, and print.
MS Office

The output from the XSLT may be saved as a .htm (HTML), .doc (Word) or .xls (Excel) file. Clicking the appropriate button saves your current output in table.htm etcetera. Note that this affects ONLY the saved file type; the contents of the file are whatever the XSLT generated. Both Word and Excel import HTML reasonably well if that is what they find in the file they are given.

Excel

If you use the XtabML2toc transformation and save the result as .xls, Excel does load the table or tables. However, the interpretation of CSS styles is not very satisfactory and all tables appear in one worksheet.

The XtabML2XMLSS XSLT script generates an XMLSS document, which is Excel’s native XML format. This allows more flexibility in the generated document; for example, each table is placed in its own worksheet.

Note that the Excel sheet is achieved with some ‘cheating’. Strictly speaking, the data values can only be interpreted as text. If you import the cells as text then Excel decorates each text cell containing a number with a green tick and, worse, you lose the capability to do spreadsheet arithmetic operations on the content. So the XSLT enters the data cells as numeric or percentage on the basis of inspecting the value, and is therefore locale-dependent. For a truly robust conversion to Excel, the XtabML format should include canonical values.

Word

You may save the output of XtabML2toc as a .doc file and Word will import it. The interpretation of CSS styles in the HTML is reasonably complete and the appearance is satisfactory, but the file takes a long time to load. The resulting document may be saved in regular Word .doc format, but slow loading is a feature of Word in general: after saving as a regular .doc file the load times do not change significantly, and the document becomes several times larger than the HTML.

Note that print or print preview on larger documents seems to put Word in a loop. The cause is not yet fully understood.
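
Returning to the value inspection described in the Excel section above, the following Python sketch illustrates why guessing a cell type from formatted text is locale-dependent. It is not taken from XtabML2XMLSS.xsl: the function name classify_cell, the decimal_sep and group_sep parameters and the returned type labels are all assumptions made for illustration.

```python
import re


def classify_cell(text, decimal_sep=".", group_sep=","):
    """Guess a cell type from formatted text (hypothetical helper).

    Returns (kind, value) where kind is "Percent", "Number" or "String".
    """
    raw = text.strip()
    is_percent = raw.endswith("%")
    if is_percent:
        raw = raw[:-1].strip()
    # Normalise the assumed locale's separators to Python's float syntax.
    normalised = raw.replace(group_sep, "").replace(decimal_sep, ".")
    if re.fullmatch(r"[+-]?\d+(\.\d+)?", normalised):
        value = float(normalised)
        return ("Percent", value / 100) if is_percent else ("Number", value)
    # Anything that does not look numeric is passed through as text.
    return ("String", text)


print(classify_cell("1,234"))                                  # ('Number', 1234.0)
print(classify_cell("1,234", decimal_sep=",", group_sep="."))  # ('Number', 1.234)
print(classify_cell("45%"))                                    # ('Percent', 0.45)
print(classify_cell("n/a"))                                    # ('String', 'n/a')
```

The first two calls show the problem: the same text yields a value a thousand times larger or smaller depending on which separators are assumed. Canonical values in the XtabML would remove the need to guess.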
Word, like Excel, has its own native XML format (WordML). Generating this from an XSLT is a potential alternative route from XtabML to Word.

Customisation

The XSLT scripts in the xsl folder are just XML files themselves and may be edited with a text editor. There are two standard ‘include’ files used by the transformations:

XtabML_library.xsl: generally useful templates for XML and XtabML
XtabML_include.xsl: standard useful material for any stylesheet transforming XtabML

The easiest customisation to start with is to edit the CSS styles found in XtabML_library.xsl.

Note that the XtabML2toc and XtabML2XMLSS XSLT scripts are both about 400 lines. It doesn’t need a lot of code to get useful output by transforming XtabML.

The browser-based XML table viewing experience

It is clear from use of the demonstration that it is perfectly feasible to download and transform an XML table. Single tables transform almost instantaneously and even larger documents with many tables transform in a very few seconds. Download time should not be an issue, as the XtabML source adds very little overhead to a plain text file containing the tables. So this is very good news so far as generating alternative table formats online in Pulsar Web is concerned.

However, there are problems if conversion to non-HTML formats is attempted in the browser rather than the server, and the proof-of-concept reveals them.

Incomplete CSS support

IE does not implement all CSS features, in particular print media features, so it is not possible to force landscape mode in the text of the document.

It may be asked why we generate HTML/CSS rather than Office formats. The attraction of using HTML and CSS, especially for printed tables, is that if you don’t use them you have to re-invent them. Users will want to choose fonts, text style/alignment/rotation, backgrounds, borders, colours etcetera. CSS provides an established, comprehensive, well-documented and widely-known mechanism to do this.
It also provides a mechanism to separate presentation from content with style sheets. This is important for minimising document size and facilitating customisation.

Lack of control of output interpretation

Although XSLT scripts can generate output documents that invoke Excel or Word when opened from Windows Explorer after saving to file, it does not seem to be possible to generate them within a browser window and have the same effect. Hence we see a stream of text in the output window rather than an invocation of Excel or Word.

Both of these problems can be addressed by scripting. A script can save the output in a temporary file, start an Office application, load the document into it and use the Office object model to fill in the parts that aren’t reached by the various Microsoft implementations of CSS.

Unfortunately, use of sophisticated scripting features runs into browser security and virus checking limitations. The operational issues of supporting browser-based export to Office without saving to an intermediate file in a user dialog are therefore similar to those of requiring download of a standalone viewer program.

Alternatives to ‘pure’ browser

Server-based transformations

If the conversion to (say) XMLSS for Excel is done in the server, then HTTP comes to the rescue by providing content-type and content-disposition headers. These cause IE to do automatically what we seem otherwise to have to do with script: save the document in a temporary file and open the appropriate application on the file in a browser window. The drawback, of course, is the load on the server running the XSLT transformation.

Office add-ins

It is perfectly feasible to prepare Excel and Word macros to interface with Pulsar Web to download an XML table document and then transform it, much the same as the Quantum conversion in Leap. So we might create an add-in for Excel to make Excel into a table viewer, and another for Word to make Word into a table publisher.
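
The server-based route described above relies on just two HTTP response headers. The following Python sketch shows them being set using only the standard library. It is an illustration, not part of the demonstration: the names office_headers, TableHandler and serve, the placeholder body, the file name table.xls and the port number are all assumptions, and a real server would send the output of its XSLT transformation instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def office_headers(filename, content_type="application/vnd.ms-excel"):
    """Build the two headers that tell the browser which application to use."""
    return [
        ("Content-Type", content_type),
        ("Content-Disposition", f'inline; filename="{filename}"'),
    ]


class TableHandler(BaseHTTPRequestHandler):
    # Placeholder standing in for the server-side XSLT output (hypothetical).
    body = b"<!-- XMLSS produced by the server-side transformation -->"

    def do_GET(self):
        self.send_response(200)
        for name, value in office_headers("table.xls"):
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(self.body)))
        self.end_headers()
        self.wfile.write(self.body)


def serve(port=8000):
    """Run the sketch server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), TableHandler).serve_forever()
```

Calling serve() and browsing to http://127.0.0.1:8000/ would return the placeholder with both headers set. Using ‘attachment’ in place of ‘inline’ in the content-disposition header asks the browser to offer a save dialog instead.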
.NET viewer application

Many of the security issues relating to scripting are addressed quite elegantly in .NET. It is very likely that we could implement a simple table viewer application for users of the .NET framework. This has not been investigated.

InfoPath

InfoPath looks to be a very suitable platform to develop an XML table viewer that can be offered to users of Office 2003 Professional. This has not been investigated.

Conclusions and next steps

We believe the proof-of-concept demonstrates that an XML format for tables can be an efficient and flexible delivery mechanism for cross-tabulations.

Assuming that products such as Pulsar Web and STAR are modified to generate XML tables, we think the simplest way forward to create a production table publication system is by using MS Office automation to extend Excel and Word, and to create a generic table pagination tool.

MS Office integration

The drawbacks of the browser-based solution may be overcome by using automation in Office. Comparatively little script is required because the heavy lifting is done by XSLT called from the script.

An Excel add-in may populate a spreadsheet with the contents of an XML table and add features such as print settings and scrolling regions. It could also provide some user customisation, e.g. selection or reordering of statistics.

A Word template could also import an XML table, convert to HTML (or WordML) and save the resulting document with appropriate print settings and document characteristics.

Pagination

As has been noted, Word struggles with large tables: they are slow to load, and print or print preview seems to put Word in a loop. In any case, none of the applications (Explorer, Word or Excel) that might be used to prepare a print document can paginate a large table satisfactorily. It is not enough simply to reprint a few rows from the top of the table. It is important to preserve row (and column) grouping and, in some markets, to present total rows at the bottom rather than the top of the table.
There are more sophisticated models of page content such as XSL Formatting Objects. These still do not support all the pagination requirements and have some serious drawbacks:

o there is no support from major software vendors
o open-source implementations (e.g. Apache FOP) are incomplete and very slow

So even if we take the Formatting Objects route it is still necessary to paginate the document. An attractive approach is a ‘filter’ that would take an XML table and add attributes to its edge descriptions at suitable page breakpoints. This way we can leverage all the expressive power of HTML/CSS.

XSLT library

The demonstration files show how a library of common templates might be implemented. For production use these should be reorganised and documented so that they are a good platform for authors of new scripts.
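
The pagination ‘filter’ proposed in the Pagination section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a design: the element names edge, group and element, the attribute break-before and the function mark_page_breaks are hypothetical and do not come from the XtabML schema. The sketch marks a break before any group that would overflow the current page, so that row groups are never split.

```python
import xml.etree.ElementTree as ET


def mark_page_breaks(edge, rows_per_page):
    """Add break-before="page" to groups so that no group is split.

    `edge` is an element whose <group> children each hold <element> rows.
    All element and attribute names are hypothetical, not XtabML's own.
    """
    used = 0
    for group in edge.findall("group"):
        size = len(group.findall("element"))
        # Start a new page if this group would overflow the current one.
        if used > 0 and used + size > rows_per_page:
            group.set("break-before", "page")
            used = 0
        used += size
    return edge


SAMPLE = """
<edge axis="side">
  <group name="Age"><element/><element/><element/></group>
  <group name="Sex"><element/><element/></group>
  <group name="Region"><element/><element/><element/><element/></group>
</edge>
"""

edge = mark_page_breaks(ET.fromstring(SAMPLE.strip()), rows_per_page=5)
for group in edge.findall("group"):
    print(group.get("name"), group.get("break-before"))
# Age None
# Sex None
# Region page
```

A group larger than a whole page is left unbroken here; a production filter would also need a rule for splitting such groups and for repeating headings or placing total rows.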