               Converting LaTeX, Mathematica and Maple
                  Documents into XHTML with MathML

                                Ossi Mauno
                           Institute of Mathematics
                       Helsinki University of Technology
                                   Finland
                         e-mail: Ossi.Mauno@hut.fi




1    Background
At the Institute of Mathematics of the Helsinki University of Technology, a
research project called MatTa – Computer aided mathematics has been running
for over ten years. This year the project is being extended into a national
project called MatTaFi as part of the Finnish Virtual University.
In the past few years it has become obvious that MathML is a very promising
way of representing mathematics on the web. The technology and the
conversion tools were tested in the MatTa project by translating study
material into XHTML+MathML form. The original material consisted of LaTeX
documents, Mathematica notebooks and Maple worksheets. The work was done
using TeX4ht and the built-in MathML export of Mathematica and Maple.
A new user interface was created for browsing the translated material. The
aim was to make the user interface as simple as possible and easy to modify
to meet the demands of the different kinds of courses in which it could be
used. A PDF version of the study material was also made using the same user
interface.


2    Original material
The original material that was translated into XHTML+MathML form was
produced within the MatTa project. It was originally designed to be
presented on the web using HTML, and therefore the material is divided into
short articles. The material forms a study package on ordinary differential
equations called DelTa. A Mathematica version and a Maple version of the
package are on the web site of the MatTaFi project. The material was
designed for a basic course at the Helsinki University of Technology, but it
can equally well be used in other universities, polytechnics and even in
mathematically oriented upper secondary schools.
The DelTa package consists of 73 LaTeX documents, 42 Mathematica notebooks
and Maple worksheets. The worksheets were made by rewriting the notebooks
in Maple's syntax.
The LaTeX documents do not have a preamble, so they can easily be converted
to different forms such as HTML and PDF by adding a suitable preamble. The
documents do have metadata fields, some of which are obligatory. The
metadata includes fields such as the title, the date, the nature of the
document (theory, example, calculation or application) and the name of the
author. The metadata section must also contain the document's
classification under the Mathematics Content Dictionary, which was also
developed in the MatTa project. Some LaTeX documents also contain eps
figures. The Mathematica and Maple documents have their metadata in
separate tex files.

In the DelTa package there is also a Mathematica guide, a Maple guide and
an exercise collection.

In addition the package includes two Java applets for illustrating the solu-
tions of ordinary differential equations (DEW1 for first and DEWn for higher
order).




3    User interface

The aim was to create a simple tool for browsing different kinds of study
material on the net. The user interface has to be simple enough that even
people who are not very familiar with HTML, XHTML or JavaScript can
understand and modify the code in order to adapt the user interface to the
demands of the individual course they are giving. For this reason the
source code is heavily commented. The user interface of the HTML version of
the DelTa study package was the starting point for the development.

The user interface is a set of XHTML pages that include some JavaScript.
The appearance is set by Cascading Style Sheets. At the top of the
Navigation Window (see figure 1) there is a link to the help page. In the
middle there is an array of links. Clicking a link in the upper row of the
array (theory, solving, examples and applications) makes the corresponding
table of contents appear below the array and a heading appear in the blank
space at the top of the page. The links in the lower row (exercises, Java
applets, and Mathematica and Maple guides) open a separate window.


                     Figure 1: The starting page




Figure 2: The Navigation Window with table of contents and an article
opened in Content Window




3.1    XHTML+MathML version
The first version of the user interface was made for browsing material that
is in XHTML form. For the present, the working version can be found at
http://matta.hut.fi/work/omauno/Navigaattori.html. There are subheadings in
the table of contents. Each document can be opened in the Content Window by
clicking the document's name, or in a new window by clicking the adjacent
[i]. When the document's name is clicked, the document is opened in the
Content Window, replacing the previous document in that window, and the
window is raised to the top. See figure 2.
Mathematica and Maple documents in XHTML form can also be opened in the
Content Window or in a new window. In addition, users can open the original
Mathematica notebook or Maple worksheet by selecting nb or mws. If a
converted document contains links, they are at the end of the file.




Figure 3: Links, Navigation Window and a PDF document viewed with
Acrobat Reader



3.2    PDF version
The PDF version of the user interface was made for browsing documents that
are in PDF form. It was made by modifying the XHTML+MathML version, and for
the time being it can be found at
http://matta.hut.fi/work/omauno/pdfNavigaattori.html.


PDF documents are usually viewed with Acrobat Reader. If the application
were used as a browser plug-in, the user interface could have been almost
identical to the XHTML version. If Acrobat Reader is not a browser plug-in,
however, relative links do not work. Because of this, links were not
included in the PDF documents. Instead they were placed in separate link
files, and links to the link files were added to the table of contents. See
figure 3.
It is possible to include animations in PDF documents, but the animations
need an external application in order to work. That would limit the
functionality, which is why the animations were not included in the PDFs
but added as separate gif animations. Links to the animations were added to
the link file of the corresponding PDF document.


4     Making of XHTML+MathML version

4.1    Conversion of LaTeX documents

The LaTeX documents were converted to XHTML+MathML form with Eitan M.
Gurari's program TeX4ht. The work was done in a Windows environment using a
somewhat old version of the program with the command mzlatex.
The LaTeX documents contain only structured data and no preamble. That
makes it easier to use the documents for different purposes. For the
XHTML+MathML conversion the following lines had to be added to the
beginning of the file

\documentclass[10pt]{article}
\usepackage[finnish]{babel}
\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc}

\usepackage{xhtmlkonv,matht}   % Definition files, tex4ht

\begin{document}

and \end{document} to the end of the file.
The file xhtmlkonv.sty contains the definitions needed for the XHTML+MathML
conversion and loads, for example, the hyperref package. The file matht.sty
consists of commands that make it easier to write mathematics. There are
corresponding files, similar to xhtmlkonv.sty, for proofreading and for PDF
conversion.
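Adding the preamble lends itself to automation. The following is a minimal
Python sketch, not the procedure used in the project: it wraps a
preamble-less body with the preamble lines listed above and a closing
\end{document}. The function name wrap_body is hypothetical, and the
comment of the original listing is left out.

```python
# Preamble lines as listed in the text; the comment line is omitted.
PREAMBLE = "\n".join([
    r"\documentclass[10pt]{article}",
    r"\usepackage[finnish]{babel}",
    r"\usepackage[T1]{fontenc}",
    r"\usepackage[latin1]{inputenc}",
    r"\usepackage{xhtmlkonv,matht}",
    r"\begin{document}",
])


def wrap_body(body: str) -> str:
    """Return a complete LaTeX document built around a preamble-less body."""
    return PREAMBLE + "\n" + body.strip("\n") + "\n" + r"\end{document}" + "\n"
```

Swapping xhtmlkonv for one of the corresponding style files would give the
proofreading or PDF variant of the same document.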
Running mzlatex produces files that begin:

<?xml version="1.0" encoding="iso-8859-1" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN"
"http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd" [
<!ENTITY mathml "http://www.w3.org/1998/Math/MathML">
]>
<?xml-stylesheet type="text/css" href="ryhfas.css"?>
<html
xmlns="http://www.w3.org/1999/xhtml"
>


This cannot be viewed with Internet Explorer. Therefore the beginning of
the document must be revised so that it works properly with the most common
browsers. This is done by using David Carlisle's Universal MathML
stylesheet. An extra css file, ratk.css, has been added to modify the
appearance of the background; it has no effect on accessibility.
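The revision of the header can be scripted. The sketch below is an
assumption about how this could be done in Python, not the tool used in the
project: it removes the DOCTYPE declaration and places the processing
instructions for the XSL stylesheet and for ratk.css right after the XML
declaration. The function name revise_header is hypothetical, and the input
is assumed to start with an XML declaration.

```python
import re

XSL_PI = '<?xml-stylesheet type="text/xsl" href="../xsl/mathml.xsl"?>'
EXTRA_CSS_PI = '<?xml-stylesheet type="text/css" href="../css/ratk.css"?>'

# A DOCTYPE declaration, including an optional internal subset in [...].
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^\[>]*(\[.*?\])?\s*>\s*", re.DOTALL)


def revise_header(xhtml: str) -> str:
    """Drop the DOCTYPE and add the XSL and extra CSS processing instructions."""
    without_doctype = DOCTYPE_RE.sub("", xhtml, count=1)
    # Assumption: the first "?>" closes the XML declaration.
    decl_end = without_doctype.index("?>") + 2
    head, tail = without_doctype[:decl_end], without_doctype[decl_end:]
    return head + "\n" + XSL_PI + "\n" + EXTRA_CSS_PI + "\n" + tail.lstrip()
```

The stylesheet reference that TeX4ht wrote for the document itself is kept.
The result has the same form as the revised header listed next.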


<?xml version="1.0" encoding="iso-8859-1" ?>
<?xml-stylesheet type="text/xsl" href="../xsl/mathml.xsl"?>
<?xml-stylesheet type="text/css" href="../css/ratk.css"?>
<?xml-stylesheet type="text/css" href="algeul.css"?>
<html
xmlns="http://www.w3.org/1999/xhtml"
>


4.1.1   Some notes on the converted material

In general, converting LaTeX files with TeX4ht produces XHTML+MathML that
is quite clean and easy to read. However, some problems exist.
Images are given a height and a width, in pixels, that are smaller than the
actual size of the images, and thus the image quality on the screen is
reduced. This is easy to fix by removing those size parameters. The
horizontal space (\quad) between two images is omitted in the conversion.
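Removing the size parameters is easy to script. The following is a minimal
Python sketch, assuming that the sizes are given as width and height
attributes of the img elements; the function name strip_image_sizes is
hypothetical.

```python
import re

IMG_TAG_RE = re.compile(r"<img\b[^>]*>")
# A width or height attribute with a single- or double-quoted value.
SIZE_ATTR_RE = re.compile(r"""\s+(?:width|height)\s*=\s*(?:"[^"]*"|'[^']*')""")


def strip_image_sizes(xhtml: str) -> str:
    """Remove width and height attributes from every img tag."""
    return IMG_TAG_RE.sub(lambda m: SIZE_ATTR_RE.sub("", m.group(0)), xhtml)
```

Only img tags are touched, so size attributes on other elements, such as
tables, stay as they are.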
Lines connected to each other with a curly brace (left or right) are not
coded within a table. The result for the equation group below is shown in
figure 4.

                                  y = f (t, y, z),
                                  z = g(t, y, z).

If an open interval is marked as ]a, b[ in the source file, the result is:


<math xmlns="http://www.w3.org/ ... " display="inline" ></mrow>
<mo class="MathClass-close">]</mo></mrow>
...
<mrow><mo class="MathClass-open">[</mo><mrow></math>


Figure 4: Equation groups are coded so that all the equations lie one after
another in a single line


This is erroneous MathML because the opening and closing tags of the mrow
elements do not match.
In the original material the absolute value is encoded either with | or
with a macro \abs that uses \vert. The former transforms into &#x2223;
(DIVIDES) and the latter is coded with an mfenced element. An alternative
would be to encode both cases with <mo>|</mo>.
Finnish has no prepositions, so words are inflected by adding an ending to
the word, as in "j:nnen", meaning jth. That is coded as
<math><mi>j</mi></math>:nnen, and in some cases a line break may occur
between the word and its ending, which is not allowed. One solution to this
problem is to put the word and the ending inside an extra mrow element:
<math><mrow><mi>j</mi><mtext>:nnen</mtext></mrow></math>.
Derivatives are marked with a prime (&#x2032;) in superscript. This makes
the prime look a bit loose when viewed with Mozilla, because it is placed so
high. On the other hand, if the prime were on the baseline, it would look
something like a slash in Netscape. See figure 5.
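The extra mrow for inflected symbols can be added mechanically. The Python
sketch below rests on the assumption that the ending follows the closing
math tag directly and starts with a colon; the function name attach_suffix
is hypothetical.

```python
import re

# A math element immediately followed by a colon and a word ending.
# The tempered dot keeps the match from running past the first </math>.
SUFFIX_RE = re.compile(
    r"(<math\b[^>]*>)((?:(?!</math>).)*)</math>(:\w+)", re.DOTALL
)


def attach_suffix(xhtml: str) -> str:
    """Move an ending such as ':nnen' into the math element as mtext."""
    return SUFFIX_RE.sub(r"\1<mrow>\2<mtext>\3</mtext></mrow></math>", xhtml)
```

Math elements that are not followed by an ending are left unchanged, so the
function can be run over a whole converted file.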




        Figure 5: The appearance of prime depends on the browser


White space handling depends on the browser used. In the following case
Mozilla and Netscape show space around R but IE does not.

ja</span>
<!--l. 43--><math
 xmlns="http://www.w3.org/1998/Math/MathML" display="inline" ><mi
>R</mi></math> <span
class="ecti-1000">ovat

In some cases there is no space between the closing tag of math and the
following word. For example, $y$ ja translates into </math>ja. In most
cases, however, this problem does not occur.


The piece of LaTeX code

ja $x = \text{mikä tahansa vakio}$, $y = 2$.


transforms into


<mtext >mik&#x00E4;&#x00A0;tahansa&#x00A0;vakio</mtext>
<!--/mstyle--></math>,


Something strange happens if this is viewed with Mozilla or Netscape. The
comma after the math is shown in the middle of the preceding word. See
figure 6. The problem can be solved by using a normal space instead of
&#x00A0; (NO-BREAK SPACE, &nbsp;).
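The replacement can be limited to text inside mtext elements. The following
Python sketch is one possible way to do it and is not taken from the
project; the function name use_plain_spaces is hypothetical.

```python
import re

MTEXT_RE = re.compile(r"(<mtext\b[^>]*>)(.*?)(</mtext>)", re.DOTALL)
# NO-BREAK SPACE as a hexadecimal or decimal reference, entity or raw character.
NBSP_RE = re.compile(r"&#x0*A0;|&#160;|&nbsp;|\u00a0", re.IGNORECASE)


def use_plain_spaces(xhtml: str) -> str:
    """Replace no-break spaces by normal spaces inside mtext elements."""
    def fix(match):
        return match.group(1) + NBSP_RE.sub(" ", match.group(2)) + match.group(3)
    return MTEXT_RE.sub(fix, xhtml)
```

No-break spaces outside mtext are left alone, since the problem described
above concerns text inside math.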




      Figure 6: Comma is shown in the middle of the preceding word




4.2    Conversion of Mathematica notebooks

Mathematica notebooks are converted to XHTML+MathML by selecting Save As
Special... and then XML (HTML+MathML) from the File menu. This was done
using version 5.0. With the default settings Mathematica produces
presentation markup. Mathematica's own code is included in the MathML code
with an annotation element, but only in (error) messages.
Animations are plotted as separate images. A series of images can be
transformed into an animated gif file. This is done by running the command
Export["anim", %, "GIF"] or Export["anim.gif", %] right after the series of
images is created. These commands make a file named anim.gif. An animation
made in this way is shown only once. The animation is made continuous by
using the ImageMagick package and its convert routine:
convert -loop 0 old.gif new.gif. This makes the animation start from the
beginning right after it has reached its end.
Extra images are removed from the XHTML document and animations are added
by hand.


4.2.1   Some notes on the converted material

The converted XHTML documents closely resemble the original notebooks, and
it is therefore easy to recognize that they were originally written as
notebooks. They are not identical, however. Perhaps the most evident
difference is that In[*]:= and Out[*]= are on their own lines, which makes
the document look more sparse.
The following is a list of problems we have found.
If a notebook cell has a background color, the text in XHTML form does not
necessarily have the same background color. The html "cell" has a colored
background, but part of the text within it may have a white background.
Outputs are on a single line that can be very long. This happens because
the outputs are coded inside mrow elements and browsers do not allow line
breaks inside mrow. One solution for dividing long outputs into several
lines is simply to remove some mrow elements.
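Which mrow elements to remove is a judgment call. As an illustration only,
the following Python sketch unwraps every mrow that is a direct child of
the math element, so that its children become candidates for line breaks.
The function name unwrap_top_level_mrows is hypothetical, and whitespace
text directly inside the removed mrow is dropped.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
# Serialize MathML elements with a default namespace instead of a prefix.
ET.register_namespace("", MATHML_NS)


def unwrap_top_level_mrows(math_xml: str) -> str:
    """Replace each mrow that is a direct child of math by its own children."""
    math = ET.fromstring(math_xml)
    mrow_tag = "{%s}mrow" % MATHML_NS
    new_children = []
    for child in list(math):
        if child.tag == mrow_tag:
            new_children.extend(list(child))
        else:
            new_children.append(child)
    for child in list(math):
        math.remove(child)
    math.extend(new_children)
    return ET.tostring(math, encoding="unicode")
```

Nested mrow elements are kept, because they may carry grouping that the
rendering depends on.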
The precision of numbers may be greater in MathML than in the notebooks.
The number 2.0 is represented as 2. in input and inline equations.
Mathematica's == is encoded as either <mo>==</mo> or <mo>&#10869;</mo> (TWO
CONSECUTIVE EQUALS SIGNS).
Output is coded with MathML even when it is not math, for example the
output that accompanies plots.
Part of the message
Remove::rmnsm : There are no symbols matching "Global‘*". More...
transforms into

   <mtext>There are no symbols matching \&quot;</mtext>
   <ms>Global‘*</ms>
   <mtext>\&quot;. </mtext>

and when it is rendered by browsers it looks like: Remove::rmnsm : There
are no symbols matching \" "Global‘*"\".More.... The inner quotation marks
are caused by the ms element, which is rendered within quotation marks.
There might be an extra mtext element that contains only a zero width
space.
Mathematica gives a mathsize attribute to the mstyle element. The value of
the attribute is given without a unit, although the unit cannot be omitted.
Browsers do not recognize all the characters in the documents. If something
is omitted in long outputs, it is denoted with the symbols << and >>. These
are encoded as Unicode characters in the Private Use Area block.
IndentingNewLine is also encoded into that area, so it is natural that
browsers cannot handle these characters. Sometimes characters that are
encoded properly into Unicode are not displayed by Mozilla, Netscape or IE
with MathPlayer; the cross, &#10799;, is such a character. Some characters
are displayed only by MathPlayer, for example Mathematica's :→
(RuleDelayed).


4.3     Conversion of Maple worksheets
The Maple worksheets were converted in a Windows environment. When the
worksheets were opened with Maple on Linux, a warning message was
generated: "This worksheet contains elements that are not supported on this
platform. Your worksheet may be incomplete."
The conversion was done using version 8.00 by selecting Export As and then
HTML with MathML... from the File menu. Maple proposes html as the file
extension. Next the user has to select whether to use frames and which
format to use for saving mathematical formulae. The options are GIF, MathML
1.0, MathML 2.0 and MathML 2.0 with WebEQ. For the DelTa package, no frames
and MathML 2.0 were used. The MathML generated is presentation markup.
Maple makes a link and a horizontal line at the top of each XHTML page. The
link points to an anchor that is situated just below the horizontal line.
This link was changed to point to the corresponding worksheet and to open
in a new window.
The size of plot images is determined by the size of the window in which
the worksheet was executed. The window's width also determines the length
of the output lines that are transformed into images. This means that the
conversion must be done using a reasonable window width.
Maple makes continuous animations in the conversion, but the file size is
rather big. Animations can also be made by clicking the right mouse button
over an animation and selecting Export As and then Graphics Interchange
Format (GIF)... from the menu. The file size can be reduced substantially,
for example by using the ImageMagick package and its convert routine:
convert old.gif new.gif.


4.3.1   Some notes on the converted material

The converted documents must be renamed with the file extension xml,
because otherwise Mozilla and Netscape do not render the MathML properly.
Images made by the function plot cause well-formedness errors because
there is an extra space at the end of the img tags: / >. This space must be
deleted.
The whole content of the body element is enclosed in a basefont element,
though there should not be anything inside a basefont element. This is
fixed by removing both the start and the end tag of the basefont element.
After these modifications the documents can be viewed with IE, Mozilla and
Netscape.
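The two markup fixes can be combined into one clean-up step. The following
is a minimal Python sketch under the assumption that the exported file is
processed as a string; the function name clean_maple_export is
hypothetical, and renaming the file to the xml extension is left to the
caller.

```python
import re


def clean_maple_export(xhtml: str) -> str:
    """Apply the img and basefont fixes to a document exported by Maple."""
    # Drop the space in "/ >" at the end of img tags.
    fixed = re.sub(r"(<img\b[^>]*?)/\s+>", r"\1/>", xhtml)
    # Remove the start and end tags of basefont but keep the enclosed content.
    fixed = re.sub(r"</?basefont\b[^>]*>", "", fixed)
    return fixed
```

Because only the tags are removed, the content that Maple placed inside the
basefont element stays in the body unchanged.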
En-dashes cause problems because of the differences between the character
sets used by Windows and other operating systems.
Maple produces a lot of NO-BREAK SPACE (&nbsp;) characters and line breaks.
This causes extra space to be shown if the document is viewed with Mozilla
or Netscape, but not with IE.
The image quality may be reduced, though the change is not very big.
Outputs that are converted to images have a blue font, while outputs encoded
as MathML are rendered with a black font.
Inline symbols like m2 transform into images. These images have extra space
around them, and the baseline of the symbols can be at a different height
than that of the surrounding text.
Unlike Mathematica, Maple does not create an excessive number of mrow
elements. This means that breaking long outputs into two or more lines
succeeds better than with the code produced by Mathematica.
Numbers may be presented in a different form. For example, the inline
exponential notation $10^{16}$ transforms into 10000000000000000 and the
inline expression $10^{-6}$ into $\frac{1}{1000000}$. The numbers
$0.1327259700\,10^{21}$ and 0.04 transform into .1327259700e21 and .4e-1.
In some cases there are more parentheses in the XHTML document than in the
original worksheet; for example, parentheses are added inside a square
root. On the other hand, in some cases parentheses are omitted. This
happens, for example, in a fractional exponent. Sometimes removing
parentheses may change the meaning of the expression; for example,
$\left(\frac{\partial^2}{\partial x^2}(*)\right)^2$ transforms into
$\frac{\partial^2}{\partial x^2}(*)^2$.
The formulae are quite often represented in a different form in the
worksheets and in MathML. For example, the argument of the cosine function
may change its sign, the order of terms in equations may change, equations
may be multiplied by $-1$, the minus sign may move from the numerator to the
front of the fraction line, $\pi$ may transform into $1\,\pi$, the
character / into a fraction line, $3^{1/6}$ into $\sqrt[6]{3}$ and
$1/\sqrt{LC}$ into $(CL)^{-1/2}$. The order of elements in sets may change,
and the imaginary unit may move from the end of a product to the beginning.
Airy wave function, AiryAi, in output is encoded in two different ways:
<mi>Ai</mi> and <mi>AiryAi</mi>.
Maple produces erroneous MathML while trying to convert D2 into MathML.

<mrow>
 <diff id='id79'/>
 <mo>&compfn;</mo>
 <diff id='id80'/>
</mrow>

The diff element is not a presentation markup element but a content markup
element, and its use would also require a bvar element.
The worksheet's RootOf transforms into a different form and its label
disappears, as shown in figure 7.




         Figure 7: Maple’s RootOf structure changes in conversion


In some cases ∂ symbols transform into d, and sometimes they remain as
partial derivative symbols. Sometimes the operator D may disappear; for
example, D(y) transforms into (y).


4.4    A few notes on notebooks and worksheets
The reusability of Mathematica notebooks and Maple worksheets is somewhat
open to doubt. Material produced with earlier versions may not work
properly with newer versions. Does this mean that authors must check all
their work whenever a new version of the program is released?
In Mathematica 5 there is a bug in DSolve that caused errors in many
notebooks. Mathematica does not solve initial or boundary value problems
for ordinary differential equations with symbolic parameters in them. The
problem can be avoided by modifying the notebook. This was not done, since
the problem will probably be fixed in the next version.
In Maple the command dsolve does not work in the same way as in version 6.
Problems were also encountered with piecewise defined functions, loops and
simplification.
How can display formulae with text, like the one below, be created in
Mathematica?

\begin{align*}
E_L &= L\,\frac{dI(t)}{dt} && \text{käämin yli,}\\
E_R &= R\,I(t) && \text{vastuksen yli sekä}
\end{align*}

(The Finnish text reads "across the coil" and "across the resistor and".)
If this is done inside an input cell, it looks fine but the MathML created
is strange.

 <mi fontfamily='Times New Roman' fontweight='normal'>yli</mi>
 <mtext fontfamily='Times New Roman' fontweight='normal'>   </mtext>
 <mi><mglyph fontfamily='Times New Roman' alt='se'/></mi>
</mrow>

Part of the text is in mtext, part is in mglyph and part is omitted from
the MathML. The space created by space characters also disappears. It is
coded in the following way: <mtext>                                  </mtext>.
Spaces at the beginning of an element are omitted, so the mtext is treated
as an empty element. If the display formula is rewritten in a text cell,
the conversion works better.
In Maple it is possible to write an inline equation that looks like
$e^{rx}$ by typing exp(rx). When this is converted to MathML, rx is encoded
as <mi>rx</mi>. If exp(r*x) is typed instead, the equation is shown as
$e^{(rx)}$ in the worksheet, and r and x are treated as separate
identifiers.
Another example of something that works well in a worksheet as an inline
equation but transforms into MathML in an unwanted way is denoting the
electric current with a capital I (I(t)). This is encoded into MathML in
the following way.

<mn>&ImaginaryI;</mn>
<mo>&ApplyFunction;</mo>
<mfenced>
  <mi>t</mi>
</mfenced>

These examples show that Mathematica and Maple documents must be checked
before they are converted into MathML. It is not enough that a notebook or
worksheet looks fine; something unexpected can still happen in the MathML
conversion. Fortunately, the cases where problems were encountered were
quite rare.


5    Making of PDF version
The LaTeX files of the DelTa package were converted to PDF documents using
the commands latex and dvipdfm. An alternative would have been to use the
command pdflatex, but that would have required transforming the images into
PDF form before running the command.
Mathematica notebooks and Maple worksheets were converted to PDF files
with Adobe Acrobat 6.0 Professional. This was done by sending Mathematica
and Maple documents to an Adobe PDF printer.


In evaluated notebooks animations are a series of images. All the images
of an animation except the first one were deleted, and separate animation
files were made as described in subsection 4.2. Maple worksheets containing
animations do not require any treatment before they are sent to the PDF
printer. The making of Maple animations is described in subsection 4.3.
The background of printed images is gray, though the background on screen
and in the original notebooks and worksheets is white. Otherwise, there are
no problems with the PDF conversion.



