Extensible HyperText Markup Language (XHTML) is a markup language that is similar in expressive power to the HyperText Markup Language (HTML) but has a stricter syntax. In terms of lineage, HTML is an application of the Standard Generalized Markup Language (SGML), a very flexible markup language, whereas XHTML is based on the Extensible Markup Language (XML), which is a subset of SGML. XHTML 1.0 became a W3C Recommendation on January 26, 2000.
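The stricter syntax can be illustrated with a short sketch. It is not part of the original material: it assumes Python's standard xml.etree.ElementTree parser, and the helper name is_well_formed is hypothetical. It shows that markup tolerated in HTML (a br element that is never closed) is rejected as soon as the document has to be well-formed XML, as XHTML must be.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# HTML-style markup: the br element is never closed.
html_style = "<p>first line<br>second line</p>"
# XHTML-style markup: every element is closed explicitly.
xhtml_style = "<p>first line<br/>second line</p>"

print(is_well_formed(html_style))
print(is_well_formed(xhtml_style))
```

Only the second string parses, because an XML parser requires every start tag to have a matching end tag or to be written as an empty-element tag.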
Converting LaTeX, Mathematica and Maple Documents into XHTML with MathML

Ossi Mauno
Institute of Mathematics
Helsinki University of Technology
Finland
e-mail: Ossi.Mauno@hut.fi

1 Background

At the Institute of Mathematics of the Helsinki University of Technology, a research project called MatTa (Computer aided mathematics) has been running for over ten years. This year the project is being extended to a national project called MatTaFi as part of the Finnish Virtual University.

In the past few years it has become obvious that MathML is a very promising way of representing mathematics on the web. The technology and the conversion tools were tested in the MatTa project by translating study material into XHTML+MathML form. The original material consisted of LaTeX documents, Mathematica notebooks and Maple worksheets. The work was done using TeX4ht and Mathematica's and Maple's own means of producing MathML.

A new user interface was created for browsing the translated material. The aim was to make the user interface as simple as possible and easy to modify to meet the demands of the different kinds of courses in which it could be used. A PDF version of the study material was also made using the same user interface.

2 Original material

The original material that was translated into XHTML+MathML form was produced within the MatTa project. It was originally designed to be presented on the web using HTML, and the material is therefore divided into short articles. The material forms a study package on ordinary differential equations called DelTa. A Mathematica version and a Maple version of the package are on the web site of the MatTaFi project. The material was designed for a basic course at the Helsinki University of Technology, but it can equally well be used in other universities, polytechnics and even in mathematically oriented upper secondary schools.

The DelTa package consists of 73 LaTeX documents, 42 Mathematica notebooks and Maple worksheets.
The worksheets were made by rewriting the notebooks in Maple's syntax.

The LaTeX documents do not have a preamble, so they can easily be converted to different forms such as HTML and PDF by adding a suitable preamble. The documents do have metadata fields, some of which are obligatory. The metadata includes fields such as the title, the date, the nature of the document (theory, example, calculation or application) and the name of the author. The metadata section must also contain the document's classification under the Mathematics Content Dictionary, which was also developed in the MatTa project. Some LaTeX documents also contain eps figures. Mathematica and Maple documents have their metadata in separate tex files.

The DelTa package also contains a Mathematica guide, a Maple guide and an exercise collection. In addition, the package includes two Java applets for illustrating the solutions of ordinary differential equations (DEW1 for first-order and DEWn for higher-order equations).

3 User interface

The aim was to create a simple tool for browsing different kinds of study material on the net. The user interface has to be simple enough that even people who are not very familiar with HTML, XHTML or JavaScript can understand and modify the code in order to adapt the user interface to the demands of the individual course they are giving. For this reason the source code contains a large number of comments. The user interface of the HTML version of the DelTa study package was the starting point for the development.

The user interface is a set of XHTML pages that include some JavaScript. The appearance is set by Cascading Style Sheets. At the top of the Navigation Window (see figure 1) there is a link to the help page. In the middle there is an array of links. Clicking a link in the upper row of the array (theory, solving, examples and applications) makes the corresponding table of contents appear below the array and a heading appear in the blank space at the top of the page.
The links in the lower row (exercises, Java applets, and the Mathematica and Maple guides) open a separate window.

Figure 1: The starting page

Figure 2: The Navigation Window with table of contents and an article opened in the Content Window

3.1 XHTML+MathML version

The first version of the user interface was made for browsing material that is in XHTML form. For the present, the working version can be found at http://matta.hut.fi/work/omauno/Navigaattori.html. There are subheadings in the table of contents. Each document can be opened in the Content Window by clicking the document's name, or in a new window by clicking the adjacent [i]. When the document's name is clicked, the document opens in the Content Window, replacing the previous document in that window, and the window is raised to the top. See figure 2.

Mathematica and Maple documents in XHTML form can also be opened in the Content Window or in a new window. In addition, users can open the original Mathematica notebook or Maple worksheet by selecting nb or mws. If a converted document contains links, they are at the end of the file.

Figure 3: Links, Navigation Window and a PDF document viewed with Acrobat Reader

3.2 PDF version

The PDF version of the user interface was made for browsing documents that are in PDF form. It was made by modifying the XHTML+MathML version and, for the time being, it can be found at http://matta.hut.fi/work/omauno/pdfNavigaattori.html.

PDF documents are usually viewed with Acrobat Reader. If Acrobat Reader runs as a browser plug-in, the user interface can be almost identical to the XHTML version. If Acrobat Reader is not a browser plug-in, relative links do not work. Because of this, links were not included in the PDF documents; instead they were placed in a separate link file, and links to the link files were added to the table of contents. See figure 3.

It is possible to include animations in PDF documents, but in order to work the animations need an external application.
That would limit the functionality, which is why the animations were not included in the PDFs but added as separate gif animations. Links to the animations were added to the link file of the corresponding PDF document.

4 Making of XHTML+MathML version

4.1 Conversion of LaTeX documents

The LaTeX documents were converted to XHTML+MathML form with Eitan M. Gurari's program TeX4ht. The work was done in a Windows environment using a somewhat old version of the program, with the command mzlatex.

The LaTeX documents contain only structured data and no preamble. That makes it easier to use the documents for different purposes. For the XHTML+MathML conversion the following lines had to be added to the beginning of the file

    \documentclass[10pt]{article}
    \usepackage[finnish]{babel}
    \usepackage[T1]{fontenc}
    \usepackage[latin1]{inputenc}
    \usepackage{xhtmlkonv,matht} % Määrittelytiedostot (definition files), tex4ht
    \begin{document}

and \end{document} to the end of the file.

The file xhtmlkonv.sty includes the definitions needed for the XHTML+MathML conversion; among other things it loads the hyperref package. The file matht.sty consists of commands that make it easier to write mathematics. There are corresponding files, similar to xhtmlkonv.sty, for proofreading and for PDF conversion.

Running mzlatex produces files that begin:

    <?xml version="1.0" encoding="iso-8859-1" ?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN"
    "http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd" [
    <!ENTITY mathml "http://www.w3.org/1998/Math/MathML">
    ]>
    <?xml-stylesheet type="text/css" href="ryhfas.css"?>
    <html xmlns="http://www.w3.org/1999/xhtml">

This cannot be viewed with Internet Explorer. The beginning of the document must therefore be revised in order to work properly with the most common browsers. This is done by using David Carlisle's Universal MathML stylesheet. An extra css file, ratk.css, has been added to modify the appearance of the background; it has no effect on accessibility.
The revised beginning of the file is:

    <?xml version="1.0" encoding="iso-8859-1" ?>
    <?xml-stylesheet type="text/xsl" href="../xsl/mathml.xsl"?>
    <?xml-stylesheet type="text/css" href="../css/ratk.css"?>
    <?xml-stylesheet type="text/css" href="algeul.css"?>
    <html xmlns="http://www.w3.org/1999/xhtml">

4.1.1 Some notes on the converted material

In general, converting LaTeX files with TeX4ht produces XHTML+MathML that is quite clean and easy to read. However, some problems exist.

Images are given a height and a width, in pixels, that are smaller than the actual size of the images, and thus the image quality on the screen is reduced. This is easy to fix by removing those size parameters.

The horizontal space (\quad) between two images is omitted in the conversion.

Lines connected to each other with curly braces (left or right) are not coded within a table. The result for the equation group below is shown in figure 4.

    \begin{cases} y' = f(t, y, z), \\ z' = g(t, y, z). \end{cases}

Figure 4: Equation groups are coded so that all the equations lie one after another in a single line

If an open interval is marked as ]a, b[ in the source file, the result is:

    <math xmlns="http://www.w3.org/ ... " display="inline" ></mrow>
    <mo class="MathClass-close">]</mo></mrow> ... <mrow><mo class="MathClass-open">[</mo><mrow></math>

This is erroneous MathML, because the opening and closing tags of the mrow elements do not match.

In the original material the absolute value is encoded either with | or with a macro \abs that uses \vert. The former option transforms into ∣ (DIVIDES) and the latter is coded with an mfenced element. An alternative could be to encode both cases with <mo>|</mo>.

Finnish uses case endings rather than prepositions, so words are inflected by adding an ending to the word, as in "j:nnen", meaning "j-th". That is coded <math><mi>j</mi></math>:nnen, and in some cases a line break may occur between the word and its ending, which is not allowed.
One solution to this problem is to put the word and the ending inside an extra mrow element: <math><mrow><mi>j</mi><mtext>:nnen</mtext></mrow></math>.

Derivatives are marked with a prime (′) in superscript. When viewed with Mozilla, this makes the prime look a bit loose because it is situated so high. On the other hand, if the prime were on the baseline, it would look like something close to a slash in Netscape. See figure 5.

Figure 5: The appearance of prime depends on the browser

White space handling depends on the browser used. In the following case Mozilla and Netscape show space around R but IE does not.

    ja</span>
    <!--l. 43--><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" ><mi >R</mi></math>
    <span class="ecti-1000">ovat

In some cases there is no space between the end tag of math and the following word. For example, $y$ ja translates into </math>ja. In most cases, however, this problem does not occur.

The piece of LaTeX code

    ja $x = \text{mikä tahansa vakio}$, $y = 2$.

transforms into

    <mtext >mikä&#x00A0;tahansa&#x00A0;vakio</mtext>
    <!--/mstyle--></math>,

Something strange happens if this is viewed with Mozilla or Netscape: the comma after the math is shown in the middle of the preceding word. See figure 6. The problem can be solved by using a normal space instead of the no-break space character (NO-BREAK SPACE, U+00A0).

Figure 6: Comma is shown in the middle of the preceding word

4.2 Conversion of Mathematica notebooks

Mathematica notebooks are converted to XHTML+MathML by selecting Save As Special... and then XML (HTML+MathML) from the File menu. This was done using version 5.0. With the default settings Mathematica produces presentation markup. Mathematica's own code is included in the MathML code with the annotation element, but only in (error) messages.

Animations are plotted as separate images. A series of images can be transformed into an animated gif file. This is done by running the command Export["anim", %, "GIF"] or Export["anim.gif", %] right after the series of images is created.
These commands make a file named anim.gif. An animation made in this way is shown only once. The animation is made continuous by using the ImageMagick package and its convert routine: convert -loop 0 old.gif new.gif. This makes the animation start from the beginning right after it has reached its end. Extra images are removed from the XHTML document and animations are added by hand.

4.2.1 Some notes on the converted material

The converted XHTML documents closely resemble the original notebooks, so it is easy to recognize that they were originally written as notebooks. They are not identical, and perhaps the most evident difference is that In[*]:= and Out[*]= are on their own lines. That makes the document look sparser. The following is a list of the problems we have found.

If a notebook cell has a background color, the text in XHTML form does not necessarily have the same background color. The html "cell" has a colored background, but part of the text within it may have a white background.

Outputs are on a single line that can be very long. This happens because the outputs are coded inside mrow elements and browsers do not allow line breaks inside mrow. One way to divide long outputs into several lines is simply to remove some mrow elements.

The precision of numbers may be greater in MathML than in notebooks.

The number 2.0 is represented as 2. in input and in inline equations.

Mathematica's == is encoded either as <mo>==</mo> or as <mo>⩵</mo> (TWO CONSECUTIVE EQUALS SIGNS).

Output is coded with MathML even when it is not mathematics, for example the output that accompanies plots.

Part of the message Remove::rmnsm : There are no symbols matching "Global`*". More... transforms into

    <mtext>There are no symbols matching \"</mtext>
    <ms>Global`*</ms>
    <mtext>\". </mtext>

and when it is rendered by browsers it looks like: Remove::rmnsm : There are no symbols matching \" "Global`*"\".More.... The inner quotation marks are caused by the ms element, which is rendered within quotation marks.
There might be an extra mtext element that contains only a zero width space.

Mathematica gives a mathsize attribute to the mstyle element. The value of the attribute is given as a number without a unit, although the unit cannot be omitted.

Browsers do not recognize all the characters in the documents. If something is omitted in long outputs, it is denoted with the symbols ≪ and ≫. These are encoded as Unicode characters that are situated in the Private Use Area block. IndentingNewLine is also encoded into that area, so it is natural that browsers cannot handle these characters. Sometimes characters that are encoded properly in Unicode are not displayed by Mozilla, Netscape or IE with MathPlayer; the cross, ⨯, is one such character. Some characters are displayed only by MathPlayer, for example Mathematica's :→ (RuleDelayed).

4.3 Conversion of Maple worksheets

The Maple worksheets were converted in a Windows environment. If the worksheets were opened with Maple on Linux, a warning message was generated: "This worksheet contains elements that are not supported on this platform. Your worksheet may be incomplete."

The conversion was done using version 8.00 by selecting Export As and then HTML with MathML... from the File menu. Maple proposes html as the file extension. Next the user has to select whether to use frames and which format is used for saving mathematical formulae. The options are GIF, MathML 1.0, MathML 2.0 and MathML 2.0 with WebEQ. For the DelTa package, no frames and MathML 2.0 were selected. The MathML generated is presentation markup.

Maple makes a link and a horizontal line at the top of each XHTML page. The link points to an anchor which is situated just below the horizontal line. This link was changed to point to the corresponding worksheet and made to open a new window.

The size of the plot images is determined by the size of the window in which the worksheet was executed.
The window's width also determines the length of the output lines that are transformed into images. This means that the conversion must be done using a reasonable window width.

Maple makes continuous animations in the conversion, but the file size is rather big. Animations can also be made by clicking the right mouse button over an animation and selecting Export As and then Graphics Interchange Format (GIF)... from the menu. The file size can be reduced substantially, for example by using the ImageMagick package and its convert routine: convert old.gif new.gif.

4.3.1 Some notes on the converted material

The converted documents must be renamed with the file extension xml, because otherwise Mozilla and Netscape do not render MathML properly. Images made by the function plot cause a well-formedness error because there is an extra space at the end of the img tags: / >. This space must be deleted. The whole content of the body element is enclosed in a basefont element, though there should not be anything inside a basefont element. This is fixed by removing both the start tag and the end tag of the basefont element. After these modifications the documents can be viewed with IE, Mozilla and Netscape.

En-dashes cause problems because of the differences in the character sets used by Windows and other operating systems.

Maple produces a lot of NO-BREAK SPACE (U+00A0) characters and line breaks. This causes an extra space to be shown if the document is viewed with Mozilla or Netscape, but not with IE.

The image quality may be reduced, though the change is not very big.

Outputs that are converted to images have a blue font, while outputs encoded as MathML are rendered with a black font.

Inline symbols like m2 transform into images. These images have extra space around them, and the baseline of the symbols can be at a different height than that of the surrounding text.

Unlike Mathematica, Maple does not create an excessive number of mrow elements.
That means that breaking long outputs into two or more lines succeeds better than with the code produced by Mathematica.

Numbers may be presented in a different form. For example, the inline exponential notation $10^{16}$ transforms into 10000000000000000 and the inline expression $10^{-6}$ into $\frac{1}{1000000}$. The numbers $0.1327259700 \cdot 10^{21}$ and 0.04 transform into .1327259700e21 and .4e-1.

In some cases there are more parentheses in the XHTML document than in the original worksheet; for example, parentheses are added inside a square root. On the other hand, in some cases parentheses are omitted. This happens, for example, in a fractional exponent. Sometimes removing parentheses may change the meaning of the expression: for example, $\left(\frac{\partial^2}{\partial x^2}(*)\right)^2$ transforms into $\frac{\partial^2}{\partial x^2}(*)^2$.

The formulae are quite often represented in a different form in worksheets and in MathML. For example, the argument of the cosine function may change its sign, the order of terms in equations may change, equations may be multiplied by $-1$, a minus sign may move from the numerator to the front of the fraction bar, a fraction containing $\pi$ may be rewritten as a rational factor times $\pi$, the character / may become a fraction bar, and a square-root expression in $L$ and $C$ may become a fractional power of $(CL)$. The order of elements in sets may change, and the imaginary unit may move from the end to the beginning of a product.

The Airy wave function, AiryAi, is encoded in output in two different ways: <mi>Ai</mi> and <mi>AiryAi</mi>.

Maple produces erroneous MathML when trying to convert $D^2$ into MathML.

    <mrow>
    <diff id='id79'/>
    <mo>∘</mo>
    <diff id='id80'/>
    </mrow>

The element diff is not a presentation markup element but a content markup element, and its use would also require a bvar element.

The worksheet's RootOf transforms into a different form and the label disappears, as shown in figure 7.

Figure 7: Maple's RootOf structure changes in conversion

In some cases ∂ symbols transform into d, and sometimes they remain as partial derivative symbols. Sometimes the operator D may disappear; for example, D(y) transforms into (y).
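The two manual corrections described at the start of this subsection (deleting the extra space at the end of img tags and removing the basefont tags) lend themselves to scripting. The following Python sketch is an illustration only, not the procedure used in the project: the function names and the sample markup are hypothetical, and the regular expressions assume that the sequence "/ >" and the basefont tags do not occur inside ordinary text or attribute values.

```python
import re
import xml.etree.ElementTree as ET


def clean_maple_export(markup: str) -> str:
    """Apply the two manual fixes described for Maple's XHTML export."""
    # An empty-element tag must end with "/>"; the export writes "/ >".
    markup = re.sub(r"/\s+>", "/>", markup)
    # Drop the basefont start and end tags but keep what they enclose.
    markup = re.sub(r"</?basefont\b[^>]*>", "", markup)
    return markup


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragment imitating the problems described in the text.
sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<basefont size="3"><p>Plot:</p>'
    '<img src="plot1.gif" alt="plot" / >'
    "</basefont></body></html>"
)

cleaned = clean_maple_export(sample)
print(is_well_formed(sample))
print(is_well_formed(cleaned))
print(cleaned)
```

Checking well-formedness before and after the cleanup gives a quick indication of whether a converted file still needs attention by hand.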
4.4 A few notes on notebooks and worksheets

The reusability of Mathematica notebooks and Maple worksheets is somewhat open to doubt. Material produced with earlier versions may not work properly with newer versions. Does this mean that authors must check all their work whenever a new version of the program is released?

In Mathematica 5 there is a bug in DSolve that caused errors in many notebooks. Mathematica does not solve initial or boundary value problems for ordinary differential equations with symbolic parameters in them. The problem can be avoided by modifying the notebook. This was not done, since the problem will probably be fixed in the next version.

In Maple the command dsolve does not work in the same way as it did in version 6. Problems were also encountered with piecewise defined functions, loops and simplification.

How can one create display formulae with text, like the one below, in Mathematica?

    $E_L = L \frac{dI(t)}{dt}$ käämin yli, $E_R = R I(t)$ vastuksen yli sekä

(The Finnish text reads "across the coil", "across the resistor" and "and".) If this is done inside an input cell, it looks good, but the MathML created is strange.

    <mi fontfamily='Times New Roman' fontweight='normal'>yli</mi>
    <mtext fontfamily='Times New Roman' fontweight='normal'> </mtext>
    <mi><mglyph fontfamily='Times New Roman' alt='se'/></mi>
    </mrow>

Part of the text is in mtext, part in mglyph, and part is omitted from the MathML. The space created by space characters also disappears. It is coded in the following way: <mtext> </mtext>. Spaces at the beginning of an element are omitted, so the mtext is treated as an empty element. If the display formula is rewritten in a text cell, the conversion works better.

In Maple it is possible to write an inline equation that looks like $e^{rx}$ by typing exp(rx). When this is converted to MathML, rx is encoded as <mi>rx</mi>. If exp(r*x) is written instead, the equation is shown as $e^{(rx)}$ in the worksheet, and r and x are treated as separate identifiers.
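Cases like <mi>rx</mi> can be searched for mechanically. The sketch below is an assumption-laden illustration rather than part of the project's tooling: the function name, the list of accepted multi-letter names and the sample MathML fragment are all hypothetical. It reports every mi element whose content is longer than one character and is not a known function name, which is a hint that a multiplication sign may be missing in the worksheet.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def suspicious_identifiers(markup, known_names=("sin", "cos", "exp", "ln")):
    """Return the text of mi elements longer than one character and not in known_names."""
    root = ET.fromstring(markup)
    found = []
    for element in root.iter("{%s}mi" % MATHML_NS):
        text = (element.text or "").strip()
        if len(text) > 1 and text not in known_names:
            found.append(text)
    return found


# Hypothetical presentation markup for e raised to the identifier "rx".
sample = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<msup><mi>e</mi><mi>rx</mi></msup>"
    "</math>"
)

print(suspicious_identifiers(sample))
```

A hit is only a hint: legitimate multi-letter identifiers also occur, so the list of accepted names would have to be adapted to the material being checked.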
Another example of something that works well in a worksheet as an inline equation but transforms into MathML in an unwanted way is denoting the electric current with a capital I, as in I(t). This is encoded into MathML in the following way.

    <mn>ⅈ</mn>
    <mo>&#x2061;</mo>
    <mfenced>
    <mi>t</mi>
    </mfenced>

That is, Maple interprets I as the imaginary unit.

These examples show that Mathematica and Maple documents must be checked when they are converted into MathML. It is not enough that the notebook or worksheet looks good; something unexpected can still happen in the MathML conversion. Fortunately, the cases where problems were encountered were quite rare.

5 Making of PDF version

The LaTeX files of the DelTa package were converted to PDF documents using the commands latex and dvipdfm. An alternative would have been to use the command pdflatex, but that would have required transforming the images into PDF form before running the command.

Mathematica notebooks and Maple worksheets were converted to PDF files with Adobe Acrobat 6.0 Professional. This was done by sending the Mathematica and Maple documents to an Adobe PDF printer.

In evaluated notebooks, animations are a series of images. All the images of each animation except the first one were deleted, and separate animation files were made as described in subsection 4.2. Maple worksheets containing animations do not require any treatment before they are sent to the PDF printer. The making of Maple animations is described in subsection 4.3.

The background of printed images is gray, though the background on screen and in the original notebooks and worksheets is white. Otherwise, there are no problems with the PDF conversion.
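The LaTeX part of this route, wrapping a preamble-less article and then running latex and dvipdfm, can be sketched as follows. This is a minimal illustration under stated assumptions and not the project's actual script: the preamble lines are the ones quoted in subsection 4.1, the style file name pdfkonv stands in for the unnamed PDF counterpart of xhtmlkonv.sty, and the function names and the file name article01 are hypothetical. The commands are only listed, not executed.

```python
def wrap_with_preamble(body: str, style_packages: str) -> str:
    """Surround a preamble-less LaTeX body with the preamble lines from subsection 4.1."""
    preamble = "\n".join([
        r"\documentclass[10pt]{article}",
        r"\usepackage[finnish]{babel}",
        r"\usepackage[T1]{fontenc}",
        r"\usepackage[latin1]{inputenc}",
        r"\usepackage{%s}" % style_packages,
        r"\begin{document}",
    ])
    return preamble + "\n" + body.rstrip("\n") + "\n" + r"\end{document}" + "\n"


def pdf_commands(basename: str) -> list:
    """Commands for the latex + dvipdfm route, in the order they are run."""
    return [["latex", basename + ".tex"], ["dvipdfm", basename + ".dvi"]]


# "pdfkonv" is a placeholder name for the PDF conversion style file.
document = wrap_with_preamble("Example article body.\n", "pdfkonv,matht")
print(document)
print(pdf_commands("article01"))
```

Keeping the preamble outside the article files is what allows the same source to feed both the XHTML+MathML and the PDF conversion; only the style file passed to the wrapper changes.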