Converting Word documents to HTML

Dafydd Gibbon, U Bielefeld

The production of HTML coded documents for the World Wide Web has gone through several stages since the beginning of the 1990s, and there are a number of different techniques for this purpose: manual coding with HTML markup; HTML page editors with format conversions (such as Word for Windows with Microsoft Internet Assistant, Claris Home Page, Microsoft Front Page, HotDog, and many others); automatic conversion of structured documents in a conventional formatting environment such as Word or LaTeX into multi-page/file hyperdocuments with automatically constructed systematic navigation strategies.

Conversion software

Word documents can easily be converted to HTML using one of the converter programmes now on the market. Several are currently available:

Microsoft Internet Assistant (MSIA): This programme is an add-on to Word. In other words, once it has been installed it is not available as a separate stand-alone programme, but in the form of additional actions in the Word File, View, Insert and Format menus. This programme converts a Word document into a single page with HTML format markup. The main advantage of MSIA is that it provides very good conversion of graphics and of text objects which are formatted with format templates, such as headings, numbered and bullet lists, and tables. The main disadvantage is that MSIA only produces a single page of output, i.e. a single file, and does not take advantage of the heading structure to produce a set of interlinked files with structured navigation strategies.

WordToWeb: This is a commercial programme which does everything that MSIA does and more: it converts a Word document with heading templates into a set of HTML pages/files which are interlinked into a hyperdocument with structured navigation strategies for moving through the graph structure of the hyperdocument.
WordToWeb is very similar to the well-known UNIX tool latex2html, which has been available for several years. More information can be obtained by searching the Web with a web search engine.

RTFTOHTML: This programme is shareware. After an initial trial period, during which the unregistered software starts with a 20 second delay, the software must be registered. The programme is similar to WordToWeb and is very suitable for converting descriptive texts, articles and handbooks into hyperdocuments. Like other good conversion software, it offers several options for generating a table of contents, for splitting the original document into HTML pages/files at different levels of detail, and for generating either standard HTML hyperdocuments or frame hyperdocuments. The RTFTOHTML programme is available for a wide variety of platforms, including Win95 and various flavours of UNIX. Unfortunately, RTFTOHTML does not currently convert RTF graphics automatically to the GIF format used in web documents, but it does provide the links which are required by graphics.

The present description refers to RTFTOHTML. One way to overcome the weakness of RTFTOHTML in not providing graphics format conversion is first to convert the document using Word with MSIA, then to convert the document with RTFTOHTML, and finally to rename the graphics files produced by MSIA (the numbered *.GIF files) to the names required by RTFTOHTML (the *.WMF files).

Installing RTFTOHTML

Follow these steps:

1. Make a new directory called "R2H" in the top directory "C:\".
2. Search for "RTFTOHTML" with a web search engine.
3. Download the appropriate version (e.g. for Windows 95) into the R2H directory.
4. The downloaded file is most likely called r2hw95.zip, which must be unpacked using the programme pkunzip. This programme is very widely available, and is distributed along with many other items of software. Unpack the file r2hw95.zip by double-clicking it.
When the dialogue box appears asking you to select a programme for processing the file, click on "Other" and then type in the filename pkunzip (if pkunzip is in the same directory) or the complete path and filename, such as C:\Programs\pkunzip (if this is where the programme is), and then click on "OK".
5. A DOS box (MS-DOS window) will appear, which will rapidly fill with a list of filenames. When the unpacking process is finished (see the top bar of the DOS box), close the window in the usual way.
6. Read the licence text and register the software.

At this point, the directory R2H should contain 56 files, of which the following will be discussed:

Type        Name          Comment
Archive     R2HW95.ZIP    See text
Guide       GUIDE.RTF     RTF input format (can be loaded into Word)
            GUIDE.HTM     Start file of HTML document generated by RTFTOHTML
            GUIDE01.GIF   Graphics file for guide
            GUIDE01.HTM   HTML page in guide
            GUIDE02.GIF   Graphics file for guide
            GUIDE02.HTM   HTML page in guide
            GUIDE03.HTM   HTML page in guide
            GUIDE04.HTM   HTML page in guide
            GUIDE05.HTM   HTML page in guide
            GUIDE06.HTM   HTML page in guide
            GUIDE07.HTM   HTML page in guide
            GUIDEFC.HTM   HTML page in guide
            GUIDEFI.HTM   HTML page in guide
            GUIDE_C.HTM   HTML page in guide
            GUIDE_I.HTM   HTML page in guide
            GUIDE_T.HTM   HTML page in guide
Navigation  BLANKG.GIF    Graphics file for navigation
            CONTG.GIF     Graphics file for navigation
            DBLEDAG.GIF   Graphics file for navigation
            INDEXG.GIF    Graphics file for navigation
            LEFTG.GIF     Graphics file for navigation
            NOLEFTG.GIF   Graphics file for navigation
            NORIGHTG.GIF  Graphics file for navigation
            NOTOPG.GIF    Graphics file for navigation
            NOUPG.GIF     Graphics file for navigation
            RIGHTG.GIF    Graphics file for navigation
            SECTMARK.GIF  Graphics file for navigation
            SIGMA_.GIF    Graphics file for navigation
            TOPG.GIF      Graphics file for navigation
            UPG.GIF       Graphics file for navigation
            NAV-PANL      File for defining navigation graphics files
Running     R2H95.BAT     Batch file for drag-and-drop running of R2HW95.EXE
            R2HW95.EXE    The Windows 95 version of the RTFTOHTML programme

Adapting RTFTOHTML to your computer

A number of minor operations need to be performed in order to set up a useful working environment. These are detailed in the following sections.

Creating a directory for the navigation buttons

Normally an HTML document created by RTFTOHTML must be located in a directory which also contains a subdirectory images containing the GIF graphics files for the navigation buttons. So, for example, before reading the Guide, create a new directory called c:\R2H\images and copy all the GIF navigation button images into this directory.

The simplest way of ensuring that your HTML documents have access to the GIF files for the navigation buttons is to copy this directory to the directory in which your own HTML files are located. However, there is a more general technique: the file containing the navigation panel button graphics file names (NAV-PANL in the table above) may be modified in order to give these files a fixed path which can be used anywhere on your computer, e.g. C:\R2H\images\... In this case, you do not need to copy the graphics files into your HTML directories each time.

The Guide

Now click on the HTML start file for the Guide. This should start your web browser, such as MS Internet Explorer or Netscape Navigator. You should see two frames, with the Contents on the left and the first page on the right. At the top of the frames you should see navigation buttons (currently light grey arrows and other symbols for contents and index).

The Guide contains detailed information on the programme, licensing and usage. The present outline does not duplicate information in the Guide, but contains additional practical information about installation and use.

Optimal use of RTFTOHTML under Windows 95

The most practical way to use the RTFTOHTML software under Windows 95 is to create a desktop icon with a link to the batch programme, and then drag your RTF file to this icon and drop it on the icon when the icon is highlighted.
A DOS box appears, the programme runs, and the output is written to the directory which contains your RTF file.

It is also useful to create copies of the batch programme with different parametrisations of the RTFTOHTML programme, for instance to produce single page output, to produce a hypertext with a contents page, or to produce a hypertext with a frame for the contents page.

Adapting the batch file

Before it can be used, the batch file has to be modified slightly for your system:

1. Make a backup copy of the file r2h95.bat under a different name (to avoid bother if you make a mistake).
2. Using the Windows 95 Notepad editor, load the file r2h95.bat in the R2H directory.
3. Look for the following line:
       set RTFLIBDIR=D:\r2h3x\binary
4. Change it to read:
       set RTFLIBDIR=C:\R2H
5. Save the file.

Creating a desktop icon

Create a shortcut to the batch file in your R2H directory. Then drag and drop the shortcut from the R2H directory to the desktop (i.e. the background part of the Windows 95 screen) by clicking on it, holding down the left mouse button, moving the cursor (which turns into a special warning image) to the desktop, and then letting go. Take care NOT to drop it on a directory icon, or it will land inside the directory, not on the desktop. After putting the programme icon on the desktop you can move it around to the optimal position for your style of working.

Customising the desktop icon

Follow these steps to change various properties of your desktop icon and the programme attached to it:

1. Change the name on the icon, if you wish, by clicking on the icon name and then, after a short pause (in order to avoid the double-click effect), clicking again. Then you can type in a new name, such as simply R2H.
2. To change other properties, double-click on the icon. A DOS box appears, and you can click on the "File/Properties" menu item in order to do this.
3. Change the icon graphic by clicking on the symbol change button and selecting a new icon from the icons provided.
4. You can also ensure that the DOS box disappears automatically after the programme has been run by clicking on the appropriate box. However, do not do this until you have got used to the operation of the programme and the DOS box.

Creating your RTF document with Word

In order to create a hypertext document, the Word document must be properly structured with text objects and not simply constructed with local markup.

1. A text object is, for example, a format template for headings, or a numbered list or bullet list, or a table.
2. Local markup is either highlighting with bold face or italics, or layout with TAB stops or sequences of spaces.
3. ALWAYS use a text object, if possible.
4. NEVER use local markup to create the impression of structure if a text object is available. Only if you follow these principles will you achieve acceptable results. In a sense, the creation of a structured document is a form of applied text linguistics. After all, hypertext is text, and text is an important object of linguistic description.
5. Store your finished document in RTF format.

Finally, to produce and examine the HTML document, follow these steps:

1. Open a directory display of the directory in which your RTF document is located, and adjust the size and location of the display so that the RTFTOHTML desktop icon is visible.
2. Drag and drop the RTF document icon on to the RTFTOHTML desktop icon.
3. When the conversion process has finished and the DOS box is closed, refresh the directory display for your RTF document directory, and click on the Web icon with the same name (which should have an Internet Explorer or Netscape Navigator icon, depending on the configuration of your computer, if you already have a browser). The file has the extension .HTM or .HTML.
If you have selected a configuration option (see below) which produces more than one file, make sure you select the file with the start page; this file will have exactly the same name as your own file (but of course with the extension .HTM or .HTML instead of .RTF).
4. If you do not have a browser, you will need to copy all the .HTM or .HTML files to a floppy disk and transfer them to a computer which does have a browser in order to look at the result.

Conversion options with RTFTOHTML

When the RTFTOHTML programme (from now on referred to as "r2h") is run with no filename, it displays the following message, which lists the command line options that define the form of the output of the programme:

Options may be:
    -c        generate a Table of Contents page
    -F        use frames (req. -c and Netscape 2.0 or compatible)
    -G        write no graphics files
    -h[n]     split output at headings (at the n'th level)
    -i        imbed graphics
    -N file   read navigation panel from "file"
    -o file   write output to "file"
    -P ext    use "ext" as extension for external graphic files
    -s        use short filenames when splitting
    -t        list external references on top of page
    -T title  use "title" as the document title
    -V        print version number of rtftohtml (only)
    -x        create an Index (if there are index entries)
    -X text   use "text" as the text for index anchors (e.g. ·)

If no parameters are used, a single page of HTML output, with embedded tables of contents, is produced. Other hyperdocument formats are produced using specific parametrisations.

It is very convenient to make copies of the batch file, each with a desktop icon, and each with different parametrisations. To do this, make copies of the batch file, load them into the Notepad editor, and modify each of them as follows:

1. Look for the following line:
       %RTFLIBDIR%\R2Hw95.exe "%1"
2. Change the line to include the required parameters.
Useful parameters are:

Single page          %RTFLIBDIR%\R2Hw95.exe "%1"
Standard hypertext   %RTFLIBDIR%\R2Hw95.exe -c -h -i "%1"
Contents frame       %RTFLIBDIR%\R2Hw95.exe -c -F -h6 -i "%1"

Advanced uses

For more advanced uses, including the definition of additional cross-references and other links within the Word document in order to create a more sophisticated hyperdocument structure, the Guide should be consulted.
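
Making several copies of the batch file by hand, one per parametrisation, can also be done with a short script. The following Python sketch is only an illustration and is not part of RTFTOHTML. The function names, the variant names and the output filenames (r2h_single.bat and so on) are hypothetical. The flags, the RTFLIBDIR setting and the R2Hw95.exe command line are taken from the text, with -F assigned to the frame variant because the option list states that -F requires -c. The real r2h95.bat may contain further lines which are not reproduced here.

```python
from pathlib import Path

# Flag sets taken from the option list in the text. The variant names and
# the output filenames (r2h_<name>.bat) are hypothetical, for illustration.
VARIANTS = {
    "single": [],
    "contents": ["-c", "-h", "-i"],
    "frames": ["-c", "-F", "-h6", "-i"],
}


def check_flags(flags):
    """Reject flag sets that break the documented rule: -F requires -c."""
    if "-F" in flags and "-c" not in flags:
        raise ValueError("-F (frames) requires -c (table of contents)")


def command_line(flags):
    """Build the batch-file line that runs the converter on the dropped file."""
    check_flags(flags)
    parts = [r"%RTFLIBDIR%\R2Hw95.exe", *flags, '"%1"']
    return " ".join(parts)


def batch_text(flags, libdir=r"C:\R2H"):
    """Return the text of a minimal batch file for one variant.

    Assumption: only the two lines quoted in the text are written; the
    batch file shipped with the programme may contain more.
    """
    lines = [f"set RTFLIBDIR={libdir}", command_line(flags)]
    return "\r\n".join(lines) + "\r\n"


def write_variants(target_dir):
    """Write one batch file per variant and return the paths written."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, flags in VARIANTS.items():
        path = target / f"r2h_{name}.bat"
        # newline="" keeps the CRLF line endings exactly as built above.
        with open(path, "w", newline="") as handle:
            handle.write(batch_text(flags))
        written.append(path)
    return written
```

Calling write_variants with the R2H directory as its argument would write three batch files, and a desktop shortcut can then be created for each of them in the way described earlier. Keeping the flag sets in one table makes it easy to add a further variant, for example one with -x for an index, without editing several files in Notepad.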