Converting Word documents to HTML
Dafydd Gibbon, U Bielefeld



The production of HTML coded documents for the World Wide Web has gone through
several stages since the beginning of the 1990s, and there are a number of different techniques
for this purpose:
• manual coding with HTML markup;
• HTML page editors with format conversions (such as Word for Windows with Microsoft
   Internet Assistant, Claris Home Page, Microsoft FrontPage, HotDog, and many others);
• automatic conversion of structured documents in a conventional formatting environment
   such as Word or LaTeX into multi-page/file hyperdocuments with automatically
   constructed systematic navigation strategies.



Conversion software
Word documents can easily be converted to HTML using one of the converter programmes
now on the market. Several are currently available:

• Microsoft Internet Assistant (MSIA): This programme is an add-on to Word. In other
  words, once it has been installed it is not available as a separate stand-alone programme,
  but in the form of additional actions in the Word File, View, Insert and Format menus. This
  programme converts a Word document into a single page with HTML format markup. The
  main advantage of MSIA is that it provides very good conversion of graphics and of text
  objects which are formatted with format templates, such as headings, numbered and bullet
  lists, and tables. The main disadvantage is that MSIA only produces a single page of output,
  i.e. a single file, and does not take advantage of the heading structure for producing a set of
  interlinked files with structured navigation strategies.
• WordToWeb: This is a commercial programme which does everything that MSIA does and
  more: it converts a Word document with heading templates into a set of HTML pages/files
  which are interlinked into a hyperdocument with structured navigation strategies for
  moving through the graph structure of the hyperdocument. WordToWeb is very similar to
  the well-known UNIX tool latex2html, which has been available for several years. More
  information can be obtained by searching the Web with a web search engine.
• RTFTOHTML: This programme is shareware. After an initial trial period, during which the
  unregistered software starts with a 20 second delay, the software must be registered. The
  programme is similar to WordToWeb and is very suitable for converting descriptive texts,
  articles and handbooks into hyperdocuments. Like other good conversion software, it offers
  several options for generating a table of contents, for splitting the original document into
  HTML pages/files at different levels of detail, and for generating either standard HTML
  hyperdocuments or frame hyperdocuments. The RTFTOHTML programme is available for
  a wide variety of platforms, including Win95 and various flavours of UNIX. Unfortunately
  RTFTOHTML does not currently convert RTF graphics automatically to the GIF format
  used in web documents, but it does provide the links which are required by graphics.
The present description refers to RTFTOHTML. One way to overcome the lack of
graphics format conversion in RTFTOHTML is first to convert the document using Word
with MSIA, then to convert the document with RTFTOHTML, and finally to rename the
graphics files produced by MSIA (the numbered *.GIF files) to the names required by
RTFTOHTML (the *.WMF files).
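
The renaming step can be prepared with a short script. The following Python sketch is only an illustration and rests on an assumption which is not stated in the description above: that the n-th numbered GIF file written by MSIA corresponds to the n-th graphics name which RTFTOHTML links to. The function names and the file names in the usage note are hypothetical. Nothing is renamed; the function only returns the pairs for inspection.

```python
import re


def numeric_key(name):
    """Sort key that orders 'img2.gif' before 'img10.gif'."""
    digits = re.findall(r"\d+", name)
    return (int(digits[-1]) if digits else 0, name.lower())


def plan_renames(gif_names, wanted_names):
    """Pair each converted GIF file with a name that RTFTOHTML links to.

    Assumption (hypothetical): both lists describe the same graphics
    in the same numeric order. Returns (old_name, new_name) pairs.
    """
    gifs = sorted(gif_names, key=numeric_key)
    wanted = sorted(wanted_names, key=numeric_key)
    if len(gifs) != len(wanted):
        raise ValueError("different numbers of GIF files and link names")
    return list(zip(gifs, wanted))
```

For example, plan_renames(["img10.gif", "img2.gif"], ["doc2.wmf", "doc1.wmf"]) pairs img2.gif with doc1.wmf and img10.gif with doc2.wmf. Check the pairs against the links in the generated HTML pages before renaming anything.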



Installing RTFTOHTML

Follow these steps:

1. Make a new directory called "R2H" in the top directory "C:\".
2. Search for "RTFTOHTML" with a web search engine.
3. Download the appropriate version (e.g. for Windows 95) into the R2H directory.
4. The downloaded file is most likely called r2hw95.zip, and must be unpacked using the
   programme pkunzip. This programme is very widely available, and is distributed along
   with many other items of software. Unpack the file r2hw95.zip by double-clicking it.
   When the dialogue box appears with the request to select a programme for processing the
   file, click on "Other" and then type in the filename pkunzip (if pkunzip is in the same
   directory) or the complete path and filename, such as C:\Programs\pkunzip (if this is
   where the programme is), and then click on "OK".
5. A DOS box (MS-DOS window) will appear, which will be rapidly filled with a list of
   filenames. When the unpacking process is finished (see the top bar of the DOS box) close
   the window in the usual way.
6. Read the licence text and register the software.
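
If pkunzip is not to hand, the archive can also be unpacked with a script. The following is a minimal Python sketch using the standard zipfile module; it is an alternative to the pkunzip procedure described above, not part of RTFTOHTML, and the function name is hypothetical.

```python
import zipfile
from pathlib import Path


def unpack_archive(archive_path, target_dir):
    """Extract every file in a ZIP archive into target_dir.

    The target directory is created if necessary. Returns the
    sorted list of member names found in the archive.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())


# Example call with the paths used in the steps above
# (adjust them to your own system):
# unpack_archive(r"C:\R2H\r2hw95.zip", r"C:\R2H")
```

The returned list of names can be compared with the file table which follows.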

At this point, the directory R2H should contain 56 files, of which the following will be
discussed:
      Type             Name                 Comment

      Archive          R2HW95.ZIP           See text
      Guide            GUIDE.RTF            RTF input format (can be loaded
                                            into Word)
                       GUIDE.HTM            Start file of HTML document
                                            generated by RTFTOHTML
                       GUIDE01.GIF          Graphics file for guide
                       GUIDE01.HTM          HTML page in guide
                       GUIDE02.GIF          Graphics file for guide
                       GUIDE02.HTM          HTML page in guide
                       GUIDE03.HTM          HTML page in guide
                       GUIDE04.HTM          HTML page in guide
                       GUIDE05.HTM          HTML page in guide
                       GUIDE06.HTM          HTML page in guide
                       GUIDE07.HTM          HTML page in guide
                       GUIDEFC.HTM          HTML page in guide
                       GUIDEFI.HTM          HTML page in guide
                       GUIDE_C.HTM          HTML page in guide
                       GUIDE_I.HTM          HTML page in guide
                       GUIDE_T.HTM          HTML page in guide
      Navigation       BLANKG.GIF           Graphics file for navigation
                       CONTG.GIF            Graphics file for navigation
                       DBLEDAG.GIF          Graphics file for navigation
                       INDEXG.GIF           Graphics file for navigation
                       LEFTG.GIF            Graphics file for navigation
                       NOLEFTG.GIF          Graphics file for navigation
                       NORIGHTG.GIF         Graphics file for navigation
                       NOTOPG.GIF           Graphics file for navigation
                       NOUPG.GIF            Graphics file for navigation
                       RIGHTG.GIF           Graphics file for navigation
                       SECTMARK.GIF         Graphics file for navigation
                       SIGMA_.GIF           Graphics file for navigation
                       TOPG.GIF             Graphics file for navigation
                       UPG.GIF              Graphics file for navigation
                       NAV-PANL             File for defining navigation
                                            graphics files
      Running          R2H95.BAT            Batch file for drag-and-drop
                                            running of R2HW95.EXE
                       R2HW95.EXE           The Windows 95 version of the
                                            RTFTOHTML programme




Adapting RTFTOHTML to your computer
A number of minor operations need to be performed in order to set up a useful
working environment. These are detailed in the following sections.

Creating a directory for the navigation buttons
Normally an HTML document created by RTFTOHTML must be located in a directory which
also contains a subdirectory images containing the GIF graphics files for the navigation
buttons. So, for example, before reading the Guide, create a new directory called
c:\R2H\images and copy all the GIF navigation button images into this directory. The
simplest way of ensuring that your HTML documents have access to the GIF files for the
navigation buttons is to copy this directory to the directory in which your own HTML files are
located.
However, there is a more general technique: the file which defines the navigation panel
graphics files (NAV-PANL) may be modified so that it refers to these files by a fixed location
which can be used anywhere on your computer, e.g. C:\R2H\images\... In this case, you do not
need to copy the graphics files into your HTML directories each time.
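
The simpler of the two techniques, copying the navigation graphics into an images subdirectory, can be sketched in Python as follows. The file names are taken from the file table above; the function name and the directory names in the usage note are hypothetical, and the sketch is not part of RTFTOHTML.

```python
import shutil
from pathlib import Path

# Navigation graphics named in the file table above.
NAVIGATION_GIFS = [
    "BLANKG.GIF", "CONTG.GIF", "DBLEDAG.GIF", "INDEXG.GIF", "LEFTG.GIF",
    "NOLEFTG.GIF", "NORIGHTG.GIF", "NOTOPG.GIF", "NOUPG.GIF", "RIGHTG.GIF",
    "SECTMARK.GIF", "SIGMA_.GIF", "TOPG.GIF", "UPG.GIF",
]


def copy_navigation_images(r2h_dir, html_dir):
    """Copy the navigation GIFs from r2h_dir into html_dir/images.

    Files that are missing from r2h_dir are skipped. Returns the
    names of the files that were copied.
    """
    images = Path(html_dir) / "images"
    images.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in NAVIGATION_GIFS:
        source = Path(r2h_dir) / name
        if source.is_file():
            shutil.copy2(source, images / name)
            copied.append(name)
    return copied
```

A call such as copy_navigation_images(r"C:\R2H", r"C:\MyDocs") would create C:\MyDocs\images if necessary. The graphics files of the Guide itself (GUIDE01.GIF, GUIDE02.GIF) are deliberately left out of the list.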

The Guide
Now click on the HTML start file for the Guide. This should start your web browser, such as
MS Internet Explorer or Netscape Navigator. You should see two frames, with the Contents
on the left and the first page on the right. At the top of the frames you should see navigation
buttons (currently light grey arrows and other symbols for contents and index).
The Guide contains information on the programme, licensing and usage. The present
outline does not duplicate information in the Guide, but contains additional practical
information about installation and use.

Optimal use of RTFTOHTML under Windows 95
The most practical way to use the RTFTOHTML software under Windows 95 is to create a
desktop icon with a link to the batch programme, and then drag your RTF file to this icon and
drop it on the icon when the icon is highlighted. A DOS box appears, the programme runs,
and the output is deposited in the directory which contains your RTF file. It is also useful to
create copies of the batch programme with different parametrisations of the RTFTOHTML
programme, for instance to produce single page output, to produce a hypertext with a contents
page, or to produce a hypertext with a frame for the contents page.

Adapting the batch file
However, the batch file first has to be modified slightly for your system:
1. Make a backup copy of the file r2h95.bat under a different name (to avoid bother if you
   make a mistake).
2. Using the Windows 95 Notepad editor, load the file r2h95.bat in the R2H directory.
3. Look for the following line:
      set RTFLIBDIR=D:\r2h3x\binary
4. Change it to read:
      set RTFLIBDIR=C:\R2H
5. Store the file.
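
The same change can be made with a script instead of Notepad. The following Python sketch assumes only that the batch file contains a line of the form set RTFLIBDIR=..., as quoted above; the function name is hypothetical, and the rest of the batch file is passed through unchanged.

```python
import re


def set_rtflibdir(batch_text, new_dir):
    """Return batch_text with the RTFLIBDIR assignment set to new_dir.

    Only lines of the form 'set RTFLIBDIR=...' are changed (matched
    case-insensitively); line endings and all other lines are kept.
    """
    pattern = re.compile(
        r"^([ \t]*set[ \t]+RTFLIBDIR=)[^\r\n]*",
        re.IGNORECASE | re.MULTILINE,
    )
    # A function is used as the replacement so that backslashes in
    # new_dir are inserted literally rather than read as escapes.
    return pattern.sub(lambda match: match.group(1) + new_dir, batch_text)
```

To apply it, read the text of the batch file, pass it through set_rtflibdir(text, r"C:\R2H"), and write the result back, keeping the backup copy from step 1.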

Creating a desktop icon
Create a shortcut to the batch file in your R2H directory. Then drag-and-drop the shortcut
from the R2H directory to the desktop (i.e. the background part of the Windows 95 screen) by
clicking on it, holding down the left mouse button, and moving the cursor (which turns into a
special warning image) to the desktop, then letting go. Take care NOT to drop it on a
directory icon, or it will land inside the directory, not on the desktop. After putting the
programme icon on the desktop you can move it around to the optimal position for your style
of working.

Customising the desktop icon
Follow these steps to change various properties of your desktop icon and the
programme attached to it:

1. Change the name on the icon, if you wish, by clicking on the icon name and then, after a
   short pause (in order to avoid the double-click effect), clicking again. You can then type
   in a new name, such as simply R2H.
2. Change other properties by double-clicking on the icon. A DOS box then appears, and
   you can click on the "File/Properties" menu item in order to do this.
3. Change the icon graphic by clicking on the symbol change button and selecting a new icon
   from the icons provided.
4. You can also ensure that the DOS box disappears automatically after the programme has
   been run by clicking on the appropriate box. However, do not do this until you have got
   used to the operation of the programme and the DOS box.


Creating your RTF document with Word
In order to create a hypertext document, the Word document must be properly structured with
text objects and not simply constructed with local markup.
1. A text object is, for example, a format template for headings, or a numbered list or bullet
   list, or a table.
2. Local markup is either highlighting with bold face or italics, or layout with TAB stops
   or sequences of spaces.
3. ALWAYS use a text object, if possible.
4. NEVER use local markup to create the impression of structure if a text object is available.
5. Store your finished document in RTF format.
Only if you follow these principles will you achieve acceptable results. In a sense, the creation
of a structured document is a form of applied text linguistics. After all, hypertext is text, and
text is an important object of linguistic description.
Finally, to produce and examine the HTML document follow these steps:
1. Open a directory display of the directory in which your RTF document is located and
   adjust the size and location of the display so that the RTFTOHTML desktop icon is visible.
2. Drag and drop the RTF document icon on to the RTFTOHTML desktop icon.
3. When the conversion process has finished and the DOS box is closed, refresh the directory
   display for your RTF document directory, and click on the Web icon with the same name
   (which should have an Internet Explorer or Netscape Navigator icon, depending on the
   configuration of your computer, if you already have a browser). The file has the extension
   .HTM or .HTML. If you have selected a configuration option (see below) which produces
   more than one file, make sure you select the file with the start page; this file will have
   exactly the same name as your own file (but of course with the extension .HTM or .HTML
   instead of .RTF).
4. If you do not have a browser, you will need to copy all the .HTM or .HTML files to a
   floppy disk and transfer them to a computer which does have a browser in order to look at
   the result.

Conversion options with RTFTOHTML
When the RTFTOHTML programme (from now on referred to as "r2h") is run with no
filename, it displays the following message, which lists the command line options that define
the form of the output of the programme:

Options may be:
     -c        generate a Table of Contents page
     -F        use frames (req. -c and Netscape 2.0 or
               compatible)
     -G        write no graphics files
     -h[n]     split output at headings (at the n'th level)
     -i        imbed graphics
     -N file   read navigation panel from "file"
     -o file   write output to "file"
     -P ext    use "ext" as extension for external graphic
               files
     -s        use short filenames when splitting
     -t        list external references on top of page
     -T title  use "title" as the document title
     -V        print version number of rtftohtml (only)
     -x        create an Index (if there are index entries)
     -X text   use "text" as the text for index anchors (e.g.
               ·)
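
The one dependency stated in this message, that -F requires -c, is easy to overlook when option sets are written by hand. The following Python sketch checks it before a command line is assembled. The function names are hypothetical, the check covers only the constraint quoted above, and the placing of the options before the file name is an assumption taken from the batch file lines quoted later in this section.

```python
def check_options(options):
    """Check a list of r2h options against the usage message.

    Returns a list of problems found; an empty list means that no
    conflict with the constraint quoted above was detected.
    """
    problems = []
    flags = {opt for opt in options if opt.startswith("-")}
    if "-F" in flags and "-c" not in flags:
        problems.append("-F (frames) requires -c (Table of Contents page)")
    return problems


def build_command(program, rtf_file, options=()):
    """Return the argument list: program, options, then the RTF file."""
    problems = check_options(list(options))
    if problems:
        raise ValueError("; ".join(problems))
    return [program, *options, rtf_file]
```

For example, check_options(["-F"]) reports the missing -c, whereas check_options(["-c", "-F", "-h6", "-i"]) returns an empty list.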

If no parameters are used, a single page of HTML output, with embedded tables of contents,
is produced. Other hyperdocument formats are produced using specific parametrisations. It is
very convenient to keep several copies of the batch file, each with its own desktop icon and
its own parametrisation. To do this, copy the batch file, load each copy into the Notepad
editor, and modify it as follows:
1. Look for the following line:
    %RTFLIBDIR%\R2Hw95.exe "%1"
2. Change the line to include the required parameters.

Useful parameters are:
   Single page            %RTFLIBDIR%\R2Hw95.exe "%1"
   Standard hypertext     %RTFLIBDIR%\R2Hw95.exe -c -h -i "%1"
   Contents frame         %RTFLIBDIR%\R2Hw95.exe -c -F -h6 -i "%1"
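
The copies of the batch file can also be generated rather than edited by hand. The following Python sketch inserts an option set into the line which starts the converter. It assumes that this line contains the text R2Hw95.exe "%1", as quoted above. The function name and the three output file names are hypothetical; the option sets are those listed above.

```python
def add_options(batch_text, options):
    """Insert options into the line that starts the converter.

    The line is recognised by the text 'R2Hw95.exe "%1"' (compared
    case-insensitively); the options are placed before "%1".
    """
    marker = 'r2hw95.exe "%1"'
    lines = []
    changed = False
    for line in batch_text.splitlines(keepends=True):
        position = line.lower().find(marker)
        if position != -1 and options:
            split_at = position + len("r2hw95.exe")
            line = line[:split_at] + " " + " ".join(options) + line[split_at:]
            changed = True
        lines.append(line)
    if options and not changed:
        raise ValueError("converter line not found")
    return "".join(lines)


# Hypothetical file names; -F selects frames, per the options list.
VARIANTS = {
    "r2h_single.bat": [],
    "r2h_split.bat": ["-c", "-h", "-i"],
    "r2h_frames.bat": ["-c", "-F", "-h6", "-i"],
}
```

Writing add_options(text, options) to each file name in VARIANTS gives one batch file per output format, and each can then be given its own desktop icon.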


Advanced uses
For more advanced uses, including the definition of additional cross-references and other
links within the Word document in order to create a more sophisticated hyperdocument
structure, the Guide should be consulted.

								