FIGIS XML Quick Start Guide

Management Summary

Author Version Project Distribution Created Saved Printed

Yves Jaques 1.0 FIGIS – XML DTD’s 29 May 2001 9:37 AM 24/10/2009 08:58:00

Revision History

Date                  Author     Summary
24/10/2009 08:58:00   Y. Jaques  Update following user comments.
13/11/2002            Y. Jaques  Update for dev site revision

FIGIS PROJECT

FAO

Contents

1. Introduction
2. XML structure
3. Install XML Spy
4. Configure XML Spy
5. Open an XML document
6. Navigate the document structure
7. Elements
8. Attributes
9. Entities
10. Testing an XML document
11. Entering text and elements into an XML document
12. Starting a new blank XML document
13. Converting an existing electronic document into an XML document
14. Creating an XML document from scratch (without importing an existing electronic document)
15. Printing an XML document
16. Transforming XML to HTML
17. Document tagging rules
18. Further exploration
19. Further reading


1. Introduction

This guide demonstrates the basic steps involved in creating an XML document:

• Opening an XML document
• Assigning an XML DTD (rules file) to an XML document
• Tagging and structuring an XML document

2. XML structure

XML is more than just a new software language for creating documents. It is a data solution, one that allows for easy cross-platform data input and output. XML simplifies the creation of human-readable structured data that can be formatted in many different ways.

Three types of file are involved:

• XSL files (extension .xsl)
• DTD files (extension .dtd)
• XML files (extension .xml)

This guide focuses on XML files only.


3. Install XML Spy

This guide uses Altova’s XMLSpy XML editing software for its examples. A free 30-day trial version is available from the Altova site: http://www.altova.com

4. Configure XML Spy

To use XMLSpy as described in this document, a few parameters must be configured.

1. Start the XMLSpy application
2. From the Tools menu select Options
3. There are four important tab screens that must be set:
   • File
   • File Types
   • Editing
   • Encoding
4. On each of the tab screens, match your settings to the ones pictured below:

Be sure to select XML as your file type before making selections in this tab screen.


5. Open an XML document

Sample XML files are included on the FIGIS development site at the following location: http://www.fao.org/fi/figis/devcon/document/template/xmlsample.zip

1. Download the file and place the unzipped xmlsample directory wherever you like on your local drive
2. Start the XMLSpy application
3. From the File menu select Open
4. Browse to the xmlsample directory and double-click the fishing techniques file 2pair_seining.xml
5. You should see the following file open on your screen:

6. Navigate the document structure

All XML files have a hierarchical structure of elements that is defined by the assigned DTD. This structure may be flat or deep. Pictured below is a simplified version of the FIGIS DTD structure for Fishing Techniques:

1. To begin taking a look at the structure of 2pair_seining.xml, highlight the fi:FishTechnique element and click the small arrow to the left of it. The element expands to show you its structure


2. The structure for fi:FishTechnique is a standard structure found within most FIGIS domains:
   • The Ident block contains a group of elements (keys) used to identify the object that is being created by the XML document:
   • The Profile block contains elements intrinsic to the object; elements by which it is defined rather than identified:
   • The Features block contains elements extrinsic to the object; elements which relate to it but do not identify or define it:
   • At the top of every fi:FIGISDoc, before the domain, are the DataEntry and ObjectSource elements. The DataEntry block contains the name of the editor together with the version and date for the XML file. The ObjectSource block contains elements that identify the information sources used to create the object. Source may appear in many places. It always refers directly to the element in which it is found and, by default, to any sub-elements unless those sub-elements contain source elements of their own:


3. It is a good idea to look at the actual XML. The view examined thus far is a graphic representation of it. To see what an XML document looks like in its raw state, go to the View menu and click Text view. The beginning of your document should look similar to the one below:

   <?xml version="1.0" encoding="iso-8859-1"?>
   <!DOCTYPE fi:FIGISDoc SYSTEM "http://www.fao.org/fi/figis/devcon/dtd/beta/3.0b/Figisdoc3.0.dtd">
   <fi:FIGISDoc
       xmlns:ags="http://www.fao.org/agris/agmes/"
       xmlns:cds="http://193.43.36.85/fi/figis/devcon/"
       xmlns:dc="http://purl.org/dc/elements/1.1/"
       xmlns:fi="http://193.43.36.85/fi/figis/devcon/"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:dced="http://dublincore.org/documents/2000/10/05/educationnamespace/"
       xmlns:agls="http://www.naa.gov.au/recordkeeping/gov_online/agls/1.1"
       xmlns:fint="http://www.fao.org/fi/figis/internal/">
     <fi:DataEntry>
       <fi:Editor>Valerio Crespi</fi:Editor>
       <dc:Date>2001</dc:Date>
     </fi:DataEntry>

   • Each <element> is a part of the FIGIS DTD structure. A well-formed element contains an opening tag <element> and a closing tag </element>. Any content or nested elements fit between these tags:

     <fi:BiblioEntry_Link>
       <dc:Creator>Yves Jaques</dc:Creator>
       <dc:Date>2001</dc:Date>
     </fi:BiblioEntry_Link>

4. On the right-hand side of the screen is a long list of elements. This is a list of every XML element in the FIGIS system (approx. 800)
5. To return to the previous view, go to the View menu again and select Enhanced Grid view
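The nesting shown in the grid view can also be inspected outside XMLSpy. The following is a minimal sketch in Python using only the standard library; the sample string is a cut-down stand-in for a FIGIS document (the namespace URIs are the ones quoted above, everything else is illustrative), and the helper names local_name and outline are hypothetical.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for a FIGIS document. Only the two namespace URIs come
# from this guide; the document itself is illustrative and has no DTD.
SAMPLE = """<fi:FIGISDoc xmlns:fi="http://193.43.36.85/fi/figis/devcon/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
  <fi:DataEntry>
    <fi:Editor>Valerio Crespi</fi:Editor>
    <dc:Date>2001</dc:Date>
  </fi:DataEntry>
</fi:FIGISDoc>"""


def local_name(tag):
    """Strip the '{namespace-uri}' part that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def outline(element, depth=0):
    """Return one indented line per element, parent first, children below."""
    lines = ["  " * depth + local_name(element.tag)]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(ET.fromstring(SAMPLE))))
```

Each level of indentation in the printed outline corresponds to one level of expansion in the Enhanced Grid view.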


7. Elements

The fundamental building block of an XML document is the element. It may contain text and child elements, and it may also have attributes.

1. Highlight the fi:FishTechnique element and look at the upper right-hand tab screen. This screen contains the list of elements that may be inserted into the document at the present point. Depending on the tab, this list contains either the elements that can be inserted at the same level or those that can be inserted one level down, as children of the current element:
2. Click the tabs along the bottom to select from the three different lists:
   • Append indicates the elements that may be inserted after the current element
   • Insert indicates the elements that may be inserted before the current element
   • Add child indicates the elements that may be inserted as children of the current element
3. Double-click some elements in the element tab screen. They will appear within the document
4. Highlight the Add child tab screen and then try double-clicking the box on the right-hand side of one of the elements currently found in the document. If the element can contain other elements, the list of allowable child elements will update in the tab screen. If the element is non-structural and may only contain text, the word Text will be displayed
5. Try double-clicking the left-hand side of the element, i.e. its name. Instead of a list of child elements, the drop-down box will show the other elements that could appear in the current position:


8. Attributes

Attributes are content only. Unlike elements, they may not contain other elements and have no structure. They are used in FIGIS primarily to provide meta-information about elements; i.e. they generally describe the element itself (its name, its function) rather than the content in the element or document.

1. Below the element tab screen there is a second tab screen. It contains the list of attributes for the currently highlighted element. The attributes vary from element to element; however, three attributes are always present:
   • Style - for layout purposes one of several pre-defined styles may be added to an occurrence of an element, e.g. bullet, paragraph level
   • xml:lang - allows the user to declare the language found within the tag
   • FID - this is an internal ID assigned by the system. As an attribute it is almost never completed by the user
   Keep in mind that the above attributes are only the default global attributes. Many elements contain additional specific attributes.
2. Try double-clicking the Style attribute. It will appear within the document
3. Double-click the box to the right of =Style. A drop-down box appears with the available choices for the attribute:
   • Other attributes may contain fixed values or accept free text
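To make the difference between an element's attributes and its content concrete, here is a small Python sketch using the standard library. The element name fi:Text and the attribute value "bullet" are assumptions chosen for illustration and are not taken from the FIGIS DTD; Style and xml:lang are the attribute names described above, and describe is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is always bound to this namespace by the XML specification.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical element carrying two of the global attributes described above.
SAMPLE = """<fi:Text xmlns:fi="http://193.43.36.85/fi/figis/devcon/"
         Style="bullet" xml:lang="en">Pair seining</fi:Text>"""


def describe(element):
    """Separate the meta-information (attributes) from the content (text)."""
    return {
        "style": element.get("Style"),
        "lang": element.get("{%s}lang" % XML_NS),
        "text": element.text,
    }


if __name__ == "__main__":
    print(describe(ET.fromstring(SAMPLE)))
```

The attributes describe the element, while the text between the tags remains the content.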


9. Entities

Entities have a variety of uses, but for the XML end-user they function as pre-defined blocks of information that can be dropped into a document. They work well as automation tools that reduce keystrokes, as in the case of repeated bibliography declarations.

1. Below the attribute tab screen there is a third tab screen. It contains the list of entities for the currently highlighted element
   • Many entities in FIGIS are used to support accented characters
2. Scrolling down will reveal a number of other entities, such as SOU.CRUSTACEAL, which contain much-used bibliographic references
   • Double-clicking an entity within a text field inserts the entity name into the XML document
   • When the document is parsed (processed together with the DTD and an XSL stylesheet to produce the final product), all the information in the entity is inserted into the final product in place of the entity name
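The substitution that happens at parse time can be observed with a short Python sketch. The entity SOU.EXAMPLE and its replacement text are invented for illustration, and the entity is declared inside the document itself (an internal DTD subset) rather than in the external FIGIS DTD, because the standard-library parser used here does not load external DTDs by default.

```python
import xml.etree.ElementTree as ET

# Hypothetical entity declared in an internal DTD subset. In FIGIS the
# equivalent declarations live in the external DTD instead.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY SOU.EXAMPLE "Example Author, 2001. Example title.">
]>
<note>Source: &SOU.EXAMPLE;</note>"""


def expanded_text(xml_text):
    """Parse the document and return the root text with entities expanded."""
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    print(expanded_text(SAMPLE))
```

The document stores only the short entity name; the parser supplies the full replacement text.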

10. Testing an XML document

One of the strengths of XML is that, for a document to be parsed (processed), it must conform to two sets of rules: the generic rules of XML and the specific rules of the assigned DTD.

1. The XML document that is currently open in XMLSpy is well-formed and valid
   • Well-formed means that the XML conforms to the general syntactical rules of XML
   • Valid means that it conforms to the specific grammatical rules of the DTD to which it is linked. For 2pair_seining.xml this means that the structure conforms to the Fishing techniques DTD created by FIGIS for the production in XML of the Fishing techniques Fact Sheets
2. Test whether the XML is well-formed by clicking the yellow Well-formed checkmark in the toolbar
3. Test the validity of the XML by clicking the green Validation checkmark in the toolbar
4. The document should pass both tests. If not, experiment with repairing it. The parser will highlight problem areas and produce (hopefully) helpful error messages at the bottom of the screen. If you cannot repair it, click File and Reload
   • The error messages produced by the parsers are not always correct
   • The actual error may often be a missing element that comes before the element indicated by the error message
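The first of the two tests can be reproduced with any XML parser. The sketch below uses Python's standard library and covers well-formedness only; that library does not validate against a DTD, so the validity test still needs XMLSpy or another validating parser. The function name check_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, 'well-formed') or (False, parser message with position)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message includes the line and column where the parser gave up,
        # which may come after the place where the real mistake was made.
        return False, str(err)
    return True, "well-formed"


if __name__ == "__main__":
    print(check_well_formed("<a><b>text</b></a>"))
    print(check_well_formed("<a><b>text</a>"))  # closing tag for <b> is missing
```

As noted above for XMLSpy, the reported position marks where the parser detected the problem, not necessarily where the missing tag belongs.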


11. Entering text and elements into an XML document

Entering text in an XML document is somewhat more difficult than just typing away in a word-processing program. This is because an XML document is a structured document; i.e. it is a database that resembles a document. Though it is initially more work to produce a document this way, the overall benefits are great.

• An XML document is made up of plain, tagged text. This means that the base document can be opened and read by any software program that reads text: word-processing programs, browsers, notepads, etc.
• XML is neither platform-specific nor software-specific. Though this guide uses XMLSpy to orient the new user, there are numerous XML editors for every platform: PC, Mac, Linux, Unix, Java, Solaris, etc. XML files can be moved seamlessly from platform to platform and software to software. Your data is no longer trapped within a single product or platform
• Within an XML editor the document also becomes a database object that can be easily manipulated, stored and retrieved
• Any tagged keywords can be linked to other database resources
• The document can be reproduced in many different forms by being processed with various XSL stylesheets to create hardcopy print, web HTML, Acrobat PDF, etc.
• As a database of XML documents is built, future documents containing shared information need only refer to the already entered information. This means a great reduction in duplication of effort

Several XML tagging ground rules:

• When tagging keywords, try to leave out any extraneous characters such as commas, periods and parentheses
• For long blocks of text, it is not necessary to press <enter> at the end of each typed line. Line breaks can be handled later when the document is processed with XSL
• Remember that formatting is not really a part of XML. XML is designed to create structured data that looks somewhat like a document. XML is a database. It is the XSL transformation via the parser that takes an XML document and turns it into an end product like a print document or web page

1. Try adding text, elements and attribute values to the document. The program will tell you quickly when you have broken the rules contained in the DTD
   • In the following example, within the parent element FishTechniqueIdent, the element GeartypeRef (English) was accidentally inserted in place of Name. This can happen in several ways, for example if the editor is in Text view (as in section 6, step 3) or if an element name is double-clicked in the document and then retyped incorrectly:


   • The element tab screen stops displaying the available elements and produces an error message:
   • The error message often pinpoints the problem, allowing for quick repair

12. Starting a new blank XML document

In the preceding examples a pre-prepared XML document (2pair_seining.xml) was used. Unless you're working with multiple documents that are all in the same rigid format, it's often easier to build the document from scratch, adding elements as content is added. The FIGIS development site contains blank templates for each version of the DTD. Included in the ZIP file is a blank template for the current version 3.0b of the DTD.

1. Open XMLSpy
2. From the File menu select Open
3. Browse to the xmlsample directory and double-click the Blank3.0.xml file
4. You should see the following file structure open on your screen (the top namespace and doctype declarations have been removed for clarity):

   <fi:DataEntry>
     <fi:Editor></fi:Editor>
     <dc:Date></dc:Date>
   </fi:DataEntry>
   <fi:ObjectSource>
     <fi:Owner>
       <fi:ProgrammeRef CodeSystem="Acronym" Code="">
         <fi:ParentInstitution>
           <fi:InstitutionRef CodeSystem="Acronym" Code="">
             <fi:Name></fi:Name>
             <fi:LandPoliticalRef CodeSystem="ISO2" Code=""></fi:LandPoliticalRef>
           </fi:InstitutionRef>
         </fi:ParentInstitution>
       </fi:ProgrammeRef>
     </fi:Owner>
     <fi:CoverPage>
       <dc:Title></dc:Title>
       <dc:Creator></dc:Creator>
       <dc:Date></dc:Date>
     </fi:CoverPage>
   </fi:ObjectSource>

5. The procedures vary depending on whether you are converting an existing electronic document (go to section 13) or creating an entirely new document (go to section 14).


13. Converting an existing electronic document into an XML document

This section assumes that XMLSpy is open with a blank new document as described in section 12.

1. Open the existing source document that is to be put into XML using the program in which it was created, e.g. Microsoft Word, and copy its text to the clipboard
2. Switch to XMLSpy
3. Double-click the box to the right of the fi:FIGISDoc element and paste the clipboard contents into it
4. At this point it's time to decide on the domain. Highlight the fi:FIGISDoc element and click the Add child tab in the element tab screen:
5. The program will display a list of available FIGIS domains. Select the domain by double-clicking it. It should appear in the document together with its elementIdent block
6. At this point, elements can be added to the structure piece by piece. From any point in the document, the element tab screen will display only the elements allowable at that moment
   • Text can be cut and pasted into the element fields from the source document to the new XML document
   • Keep in mind that many of the higher-level (parent) elements cannot contain text and may only contain other elements. If an element may contain text there will be a selectable field at the bottom of the element tab screen named text
   • To study an overall view of the fi:FIGISDoc structure you may open the included schema (an .xsd file), Figisdoc3.0schema.xsd, also found in the zip file
   • If the Enhanced Grid view gets too cumbersome you may always switch to the Text view under the View menu. Bear in mind, however, that in the Text view XMLSpy displays all the elements in the FIGIS system, making it difficult to select the correct element. Text view can be very useful after the element structure is more or less complete, as it is easier to drag text from tag to tag in this view

14. Creating an XML document from scratch (without importing an existing electronic document)

This section assumes that XMLSpy is open with a blank new document as described in section 12.

1. Decide on the domain. Highlight the fi:FIGISDoc element and click the Add child tab in the element tab screen
2. The program will display a list of available FIGIS domains. Select the domain by double-clicking it. It should appear in the document as a child element of fi:FIGISDoc
3. At this point, elements can be added to the structure piece by piece. The element tab screen will display only the allowable elements
   • Many of the higher-level (parent) elements cannot contain text and may only contain other elements. If an element may contain text there will be a selectable field at the bottom of the element tab screen named text
   • To study an overall view of the fi:FIGISDoc structure you may open the included schema Figisdoc2.0.xsd included in the zip file
   • If the Enhanced Grid view gets too cumbersome you may always switch to the Text view under the View menu. Bear in mind, however, that in the Text view XMLSpy displays all the elements in the FIGIS system, making it difficult to select the correct element. Text view is, however, very useful after the element structure is more or less complete, as it is easier to drag text from tag to tag in this view
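For readers who want to see what "adding elements piece by piece" amounts to in the underlying XML, the following Python sketch builds the first few elements of a document programmatically. It is an illustration only: the namespace URIs and element names are the ones shown earlier in this guide, build_skeleton is a hypothetical helper, and the result has no DOCTYPE and no domain element, so it would not validate against the FIGIS DTD.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as quoted in the sample document earlier in this guide.
FI = "http://193.43.36.85/fi/figis/devcon/"
DC = "http://purl.org/dc/elements/1.1/"

# Ask the serializer to use the familiar fi: and dc: prefixes.
ET.register_namespace("fi", FI)
ET.register_namespace("dc", DC)


def build_skeleton(editor, year):
    """Create fi:FIGISDoc with a filled-in fi:DataEntry block as its child."""
    root = ET.Element("{%s}FIGISDoc" % FI)
    entry = ET.SubElement(root, "{%s}DataEntry" % FI)
    ET.SubElement(entry, "{%s}Editor" % FI).text = editor
    ET.SubElement(entry, "{%s}Date" % DC).text = year
    return root


if __name__ == "__main__":
    # "A. Editor" is placeholder content, not a real FIGIS editor name.
    print(ET.tostring(build_skeleton("A. Editor", "2001"), encoding="unicode"))
```

Unlike XMLSpy, this code does not consult the DTD, so nothing stops it from producing a structure the DTD would reject. That check is what the element tab screen provides.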


15. Printing an XML document

Printing while working can be helpful when the structure gets cumbersome or the document gets long. There are various ways to print an XML document.

• Since XML is plain text, it can be printed as such within XMLSpy, or within any ordinary text processor like Notepad or Word
• Many XML editors offer special printing modes that can enhance the readability of XML-tagged text. In XMLSpy, when printing from the Enhanced Grid view, the following print selections generally provide the best end product:

16. Transforming XML to HTML

For an XML file to be formatted for the web or other purposes, it must be linked to a stylesheet that formats the data found in the XML document. The xmlsample zip contains an example of the transformation process:

• Open the file sample.xml with XMLSpy
• You should be able to select View, then Browser view, and see the page formatted as HTML
• Open the file sample.xsl with XMLSpy. This is the file that formatted the XML into HTML

17. Document tagging rules

There are several common categories of elements used in FIGIS XML that must be understood in order to correctly tag and create a FIGIS XML document:

• element_Ident -- whenever this block appears in a document it signals the intention to create a database object. These blocks are found only within root domain elements, where they are mandatory. For example, the root domain fi:FishTechnique must contain fi:FishTechnique_Ident. When the system finds it in a document it will create a database object out of whatever object is identified in the _Ident block. If the object already exists, the database will reject both versions for examination by a fisheries expert
• element_Ref -- this block is used to positively link a container to an existing database object. It contains all the available classification systems for its domain, of which at least one must be completed. These blocks are used in the light versions of root domains when the goal is to use a domain's structure but not create a new object. They are also used widely in assembling compound domains. For example, the fi:AqStock identity block consists of a fi:AqSpecies_Ref and an fi:Area_Ref, since a stock is the conjunction of a species and an area. In this way, positive references to existing database objects create a new hybrid/compound object
• L_element -- this is a keyword tagging element. It is used to tag the names of possible database objects found in a text. There is no requirement to know whether or not the tagged name corresponds to an actual database object. It merely instructs the system to attempt to link the word to an existing object. If none is found it doesn't matter. For example, while creating a species fact sheet several geartypes may be mentioned. These geartypes may be tagged with fi:L_Geartype whether or not the user knows that they exist in the database or are even named correctly in the source documentation. Tagged keywords can also be grouped by putting them inside the container fi:KeywordGroup
• H_element -- these are used to highlight important terms or phrases that do not link to any database objects. They may be handled by the system in different ways depending on the desired output document. For example, a list of fisheries management strategies could all be tagged using fi:H_Strategy, thus allowing them to be handled as a set of strategies when parsing the document. These elements may also be grouped together within fi:KeywordGroup
• element_Link -- link is used as a lightweight way of referencing a related compound domain. It is used when the user would like to capture a possible compound link but does not want the trouble of positively referencing an object with a _Ref block. For example, while creating a stock observation a user may find information on a fishery. Instead of positively referencing the fishery with a _Ref block, the user can open fi:Fisheries_Link and tag keywords with tags like fi:L_FishTechnique, fi:L_Area, fi:L_Vesseltype, etc. The system will take the tagged keywords and consider them together when searching for a matching fishery in the database
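The naming convention makes tagged keywords easy to pick out mechanically. The Python sketch below collects every element whose name starts with a given prefix such as L_ or H_. The tags fi:L_Geartype and fi:H_Strategy are the ones named above; the container fi:Text, the sample sentence and the helper tagged_keywords are assumptions for illustration, and this is not how the FIGIS system itself performs its linking.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment: fi:Text is a hypothetical container element.
SAMPLE = """<fi:Text xmlns:fi="http://193.43.36.85/fi/figis/devcon/">The species is taken with
<fi:L_Geartype>bottom trawls</fi:L_Geartype> and
<fi:L_Geartype>gillnets</fi:L_Geartype> under a
<fi:H_Strategy>seasonal closure</fi:H_Strategy>.</fi:Text>"""


def tagged_keywords(root, prefix):
    """List (element name, keyword) pairs for names starting with prefix."""
    found = []
    for element in root.iter():
        name = element.tag.rsplit("}", 1)[-1]  # drop the namespace part
        if name.startswith(prefix):
            found.append((name, (element.text or "").strip()))
    return found


if __name__ == "__main__":
    document = ET.fromstring(SAMPLE)
    print(tagged_keywords(document, "L_"))
    print(tagged_keywords(document, "H_"))
```

A list like this is a convenient way to review the keywords in a document before it is submitted.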

18. Further exploration

This guide is designed to get users started tagging XML documents. Below are several suggestions to broaden the user's knowledge and understanding of XML.

1. The best way to learn how to create XML documents is to examine those that have already been created
   • The zip file also contains a set of XML files created by FIGIS editors
   • Examine in particular the different ways in which the element_Ident blocks are created. These blocks are fundamental to the correct categorisation of an XML document within the database
   • Examine the use of the various L_element keyword tags. They are used to provide linking to other documents. They create a database-wide index that is crucial to resource indexing
   • Examine the element_Ref blocks. They are used singly to positively link to existing database objects, and are used together to create compound objects, e.g. fi:Vesseltype_Ref + Geartype_Ref = ExploitationUnit. Studying the interplay of the _Ident and _Ref elements is key to understanding the entire FIGIS information structure
2. Use templates
   • To speed the creation of new documents, create basic templates like the example template used in this guide
   • When saving, use the Save as command to save under a new file name, leaving the template unchanged
3. Take a look at the FIGIS XML-driven site: www.fao.org/fi/figis
   • Navigate to any of the fact sheets for aquatic species, vesseltypes, geartypes or Fishing Techniques. All of these fact sheets have been prepared, from data input to storage to output, using XML


19. Further reading

There are numerous important resources on the FIGIS development site: http://www.fao.org/fi/figis/devcon. The WWW also has many XML sites that may be of help in understanding XML.

1. Included on the FIGIS development web site:
   • FIGIS XML -- ‘FIGIS DOC’ XML DTD Structure Design Patterns; FIGIS XML Overview _v1.1.doc
   • Preparing data to be loaded for the Thematic Fact Sheets; data_preparation1-2.doc
   • Tagging documents: do's and don't; XMLdosanddont1-0.doc
2. Other resources on the Web:
   • http://zvon.org/
   • http://www.xmlinfo.com
   • http://www.oasis-open.org/cover/xml.html
   • http://www.w3.org/XML
