XML, DOM and Visitor Design Pattern


Markup languages
• A markup language is used to tell a printer (a person!) how to lay out text on the page.
• SGML: from about 1980
  • usual complaint: “too heavyweight”, which means “hard”
• HTML: much looser, therefore many users
• XML: structure allows description of data
  • need description of “tags”

An HTML example
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>
CSC207H: E5
</title>
</head>
<body>
<h1>
CSC207H: E5
</h1>
<p>
<strong>Due date: 10:00 a.m., Thursday, March 11, 2010.</strong>
</p>
</body>
</html>

From HTML to XML
• Factors leading to the creation of XML
  • Problems with HTML
    • primarily presentation
    • hard to derive meaning from the markup
    • fixed tag set
  • Web browsers were being viewed as potential application platforms

Basic Format
• Element: <tag>content</tag>
  • basic unit
  • tag name defines what the content is
  • opening and closing tags enclose content
• Attribute: Information about the data
  • Attribute names are usually adjectives
  • Stored as attribute="value" pairs:
    <tag colour="red">
    content
    </tag>

Rules for well-formed XML
• Elements that contain data must have start and end tags
• Empty tags must be closed
  • <br /> or <br></br>
• Elements should not overlap
  • Bad Nesting: <trunk> <branch> </trunk> </branch>
• All attribute values must be wrapped in quotes
  • <a href="newpage.html">
• XML is case sensitive (unlike HTML): <TAG> and <Tag> are treated differently.
  • Standard: use lower case.
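
The rules above can be checked mechanically by handing a string to an XML parser and seeing whether it is accepted. The following is a minimal sketch, assuming Python's standard xml.dom.minidom module; the helper name is_well_formed and the sample strings are illustrative only.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False


# One sample per rule; only the first one obeys the rules.
samples = {
    "closed empty tag": "<p>line<br /></p>",
    "bad nesting": "<trunk><branch></trunk></branch>",
    "unquoted attribute": "<a href=newpage.html>link</a>",
    "case mismatch": "<TAG>content</Tag>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the first sample is accepted; each of the others breaks exactly one rule from the list.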

More Rules
• A document begins with:
  • an XML Declaration
    <?xml version="1.0" encoding="UTF-8"?>
  • and perhaps a DocType Declaration:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
• Root element immediately follows; encloses entire content of the document.
    <book>
    everything that’s part of the book
    </book>

Document Object Model (DOM)
• Cross-language API for representing XML documents as trees
• Easier to manipulate than strings or streams
• But may require a lot of memory
• Several implementations in Java
  • E.g., org.jdom
• In Python, xml.dom is standard
  • xml.dom.minidom doesn’t have everything, but is easy to use and fast.
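
As a rough illustration of what "representing XML documents as trees" looks like in practice, here is a small sketch using Python's xml.dom.minidom. The sample document in BOOK is made up for this example; only standard-library calls are used.

```python
from xml.dom import minidom

# A made-up sample document: declaration first, then the root element.
BOOK = """<?xml version="1.0" encoding="UTF-8"?>
<book>
<h1>First heading</h1>
<p>First <em>paragraph</em>.</p>
</book>"""

doc = minidom.parseString(BOOK)      # the whole tree is built in memory
root = doc.documentElement           # the root element, here <book>

print(type(doc).__name__)            # class of the object representing the document
print(root.tagName)                  # name of the root element

# Tree access replaces string searching: ask for all <h1> elements.
headings = root.getElementsByTagName("h1")
print(headings[0].firstChild.data)   # text inside the first <h1>
```

Because the parser builds the entire tree before returning, this style is convenient but can use a lot of memory on large documents.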

Tree Structure
Let’s look at this document:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<body>
<h1>Title</h1>
<p>A <em>word</em></p>
</body>
</html>

DOM Rules
• Every document becomes an object of type Document
• This has a single child of type Element
  • The root element of the document
• Its children may be:
  • Other elements
  • Text objects
  • Other things that we won't worry about
• Note: white space is preserved
  • For example, the newlines in the previous slide
• But comments are not
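
To see the element and text children, including the preserved newlines, the sketch below lists the children of <body> with Python's xml.dom.minidom. It assumes a simplified copy of the document (the XML and DocType declarations are left out so that the root element is the document's only child); describe_children is a hypothetical helper name.

```python
from xml.dom import minidom, Node

# Simplified copy of the sample document (declarations omitted on purpose).
SOURCE = """<html>
<body>
<h1>Title</h1>
<p>A <em>word</em></p>
</body>
</html>"""


def describe_children(element):
    """Return (kind, value) pairs for the element and text children."""
    result = []
    for child in element.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            result.append(("Element", child.tagName))
        elif child.nodeType == Node.TEXT_NODE:
            result.append(("Text", child.data))
    return result


doc = minidom.parseString(SOURCE)
body = doc.documentElement.getElementsByTagName("body")[0]

for kind, value in describe_children(body):
    print(kind, repr(value))   # repr makes the newline text visible as '\n'
```

The newlines between the tags show up as Text children of <body>, sitting between the h1 and p elements.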

Using JDom
import java.util.Iterator;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;

public class ShowElements {  // class name is illustrative
    public static void main(String[] args) {
        try {
            String filename = args[0];

            // Build the DOM tree
            SAXBuilder builder = new SAXBuilder();
            Document doc = builder.build(filename);

            // Show top-level elements (next slide)

        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

Iterate over children
// Show top-level elements
// Get the root element
Element root = doc.getRootElement();
// Get all children (excluding text).
// (jdom isn’t 1.5-happy: the list is not generic, hence the cast below.)
Iterator ic = root.getChildren().iterator();
while (ic.hasNext()) {
    Element elt = (Element) ic.next();
    System.out.println(elt.getName());
}

Input and output
Input
<?xml version="1.0" ?>
<book>
<h1>First heading</h1>
<p>First
<em>paragraph</em>.</p>
<p><em>Second
paragraph.</em></p>
</book>

[Tree diagram: Document at the top, then book, whose children are h1, p, p; each p has one em child]

Output
h1
p
em
p
em

The Visitor Pattern
• Often want to operate on a tree recursively
  • Count elements, search for text that matches a pattern, etc.
• The mechanics of traversal are the same every time
• So build a generic visitor that knows how to traverse the tree
  • Give it do-nothing methods that are invoked at specific times during traversal
• Users derive from this class and override the methods they're interested in
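
The pattern described above can be sketched in a few lines. This is a minimal illustration using Python's xml.dom.minidom rather than JDOM, so that it runs with the standard library alone. The class names (Visitor, ElementCounter, NameCollector) and the hook names (start_element, end_element, text) are assumptions made for this example, not part of any library.

```python
from xml.dom import minidom, Node


class Visitor:
    """Generic traversal. The hook methods do nothing by default."""

    def visit(self, node):
        if node.nodeType == Node.ELEMENT_NODE:
            self.start_element(node)        # hook: before the children
            for child in node.childNodes:
                self.visit(child)
            self.end_element(node)          # hook: after the children
        elif node.nodeType == Node.TEXT_NODE:
            self.text(node)                 # hook: at each text node
        elif node.nodeType == Node.DOCUMENT_NODE:
            for child in node.childNodes:
                self.visit(child)
        # Other node types are skipped in this sketch.

    def start_element(self, element):
        pass

    def end_element(self, element):
        pass

    def text(self, node):
        pass


class ElementCounter(Visitor):
    """Overrides one hook to count elements."""

    def __init__(self):
        self.count = 0

    def start_element(self, element):
        self.count += 1


class NameCollector(Visitor):
    """Overrides one hook to record element names in document order."""

    def __init__(self):
        self.names = []

    def start_element(self, element):
        self.names.append(element.tagName)


BOOK = """<?xml version="1.0" ?>
<book>
<h1>First heading</h1>
<p>First
<em>paragraph</em>.</p>
<p><em>Second
paragraph.</em></p>
</book>"""

doc = minidom.parseString(BOOK)

counter = ElementCounter()
counter.visit(doc)
print(counter.count)

collector = NameCollector()
collector.visit(doc)
print(" ".join(collector.names))
```

The traversal code lives only in Visitor.visit; each subclass states what to do at an element and nothing about how to reach it. Note that this sketch visits the root element too, so book appears in the collected names.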