The Parser Implementation in Java by mikeholy


3.3    The Parser Implementation in Java

Step 1: Rewrite the Grammar to Make it LL(1)

 document          →   element
 element           →   ElementFragment ElementRest
 ElementFragment   →   L_BR NAME AttributeList
 ElementRest       →   R_BR content ETag
                   |   SLASH_R_BR
 AttributeList     →   ε
                   |   WHITESPACE AttributeList2
 AttributeList2    →   ε
                   |   Attribute AttributeList
 Attribute         →   NAME OptWhiteSpace EQ OptWhiteSpace ATT_VAL
 ETag              →   L_BR_SLASH NAME OptWhiteSpace R_BR
 content           →   ε
                   |   CDATA content
                   |   element content
 OptWhiteSpace     →   ε
                   |   WHITESPACE

Step 2: Every Non-Terminal Symbol [6] Results in a Java Function

 public void document();
 public void element();
 public void elementFragment();
 public void elementRest();
 public void attributeList();
 public void attributeList2();
 public void attribute();
 public void eTag();
 public void content();
 public void optWhiteSpace();

Terminal symbols will be handled by the function consume():

 public void consume(Token t) {     // for the Token type, see footnote [7]
   if (t == scanner.getToken())
     scanner.advance();      // Everything's ok. The token is the one expected.
   else
     error("Expected token " + t + ", got " + scanner.getToken() + ".");
 }

 [6] Non-terminal symbols are those that occur somewhere on the left side of a production and are not served by the scanner.
 [7] One way to implement a token is some kind of enumeration type, e.g. int; the most elegant way is to implement a new class.

Step 3: Non-Terminals with Only One Production Rule Simply Call the Respective Functions


 public void element() {
   elementFragment();
   elementRest();
 }

Terminal symbols occurring in the production rule are reflected in a call to the consume() function:

 public void elementFragment() {
   consume(L_BR);
   consume(NAME);
   attributeList();
 }

Step 4: One Look-Ahead Must Decide in All The Other Rules

 public void elementRest() {
   switch(scanner.getToken()) {
     case R_BR:       consume(R_BR); content(); eTag(); break;
     case SLASH_R_BR: consume(SLASH_R_BR); break;
     default:         error("Unexpected token in elementRest().");
   }
 }

To satisfy an ε-production, the token must be in the follow set of the left-side symbol, i.e. the
token must be able to follow the left-side symbol in at least one production rule. For
AttributeList, these tokens are R_BR and SLASH_R_BR.

 public void attributeList() {
   switch(scanner.getToken()) {
     case R_BR:
     case SLASH_R_BR: break;      // Everything ok, epsilon.
     case WHITESPACE: consume(WHITESPACE); attributeList2(); break;
     default:         error("Unexpected token in attributeList().");
   }
 }
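
The four steps can be put together into one small recognizer. The following is only a sketch under stated assumptions: the Token enumeration, the use of a single L_BR_SLASH token for "</", the EOF token, and the list-backed stand-in for the scanner are all invented for this example and are not part of the original parser. The functions for the remaining non-terminals follow the same pattern as elementRest() and attributeList().

```java
import java.util.Arrays;
import java.util.List;

// Sketch of a recognizer for the Step 1 grammar. The Token names and the
// list-backed "scanner" are assumptions made for this example.
class XmlRecognizer {
    enum Token { L_BR, L_BR_SLASH, R_BR, SLASH_R_BR, NAME, WHITESPACE, EQ, ATT_VAL, CDATA, EOF }

    private final List<Token> tokens;
    private int pos = 0;

    XmlRecognizer(List<Token> tokens) { this.tokens = tokens; }

    // Stand-in for scanner.getToken(): the current token, or EOF at the end.
    Token getToken() { return pos < tokens.size() ? tokens.get(pos) : Token.EOF; }

    void error(String msg) { throw new IllegalStateException(msg); }

    void consume(Token t) {
        if (t == getToken()) pos++;   // stand-in for scanner.advance()
        else error("Expected token " + t + ", got " + getToken() + ".");
    }

    void document() { element(); }
    void element() { elementFragment(); elementRest(); }
    void elementFragment() { consume(Token.L_BR); consume(Token.NAME); attributeList(); }

    void elementRest() {
        switch (getToken()) {
            case R_BR:       consume(Token.R_BR); content(); eTag(); break;
            case SLASH_R_BR: consume(Token.SLASH_R_BR); break;
            default:         error("Unexpected token in elementRest().");
        }
    }

    void attributeList() {
        switch (getToken()) {
            case R_BR:
            case SLASH_R_BR: break;   // epsilon: token is in the follow set
            case WHITESPACE: consume(Token.WHITESPACE); attributeList2(); break;
            default:         error("Unexpected token in attributeList().");
        }
    }

    void attributeList2() {
        switch (getToken()) {
            case R_BR:
            case SLASH_R_BR: break;   // epsilon
            case NAME:       attribute(); attributeList(); break;
            default:         error("Unexpected token in attributeList2().");
        }
    }

    void attribute() {
        consume(Token.NAME); optWhiteSpace(); consume(Token.EQ);
        optWhiteSpace(); consume(Token.ATT_VAL);
    }

    void eTag() {
        consume(Token.L_BR_SLASH); consume(Token.NAME); optWhiteSpace(); consume(Token.R_BR);
    }

    void content() {
        switch (getToken()) {
            case L_BR_SLASH: break;   // epsilon: only an end tag may follow content
            case CDATA:      consume(Token.CDATA); content(); break;
            case L_BR:       element(); content(); break;
            default:         error("Unexpected token in content().");
        }
    }

    void optWhiteSpace() {
        // Any other token means epsilon; a wrong token is reported by the next consume().
        if (getToken() == Token.WHITESPACE) consume(Token.WHITESPACE);
    }

    // True if the whole token sequence can be derived from document.
    static boolean accepts(Token... input) {
        XmlRecognizer parser = new XmlRecognizer(Arrays.asList(input));
        try {
            parser.document();
            return parser.getToken() == Token.EOF;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

For instance, the token sequence L_BR NAME SLASH_R_BR (an empty element such as <a/>) is accepted, whereas L_BR NAME R_BR without an end tag is rejected in content().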

4 Using DOM with Java
• The W3C DOM specification defines a set of classes and their inheritance hierarchy.
• In Java these specifications are implemented as interfaces in the org.w3c.dom package.
• Several implementations of these interfaces can be used interchangeably, the most well-known
  being Xerces and Sun's own implementation, which has shipped with Java since version 1.4.
• To make this flexibility possible, Sun specified some helper classes, to be found in the
  javax.xml.parsers package.

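[Figure: The application instantiates a parser through the Sun JAXP API (DocumentBuilderFactory, DocumentBuilder) and makes DOM function calls through org.w3c.dom (the DOM specification: NodeList, ...). JAXP instantiates the implementations, which implement the org.w3c.dom interfaces.]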

4.1    The classes in the org.w3c.dom package
• Node — The generalization of the XML node types. Most of the other interfaces are derived
  from Node.
  – The actual sub-class of an object can not only be determined with the instanceof operator,
     but also with the getNodeType() method of Node. This is particularly convenient for
     switch blocks.
  – Most DOM functions are defined for the Node interface, although they do not make sense
     for every node type. Examples are getAttributes() or getChildNodes().
• NodeList — To be conformant with the DOM specification, functions like getChildNodes()
  do not return Java internal collection types like Vector or Hashtable, but the DOM-specific
  interface NodeList. NodeList is an ordered list of Nodes.
• NamedNodeMap — This interface provides a mapping table from attribute names to values. In
  contrast to NodeList, NamedNodeMap is not ordered.
• DOMException — You will get this exception only in really “exceptional” cases, such as an
  out-of-bounds error, or trying to add child nodes to an Attribute node.
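
The interplay of these interfaces can be sketched in a few lines. The class NodeKinds and its helper parse() are hypothetical names chosen for this example; the DOM and JAXP calls themselves are standard. Note that getAttributes() is callable on every Node but returns null for anything that is not an element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeKinds {
    // Describes a node by dispatching on getNodeType() in a switch block.
    static String describe(Node n) {
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE: {
                NamedNodeMap atts = n.getAttributes();   // non-null for elements only
                return "element " + n.getNodeName() + " with " + atts.getLength() + " attribute(s)";
            }
            case Node.TEXT_NODE:
                return "text \"" + n.getNodeValue() + "\"";
            default:
                return "other node type " + n.getNodeType();
        }
    }

    // Helper for this example: parses an XML string into a Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling describe() on the root element of <a x='1'>t</a> reports an element with one attribute; calling it on the root's first child reports the text node.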

4.1.1    A Short Example

 public void reverse(Document doc) {

     Stack elementStack = new Stack();

     Element root = doc.getDocumentElement();

     NodeList elements = root.getChildNodes();

     while(elements.getLength() > 0) {
       elementStack.push(elements.item(0));
       root.removeChild(elements.item(0));   // the NodeList is live and shrinks here
     }

     while(elementStack.size() > 0) {
       root.appendChild((Node) elementStack.pop());
     }
 }
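
The first loop of reverse() can only terminate because a NodeList is a live view: removing a child from root shrinks the list returned earlier by getChildNodes(). The following self-contained variant is a sketch that makes this visible. The class name ReverseChildren and the helper reversedNames() are assumptions for illustration, and a Deque replaces the raw Stack.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ReverseChildren {
    // Reverses the order of the root element's children in place.
    static void reverse(Document doc) {
        Deque<Node> stack = new ArrayDeque<>();
        Element root = doc.getDocumentElement();
        NodeList elements = root.getChildNodes();   // live view of root's children
        while (elements.getLength() > 0) {
            // removeChild() returns the removed node; the live list shrinks by one.
            stack.push(root.removeChild(elements.item(0)));
        }
        while (!stack.isEmpty()) {
            root.appendChild(stack.pop());
        }
    }

    // Helper for this example: parses a string, reverses the root's children,
    // and returns the concatenated child names.
    static String reversedNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            reverse(doc);
            StringBuilder names = new StringBuilder();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                names.append(children.item(i).getNodeName());
            }
            return names.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```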


4.2    Obtaining a Document Object
• The starting point to operate on DOM trees is the Document object, which also has construction
  functions for all the other node types.
• A Document object can be retrieved from a DOM parser or, more generally, from a
  DocumentBuilder.
• A standard way to make an API implementation-independent is the factory design pattern. In
  our case, the DocumentBuilderFactory creates a new DocumentBuilder object.
• DocumentBuilderFactory is declared abstract. A new instance must therefore be created
  with its static method newInstance().
• Depending on the javax.xml.parsers.DocumentBuilderFactory system property, the
  newInstance() method chooses the right implementation for you.

try {
  DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
  DocumentBuilder docBuilder = factory.newDocumentBuilder();
  Document doc = docBuilder.parse("demo.xml");
}
catch (Exception e) ...
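
Besides parsing a file, a DocumentBuilder can also hand out an empty Document through newDocument(), which is then filled using the Document's own construction functions. The sketch below illustrates this; the class name BuildDocument and the element, attribute and text values are made up for the example.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class BuildDocument {
    // Creates a small DOM tree from scratch instead of parsing a file.
    static Document build() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = factory.newDocumentBuilder();
            Document doc = docBuilder.newDocument();        // empty document

            // All nodes are created by the Document they will belong to.
            Element root = doc.createElement("table");
            root.setAttribute("border", "1");
            root.appendChild(doc.createTextNode("hello"));
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```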

4.2.1    An (Almost) Complete Example
 import java.io.IOException;
 import java.io.PrintStream;

 import javax.xml.parsers.DocumentBuilderFactory;
 import javax.xml.parsers.DocumentBuilder;
 import javax.xml.parsers.ParserConfigurationException;

 import org.w3c.dom.*;
 import org.xml.sax.SAXException;


 public class ReversePrinter {

   public static void main(String[] args) {

       try {
         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
         DocumentBuilder docBuilder = factory.newDocumentBuilder();
         Document doc = docBuilder.parse("tabledemo.xml");

         printNode(doc, "", System.out);
       }
       catch (ParserConfigurationException pce) {
         System.err.println("Error in DOM Parser System Properties!");
       }
       catch (SAXException se) {
         System.err.println("Parse error: " + se.getMessage());
       }
       catch (IOException ie) {
         System.err.println("I/O Exception: " + ie.getMessage());
       }
   }

   private static void printNode(Node n, String indent, PrintStream out) {

       switch (n.getNodeType()) {
         case Node.TEXT_NODE:     // print text
           break;
         case Node.DOCUMENT_NODE:
           out.println("<?xml version='1.0' encoding='iso-8859-1'?>");
           NodeList children = n.getChildNodes();
           for (int i = 0; i < children.getLength(); i++) {
             printNode(children.item(i), indent, out);
           }
           break;
         case Node.ELEMENT_NODE: ...
       }
   }
 }
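
The cases left open in printNode() could be filled in along the following lines. This is one possible sketch, not the original code: the class name NodePrinter, the helpers printChildren() and render(), the two-space indentation step, and the decision to skip whitespace-only text are all assumptions, and special characters are not escaped.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodePrinter {
    static void printNode(Node n, String indent, PrintStream out) {
        switch (n.getNodeType()) {
            case Node.TEXT_NODE: {
                String text = n.getNodeValue().trim();
                if (!text.isEmpty()) out.println(indent + text);   // skip pure whitespace
                break;
            }
            case Node.DOCUMENT_NODE:
                out.println("<?xml version='1.0' encoding='iso-8859-1'?>");
                printChildren(n, indent, out);
                break;
            case Node.ELEMENT_NODE: {
                StringBuilder tag = new StringBuilder(indent).append('<').append(n.getNodeName());
                NamedNodeMap atts = n.getAttributes();
                for (int i = 0; i < atts.getLength(); i++) {
                    Node a = atts.item(i);
                    tag.append(' ').append(a.getNodeName())
                       .append("='").append(a.getNodeValue()).append('\'');
                }
                out.println(tag.append('>'));
                printChildren(n, indent + "  ", out);              // children one level deeper
                out.println(indent + "</" + n.getNodeName() + ">");
                break;
            }
            default:
                break;                                             // other node types are ignored
        }
    }

    private static void printChildren(Node n, String indent, PrintStream out) {
        NodeList children = n.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            printNode(children.item(i), indent, out);
        }
    }

    // Helper for this example: parses a string and returns the printed form.
    static String render(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(buffer);
            printNode(doc, "", out);
            out.flush();
            return buffer.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because a NamedNodeMap is not ordered, the order in which several attributes of one element are printed is up to the DOM implementation.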
