

                                    XML Parsing in Java
 Creating a Dungeons and Dragons (Third Edition w/ 3.5 Revisions)

                             Character Generator in Java

                                        Mike Elliott

                                Computers Seminar

                      Block IV, First Semester '07-'08
           “We are stuck with technology when what we really want is stuff that works.”
                                       --Douglas Adams

  “…we are trying to unravel the Mighty Infinite using a language designed to tell one another
                                  where the fresh fruit was.”
                                       --Terry Pratchett


         My seminar project idea went through two basic stages: the project that

never was (due to, to be blunt, Steve Jobs), and the project that I ended up


         My original concept was to use the school network to set up an
audioscrobbling server similar to the one you can find online at www.last.fm. I
envisioned people bringing in their iPods and connecting them to the school
computers to upload the music they'd been listening to, and perhaps pushing that
information out to all the computers so THS students could see what everyone
was listening to at the moment. This ended up falling apart due to Apple's policy
of not giving out any sort of development tools for the iPod. Back to the drawing
board, as it were.

       The lightbulb finally turned on when I was creating a character for a
friend's Dungeons and Dragons game. I was clicking through roughly eight tabs in
Firefox, had my calculator out, and had a book precariously balanced across my
knees. It was then that I expressed that infamous programmer's sentiment (the
one that drives people to hunt through Internet forums far and wee for that one
guy's method code, and argue about “big O notation”): “This could be done so
much more efficiently.”

       I took some time to think about it: the character sheet was essentially a

bunch of whole numbers, some Boolean variables, and a String or two for good

measure. It would be easy to put together a PlayerCharacter object that would

store all the information.
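
As a rough illustration of that idea (not the project's actual class), a character sheet can be held as plain Java fields. Everything in this sketch is hypothetical: the class name `PlayerCharacterSketch`, the `spellcaster` flag, and the starting score of 10 are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: a character sheet held as plain Java fields. */
class PlayerCharacterSketch {
    private static final String[] ATTRIBUTE_NAMES = {"STR", "DEX", "CON", "INT", "WIS", "CHA"};

    private final String name;   // a String or two
    private final String race;
    private boolean spellcaster; // a Boolean flag (invented example field)
    private final Map<String, Integer> scores = new LinkedHashMap<>(); // the whole numbers

    PlayerCharacterSketch(String name, String race) {
        this.name = name;
        this.race = race;
        for (String attribute : ATTRIBUTE_NAMES) {
            scores.put(attribute, 10); // placeholder starting score, chosen for this sketch
        }
    }

    /** Stores a score, rejecting attribute names that are not on the sheet. */
    void setScore(String attribute, int value) {
        if (!scores.containsKey(attribute)) {
            throw new IllegalArgumentException("Unknown attribute: " + attribute);
        }
        scores.put(attribute, value);
    }

    int getScore(String attribute) {
        Integer value = scores.get(attribute);
        if (value == null) {
            throw new IllegalArgumentException("Unknown attribute: " + attribute);
        }
        return value;
    }

    void setSpellcaster(boolean spellcaster) { this.spellcaster = spellcaster; }
    boolean isSpellcaster() { return spellcaster; }
    String getName() { return name; }
    String getRace() { return race; }
}
```

Keeping the six scores in a map rather than six separate fields is just one possible design; it makes it easy to reject a misspelled attribute name in one place.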

       My next problem hit: the sheer scope of Dungeons and Dragons. The
Player's Handbook (the most basic of the rulebooks) contains 7 races, each of
which can belong to any of 11 character classes and select from roughly 50 feats,
30 skills, and 100 magic spells. (For those keeping track at home, that's about
12 million possible combinations by my admittedly very rough estimate.)
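
One way to arrive at a figure of that size is simply to multiply the counts together, treating each character as one race, one class, one feat, one skill, and one spell. That simplification is an assumption made for this sketch (real characters pick several of each), and the class name is hypothetical.

```java
/** Hypothetical sketch: multiplies the rough counts quoted in the paragraph above. */
class CombinationEstimate {
    static long estimate() {
        long races = 7;
        long classes = 11;
        long feats = 50;
        long skills = 30;
        long spells = 100;
        // One pick from each category, all picks treated as independent
        return races * classes * feats * skills * spells;
    }
}
```

Under those assumptions the product is 11,550,000, which rounds to the "about 12 million" quoted above.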

       I discovered, however, that some forward-thinking kind soul had put most,
if not all, of the basic game information into some XML files. I reasoned that XML

was a common enough format, and there had to be some way for Java to
process the information so it could be used in a program. Sure enough, there

was: JDOM, the Java Document Object Model. I grabbed the XML files off the

Internet, what Dungeons and Dragons materials I possessed, a copy of the

JDOM class listings and a cursory tutorial from Oracle magazine, and went to


A Brief Primer: Dungeons and Dragons

        Dungeons and Dragons grew out of a now-defunct game, Chainmail. It

was originally developed as a miniatures-based war-game, but later came to

have some more focus on role-playing (i.e. how players portrayed their

characters versus just trying to win every fight). The game, and its creator, Gary

Gygax, enjoyed some success in the mid-1970s, but then suffered when
Gygax's company, Tactical Studies Rules (later known as TSR), decided to (in
software development terms) “fork” D&D. The original, simple rules were known
as “Dungeons and Dragons, Second Edition”, whereas TSR's new, more
complex set of rules became known as “Advanced Dungeons and Dragons, First
Edition”.


        The basic set, considered childish by those who played “AD&D” (a fair
assessment, considering TSR wanted to market the game to the board game set),
managed a stilted life of its own, getting a third and fourth edition in 1981 and
1983, respectively. It finally petered out with a fifth edition in 1991.

        Advanced Dungeons and Dragons, however, only picked up steam. TSR
issued “Advanced Dungeons and Dragons, Second Edition” in 1989, and the game
enjoyed a great deal of popularity, helped along by the growing field of computer
games and by its steeper learning curve. Internal difficulties at TSR, however,
led to the company nearly going bankrupt. Wizards of the Coast, another game
company, flush with the success of their lucrative card games (Pokémon and
Magic: The Gathering having made them several million dollars), purchased TSR,
and with it the Dungeons and Dragons license, in 1997.

       Wizards ended the bizarre “fork”, and dropped “Advanced” from the name

of the game, referring to it simply as “Dungeons and Dragons”. They considered

their new edition the third edition of the AD&D rules, and named it as such. The

game was greatly streamlined, and characters were made far more customizable

than they had been in the past. They also introduced the OGL, or Open Gaming

License, which made it easier for people to write material that was compatible

with the game. (This actually ended up hurting Wizards in the long run, for

reasons that are far too long for me to explain within the scope of this paper.)

Wizards went on to introduce the unpronounceable “3.5 Edition”, which created

some basic rules changes. (This is the current edition of the game, and the one

that my program uses.)

       Wizards is set to release Fourth Edition in June of this year, with the rules

further streamlined, and a full suite of computer tools that promise to replace the

various player-made programs that sprouted up in the Third/3.5 Edition era. (I'll
address those later in my paper.) The obvious response at this point is, “Well,
Mike, if they're going to release D&D 4.0 in June, why go to all this trouble?”
Someone on an Internet forum pointed out one answer: “There is enough 3.5

edition material out there to [play D&D] until the [end] of the universe.”

       The game is a roleplaying game with an emphasis on tactical combat--the

only thing you have close to a gameboard is a grid on which characters move

during fights. Mostly, it's about playing your character [usually as a sort of

amateur-hour acting] through whatever adventure the Dungeon Master [read:

referee] has planned for you--this could be a simple crawl through an abandoned

temple to find treasure, or a hunt through a big city for a killer.

       Characters are represented by six attributes. The first three, Strength
[STR], Dexterity [DEX], and Constitution [CON], are "physical attributes", which
determine how the character fares in situations where physical strength or agility
is needed. Arnold Schwarzenegger, for example, would be a high Strength
character, while a marathon runner would have high Constitution.
The other three, Intelligence [INT], Wisdom [WIS], and Charisma [CHA], are
"mental attributes", which determine how the character fares in situations where
mental quickness or personality is needed. Ken Jennings is the best example of
someone with high Intelligence, but Bear Grylls [the survivalist of "Man vs.
Wild"] is high Wisdom: Intelligence is "book smarts", while Wisdom is practical
knowledge. Charisma, defined as "interpersonal skills", is a little bit harder to pin
down. James Bond is a classic example of high Charisma, but few examples of
low Charisma come to mind. These attributes, along with choice of race [elves,
dwarves, humans, et cetera], define the character.

       XML is somewhat hard to explain because it's not your usual “enter
command x to get result y”. XML, or Extensible Markup Language, is a way to
format data so that it can be read by a computer. XML is an outgrowth of
SGML (Standard Generalized Markup Language), but it is simpler, and the
author of a document defines their own tags to describe the data. Those familiar
with another outgrowth of SGML, HTML, will recognize some elements of XML.

       A “well-formed” XML document (i.e. one that follows XML's syntax rules,
so a parser can read it without errors, much like source code that compiles) is
given here:

                    <?xml version="1.0" encoding="UTF-8"?>
                    <!--
                        Document   : test.xml
                        Created on : January 3, 2008, 12:29 PM
                        Author     : student
                        Shows how XML works.
                    -->
                    <root>
                        <foo>bar</foo>
                        <fullname>Michael<name>mike</name></fullname>
                    </root>


       The first line, <?xml version="1.0" encoding="UTF-8"?>, is the XML
declaration. This is optional, but helpful if the document is going to be read
across a wide variety of platforms. It tells us that we are using XML 1.0 (the
most widely used version), and that the text is encoded in UTF-8 (how the
characters are represented as bytes, something outside the scope of this brief
introduction).

       The next few lines, as any HTML users can tell, are commented out by the
<!-- and --> markers. Comments are fairly self-explanatory to anyone familiar with
programming: the usual snippets of notes on what's being represented. This
comment block is automatically generated for me by NetBeans, and gives me
fields to fill in with the document name, the date it was created, the author, and a
description of what the document does. Not all programming environments have
this, of course. All XML documents support comment blocks.

       The last few lines are the real meat of things: the actual document
and its markup tags. We have one root element that encapsulates the entire
document, similar to the <html> tags that surround an HTML page. The next
element (a child of the root element; every other element in the document is a
descendant of the root) is the foo element, which has as contents the text
“bar”. Below that is another element, the fullname element, which has as text
“Michael” but also a name element inside of itself, which just has “mike”. So,
foo and fullname are children of the root element, and name is a child of fullname
and a descendant of root. For the document to be considered well-formed, every
one of these elements must have an opening and a closing tag, and the tags must
nest properly. Of course, what sets XML apart from HTML and makes it so useful
is that there is no fixed set of tags: get a root element, get some information
tagged up inside of it, and you're ready to go. However, this kind of freedom comes
at a price: you have to be very careful when you're working with it in a
programming environment.
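
The parent and child relationships described above can be checked in a few lines of Java. This is only a sketch: it uses the DOM parser that ships with the JDK (javax.xml.parsers) rather than JDOM so that it runs without extra libraries, and the root tag name `root` and the class name `XmlTreeSketch` are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: inspects the foo/fullname/name document with the JDK's DOM parser. */
class XmlTreeSketch {
    // The root tag name "root" is an assumption; the other tags follow the description above
    static final String SAMPLE =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<!-- Shows how XML works. -->"
        + "<root><foo>bar</foo><fullname>Michael<name>mike</name></fullname></root>";

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    /** Names of the element children of the root, in document order. */
    static List<String> rootChildNames() {
        List<String> names = new ArrayList<>();
        NodeList children = parse(SAMPLE).getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) { // skip text and comments
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    /** Name of the element that directly contains the first element with the given tag. */
    static String parentOf(String tag) {
        return parse(SAMPLE).getElementsByTagName(tag).item(0).getParentNode().getNodeName();
    }
}
```

Calling `XmlTreeSketch.rootChildNames()` lists only foo and fullname, while `XmlTreeSketch.parentOf("name")` reports fullname, matching the family tree described above.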

The Java Document Object Model
       The other part of the project that requires some background is the
JDOM library, or the Java Document Object Model. The name sounds unwieldy,
but it makes some sense when you take it apart: the JDOM library provides a
model to turn an XML document into a Java object.

       The library is divided into 6 packages:

      org.jdom

      org.jdom.input

      org.jdom.output

      org.jdom.transform

      org.jdom.xpath

      org.jdom.adapters

   org.jdom contains all of the classes that represent an XML document and its
components: the Document class, the Comment class, the Element class, etc.
The input and output packages could be thought of as two different ends of a
continuum: the input package reads XML (from a file, for example) and builds
a JDOM Document from it, and the output package takes a JDOM Document
and writes it back out as XML. The transform package supports XSLT
transformations (for instance, turning a document into HTML), and the xpath
package is for looking up elements in the document. Finally, adapters is the
black sheep of the family. Users never actually interact with it, as it is used by
JDOM to translate its calls into parser-specific calls.

   The builders are the important part for my project, as the program has to
parse the data from the XML document into the JDOM-ready format. JDOM
comes with two builder classes: DOMBuilder, which converts a tree that a DOM
(Document Object Model) parser has already built, and SAXBuilder, which reads
the XML directly using a SAX (Simple API for XML) parser. I ended up using
SAXBuilder, as that's what the tutorials I found for how to use JDOM were written
with. Either one hands back the same kind of JDOM Document, so the rest of the
code does not care which was used, but SAXBuilder is the usual choice for
reading a file.
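
To give a feel for what "SAX" means underneath SAXBuilder, here is a small sketch of the event style: the parser walks the text and calls back once for every opening tag it meets, rather than handing over a finished tree. It uses the SAX parser bundled with the JDK, not JDOM itself, and the class name and the sample tags in the usage note are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: records element names in the order a SAX parser reports them. */
class SaxEventSketch {
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(qName); // called once per opening tag, in document order
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
        return names;
    }
}
```

For example, `SaxEventSketch.elementNames("<feats><feat><name>Alertness</name></feat></feats>")` reports feats, feat, and name in that order. A builder listens to events like these and assembles the tree for you.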

       The best way to see how JDOM works is to go through and look at how I

would parse XML for storage. This example will use the Feat class, which I never

actually got to use in my project. A Feat object has 7 fields:

          [String] name: the name of the feat.

          [String] type: Feats can be general, combat, or metamagic in the basic
           game. (Other types exist, but in the interest of saving space, I won't go
           into them.)

          [Boolean] multiple: Can this feat be had multiple times? Some feats
           can be taken more than once (Weapon Focus: each time, it applies to
           a different weapon), some cannot (Alertness: it grants two skill check
           bonuses, and that's it).

          [Boolean] stack: Do the effects of this feat stack if it's taken multiple
           times? (Extra Turning, for example, stacks: you can keep getting its
           effects if you take it multiple times. Weapon Focus is an example of a
           feat that does not stack: it applies to a different weapon each time you
           take it, rather than constantly increasing its bonus amount.)

          [String] prereq: What prerequisites must the character meet before it
           can take this feat? Some feats require a certain score in one of the six
           attributes, others require that you complete a “tree” of feats before you
           can gain its effects.

          [String] benefit: What benefit does the feat give? For example, Weapon
           Focus gives a +1 bonus on all attack rolls with the specified weapon.

          [String] normal: How would the character function without the benefit?
           For example, without Weapon Focus, the character would just make its
           normal attack rolls.
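
A plain data class along these lines could hold the seven fields. This is a sketch only: the paper never shows the real Feat class, so the class name `FeatSketch`, the constructor's argument order, and the getter names are all assumptions.

```java
/** Hypothetical sketch of a data class with the seven fields listed above. */
class FeatSketch {
    private final String name;
    private final String type;
    private final boolean multiple; // can the feat be taken more than once?
    private final boolean stack;    // do its effects stack when taken again?
    private final String prereq;
    private final String benefit;
    private final String normal;

    // Argument order is an assumption made for this sketch
    FeatSketch(String name, String type, boolean multiple, boolean stack,
               String prereq, String benefit, String normal) {
        this.name = name;
        this.type = type;
        this.multiple = multiple;
        this.stack = stack;
        this.prereq = prereq;
        this.benefit = benefit;
        this.normal = normal;
    }

    String getName() { return name; }
    String getType() { return type; }
    boolean isMultiple() { return multiple; }
    boolean isStack() { return stack; }
    String getPrereq() { return prereq; }
    String getBenefit() { return benefit; }
    String getNormal() { return normal; }

    @Override
    public String toString() {
        return name + " (" + type + ")";
    }
}
```

Making every field final means a feat cannot change after it has been read from the file, which suits reference data like this.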

       Now, the code for parsing the XML file that contains all of the
feat information (roughly 100-150 feats) looks like this, in the FAsObj class.

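       import java.util.ArrayList;
       import java.util.Iterator;
       import java.util.List;
       import org.jdom.Document;
       import org.jdom.Element;
       import org.jdom.input.SAXBuilder;

       public class FAsObj
       {
           SAXBuilder builder;
           Document doc;
           Element root;
           List namedChildren;
           Iterator itr;

           public FAsObj()
           {
               try
               {
                   builder=new SAXBuilder();
                   // "feats.xml" is a placeholder; the real file name was lost from this listing
                   doc=builder.build("feats.xml");
                   root=doc.getRootElement();
                   // Every <feat> child of the root, then an Iterator to walk them
                   namedChildren=root.getChildren("feat");
                   itr=namedChildren.iterator();
               }
               catch (Exception e)
               {
                   System.err.println("Error in FAsObj");
               }
           }

           public ArrayList FReturn()
           {
               ArrayList theList=new ArrayList();
               while (itr.hasNext())
               {
                   Object o=itr.next();
                   if (o instanceof Element)
                   {
                       Element e=(Element)o;
                       Element avail=e.getChild("reference");
                       // Skip anything that is not core material (epic feats, etc.)
                       if (avail.getText().equals("SRD 3.5 Feats"))
                       {
                           boolean multiples=false;
                           boolean stacks=false;

                           Element nam=e.getChild("name");
                           String name=nam.getText();

                           Element typ=e.getChild("type");
                           String type=typ.getText();

                           Element mul=e.getChild("multiple");
                           String multiple=mul.getText();
                           if (multiple.equals("Yes"))
                           {
                               multiples=true;
                           }

                           Element sta=e.getChild("stack");
                           String stack=sta.getText();
                           if (stack.equals("Yes"))
                           {
                               stacks=true;
                           }

                           // The "imaginary friend handler": the tag may be missing
                           String prerequisite="No prerequisites.";
                           Element pre=e.getChild("prerequisite");
                           if (pre!=null)
                           {
                               prerequisite=pre.getText();
                           }

                           Element ben=e.getChild("benefit");
                           String benefit=ben.getText();

                           // Assumed: the lines that read <normal> were lost from this listing
                           String normal="";
                           Element nor=e.getChild("normal");
                           if (nor!=null)
                           {
                               normal=nor.getText();
                           }

                           Element ref=e.getChild("reference");
                           String reference=ref.getText();

                           // The constructor call was cut off in the original; argument order is assumed
                           Feat theFeat=new Feat(name, type, multiples, stacks,
                                                 prerequisite, benefit, normal);
                           theList.add(theFeat);
                       }
                   }
               }
               return theList;
           }
       }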

         Now, let's walk through the process of getting the list first. The FAsObj
constructor makes a new SAXBuilder to parse the document. It then creates the
Document object that we'll be crawling through by calling the build method on the
SAXBuilder, which gets passed the file name of the XML file, parses
through the document, and drops it into the Document object.

         Next, we get the root element of the document and store it in the root
Element object. We get all the children of the root named “feat” (each separate
feat is delineated in the document by <feat></feat>). These elements are stored
as a List object, namedChildren. We can then get an Iterator of this List, which
can be used to fire through the List.

         At this point, the FAsObj object is all set, and ready to have its big method

called—the FReturn method, which passes back an ArrayList of Feat objects.

         FReturn uses the iterator we made in the constructor to parse through the

document. It keeps running through the Iterator until there is nothing left for it to


         However, the Iterator hands everything back as a plain Object, so the
code can't tell whether it's looking at a comment, an element, or something
else entirely. We have to put in an if-block, if (o instanceof Element).
instanceof is a comparison operator, used in conditions much like the various
inequality symbols. It tells the computer to check whether the object on its left
is an instance of the class named on its right. My experience with it thus far is
limited, but it is certainly a powerful tool.
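
A small, self-contained illustration of the operator, separate from the project code: the method below walks a list of mixed objects and counts only the Strings. The class and method names are made up for the example.

```java
import java.util.List;

/** Hypothetical sketch: uses instanceof to pick one type out of a mixed list. */
class InstanceofSketch {
    static int countStrings(List<Object> items) {
        int count = 0;
        for (Object o : items) {
            // True only when o is a String; a null entry is never an instance of anything
            if (o instanceof String) {
                count++;
            }
        }
        return count;
    }
}
```

Because instanceof is false for null, the same check also protects the cast that usually follows it.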

       The next thing that happens is that the computer casts the mystery object
o to be an Element. (If it's not an Element, it's skipped, since we're only
interested in the elements of the document, not any comments that might exist.)
The next if has a story behind it: it's a result of one of the many mistakes I
made while working on this


       The XML files that I got off the Internet were stated to contain all of the
System Reference Document material, in other words, everything that you
would need to play a basic game. I shrugged this off, reasoning that that was
what I wanted. However, in a case of getting exactly what you ask for, the XML
file also contained the material for epic levels, a sort of variant on the game that
I wasn't prepared to deal with. The computer took forever to go through the list,
and I wasn't particularly interested in dealing with all of the extra coding for epic
levels. I had to find some way to tell the computer to ignore all of the epic
material.

       Luckily, the XML document tagged each feat with an extra element,
<reference>, which states which source the feat came from. After a quick
eyeballing of the document, I realized that everything I needed came under the
“SRD 3.5 Feats” category. This if block, therefore, stops the program from going
further if the feat it's currently looking at is not part of that group.
       After that, things are fairly self-explanatory. One interesting part to note is
the “prerequisite” code, which I took to referring to as the “imaginary friend
handler”; that is, it checks to make sure that something actually exists. This, of
course, was from another mistake I made.

       As I was writing the first version of this Feat parser, I realized that it kept
crashing at certain feats, reporting some variation on the theme of
a null pointer exception. I tried everything that I could think of, and then ended up
going into the code itself to see what was wrong. I discovered that, rather than just
saying “No prerequisites” under the feat's <prerequisite> tag, the original author
had simply left the tag out entirely. Therefore, when the parser gets to something
that I've found (through trial and error) to “maybe not be there”, it goes into the
“imaginary friend handler”, making sure that what it gets back isn't a null.
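
Both safeguards, the <reference> filter and the "imaginary friend handler", can be sketched in one short example. To keep it runnable without extra libraries it uses the JDK's built-in DOM parser instead of JDOM. The sample data is invented: only the tag names feat, name, prerequisite, and reference and the label "SRD 3.5 Feats" come from the discussion above, while the root tag `feats`, the other feat names, and the second reference label are placeholders.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: keep only core feats and default any missing tag. */
class FeatFilterSketch {
    // Invented sample data for illustration only
    static final String SAMPLE =
          "<feats>"
        + "<feat><name>Alertness</name><reference>SRD 3.5 Feats</reference></feat>"
        + "<feat><name>Sample Feat</name><prerequisite>Example prerequisite</prerequisite>"
        + "<reference>SRD 3.5 Feats</reference></feat>"
        + "<feat><name>Sample Epic Feat</name><reference>Some Other Source</reference></feat>"
        + "</feats>";

    /** Text of the first matching tag inside parent, or fallback when the tag is absent. */
    static String textOrDefault(Element parent, String tag, String fallback) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? fallback : matches.item(0).getTextContent();
    }

    /** One "name: prerequisite" line per feat whose reference is "SRD 3.5 Feats". */
    static List<String> summarize(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList feats = doc.getElementsByTagName("feat");
            for (int i = 0; i < feats.getLength(); i++) {
                Element feat = (Element) feats.item(i);
                if (!"SRD 3.5 Feats".equals(textOrDefault(feat, "reference", ""))) {
                    continue; // not core material, so skip it
                }
                lines.add(textOrDefault(feat, "name", "(unnamed)") + ": "
                    + textOrDefault(feat, "prerequisite", "No prerequisites."));
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
        return lines;
    }
}
```

Calling `FeatFilterSketch.summarize(FeatFilterSketch.SAMPLE)` yields two lines: "Alertness: No prerequisites." and "Sample Feat: Example prerequisite". Putting the null check in one helper means every optional tag gets the same protection without repeating the if block.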

       Outside of that part, things are fairly normal. Once it has all its ducks in a

row, it adds the new Feat to the list, goes back to the beginning, and starts over

from there. Eventually, it gets the whole list all set and sends it back for whatever

use one may have for it.

A Question of Design

       The next most important part, once I had all the various pieces of code
that I would need to get the requisite information out of the XML documents, was
how to represent it. This was, looking back, almost a larger problem than
the XML parsing, because of my experiences in the past. Let's take a look at
other character generation software.
       The first up is what I consider the “gold standard”, which can be found at
http://www.pathguy.com/cg35.htm . It's surprisingly good for an amateur effort.
(This is just a small part of the program, the rolling of the character attributes.)

       If you just said, “Wait, what?”, I don't blame you. This is my big problem
here: no one seems capable of creating an intuitive interface for doing this.

       The way that this one is set up, you set your dice rolling method in the
dropdown box, and then hit the “Roll the Dice” button. You assign each attribute
with the radio buttons; you can't have two selections share a column or row. The
manual entry column is used for point buy, or when you get fed up with rolling


       So, let's take a look at another program. Next on the list is PCGen, which
other people consider to be the gold standard. This is the first screen you get
when you're going into the program, ready to make your character.

         If you thought that the last one was bad, you might as well just curl up and

cry at this point. You have to pick the sources you want to construct your

character from in the box on the right before you can actually do anything. (The

proper selection for making a basic Dungeons and Dragons character is

“RSRD”.) Then, you click on the button to load it in (right above the big red “2” in

this screenshot). This is the best part now: you have to go up to “File” and hit


         To sum all this up in one sentence, I'm on the frontlines of the war against
poorly designed graphical user interfaces. Looking back now, I think that just about
anyone who has ever programmed, ever, can say that with some degree of pride.

         On the flip side of the coin, there are some ideas that I wanted to

incorporate into my project. First up on the Good List is Redblade, a free
character generator that, while not being perfect, has some points worth


       This is the very first screen you see, as shown on the left. I
don't like that they make you roll ability scores afterwards (note that your
progress through generation is measured over on the left, going from “Base” to
“Finished”), but I do like this screen. You can put in the character's name,
gender, race, and alignment right then and there. The dice pictures are actually
buttons: you click them to get a random result in the dropdown. It's clean and
easy to follow. That's not to say that it's without problems, but I like it.

       I, of course, did not finish my project, having been stymied along the way by
various pieces of JDOM and Java that I simply didn't know how to code or how to
use properly. The first panel I created was the stat roller.
       The way that this flows is simple. At the top of the screen, a generation
method is selected from the most popular ones I could find on the Internet, and
then "Roll" is clicked. The attributes rolled show up in the "???" spaces, and the
comboboxes each show the six attributes [STR, DEX, CON, INT, WIS, CHA].
One stat is selected per box, and then "Check and Accept" is clicked. If the
boxes have not been selected properly, the message at lower left changes to say
that one or more scores have been improperly assigned. Otherwise, the
selected stats are stored to the PlayerCharacter object that is constantly being
built behind the scenes, and the message shows that the user has selected a
valid score set.
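
The logic behind such a panel might look roughly like the sketch below. The generation method shown, rolling four six-sided dice and dropping the lowest, is an assumption picked as one commonly used option, since the paper does not name the methods offered. The class and method names are hypothetical as well.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of the roll and "Check and Accept" steps. */
class StatRollerSketch {
    static final List<String> ATTRIBUTES =
        Arrays.asList("STR", "DEX", "CON", "INT", "WIS", "CHA");

    /** Rolls four six-sided dice and adds up the highest three. */
    static int roll4d6DropLowest(Random rng) {
        int sum = 0;
        int lowest = 6;
        for (int i = 0; i < 4; i++) {
            int die = rng.nextInt(6) + 1; // 1 through 6
            sum += die;
            lowest = Math.min(lowest, die);
        }
        return sum - lowest;
    }

    /** Rolls one score for each of the six "???" spaces. */
    static int[] rollSixScores(Random rng) {
        int[] scores = new int[ATTRIBUTES.size()];
        for (int i = 0; i < scores.length; i++) {
            scores[i] = roll4d6DropLowest(rng);
        }
        return scores;
    }

    /** Valid only when the six boxes name each attribute exactly once. */
    static boolean isValidAssignment(List<String> chosen) {
        return chosen.size() == ATTRIBUTES.size()
            && new HashSet<>(chosen).equals(new HashSet<>(ATTRIBUTES));
    }
}
```

Checking the size and the set together catches both a duplicated attribute and a box left on an unexpected value, which is what the warning message at lower left would report.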
       The other panel that I created, which was still a work in progress at
the time that I started this paper, was the Race panel, which I was especially
proud of.

       I felt that other race selection screens always presented too much or too
little information. Therefore, I grabbed the parts that I thought were the most
important: the favored class [a game mechanic that's far too complicated, and in
my opinion, poorly designed, to get into right now], special abilities [gnomes can
speak to any burrowing animal, changelings can alter their physical appearance
at will], and any skill and/or attribute adjustments. The user simply selects a race
from the dropdown and hits accept if everything appears to be acceptable. The
big drawback to this was that I could never seem to get the race data into the text
fields or the table object, something that I wish I'd known more about how to use
beforehand. Another problem that I had during development was the stat roller
panel. This has already been covered in excruciating detail in my weekly updates,
but I'll reiterate once again that while using radio buttons is great, Java's radio
button group class (ButtonGroup), for reasons I don't think I will ever be able to
comprehend, does not have a method that simply hands back the button that is
currently selected out of the group. (Its getSelection method returns only the
selected button's model, not the button itself.)
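
One common workaround, sketched here rather than taken from the project, is to walk the group's buttons with ButtonGroup.getElements() and ask each one whether it is selected. The helper class and method names are hypothetical.

```java
import java.util.Enumeration;
import javax.swing.AbstractButton;
import javax.swing.ButtonGroup;

/** Hypothetical helper: finds the label of the selected button in a ButtonGroup. */
class SelectedButtonFinder {
    /** Returns the selected button's text, or null when nothing is selected. */
    static String selectedText(ButtonGroup group) {
        Enumeration<AbstractButton> buttons = group.getElements();
        while (buttons.hasMoreElements()) {
            AbstractButton button = buttons.nextElement();
            if (button.isSelected()) {
                return button.getText();
            }
        }
        return null;
    }
}
```

With three JRadioButtons labeled STR, DEX, and CON added to one group and DEX selected, `SelectedButtonFinder.selectedText(group)` returns "DEX".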

Down the Road

       I would like to continue working on this at some point, although, looking

back at my paper now, I'm not so sure that I would like to continue with the 3.5

ruleset. I think that the rules are too baroque and there's just too much going on

for anyone to write one really good, clean program that can easily sum up a

character. Fourth Edition is supposed to be the big simplification of the rules, and

I think that that would be a great place to get onboard with any sort of program.

       As far as XML and Java go, I feel like I've learned a lot. I'm still a little

stunned that I never knew about the instanceof operator, not to mention the

usefulness of XML. I seem to see XML everywhere now, whether cruising around

the Internet or digging through files on my computer. I've greatly enjoyed the

freedom that the seminar has given me, and hope that I can have similar

experiences in the future.
