Examples for Java XML Processing
Java offers several different ways to process data formatted in the Extensible Markup Language (XML) and related standards. Here you can find examples for them.
Here, an XML document is created in memory using the Document Object Model (DOM) and then serialized to the standard output. Instead of stdout, we could just as well have written to a file. With this, we can create XML documents.
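The create-then-serialize idea can be sketched as follows. This is a minimal illustration, not the repository's example: the element and attribute names (`course`, `title`, `unit`) are assumptions, and an identity `Transformer` is just one common way to serialize a DOM tree.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomCreateSketch {

  /** Build a small document in memory and return its serialized text. */
  static String buildXml() throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().newDocument();

    // Hypothetical element and attribute names, chosen for illustration.
    Element course = doc.createElement("course");
    course.setAttribute("title", "Distributed Computing");
    doc.appendChild(course);

    Element unit = doc.createElement("unit");
    unit.setTextContent("XML");
    course.appendChild(unit);

    // A Transformer without a stylesheet copies its input unchanged,
    // which serializes the DOM tree to the given result.
    Transformer serializer = TransformerFactory.newInstance().newTransformer();
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    StringWriter out = new StringWriter();
    serializer.transform(new DOMSource(doc), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    // Writing to a file would only require a different StreamResult.
    System.out.println(buildXml());
  }
}
```

Replacing the `StringWriter` with `new StreamResult(new java.io.File(...))` would send the same output to a file instead.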
We load the file course.xml into memory as a DOM document, modify it there, and then serialize it to the standard output.
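A load-modify-serialize round trip might look like the sketch below. To stay self-contained it parses a short XML string rather than course.xml, whose contents are not shown here; the attribute `semester` and the added `unit` element are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomModifySketch {

  /** Parse the given XML text, change the tree in memory, serialize it. */
  static String loadAndModify(String xml) throws Exception {
    // For a real file, parse(new java.io.File("course.xml")) would be used.
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));

    // Hypothetical modifications: one new attribute, one new child element.
    Element root = doc.getDocumentElement();
    root.setAttribute("semester", "autumn");
    Element unit = doc.createElement("unit");
    unit.setTextContent("added in memory");
    root.appendChild(unit);

    StringWriter out = new StringWriter();
    TransformerFactory.newInstance().newTransformer()
        .transform(new DOMSource(doc), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    String input = "<course title=\"demo\"><unit>first</unit></course>";
    System.out.println(loadAndModify(input));
  }
}
```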
The Simple API for XML (SAX) is another way to load XML documents. DOM loads the whole document into memory, which allows you to edit it but also consumes a lot of memory. SAX instead treats an XML document like a stream of events. A SAX parser accepts a listener-like handler, which it notifies whenever it encounters a new element, text, or a processing instruction in an XML stream.
A SAX parser is a push parser: the parser invokes the methods of the handler you provide, i.e., the parser decides when your application is notified about events, and you have little control over this. You can think of SAX as something like a visitor pattern for XML documents. This does not require loading the whole document into memory. Instead, the document can be processed more efficiently, in a stream-like fashion. This has another advantage: while processing a DOM document requires the whole document to be loaded first, a SAX handler can receive the first events after only a few bytes have been read from the source document.
Here we apply a SAX parser to the courseWithNamespace.xml example file. The class SAXReaderExample.java not only executes the SAX parser in its main method, it also extends the class DefaultHandler, i.e., the default implementation of the SAX event listener interfaces. It therefore can receive and print SAX events.
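The push model can be illustrated with a small handler that extends `DefaultHandler` and records the events it is handed. This is a sketch under assumptions, not the contents of SAXReaderExample.java: the class name, the namespace URI `http://example.org/courses`, and the inline input document are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxSketch extends DefaultHandler {

  /** The events in the order in which the parser pushed them to us. */
  final List<String> events = new ArrayList<>();

  @Override
  public void startElement(String uri, String localName, String qName,
      Attributes atts) {
    events.add("start {" + uri + "}" + localName);
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    events.add("end {" + uri + "}" + localName);
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    // Text may arrive in several chunks; whitespace-only chunks are skipped.
    String text = new String(ch, start, length).trim();
    if (!text.isEmpty()) {
      events.add("text " + text);
    }
  }

  static List<String> parse(String xml) throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true); // report namespace URI and local name
    SaxSketch handler = new SaxSketch();
    factory.newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return handler.events;
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical input; the namespace URI is an assumption.
    String xml = "<c:course xmlns:c=\"http://example.org/courses\">"
        + "<c:unit>XML</c:unit></c:course>";
    parse(xml).forEach(System.out::println);
  }
}
```

The parser, not the application, drives the loop: `parse` returns only after every callback has been invoked.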
While the above example can recognize namespaces in our XML document, it may not validate the document. There are two reasons for this:
- Validation is a feature which must be turned on via
- The parser may not know where to find the XML schema for a given namespace. For this purpose, a schemaLocation attribute can be added to the XML file, as we do in courseWithNamespaceAndSchemaLocation.xml.
Here we combine these two steps in a new version of the SAX parsing example.
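As a rough illustration of schema validation during SAX parsing, the sketch below takes a different route from the one described above: instead of relying on a schemaLocation attribute in the document, it compiles a schema with `SchemaFactory` and hands it to the parser factory via `setSchema`. The inline schema, the namespace URI, and all class and method names are assumptions made to keep the example self-contained.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SaxValidationSketch {

  // Hypothetical namespace and schema; not the repository's courses.xsd.
  static final String NS = "http://example.org/courses";
  static final String XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='" + NS + "' elementFormDefault='qualified'>"
      + "<xs:element name='course'><xs:complexType><xs:sequence>"
      + "<xs:element name='unit' type='xs:string' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

  /** Returns true if the well-formed document conforms to the schema. */
  static boolean isValid(String xml) throws Exception {
    Schema schema = SchemaFactory
        .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
        .newSchema(new StreamSource(new StringReader(XSD)));

    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true); // required for schema validation
    factory.setSchema(schema);       // validate against this schema

    final boolean[] valid = {true};
    factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
        new DefaultHandler() {
          @Override
          public void error(SAXParseException e) {
            // Validation problems are reported as recoverable errors.
            valid[0] = false;
          }
        });
    return valid[0];
  }

  public static void main(String[] args) throws Exception {
    String good = "<course xmlns='" + NS + "'><unit>XML</unit></course>";
    String bad = "<course xmlns='" + NS + "'><lesson>XML</lesson></course>";
    System.out.println("good document valid: " + isValid(good));
    System.out.println("bad document valid: " + isValid(bad));
  }
}
```

Without an overridden `error` method, `DefaultHandler` silently ignores validation errors, which is why the handler records them explicitly here.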
The Streaming API for XML (StAX) is a complement to SAX: Like SAX, it presents an XML document as a stream of events. However, unlike SAX, it is a pull parser. Here, the user has to actively query the reader for events, process them, and then query for the next event. You can think of this as an iterator pattern for XML document elements.
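The pull model can be sketched with an `XMLStreamReader` loop in which the application asks for each event itself. The class name and the inline input document are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxReadSketch {

  /** Pull all events from the document and describe them as strings. */
  static List<String> pull(String xml) throws Exception {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    // Merge adjacent text chunks into a single CHARACTERS event.
    factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
    XMLStreamReader reader =
        factory.createXMLStreamReader(new StringReader(xml));

    List<String> events = new ArrayList<>();
    while (reader.hasNext()) {   // our code decides when to advance
      switch (reader.next()) {
        case XMLStreamConstants.START_ELEMENT:
          events.add("start " + reader.getLocalName());
          break;
        case XMLStreamConstants.CHARACTERS:
          if (!reader.isWhiteSpace()) {
            events.add("text " + reader.getText());
          }
          break;
        case XMLStreamConstants.END_ELEMENT:
          events.add("end " + reader.getLocalName());
          break;
        default:
          break;                 // other event types are ignored here
      }
    }
    reader.close();
    return events;
  }

  public static void main(String[] args) throws Exception {
    pull("<course><unit>XML</unit></course>").forEach(System.out::println);
  }
}
```

Because the application owns the loop, it can stop reading early or hand the reader to another method, which is harder to arrange with a push parser.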
This iterator pattern can also be "turned around": An XMLStreamWriter provides methods such as writeAttribute which allow us to, well, build an XML document in a stream-like fashion. We do not need to build the document completely in memory and then serialize it, as in DOM, but can now build it step-by-step. This does, of course, require significantly fewer resources than DOM-based processing. Also, combine this with either StAX or SAX parsing and you get something very fast: Our process may already begin to write an output document while it is still reading its input document.
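Writing a document step by step might look like the following sketch. The element and attribute names are assumptions, and a `StringWriter` stands in for whatever output stream a real application would use.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

class StaxWriteSketch {

  /** Emit a small document piece by piece, without building a tree. */
  static String write() throws Exception {
    StringWriter out = new StringWriter();
    XMLStreamWriter writer =
        XMLOutputFactory.newInstance().createXMLStreamWriter(out);

    writer.writeStartDocument();
    writer.writeStartElement("course");       // hypothetical element name
    writer.writeAttribute("title", "demo");   // must follow the start tag
    writer.writeStartElement("unit");
    writer.writeCharacters("XML & friends");  // '&' is escaped for us
    writer.writeEndElement();                 // closes unit
    writer.writeEndElement();                 // closes course
    writer.writeEndDocument();
    writer.close();
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(write());
  }
}
```

Each call writes to the underlying output as it goes, so the memory needed does not grow with the size of the document.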
With Extensible Stylesheet Language Transformations ([XSLT](https://en.wikipedia.org/wiki/XSLT)), we can transform one XML document into another document, which does not even need to be an XML document. In the file courses2html.xslt we specify how to transform XML documents conforming to our courses.xsd schema to HTML. Here you can find how we can actually perform this transformation in Java, i.e., apply courses2html.xslt to courseWithNamespace.xml.
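Applying a stylesheet from Java can be sketched with the standard `javax.xml.transform` API. The tiny inline stylesheet and input below are assumptions standing in for courses2html.xslt and courseWithNamespace.xml, whose contents are not reproduced here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {

  // Hypothetical stylesheet: turns each unit of a course into a list item.
  static final String XSLT =
      "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/course'>"
      + "<ul><xsl:for-each select='unit'>"
      + "<li><xsl:value-of select='.'/></li>"
      + "</xsl:for-each></ul>"
      + "</xsl:template></xsl:stylesheet>";

  /** Apply the stylesheet to the given XML text and return the result. */
  static String transform(String xml) throws Exception {
    // With files, new StreamSource(new java.io.File(...)) would be used.
    Transformer transformer = TransformerFactory.newInstance()
        .newTransformer(new StreamSource(new StringReader(XSLT)));
    StringWriter out = new StringWriter();
    transformer.transform(new StreamSource(new StringReader(xml)),
        new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    String xml = "<course><unit>XML</unit><unit>XSLT</unit></course>";
    System.out.println(transform(xml));
  }
}
```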