Metanotion XML Tools

This page is a repository for tools we develop to manipulate XML and related standards. At present, we have released only our schema compiler for Java.

XML Schema Compiler (v0.1)

What's a Schema?

XML Schema is essentially a grammar that describes the limits on the types and values of tags, content, and attributes in an XML document. Many XML parsers can validate an XML document against a schema. Most parsers also expose the document through the Document Object Model (DOM) interface for manipulating trees of nodes.

Why Schema?

Often a schema is provided for a protocol or service. If someone has gone to the effort of producing a formal spec (which is what XML Schema is; whether you like it is a different matter), we feel that it should be possible to leverage that effort to speed up development and, more importantly, to produce correct code.

What's an XML Schema Compiler?

The compiler takes an XML Schema as input (for loads of simple schema examples, visit An Introduction to XML and Web Technologies), and its output is a set of Java classes which represent that schema.

Now, the DOM is one way of doing this, but it gets tedious to write all those while-loops looking for tags, and calls like Integer.parseInt(node.getAttributeNode("blah").getValue()), especially when the schema already specifies those types. For example, if you have a tag called <body> with an attribute width whose type is an integer, our compiler will generate a Java class called XEbody with an instance variable public Integer width;.
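To make the contrast concrete, here is a minimal sketch of both approaches for the <body width="..."> example. The XEbody class below is hand-written for illustration and only assumes the shape described above (a constructor taking a DOM node and a public Integer width field); the actual compiler output may differ. The helper names parseRoot and widthViaDom are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class BodyWidthSketch {

    /** Hand-written stand-in for a generated class; the real generated code may look different. */
    public static class XEbody {
        public Integer width;

        public XEbody(Element node) {
            // A missing attribute simply leaves the field null.
            Attr attr = node.getAttributeNode("width");
            if (attr != null) {
                width = Integer.valueOf(attr.getValue().trim());
            }
        }
    }

    /** Parses an XML string and returns its root element (hypothetical helper). */
    public static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** The "raw" DOM way: look up the attribute and convert it by hand. */
    public static int widthViaDom(Element body) {
        return Integer.parseInt(body.getAttributeNode("width").getValue());
    }

    public static void main(String[] args) {
        Element body = parseRoot("<body width=\"640\"/>");
        System.out.println(widthViaDom(body));      // manual lookup and conversion
        System.out.println(new XEbody(body).width); // typed field on the object
    }
}
```

With the typed class, the attribute name and its conversion live in one place instead of being repeated at every call site.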

This is a code generator whose output is Java. The output classes are entirely dependent on the XML Schema, and a change in the schema will almost certainly result in a change in the classes. If you must deal with multiple versions of a schema, we suggest compiling each one separately and creating a more abstract mapping by hand. Code generation is not a silver bullet, but an engineering tool. The output of this compiler is not meant to be modified by hand (would you modify the machine code output from a C compiler?). It is designed as a more manageable layer between "raw" DOM and the object model of XML-consuming applications that depend on an XML Schema.

Features

How does it map XML Schema to Java Classes?

The mappings produced are usually predictable. However, the XML Schema type system is complicated, and it is non-trivial to map into a typical OO language like Java (for a comprehensive discussion of the problems, read the paper Revealing the X/O impedance mismatch by Ralf Lämmel and Erik Meijer, and the conversation discussing the paper at Lambda the Ultimate).

This is further complicated by two other implementation choices:

Admittedly, there are many other choices that could be made, and we have a list below which details other projects which make different tradeoffs. We have also made a choice to use prefixing. You may find this distasteful, but it's a reminder that if the schema changes, these classes will change, and you should probably consider an additional layer of mapping to deal with this.

Edge labelling vs. Node labelling.

This is one of the bigger issues in the mapping problem (though far from the only one). In languages such as Java, objects are "edge labelled": if an object is referenced from another object, the reference is named in the referencing object, and that name is unique within that object. The subelements of an XML element, however, are "node labelled": each child carries its own tag name, and the references to children are distinguished only by position and "type" (e.g. tag name). A tag could have 5 sub-nodes, identically tagged. This example could be handled with an array, but it is also possible to have a different tag intervening, followed by another collection of the same "tag type".
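The interleaving case can be seen with a few lines of DOM code. This is only an illustrative sketch: the <list>, <item>, and <sep> tags and the helper names are made up for the example and are not part of the compiler. It reads the children in document order, then groups them by tag name the way one array field per tag would, which discards the position of the intervening tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InterleavedChildrenSketch {

    /** Returns the tag names of the root's element children, in document order. */
    public static List<String> childTagsInOrder(String xml) {
        try {
            NodeList kids = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getChildNodes();
            List<String> tags = new ArrayList<>();
            for (int i = 0; i < kids.getLength(); i++) {
                Node n = kids.item(i);
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    tags.add(n.getNodeName());
                }
            }
            return tags;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Groups by tag name, as "one array field per tag" would; the interleaving is lost. */
    public static Map<String, Integer> countByTag(List<String> tags) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String t : tags) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tags = childTagsInOrder("<list><item/><item/><sep/><item/></list>");
        System.out.println(tags);             // document order, <sep/> sits between the items
        System.out.println(countByTag(tags)); // grouped view: where <sep/> sat is no longer known
    }
}
```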

Instance Variables

Methods

Validation

The constructors all take DOM nodes. It is assumed that if you want your input XML validated, you will use a validating parser. While XML purists may be appalled, the generated code currently performs no constraint enforcement on input or output. On input, not all XML documents received from the outside will validate, so the generated code does its best: it ignores unknown tags and does not generate errors.
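If you do want validation on input, one option is to run the document through the standard javax.xml.validation API before building a DOM tree and handing nodes to the generated constructors. The sketch below assumes a tiny hypothetical schema for the <body width="..."> example used earlier; the class name, the inline schema, and the isValid helper are illustrative and not part of the compiler.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class ValidateFirstSketch {

    // Hypothetical schema: a <body> element with an optional integer "width" attribute.
    static final String SCHEMA =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "  <xs:element name=\"body\">"
          + "    <xs:complexType>"
          + "      <xs:attribute name=\"width\" type=\"xs:integer\"/>"
          + "    </xs:complexType>"
          + "  </xs:element>"
          + "</xs:schema>";

    /** Returns true if the XML string validates against SCHEMA, false otherwise. */
    public static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(SCHEMA)));
            // With no ErrorHandler set, validate() throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Malformed XML or a schema violation (a broken schema would also land here).
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<body width=\"640\"/>"));
        System.out.println(isValid("<body width=\"wide\"/>"));
    }
}
```

Keeping validation as a separate step leaves the choice to the caller: strict consumers can reject documents up front, while lenient ones can skip the check and let the generated code ignore what it does not recognize.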

For output, we plan to provide an output-phase validator. All validation will be performed in the toXML method, since it may be desirable to manipulate objects into inconsistent states in the interim. It is also more efficient to save validation for the end. Although errors will be harder to track this way, it fits the style of the design: provide code which presumes minimally about the needs of its users.

Similar Projects

Schema Alternatives

Download

Binary XML Schema Compiler Jar

Usage: java -jar xsdCompiler.jar SchemaFile.xsd Include-Directory Output-Directory Package