Metanotion XML Tools

This page is a repository for tools we develop to manipulate XML and related standards. At present, we have released only our schema compiler for Java.

XML Schema Compiler (v0.1)

XML Schema is basically a grammar for describing the limits on the types and values of tags, content, and attributes in an XML document. Many XML parsers will validate an XML document against its schema. Most parsers also provide a mapping to the Document Object Model (DOM) interface for manipulating trees of nodes.

Why Schema?

Often a schema is provided for a protocol or service. If someone has gone to the effort of producing a formal spec (which is what XML Schema is; whether you like it is a different matter), we feel that it should be possible to leverage that effort to speed up development and, more importantly, to produce correct code.

What's an XML Schema Compiler?

The compiler takes an XML Schema as input (for loads of simple schema examples, visit An Introduction to XML and Web Technologies) and outputs a set of Java classes which represent that schema.

Now, the DOM is one way of doing this, but it gets tedious to write all those while-loops looking for tags, and calls like Integer.parseInt(node.getAttributeNode("blah").getValue()), especially when the Schema already specifies that information. For example, if you have a tag called <body> and it has an attribute like width whose type is an integer value, our compiler will generate a Java class called XEbody with an instance variable public Integer width;.
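To make the tedium concrete, here is a minimal sketch of the hand-written DOM code needed just to read that one width attribute. It uses only the standard JDK parser; the class name DomWidthExample and its helper methods are illustrative assumptions, not part of the compiler or its output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, written by hand for illustration only.
class DomWidthExample {
    // Parses an XML string into a DOM Document with the JDK's built-in parser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Reads the integer "width" attribute of a <body> element.
    // Returns null when the attribute is absent.
    static Integer readWidth(Element body) {
        Attr width = body.getAttributeNode("width");
        if (width == null) {
            return null;
        }
        return Integer.valueOf(Integer.parseInt(width.getValue()));
    }

    public static void main(String[] args) throws Exception {
        Element body = parse("<body width=\"42\"/>").getDocumentElement();
        System.out.println(readWidth(body));
    }
}
```

Every typed attribute and every sub-element needs a similar method when working against the raw DOM, which is the repetition a generated class with a public Integer width field is meant to remove.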

This is a code generator whose output is Java. The output classes are entirely dependent on the XML Schema, and a change in the Schema will almost certainly result in a change in the classes. If you must deal with multiple versions of a schema, we suggest compiling each one separately and creating a more abstract mapping by hand. Code generation is not a silver bullet, but an engineering tool. The output of this compiler is not meant to be modified by hand (would you modify the machine code output from a C compiler?). It is designed as a more manageable layer between the "raw" DOM and the object model of XML-consuming applications which depend on an XML Schema.

Features

Zero configuration - The compiler's only needed input is an XML Schema.
No subsets - We attempt to map all aspects of XML Schema.
No dependencies - The Java output has no dependencies other than a DOM interface to your XML parser.
Licensing - We have not settled on a licensing scheme for this product. It's currently not finished, and user feedback will strongly influence our final decision. It goes without saying that commercial licensing under suitable terms is always a possibility.

How does it map XML Schema to Java Classes?

The mappings produced are usually predictable. However, the XML Schema type system is complicated, and it is non-trivial to map into a typical OO language like Java (for a comprehensive discussion of the problems, read the paper Revealing the X/O impedance mismatch by Ralf Lämmel and Erik Meijer, and also a conversation discussing the paper at Lambda the Ultimate).

This is further complicated by two other implementation choices:

We have chosen to avoid the need for a configuration file to assist in the mapping process.
We do not impose any requirements on the Schema other than that it be a valid XML Schema.

Admittedly, there are many other choices that could be made, and we have a list below which details other projects that make different tradeoffs. We have also made a choice to use prefixing. You may find this distasteful, but it's a reminder that if the schema changes, these classes will change, and you should probably consider an additional layer of mapping to deal with this.

Edge labelling vs. Node labelling.

This is one of the bigger issues in the mapping problem (though far from the only one). In languages such as Java, objects are "edge labelled": if an object is referenced from another object, the reference is named in the referencing object, and that name is unique within that object. The "subelements" of an XML element, however, are "node labelled": each child carries its own label, and the children are identified by position and "type" (e.g. tag name). A tag could have 5 sub-nodes, identically tagged. Now, this example could be handled with an array, but it is also possible to have a different tag intervening, followed by another collection of the same "tag type".
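The interleaving case can be seen with a few lines of plain DOM code. This is a sketch using made-up tag names (list, item, sep) and a hypothetical class ChildOrderExample; it only lists the child tags in document order and is not taken from the compiler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demonstration class; the tag names used with it are invented.
class ChildOrderExample {
    // Returns the tag names of the root element's child elements, in document order.
    static List<String> childTagNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList kids = doc.getDocumentElement().getChildNodes();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            // Skip text, comments, etc.; only element children carry a tag name.
            if (kid.getNodeType() == Node.ELEMENT_NODE) {
                names.add(kid.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childTagNames("<list><item/><item/><sep/><item/></list>"));
    }
}
```

A Java class with one array field named item and one field named sep can hold all four children, but it no longer records that sep sat between the second and third item. Any mapping has to either accept that loss or model the ordering explicitly.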

Instance Variables

Naming conflicts - Currently, if there is a naming conflict, an integer is appended to the name and incremented until a free name is found.
Tags - Tags are almost always mapped to a class whose name is prefixed with XE (short for XML Element) followed by the name of the tag (case and all). If a tag is an instance of a complex type which is not tied to a top-level element, and the tag is not itself a top-level tag, it may instead be mapped to a class whose name is XT (for XML Type) followed by the name of the complex type.
Attributes - Since attributes are guaranteed to be "uniquely named", attributes become instance variables whose name is identical to the attribute name, and the type is mapped as closely as possible to a Java native type.
Subelements - If a sub element is simply typed, it will be mapped to an instance variable, and the name of the instance variable will be the same as the tag (unless there is a conflict). In the case where the maximum number of occurrences is greater than one, a vector (typed with the simple type of the element) is used in place of a simple type. If the sub element is more complex, either an XEtag object or an XTtype object will be used.
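The naming-conflict rule above can be sketched as a small allocator. This is our illustration of the idea, not the compiler's actual code: the class name NameAllocator is hypothetical, and starting the suffix at 1 is an assumption, since the starting value is not specified here.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of "append an integer and increment until free".
class NameAllocator {
    private final Set<String> used = new HashSet<>();

    // Returns the wanted name if it is free, otherwise the wanted name plus
    // the first integer suffix that yields an unused name.
    String allocate(String wanted) {
        String candidate = wanted;
        int suffix = 1; // assumption: the first suffix tried is 1
        // Set.add returns false when the name is already taken.
        while (!used.add(candidate)) {
            candidate = wanted + suffix;
            suffix++;
        }
        return candidate;
    }
}
```

With this sketch, an attribute and a sub element that are both called width in the same class would end up as two distinct fields, the second carrying a numeric suffix.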

Methods

Both XE and XT objects have two constructors: one with no arguments, and another which takes a DOM Element.
An XE object also has a zero-argument method, public String toXML(), which produces an XML fragment of this node and its children.
An XT object also has a one-argument method, public String toXML(String tag), which produces an XML fragment of this node and its children. The parameter, tag, supplies the tag name to use, since a type has no tag name of its own.
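To give a feel for the shape described above, here is a hand-written approximation of what a class for the <body> example might look like. It is a sketch under assumptions, not real compiler output: the bodies of the constructor and toXML, and the choice to silently ignore a malformed width, are our guesses.

```java
import org.w3c.dom.Element;

// Hand-written approximation of a generated XE class; not actual compiler output.
class XEbody {
    public Integer width;

    // Constructor with no arguments: builds an empty element.
    public XEbody() {
    }

    // Constructor which takes a DOM Element and copies out what it recognizes.
    public XEbody(Element e) {
        if (e.hasAttribute("width")) {
            try {
                width = Integer.valueOf(e.getAttribute("width").trim());
            } catch (NumberFormatException ex) {
                // Assumption: malformed values are ignored rather than reported.
                width = null;
            }
        }
    }

    // Produces an XML fragment for this node.
    public String toXML() {
        StringBuilder sb = new StringBuilder("<body");
        if (width != null) {
            sb.append(" width=\"").append(width).append('"');
        }
        return sb.append("/>").toString();
    }
}
```

An XT class would look the same except that toXML would take the tag name as a String parameter in place of the hard-coded "body".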

Validation

The parsing constructors all take DOM nodes. It is assumed that if you want your input XML validated, you will use a validating parser. While XML purists may be appalled, the generated code currently performs no constraint enforcement on input or output. In the case of input, not all XML documents received from the outside will validate, and the generated code will do its best, ignoring unknown tags and not generating errors.
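If you do want validated input, one way to get it is to let the standard JDK parser validate against the schema before handing the DOM to the generated constructors. The sketch below uses the javax.xml.validation API; the class name ValidatingParseExample and the tiny inline schema are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper showing schema validation at parse time.
class ValidatingParseExample {
    // Made-up schema: a <body> element with an optional integer width attribute.
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"body\"><xs:complexType>"
        + "<xs:attribute name=\"width\" type=\"xs:integer\"/>"
        + "</xs:complexType></xs:element></xs:schema>";

    // Parses xml into a DOM, throwing SAXParseException if it violates the schema.
    static Document parseValidated(String xsd, String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // schema validation needs namespace awareness
        factory.setSchema(schema);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Without an ErrorHandler, validation errors are reported but not thrown.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXParseException { throw e; }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

The document element of the returned Document can then be passed to a generated constructor. Documents that fail validation never reach the generated code, which keeps the lenient parsing behaviour described above as a deliberate fallback rather than the only line of defence.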

For output, we plan to provide an output-phase validator. However, all validation will be performed in the toXML method, since it may be desirable to manipulate objects into inconsistent states in the interim. It is also more efficient to save validation for the end. Although errors will be harder to track this way, it fits the style of the design, which is to provide code that presumes as little as possible about the needs of its users.

Usage: java -jar xsdCompiler.jar SchemaFile.xsd Include-Directory Output-Directory Package

SchemaFile.xsd - XML Schema to compile.
Include-Directory - Directory in which to look for included schemas.
Output-Directory - Location of generated Java.
Package - Package the generated Java should be declared in.
