These test results are presented as three separate sets of data. The small document tests use four collections of small documents:

  1. soaps (0.4-1.4 KB) - SOAP request and response messages, taken from the Apache Axis interoperability test results. These use namespaces extensively, with some attributes along with character data content.
  2. fms (each about 5 KB) - RDF files giving new release information from freshmeat.net. Heavy use of namespaces, with a few attributes along with character data content.
  3. ants (0.5-9.9 KB) - Ant build.xml files taken from Jakarta Taglibs projects. No namespaces, heavy use of attributes and comments, with little character data content.
  4. webs (0.2-36 KB) - Jakarta Taglibs taglib.tld and web.xml files. No namespaces, character data content with no attributes.

The mid-sized document tests use four documents in the 100-200 KB size range:

  1. soap.xml (131 KB) - a generated SOAP document containing a large list of values. Some namespaces and attributes, flat structure consisting of simple elements with short character data content. This document was generated by Aleksander Slominski as a SOAP test case.
  2. much_ado.xml (197 KB) - the Shakespeare play marked up as XML. No namespaces or attributes, flat structure consisting of simple elements with relatively long character data content. This came from Jon Bosak's collection of documents.
  3. periodic_table.xml (114 KB) - periodic table of the elements in XML. No namespaces and light attribute usage, fairly complex structure consisting mainly of elements with short character data content. This document originated with Elliotte Rusty Harold.
  4. xml.xml (192 KB) - the XML specification as XHTML, with the DTD reference removed and all entities substituted (necessary for some of the models used in the tests). This was chosen as typical of document presentation markups, with heavy mixed content. No namespaces, light use of attributes. Note that this file cannot be included in the distribution because its license requires that it be distributed only in unmodified form. To use it, you'll need to remove the DTD reference and substitute &#xxx values for entities yourself.
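
The preparation described for xml.xml (removing the DTD reference and substituting numeric values for entities) could be scripted along the following lines. This is only a sketch, not the procedure actually used for the tests. It assumes the entities in the file are the standard HTML 4 / XHTML 1 named entities, which the text above does not state. The function name strip_dtd_and_entities and the regular expressions are illustrative choices.

```python
import re
from html.entities import name2codepoint

# The five entities predefined by XML itself need no DTD, so leave them alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a DOCTYPE declaration, with or without an internal subset.
# Assumption: no '>' or '[' appears inside the quoted identifiers.
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>\[]*(\[.*?\])?\s*>", re.DOTALL)

# Matches a named entity reference such as &nbsp;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def strip_dtd_and_entities(text: str) -> str:
    """Remove the DOCTYPE and replace named entities with &#nnn; references."""
    text = DOCTYPE_RE.sub("", text, count=1)

    def replace(match):
        name = match.group(1)
        # Keep predefined XML entities and anything we cannot resolve.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#{name2codepoint[name]};"

    return ENTITY_RE.sub(replace, text)
```

Unknown entity names are left untouched rather than guessed at, so any that remain will still show up as parse errors and can be handled by hand.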

Finally, a single document is used to test performance for large documents:

  1. weblog.xml (2.9 MB) - web server access log file formatted as XML. This consists of approximately 10K elements representing page hits, each containing several child elements with character data content for the fields of information. No namespaces and no attributes. This document was received from David Mertz.
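
The characteristics quoted for each document (namespace use, attributes, comments, amount of character data) can be checked for any candidate test file with a small profiling script. The sketch below uses Python's built-in expat bindings. The function name profile_xml and the choice of counters are assumptions made for illustration and are not part of the test suite.

```python
import xml.parsers.expat


def profile_xml(data: bytes) -> dict:
    """Count the features used above to characterize a test document."""
    counts = {"elements": 0, "attributes": 0, "namespace_decls": 0,
              "comments": 0, "text_chars": 0}

    def start_element(name, attrs):
        counts["elements"] += 1
        # In namespace mode expat reports xmlns declarations separately,
        # so attrs holds only ordinary attributes.
        counts["attributes"] += len(attrs)

    def start_namespace(prefix, uri):
        counts["namespace_decls"] += 1

    def comment(text):
        counts["comments"] += 1

    def char_data(text):
        # Count non-whitespace characters only, so indentation is ignored
        # and the total does not depend on how expat chunks the text.
        counts["text_chars"] += sum(1 for ch in text if not ch.isspace())

    # Passing a namespace separator turns on namespace processing.
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartElementHandler = start_element
    parser.StartNamespaceDeclHandler = start_namespace
    parser.CommentHandler = comment
    parser.CharacterDataHandler = char_data
    parser.Parse(data, True)
    return counts
```

Reading a file in binary mode and passing its contents to profile_xml gives a quick way to compare a new document against the descriptions in the lists above, for example to confirm that a file really has no attributes or namespace declarations.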