
new tool xml-hddm introduced



Fellow simulators,

At the software workfest we decided to use xml as a sort of global language
that all of our simulation and analysis tools would be able to speak, in
addition to their local dialects.  The advantages of xml are how easy it is
to understand the structure of the data just by browsing it, and the
well-tested open-source tools that exist to create and manipulate (query)
xml documents.  However, reading and writing large data sets in xml can
carry a lot of overhead compared with plain old unformatted io.  Even for
Geant simulations, which are cpu-intensive, the overhead of straight xml io
is significant.

To help with this, I introduced the concept of a "data model document" that
functions as an xml template for an event.  That template needs only to be
written out once at the beginning of a data file, and then following that it
is sufficient to just "fill in the blanks" for each event.  With this
approach you pay the overhead of parsing the xml only for the first event,
and from then on it is just unformatted io.  The savings are large both
in processing overhead and in data volume, without sacrificing the flexibility
of xml.  A file formed in this way is tagged with the .hddm suffix.
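
For anyone who wants to see the template idea in miniature, here is a rough
sketch in Python.  It is not the actual hddm code or file layout: the
template, the "int"/"float" type labels, and the function names
record_format, write_file and read_file are all made up for illustration.
The point is only that the xml is parsed once to derive a fixed binary
record layout, and every event after that is plain packed io.

```python
import io
import struct
import xml.etree.ElementTree as ET

# Hypothetical template: each attribute value names the type of one blank.
TEMPLATE = '<event run="int" number="int"><hit t="float" dE="float"/></event>'

# Assumed mapping from template type labels to struct format codes.
FORMATS = {"int": "i", "float": "f"}


def record_format(template_xml):
    """Parse the template once and derive a fixed binary record layout."""
    root = ET.fromstring(template_xml)
    fields = []
    for elem in root.iter():                    # document order, root first
        for name, kind in elem.attrib.items():
            fields.append((elem.tag + "." + name, kind))
    layout = ">" + "".join(FORMATS[kind] for _, kind in fields)
    return fields, struct.Struct(layout)


def write_file(stream, template_xml, events):
    """Write the template once as a header, then one packed record per event."""
    _, packer = record_format(template_xml)
    header = template_xml.encode("utf-8")
    stream.write(struct.pack(">I", len(header)))  # header length prefix
    stream.write(header)
    for values in events:
        stream.write(packer.pack(*values))        # "fill in the blanks"


def read_file(stream):
    """Read the header, rebuild the layout from it, then unpack each record."""
    (size,) = struct.unpack(">I", stream.read(4))
    template_xml = stream.read(size).decode("utf-8")
    fields, packer = record_format(template_xml)
    names = [name for name, _ in fields]
    events = []
    while True:
        chunk = stream.read(packer.size)
        if len(chunk) < packer.size:
            break
        events.append(dict(zip(names, packer.unpack(chunk))))
    return template_xml, events
```

Because the template travels in the file header, a reader needs nothing
beyond the file itself to recover the field names, which is what keeps the
flexibility of xml while the events stay compact.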

All of this is done automatically by tools that exist in the hddm
section of the cvs repository.  These tools are already being used to stitch
together the different pieces of the simulation chain.  They now include
two useful tools for converting to and from plain-text xml, which complete
the suite that we envisioned at the workfest.

  hddm-xml: read in a .hddm file and write it out as plain-text xml.
  xml-hddm: read in plain-text xml and a template and write a .hddm file.

The second one is new.  As of yesterday it went into the cvs repository.
This one was harder to write, because one has to deal with parsing a
free-form document, with all of its possible variations.

Thanks to Elliott for pointing out the need for such a tool.

Richard Jones



ps.  Thanks also to Elliott for showing how the common gzip utility can
further compress an hddm file by a factor of 2, saving me the wasted effort
of trying to optimize the hddm serializing code for data volume.  Because it
moves data around in longwords, the hddm serializer/deserializer is
better optimized for speed than for data volume.  For cases where volume
is more important than cpu time, there is gzip.
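
To get a feel for the gzip trade-off on your own files, something like the
following Python sketch could be used.  The helper name gzip_ratio and the
synthetic longword-style records are assumptions for illustration only; the
ratio for real hddm data will differ from what this regular test data gives.

```python
import gzip
import struct


def gzip_ratio(payload):
    """Return original size divided by gzip-compressed size for some bytes."""
    return len(payload) / len(gzip.compress(payload))


# Synthetic stand-in for packed event records (not real hddm output).
sample = b"".join(
    struct.pack(">iiff", 1, n, 0.5 * n, 0.25) for n in range(1000)
)

print("bytes before gzip:", len(sample))
print("compression ratio: %.2f" % gzip_ratio(sample))
```

Compression and decompression cost cpu time on every pass over the file, so
this only pays off where data volume matters more.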