Subject: Re: [boost] [GSOC] XML library of Boost
From: Bjorn Reese (breese_at_[hidden])
Date: 2013-05-06 05:43:39


On 05/05/2013 07:00 PM, Stefan Seefeld wrote:

> Define "simple needs". I bet there are as many different expectations
> for that as you ask people.

But that does not mean that we should ignore their needs.

We do not have to look further than Boost to find use cases wherein XML
is used as an encoding format and nothing else.

Boost.PropertyTree has a tree data structure that can be saved in XML,
JSON, INI, or its own file format. It therefore needs to parse an XML
document into its own data structure.

Boost.Serialization has an XML archive that needs to parse an XML
document into user-defined data structures. XmlReader would be a perfect
fit for Boost.Serialization.

With a builder design pattern, both can be handled directly without any
intermediate DOM data structure. I am going to elaborate on that below.

> How would you package boost.xml, to offer these different
> implementations with varying feature sets ? I don't see any reasonable
> way to achieve that.

In the same way that you intend to support wrappers for libxml2 and
Xerces.

> In contrast, there are a couple of well-established APIs to deal with
> XML (notably SAX, XMLReader, and DOM), it just so happens that none of
> them are available as standard C++ APIs.

I must have expressed myself badly if I left you with the impression
that I am against these APIs or C++ versions thereof. Quite the
contrary. Let me outline how I would approach this project:

Start with an XML lexer. This simply returns the next token (start tag,
attribute, data, etc.) when called.
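
To make the lexer step concrete, here is a minimal sketch of a pull-style
lexer. The names ToyXmlLexer, Token, and TokenKind are invented for this
illustration and are not an existing Boost API. It assumes well-formed
input and ignores comments, CDATA, entities, processing instructions,
and error reporting.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical token kinds for this sketch.
enum class TokenKind { StartTag, Attribute, EndTag, Data, End };

struct Token {
    TokenKind kind;
    std::string name;   // tag or attribute name
    std::string value;  // attribute value or character data
};

class ToyXmlLexer {
public:
    explicit ToyXmlLexer(std::string input) : in_(std::move(input)) {}

    // Returns the next token; TokenKind::End once the input is exhausted.
    Token next() {
        if (in_tag_) return next_in_tag();
        if (pos_ >= in_.size()) return {TokenKind::End, "", ""};
        if (in_[pos_] != '<') {  // character data up to the next tag
            std::size_t end = in_.find('<', pos_);
            if (end == std::string::npos) end = in_.size();
            Token data{TokenKind::Data, "", in_.substr(pos_, end - pos_)};
            pos_ = end;
            return data;
        }
        ++pos_;  // consume '<'
        if (pos_ < in_.size() && in_[pos_] == '/') {  // "</name>"
            ++pos_;
            std::string name = read_name();
            skip_past('>');
            return {TokenKind::EndTag, name, ""};
        }
        current_ = read_name();
        in_tag_ = true;  // attributes may follow
        return {TokenKind::StartTag, current_, ""};
    }

private:
    // Called while positioned between a tag name and the closing '>'.
    Token next_in_tag() {
        skip_space();
        if (pos_ >= in_.size()) {  // input ended inside a tag
            in_tag_ = false;
            return {TokenKind::End, "", ""};
        }
        if (in_[pos_] == '>') {  // end of start tag: continue with content
            ++pos_;
            in_tag_ = false;
            return next();
        }
        if (in_[pos_] == '/') {  // "<name/>" yields a synthetic end tag
            skip_past('>');
            in_tag_ = false;
            return {TokenKind::EndTag, current_, ""};
        }
        std::string name = read_name();
        skip_past('=');
        skip_space();
        std::string value;
        if (pos_ < in_.size()) {
            char quote = in_[pos_++];  // assumed to be ' or "
            std::size_t end = in_.find(quote, pos_);
            if (end == std::string::npos) end = in_.size();
            value = in_.substr(pos_, end - pos_);
            pos_ = end < in_.size() ? end + 1 : end;
        }
        return {TokenKind::Attribute, name, value};
    }

    static bool is_delimiter(char c) {
        return std::isspace(static_cast<unsigned char>(c)) ||
               c == '=' || c == '>' || c == '/';
    }
    std::string read_name() {
        std::size_t start = pos_;
        while (pos_ < in_.size() && !is_delimiter(in_[pos_])) ++pos_;
        return in_.substr(start, pos_ - start);
    }
    void skip_space() {
        while (pos_ < in_.size() &&
               std::isspace(static_cast<unsigned char>(in_[pos_]))) ++pos_;
    }
    void skip_past(char c) {
        std::size_t found = in_.find(c, pos_);
        pos_ = (found == std::string::npos) ? in_.size() : found + 1;
    }

    std::string in_;
    std::size_t pos_ = 0;
    bool in_tag_ = false;
    std::string current_;  // name of the start tag being lexed
};
```

Each call to next() does a bounded amount of work and keeps no tree, so
the caller decides what to do with every token.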

Put the XML lexer in a loop, and you get a SAX parser.
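
A sketch of that loop, under stated assumptions: the SaxHandler interface
and the sax_parse function are hypothetical names, and ReplayLexer is
only a stand-in that replays pre-made tokens so the example stays
self-contained. Any type with a next() member returning a Token would
work in its place.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class TokenKind { StartTag, Attribute, EndTag, Data, End };

struct Token {
    TokenKind kind;
    std::string name;
    std::string value;
};

// Stand-in for a real lexer: replays a fixed token sequence.
class ReplayLexer {
public:
    explicit ReplayLexer(std::vector<Token> tokens)
        : tokens_(std::move(tokens)) {}
    Token next() {
        if (index_ >= tokens_.size()) return {TokenKind::End, "", ""};
        return tokens_[index_++];
    }
private:
    std::vector<Token> tokens_;
    std::size_t index_ = 0;
};

// Hypothetical callback interface in the spirit of SAX.
struct SaxHandler {
    virtual ~SaxHandler() = default;
    virtual void start_element(const std::string& name) = 0;
    virtual void attribute(const std::string& name,
                           const std::string& value) = 0;
    virtual void end_element(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
};

// The "lexer in a loop": pull tokens, push events.
template <typename Lexer>
void sax_parse(Lexer& lexer, SaxHandler& handler) {
    for (Token t = lexer.next(); t.kind != TokenKind::End; t = lexer.next()) {
        switch (t.kind) {
        case TokenKind::StartTag:  handler.start_element(t.name); break;
        case TokenKind::Attribute: handler.attribute(t.name, t.value); break;
        case TokenKind::EndTag:    handler.end_element(t.name); break;
        case TokenKind::Data:      handler.characters(t.value); break;
        case TokenKind::End:       break;  // unreachable: loop stops first
        }
    }
}

// Example handler that records the events it receives as text.
struct EventLog : SaxHandler {
    std::string log;
    void start_element(const std::string& name) override {
        log += "start:" + name + ";";
    }
    void attribute(const std::string& name,
                   const std::string& value) override {
        log += "attr:" + name + "=" + value + ";";
    }
    void end_element(const std::string& name) override {
        log += "end:" + name + ";";
    }
    void characters(const std::string& text) override {
        log += "text:" + text + ";";
    }
};
```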

Pair the XML lexer with a parent stack, and you get an XmlReader.
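
The parent stack can be sketched as follows. ToyXmlReader and its members
are hypothetical names chosen for this illustration, not the API of any
existing XmlReader. The token source is passed in as a callable so that
any lexer can feed it. In this sketch, depth() counts the elements still
open after the current token, which differs from how some XmlReader
implementations number depth.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class TokenKind { StartTag, Attribute, EndTag, Data, End };

struct Token {
    TokenKind kind;
    std::string name;
    std::string value;
};

class ToyXmlReader {
public:
    using TokenSource = std::function<Token()>;

    explicit ToyXmlReader(TokenSource source) : source_(std::move(source)) {}

    // Advances to the next token. Returns false at the end of input.
    // Throws std::runtime_error on mismatched or unclosed elements.
    bool read() {
        current_ = source_();
        switch (current_.kind) {
        case TokenKind::StartTag:
            parents_.push_back(current_.name);
            break;
        case TokenKind::EndTag:
            if (parents_.empty() || parents_.back() != current_.name)
                throw std::runtime_error("mismatched end tag: " +
                                         current_.name);
            parents_.pop_back();
            break;
        case TokenKind::End:
            if (!parents_.empty())
                throw std::runtime_error("unclosed element: " +
                                         parents_.back());
            return false;
        default:
            break;  // attributes and data do not change the stack
        }
        return true;
    }

    const Token& current() const { return current_; }

    // Number of elements still open after the current token.
    std::size_t depth() const { return parents_.size(); }

    // Slash-separated names of the open elements, e.g. "/a/b".
    std::string path() const {
        std::string result;
        for (const std::string& name : parents_) result += "/" + name;
        return result;
    }

private:
    TokenSource source_;
    Token current_{TokenKind::End, "", ""};
    std::vector<std::string> parents_;
};
```

Compared with the callback style, the caller drives the parse by calling
read(), and the stack is what lets the reader answer "where am I?" and
catch mismatched tags without building a tree.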

Base the DOM parser on the SAX parser to create its tree. This is how
libxml2 does it, and how it reuses the tree generator for parsing other
formats such as HTML and DocBook.
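
As a rough sketch of a tree builder driven by SAX-style callbacks: Node
and TreeBuilder are hypothetical names, and the four callbacks mirror the
usual SAX events rather than any specific library's signatures. It
assumes the events arrive in well-formed order and that the document has
a single root element.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Node {
    std::string name;
    std::vector<std::pair<std::string, std::string>> attributes;
    std::string text;  // character data directly inside this element
    std::vector<std::unique_ptr<Node>> children;
};

// Receives SAX-style events and assembles a tree from them.
class TreeBuilder {
public:
    void start_element(const std::string& name) {
        auto node = std::make_unique<Node>();
        node->name = name;
        Node* raw = node.get();
        if (open_.empty())
            root_ = std::move(node);  // first element becomes the root
        else
            open_.back()->children.push_back(std::move(node));
        open_.push_back(raw);  // new element is now the innermost open one
    }

    void attribute(const std::string& name, const std::string& value) {
        if (!open_.empty()) open_.back()->attributes.emplace_back(name, value);
    }

    void characters(const std::string& text) {
        if (!open_.empty()) open_.back()->text += text;
    }

    void end_element(const std::string& /*name*/) {
        if (!open_.empty()) open_.pop_back();
    }

    // Hands the finished tree to the caller.
    std::unique_ptr<Node> take_root() { return std::move(root_); }

private:
    std::unique_ptr<Node> root_;
    std::vector<Node*> open_;  // stack of currently open elements
};
```

Because the builder only sees events, the same class could sit behind any
front end that emits them, and a different builder could be swapped in
without touching the parser.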

By default, I would provide our own tree, although this is not terribly
important.

If I want to use XML Schema or XSLT, I would instead replace the
builder (the SAX callbacks) with one for libxml2, and then use libxml2
for validation or transformation. Creating such a libxml2 builder is
straightforward, because libxml2 already supplies it in its API:
xmlDefaultSAXHandler. No maintenance nightmare here.

Another advantage of using this interpreter/builder split is that it
gives our users the freedom to create new frontends for alternative XML
encodings, such as binary XML or SXML (XML as S-expressions). This
would not be possible if we only created a wrapper for libxml2.

