From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2007-07-09 18:22:55


Stefan Seefeld wrote:
> over the last couple of years we have discussed possible XML APIs
> for inclusion into boost. As I already had an early prototype for
> such an API, I kept evolving it, based on feedback from those
> discussions.
> A couple of weeks ago I actually checked it into the sandbox
> (http://svn.boost.org/trac/boost/browser/sandbox/xml).

> PS: The current scope of the project is described in
> http://svn.boost.org/trac/boost/browser/sandbox/xml/README

Hi Stefan,

My comments follow. They are based on maybe half an hour of looking at
your code, so it's quite possible that I have missed something. As
others have pointed out, it would be easier to evaluate with some more docs...

I certainly agree that C++ would benefit from an XML API and Boost is a
good place to develop it.

As far as I can see, what you have is a wrapper around the GNOME
libxml2 (which has an MIT license and is cross-platform) that
implements something that you call dom, but is not the standardised
"DOM" API for XML (http://www.w3.org/DOM/).

I think that two C++ APIs for XML document manipulation could be justified:

(a) DOM. This has the benefit of being standardised, so you can
transfer at least your experience and to some extent actual code from
one language to another (e.g. C++ to/from JavaScript in my case). On
the other hand it is a rather verbose and unenjoyable API that isn't a
great match to 'modern' C++.

(b) A standard-library-like API (e.g. attributes are a map, child nodes
are a sequence). This would have the benefit of familiarity to users
of the C++ standard library, and I think it would be a more concise and
usable API.

As far as I can see, what you have created is something that isn't (a)
or (b) but falls somewhere between. For example, you provide iterators
rather than the nextSibling-style functions of DOM, but you provide
custom functions like append_element and set_attribute rather than
standard-library-like append() and operator[] implementations.
Compare:

- DOM:
e.setAttribute("color","red");
e.appendChild(doc.createElement("P"));

- Yours:
e.set_attribute("color","red");
e.append_element("P");

- STL-like:
e.attributes["color"]="red";
e.children.push_back(new Element("P"));
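
To make the STL-like style concrete, here is a minimal sketch of a type
that would support those two lines. The Element struct and the helper
names are hypothetical and not taken from any existing library. It
assumes C++14 and uses std::unique_ptr instead of a raw new so that the
parent visibly owns its children.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical STL-like element type; not taken from any existing library.
struct Element {
    std::string name;
    std::map<std::string, std::string> attributes;   // attributes are a map
    std::vector<std::unique_ptr<Element>> children;  // child nodes are a sequence

    explicit Element(std::string n) : name(std::move(n)) {}
};

// Builds <body color="red"><P/></body> with ordinary container operations.
inline Element make_example() {
    Element e("body");
    e.attributes["color"] = "red";
    e.children.push_back(std::make_unique<Element>("P"));
    return e;
}

// Reads the document back using only standard container members.
inline void check_example() {
    Element e = make_example();
    assert(e.attributes.at("color") == "red");
    assert(e.children.size() == 1);
    assert(e.children.front()->name == "P");
}
```

Nothing here is XML-specific: lookup, insertion and iteration are all
plain std::map and std::vector operations.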

In the past I have used a library called xmlwrapp. You should take a
look at it if you have not done so already. It has a very liberal
license (Boost-like). It is also a C++ libxml2 wrapper and as I recall
its style is similar to yours. It seemed to do nearly everything that
I wanted. I remember being confused about the ownership semantics of
pointed-to objects sometimes; what is your policy? (e.g. if I copy a
subtree to another place in the document, is it a deep copy or a
pointer copy? Copy-on-write? When is it freed? Reference counted?)
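
The difference between those policies can be sketched as follows. This
is only an illustration under stated assumptions: Node and deep_copy
are hypothetical names, and reference counting is modelled with
std::shared_ptr rather than with anything libxml2 or xmlwrapp provides.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type whose children are reference counted.
struct Node {
    std::string name;
    std::vector<std::shared_ptr<Node>> children;
    explicit Node(std::string n) : name(std::move(n)) {}
};

// Deep copy: clones the whole subtree, so the result shares nothing with src.
inline std::shared_ptr<Node> deep_copy(const Node& src) {
    auto copy = std::make_shared<Node>(src.name);
    for (const auto& child : src.children)
        copy->children.push_back(deep_copy(*child));
    return copy;
}

inline void ownership_demo() {
    auto p = std::make_shared<Node>("P");
    auto a = std::make_shared<Node>("a");
    auto b = std::make_shared<Node>("b");

    a->children.push_back(p);
    b->children.push_back(p);            // pointer copy: a and b share one P
    assert(p.use_count() == 3);          // p itself plus the two parents

    auto c = deep_copy(*a);              // deep copy: c gets its own P
    c->children[0]->name = "DIV";
    assert(p->name == "P");              // the original is unaffected

    b->children[0]->name = "SPAN";       // a change via b is visible via a
    assert(a->children[0]->name == "SPAN");
}
```

With the pointer copy, the shared node is freed only when the last
owner releases it; with the deep copy, each tree can be freed
independently. Whichever policy a library picks, the docs need to say so.
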
I was also surprised once by its memory inefficiency: you might like
to consider how many MB of RAM are needed to store in-memory a document
that is X MB on disk, for examples with many small nodes or fewer
larger nodes. In my case, it would have helped to use some sort of
dictionary for element and attribute names.
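
A minimal sketch of such a name dictionary, assuming a hypothetical
NameDictionary class (not part of xmlwrapp or of the sandbox code):
each distinct name is stored once, and nodes hold a pointer to the
shared copy instead of their own string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical dictionary that stores each element/attribute name once.
class NameDictionary {
public:
    // Returns a pointer to the single stored copy of `name`.
    // Pointers to std::unordered_set elements stay valid across later inserts.
    const std::string* intern(const std::string& name) {
        return &*names_.insert(name).first;
    }

    std::size_t size() const { return names_.size(); }

private:
    std::unordered_set<std::string> names_;
};

inline void dictionary_demo() {
    NameDictionary dict;
    const std::string* first  = dict.intern("P");
    const std::string* second = dict.intern("P");
    const std::string* other  = dict.intern("color");

    assert(first == second);   // same name, same storage
    assert(first != other);
    assert(dict.size() == 2);  // only two strings are kept
}
```

A document with a million <P> elements would then carry a million
pointers rather than a million copies of the string "P".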

One thing that xmlwrapp did not offer was a way to access the
underlying libxml2 C 'object'. While this is normally an
implementation detail that you would like to hide, note that there are
other C libraries that you might want to use; I think the one that I
was looking at was the SVG renderer librsvg [attn Jake!]. I wanted to
build an in-memory XML/SVG document in my C++ code and then convert it
to a bitmap, but because xmlwrapp wouldn't let me get at the raw
libxml2 stuff, I couldn't, and had to go via a temporary file. (Or
maybe I hacked it, can't remember.) Doing XSLT transformations would
be another example where this would be necessary.
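
One way to offer this is a non-owning accessor on the wrapper class. The
sketch below is illustrative only: raw_doc, raw_doc_new, raw_doc_free,
native_handle and c_library_count_nodes are all hypothetical stand-ins
so that the example compiles without libxml2. In a real libxml2 wrapper
the handle would be an xmlDoc*.

```cpp
#include <cassert>
#include <memory>

// Stand-in for an opaque C library object and its create/free functions.
// All three names are hypothetical; a libxml2 wrapper would hold an xmlDoc*.
struct raw_doc { int node_count; };
inline raw_doc* raw_doc_new() { return new raw_doc{0}; }
inline void raw_doc_free(raw_doc* d) { delete d; }

// Hypothetical C++ wrapper that owns the C object.
class Document {
public:
    Document() : handle_(raw_doc_new(), &raw_doc_free) {}

    // Escape hatch: non-owning access for passing to other C libraries.
    // The Document keeps ownership; callers must not free the pointer.
    raw_doc* native_handle() { return handle_.get(); }
    const raw_doc* native_handle() const { return handle_.get(); }

private:
    std::unique_ptr<raw_doc, void (*)(raw_doc*)> handle_;
};

// Stand-in for a third-party C API that only accepts the raw object.
inline int c_library_count_nodes(const raw_doc* d) { return d->node_count; }

inline void handle_demo() {
    Document doc;
    doc.native_handle()->node_count = 3;
    assert(c_library_count_nodes(doc.native_handle()) == 3);
}
```

The wrapper still controls the object's lifetime, so the usual
interface stays safe, while code that needs a renderer or an XSLT
processor can reach the underlying object without a temporary file.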

I hope these comments are useful; what do others think?

Regards,

Phil.

