From: Sebastian Redl (sebastian.redl_at_[hidden])
Date: 2006-09-07 16:00:09


On Wed, September 6, 2006 7:30 pm, Stefan Seefeld wrote:
> Have you followed the discussions around my proposal for an XML API in
> boost that I implemented on top of libxml2 (http://xmlsoft.org/) ?

I wasn't around for the early discussions, but have caught up on them now.
It's an interesting discussion, though I'm not sure how relevant it is. I'll
elaborate later.

> I think that starting a new implementation from scratch is the wrong
> way to approach this (rather big) topic.
>
> This in particular since an 'XML library' shouldn't just provide
> ways to de- and encode XML documents into generic tree structures, but
> instead needs to provide quite a substantional amount of functionality in
> order to be considered complete (even if you approach this in a modular
> way). As an example, imagine querying your DOM-like structure with an
> XPath expression. Think about all this does involve, from regular
> expression handling, over XPath pattern matching, http lookup, entity
> handling, unicode, etc., etc.
>
> This is why I don't think that you should think about such a project
> one step at a time (e.g. the 'XML reading side of things').

It seems to me that your earlier proposal was mainly about a few API
specifications that were then supposed to be implemented somehow, preferably
on top of an existing XML library, in order to avoid reinventing the wheel.

This idea certainly has a lot of merit, but it also has some distinct
disadvantages.

First, an API specification is nice for standardization, but not very usable
within the context of Boost. For it to be useful, there must be at least one
implementation of the API. Otherwise, the specification is worth nothing to
the end user.
This implementation must exist within Boost, i.e. it must be completely
contained within Boost. Libraries like Regex and Iostreams offer enhanced
functionality if certain external libraries are available, but they will work
without them, too. Obviously, the Xml library could not work without the
external XML implementation if it is just a wrapper around it.
This means that, if the library is a wrapper around an external one, the
external library (let's for argument's sake assume libxml2, which seems to
bring less licensing trouble compared to Xerces, the only other sufficiently
complete XML library I can think of) must be distributed with Boost.
What does this entail?
The library must build as part of Boost. I haven't checked, but I assume
libxml2's build system right now is based on automake. That would have to be
translated to Boost.Build. As part of this process, configuration macros
might need to be translated. This could easily lead to a real fork of the
code base.
Unless Boost wants to rely on the regression testing done by the authors of
libxml2, regression tests, portability tests and everything else must be
written and maintained.
And last but certainly not least, there's the licensing issue. Boost is
working hard to get all code under the Boost license. Would we want an
external library under any other license, no matter how permissive, in that
code base? Or would the authors of libxml2 permit relicensing of the source?
(As a programmer, I'd rather reimplement a library than pursue such goals. ;) )

Second, the recommendation focused on a DOM-style API. As at least two people
[1][2] pointed out, DOM-style APIs are not as universally useful as other
APIs.
That said, I do intend to provide a DOM-style API, but only after having
completed the event-based API and thought long and hard about what a
DOM-style API means in C++.
Still, this is one of the main reasons why I asked for real-world use cases.
My own uses of XML have usually been satisfied by SAX, although I would have
preferred a pull-style API. I'd love to hear how other people use XML.
I know that two Boost-internal uses could work with a pull API very well:
Property Tree's XML reader and the Serialization XML archive.
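
To make the difference between a SAX-style (push) and a pull-style API
concrete, here is a minimal sketch in C++. Every name in it (Event,
PullReader, parse_push, first_text_of) is made up for illustration and is
not part of any existing or proposed Boost interface. The toy tokenizer only
understands bare start tags, end tags and text; it is an assumption-laden
illustration, not a parser.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical event kinds, invented for this sketch.
enum class EventKind { StartElement, EndElement, Text, EndOfInput };

struct Event {
    EventKind kind;
    std::string value;  // element name, or text content
};

// Toy pull reader: recognizes only <name>, </name> and text.
// No attributes, entities, comments, or error reporting.
class PullReader {
public:
    explicit PullReader(std::string input) : input_(std::move(input)) {}

    // Pull style: the caller asks for the next event when it wants one.
    Event next() {
        if (pos_ >= input_.size()) return {EventKind::EndOfInput, ""};
        if (input_[pos_] == '<') {
            std::size_t close = input_.find('>', pos_);
            if (close == std::string::npos) close = input_.size();
            std::string tag = input_.substr(pos_ + 1, close - pos_ - 1);
            pos_ = close < input_.size() ? close + 1 : close;
            if (!tag.empty() && tag[0] == '/')
                return {EventKind::EndElement, tag.substr(1)};
            return {EventKind::StartElement, tag};
        }
        std::size_t open = input_.find('<', pos_);
        if (open == std::string::npos) open = input_.size();
        std::string text = input_.substr(pos_, open - pos_);
        pos_ = open;
        return {EventKind::Text, text};
    }

private:
    std::string input_;
    std::size_t pos_ = 0;
};

// Push (SAX-like) style layered on the pull reader: the parser owns the
// loop and calls back into user code for every event.
void parse_push(const std::string& input,
                const std::function<void(const Event&)>& handler) {
    PullReader reader(input);
    for (Event e = reader.next(); e.kind != EventKind::EndOfInput;
         e = reader.next())
        handler(e);
}

// Pull-style consumer: control flow stays with the caller, so it can
// stop reading as soon as it has found what it needs.
std::string first_text_of(const std::string& input, const std::string& name) {
    PullReader reader(input);
    for (Event e = reader.next(); e.kind != EventKind::EndOfInput;
         e = reader.next()) {
        if (e.kind == EventKind::StartElement && e.value == name) {
            Event body = reader.next();
            return body.kind == EventKind::Text ? body.value : std::string();
        }
    }
    return std::string();
}
```

Note that the push interface falls out of the pull interface in a few lines,
while going the other way would require the callback-driven parser to be
suspended between events. With push, a consumer like first_text_of would
have to keep its state in the handler object instead of in ordinary local
control flow.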

To sum up, I do believe we should reinvent the wheel here. But we should
create an improved wheel, and I think the Boost community is uniquely suited
to create a wheel that works particularly well with C++.

To maintain thread integrity, I'll reply to each post individually.

[1] http://lists.boost.org/Archives/boost/2005/11/96131.php
[2] http://lists.boost.org/Archives/boost/2005/11/96521.php

