
Subject: Re: [boost] [ANN] libstudxml - modern XML API for C++
From: Dominique Devienne (ddevienne_at_[hidden])
Date: 2014-05-21 14:37:33


On Wed, May 21, 2014 at 7:57 PM, Boris Kolpackov
<boris_at_[hidden]> wrote:
>> In fact, I believe such an API should be robust enough to be able to
>> wrap different backends, rather than depending on a particular
>> implementation choice.
>
> I don't think it will be robust. I think it will be awful and inconvenient.
> Try to adapt straight SAX API to anything other than callback-based with
> inversion of control (i.e., SAX again).

SAX is not that bad, once you have a layer on top to push/pop handlers
for various XML elements. That's the technique the Java Ant build tool
used (on top of the Java SAX APIs), and I've adapted the same
technique on top of Qt's SAX API, before Qt's pull-parser came along.
Basically each function of a recursive descent parser is replaced by a
handler instance, and the C/C++ function stack is replaced by an
explicit stack. But that's beside the point; I also agree with you
that a pull-parser is much nicer to program against, and the DOM-like
APIs can easily be layered on top of those.

But it's actually harder than it looks to properly implement a
standards-compliant XML parser that correctly deals with DTDs, character
and system entities, encodings, namespaces, whitespace normalization,
default attributes from inline or out-of-line DTDs, etc., etc. That
you base your library on the long-established Expat parser from James
Clark, one of the world's foremost XML experts, is probably a good thing,
although the fact that it hasn't seen a release since 2007 is a bit
worrying (and the license might indeed be an issue).

Many people don't care about these XML "details", but any library
worthy of Boost that wants to be a foundational building block (in
Niall's terms) at the bottom of a Boost/C++ XML ecosystem should strive
for full conformance IMHO, or at least provide all the low-level tools
needed for another library on top to be conformant.

Some apps want very low-level knowledge of the structure of an XML
document, including all the otherwise irrelevant whitespace, processing
instructions, character entities, etc. (something even the XML
standards don't necessarily allow), while others don't care and want
only the XML *InfoSet* as specified in the XPath/XSL standards.

Then there's also schema-aware processing, which associates XSD types
with elements, validating parsers (DTDs or XSDs), etc.

It sounds like your library targets the lower-level parsing part, but
even that is non-trivial, and few of the many XML libraries out there
are truly conformant. So hopefully you're aware of all this, and will
explicitly document your conformance level, or lack thereof.

My $0.02. --DD

