Subject: Re: [boost] [ANN] libstudxml - modern XML API for C++
From: Boris Kolpackov (boris_at_[hidden])
Date: 2014-05-21 13:57:49
In gmane.comp.lib.boost.devel you write:
> Does it support a DOM-like API, i.e. an in-memory representation of the
> document ?
No, it does not. I spent quite a bit of time on the in-memory vs.
streaming debate in my talk. How I wish the video were already
available! Until then, to summarize the key points:
* Most people think they need DOM. I believe this is not because in-memory
is conceptually better but because the existing streaming APIs (like SAX)
are really awful and inconvenient. So I tried to convince the audience
that a well-designed streaming pull API is actually sufficient for the
majority of cases. I didn't hear many objections.
Take a look at the API Introduction; it shows how to handle everything
from converters/filters that don't care about the data, to applications
that process the data without creating any kind of in-memory object
model, to C++ classes that know how to persist themselves in XML.
* On that last point (C++ class persistence) a lot of applications
extract XML data into some kind of object model (C++ classes that
correspond to the XML vocabulary). Creating an intermediate
representation of XML (DOM) just to throw it away moments later
seems kind of pointless.
* Of course there will always be applications that need to revisit
the bulk of raw XML data and for them in-memory would probably
always be a better choice.
* Which brings us to this point: it is easy to go from streaming to
in-memory but not the other way around.
* In fact, an even better approach would be to support hybrid, partially
streaming/partially in-memory parsing and serialization (also discussed
in the talk). Then fully in-memory processing would simply be a special case.
* libstudxml has the "hybrid" example, which shows how to implement this
hybrid approach. You would be shocked how short and simple the code
is (I know I was, once I wrote it ;-)).
> I have always strongly argued against the idea that an "XML API" was
> only about parsing XML data, as there are many useful features that
> involve manipulation of XML data (including transformations between
> documents, xpath-based search, etc.).
You need to start somewhere. And support for (relatively) low-level XML
parsing and serialization seems like a good place.
> In fact, I believe such an API should be robust enough to be able to
> wrap different backends, rather than depending on a particular
> implementation choice.
I don't think it would be robust. I think it would be awful and inconvenient.
Try adapting a straight SAX API to anything other than a callback-based
interface with inversion of control (i.e., SAX again).