Subject: Re: [boost] [ANN] libstudxml - modern XML API for C++
From: Stefan Seefeld (stefan_at_[hidden])
Date: 2014-05-21 14:17:24


On 2014-05-21 13:57, Boris Kolpackov wrote:
> Hi Stefan,
>
> In gmane.comp.lib.boost.devel you write:
>
>> Does it support a DOM-like API, i.e. an in-memory representation of the
>> document?
> No, it does not. I spent quite a bit of time on the in-memory vs
> streaming debate in my talk. How I wish the video was already
> available...

Let me know when it is, I'm looking forward to hearing your arguments. :-)

> Until then, to summarize the key points:
>
> * Most people think they need DOM. I believe it is not because in-memory
> is conceptually better but because of the really awful and inconvenient
> streaming APIs (like SAX). So I tried to convince the audience that a
> well designed streaming pull API is actually sufficient for the majority
> of cases. I didn't hear many objections.
>
> Take a look at the API Introduction[1], it shows how to handle everything
> from converters/filters that don't care about the data, to applications
> that process the data without creating any kind of in-memory object
> model, to C++ classes that know how to persist themselves in XML.
>
> * On that last point (C++ class persistence) a lot of applications
> extract XML data into some kind of object model (C++ classes that
> correspond to the XML vocabulary). Creating an intermediate
> representation of XML (DOM) just to throw it away moments later
> seems kind of pointless.
>
> * Of course there will always be applications that need to revisit
> the bulk of raw XML data and for them in-memory would probably
> always be a better choice.

Right. I can agree with you that a good API over SAX (or reader) could
be better than DOM in certain cases, but not all. Just think of someone
wanting to write an XML editor (e.g., to edit XHTML or DocBook
documents), with support for standard XML features such as xinclude,
xpath-based search, perhaps even xslt-based transformations.

Again, I'm definitely not suggesting everyone needs those features, but
there has to be a place where these can be added in Boost.XML.

> * Which brings us to this point: it is easy to go from streaming to
> in-memory but not the other way around.

Yes, of course, a DOM API can be implemented on top of a streaming API.
But you are heading down the road of yet another XML implementation,
which I strongly object to. I'm not against anyone re-implementing an
XML library. But as I said, I don't think Boost.XML should mandate a new
implementation with so many existing choices. There just is no point in
such an exercise, other than self-education.
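
To illustrate the layering itself (not any particular library), here is a
minimal sketch of building an in-memory tree from a stream of parser events.
The Event, Node and build_tree names are hypothetical and invented for this
example; a real pull parser would deliver events one at a time rather than in
a vector, and would also report attributes, namespaces, and so on.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parser events; not taken from libstudxml or any other library.
enum class EventKind { StartElement, EndElement, Characters };

struct Event {
    EventKind kind;
    std::string text;  // element name, or character data
};

// Minimal in-memory node: a name, accumulated text, and child elements.
struct Node {
    std::string name;
    std::string text;
    std::vector<std::unique_ptr<Node>> children;
};

// Build a tree from a well-formed event sequence. The stack of open
// elements is the only state the builder has to keep.
std::unique_ptr<Node> build_tree(const std::vector<Event>& events) {
    std::unique_ptr<Node> root;
    std::vector<Node*> open;  // currently open elements, innermost last

    for (const Event& ev : events) {
        switch (ev.kind) {
        case EventKind::StartElement: {
            auto node = std::make_unique<Node>();
            node->name = ev.text;
            Node* raw = node.get();
            if (open.empty()) {
                assert(!root && "only one root element expected");
                root = std::move(node);
            } else {
                open.back()->children.push_back(std::move(node));
            }
            open.push_back(raw);
            break;
        }
        case EventKind::Characters:
            assert(!open.empty());
            open.back()->text += ev.text;
            break;
        case EventKind::EndElement:
            assert(!open.empty());
            open.pop_back();
            break;
        }
    }
    assert(open.empty() && "every element must be closed");
    return root;
}
```

The builder only consumes events, so it does not care which parser produced
them.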

> * In fact, an even better approach would be to support hybrid, partially
> streaming/partially in-memory parsing and serialization (also discussed
> in the talk). Then, the fully in-memory would simply be a special case.
>
> * libstudxml has the 'hybrid' example which shows how to implement this
> hybrid approach. You would be shocked how short and simple the code
> is (I know I was once I wrote it ;-)).
>
> [1] http://www.codesynthesis.com/projects/libstudxml/doc/intro.xhtml#2

Again, I'm resisting getting dragged into a discussion about
implementation A vs. implementation B. I don't want to argue about that.
I'm arguing for a Boost.XML API that supports multiple choices of
backends. This is mostly a maintainability question. XML is a complex
standard, with occasional updates and new feature additions. Just adding
a few new wrappers around existing implementations is far easier than
having to re-implement things just because of a bad design decision when
Boost.XML first came into being...
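
A rough sketch of what I mean by separating the API from the backend follows.
Everything in it is hypothetical: XmlBackend, ToyBackend and Document are names
made up for this example, not part of any Boost.XML proposal, and ToyBackend
is a naive stand-in for a wrapper around a real implementation (it does not
handle comments or DOCTYPE content containing '<').

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical backend interface: the only thing the public API depends on.
class XmlBackend {
public:
    virtual ~XmlBackend() = default;
    virtual std::string name() const = 0;
    // Return the name of the document's root element, or "" if none is found.
    virtual std::string root_element(const std::string& xml) const = 0;
};

// Toy stand-in for a wrapper around an existing XML implementation.
class ToyBackend final : public XmlBackend {
public:
    std::string name() const override { return "toy"; }

    std::string root_element(const std::string& xml) const override {
        std::size_t pos = 0;
        while ((pos = xml.find('<', pos)) != std::string::npos) {
            ++pos;
            // Skip the XML declaration, processing instructions, markup
            // declarations, and closing tags.
            if (pos < xml.size() && xml[pos] != '?' && xml[pos] != '!' &&
                xml[pos] != '/') {
                std::size_t end = xml.find_first_of(" \t\r\n/>", pos);
                return xml.substr(
                    pos, end == std::string::npos ? end : end - pos);
            }
        }
        return std::string();
    }
};

// User-facing class: written once, independent of the backend behind it.
class Document {
public:
    Document(std::shared_ptr<const XmlBackend> backend, std::string xml)
        : backend_(std::move(backend)), xml_(std::move(xml)) {
        assert(backend_ && "a backend is required");
    }

    std::string root_name() const { return backend_->root_element(xml_); }
    std::string backend_name() const { return backend_->name(); }

private:
    std::shared_ptr<const XmlBackend> backend_;
    std::string xml_;
};
```

Supporting another implementation then means adding one more class derived
from XmlBackend; Document and the code using it stay unchanged.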

>> I have always strongly argued against the idea that an "XML API" was
>> only about parsing XML data, as there are many useful features that
>> involve manipulation of XML data (including transformations between
>> documents, xpath-based search, etc.).
> You need to start somewhere. And support for (relatively) low-level XML
> parsing and serialization seems like a good place.

>> In fact, I believe such an API should be robust enough to be able to
>> wrap different backends, rather than depending on a particular
>> implementation choice.
> I don't think it will be robust. I think it will be awful and inconvenient.
> Try to adapt straight SAX API to anything other than callback-based with
> inversion of control (i.e., SAX again).

Have you looked at existing XML libraries before you started libstudxml?
Did you know about Boost.XML, or Arabica?
Anyhow, I'm not trying to convince you that you should change anything.
I'm trying to show you what a thin wrapper can look like.

Stefan

-- 
      ...ich hab' noch einen Koffer in Berlin...
