Subject: Re: [boost] [ANN] libstudxml - modern XML API for C++
From: Boris Kolpackov (boris_at_[hidden])
Date: 2014-05-21 13:57:49


Hi Stefan,

In gmane.comp.lib.boost.devel you write:

> Does it support a DOM-like API, i.e. an in-memory representation of the
> document ?

No, it does not. I spent quite a bit of time on the in-memory vs
streaming debate in my talk. How I wish the video was already
available...

Until then, to summarize the key points:

* Most people think they need DOM. I believe this is not because in-memory
  is conceptually better but because existing streaming APIs (like SAX)
  are really awful and inconvenient. So I tried to convince the audience
  that a well-designed streaming pull API is actually sufficient for the
  majority of cases. I didn't hear many objections.

  Take a look at the API Introduction[1], it shows how to handle everything
  from converters/filters that don't care about the data, to applications
  that process the data without creating any kind of in-memory object
  model, to C++ classes that know how to persist themselves in XML.

* On that last point (C++ class persistence) a lot of applications
  extract XML data into some kind of object model (C++ classes that
  correspond to the XML vocabulary). Creating an intermediate
  representation of XML (DOM) just to throw it away moments later
  seems kind of pointless.

* Of course there will always be applications that need to revisit
  the bulk of raw XML data and for them in-memory would probably
  always be a better choice.

* Which brings us to this point: it is easy to go from streaming to
  in-memory but not the other way around.

* In fact, an even better approach would be to support hybrid, partially
  streaming/partially in-memory parsing and serialization (also discussed
  in the talk). Fully in-memory would then simply be a special case.

* libstudxml has the ‘hybrid’ example which shows how to implement this
  hybrid approach. You would be shocked how short and simple the code
  is (I know I was once I wrote it ;-)).

[1] http://www.codesynthesis.com/projects/libstudxml/doc/intro.xhtml#2
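
To illustrate the "easy to go from streaming to in-memory" point, here is a
minimal sketch. It does not use libstudxml's actual API: the event, pull_source,
and node types are hypothetical stand-ins, and the "parser" just hands out
pre-tokenized events. The idea is only that a pull loop plus recursion is
enough to build a tree.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event model (an assumption, not libstudxml's types).
enum class event_type { start_element, end_element, characters, eof };

struct event {
  event_type type;
  std::string value; // element name or character data
};

// Toy pull source: the caller asks for the next event when it wants one.
class pull_source {
public:
  explicit pull_source(std::vector<event> events)
      : events_(std::move(events)) {}

  event next() {
    if (pos_ < events_.size()) return events_[pos_++];
    return {event_type::eof, ""};
  }

private:
  std::vector<event> events_;
  std::size_t pos_ = 0;
};

// Hypothetical in-memory node (a DOM-like tree, much simplified).
struct node {
  std::string name;
  std::string text;
  std::vector<std::unique_ptr<node>> children;
};

// Build the subtree for an element whose start event was just consumed.
std::unique_ptr<node> build(pull_source& src, const std::string& name) {
  std::unique_ptr<node> n(new node);
  n->name = name;
  for (;;) {
    event e = src.next();
    switch (e.type) {
      case event_type::start_element:
        n->children.push_back(build(src, e.value)); // recurse into child
        break;
      case event_type::characters:
        n->text += e.value;
        break;
      case event_type::end_element:
        assert(e.value == name); // events are expected to be well nested
        return n;
      case event_type::eof:
        assert(false && "unexpected end of input");
        return n;
    }
  }
}

// Fully in-memory is the special case of building from the root element.
std::unique_ptr<node> build_document(pull_source& src) {
  event e = src.next();
  assert(e.type == event_type::start_element);
  return build(src, e.value);
}
```

Under the same assumptions, calling build() only for selected elements while
streaming past the rest would give the hybrid arrangement, with
build_document() as the fully in-memory special case.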

> I have always strongly argued against the idea that an "XML API" was
> only about parsing XML data, as there are many useful features that
> involve manipulation of XML data (including transformations between
> documents, xpath-based search, etc.).

You need to start somewhere. And support for (relatively) low-level XML
parsing and serialization seems like a good place.

> In fact, I believe such an API should be robust enough to be able to
> wrap different backends, rather than depending on a particular
> implementation choice.

I don't think it will be robust. I think it will be awful and inconvenient.
Try to adapt a straight SAX API to anything other than callback-based with
inversion of control (i.e., SAX again).
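
A rough sketch of what such an adaptation tends to look like, assuming a
backend that can only push callbacks and cannot be suspended. The sax_handler
interface, push_parse(), and buffering_pull_adapter are all hypothetical names
made up for this illustration. Since the push parser only returns when it is
done, the adapter has to buffer every event before the first one can be pulled:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

enum class event_type { start_element, end_element, characters, eof };

struct event {
  event_type type;
  std::string value; // element name or character data
};

// Hypothetical SAX-style interface: the parser is in control and calls us.
struct sax_handler {
  virtual ~sax_handler() {}
  virtual void start_element(const std::string& name) = 0;
  virtual void end_element(const std::string& name) = 0;
  virtual void characters(const std::string& data) = 0;
};

// Stand-in for a push parser: fires the callbacks for <a><b>x</b></a>
// and returns only after the whole "document" has been processed.
void push_parse(sax_handler& h) {
  h.start_element("a");
  h.start_element("b");
  h.characters("x");
  h.end_element("b");
  h.end_element("a");
}

// Pull facade over the push parser. Without threads, coroutines, or a
// suspendable backend, the only option is to record everything up front.
class buffering_pull_adapter : private sax_handler {
public:
  buffering_pull_adapter() {
    push_parse(*this);       // runs to completion, filling the queue
    assert(!queue_.empty()); // the toy document yields at least one event
  }

  // Number of events currently held in memory.
  std::size_t buffered() const { return queue_.size(); }

  event next() {
    if (queue_.empty()) return {event_type::eof, ""};
    event e = queue_.front();
    queue_.pop_front();
    return e;
  }

private:
  void start_element(const std::string& name) override {
    queue_.push_back(event{event_type::start_element, name});
  }
  void end_element(const std::string& name) override {
    queue_.push_back(event{event_type::end_element, name});
  }
  void characters(const std::string& data) override {
    queue_.push_back(event{event_type::characters, data});
  }

  std::deque<event> queue_;
};
```

In this sketch the adapter ends up holding the entire event stream in memory,
which is the opposite of what a streaming API is supposed to give you.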

Boris

