Subject: Re: [boost] [GSoC] Boost.XML
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2010-03-20 19:00:01


Ilie Halip wrote:
> I have a few questions about the Boost.XML project.
>
> First, what actually needs to be done?

Shall we have another thread about what a good C++ XML library would
look like? It's been a while since the last one...

I have done a couple of projects using rapidxml, and until recently my
feeling was that it was close to the best design. If you're not
familiar with it, it holds the XML in memory (e.g. as a memory-mapped
file) and does a single-pass parse that builds up a tree that points
into the original XML for the strings. This is fast and reasonably
memory-efficient.

However, recently I needed something that used less memory. I wanted
to process a very large file without ever having all of it in memory
(imagine e.g. loading a database). So I wrote something where the
element and attribute iterators (etc.) are just pointers into the
(memory-mapped) XML source. When an iterator is incremented it steps
through the source looking (textually) for the start of the next
element or attribute (etc.). The result is something that uses almost
no memory and is fast for the sorts of access pattern that I needed.
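
To make the idea concrete, here is a minimal sketch of such a cursor. It is
not my actual code: the names element_cursor and element_names are invented
for illustration, the source is a std::string_view rather than a real
memory-mapped file, and it assumes well-formed input with no CDATA sections
and no '<' inside attribute values.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical cursor: a view of the XML text plus one offset.
// No tree is built; all state is the current position.
class element_cursor {
public:
    explicit element_cursor(std::string_view src) : src_(src), pos_(0) { seek(); }

    bool at_end() const { return pos_ == std::string_view::npos; }

    // Name of the element whose start tag the cursor is on.
    std::string_view name() const {
        std::size_t b = pos_ + 1, e = b;
        while (e < src_.size() && !is_name_end(src_[e])) ++e;
        return src_.substr(b, e - b);
    }

    // Step to the next start tag in document order by scanning the text.
    void next() {
        if (at_end()) return;
        ++pos_;
        seek();
    }

private:
    static bool is_name_end(char c) {
        return c == '>' || c == '/' || std::isspace(static_cast<unsigned char>(c));
    }

    // Advance pos_ to the next '<' that opens an element, skipping
    // comments, end tags, declarations and processing instructions.
    void seek() {
        while ((pos_ = src_.find('<', pos_)) != std::string_view::npos) {
            if (src_.substr(pos_, 4) == "<!--") {
                std::size_t end = src_.find("-->", pos_ + 4);
                if (end == std::string_view::npos) { pos_ = end; return; }
                pos_ = end + 3;
                continue;
            }
            if (pos_ + 1 < src_.size()) {
                char c = src_[pos_ + 1];
                if (c != '/' && c != '!' && c != '?') return;
            }
            ++pos_;
        }
    }

    std::string_view src_;
    std::size_t pos_;
};

// Convenience helper for the sketch: collect every element name in order.
std::vector<std::string> element_names(std::string_view src) {
    std::vector<std::string> out;
    for (element_cursor c(src); !c.at_end(); c.next())
        out.emplace_back(c.name());
    return out;
}
```

The cursor itself is two words of state, which is where the "almost no
memory" comes from. The cost is that every step rescans text, so repeated or
random access is slower than with a pre-built tree.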

An interesting observation is that a rapidxml-like method and my new
method could have very similar interfaces, albeit with different
complexity for the same operations (cf. std::vector vs. std::list). So
it is interesting to consider whether something like an XPath engine
could be written once against that shared interface and then used with
multiple back-end "XML containers".
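
A rough sketch of what "same interface, different complexity" could mean in
code. Everything here is hypothetical (indexed_xml, scanning_xml, count_named,
and the two helper functions); neither class is rapidxml or my own library.
The scan is deliberately naive: it assumes no comments, no CDATA and no '<'
inside attribute values.

```cpp
#include <cstddef>
#include <iterator>
#include <string_view>
#include <vector>

// Offset of the next start tag at or after `from`, or npos if none.
inline std::size_t next_start_tag(std::string_view s, std::size_t from) {
    for (std::size_t p = s.find('<', from); p != std::string_view::npos;
         p = s.find('<', p + 1)) {
        if (p + 1 < s.size() && s[p + 1] != '/' && s[p + 1] != '!' && s[p + 1] != '?')
            return p;
    }
    return std::string_view::npos;
}

// Element name of the start tag at offset p; points into the source text.
inline std::string_view tag_name(std::string_view s, std::size_t p) {
    std::size_t e = s.find_first_of(" \t\r\n/>", p + 1);
    return s.substr(p + 1, e == std::string_view::npos ? e : e - (p + 1));
}

// Back end 1: one up-front pass records every element, so later
// iteration is cheap but memory grows with the document.
class indexed_xml {
public:
    explicit indexed_xml(std::string_view s) {
        for (std::size_t p = next_start_tag(s, 0); p != std::string_view::npos;
             p = next_start_tag(s, p + 1))
            names_.push_back(tag_name(s, p));
    }
    auto begin() const { return names_.begin(); }
    auto end() const { return names_.end(); }
private:
    std::vector<std::string_view> names_;
};

// Back end 2: stores nothing; each increment rescans the source text.
class scanning_xml {
public:
    class iterator {
    public:
        using iterator_category = std::input_iterator_tag;
        using value_type = std::string_view;
        using difference_type = std::ptrdiff_t;
        using pointer = const std::string_view*;
        using reference = std::string_view;
        iterator(std::string_view s, std::size_t p) : s_(s), p_(p) {}
        std::string_view operator*() const { return tag_name(s_, p_); }
        iterator& operator++() { p_ = next_start_tag(s_, p_ + 1); return *this; }
        bool operator==(const iterator& o) const { return p_ == o.p_; }
        bool operator!=(const iterator& o) const { return p_ != o.p_; }
    private:
        std::string_view s_;
        std::size_t p_;
    };
    explicit scanning_xml(std::string_view s) : s_(s) {}
    iterator begin() const { return iterator(s_, next_start_tag(s_, 0)); }
    iterator end() const { return iterator(s_, std::string_view::npos); }
private:
    std::string_view s_;
};

// A "query" written once against the shared begin()/end() interface.
template <class XmlContainer>
std::size_t count_named(const XmlContainer& xml, std::string_view name) {
    std::size_t n = 0;
    for (std::string_view e : xml)
        if (e == name) ++n;
    return n;
}
```

count_named neither knows nor cares which back end it is given; only the
cost of each increment differs. A real design would also need child, sibling
and attribute navigation, which this sketch leaves out.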

In fact, something "XPath-like" but also more "C++-like" would be the
next step to improve the "user" code in my application. Currently I
have too much verbose iteration looking for the elements that I want.
It would be great to have an XPath-like DSL for finding these elements.
(An application for Proto?)
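
As a very rough idea of the surface syntax, plain operator overloading
already gets part of the way. This is only a toy sketch with invented names
(path, root, matches); it does not use Proto, and it only handles exact
child paths, with no predicates, wildcards or attributes.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical path expression: an ordered list of element names.
struct path {
    std::vector<std::string> steps;
};

// path / "name" appends one child step, echoing XPath's a/b/c syntax.
inline path operator/(path p, std::string step) {
    p.steps.push_back(std::move(step));
    return p;
}

// An empty path to start expressions from, e.g. root / "db" / "row".
inline const path root{};

// True if the chain of element names from the document root down to the
// current element is exactly the path.
inline bool matches(const path& p, const std::vector<std::string>& ancestry) {
    return p.steps == ancestry;
}
```

User code could then write something like root / "db" / "row" and hand the
result to whichever back end is doing the iteration, instead of spelling out
nested loops by hand.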

Regards, Phil.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk