From: Marcin Kalicinski (kalita_at_[hidden])
Date: 2008-03-12 20:14:43


I'll try to shed some light on what has happened to property_tree since the
review and why it's been dragging.

Since the review I have had some feedback about the library. The issue raised
most frequently was unsatisfactory performance. Ptree does too many memory
allocations, and thus is fairly slow and unsuitable for many users. Plus
some of the original parsers (the XML parser specifically) were really slow
and took ages to compile, especially on gcc. Because this is a header-only
library that is supposed to be lightweight, this was a serious problem.

To solve it I spent some months implementing a very fast in-situ XML parser
for the library (called rapidxml - see rapidxml project on sourceforge).
This was finished in August last year and integrated with property tree. It
works very well and is so fast that time to parse XML is now totally
insignificant compared to the time it takes to build a ptree from the parsed
data.

The allocation problem remains. I have been unable to come up with a scheme
that would reduce the number of allocations without compromising the
simplicity of the library. The key point is that I want to maintain the
validity of iterators in the presence of insertions/erases. This rules out
array-based containers. The best I can think of is a custom list
implementation. That has the potential to reduce the number of allocations by
roughly 30%, which is not enough IMO.
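
To illustrate the iterator-validity constraint, here is a minimal sketch using
only standard containers. It is not property_tree code, and both function names
are made up for the example. It shows that an iterator into a std::list keeps
pointing at the same element after further insertions, while a std::vector may
move its elements when it grows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>
#include <vector>

// Hypothetical helper: true if an iterator taken before many insertions
// still refers to the very same element afterwards.
bool list_iterator_survives_insertions() {
    std::list<std::string> children;
    children.push_back("first");
    std::list<std::string>::iterator it = children.begin();
    const std::string* address_before = &*it;

    for (int i = 0; i < 1000; ++i)
        children.push_back("more");
    children.push_front("new front");

    // std::list never relocates existing nodes on insertion.
    return &*it == address_before && *it == "first";
}

// Hypothetical helper: true if growing a vector past its capacity moved
// the first element to a different address.
bool vector_storage_moves_on_growth() {
    std::vector<std::string> children;
    children.push_back("first");
    // Keep the old address as an integer so no dangling pointer is used later.
    const std::uintptr_t address_before =
        reinterpret_cast<std::uintptr_t>(&children[0]);

    const std::size_t capacity_before = children.capacity();
    while (children.capacity() == capacity_before)
        children.push_back("more");  // the last push_back triggers a reallocation

    return reinterpret_cast<std::uintptr_t>(&children[0]) != address_before;
}
```

Any iterator, pointer or reference into the vector taken before the
reallocation would be invalid afterwards, which is why a node-based container
is needed if iterators are to stay valid.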

On top of that I'm not sure whether the heavy type-parametrization of the
library (i.e. lots of template parameters everywhere) is a good thing. It
definitely makes it harder to document, use and understand. In all the
feedback I got there is no evidence of anyone actually using these template
parameters. So IMO these should be reduced to just the character type.
Otherwise the library pretends to be a generic tree container, which it
wasn't supposed to be.
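
As a rough idea of what "just the character type" could look like, here is a
hypothetical sketch. The class simple_ptree, its members add/find/data/size and
the two typedefs are invented for illustration only; this is not the actual
property_tree interface.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical, simplified tree: the only template parameter is the
// character type. Keys and data are plain strings of that character type.
template <class Ch>
class simple_ptree {
public:
    typedef std::basic_string<Ch> string_type;

    simple_ptree() {}
    simple_ptree(const string_type& key, const string_type& data)
        : key_(key), data_(data) {}

    // Append a child and return a reference to it. Because children live in
    // a std::list, the reference stays valid across later insertions.
    simple_ptree& add(const string_type& key, const string_type& data) {
        children_.push_back(simple_ptree(key, data));
        return children_.back();
    }

    // Return the first direct child with the given key, or a null pointer.
    const simple_ptree* find(const string_type& key) const {
        typedef typename std::list<simple_ptree>::const_iterator iter;
        for (iter it = children_.begin(); it != children_.end(); ++it)
            if (it->key_ == key)
                return &*it;
        return 0;
    }

    const string_type& data() const { return data_; }
    std::size_t size() const { return children_.size(); }

private:
    string_type key_;
    string_type data_;
    std::list<simple_ptree> children_;  // node-based, keeps iterators valid
};

// The only two instantiations most users would be expected to need.
typedef simple_ptree<char> sptree;
typedef simple_ptree<wchar_t> wsptree;
```

With this shape, narrow and wide trees differ only in the typedef that is
used, e.g. sptree root; root.add("debug", "").add("level", "2");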

Finally, probably the main reason the library is dragging is docs. The
original docs (still available at
http://kaalus.atspace.com/ptree/doc/index.html) were written in a gargantuan
effort manually in HTML. This was a horrible experience that I do not wish
to repeat. Some sort of automated docs generation is needed. I know about
QuickBook and I like its wiki syntax. About a year ago I even managed to
set up the toolchain correctly (including doxygen). The output of the
[code->doxygen->quickbook->boostbook->xml->xslt->html] pipeline was not very
satisfactory though: the reference entries were badly formatted and
scattered all over the place in a disorganized fashion.

I wouldn't like to give up the library because I still work on it
occasionally. But as I wasn't able to solve major problems (docs tools,
performance) satisfactorily, the work towards the release has ground to a
halt.

IMO the issues that should be addressed before release are:
- Simplification by removing some of the template arguments (arguable -
maybe someone is using that?)
- Figure out which parsers are unnecessary and get rid of them (cmdline
parser?)
- Get some docs toolchain to work properly and provide up-to-date docs. Much
of the text can be ported over (and my monkey English corrected :-) from the
original docs.
- Speedup through use of a custom container (to replace std::list; needs
some profiling work to see if it's even worth it)

If someone could help with the docs toolchain (Quickbook experts? Any
alternatives?) I'm happy to do the rest. I should probably give up on trying
to speed it up further and concentrate on getting it into a releasable state.

thanks,
Marcin

