Boost logo

Boost Users :

From: Steve Horne (steve_at_[hidden])
Date: 2003-09-16 17:16:32


At 16:11 16/09/2003 -0400, you wrote:
>Steve Horne <steve_at_[hidden]> writes:
>
> > The main motivations in writing the C++ library were basically that
> > the standard library containers are too fragile, lack useful
> > functionality that cannot reasonably be retrofitted, and probably
> > (due to the red-black trees assumption of a constant-time memory
> > access model which is decades out of date and ignores the everyday
> > facts of caching and virtual memory) unnecessarily slow.
>
>Fascinating. Submitting these to Boost, perchance?

That sounds like an uphill struggle to me. My choosing to opt out of using
the STL containers myself is one thing, but submitting an incompatible
alternative to the standard library containers for widespread use is
something very different.

> > Using my containers, object lifetime and validity is already managed
> > even in C++, so I really don't need to worry about it in Python.
>
>I guess I'll have to trust you on that one. It's hard to see how you
>can accomplish your goals of optimizing speed and safety at the same
>time.

Of course there are trade-offs. I gain speed in some circumstances by my
choice of data structure. I lose speed due to the management and in order
to maintain auxiliary data within that structure which is used to provide
the extra features. And of course in some circumstances the standard
containers will be faster even if that management and auxiliary data is
disregarded.

There's also nothing revolutionary about the data structure I'm using - it
is essentially a very old and heavily tried-and-tested data structure. It's
just a variation of the age-old multiway trees, but stored in main memory
instead of on disk. In an age where caching and virtual memory are an
everyday fact of life on desktop PCs, I figure that main memory accesses
have a lot in common with disk accesses - though of course this would not
be true with most embedded systems.

By using a multiway tree rather than a binary tree, you lose the ability to
restructure the tree with only pointer rotations (you need to physically
move items on insertions and deletions) but you gain in cache and virtual
memory friendliness. Reads and simple writes are fast, and on the
assumption that most data gets read several times after being inserted
once, that can be a good tradeoff. And unlike an std::vector, the number of
items that need to be shifted is strictly limited by the size of the node.

I've gone with essentially what I believe are B+ trees (I've found it
difficult to get good definitions of the difference between B trees and B+
trees). That is, data is only held in 'leaf' nodes at the bottom layer of
the tree. It's also convenient to have prev and next links in the leaf
nodes for easy iteration, and for each leaf node to have a pointer to a
singly-linked chain of iterators (meaning that maintenance operations only
have to deal with the normally small number of iterators that refer into
one or two leaf nodes).

Add a count of the number of items in the subtree to each branch node and
subscripted access (while certainly not std::vector-fast) is also
reasonable. And because these nodes are quite large, the overheads are
quite small.

The convenience and safety benefits are always there, but speed benefits
are much more dubious. It's very easy to find cases where the standard
library containers will be much faster - if the items being held are large
and complex so that shifting the items within a node is slow, for instance.

Anyway, my code has several problems. Not least (1) it belongs to my
employer, and (2) it wasn't written to fit in with existing C++ libraries.
I'd love to think that the idea behind them might be adopted more widely,
but I suspect it's more of a niche thing - and maybe that niche is just me.


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net