Boost logo

Boost :

Subject: Re: [boost] Seeking feedback on SegmentedTreeSeq (sequence data structure with efficient random access insert and erase)
From: Chris Clearwater (chris_at_[hidden])
Date: 2015-11-09 18:01:47


On 11/09/2015 10:14 AM, Phil Endecott wrote:
> I refer to the feedback that I gave last time you posted:
>
> http://thread.gmane.org/gmane.comp.lib.boost.devel/263167

Thanks for your feedback Phil. Sorry I did not properly respond in the
original thread.

> I think this could be most useful if it were decomposed into
> several layers:
>
> - An in-memory B+-Tree
> - An augmented tree
> - An order-statistics tree
>
> That way, we could have in-memory B+-trees the provide a std::map-like
> interface, which would surely be useful, and we could have O(log N)
> random access insert/erase on a binary tree where that is appropriate,
> and we could have other kinds of augmented trees such as interval trees.
>
> Working out exactly what these layers should look like is not easy,
> but I think it's worth doing in order to broaden the applicability.

My library implements a counted B+Tree. I see this as the basis to which
other indexes can be added. In particular I would like to add support
for maps and sets as this would provide a high performance alternative
to the standard containers for trivially-copyable keys. I see no reason
to make the count index optional. The memory overhead of storing the
count index for a set where the key is 64 bits is less than a tenth of a
percent. This is because the counts stored in level 1 of the tree are
used to encode the length of segments, which maps/sets need to know as
well as sequences. I have also benchmarked the CPU overhead of
maintaining the count index and found it to be negligible. On top of
this, it would dramtically complicate implementation for the count index
to be optional as I would still need to store the segment lengths
somewhere and implement iterator operations less efficiently when it was
not present. Keeping the index around would also mean that maps/sets
would recieve efficient access by position for practically free.

That being said, I think the data structure is useful today and should
be a part of Boost. Pouring over the mailing list I see that other
people have proposed and even implemented sequences with logarithmic
random access operations many times, so clearly there is a need for such
a thing in the community. The reaon I am posting again to the list is
because I have worked on reference documentation, boost documentation,
testing, and polished the implementation. This addresses some of the
comments from my last post so it's worth taking another look at.

Regards, Chris.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk