Subject: Re: [boost] [stl_ext_adv] ] Interest in advanced data structures
From: Vadim Stadnik (vadimstdk_at_[hidden])
Date: 2011-12-08 03:31:56
This message is a detailed reply to the analysis and comments by Joaquín M
The most challenging request, which required quite significant work, is
concerning memory consumption and performance of augmented B+ trees. This
aspect is quite broad. It is covered in two new sections of the updated
1) http://dl.dropbox.com/u/51041088/stl_ext_adv_doc/doc/bp_trees.html , the
section Space requirement ;
2) http://dl.dropbox.com/u/51041088/stl_ext_adv_doc/doc/performance.html , the
section Effect of augmenting on performance of B+ trees ;
The last document with performance measurements has been modified quite
considerably. It now includes tests of all variants of proposed B+ trees
and boost::multi_index_container as multiset.
Please have a look at the updated documentation and my replies in the text
below and let me know if I missed anything significant.
On Sat, Dec 3, 2011 at 11:25 PM, Joaquin M Lopez Munoz <joaquin_at_[hidden]>wrote:
> Vadim Stadnik <vadimstdk <at> gmail.com> writes:
> > Hi all,
> > Advanced data structures (ADS) offer useful STL extensions that are not
> > possible or not efficient with basic data structures, such as arrays,
> > linked lists and red-black trees.
> > The project
> > https://github.com/vstadnik/stl_ext_adv
> > http://dl.dropbox.com/u/51041088/stl_ext_adv_doc/index.html
> > focuses mainly on a generalized and unified computer representation of
> > sequences using the augmented B+ trees. It provides the following STL
> > extensions [...]
> This is a rough study of the functionality provided by your lib in
> general and as compared with Boost.MultiIndex.
> * Container_bptree's internal data structure relies on B+
> trees with two variants based on the "augmented data structure"
> idea explained in Cormen, Leiserson, Rivest and Stein's "Algorithms"
> (an online explanation can be found at http://tinyurl.com/d8lze7v ,
> for instance.) Briefly put, such augmenting scheme boils down
> to adding additional information to the nodes updated on insertion
> time and erasure time which is based on the additional info of the
> sibling nodes. The variants are:
> - bp_tree_idx: augmented to allow for efficient indexing
> - bp_tree_idx_acc: doubly augmented to support efficient
> indexing plus a generalized "accumulate" notion usable for
> calculating summatories, mean values, etc.
> * Question #1: why augmenting on B+ trees? The augmenting idea
> can be as easily built on top of RB-trees, which are
> the weapon of choice for implementing associative containers.
> Unless proved otherwise, RB-trees are assumed to be faster than B+
> trees for in-memory structures (which your own tests hint at, cf.
> "insert N random elements" table.) You say:
> "The red black trees are yet out of the scope of this project
> for the reasons explained in the documentation."
> I've failed to find such explanation. Also, I suspect memory
> consumption is higher with B+ trees than RB trees.
> - Request #1: explain why B+ trees rather than RB trees.
Thats my fault, I should have added to the original message reference to
The section Advantages of B+ trees briefly covers the topic.
- Request #2: provide code for performance tests so that
> Boost.MultiIndex and oher data structures can be compared.
The newly added file perform_tests.hpp contains the code of test
algorithms. My tests and timer are Windows specific, I am planning to
replace the timer with one of Boost timers and update this code. I have
added performance measurements for Boost multi_index_container as mutiset
to the document mentioned at the top of this message.
> - Request #3: augment the performance tests so as to yield
> memory consumption figures.
This request is covered in two new sections of the updated documentation as
explained at the beginning of this message.
> * Usage scenario 1: Plain B+ trees as a replacement for RB tress
> for associative containers. My hunch is that RB-trees would
> perform better, as explained above.
I agree absolutely,
Replacing red black trees with the augmented B+ trees is only justified if
it is beneficial for a user algorithm, such as using a new extended
functionality or a new efficient algorithm.
> * Usage scenario 2: sequence-like containers with efficient
> insertion/erasure. Directly based on bp_tree_idx, with a
> do-nothing sorting critera. These containers are not truly
> random-access, but rather provide O(log n) access, which can
> be acceptable (or not) for some applications.
This is correct, one operation supporting random access iterators has
logarithmic cost in the worst case. The new sequences are a good choice
when a user algorithm requires efficient insert, erase operations, which
outperform std::vector as shown in performance tests. They can also replace
inefficient bidirectional iterators of std::list.
> * Usage scenario 3: associative container with random access
> iterators. This roughly compares with a multi_index_container
> equipped with two indices, one ordered, the other random-access.
> Basic differences between both:
> - The multi_index_container provides truly O(1) random access
> whereas Container_bptree is O(logN) (which is not random access
> strictly speaking.)
> - With the multi_index_container, sorting order and insertion
> order are separate --some sort of synchronization between both
> indices should be needed to mimic the Continer_bptree case.
> - Insertion/erasure with the multi_index_container is linear
> (though much faster than with std::vector), logarithmic with
> - The memory overhead introduced by the multi_index_container
> is two pointers vs. only one word for Container_bptree.
The Boost Multi-index library offers a very rich functionality. My
knowledge of this library is not yet sufficient to make specific
suggestions. Nevertheless, I think that this library can benefit from the
proposed B+ trees and containers as well as from the method of augmenting
applied to other data structures. I will be happy to help in this area.
> * Usage scenario 4: containers with fast accumulate, a direct
> application of the augmenting idea explained above. There is
> nothing in Boost that provides this functionality.
Thank you for confirming this, since in one of previous messages there was
information that logarithmic efficiency of the standard function
accumulate() is already supported by Boost library.
> My conclusions:
> 1. Augmented data structures (of which efficient indexing
> is just a particular application) are definitely useful and
> worth having in Boost.
> 2. Unless proved otherwise, we should be using RB trees
> (or maybe AVL) rather than B+ trees.
> 3. Depending on the particular needs, some usage scenarios
> can be better served with Boost.MultiIndex (note I'm
> saying "depending on needs", since differences between
> approaches are substantial.)
> 4 To me, rather than providing out of the box containers with
> augmenting we should provide this via two libraries:
> - As an additional capability of Boost.Intrusive associative
> containers (and AVL-tree based containers as well, if
> technically possible.)
> - As an additional capability of Boost.MultiIndex ordered
> Hope this helps,
> Joaquín M López Muñoz
> Telefónica Digital
Thank you for your analysis and conclusions.
I am happy for the Boost experts to decide how to use the proposed data
structures and containers. To start work I need a list of tasks and/or
requests with priorities.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk