Boost logo

Boost :

Subject: Re: [boost] [stl_ext_adv] ] Interest in advanced data structures
From: Joaquin M Lopez Munoz (joaquin_at_[hidden])
Date: 2011-12-03 07:25:58


Vadim Stadnik <vadimstdk <at> gmail.com> writes:

>
> Hi all,
>
> Advanced data structures (ADS) offer useful STL extensions that are not
> possible or not efficient with basic data structures, such as arrays,
> linked lists and red-black trees.
>
> The project
>
> https://github.com/vstadnik/stl_ext_adv
>
> http://dl.dropbox.com/u/51041088/stl_ext_adv_doc/index.html
>
> focuses mainly on a generalized and unified computer representation of
> sequences using the augmented B+ trees. It provides the following STL
> extensions [...]

Hi,

This is a rough study of the functionality provided by your lib in
general and as compared with Boost.MultiIndex.

* Container_bptree's internal data structure relies on B+
trees with two variants based on the "augmented data structure"
idea explained in Cormen, Leiserson, Rivest and Stein's "Algorithms"
(an online explanation can be found at http://tinyurl.com/d8lze7v ,
for instance.) Briefly put, such augmenting scheme boils down
to adding additional information to the nodes updated on insertion
time and erasure time which is based on the additional info of the
sibling nodes. The variants are:
  - bp_tree_idx: augmented to allow for efficient indexing
  - bp_tree_idx_acc: doubly augmented to support efficient
  indexing plus a generalized "accumulate" notion usable for
  calculating summatories, mean values, etc.

* Question #1: why augmenting on B+ trees? The augmenting idea
can be as easily built on top of RB-trees, which are
the weapon of choice for implementing associative containers.
Unless proved otherwise, RB-trees are assumed to be faster than B+
trees for in-memory structures (which your own tests hint at, cf.
"insert N random elements" table.) You say:

  "The red black trees are yet out of the scope of this project
  for the reasons explained in the documentation."

I've failed to find such explanation. Also, I suspect memory
consumption is higher with B+ trees than RB trees.
  - Request #1: explain why B+ trees rather than RB trees.
  - Request #2: provide code for performance tests so that
  Boost.MultiIndex and oher data structures can be compared.
  - Request #3: augment the performance tests so as to yield
  memory consumption figures.

* Usage scenario 1: Plain B+ trees as a replacement for RB tress
for associative containers. My hunch is that RB-trees would
perform better, as explained above.

* Usage scenario 2: sequence-like containers with efficient
insertion/erasure. Directly based on bp_tree_idx, with a
do-nothing sorting critera. These containers are not truly
random-access, but rather provide O(log n) access, which can
be acceptable (or not) for some applications.

* Usage scenario 3: associative container with random access
iterators. This roughly compares with a multi_index_container
equipped with two indices, one ordered, the other random-access.
Basic differences between both:
  - The multi_index_container provides truly O(1) random access
  whereas Container_bptree is O(logN) (which is not random access
  strictly speaking.)
  - With the multi_index_container, sorting order and insertion
  order are separate --some sort of synchronization between both
  indices should be needed to mimic the Continer_bptree case.
  - Insertion/erasure with the multi_index_container is linear
  (though much faster than with std::vector), logarithmic with
  Container_bptree.
  - The memory overhead introduced by the multi_index_container
  is two pointers vs. only one word for Container_bptree.

* Usage scenario 4: containers with fast accumulate, a direct
application of the augmenting idea explained above. There is
nothing in Boost that provides this functionality.

My conclusions:

1. Augmented data structures (of which efficient indexing
is just a particular application) are definitely useful and
worth having in Boost.
2. Unless proved otherwise, we should be using RB trees
(or maybe AVL) rather than B+ trees.
3. Depending on the particular needs, some usage scenarios
can be better served with Boost.MultiIndex (note I'm
saying "depending on needs", since differences between
approaches are substantial.)
4 To me, rather than providing out of the box containers with
augmenting we should provide this via two libraries:
  - As an additional capability of Boost.Intrusive associative
  containers (and AVL-tree based containers as well, if
  technically possible.)
  - As an additional capability of Boost.MultiIndex ordered
  indices.

Hope this helps,

Joaquín M López Muñoz
Telefónica Digital


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk