Boost logo

Boost :

Subject: [boost] [stl_ext_adv] array based B+ tree
From: Vadim Stadnik (vadimstdk_at_[hidden])
Date: 2012-06-27 05:29:43


Hi all,

First of all, I would like to thank all Boosters for previous useful
discussion of dynamically allocated augmented B+ trees. The development of
the new array based variant of a B+ tree has been significantly influenced
by these comments and suggestions.

The problem of the C++ standard sequences based on linear data structures
is that every of them has at least one inefficiency, such as linear cost of
insert(), erase() functions of std::vector or slow bidrectional iterators
of std::list. These inefficiencies create performance bottlenecks and
significantly complicate the development of efficient general algorithms
based on STL sequences.

The new variant of an array based B+ tree offers a solution to these
problems of STL sequences by reducing the running time of inefficient
operations from linear to logarithmic in the worst case. The array based B+
tree implements a simple partition of an array into a list of sub-arrays
connected to a B-tree. Compared to dynamically allocated B+ trees the new
tree requires less space and improves locality of reference. In some aspect
the array based B+ tree represents an enhanced pool allocator. Performance
of the new tree has been optimized for container operations with large
numbers of elements. In a number of tests the improvement factor is about
10.

A significant advantage of the array based B+ tree over standard sequences
is the support of splice() functions with logarithmic running time in the
worst case. The high efficiency of these functions extends move semantics
for non-intrusive STL sequences, since they allow user algorithms to insert
and remove any range of consecutive elements. This high efficiency is not
available in standard containers of C++03 and C++11. The splice() function
of std::list has linear running time in the worst case. For a discussion of
this problem by Boost community, please have a look at this thread:

http://boost.2283326.n4.nabble.com/Containers-How-about-a-list-with-O-1-splice-td3875722.html

For associative containers the array based B+ tree offers random access
iterators and improved efficiency of many container operations against
dynamically allocated B+ trees. In the general case associative containers
cannot benefit from the splice() functions. Nevertheless, they can be used
in algorithms for efficient split operations, since these operations do not
change the order of elements in obtained containers.

I think that the array based variants of B+ trees are very important, since
they offer a number of performance benefits and can be used in many areas
from mobile applications to processing huge data sets.

The new array based B+ tree is described in this document:

http://dl.dropbox.com/u/51041088/stl_ext_adv_doc/doc/array_based_bp_trees.html

The project with source code and tests:

https://github.com/vstadnik/stl_ext_adv

The new file with implementation of the array based B+ tree:

https://github.com/vstadnik/stl_ext_adv/blob/master/container/bp_tree_array.hpp

I will appreciate any comments, suggestions and ideas concerning augmented
array based B+ trees and STL variants of containers using these data
structures.

Regards,
Vadim Stadnik


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk