Boost logo

Boost :

Subject: Re: [boost] Review managers wanted for augmented data structures
From: Vadim Stadnik (vadimstdk_at_[hidden])
Date: 2013-03-03 09:02:01


On Sun, Mar 3, 2013 at 9:00 PM, Martin Knoblauch Revuelta <
mkrevuelta_at_[hidden]> wrote:

> On Sun, Mar 3, 2013 at 11:57 AM, Vadim Stadnik <vadimstdk_at_[hidden]>
> wrote:
>
...

> > Do you know any other reasons and/or concerns why augmented data
> structures
> > should NOT be added to STL?
>
> The main concern is the balance between the value they add and their cost.
>
> On one hand, which are the real-life applications where these
> containers would really help? In my opinion, the programming community
> has learned to solve zillions of problems with arrays, lists, search
> trees, heaps and hash tables. Now it's not easy to find a real-life
> problem where augmented trees provide a significantly faster solution.
> Though, they may allow amazingly simple, non-tuning, easier-to-code,
> solutions.
>
> On the other hand, in addition to the usual bloat, there is a deep
> concern regarding the possible incorrect use of these sequence
> containers. Lazy unexperienced programmers might misuse them as an
> all-purpose replacement of vector and list.
>
> If the standard library adopted them, it would be somehow advocating
> their use. If there were no real-life applications, then it would be
> advocating inefficiency.
>
>
I agree with your concerns.
In practice, on Windows systems an algorithm using N<1000 elements of small
size normally does not have to use advanced data structures at all. It
performs quite well with the default STL container std::vector. The
performance problems arise when the number of elements increases and the
algorithms uses inefficient linear cost operations of std::vector.

To some extent the risk of misuse is reduced by the array based augmented
B+ trees. As I mentioned before they have good locality of reference and
guarantee at least quite efficient sequential processing.

The representation of strings and string algorithms is one of the most
practically important application areas for augmented trees. The B+ trees
proposed for the Boost review support not only O(log N) insertion and
erasure for a single element, but also O(log N) time for splice and split
operations. This advantage is not achievable with basic data structures and
STL containers.

> > [..]
> > The discussion of augmented trees by Boost community is particularly
> > interesting.
> > Do you know if there was a formal Boost review of any variant of
> augmented
> > data structures and/or containers?
>
> There was an interesting discussion when I proposed it:
>
>
> http://boost.2283326.n4.nabble.com/proposal-Sequence-container-with-O-log-n-random-access-and-O-log-n-insertion-removal-wherever-td2615747.html
>
>
> http://boost.2283326.n4.nabble.com/proposal-Rank-List-old-name-AVL-Array-in-the-Sandbox-Vault-td2617382.html#a2617384
>
> > [..] I proposed for the Boost
> > review array based augmented B+ trees and decided to postpone the review
> of
> > dynamically allocated augmented B+ trees.
>
> If you only plan to augment the tree with an integer, then this is
> perfectly reasonable.
>
>
Thank you for the links, I will analyze the discussions.

I have already implemented 5 variants of basic and augmented B+ trees and
STL containers using these trees. Three variants of dynamically allocated
B+ trees:

https://github.com/vstadnik/stl_ext_adv

and two variants of array based B+ trees:

https://github.com/vstadnik/stl_ext_adv_review

In combination with augmented red-black trees they should help find the
solution to the problem how to use augmented and advanced data structures
inside or alongside STL.

> Though, if you plan to augment the trees with a data type that doesn't
> support subtraction ---like the interval trees of "Introduction to
> Algorithms", or something more complex--- it might be better to stick
> to a few children per node. Otherwise, deleting an element might
> require a lot of [potentially expensive] additions of a user-defined
> type.
>

The development of augmented interval B+ trees is not impossible, but at
this stage my main priority is data structures that support STL interfaces.

Regards,
Vadim Stadnik


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk