
Boost : 
Subject: [boost] Review managers wanted for augmented data structures
From: Vadim Stadnik (vadimstdk_at_[hidden])
Date: 20130226 15:08:10
Hi all,
At present, neither Boost library nor STL provides containers using
augmented data structures. I suggest that the Boost community consider
solving the challenging problem – how to extend STL by integrating advanced
data structures.
Is there an alternative of not integrating advanced data structures into
STL?
IMO, no, since it is equivalent to “STL must go”. C++ cannot ignore
performance benefits provided by these data structures. On the other hand,
advanced data structures can be implemented in many programming languages
and this will make C++ less competitive.
Augmented redblack trees could have been included in the very first
version of STL.
Why is it difficult and has not been done yet?
The major obstacle for the integration is the C++ standard specification of
computational complexities of single operations for containers and
iterators.
Augmented trees are nonstandard due to different performance guarantees.
At the same these data structures are beneficial for complex algorithms,
since they offer a wider set of more efficient operations compared to
standard containers. The typical performance improvement for one operation
is O(log N) against O(N). For less efficient operations these trees
normally provide running time O(log N) against O(1), which is good enough
for many applications.
The C++ STL will be under pressure as long as there are new data structures
that offer better running time for user algorithms. Asymptotically, the
main parameter that determines the running time of an algorithm is the
total computational cost (to simplify the discussion we omit the effect of
locality of reference). By definition, the total computational cost is the
sum of costs of single operations multiplied by their frequencies. For a
general purpose library, such as STL, it is impossible at the design stage
to know frequencies of all container and iterator operations required in
all possible user algorithms.
A more reasonable and practical approach to STL would be to provide a wide
set of interchangeable containers and data structures with different
performance guarantees and allow programmers to make decisions about the
most suitable containers for specific applications. Relatively small
frequencies of inefficient operations do not have significant effect on the
total computational cost of an algorithm. Thus, in principle, containers
and iterators can provide even inefficient operation in order to support
more unified interfaces and to improve the generality. The replacement of
interchangeable containers offers the advantage of low development cost
solutions to performance bottlenecks that arise due to enhancement
requests and/or an increase in the number of elements processed by a user
algorithm.
Also, I’d like to note that the representation of sequences using augmented
trees is a frequent point of confusion and probably the fact that was
overlooked in the development of previous versions of STL. The general
description of the method of augmenting data structures is given in the
section 14.2 of this book
T.H. Cormen, C.E. Leiserson, R.L. Rivest and C. Stein,
Introduction to Algorithms, 2nd Edition, MIT Press and McGrawHill, 2001.
The specific application of this method is provided for order statistic
trees and interval trees. Unfortunately, there is no example of a sequence
that stores unordered elements. Probably for this reason augmented
redblack trees are often considered to be an improvement of search trees
only. The efficient representation of sequences by the described redblack
trees follows from the fact that augmenting by the counter of children
nodes supports efficient mapping from nonnegative integers into container
elements. It works both for ordered and unordered containers with unique
and multiple elements or keys.
There are two libraries in the Boost review schedule:
http://www.boost.org/community/review_schedule.html
"STL Extensions" and "Countertree". None of them has as yet a review
manager. The order of the reviews may not be significant. It is more
important that at least one library pass the review. This would help
integrating other advanced data structures.
Are there Boost experts who could volunteer to take responsibility of
review manager for any of these two libraries?
Regards,
Vadim Stadnik
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk