Boost logo

Boost :

Subject: Re: [boost] new proposal: order statistics tree
From: Vadim Stadnik (vadimstdk_at_[hidden])
Date: 2013-01-13 16:31:54

On Sun, Jan 13, 2013 at 7:10 PM, Szymon Wojciechowski <sw10_at_[hidden]>wrote:

> > > I implemented augmented red-black tree which doesn't check the
> uniqueness
> > > of values. I have conducted a simple survey: I prepared 5 input files
> with
> > > unique 2,7*10e7 integers. The first data set has sorted values as
> other has
> > > randomized. I repeated 7 times on every input. A simple program reads
> from
> > > input integers and puts it to set (maybe I should use multiset) from
> > > VisualStudio or my order statistics tree. After that I measured time.
> Then
> > > program erases values from data structure in ordered way (from 0, 1,
> 2...).
> > > After that I also measured how much time it takes. I uploaded the
> results
> > > here: - it's a little bit strange.
> > > What kind of performance test would you think about?
> > >
> >
> > Did you implement or is it possible to implement for this augmented red
> > black tree split, join and splice operations with logarithmic
> computational
> > complexity?
> >
> > As for tests, there are a number of performance tests of basic container
> > operations in the following projects:
> >
> > 1. For augmented dynamically allocated B+ trees:
> >
> >
> > I have added this link, since this project has results of performance
> tests
> > for Boost.MultiIndex.
> >
> > 2. For augmented array based B+ trees:
> >
> >
> > This project is prepared for Boost review. It compares performance of new
> > containers with standard containers and Boost.Container.
> > The file "boost_performance.cpp" uses Boost.Chrono timer. If your
> > containers are STL compliant you should not have any problem with this
> test
> > code.
> >
> > Regards,
> > Vadim Stadnik
> I have performed such test for 6 data containers (including
> boost::multi_index described by Tony and Vadim): the comparison is uploaded
> here:
> I prepared 5 datasets (first one was sorted). I repeated 5 times whole
> test for each container and dataset. The test inserts and accumulates as in
> Vadim's tests. However I added the erasure performance.
The performance of your trees looks quite promising.
What are the sizes of the data sets used in the tests?

> I haven't thought about possiblility to implement such operations like
> splice, join, split. However, it seems to me to be impossible. When adding
> treeA + treeB, the elements of treeB should be considered independently
> where to place them in result. For example if thereA contains 1, 3, 5, 7,
> 9 and treeB contains 2, 4, 6, 8, 10 all treeB nodes will be independently
> located and inserted in O(n).
The support of these operations with logarithmic running time would be
quite beneficial for your variant of augmented trees.

The augmented trees can support not only STL associative containers, but
also STL sequences, such as std::vector and std::list. String algorithms
can also take advantage of these efficient operations. The split operation
is general. The join and splice operations are general for STL sequences.
For associative containers they can be used only in special cases. You can
find more details in documentation to augmented B+ trees.

The CGAL library

implements split and join (as function catenate()) operations for
Multiset. However, these trees are not augmented, this is why container
sizes must be re-counted in linear time, which creates a performance
problem for STL variants of containers. Note also that this is a problem of
all dynamically allocated data structures that can support bidirectional
iterators only, including std::list.

With augmented red-black trees the cost of these operations and
corresponding member functions of STL containers should be logarithmic.

Vadim Stadnik

Boost list run by bdawes at, gregod at, cpdaniel at, john at