Subject: Re: [boost] Countertree
From: Leo Goodstadt (leo.goodstadt_at_[hidden])
Date: 2012-11-19 06:02:08
On 17 November 2012 19:44, Francisco Jos? Tapia <fjtapia_at_[hidden]> wrote:
> 2012/11/16 Vadim Stadnik <vadimstdk_at_[hidden]>
> > On Sat, Nov 17, 2012 at 4:25 AM, Leo Goodstadt <leo.goodstadt_at_[hidden] wrote:
> > > > Project location ( zip file with code and documentation) :
> > > > https://dl.dropbox.com/u/8437476/works/countertree_code_doc.zip
> > > >
> > > > Quick view of documentation with code download :
> > > > https://dl.dropbox.com/u/8437476/works/countertree/index.html
> > > >
> > > > This library is an implementation of a binary red-black counter tree.
> > > > This tree have an additional counter in each leaf. This permit the access to
> > > > the elements by the position, like in a vector. It is a random access
> > > > container with random access iterators
> > >
> > >
> > > Among other things, countertree would allow the boost Statistical
> > > Accumulators Library to support rolling median and rolling quantiles
> > > efficiently. This would be very useful.
> > >
> > > I had previously had to hack the tree class in the Policy-Based Data
> > > Structures part of the gnu c++ library.
> > > (see http://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/tree_order_statistics_node_update.html )
> > As far as I know, this support is not yet available in countertree project,
> > although this is possible in principle.
> About the compatibility with the boost Statistical Accumulators Library, I
> didn't thought about it, but it would be useful. I will try to implement,
> but I need to examine in deep for to provide you a well founded opinion.
1) For counter trees, the median is just the N/2-th value in the
counter tree, where K is the size of the tree.
The Nth-percentile can be calculated similarly.
A rolling median with a fixed size window requires there to be an
insertion and deletion for each (most!) new value, so I am guessing
it should run in O(N log K) time, where N = size of data, K = size of window
2) The alternative is to use two multisets corresponding to values
smaller and larger than the current median (or the Nth percentile).
This should also have O(N log K) complexity.
3) With the special case where the data consists of integers in a
fixed range, we can use a segment tree where the complexity would be
O(N log R) where R = the size of the fixed range of data values. This
can result in considerable speed ups.
The best bit about counter trees are their flexibility, we can
calculate multiple Nth-percentiles at the same time (e.g. all the
quartiles). As to the actual rather than theoretical speed, I think
we need bench-marking....
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk