Boost logo

Boost :

From: Matthias Troyer (troyer_at_[hidden])
Date: 2007-01-31 16:54:14


On Jan 31, 2007, at 4:38 PM, Hans Meine wrote:

> Hi!
>
> On Wednesday, 31. January 2007 16:19, Ben FrantzDale wrote:
>>> Isn't this already a problem with sequences? Suppose you're
>>> storing the
>>> current median, and a new value comes along. What's the new median -
>>> without checking all the previous values? There are statistical
>>> estimates
>>> for this, but I don't know any exact way other than essentially
>>> resorting
>>> the data and checking the new median.
>>
>> Good point. Combining partial results may not be possible for all
>> accumulated statistics or there may be a cost to making a statistic
>> "combinable". Perhaps a refinement of the Accumulator concept
>> would be
>> needed.
>>
>> The idea of the Accumulator library is to implement online
>> algorithms; is
>> there a corresponding word for online algorithms for which partial
>> results
>> can be combined?
>
> Actually, I don't see any "online algorithms" whose partial results
> cannot be
> combined.
> ...
>
> Is it really such a "substantial extension"? I fully agree with
> John here:
>
> On Wednesday, 31. January 2007 10:23, John Maddock wrote:
>> But can't we make this a conceptual requirement that if
>> you try and combine two accumulator sets, then each accumulator
>> must have a
>> "combine" method (otherwise you get a compile time error). The
>> "combine"
>> methods for min/max/sum/count etc are all trivial aren't they -
>> and could
>> be added fairly easily ?
>
> Which accumulators would then have a non-trivial "combine" method?

There are two types of accumulators that I can think of immediately:

1. approximate median and quantile estimators such as the p^2 methods
cannot be combined but are very important for some applications

2. accumulators for correlates samples, e.g. autocorrelation
estimators, or jackknife-bins cannot easily be combined. Instead of
just merging the results one has to store the results from the
individual accumulator_sets separately and combine them only in the
final evaluations. Please keep in mind that correlated samples are
very common.

For 1. there is simply no way, and that is just it - one might either
have to make this cause a compile-time error or drop the median and
quantile estimators when combining. For 2. the problem is that
combined accumulator sets need a different data structure than
individual ones.

Matthias


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk