
Boost Users :

From: Thorsten Ottosen (tottosen_at_[hidden])
Date: 2006-01-10 12:22:01


John Maddock wrote:
>>I have some simple stuff in the sandbox under the "stat" directory.
>>You can view its usage here:
>
>
> Otto: first off we really need something like this, in Boost and in the std
> as well IMO. It really galls me that my desktop calculator has more math
> functions than <cmath> has, and wait for it: my calculator is 25 years old
> !!!!!!!!

I don't disagree with that :-) I don't plan to work on this myself in
the next year; I have enough to do. We should have an official list of
volunteers who are looking for ways to contribute.

> However, I'm not sure that either of us has the right interface yet:

I'm not satisfied with my own use of tuple as return values since it is so
easy to forget which tuple element is what. I would prefer named tuples
or simply

template< class T >
struct least_square_result
{
   T slope, intersection, correlation;
};

etc.
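
To make the named-member idea concrete, here is a minimal sketch of a fit
that returns such a struct. The function name least_square_fit and its
signature are made up for illustration (this is not the sandbox code), and
it assumes two ranges of equal length with at least two points and
non-constant x and y values. The struct is repeated so the block compiles
on its own.

```cpp
#include <cmath>

template< class T >
struct least_square_result
{
    T slope, intersection, correlation;
};

// Hypothetical helper: one pass over paired x/y values, accumulating the
// raw sums needed for a straight-line fit y = slope * x + intersection.
template< class T, class Iter >
least_square_result<T> least_square_fit( Iter x_first, Iter x_last, Iter y_first )
{
    T n = 0, sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    for( ; x_first != x_last; ++x_first, ++y_first )
    {
        const T x = *x_first;
        const T y = *y_first;
        n   += 1;
        sx  += x;      sy  += y;
        sxx += x * x;  syy += y * y;
        sxy += x * y;
    }

    // Scaled covariance and variances (each multiplied by n * n).
    const T cov_xy = n * sxy - sx * sy;
    const T var_x  = n * sxx - sx * sx;
    const T var_y  = n * syy - sy * sy;

    least_square_result<T> r;
    r.slope        = cov_xy / var_x;
    r.intersection = ( sy - r.slope * sx ) / n;
    r.correlation  = cov_xy / std::sqrt( var_x * var_y );
    return r;
}
```

The caller then writes r.slope or r.correlation instead of having to
remember which tuple index holds which value.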

> I'm
> particularly concerned that you're making these algorithms, it means that if
> you want to access more than one statistic you have to make multiple passes
> over the data.

Right. It's a trade-off. OTOH, if you only call one algorithm, you don't
want to pay for accumulation that is not used. So maybe we need
algorithms that take iterators and algorithms that take some kind of
accumulator object.
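
A rough sketch of how the two styles could coexist, assuming the
iterator-based algorithm is just a thin wrapper over a small accumulator.
All names here (mean_accumulator, mean, accumulate_into) are hypothetical
and only meant to show the shape of the interface:

```cpp
#include <cstddef>

// Hypothetical accumulator that tracks only what the mean needs, so a
// caller who wants one statistic does not pay for anything else.
template< class T >
class mean_accumulator
{
public:
    mean_accumulator() : n_( 0 ), sum_( 0 ) {}

    void operator()( const T& x ) { ++n_; sum_ += x; }

    std::size_t count() const { return n_; }
    T mean() const { return sum_ / static_cast<T>( n_ ); } // requires count() > 0

private:
    std::size_t n_;
    T           sum_;
};

// Iterator-based algorithm: one statistic, one pass.
template< class T, class Iter >
T mean( Iter first, Iter last )
{
    mean_accumulator<T> acc;
    for( ; first != last; ++first )
        acc( *first );
    return acc.mean();
}

// Accumulator-based algorithm: feeds a range into any accumulator object,
// which can be read at any time and then fed more data later.
template< class Iter, class Accumulator >
Accumulator& accumulate_into( Iter first, Iter last, Accumulator& acc )
{
    for( ; first != last; ++first )
        acc( *first );
    return acc;
}
```

With this split, someone who needs several statistics would pass a richer
accumulator to accumulate_into and still make only one pass.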

> The advantage of the "make it an object" approach is that
> pretty much all stats you could want are accessible after a single pass over
> the data. More than that you can:
>
> * Pause at any time and read off the stats, and then continue adding more
> data if you want.

Right, this could be very useful.

> * An extension of the above would be to make the stats object serialisable.
> * Two or more objects can be "added" together to obtain the stats for the
> combined data without reaccessing the original data: imagine a weather
> station gathering temperature data over time: hourly stats can be combined
> into daily or weekly stats without going back to the original data - which
> may be either discarded (unlikely) or stored in offline storage.
>
> Unfortunately: this method is prone to numerical overflow/underflow :-(

How is this different from just accumulating it all from scratch?
(Or is it the accumulator method in general that is error-prone?)

-Thorsten

