From: Paul A. Bristow (boost_at_[hidden])
Date: 2003-02-13 14:19:07


Stats are definitely a must-have for Boost, but as ever, the presentation is not
so easy to agree upon.

But it is also crucial to get the most accurate answer, and be able to prove it.
For example, B D McCullough, American Statistician Nov 1998 52(4), 358 and 1999
53(2) 149-159 assessed several stats packages, and some came out rather badly -
you can guess which was worst, by far!

NIST provide some test datasets

http://www.itl.nist.gov/div898/strd/

against which code can be judged (and some naive algorithms fail badly).
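
To illustrate how a naive algorithm can fail, here is a minimal sketch comparing the textbook one-pass sum-of-squares formula with Welford's running update. The function names and the sample data are my own for illustration (the data is made up, not one of the NIST sets): a small spread sitting on a large offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Textbook one-pass formula: (sum(x^2) - n*mean^2) / (n - 1).
// It subtracts two huge, nearly equal numbers, so most digits can cancel.
double naive_variance(const std::vector<double>& xs) {
    assert(xs.size() > 1);
    double sum = 0.0, sum_sq = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        sum += xs[i];
        sum_sq += xs[i] * xs[i];
    }
    const double n = static_cast<double>(xs.size());
    const double mean = sum / n;
    return (sum_sq - n * mean * mean) / (n - 1.0);
}

// Welford's update: only differences from the running mean are multiplied.
double welford_variance(const std::vector<double>& xs) {
    assert(xs.size() > 1);
    double mean = 0.0, m2 = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        const double delta = xs[i] - mean;
        mean += delta / static_cast<double>(i + 1);
        m2 += delta * (xs[i] - mean);
    }
    return m2 / static_cast<double>(xs.size() - 1);
}

// Hypothetical sample (not a NIST dataset): 4, 7, 13, 16 shifted by 1e9.
std::vector<double> offset_sample() {
    const double base = 1.0e9;
    std::vector<double> xs;
    xs.push_back(base + 4.0);
    xs.push_back(base + 7.0);
    xs.push_back(base + 13.0);
    xs.push_back(base + 16.0);
    return xs;
}
```

The true sample variance of this data is 30. The Welford version should recover it, while what the naive version returns depends on the compiler and floating-point hardware and may be badly wrong.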

Although I can see the benefits of an STL style, I have some difficulty
imagining how the results returned can be anything other than reals. Even if we
'input' integer types, and although the sum can sensibly also be an integer, I
have some difficulty seeing how the mean, variance etc. are useful as integer
types. And exposing the unsuspecting user to the risk of surprise seems unhelpful.
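
A small sketch of the surprise in question, assuming two hypothetical helpers (neither name comes from any proposed interface): one returns the mean in the iterator's value type, the other always returns a double.

```cpp
#include <cassert>
#include <iterator>
#include <numeric>

// Hypothetical: mean returned in the input's own type.
// For integer input the division truncates.
template <class ForwardIt>
typename std::iterator_traits<ForwardIt>::value_type
mean_same_type(ForwardIt first, ForwardIt last) {
    typedef typename std::iterator_traits<ForwardIt>::value_type T;
    assert(first != last);
    const T sum = std::accumulate(first, last, T());
    return sum / static_cast<T>(std::distance(first, last));
}

// Hypothetical: mean always returned as a real, whatever the input type.
template <class ForwardIt>
double mean_as_double(ForwardIt first, ForwardIt last) {
    assert(first != last);
    const double sum = std::accumulate(first, last, 0.0);
    return sum / static_cast<double>(std::distance(first, last));
}
```

For the integers 1, 2, 2 the first version gives 1 and the second gives 5/3, which is the sort of result a user would expect.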

Benefits from an STL style would be most obvious if it can be applied to a
circular buffer into which new data can be fed while the stats are recalculated,
Kalman filter style.
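
As a rough sketch of what that might look like, here is a fixed-size circular buffer that updates the mean and variance recursively as each new value displaces the oldest. The class name `RollingStats` and its interface are assumptions for illustration, not an existing or proposed Boost component.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sliding-window statistics over the last `capacity` samples.
class RollingStats {
public:
    explicit RollingStats(std::size_t capacity)
        : buf_(capacity), head_(0), count_(0), mean_(0.0), m2_(0.0) {
        assert(capacity > 0);
    }

    void push(double x) {
        if (count_ < buf_.size()) {
            // Window still filling: ordinary Welford update.
            ++count_;
            const double delta = x - mean_;
            mean_ += delta / static_cast<double>(count_);
            m2_ += delta * (x - mean_);
        } else {
            // Window full: swap the oldest sample for the new one and
            // adjust the mean and sum of squared deviations in place.
            const double old = buf_[head_];
            const double old_mean = mean_;
            mean_ += (x - old) / static_cast<double>(count_);
            m2_ += (x - old) * (x - mean_ + old - old_mean);
        }
        buf_[head_] = x;
        head_ = (head_ + 1) % buf_.size();
    }

    std::size_t size() const { return count_; }
    double mean() const { return mean_; }
    double variance() const {  // sample variance, 0 for fewer than 2 samples
        return count_ > 1 ? m2_ / static_cast<double>(count_ - 1) : 0.0;
    }

private:
    std::vector<double> buf_;
    std::size_t head_;   // index of the oldest sample once the window is full
    std::size_t count_;
    double mean_;
    double m2_;          // sum of squared deviations from the mean
};
```

Each `push` costs constant time. Over a very long stream the recursive updates can accumulate rounding error, so an occasional full recomputation from the buffer contents may be a sensible safeguard.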

While calculating the mean and variance, it is probably worth calculating the
two higher moments, skewness and kurtosis, too.
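
A minimal sketch of how all four can come out of a single pass, using the well-known incremental update formulas for the central moment sums. The struct name `Moments` and its members are illustrative assumptions; skewness and kurtosis here are the population forms (no small-sample bias correction), and kurtosis is reported as excess kurtosis.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical one-pass accumulator for the first four moments.
struct Moments {
    std::size_t n;
    double mean, m2, m3, m4;  // m2..m4: sums of powers of deviations

    Moments() : n(0), mean(0.0), m2(0.0), m3(0.0), m4(0.0) {}

    void add(double x) {
        const double n1 = static_cast<double>(n);
        ++n;
        const double nn = static_cast<double>(n);
        const double delta = x - mean;
        const double delta_n = delta / nn;
        const double delta_n2 = delta_n * delta_n;
        const double term1 = delta * delta_n * n1;
        mean += delta_n;
        // Order matters: m4 uses the old m3 and m2, m3 uses the old m2.
        m4 += term1 * delta_n2 * (nn * nn - 3.0 * nn + 3.0)
              + 6.0 * delta_n2 * m2 - 4.0 * delta_n * m3;
        m3 += term1 * delta_n * (nn - 2.0) - 3.0 * delta_n * m2;
        m2 += term1;
    }

    double variance() const {  // sample variance (n - 1 divisor)
        assert(n > 1);
        return m2 / static_cast<double>(n - 1);
    }
    double skewness() const {  // population skewness
        assert(n > 1 && m2 > 0.0);
        return std::sqrt(static_cast<double>(n)) * m3 / std::pow(m2, 1.5);
    }
    double excess_kurtosis() const {  // population kurtosis minus 3
        assert(n > 1 && m2 > 0.0);
        return static_cast<double>(n) * m4 / (m2 * m2) - 3.0;
    }
};
```

Feeding in 1, 2, 3, 10 gives a mean of 4, a sample variance of 50/3, and a positive skewness, as expected for data with one large value on the high side.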

And of course the median (and some percentiles) are also often more useful than
the mean.
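
For what it is worth, the standard library already gets us most of the way there via `std::nth_element`. The sketch below assumes two hypothetical helpers; the percentile uses the nearest-rank convention, which is only one of several definitions in common use.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median. Takes the data by value because nth_element reorders it.
double median(std::vector<double> v) {
    assert(!v.empty());
    const std::size_t mid = v.size() / 2;
    std::nth_element(v.begin(), v.begin() + mid, v.end());
    const double upper = v[mid];
    if (v.size() % 2 == 1) return upper;
    // Even count: the lower middle value is the largest element left of mid.
    const double lower = *std::max_element(v.begin(), v.begin() + mid);
    return lower + (upper - lower) / 2.0;
}

// Percentile by nearest rank: the value at rank ceil(p/100 * n), 1-based.
// ASSUMPTION: nearest-rank definition; interpolating definitions differ.
double percentile_nearest_rank(std::vector<double> v, double p) {
    assert(!v.empty() && p > 0.0 && p <= 100.0);
    std::size_t rank = static_cast<std::size_t>(
        std::ceil(p / 100.0 * static_cast<double>(v.size())));
    if (rank < 1) rank = 1;
    if (rank > v.size()) rank = v.size();
    std::nth_element(v.begin(), v.begin() + (rank - 1), v.end());
    return v[rank - 1];
}
```

With the values 1, 2, 3, 4, 1000 the median is 3 while the mean is 202, which shows why the median is often the more useful summary when outliers are present.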

Finally, there is the unsolved matter of the math functions we still badly need.
Confidence intervals are more informative than standard deviations etc.
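
To show where the missing math functions bite, here is a sketch of a 95% confidence interval for the mean. It assumes a large-sample normal approximation with a hard-coded quantile z = 1.959964; a proper small-sample interval needs the Student's t quantile, which is exactly the kind of function we lack. The names `Interval` and `mean_ci95_normal` are my own for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Interval {
    double lower;
    double upper;
};

// Approximate 95% interval for the mean: mean +/- z * s / sqrt(n).
// ASSUMPTION: z is the standard normal quantile, adequate only for large n.
Interval mean_ci95_normal(const std::vector<double>& xs) {
    assert(xs.size() > 1);
    double mean = 0.0, m2 = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        const double delta = xs[i] - mean;
        mean += delta / static_cast<double>(i + 1);
        m2 += delta * (xs[i] - mean);
    }
    const double n = static_cast<double>(xs.size());
    const double std_err = std::sqrt(m2 / (n - 1.0) / n);
    const double half_width = 1.959964 * std_err;
    Interval ci = { mean - half_width, mean + half_width };
    return ci;
}
```

For a sample as small as eight values this interval is too narrow; the t-based interval would be noticeably wider.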

Paul

Dr Paul A Bristow, hetp Chromatography
Prizet Farmhouse, Kendal, Cumbria, LA8 8AB UK
+44 1539 561830 Mobile +44 7714 33 02 04
mailto:pbristow_at_[hidden]

> -----Original Message-----
> From: boost-bounces_at_[hidden]
> [mailto:boost-bounces_at_[hidden]]On Behalf Of Jeff Garland
> Sent: Tuesday, February 11, 2003 4:19 PM
> To: Boost mailing list
> Subject: RE: [boost] Any interest in a stats class
>
>
> Scott K wrote:
>
> > Hi all,
> > I have a small family of statistics classes which I have used from time
> > to time. The one I use most often is simply called stats.
> > Here's an example of its use:
> > ...details snipped...
>
> I'm sure there are folks interested in statistical (and other)
> functions. I've developed exactly this sort of class in the
> past so I understand the utility. However, I suspect some of
> us would hope statistical algorithms to be formulated as STL
> Algorithm extensions. Specifically concerning statistics see:
>
> http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?STLAlgorithmExtensions/StatisticsAlgorithms
>
> and more generally:
>
> http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?STLAlgorithmExtensions
>
> We definitely need volunteers to take these rough Wiki musings and
> convert them into actual documented libraries. I'm not sure this
> is what you had in mind, but I, for one, would welcome your effort
> either way!
>
> Jeff

