From: Hubert Holin (Hubert.Holin_at_[hidden])
Date: 2003-02-14 08:24:54


Somewhere in the E.U., 14/02/2003

   Hello

In article <AHEJIHEOOOBMJPAGPLIPAEOHDKAA.boost_at_[hidden]>,
 "Paul A. Bristow" <boost_at_[hidden]> wrote:

> Stats are definitely a must-have for Boost, but as ever, the presentation is
> not so easy to agree upon.

      I agree statistical utilities are a must. As many of us likely do,
I have a few things I can contribute, which I needed for some past work
on (multi-dimensional) sequences of values and on densities of
distributions.

      There is still the question of whether similarity with NR is a
problem or not (the implementation language is different, but the
implementations of the techniques themselves are of course basically
similar, since they rest on the same mathematical constructions).

      I am hoping that with uBlas, we can contribute more numerical
stuff. I have some Gaussian Mixture Models code that I should be
rewriting in the not too distant future (it is currently based on an old
version of TNT, and most of the important pre-processing has to be done
elsewhere, because svd was not available at the time).

> But it is also crucial to get the most accurate answer, and be able to prove
> it.
> For example, B D McCullough, American Statistician Nov 1998 52(4), 358 and
> 1999 53(2) 149-159 assessed several stats packages, and some came out rather
> badly - you can guess which was worst, by far!
>
> NIST provide some test datasets
>
> http://www.itl.nist.gov/div898/strd/
>
> against which code can be judged (and some naive algorithms fail badly).
>
> Although I can see the benefits of an STL-style, I also have some difficulty
> in imagining how the results returned can be other than reals? Even if we
> 'input' integer types, although sum can sensibly also be integer, I have some
> difficulty in seeing how the mean, variance etc are useful as integer types?
> And to expose the unsuspecting user to the risk of surprise seems unhelpful?
>
> Benefits from STL-style would be most obvious if it can be applied to a
> circular buffer into which new data can be fed while stats can be
> recalculated Kalman filter style.
>
> While calculating the mean and variance, it is probably worth calculating the
> two higher moments, skewness and kurtosis, too.
>
> And of course the median (and some percentiles) are also often more useful
> than the mean.
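
The points quoted above (real-valued results even for integer input, and statistics that can be updated as new data is fed in) can be sketched roughly as follows. This is only an illustration, not code from anyone's files: the class name running_stats and its member names are hypothetical, and the update rule used is Welford's one-pass method, chosen here because it avoids the cancellation that makes the naive sum-of-squares formula fail on data with a large offset. Skewness and kurtosis are left out to keep the sketch short.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical class for illustration only; not an existing Boost component.
class running_stats {
public:
    // Accepts any arithmetic type but always accumulates in double,
    // so integer input never produces truncated (integer) statistics.
    template <typename T>
    void push(T value) {
        const double x = static_cast<double>(value);
        ++n_;
        const double delta = x - mean_;
        mean_ += delta / static_cast<double>(n_);
        // Uses the already-updated mean: this is Welford's update.
        m2_ += delta * (x - mean_);
    }

    std::size_t count() const { return n_; }

    double mean() const {
        assert(n_ > 0);  // undefined for an empty sample
        return mean_;
    }

    // Unbiased sample variance (divides by n - 1).
    double variance() const {
        assert(n_ > 1);  // needs at least two samples
        return m2_ / static_cast<double>(n_ - 1);
    }

    double standard_deviation() const { return std::sqrt(variance()); }

private:
    std::size_t n_ = 0;
    double mean_ = 0.0;
    double m2_ = 0.0;  // running sum of squared deviations from the mean
};
```

Values would be fed one at a time with push(), for instance while reading from a buffer, and mean() or variance() can be queried at any point in between.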

      My old files provide number_of_samples, max, min,
first_max_index, first_min_index, mean, median, variance,
standard_deviation, average_deviation, skewness and kurtosis for
sequences (where appropriate), and number_of_bins, mass,
first_mode_value, first_mode, mean, median, variance,
standard_deviation, average_deviation, skewness and kurtosis for
densities (where appropriate).
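
To give an idea of what two of the sequence functions could look like in the STL-algorithm style discussed in this thread, here is a minimal sketch. The names median and first_max_index are taken from the list above, but the signatures, the namespace sketch and the choice to return double are assumptions made for the illustration; this is not the code from those files.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

namespace sketch {  // hypothetical namespace, for illustration only

// Median of [first, last), returned as double even for integer input.
// The range is copied, so the caller's data is left untouched.
template <typename InputIt>
double median(InputIt first, InputIt last) {
    std::vector<double> v(first, last);
    assert(!v.empty());  // the median of an empty range is undefined
    const auto mid = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);
    // Partial sort: *mid ends up holding the value a full sort would put there.
    std::nth_element(v.begin(), mid, v.end());
    const double upper = *mid;
    if (v.size() % 2 == 1) return upper;
    // Even count: average the two central values.
    const double lower = *std::max_element(v.begin(), mid);
    return (lower + upper) / 2.0;
}

// Index of the first occurrence of the largest element in [first, last).
template <typename ForwardIt>
std::size_t first_max_index(ForwardIt first, ForwardIt last) {
    assert(first != last);
    return static_cast<std::size_t>(
        std::distance(first, std::max_element(first, last)));
}

}  // namespace sketch
```

Both functions take a pair of iterators, so they would work on a std::vector, a plain array or any other container offering suitable iterators.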

> Finally, there is the unsolved matter of the math functions we still badly
> need.

      Err, I kind of forgot which ones were requested...

> Confidence intervals are more informative than standard deviations etc.
>
> Paul
>
> Dr Paul A Bristow, hetp Chromatography
> Prizet Farmhouse, Kendal, Cumbria, LA8 8AB UK
> +44 1539 561830 Mobile +44 7714 33 02 04
> mailto:pbristow_at_[hidden]
>
>
> > -----Original Message-----
> > From: boost-bounces_at_[hidden]
> > [mailto:boost-bounces_at_[hidden]]On Behalf Of Jeff Garland
> > Sent: Tuesday, February 11, 2003 4:19 PM
> > To: Boost mailing list
> > Subject: RE: [boost] Any interest in a stats class
> >
> >
> > Scott K wrote:
> >
> > > Hi all,
> > > I have a small family of statistics classes which I have used from time
> > > to time. The one I use most often is simply called stats.
> > > Here's an example of its use:
> > > ...details snipped...
> >
> > I'm sure there are folks interested in statistical (and other)
> > functions. I've developed exactly this sort of class in the
> > past so I understand the utility. However, I suspect some of
> > us would hope statistical algorithms to be formulated as STL
> > Algorithm extensions. Specifically concerning statistics see:
> >
> > http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?STLAlgorithmExtensions/StatisticsAlgorithms
> >
> > and more generally:
> >
> > http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?STLAlgorithmExtensions
> >
> > We definitely need volunteers to take these rough Wiki musings and
> > convert them into actual documented libraries. I'm not sure this
> > is what you had in mind, but I, for one, would welcome your effort
> > either way!
> >
> > Jeff

   See you soon

HH

