From: Neal D. Becker (nbecker_at_[hidden])
Date: 2003-02-25 11:10:48
Please remember that stats can be more general. I frequently compute stats
for complex types. In that case the mean is also complex, but the variance is
a real scalar. The proposed implementation doesn't address this.
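
As a rough sketch of the complex case, assuming samples of type
std::complex<Real>: the running sum stays complex, while the sum of squared
magnitudes (and so the variance) is real. The class name complex_stats_accum
and its members are hypothetical, chosen only for illustration; they are not
part of the proposed interface.

```cpp
#include <complex>
#include <cstddef>

// Hypothetical accumulator for complex samples (illustrative names only,
// not part of any proposed Boost interface).
template <typename Real>
class complex_stats_accum
{
public:
    complex_stats_accum() : m_n(0), m_sum(0), m_sum_norm(0) {}

    // Operate on each value in a range, e.g. via std::for_each.
    void operator()(const std::complex<Real>& x)
    {
        ++m_n;
        m_sum += x;                  // complex running sum
        m_sum_norm += std::norm(x);  // |x|^2, always real
    }

    std::size_t count() const { return m_n; }

    // The mean has the same (complex) type as the samples.
    std::complex<Real> mean() const
    { return m_sum / static_cast<Real>(m_n); }

    // Sample variance: sum of |x - mean|^2 over (n - 1), a real scalar.
    // Assumes at least two samples have been accumulated.
    Real var() const
    {
        const Real n = static_cast<Real>(m_n);
        return (m_sum_norm - std::norm(m_sum) / n) / (n - 1);
    }

private:
    std::size_t m_n;
    std::complex<Real> m_sum;
    Real m_sum_norm;
};
```

The point is only that mean() and var() need different return types, which a
single value_type template parameter cannot express.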
On Tuesday 25 February 2003 12:29 am, Jason D Schmidt wrote:
> I know this is well after the discussion on the stats class has ended,
> but I think I have a good idea here.
>
> Scott Kirkwood proposed a class that behaves something like this:
>
> stats myStats;
> for (int i = 0; i < 100; ++i) {
> myStats.add(i);
> }
> cout << "Average: " << myStats.getAverage() << "\n";
> cout << "Max: " << myStats.getMax() << "\n";
> cout << "Standard deviation: " << myStats.getStd() << "\n";
>
> In one of my classes in grad school, I found it quite useful and
> efficient to do statistics on the fly like this, so this stats class
> interests me. Anyway, Scott has already alluded to the point I'm about
> to make. I think it's important and useful for this stats class to
> integrate with the STL well. This example code was inspired by the
> PointAverage example from "Effective STL" p. 161:
>
> #include <algorithm>   // std::for_each
> #include <cmath>       // std::sqrt
> #include <cstddef>     // std::size_t
> #include <functional>  // std::unary_function (removed in C++17)
>
> // this class reports statistics
> template <typename value_type>
> class stats
> {
> public:
>     stats(const std::size_t n, const value_type sum, const value_type sum_sqr):
>         m_n(n), m_sum(sum), m_sum_sqr(sum_sqr)
>     {}
>     value_type sum() const
>     { return m_sum; }
>     value_type mean() const
>     { return m_sum/m_n; }
>     value_type var() const // sample variance (n-1 denominator)
>     { return (m_sum_sqr - m_sum*m_sum/m_n) / (m_n-1); }
>     value_type delta() const // aka, standard dev
>     { return std::sqrt(var()); }
> private:
>     std::size_t m_n; // number of samples
>     value_type m_sum, m_sum_sqr;
> };
>
> // this class accumulates results that can be used to
> // compute meaningful statistics
> template <typename value_type>
> class stats_accum: public std::unary_function<const value_type, void>
> {
> public:
>     stats_accum(): n(0), sum(0), sum_sqr(0)
>     {}
>     // use this to operate on each value in a range
>     // (argument_type lives in a dependent base, so spell the type out)
>     void operator()(const value_type x)
>     {
>         ++n;
>         sum += x;
>         sum_sqr += x*x;
>     }
>     stats<value_type> result() const
>     { return stats<value_type>(n, sum, sum_sqr); }
> private:
>     std::size_t n;
>     value_type sum, sum_sqr;
> };
>
> int main(int argc, char *argv[])
> {
>     typedef float value_type;
>     const std::size_t n(10);
>
>     float f[n] = {0, 2, 3, 4, 5, 6, 7, 8, 9, 8};
>
>     // accumulate stats over a range of iterators
>     stats<value_type> my_stats = std::for_each(f, f+n,
>         stats_accum<value_type>()).result();
>
>     value_type m = my_stats.mean();
>     value_type d = my_stats.delta(); // aka, standard deviation
>
>     return 0;
> }
>
> This seems to be pretty similar to what Scott has proposed, and it turns
> out that this method is very fast. In my tests it has been nearly as
> fast as if we got rid of the classes and used a hand-written loop. It's
> certainly much faster than storing the data in a std::valarray object,
> and using functions that calculate the mean & standard deviation
> separately. This is just a neat application of Scott's idea.
>
> I think this stats class could be pretty useful for scientific computing, and
> in this example it works very well with the STL and has great
> performance. I'd like to see more code like this in Boost, but most of
> my work is numerical. Take my opinion or leave it.
>
> Jason Schmidt
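
For reference, the hand-written loop that the quoted message compares against
might look roughly like the sketch below. The names loop_stats and
stats_by_loop are hypothetical, and the sketch assumes float input, double
accumulators, and the n-1 (sample) denominator; it is not taken from the
original tests.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical result type, for illustration only.
struct loop_stats
{
    double mean;
    double std_dev;
};

// Single pass over [first, first + n): accumulate the sum and the sum of
// squares, then derive the mean and the sample standard deviation.
// Assumes n >= 2.
inline loop_stats stats_by_loop(const float* first, std::size_t n)
{
    double sum = 0.0, sum_sqr = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double x = first[i];
        sum += x;
        sum_sqr += x * x;
    }
    loop_stats r;
    r.mean = sum / n;
    r.std_dev = std::sqrt((sum_sqr - sum * sum / n) / (n - 1));
    return r;
}
```

Calling stats_by_loop(f, 10) on the ten values from the quoted main() does the
same arithmetic as the functor version, just without the classes.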