I know this is well after the discussion on the stats class has ended, but I think I have a good idea here.

Scott Kirkwood proposed a class that behaves something like this:

    stats myStats;
    for (int i = 0; i < 100; ++i) {
        myStats.add(i);
    }
    cout << "Average: " << myStats.getAverage() << "\n";
    cout << "Max: " << myStats.getMax() << "\n";
    cout << "Standard deviation: " << myStats.getStd() << "\n";

In one of my grad school classes, I found it quite useful and efficient to compute statistics on the fly like this, so this stats class interests me. Scott has already alluded to the point I'm about to make: I think it's important and useful for this stats class to integrate well with the STL. This example code was inspired by the PointAverage example from "Effective STL", p. 161:

    #include <algorithm>   // std::for_each
    #include <cmath>       // std::sqrt
    #include <cstddef>     // std::size_t
    #include <iostream>

    // this class reports statistics
    template <typename value_type>
    class stats
    {
    public:
        stats(const std::size_t n, const value_type sum, const value_type sum_sqr):
            m_n(n), m_sum(sum), m_sum_sqr(sum_sqr)
        {}

        value_type sum() const
        { return m_sum; }

        value_type mean() const
        { return m_sum/m_n; }

        // sample variance (divides by n-1); requires n >= 2
        value_type var() const
        { return (m_sum_sqr - m_sum*m_sum/m_n) / (m_n-1); }

        value_type delta() const // aka, standard dev
        { return std::sqrt(var()); }

    private:
        value_type m_n, m_sum, m_sum_sqr;
    };

    // this class accumulates results that can be used to
    // compute meaningful statistics
    template <typename value_type>
    class stats_accum
    {
    public:
        // nested typedefs stand in for std::unary_function, which was
        // deprecated in C++11 and removed in C++17
        typedef const value_type argument_type;
        typedef void result_type;

        stats_accum(): n(0), sum(0), sum_sqr(0)
        {}

        // use this to operate on each value in a range
        void operator()(argument_type x)
        {
            ++n;
            sum += x;
            sum_sqr += x*x;
        }

        stats<value_type> result() const
        { return stats<value_type>(n, sum, sum_sqr); }

    private:
        std::size_t n;
        value_type sum, sum_sqr;
    };

    int main()
    {
        typedef float value_type;
        const std::size_t n(10);
        value_type f[n] = {0, 2, 3, 4, 5, 6, 7, 8, 9, 8};

        // accumulate stats over a range of iterators
        const stats<value_type> my_stats =
            std::for_each(f, f+n, stats_accum<value_type>()).result();

        const value_type m = my_stats.mean();
        const value_type sd = my_stats.delta(); // aka, standard deviation
        std::cout << "Mean: " << m << "\n";
        std::cout << "Standard deviation: " << sd << "\n";
        return 0;
    }
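
The call in main depends on one detail of std::for_each that is easy to miss: the function object is taken by value, and the copy that was applied to every element is handed back as the return value. That is why .result() is called on what for_each returns rather than on a named functor. Here is a minimal sketch of just that mechanism; count_calls and count_with_for_each are made-up names for illustration and are not part of the stats code.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical functor used only for illustration: counts the elements it visits.
struct count_calls
{
    count_calls() : calls(0) {}
    void operator()(float) { ++calls; }
    std::size_t calls;
};

// Returns the number of elements in [first, last) by reading the state of
// the functor that std::for_each hands back.
inline std::size_t count_with_for_each(const float* first, const float* last)
{
    // for_each works on its own copy of the functor, so the accumulated
    // state is only available through the returned object.
    return std::for_each(first, last, count_calls()).calls;
}
```

Calling count_with_for_each(f, f+n) on the array from main should give back n. Had a named count_calls object been passed in and then inspected directly, its counter would still read zero.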

This seems to be pretty similar to what Scott has proposed, and it turns out that this method is very fast. In my tests it has been nearly as fast as a hand-written loop with no classes at all. It's certainly much faster than storing the data in a std::valarray object and calling separate functions to calculate the mean and the standard deviation. This is just a neat application of Scott's idea.
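
For anyone who wants to repeat the comparison, a hand-written baseline of the kind mentioned above might look roughly like the sketch below. It is not the code that was actually timed; loop_stats and stats_by_hand are made-up names, and the function assumes float data with at least two values.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical result type for this sketch; not part of the original code.
struct loop_stats
{
    float mean;
    float std_dev;  // sample standard deviation (divides by n - 1)
};

// One pass over the data, accumulating the same quantities that
// stats_accum tracks (sum and sum of squares). Assumes n >= 2.
inline loop_stats stats_by_hand(const float* data, std::size_t n)
{
    float sum = 0.0f;
    float sum_sqr = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
        sum_sqr += data[i] * data[i];
    }

    loop_stats r;
    r.mean = sum / n;
    r.std_dev = std::sqrt((sum_sqr - sum * sum / n) / (n - 1));
    return r;
}
```

The loop body does the same work as stats_accum::operator(), so with inlining enabled a compiler has a fair chance of generating similar code for both versions. That would be consistent with the small difference described above, though it is worth measuring on your own compiler and data.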

I think this stats class could be pretty useful for scientific computing; in this example it works very well with the STL and has great performance. I'd like to see more code like this in Boost, but then most of my work is numerical. Take my opinion or leave it.

Jason Schmidt