I know this is well after the discussion on the stats class has ended, but
I think I have a good idea here.
Scott Kirkwood proposed a class that behaves something like this:
stats myStats;
for (int i = 0; i < 100; ++i) {
    myStats.add(i);
}
cout << "Average: " << myStats.getAverage() << "\n";
cout << "Max: " << myStats.getMax() << "\n";
cout << "Standard deviation: " << myStats.getStd() << "\n";
In one of my classes in grad school, I found it quite useful and efficient
to do statistics on the fly like this, so this stats class interests me.
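A minimal sketch of a class with that interface could look like the
following. It is only a guess at the shape, not Scott's actual code: the
member names simply mirror the calls above, and the use of double and of the
sample (n-1) standard deviation are assumptions.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical running-statistics class. The interface mirrors the usage
// shown above; the implementation is an assumption, not the proposed code.
class stats
{
public:
    stats(): m_n(0), m_sum(0), m_sum_sqr(0), m_max(0) {}

    // fold one more value into the running totals
    void add(double x)
    {
        if (m_n == 0 || x > m_max) m_max = x;
        ++m_n;
        m_sum += x;
        m_sum_sqr += x * x;
    }

    // needs at least one value
    double getAverage() const { return m_sum / m_n; }
    double getMax() const { return m_max; }

    // sample standard deviation (divides by n-1); needs at least two values
    double getStd() const
    {
        const double ss = m_sum_sqr - m_sum * m_sum / m_n;
        return std::sqrt(ss / (m_n - 1));
    }

private:
    std::size_t m_n;
    double m_sum, m_sum_sqr, m_max;
};
```

Nothing is stored per value: each add() only updates a count, two sums and
a maximum, which is what makes the on-the-fly approach cheap.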
Anyway, Scott has already alluded to the point I'm about to make: I think
it's important and useful for this stats class to integrate well with the
STL. This example code was inspired by the PointAverage example
from "Effective STL", p. 161:
#include <algorithm>  // std::for_each
#include <cmath>      // std::sqrt
#include <cstddef>    // std::size_t
#include <iostream>

// this class reports statistics
template <typename value_type>
class stats
{
public:
    stats(const std::size_t n, const value_type sum, const value_type sum_sqr):
        m_n(static_cast<value_type>(n)), m_sum(sum), m_sum_sqr(sum_sqr)
    {}
    value_type sum() const
    { return m_sum; }
    value_type mean() const
    { return m_sum/m_n; }
    // sample variance (divides by n-1)
    value_type var() const
    { return (m_sum_sqr - m_sum*m_sum/m_n) / (m_n-1); }
    value_type delta() const    // aka, standard dev
    { return std::sqrt(var()); }
private:
    // n is kept as value_type so the divisions above need no casts
    value_type m_n, m_sum, m_sum_sqr;
};

// this class accumulates results that can be used to
// compute meaningful statistics
template <typename value_type>
class stats_accum
{
public:
    stats_accum(): n(0), sum(0), sum_sqr(0)
    {}
    // use this to operate on each value in a range
    void operator()(const value_type& x)
    {
        ++n;
        sum += x;
        sum_sqr += x*x;
    }
    stats<value_type> result() const
    { return stats<value_type>(n, sum, sum_sqr); }
private:
    std::size_t n;
    value_type sum, sum_sqr;
};

int main()
{
    typedef float value_type;
    const std::size_t n(10);
    value_type f[n] = {0, 2, 3, 4, 5, 6, 7, 8, 9, 8};
    // accumulate stats over a range of iterators
    const stats<value_type> my_stats =
        std::for_each(f, f+n, stats_accum<value_type>()).result();
    std::cout << "Mean: " << my_stats.mean() << "\n";
    std::cout << "Standard deviation: " << my_stats.delta() << "\n";
    return 0;
}
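Since the accumulator is just a function object handed to std::for_each,
nothing ties it to raw arrays. The sketch below illustrates that with a
trimmed-down accumulator that works on any iterator range, for example a
std::vector or a std::list. The names running_sums, mean_of and std_dev_of
are made up for this illustration and are not part of any proposal.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <list>
#include <vector>

// Trimmed-down accumulator in the spirit of stats_accum.
// running_sums, mean_of and std_dev_of are hypothetical names.
template <typename T>
struct running_sums
{
    std::size_t n;
    T sum, sum_sqr;

    running_sums(): n(0), sum(0), sum_sqr(0) {}

    void operator()(const T& x)
    {
        ++n;
        sum += x;
        sum_sqr += x * x;
    }
};

// mean of any input-iterator range holding values convertible to T
template <typename T, typename Iter>
T mean_of(Iter first, Iter last)
{
    // for_each returns the function object after it has seen every element
    const running_sums<T> r = std::for_each(first, last, running_sums<T>());
    return r.sum / static_cast<T>(r.n);
}

// sample standard deviation (n-1) of the same kind of range; needs n >= 2
template <typename T, typename Iter>
T std_dev_of(Iter first, Iter last)
{
    const running_sums<T> r = std::for_each(first, last, running_sums<T>());
    const T n = static_cast<T>(r.n);
    return std::sqrt((r.sum_sqr - r.sum * r.sum / n) / (n - 1));
}
```

A call such as mean_of<double>(v.begin(), v.end()) reads the same whether v
is a vector, a list, or a plain array passed as a pair of pointers.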
This is pretty similar to what Scott has proposed, and it turns out that
this method is very fast. In my tests it has been nearly as fast as
dropping the classes and using a hand-written loop. It's certainly much
faster than storing the data in a std::valarray object and using functions
that calculate the mean and standard deviation separately. This is just a
neat application of Scott's idea.
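For comparison, a hand-written single-pass loop would look roughly like the
following. This is an illustrative sketch rather than the exact code behind
the timings; loop_stats and its out-parameters are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical hand-written equivalent: one pass, no classes.
// Writes the mean and the sample (n-1) standard deviation of x[0..n-1].
// Assumes n >= 2.
void loop_stats(const float* x, std::size_t n, float& mean, float& std_dev)
{
    float sum = 0, sum_sqr = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i];
        sum_sqr += x[i] * x[i];
    }
    const float fn = static_cast<float>(n);
    mean = sum / fn;
    std_dev = std::sqrt((sum_sqr - sum * sum / fn) / (fn - 1));
}
```

The loop body is the same three updates as the function object's
operator(), so once a compiler inlines that call there is little left to
separate the two versions.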
I think this stats class could be pretty useful for scientific computing,
and in this example it works very well with the STL and has great
performance. I'd like to see more code like this in Boost, but most of my
work is numerical. Take my opinion or leave it.
Jason Schmidt