From: Jason D Schmidt (jd.schmidt_at_[hidden])
Date: 2003-02-25 00:29:36


I know this is well after the discussion on the stats class has ended,
but I think I have a good idea here.

Scott Kirkwood proposed a class that behaves something like this:

    stats myStats;
    for (int i = 0; i < 100; ++i) {
        myStats.add(i);
    }
    cout << "Average: " << myStats.getAverage() << "\n";
    cout << "Max: " << myStats.getMax() << "\n";
    cout << "Standard deviation: " << myStats.getStd() << "\n";
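
To make that usage concrete, here is a minimal sketch of a class that would
support those calls. It is not Scott's implementation; the class name
running_stats and everything inside it are assumptions made for illustration,
and only the member names add, getAverage, getMax and getStd come from the
usage above.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical sketch of an incremental stats class. The member names follow
// the usage shown above; the implementation is an assumption.
class running_stats
{
public:
    running_stats()
        : m_n(0), m_sum(0.0), m_sum_sqr(0.0),
          m_max(-std::numeric_limits<double>::infinity())
    {}

    // Fold one more sample into the running totals.
    void add(double x)
    {
        ++m_n;
        m_sum += x;
        m_sum_sqr += x * x;
        if (x > m_max) m_max = x;
    }

    // Mean of the samples seen so far; needs at least one sample.
    double getAverage() const { return m_sum / m_n; }

    // Largest sample seen so far.
    double getMax() const { return m_max; }

    // Sample standard deviation (divides by n - 1); needs at least two samples.
    double getStd() const
    {
        const double ss = m_sum_sqr - m_sum * m_sum / m_n;
        return std::sqrt(ss / (m_n - 1));
    }

private:
    std::size_t m_n;
    double m_sum, m_sum_sqr, m_max;
};
```

Nothing is stored per sample: each add() only updates a count, two sums and a
running maximum, which is what makes statistics "on the fly" cheap.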

In one of my classes in grad school, I found it quite useful and
efficient to do statistics on the fly like this, so this stats class
interests me. Anyway, Scott has already alluded to the point I'm about
to make. I think it's important and useful for this stats class to
integrate well with the STL. This example code was inspired by the
PointAverage example from "Effective STL" p. 161:

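#include <algorithm>  // std::for_each
#include <cmath>      // std::sqrt
#include <cstddef>    // std::size_t

// this class reports statistics computed from the accumulated sums
template <typename value_type>
class stats
{
public:
    stats(const std::size_t n, const value_type sum,
          const value_type sum_sqr):
    // n is stored as value_type so the divisions below are not
    // integer divisions
    m_n(static_cast<value_type>(n)), m_sum(sum), m_sum_sqr(sum_sqr)
    {}
    value_type sum() const
    { return m_sum; }
    value_type mean() const
    { return m_sum/m_n; }
    // sum of squared deviations from the mean (not yet divided by n-1)
    value_type var() const
    { return m_sum_sqr - m_sum*m_sum/m_n; }
    value_type delta() const // aka, standard dev (sample, n-1); needs n >= 2
    { return std::sqrt(var() / (m_n-1)); }
private:
    value_type m_n, m_sum, m_sum_sqr;
};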

// this class accumulates results that can be used to
// compute meaningful statistics
template <typename value_type>
class stats_accum
{
public:
    // typedefs formerly supplied by std::unary_function, which was
    // deprecated in C++11 and removed in C++17
    typedef value_type argument_type;
    typedef void result_type;

    stats_accum(): n(0), sum(0), sum_sqr(0)
    {}
    // use this to operate on each value in a range
    void operator()(const argument_type& x)
    {
        ++n;
        sum += x;
        sum_sqr += x*x;
    }
    stats<value_type> result() const
    { return stats<value_type>(n, sum, sum_sqr); }
private:
    std::size_t n;
    value_type sum, sum_sqr;
};

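int main()
{
    typedef float value_type;
    const std::size_t n = 10;

    value_type f[n] = {0, 2, 3, 4, 5, 6, 7, 8, 9, 8};

    // accumulate stats over a range of iterators; for_each returns
    // a copy of the functor, which holds the accumulated sums
    const stats<value_type> my_stats = std::for_each(f, f+n,
        stats_accum<value_type>()).result();

    const value_type m = my_stats.mean();  // 5.2
    const value_type s = my_stats.delta(); // aka, standard deviation, about 2.94

    (void)m; (void)s; // values are computed only for illustration
    return 0;
}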

This seems to be pretty similar to what Scott has proposed, and it turns
out that this method is very fast. In my tests it has been nearly as
fast as if we got rid of the classes and used a hand-written loop. It's
certainly much faster than storing the data in a std::valarray object,
and using functions that calculate the mean & standard deviation
separately. This is just a neat application of Scott's idea.
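
For comparison, the hand-written loop mentioned above might look roughly like
this. It is a sketch, not the code used in the timing tests; the names
loop_stats and stats_by_loop are made up for illustration, and the range is
assumed to hold at least two values.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical result type for this sketch.
struct loop_stats
{
    float mean;
    float std_dev;
};

// Hand-written equivalent of the functor version: one pass over
// [first, last), accumulating the count, the sum and the sum of squares.
// Assumes the range holds at least two values.
inline loop_stats stats_by_loop(const float* first, const float* last)
{
    std::size_t n = 0;
    float sum = 0, sum_sqr = 0;
    for (const float* p = first; p != last; ++p) {
        ++n;
        sum += *p;
        sum_sqr += *p * *p;
    }

    loop_stats r;
    r.mean = sum / n;
    // same formula as the functor version: sample standard deviation (n - 1)
    r.std_dev = std::sqrt((sum_sqr - sum * sum / n) / (n - 1));
    return r;
}
```

The loop body does the same three updates as the functor's operator(), so
once the compiler inlines the functor call inside std::for_each there is
little left to separate the two versions.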

I think this stats class could be pretty useful for scientific computing,
and in this example it works very well with the STL and has great
performance. I'd like to see more code like this in Boost, but most of
my work is numerical. Take my opinion or leave it.

Jason Schmidt



