# Boost :

From: Topher Cooper (topher_at_[hidden])
Date: 2007-01-31 10:40:23

At 03:58 PM 1/30/2007, you wrote:
>For now it seems that you use the default variance implementation
>which should be the naive estimator from the sum and sum of squares.

Why would you supply a poor implementation as the default when the
alternative (West's algorithm) is efficient, easily implemented and
is much more precise? The only reasons I can think for including the
naive algorithm at all is for those rare cases where a small increase
in performance is more important than a potentially very large loss
in precision or if you are using exact arithmetic (e.g., if instead
of floating point you are using rational numbers or if all your
values are integers).

The "pathological cases" where the naive algorithm does poorly are
when the sum of squares (and the square of the sum) is large relative
to the variance, this is very frequently the case. It can occur when
the variance is small relative to the mean or when there are more
than a few terms involved. How small relative to the mean or how
many terms depends on how much precision you really care about in
your variance. Because we are dealing with squares the error mounts
pretty quickly.

For the record:

West's algorithm:

t1 = (x[k] - M[k-1])
t2 = t1/k;
M[k] = M[k-1] + t2;
T[k] = T[k-1] + (k-1)*t1*t2
Mean{X[1]...X[n]} = M[n]
Var{X[1]...X[n]}=T[n]/m

Where "m" = (n-1) for the unbiased estimator of the population variance
or = n for the minimal-variance estimator of the
population variance

Topher Cooper