
Subject: Re: [boost] [histogram] Variance
From: Steven Watanabe (watanabesj_at_[hidden])
Date: 2018-09-17 20:33:05


AMDG

On 09/17/2018 02:08 PM, Bjorn Reese via Boost wrote:
> The variance of individual bins can be obtained when using the
> adaptive_storage (via h.at(i).variance().)
>
> I am trying to understand the overhead of this feature.
>
> If I interpret the code correctly, there is a space overhead because
> each counter has to keep track of both the count and the sum of squares.
> The computational overhead is that the sum of squares has to be
> calculated for each insertion. Is this correct?
>

It's only tracked if you use weights.

> If so, is there any way to use the adaptive storage policy without
> variance?
>
> Furthermore, why does variance() return the sum of squares? Should this
> not be divided by the sample size?
>

You're thinking of the formula
variance = \sum (x_i - mean)^2 / count = \sum x_i^2/count - mean^2
That formula doesn't apply in this case, since the variance
is the variance of the bin count, not the variance of the
weights. The estimate for the variance is described here:
http://hdembinski.github.io/histogram/doc/html/histogram/rationale.html#histogram.rationale.variance

In Christ,
Steven Watanabe

