Subject: Re: [boost] [histogram] Variance
From: Steven Watanabe (watanabesj_at_[hidden])
Date: 2018-09-17 20:33:05
On 09/17/2018 02:08 PM, Bjorn Reese via Boost wrote:
> The variance of individual bins can be obtained when using the
> adaptive_storage (via h.at(i).variance().)
> I am trying to understand the overhead of this feature.
> If I interpret the code correctly, there is a space overhead because
> each counter has to keep track of both the count and the sum of squares.
> The computational overhead is that the sum of squares has to be
> calculated for each insertion. Is this correct?
It's only tracked if you use weights.
> If so, is there any way to use the adaptive storage policy without
> Furthermore, why does variance() return the sum of squares? Should this
> not be divided by the sample size?
You're thinking of the formula
variance = \sum (x_i - mean)^2 / count = \sum x_i^2/count - mean^2
That formula doesn't apply in this case, since the variance
is the variance of the bin count, not the variance of the
weights. The estimate for the variance is described here:
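
As a rough sketch of the idea, separate from the description referenced
above: if each fill adds a weight w_i to a bin, the bin content is
\sum w_i, and the usual estimate for the variance of that content is
\sum w_i^2. There is no division by the sample size. With unit weights
this reduces to the plain count, which is why nothing extra needs to be
stored for unweighted fills. The weighted_bin type below is a
hypothetical stand-in written for this example, not the counter type
used by adaptive_storage.

```cpp
#include <cassert>

// Hypothetical stand-in for a weighted bin counter (an assumption made
// for illustration; this is NOT the Boost.Histogram implementation).
struct weighted_bin {
    double sum_w = 0.0;   // sum of weights: the bin content
    double sum_w2 = 0.0;  // sum of squared weights: variance estimate of the content

    // Add one entry with weight w (defaults to an unweighted fill).
    void fill(double w = 1.0) {
        sum_w += w;
        sum_w2 += w * w;
    }

    double value() const { return sum_w; }
    // Variance of the bin content, not of the weights: no division by count.
    double variance() const { return sum_w2; }
};

// Fills two bins and checks both accumulators.
inline void example() {
    weighted_bin weighted;
    weighted.fill(0.5);
    weighted.fill(2.0);
    weighted.fill(1.5);
    assert(weighted.value() == 4.0);     // 0.5 + 2.0 + 1.5
    assert(weighted.variance() == 6.5);  // 0.25 + 4.0 + 2.25

    weighted_bin unweighted;
    for (int i = 0; i < 4; ++i) unweighted.fill();
    // With unit weights the variance estimate equals the count.
    assert(unweighted.value() == 4.0);
    assert(unweighted.variance() == 4.0);
}
```

Note that the two bins above have the same content (4.0) but different
variance estimates (6.5 versus 4.0), which is the distinction between
the variance of the bin count and the variance of the weights.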