Subject: Re: [boost] [histogram] Variance
From: Hans Dembinski (hans.dembinski_at_[hidden])
Date: 2018-09-18 07:41:25


Dear Bjørn,

> On 17. Sep 2018, at 22:08, Bjorn Reese via Boost <boost_at_[hidden]> wrote:
>
> The variance of individual bins can be obtained when using the
> adaptive_storage (via h.at(i).variance().)
>
> I am trying to understand the overhead of this feature.
>
> If I interpret the code correctly, there is a space overhead because each counter has to keep track of both the count and the sum of squares.
> The computational overhead is that the sum of squares has to be
> calculated for each insertion. Is this correct?
>
> If so, is there any way to use the adaptive storage policy without
> variance?

There is a minor overhead in the return value. Whenever you query the adaptive_storage, it returns two doubles, one for the value and one for the variance. This is slightly wasteful if you don't care about the variance, since you would then need only one double. I don't know how smart compilers are in this case; the compiler may even remove the code that fills the second double when it is not used. In memory, the adaptive_storage uses only a single integer for each counter if you don't use weighted fills.
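To make the distinction between memory and return-value overhead concrete, here is a minimal sketch. It is not the actual adaptive_storage code: the names `value_and_variance` and `simple_count_storage` are hypothetical, and a fixed 64-bit integer stands in for whatever integer type the real storage uses. It assumes unweighted fills only, in which case the sum of squared weights equals the count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration; not the Boost.Histogram API.
struct value_and_variance {
    double value;
    double variance;
};

class simple_count_storage {
public:
    explicit simple_count_storage(std::size_t n) : counts_(n, 0) {}

    // Unweighted fill: only one integer per bin is stored and updated.
    // No sum of squares is computed here.
    void increase(std::size_t i) { ++counts_[i]; }

    // The query returns two doubles. With unit weights the sum of
    // squared weights equals the count, so the second double is
    // derived on the fly from the same integer.
    value_and_variance at(std::size_t i) const {
        const double n = static_cast<double>(counts_[i]);
        return {n, n};
    }

private:
    std::vector<std::uint64_t> counts_;  // one integer per bin
};

// Small usage example.
inline void simple_count_storage_example() {
    simple_count_storage s(3);
    s.increase(1);
    s.increase(1);
    const value_and_variance r = s.at(1);
    assert(r.value == 2.0);
    assert(r.variance == 2.0);
}
```

In this sketch the only cost of the variance is the second double in the returned struct, which a compiler may or may not optimize away when the caller ignores it.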

Returning two doubles when one would suffice is a minor overhead, but if this bothers people, I could add a compile-time option to the adaptive_storage class that turns all weight handling off.
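One way such a compile-time option could look is sketched below. This is purely illustrative and assumes C++17: the class `toggled_storage`, its `WithVariance` template parameter, and the struct `count_with_variance` are hypothetical names, not an existing or planned interface. As before, only unweighted fills are modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical types for illustration; not the Boost.Histogram API.
struct count_with_variance {
    double value;
    double variance;
};

template <bool WithVariance>
class toggled_storage {
public:
    // The return type is chosen at compile time: two doubles with
    // variance support, a single double without it.
    using result_type =
        std::conditional_t<WithVariance, count_with_variance, double>;

    explicit toggled_storage(std::size_t n) : counts_(n, 0) {}

    void increase(std::size_t i) { ++counts_[i]; }

    result_type at(std::size_t i) const {
        const double n = static_cast<double>(counts_[i]);
        if constexpr (WithVariance) {
            // Unit weights: sum of squared weights equals the count.
            return count_with_variance{n, n};
        } else {
            return n;
        }
    }

private:
    std::vector<std::uint64_t> counts_;  // one integer per bin
};

// Small usage example.
inline void toggled_storage_example() {
    toggled_storage<false> plain(2);
    plain.increase(0);
    assert(plain.at(0) == 1.0);

    toggled_storage<true> full(2);
    full.increase(1);
    assert(full.at(1).variance == 1.0);
}
```

The trade-off in this design is that the two instantiations are different types, so generic code that consumes the storage has to handle both return types.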

> Furthermore, why does variance() return the sum of squares? Should this
> not be divided by the sample size?

This was already answered by Steven (thanks!).

Kind regards,
Hans

