

Subject: [boost] [Autosave] Re: [math][accumulators] Empirical distribution function
From: Simon West (simon.west_at_[hidden])
Date: 2011-08-08 13:06:42


Hi,

On Sun, June 19, 2011 22:36, er wrote:
> Hope this can serve as a basis for a conversation:
>
>
> https://svn.boost.org/svn/boost/sandbox/acc_ecdf/
>

I'm assisting Eric with the maintenance of Accumulators. I've had a look
through the code at the above link, and would like to offer the following
comments (if I have misunderstood anything, please let me know).

My basic concern with the code is that a map is used to store the counts of
data-points that have been added (the map keys are the data-points, the
map values are the counts). In real-world floating-point data it is rare
for two data-points to be exactly equal, so in practice the map would
hold one key-value pair per data-point q_i, of the form
(key=q_i, value=1). This is inefficient, because every mapped value (i.e.
every count) will be 1, so the map is effectively a sorted copy of the data
plus per-node overhead. Also, the memory usage will grow linearly with the
number of data-points accumulated, which doesn't seem to be in keeping with
the spirit of the Accumulators library, where most statistics are updated
incrementally from a fixed amount of state.
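To make the concern concrete, here is a minimal C++ sketch of the
map-of-counts design, assuming std::map<double, std::size_t> as the
storage. The class name map_ecdf and its members are hypothetical and are
not taken from the sandbox code; it only illustrates the storage pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical sketch (not the sandbox implementation): an empirical
// distribution function that keeps one count per distinct data-point.
class map_ecdf {
public:
    // Record one data-point; only exactly equal values share an entry.
    void add(double x) {
        ++counts_[x];
        ++n_;
    }

    // F_n(x) = (number of data-points <= x) / n.
    // Sums counts over every key up to x, so the cost is linear in the
    // number of distinct data-points stored.
    double operator()(double x) const {
        if (n_ == 0) return 0.0;
        std::size_t below = 0;
        for (auto it = counts_.begin();
             it != counts_.end() && it->first <= x; ++it)
            below += it->second;
        return static_cast<double>(below) / static_cast<double>(n_);
    }

    // Number of map entries; equals count() when no two points coincide.
    std::size_t distinct() const { return counts_.size(); }

    // Total number of data-points added.
    std::size_t count() const { return n_; }

    // Stored count for a given key (0 if the key is absent).
    std::size_t count_of(double x) const {
        auto it = counts_.find(x);
        return it == counts_.end() ? 0 : it->second;
    }

private:
    std::map<double, std::size_t> counts_;  // key = data-point, value = count
    std::size_t n_ = 0;                     // total data-points seen
};
```

Feeding it n distinct values leaves distinct() == n with every stored count
equal to 1, so memory is O(n). By contrast, a running mean needs only a sum
and a count, regardless of how many data-points have been seen.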

For these reasons, I'm not convinced that the code should be added to the
library in its current state.

Simon.

