Boost logo

Boost :

Subject: Re: [boost] Interest in algorithm for arithmetic mean of integers
From: Matt Calabrese (rivorus_at_[hidden])
Date: 2015-09-04 14:20:26


On Fri, Sep 4, 2015 at 7:43 AM, Rajaditya Mukherjee <
rajaditya.mukherjee_at_[hidden]> wrote:

> ​Hi Matt,
>
>
> I am imagining such a library would be greatly useful for numerical
> scientists and statisticians for precise computing. Can you give us a toy
> example of how it works ?

I can't post the code without releasing it but I can briefly describe how
it works.

The general input-range version works by accumulating a mixed number that
represents the mean at any given iteration. The mixed number has a whole
number part that is of the value_type of the range, and the fractional part
is represented as a numerator that is of the iterator's difference_type and
a denominator that is of the iterator's difference type. The initial
"value" of this state is {0, 0, 0}, which is indeterminate. At any given
iteration, you are guaranteed that the fractional part's denominator is
exactly equal to the number of elements that have been iterated over, the
fractional part's numerator is in the range [0,denominator), and the whole
number is the whole number part of the mean. In other words, the fractional
part is never reduced. By having a state represented in this manner you are
able to communicate back to the user both the exact average as well as the
number of elements in the range (since the number of elements is equal to
the denominator's value). This guarantee is also useful as we advance the
state. The reason the result is 3 values representing a mixed number and
not 2 values simply representing a rational number is because a rational
number consisting of 2 values of either the difference type or the value
type cannot be guaranteed to be able to represent the exact mean without
possibility of overflow. With 3 values of appropriate type, you can
guarantee that the mean can be represented since the whole number part will
always be bounded by the smallest and largest value in the input range (and
remember the whole part's type is the value_type of the range). The
components of the fractional part use the difference_type of the range
since I always represent the fractional part as N/D where D is the number
of elements seen so far and N is in the range [0, D). In this manner, the
exact mean is guaranteed to able to be represented at any given iteration,
and the accumulation function can advance the state in a manner that will
never overflow.

You use it like any STL algorithm:

//////////
auto result = mean(std::begin(range), std::end(range));
//////////

At the moment I do not have an implementation that works piecemeal as is
akin to Boost.Accumulators, however since the implementation of the
algorithm really is simply a call to std::accumulate, it should be very
simple to expose such a form, and I imagine that it would be especially
useful. Some further details on how the state is manipulated with each
iteration:

At any given point, we have the whole number part, the numerator of the
fractional part, and the denominator of the fractional part (remember that
the denominator is guaranteed to always be the size of the range up to this
point). When advancing to the next state, I do an operation that has the
same effect as taking a single, rational version of the mixed number N/D,
changing this value to N/(D+1), and then adding in V/(D+1), where V is the
next value in the range (remember that here D is the current number of
elements that we've iterated over). This isn't directly what I do since I'm
dealing with a mixed number instead of a rational number, but it's the same
overall effect and is done in a way that cannot overflow.

The only thing that is really hairy at this point is that there is
mixed-mode arithmetic involved (the value_type and the difference_type need
to be operands to arithmetic functions). At the moment I handle the
mixed-mode arithmetic by explicitly casting, but ultimately it should
probably be a customization point of the algorithm, especially if this is
to be useful with user-defined, integer-like value types.

-- 
-Matt Calabrese

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk