
Subject: Re: [boost] [Math/Statistical Distributions] Rethinking of distribution template parameters.
From: John Maddock (john_at_[hidden])
Date: 2009-05-21 05:47:07


> This is a feature request for the next version of Math/Statistical
> Distributions lib.
>
> Currently, due to lack of input type information, discrete
> distributions can only be "emulated" by using the discrete_quantile
> policy.
> However, even when doing so, the effective quantile type is still a real type.
>
> In my opinion, this has at least two disadvantages:

I believe your disadvantages are more imagined than real.

> 1. Operations are slow since the underlying quantile type is still
> real. Instead, operations on really integral types are generally
> faster.

Unfortunately there is no way the quantile of a discrete distribution can be
calculated internally using only integer arithmetic (at least I can't think
of a case other than maybe the trivial Bernoulli distribution). Normally
the result of the quantile is calculated as a real number and then
rounded according to the policy in effect. In a few cases the result is
calculated directly as an integer by summing CDF values (hypergeometric,
for example), but the internal calculations still have to be done using
reals.
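
To illustrate the "summing CDF values" case, here is a minimal sketch for a
binomial distribution. The function name and signature are hypothetical and
this is not how Boost.Math implements it; it only shows that even when the
answer is an integer count, the probabilities being summed are reals.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, NOT part of Boost.Math: returns the smallest k in
// [0, n] with P(X <= k) >= p for X ~ Binomial(n, q), by summing the pmf.
double binomial_quantile_by_summation(unsigned n, double q, double p)
{
    assert(q > 0.0 && q < 1.0);
    assert(p >= 0.0 && p <= 1.0);

    double pmf = std::pow(1.0 - q, static_cast<double>(n)); // P(X = 0)
    double cdf = pmf;
    unsigned k = 0;
    while (cdf < p && k < n)
    {
        // Recurrence: P(X = k+1) = P(X = k) * (n - k) / (k + 1) * q / (1 - q)
        pmf *= static_cast<double>(n - k) / static_cast<double>(k + 1)
               * (q / (1.0 - q));
        cdf += pmf;
        ++k;
    }
    // The result is integer-valued, but every intermediate step used reals.
    return static_cast<double>(k);
}
```

For example, with n = 10, q = 0.5 and p = 0.5 the loop stops at k = 5, since
P(X <= 4) is about 0.377 and P(X <= 5) is about 0.623.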

There's also no overhead from returning a real type (since it's usually
returned in a register just like an integer type would be). There might be a
tiny overhead if the user then casts to an integer, but if we internalised
that cast by returning an integer type then everyone would pay that cost no
matter what the use case :-(

BTW there are a few genuine use cases for returning a real-valued result
from the quantile of a discrete distribution.

> 2. Quantile comparison might be inaccurate since we are comparing real
> types

Nope, not if you've requested an integer result (which is the default
policy), as integers are represented exactly in floating point types: unless
the integer is so large as to exceed the number of mantissa bits - but then
the result would likely overflow an integer type anyway. In fact this is an
important use case - the ability to return values larger than INT_MAX etc. as
a real-valued type.
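
A small sketch of that exactness claim, assuming IEEE 754 doubles (53
mantissa bits) and a 64-bit long long; the helper names are made up for
illustration:

```cpp
#include <cassert>
#include <climits>
#include <cmath>
#include <limits>

// Every integer with magnitude up to 2^digits is exactly representable in a
// double; digits is 53 for an IEEE 754 double (an assumption of this sketch).
double largest_contiguous_integer()
{
    return std::ldexp(1.0, std::numeric_limits<double>::digits);
}

// True if converting n to double and back loses nothing.
bool round_trips_exactly(long long n)
{
    return static_cast<long long>(static_cast<double>(n)) == n;
}
```

Under those assumptions round_trips_exactly(INT_MAX) holds, and
static_cast<double>(INT_MAX) + 1.0 is an exact value that no int can hold.
Exactness is only lost above 2^53: round_trips_exactly(9007199254740993LL)
is false.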

There is one genuine concern here, but it can't be solved by your interface:
that is if the result of the quantile function is calculated to be very very
close to an integer value, but due to the usual rounding errors in
calculation we can't be sure which side of the integer the true value lies.
Unfortunately there is simply no way around this - we have to use
real-valued types in the internal calculation, and all the stats packages
I'm aware of have the same potential issue.
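
The effect is easy to reproduce outside of any statistics code. The sketch
below is only an analogy, not a quantile calculation: it assumes plain IEEE
754 double evaluation, and the function name is hypothetical. The true value
of (0.1 + 0.2) * 10 is exactly 3, but the computed value lands just above 3,
so rounding down and rounding up disagree.

```cpp
#include <cmath>

// Hypothetical helper: computes (a + b) * scale in double precision.
// With a = 0.1, b = 0.2, scale = 10 the exact answer is 3, but under IEEE 754
// double arithmetic the computed result is slightly greater than 3.
double scaled_sum(double a, double b, double scale)
{
    return (a + b) * scale;
}
```

Here std::floor(scaled_sum(0.1, 0.2, 10.0)) gives 3 while
std::ceil(scaled_sum(0.1, 0.2, 10.0)) gives 4, even though the exact result
is an integer and both should agree.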

Cheers, John.

