Subject: Re: [boost] [math] & [random] distribution object compatibility
From: Thijs van den Berg (thijs_at_[hidden])
Date: 2014-04-16 16:03:02


On 16 Apr 2014, at 21:33, Steven Watanabe <watanabesj_at_[hidden]> wrote:

> AMDG
>
> On 04/14/2014 11:46 AM, Paul A. Bristow wrote:
>>
>> Indeed - but the requirements are quite different.
>>
>> Boost.Math aims to be accurate (and with extension to use Boost.Multiprecision
>> and <cstdfloat> , very, very accurate).
>>
>> Boost.Random must be very fast, but need not be accurate - indeed it may be
>> rather inaccurate?
>>
>
> I've considered ways to make Boost.Random more
> accurate for years. I think it's generally
> possible to achieve near perfect accuracy
> without a huge performance cost by iteratively
> tightening the squeeze steps. Since the expected
> number of iterations is only slightly greater
> than one, this won't materially affect performance.
> The primary increase in cost would come from generating
> the initial guess, as there's no way to get k bits
> of accuracy without getting at least k bits from
> the underlying PRNG.
>
> The biggest problem is that it's nearly impossible to
> test as it requires a ridiculous number of samples to
> detect the inaccuracy. If you have k bits of accuracy,
> you would need at least \Omega(2^k) and possibly as
> many as \Omega(2^{2k}) samples before the bias becomes
> apparent.
>
Exactly. The bottleneck in random simulation is sample noise. If you throw a coin 3 times there is no good way to test for its bias: you won't be able to distinguish a fair 50-50 coin from a bad 40-60 coin. The outcome will mainly be driven by chance, and the bias is only a second-order effect.
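A rough back-of-the-envelope calculation: the standard error of the observed head frequency is sqrt(p(1-p)/n) ~ 0.5/sqrt(n), so to separate p = 0.5 from p = 0.6 at two standard errors you need 2 * 0.5/sqrt(n) <= 0.1, i.e. n >= 100 flips. With 3 flips the sampling noise (about 0.29) dwarfs the 0.1 bias you are trying to detect.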

Even the simple uniform [0,1) distribution has lots of weird finite-precision resolution issues. Samples close to 1 have a minimal spacing between them (around 2^-53) that is much larger than the spacing available close to 0 (down near std::numeric_limits<double>::min(), which is roughly 10^-308). So if you compute some Monte Carlo measure that depends on the distribution of distances between samples, you'll have lots of trouble coming purely from the floating-point representation.
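A quick sketch that shows this spacing (just an illustration with std::nextafter, nothing Boost-specific):

  #include <cmath>
  #include <cstdio>
  #include <limits>

  int main()
  {
      // Gap between 1.0 and the next representable double below it:
      // 2^-53, about 1.1e-16.
      double below_one = std::nextafter(1.0, 0.0);
      std::printf("spacing just below 1.0 : %g\n", 1.0 - below_one);

      // Gap between 0.0 and the next representable double above it: the
      // smallest subnormal, about 4.9e-324; the smallest *normal* double
      // (numeric_limits<double>::min()) is about 2.2e-308.
      double above_zero = std::nextafter(0.0, 1.0);
      std::printf("spacing just above 0.0 : %g\n", above_zero);
      std::printf("smallest normal double : %g\n", std::numeric_limits<double>::min());
  }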

A friend and I have been thinking about quality measures that give you a grip on biases caused by (a rough numeric sketch of the first two points follows the list):
* finite resolution in the random engine integers
* finite resolution in the distribution function and the float representation of the variate
* uncertainty due to finite number of samples
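The promised sketch of the first two points (my own toy numbers, assuming a 32-bit engine mapped straight onto [0,1)):

  #include <cstdio>

  int main()
  {
      // A 32-bit engine mapped directly onto [0,1) produces at most 2^32
      // distinct values, spaced 2^-32 apart -- far coarser than the 2^-53
      // spacing a double supports just below 1.0.  Everything between two
      // grid points has probability exactly zero.
      const double engine_grid = 1.0 / 4294967296.0;        // 2^-32
      const double double_grid = 1.0 / 9007199254740992.0;  // 2^-53
      std::printf("engine grid spacing : %g\n", engine_grid);
      std::printf("double grid spacing : %g\n", double_grid);
      std::printf("ratio               : %g\n", engine_grid / double_grid);  // 2^21
  }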

Another important issue in finance is that we use low-discrepancy sequences to speed up the convergence of Monte Carlo simulation, but those techniques don't work well with rejection sampling methods. Rejection sampling also causes synchronisation issues on GPUs, e.g. a single rejection in one core forces all the other cores to wait.
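To make that concrete, here is a toy accept/reject sampler (my own sketch, not Boost.Random internals). It consumes a variable number of uniforms per variate, which is exactly what breaks the fixed index <-> dimension mapping that low-discrepancy sequences rely on, and what desynchronises lock-step GPU threads:

  #include <cmath>
  #include <cstdio>
  #include <random>

  // Accept/reject sampler for |N(0,1)| with an Exp(1) envelope:
  // propose x ~ Exp(1), accept with probability exp(-(x-1)^2/2).
  template <class Engine>
  double half_normal_by_rejection(Engine& eng, long& uniforms_used)
  {
      std::uniform_real_distribution<double> u01(0.0, 1.0);
      for (;;)
      {
          double x = -std::log(u01(eng));   // exponential proposal
          double u = u01(eng);              // accept/reject uniform
          uniforms_used += 2;
          if (u <= std::exp(-0.5 * (x - 1.0) * (x - 1.0)))
              return x;                     // accepted
          // rejected: loop again, consuming two more uniforms
      }
  }

  int main()
  {
      std::mt19937 eng(42);
      long used = 0;
      const int n = 100000;
      for (int i = 0; i < n; ++i)
          half_normal_by_rejection(eng, used);
      // The acceptance rate is sqrt(pi/(2e)) ~ 0.76, so on average about 2.6
      // uniforms are consumed per variate, and the count varies from variate
      // to variate.
      std::printf("uniforms per variate: %g\n", double(used) / n);
  }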

 
>> So I doubt if changing either is a good idea.
>>
>> (And anyway they are by different authors developed at different times - so NIH
>> probably applies).
>>
>
> In Christ,
> Steven Watanabe
>
>
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

