
Subject: Re: [boost] [random] Quantization effects in generating floating point values
From: Thijs van den Berg (thijs_at_[hidden])
Date: 2015-03-05 09:18:17
On Thu, Mar 5, 2015 at 2:39 PM, John Maddock <jz.maddock_at_[hidden]>
wrote:
> First off, I notice there are no examples for generating floating point
> values in Boost.Random, so maybe what follows is based on a
> misunderstanding, or maybe not...
>
> Let's say I generate values in [0,1) like so:
>
> boost::random::mt19937 engine;
> boost::random::uniform_01<boost::random::mt19937, FPT> d(engine);
>
> FPT x = d(); // etc.
>
> Where FPT is some floating point type.
>
> Now my concern is that we're taking a 32-bit random integer and
> "stretching" it to a floating point type with rather more bits (53 for a
> double, maybe 113 for a long double, even more in the multiprecision
> world). So quantization effects will mean that there are many values which
> can never be generated.
>
> It's true that I could use independent_bits_engine to gang together
> multiple random values and then pass that to uniform_01, however that
> supposes we have an unsigned integer type available with enough bits.
> cpp_int from Boost.Multiprecision would do it, and this does work, but the
> conversions involved aren't particularly cheap. It occurs to me that an
> equivalent to independent_bits_engine but for floating point types could be
> much more efficient, especially in the binary floating point case.
>
> So I guess my questions are:
>
> Am I worrying unnecessarily? and
> What is best practice in this area anyway?
>
> Thanks, John.
>
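For reference, the ganging approach described in the quoted message can be sketched as follows. This is only a sketch against the standard <random> header so that it is self-contained; Boost.Random provides an independent_bits_engine in boost::random with the same template parameters. The names ganged_engine and draw64 are made up for illustration and are not part of either library.

```cpp
#include <cstdint>
#include <random>

// Gang together outputs of the 32-bit mt19937 into 64-bit words.
using ganged_engine =
    std::independent_bits_engine<std::mt19937, 64, std::uint64_t>;

// Hypothetical helper: one 64-bit draw, built from two 32-bit outputs
// of the underlying mt19937.
inline std::uint64_t draw64(ganged_engine& eng) {
    return eng();
}
```

A 64-bit word is enough to fill the 53 bits of a double; for wider types the same adaptor needs a correspondingly wide unsigned integer type, which is where the conversion cost mentioned above comes in.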
I've worried about this in the past, but I've accepted that using a 64-bit
integer engine instead of a 32-bit one is good enough. A 64-bit engine
comfortably saturates the conversion to a 64-bit float, and having 2^64
distinct output values is in practice enough resolution when computing
statistics on a large number of random draws (1 trillion draws << 2^64).
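A minimal sketch of what "saturating" the conversion could look like, assuming double has a 53-bit significand (IEEE 754 binary64) and using std::mt19937_64 from the standard library (Boost.Random has an engine of the same name). The helper names to_unit_double and draw_unit_double are hypothetical, and this is one common conversion, not necessarily what uniform_01 does internally.

```cpp
#include <cmath>
#include <cstdint>
#include <random>

// Map a 64-bit word to a double in [0, 1): keep the top 53 bits
// (assumed width of a double's significand) and scale by 2^-53.
inline double to_unit_double(std::uint64_t word) {
    return std::ldexp(static_cast<double>(word >> 11), -53);
}

// Hypothetical helper: one draw from a 64-bit engine.
inline double draw_unit_double(std::mt19937_64& eng) {
    return to_unit_double(eng());
}
```

Every multiple of 2^-53 in [0, 1) is reachable this way, whereas a single 32-bit draw can only reach multiples of 2^-32.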
When using floating point random numbers there are two main error sources:
* the finite resolution of the engine, e.g. 32 bits in your example. This
determines the number of different random values you can generate.
* the non-linearity of the float representation. This determines the
number of individual values available in a small interval. E.g. there are
many more float values close to zero than close to 1 when you convert the
mt19937 integers to floats in the interval [0,1).
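To put rough numbers on the two effects, here is a small sketch that compares the step of a 32-bit grid on [0, 1) with the local spacing of doubles. It assumes IEEE 754 binary64 doubles; the function names grid_step_32 and spacing_above are made up for illustration.

```cpp
#include <cmath>
#include <limits>

// Step of the grid produced by scaling a 32-bit integer into [0, 1).
inline double grid_step_32() {
    return std::ldexp(1.0, -32);
}

// Distance from x to the next representable double above it.
inline double spacing_above(double x) {
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}
```

Under that assumption spacing_above(0.5) is 2^-53, so about 2^21 doubles sit between two neighbouring 32-bit grid points there, while spacing_above(grid_step_32()) is 2^-84, so the gap between the two smallest nonzero grid points contains about 2^52 doubles.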
Most statistical computations involve floating point arithmetic, so you'll
have issues of the second kind anyway. In that respect I would find it
theoretically interesting (though I don't actually need it) to have
random floating point numbers with a fixed exponent. That would remove the
non-linearity of the float representation.
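One way such a fixed-exponent generator could look, purely as a sketch: fix the exponent field of an IEEE 754 binary64 double and fill only the fraction bits from a random word. The function name fixed_exponent_double is hypothetical, and the bit layout is an assumption that the static_assert only partly checks.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "sketch assumes IEEE 754 binary64 doubles");

// Build a double in [1, 2): exponent field fixed at 0x3FF (i.e. 2^0),
// the 52 fraction bits taken from the top of a random 64-bit word.
inline double fixed_exponent_double(std::uint64_t word) {
    const std::uint64_t bits = (std::uint64_t{0x3FF} << 52) | (word >> 12);
    double result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```

All results share one exponent, so they lie on an evenly spaced grid of step 2^-52 in [1, 2). Subtracting 1.0 would shift that grid to [0, 1), although the shifted values no longer all share an exponent.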