Subject: Re: [boost] [random] Quantization effects in generating floating point values
From: Thijs van den Berg (thijs_at_[hidden])
Date: 2015-03-05 09:18:17
On Thu, Mar 5, 2015 at 2:39 PM, John Maddock <jz.maddock_at_[hidden]> wrote:
> First off, I notice there are no examples for generating floating point
> values in Boost.Random, so maybe what follows is based on a
> misunderstanding, or maybe not...
> Let's say I generate values in [0,1) like so:
> #include <boost/random/mersenne_twister.hpp>
> #include <boost/random/uniform_01.hpp>
> boost::random::mt19937 engine;
> boost::random::uniform_01<boost::random::mt19937, FPT> d(engine);
> FPT x = d(); // etc
> Where FPT is some floating point type.
> Now my concern is that we're taking a 32-bit random integer and
> "stretching" it to a floating point type with rather more bits (53 for a
> double, maybe 113 for a long double, even more in the multi-precision
> world). So quantization effects will mean that there are many values which
> can never be generated.
> It's true that I could use independent_bits_engine to gang together
> multiple random values and then pass that to uniform_01, however that
> supposes we have an unsigned integer type available with enough bits.
> cpp_int from boost.multiprecision would do it, and this does work, but the
> conversions involved aren't particularly cheap. It occurs to me that an
> equivalent to independent_bits_engine, but for floating point types, could be
> much more efficient - especially in the binary floating point case.
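A minimal sketch of what such a floating point analogue of independent_bits_engine might look like, assuming a binary floating point type and an engine that delivers full 32-bit words (as mt19937 does). The name uniform01_full_precision is hypothetical and is not part of Boost.Random or the standard library. It takes just enough 32-bit draws to fill std::numeric_limits<T>::digits bits and scales each chunk with std::ldexp, so no wide integer type or multiprecision conversion is involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <random>

// Hypothetical helper: fill every significand bit of a binary floating point
// type T with independent 32-bit draws and return a value in [0, 1).
// The result is a multiple of 2^-digits below 1, so every partial sum is
// exactly representable and no rounding (hence no accidental 1.0) occurs.
template <class T, class Engine>
T uniform01_full_precision(Engine& engine) {
    static_assert(std::numeric_limits<T>::radix == 2, "binary types only");
    // Assumption: the engine yields every value in [0, 2^32 - 1], like mt19937.
    static_assert(Engine::min() == 0 && Engine::max() == 0xFFFFFFFFu,
                  "engine must produce full 32-bit words");
    const int digits = std::numeric_limits<T>::digits;
    T x = 0;
    int used = 0;  // number of result bits filled so far
    while (used < digits) {
        int take = std::min(digits - used, 32);
        std::uint32_t u = static_cast<std::uint32_t>(engine());
        std::uint32_t chunk = u >> (32 - take);         // keep the top 'take' bits
        used += take;
        x += std::ldexp(static_cast<T>(chunk), -used);  // chunk * 2^-used
    }
    assert(x >= 0 && x < 1);  // postcondition guaranteed by exact arithmetic
    return x;
}
```

Keeping exactly `digits` bits, and truncating the last chunk, is a deliberate choice: summing more bits than the significand holds would round, and could round up to exactly 1.0. The output is uniform on a grid with step 2^-digits, so it does not use the extra resolution that floats have near zero.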
> So I guess my questions are:
> Am I worrying unnecessarily? and
> What is best practice in this area anyway?
> Thanks, John.
I've worried about this in the past, but I've accepted that using a 64-bit
integer engine instead of a 32-bit one is good enough. A 64-bit engine supplies
more random bits than a double's 53-bit significand can hold, and a 2^-64
probability resolution is practically enough when computing statistics on a
large number of random draws (1 trillion draws is only about 2^40, far fewer
than 2^64).
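For the double case, a sketch of this approach with the standard std::mt19937_64 engine (used here in place of a Boost engine): only the top 53 bits of each 64-bit draw fit in a double's significand, so the low 11 bits are discarded and the rest is scaled by 2^-53. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Convert one 64-bit draw to a double in [0, 1). A double's significand holds
// 53 bits, so the top 53 bits are kept and the low 11 bits are dropped.
// Scaling by 2^-53 is exact, so the result can never round up to 1.0.
double u01_from_64_bits(std::uint64_t u) {
    double x = std::ldexp(static_cast<double>(u >> 11), -53);
    assert(x >= 0.0 && x < 1.0);
    return x;
}

// Draw one value in [0, 1) from a 64-bit Mersenne Twister.
double draw_u01(std::mt19937_64& engine) {
    return u01_from_64_bits(static_cast<std::uint64_t>(engine()));
}
```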
When using floating point random numbers there are two main sources of error:
1) the finite resolution of the random engine (e.g. 32 bits in your
example). This determines the number of distinct random values you can
generate.
2) the non-linearity of the floating point representation. This determines
the number of distinct values you can generate in a small interval. E.g.
there are many more float values close to zero than close to 1 when you
convert the mt19937 integers to floats in the unit interval U(0,1).
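The second effect can be checked numerically. This sketch (standard C++ only, hypothetical helper names) maps 1024 consecutive 32-bit integers to float by scaling with 2^-32, once starting at 0 and once ending at 2^32 - 1, and counts the distinct results.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <set>

// Convert a 32-bit integer to the unit interval as float: the scaling by
// 2^-32 is exact in double, and the final cast to float rounds to 24 bits.
float u01_float(std::uint32_t u) {
    return static_cast<float>(std::ldexp(static_cast<double>(u), -32));
}

// Count how many distinct floats 'count' consecutive integers starting at
// 'first' map to.
std::size_t distinct_floats(std::uint32_t first, std::uint32_t count) {
    // Precondition: the range must not wrap around past 2^32 - 1.
    assert(std::uint64_t(first) + count <= (std::uint64_t(1) << 32));
    std::set<float> seen;
    for (std::uint32_t i = 0; i < count; ++i)
        seen.insert(u01_float(first + i));
    return seen.size();
}
```

Near zero all 1024 inputs stay distinct, while near 1 they collapse onto at most 5 floats, because the float spacing just below 1 is 2^-24, i.e. 256 integer steps per representable value. The largest inputs even round up to exactly 1.0f.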
Since most statistical computations are carried out in floating point,
you'll have type 2) issues anyway. In that respect I would find it
theoretically interesting (but I don't actually need it) to have
random floating point numbers with a fixed exponent. That would remove the
non-linearity of the float representation.
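One common way to get such fixed-exponent values, sketched here assuming IEEE-754 binary64 doubles whose byte order matches that of 64-bit integers (true on mainstream platforms): set the exponent field to that of 1.0 and fill the 52 mantissa bits with random bits. This gives a value in [1,2) on an evenly spaced grid with step 2^-52, and subtracting 1.0 (exact here) moves it to [0,1) with the same uniform spacing. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <random>

static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "sketch assumes IEEE-754 binary64 doubles");

// Build a double in [1, 2): sign 0, biased exponent 0x3FF (the exponent of 1.0),
// and the top 52 bits of 'random_bits' as the mantissa. For uniform input bits,
// every value on the grid 1, 1 + 2^-52, ..., 2 - 2^-52 is equally likely.
double fixed_exponent_1_2(std::uint64_t random_bits) {
    std::uint64_t bits = (std::uint64_t(0x3FF) << 52) | (random_bits >> 12);
    double x;
    std::memcpy(&x, &bits, sizeof x);  // well-defined reinterpretation of the bits
    return x;
}

// Shift to [0, 1). The subtraction is exact, so the spacing stays a uniform
// 2^-52 everywhere instead of getting finer near zero.
double fixed_exponent_u01(std::mt19937_64& engine) {
    double x = fixed_exponent_1_2(static_cast<std::uint64_t>(engine())) - 1.0;
    assert(x >= 0.0 && x < 1.0);
    return x;
}
```

The trade-off is that only 52 random bits are used and values below 2^-52 (other than 0) are never produced, which is exactly the "linear" behaviour described above.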