
Boost : 
From: Damien Fisher (damienf_at_[hidden])
Date: 20060808 02:01:38
FWIW, I've tried using boost::random *many* times in serious projects,
and each time when it gets to testing I have to rewrite all my random
code to move away from it, due to what is generally very bizarre (and
nonrandom) behavior. The bugs you've listed are pretty indicative of
my experience, although I haven't been quite as professional (that is,
I never bothered submitting any reports ;) ).
It would be nice if boost::random were usable, but is it even actively
maintained anymore? I've gone back to it several times but always run
into a problem almost instantly.
On 8/8/06, Charles Karney <ckarney_at_[hidden]> wrote:
> >>I haven't delved into the code to see where all these problems arise.
> >
> > interestingly, just two days ago I received similar report regarding
> > "bad" distribution of uniform_int . Here is test program (comparing
> > number of random results in top and bottom 10% of range):
>
> This bug has a different cause from the ones I reported. Here I
> identify the underlying causes for the bugs:
>
> > (1) uniform_int<char>(low, high) produces values outside [low, high].
> > E.g.,
> >
> > variate_generator<random_generator&, uniform_int<char> >
> > unichar(r, uniform_int<char>(100, 100));
> >
> > (2) uniform_int<short>(low, high) produces values outside [low, high].
> > E.g.,
> >
> > variate_generator<random_generator&, uniform_int<short int> >
> > unishort(r, uniform_int<short int>(20000, 20000));
>
> These are both caused by overflow in the statement
>
> _range = _max  _min;
>
> in init() in random/uniform_int.hpp. For signed types this can
> overflow, so that _range is negative even if _max >= _min. The wreaks
> havoc with the rest of the code.
>
> > (3) uniform_int<long long>(low, high) produces a distribution with the
> > wrong mean. E.g.,
> >
> > variate_generator<random_generator&, uniform_int<long long> >
> > unill(r, uniform_int<long long>(0LL, 0xffffffffffLL));
> >
> > has a mean of about 2.0e9 instead of 5.5e11.
> >
> > (4) uniform_int<long long>(low, high) sometimes takes a LONG time.
> > E.g.,
> >
> > variate_generator<random_generator&, uniform_int<long long> >
> > unillint(r, uniform_int<long long>(0LL, 0x10000000000LL));
> > unillint();
> >
> > takes about 40 seconds on a 800 MHz Intel PC.
>
> These are both caused by problems in do_compare<false, true> in
> random/detail/signed_unsigned_compare.hpp. In particular the statement
>
> static bool lessthan(T1 x, T2 y)
> { return y >= 0 && x < static_cast<T1>(y); }
>
> returns nonsensical results with T1 = unsigned and T2 = long long. A
> possible bandaid would be
>
> static bool lessthan(T1 x, T2 y) {
> return y >= 0 &&
> (std::numeric_limits<T1>::max() >= std::numeric_limits<T2>::max() ?
> x < static_cast<T1>(y) :
> static_cast<T2>(x) < y);
> }
>
> However I fear that there are other places where overflow can occur in
> uniform_int.
>
> The following wasn't reported in my earlier Email, but illustrates the
> bug forwarded by Bronek Kozicki:
>
> (3a) uniform_int<int>(low, high) produces a distribution with the
> wrong mean. E.g.,
>
> variate_generator<random_generator&, uniform_int<int> >
> uniint(r, uniform_int<int>(0, 8000000001));
>
> has a mean of about 3.7e8 instead of 4.0e8. This is caused by
> uniform_int invoking uniform_smallint which typically yields approximate
> results. The threshold for this
>
> if(brange / _range > 4) {
> return boost::uniform_smallint<result_type>(_min, _max)(eng);
> }
>
> is absurdly generous. In fact, I would argue that uniform_int should
> NEVER call uniform_smallint but should ALWAYS return accurate results
> (assuming that the underlying generator is perfect).
>
> Unfortunately if the test (brange / _range > 4) is changed to (false), a
> very inefficient rejection method takes over (see the 40 sec computation
> of a uniform_int<long long>(low, high) in (4) above). For an efficient
> rejection method that returns uniform integers in an arbitrary range,
> see the definition of IntegerC<T>(T m, T n) in
>
> http://charles.karney.info/random/
>
> > (5) uniform_01<random_generator, float> uni01(r) sometimes returns a
> > result outside [0,1). Specifically it can return 1.0f because
> > (2^321)/2^32 rounds to 1.
>
> uniform_01 returns
>
> result_type(_rng()  _rng.min()) /
> (result_type(_rng.max()  _rng.min()) + 1)
>
> This has two problems. If result_type is float, the result can overflow
> to 1 (for mt19937). If result_type is double or long double, the result
> only contains 32 bits of randomness (instead of 53 bits or 64 bits).
>
> Fixing these problems appears to be a very messy undertaking given the
> present structure of the random number library. Thus dealing with a
> random generator returning long ints in [2e9, 2e9] is going to be a
> pain since 4e9 isn't representable as a long int.
>
> One possible solution is to require more of the generators. Thus
> integer generators might be required to return unsigned results in
> [0,max]. Similarly real generators might be required to return results
> representable as eps*[0,max] for some (queriable) eps.
>
> I might even suggest going further and requiring that max = 2^b  1, so
> that a whole number of bits of randomness are returned. This greatly
> simplifies the computation of the basic integer and real distributions.
>
> In particular, my random number library (based on mt129937, see web site
> given above) provides implementations of uniform real distributions
> which as equivalent to sampling uniformly in [0,1] and EXACTLY rounding
> the results down either to a multiple of the machine precision, e.g.,
> Fixed<T>(), or to the nearest representable floating point number, e.g.,
> Float<T>().
>
> Perhaps this might slow some generators down (but notably not mt19937).
> But that is far preferable to the current situation where incorrect
> results are being obtained.
>
> 
> Charles Karney <ckarney_at_[hidden]>
> Sarnoff Corporation, Princeton, NJ 085435300
>
> URL: http://charles.karney.info
> Tel: +1 609 734 2312
> Fax: +1 609 734 2662
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
>
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk