Boost :

Date view	Thread view	Subject view	Author view

From: Damien Fisher (damienf_at_[hidden])
Date: 2006-08-08 02:01:38

Next message: Simon Atanasyan: "Re: [boost] Sun-5.8 failures"
Previous message: Oliver.Kowalke_at_[hidden]: "[boost] [defer/thread_pool] - added again to boost vault"
In reply to: Charles Karney: "Re: [boost] Bugs in boost random number distributions"
Next in thread: Charles Karney: "Re: [boost] Bugs in boost random number distributions"
Reply: Charles Karney: "Re: [boost] Bugs in boost random number distributions"

FWIW, I've tried using boost::random *many* times in serious projects,
and each time when it gets to testing I have to rewrite all my random
code to move away from it, due to what is generally very bizarre (and
non-random) behavior. The bugs you've listed are pretty indicative of
my experience, although I haven't been quite as professional (that is,
I never bothered submitting any reports ;) ).

It would be nice if boost::random were usable, but is it even actively
maintained anymore? I've gone back to it several times but always run
into a problem almost instantly.

On 8/8/06, Charles Karney <ckarney_at_[hidden]> wrote:
> >>I haven't delved into the code to see where all these problems arise.
> >
> > interestingly, just two days ago I received similar report regarding
> > "bad" distribution of uniform_int . Here is test program (comparing
> > number of random results in top and bottom 10% of range):
>
> This bug has a different cause from the ones I reported. Here I
> identify the underlying causes for the bugs:
>
> > (1) uniform_int<char>(low, high) produces values outside [low, high].
> > E.g.,
> >
> > variate_generator<random_generator&, uniform_int<char> >
> > unichar(r, uniform_int<char>(-100, 100));
> >
> > (2) uniform_int<short>(low, high) produces values outside [low, high].
> > E.g.,
> >
> > variate_generator<random_generator&, uniform_int<short int> >
> > unishort(r, uniform_int<short int>(-20000, 20000));
>
> These are both caused by overflow in the statement
>
> _range = _max - _min;
>
> in init() in random/uniform_int.hpp. For signed types this can
> overflow, so that _range is negative even if _max >= _min. The wreaks
> havoc with the rest of the code.
>
> > (3) uniform_int<long long>(low, high) produces a distribution with the
> > wrong mean. E.g.,
> >
> > variate_generator<random_generator&, uniform_int<long long> >
> > unill(r, uniform_int<long long>(0LL, 0xffffffffffLL));
> >
> > has a mean of about 2.0e9 instead of 5.5e11.
> >
> > (4) uniform_int<long long>(low, high) sometimes takes a LONG time.
> > E.g.,
> >
> > variate_generator<random_generator&, uniform_int<long long> >
> > unillint(r, uniform_int<long long>(0LL, 0x10000000000LL));
> > unillint();
> >
> > takes about 40 seconds on a 800 MHz Intel PC.
>
> These are both caused by problems in do_compare<false, true> in
> random/detail/signed_unsigned_compare.hpp. In particular the statement
>
> static bool lessthan(T1 x, T2 y)
> { return y >= 0 && x < static_cast<T1>(y); }
>
> returns nonsensical results with T1 = unsigned and T2 = long long. A
> possible band-aid would be
>
> static bool lessthan(T1 x, T2 y) {
> return y >= 0 &&
> (std::numeric_limits<T1>::max() >= std::numeric_limits<T2>::max() ?
> x < static_cast<T1>(y) :
> static_cast<T2>(x) < y);
> }
>
> However I fear that there are other places where overflow can occur in
> uniform_int.
>
> The following wasn't reported in my earlier E-mail, but illustrates the
> bug forwarded by Bronek Kozicki:
>
> (3a) uniform_int<int>(low, high) produces a distribution with the
> wrong mean. E.g.,
>
> variate_generator<random_generator&, uniform_int<int> >
> uniint(r, uniform_int<int>(0, 800000000-1));
>
> has a mean of about 3.7e8 instead of 4.0e8. This is caused by
> uniform_int invoking uniform_smallint which typically yields approximate
> results. The threshold for this
>
> if(brange / _range > 4) {
> return boost::uniform_smallint<result_type>(_min, _max)(eng);
> }
>
> is absurdly generous. In fact, I would argue that uniform_int should
> NEVER call uniform_smallint but should ALWAYS return accurate results
> (assuming that the underlying generator is perfect).
>
> Unfortunately if the test (brange / _range > 4) is changed to (false), a
> very inefficient rejection method takes over (see the 40 sec computation
> of a uniform_int<long long>(low, high) in (4) above). For an efficient
> rejection method that returns uniform integers in an arbitrary range,
> see the definition of IntegerC<T>(T m, T n) in
>
> http://charles.karney.info/random/
>
> > (5) uniform_01<random_generator, float> uni01(r) sometimes returns a
> > result outside [0,1). Specifically it can return 1.0f because
> > (2^32-1)/2^32 rounds to 1.
>
> uniform_01 returns
>
> result_type(_rng() - _rng.min()) /
> (result_type(_rng.max() - _rng.min()) + 1)
>
> This has two problems. If result_type is float, the result can overflow
> to 1 (for mt19937). If result_type is double or long double, the result
> only contains 32 bits of randomness (instead of 53 bits or 64 bits).
>
> Fixing these problems appears to be a very messy undertaking given the
> present structure of the random number library. Thus dealing with a
> random generator returning long ints in [-2e9, 2e9] is going to be a
> pain since 4e9 isn't representable as a long int.
>
> One possible solution is to require more of the generators. Thus
> integer generators might be required to return unsigned results in
> [0,max]. Similarly real generators might be required to return results
> representable as eps*[0,max] for some (queriable) eps.
>
> I might even suggest going further and requiring that max = 2^b - 1, so
> that a whole number of bits of randomness are returned. This greatly
> simplifies the computation of the basic integer and real distributions.
>
> In particular, my random number library (based on mt129937, see web site
> given above) provides implementations of uniform real distributions
> which as equivalent to sampling uniformly in [0,1] and EXACTLY rounding
> the results down either to a multiple of the machine precision, e.g.,
> Fixed<T>(), or to the nearest representable floating point number, e.g.,
> Float<T>().
>
> Perhaps this might slow some generators down (but notably not mt19937).
> But that is far preferable to the current situation where incorrect
> results are being obtained.
>
> --
> Charles Karney <ckarney_at_[hidden]>
> Sarnoff Corporation, Princeton, NJ 08543-5300
>
> URL: http://charles.karney.info
> Tel: +1 609 734 2312
> Fax: +1 609 734 2662
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
>

Next message: Simon Atanasyan: "Re: [boost] Sun-5.8 failures"
Previous message: Oliver.Kowalke_at_[hidden]: "[boost] [defer/thread_pool] - added again to boost vault"
In reply to: Charles Karney: "Re: [boost] Bugs in boost random number distributions"
Next in thread: Charles Karney: "Re: [boost] Bugs in boost random number distributions"
Reply: Charles Karney: "Re: [boost] Bugs in boost random number distributions"

Date view	Thread view	Subject view	Author view

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk