Boost logo

Boost :

From: Charles Karney (ckarney_at_[hidden])
Date: 2006-08-08 11:32:09


Damien Fisher wrote:
> FWIW, I've tried using boost::random *many* times in serious projects,
> and each time when it gets to testing I have to rewrite all my random
> code to move away from it, due to what is generally very bizarre (and
> non-random) behavior. The bugs you've listed are pretty indicative of
> my experience, although I haven't been quite as professional (that is,
> I never bothered submitting any reports ;) ).
>
> It would be nice if boost::random were usable, but is it even actively
> maintained anymore? I've gone back to it several times but always run
> into a problem almost instantly.

I don't know what the current maintenance situation is with random.
It's surprising that the bugs I reported have persisted so long (since
2000?).

Here's another misfeature: mt19937 is perhaps the most robust generator
supplied with boost::random. Unfortunately the seeding methods provided
by boost::random are pretty much guaranteed to result in correlated
sequences; these were abandoned by the MT authors long ago (Jan 2002).

I'm torn in knowing what to do. I'd love it if C++ had a random number
library that was well tailored for scientific applications. At a
MINIMUM, this would mean

* One (or at most a few) generators with no known defects. This would
  mean passing the birthday spacings test, all bits in the results
  equally random, etc. mt19937 is a candidate here.

* Seeding options that reliably open up "seed space" so that many (~
  2^30) uncorrelated random number sequences can be generated by
  manipulating the seed in a systematic way. (This is necessary when
  programming parallel applications, where, conceivably, each iteration
  in a do loop needs an independent random number sequence.) The
  2002/01 seeding method for mt19937 provides this. In my random number
  library, I've modified this method slightly to enable all of state
  space to be accessible. In addition, in order to provide a consistent
  interface, I specified the seed to be a vector of unsigned longs
  (possibly of zero length).

* EXACT implementations of

    uniform_int<T>(a,b)
    uniform_01<T>()
    bernoulli_distribution<T>(p)

  Exact here presumes that underlying generator is perfect. Thus a
  Bernoulli distribution with T = float and p = 1/pow(2.0f, 100) should
  yield true 1/2^100 of the time on average. I suggest at two variants
  for uniform_01<T>() (presumably with different names) defined as
  follows: select a real u uniformly in [0,1] and

  * round it down to the nearest multiple of
    1/2^numeric_limits<T>::digits: uniform_fixed01<T>(),

  or

  * round it down to the nearest representable T: uniform_float01<T>().
    Thus for T = float, 0.75f would be returned with probability 1/2^24,
    0.375f would be returned with probability 1/2^25, etc.

* Other numerical distributions (e.g., normal) would compute their
  results using these three basic distributions rather than directly
  accessing the raw data from the underlying generator. Thus:

               random generator produces raw random data
                           (assumed perfect)

                                   |
                                   v

                     exact uniform distributions of
             integers and reals and Bernoulli distribution

                                   |
                                   v

                  high quality numerical distributions
                       (normal, exponential, etc)

The "exactness" requirement can be met which reasonable efficiency
(faster that boost's implementation typically). For a sample
implementation (built on top of mt19937)

    http://charles.karney.info/random

(This also provides several additional exact uniform distributions,
e.g., rounding the random u in different directions, supplying results
in [-1/2,1/2] with guaranteed symmetry, varying the precision of the
real results, etc.)

I wish there were some way of "fixing" the boost implementation before
it becomes a C++ standard. Unfortunately some of the problems stem not
from the implementation but from the overall architecture:

* too loose requirements on the random generators;

* restrictive seeding facilities;

* a clumsy (and inexact) interface between the random generators and the
  random distributions.

Are these architectural issues up for discussion? I rather suspect that
we've missed the boat on this.

-- 
Charles Karney <ckarney_at_[hidden]>
Sarnoff Corporation, Princeton, NJ 08543-5300
URL: http://charles.karney.info
Tel: +1 609 734 2312
Fax: +1 609 734 2662

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk