Boost logo

Boost :

Subject: Re: [boost] [random] new threefry random engine
From: Steven Watanabe (watanabesj_at_[hidden])
Date: 2014-04-22 12:25:03


On 04/22/2014 08:51 AM, John Salmon wrote:
> <snip>
> With sequential
> generators, a sufficiently motivated and careful user can use the
> discard() function to initialize multiple generators in parallel.
> For most generators, discard(N) is O(N), defeating its use for this
> purpose, but for a few (including Threefry), discard(N) is O(1) making
> this a practical, but hardly elegant or natural approach to
> parallelism.

FWIW, discard isn't inherently O(N) for most generators.
Faster algorithms exist, but are not widely implemented.

> <snip>
> A better approach is to devise a trivial, application-specific mapping
> of time (iteration number) plus some stable aspect your logical
> elements (e.g., atom id) into unique 128-bit inputs. You can now
> obtain the random values you need, wherever and whenever you need them
> simply by generating a unique 128-bit input and calling the
> RandomFunction. There is no space overhead. If your parallel
> algorithm dictates that you need the same random value, e.g., to
> perform an "update" in two different threads or nodes, you just call
> the RandomFunction with the appropriate arguments wherever you need it
> and you get the results you need, where you need them. In parallel
> applications this is much more natural than trying to divvy up a
> single stream into multiple substreams by 'discarding' or 'jumping' an
> inherently sequential state. It is cheaper and easier than
> associating a stateful generator with each of your logical objects and
> worrying about the storage and bandwidth to keep them in-sync across
> the threads/nodes of a parallel computation.

The problem I have with this, basically comes down
to encapsulation.

- distributions consume an unspecified number
  of random values. I have no idea how to make
  the distributions work with a RandomFunction,
  besides requiring them to use a 1-to-1 mapping.
  Even with 64-bit values, this limits the precision
  of the random variates.
- Taking this one step farther, how can I write
  a generic function which uses an arbitrary
  RandomFunction to generate a sequence of k
  random variates? There's no way to know a
  priori how many unique inputs I would need.
  Even if k is constant, it still depends on
  the block size of the RandomFunction.
- A RandomFunction is strictly a PRNG interface.
  It doesn't work very well for expressing other
  sources of random values, such as /dev/urandom,
  specialized devices, and

I think that we can get most of the benefits of a
RandomFunction without going away from the Engine
concept, if we add a couple of extra guarantees
for engines intended for parallel use:
- seeding a new engine is fast
- distinct seeds (within some limits)
  produce uncorrelated sequences.
Both of these properties are trivial for Engines based
on RandomFunctions.

In Christ,
Steven Watanabe

Boost list run by bdawes at, gregod at, cpdaniel at, john at