Subject: Re: [boost] [random] new threefry random engine
From: John Salmon (john_at_[hidden])
Date: 2014-04-23 10:59:41


On Tue, Apr 22, 2014 at 12:25 PM, Steven Watanabe <watanabesj_at_[hidden]> wrote:

> AMDG
>
> On 04/22/2014 08:51 AM, John Salmon wrote:
>
> > <snip>
> > A better approach is to devise a trivial, application-specific mapping
> > of time (iteration number) plus some stable aspect of your logical
> > elements (e.g., atom id) into unique 128-bit inputs. You can now
> > obtain the random values you need, wherever and whenever you need them
> > simply by generating a unique 128-bit input and calling the
> > RandomFunction. There is no space overhead. If your parallel
> > algorithm dictates that you need the same random value, e.g., to
> > perform an "update" in two different threads or nodes, you just call
> > the RandomFunction with the appropriate arguments wherever you need it
> > and you get the results you need, where you need them. In parallel
> > applications this is much more natural than trying to divvy up a
> > single stream into multiple substreams by 'discarding' or 'jumping' an
> > inherently sequential state. It is cheaper and easier than
> > associating a stateful generator with each of your logical objects and
> > worrying about the storage and bandwidth to keep them in-sync across
> > the threads/nodes of a parallel computation.
> >
>
> The problem I have with this basically comes down
> to encapsulation.
>

Yes. I'm arguing against encapsulation, and you're right to be
skeptical. Encapsulation is usually (but not always) a good thing. I
think Thijs put his finger on it. He said:

>> Most of the benefits you see revolve around not owning memory.

Exactly. RandomFunction gives you randomness without the overhead of
carrying around, storing and synchronizing encapsulated state. If
de-encapsulation just meant that the state had to live in memory
somewhere else, then it would be counter-productive. But there are
many applications, e.g., Monte Carlo simulations, where the
de-encapsulated state doesn't have to "live" anywhere. It can be
reconstructed, trivially, on-the-fly, from pre-existing application
data, whenever it's needed.
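
To make "reconstructed on-the-fly" concrete, here is a minimal sketch. It
is only an illustration: toy_random_function, mix64, make_counter and
random_for_atom are hypothetical names, and the mixer is a SplitMix64-style
stand-in rather than Threefry, so it says nothing about statistical quality.
The only point is that the value for a given (timestep, atom id) is a pure
function of data the application already has.

```cpp
#include <array>
#include <cstdint>

// 128-bit input built from application data (hypothetical layout).
using counter_type = std::array<std::uint64_t, 2>;

// SplitMix64-style bit mixer; a toy stand-in, NOT Threefry.
inline std::uint64_t mix64(std::uint64_t z) {
    z += 0x9e3779b97f4a7c15ULL;
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

// Hypothetical stateless random function: a pure function of (key, counter).
inline counter_type toy_random_function(std::uint64_t key, const counter_type& c) {
    const std::uint64_t a = mix64(c[0] ^ key);
    const std::uint64_t b = mix64(c[1] + a);
    return {{a ^ b, mix64(b)}};
}

// Trivial application-specific mapping: (iteration number, atom id) -> counter.
inline counter_type make_counter(std::uint64_t timestep, std::uint64_t atom_id) {
    return {{timestep, atom_id}};
}

// Any thread or node can recompute the value for (timestep, atom_id)
// without reading or synchronizing any stored generator state.
inline std::uint64_t random_for_atom(std::uint64_t seed,
                                     std::uint64_t timestep,
                                     std::uint64_t atom_id) {
    return toy_random_function(seed, make_counter(timestep, atom_id))[1];
}
```

Two nodes that both call random_for_atom(seed, timestep, atomid) with the
same arguments get the same value, and nothing has to be stored or sent
between them.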

>
> - distributions consume an unspecified number
> of random values. I have no idea how to make
> the distributions work with a RandomFunction,
> besides requiring them to use a 1-to-1 mapping.
> Even with 64-bit values, this limits the precision
> of the random variates.
> - Taking this one step farther, how can I write
> a generic function which uses an arbitrary
> RandomFunction to generate a sequence of k
> random variates? There's no way to know a
> priori how many unique inputs I would need.
> Even if k is constant, it still depends on
> the block size of the RandomFunction.
>

Yes. Almost nobody wants or needs raw Engine output. Almost
everybody wants and needs the values returned by Distributions.
RandomFunctions absolutely must "work" with Distributions, and
retooling the implementation of Distributions is completely out of the
question.

So how do I propose making the RandomFunction work with distributions?
With an adapter class, let's call it counter_based_generator, that
models a bona fide Boost.UniformRandomNumberGenerator (so it can be
used as an argument to a Boost.Distribution), and that can be
constructed from an instance of a RandomFunction and an instance of
the RandomFunction's counter_type.

It might be used something like this:

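template <typename RandomFunction>
void thermalize(AtomCollection& atoms, RandomFunction& randfunc){
  // timestep, kB and T are assumed to be visible here (e.g., simulation globals).
  normal_distribution<> nd;
  for( atom& a : atoms ){
   // A unique counter for each (timestep, atom) pair.
   typename RandomFunction::counter_type c = {timestep, a.atomid};
   counter_based_generator<RandomFunction> urng(randfunc, c);
   nd.reset();  // drop any cached value so the results depend only on c
   a.vx = sqrt(kB * T/a.mass) * nd(urng);
   a.vy = sqrt(kB * T/a.mass) * nd(urng);
   a.vz = sqrt(kB * T/a.mass) * nd(urng);
  }
}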

The counter_based_generator does have internal state (copies of c and
randfunc), but it is transient, so in practice the state lives in registers
and never touches memory. In contrast to the advice given at
http://www.boost.org/doc/libs/1_55_0/doc/html/boost_random/reference.html#boost_random.reference.generators
counter_based_generators are meant to be created and destroyed
frequently.
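
For illustration only, here is a rough sketch of what such an adapter
might look like. Several things in it are assumptions made for the sketch
and not part of any actual proposal: the toy_random_function (a
SplitMix64-style mixer, not Threefry), the nested result_block typedef, a
four-word counter whose last word is reserved for the adapter's own block
index, and the use of std::normal_distribution in place of the Boost
distribution so that the example is self-contained.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>

// Toy stand-in for a RandomFunction (NOT Threefry): a pure function of its counter.
struct toy_random_function {
    using counter_type = std::array<std::uint64_t, 4>;
    using result_block = std::array<std::uint64_t, 4>;  // hypothetical name

    static std::uint64_t mix64(std::uint64_t z) {
        z += 0x9e3779b97f4a7c15ULL;
        z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
        z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
        return z ^ (z >> 31);
    }

    result_block operator()(const counter_type& c) const {
        result_block out{};
        std::uint64_t h = 0;
        for (std::uint64_t w : c) h = mix64(h ^ w);            // absorb the counter
        for (std::uint64_t& o : out) { h = mix64(h); o = h; }  // emit one block
        return out;
    }
};

// Sketch of the adapter: exposes min(), max() and operator() so that it can
// be passed to a distribution. It is cheap to construct and to throw away.
template <typename RandomFunction>
class counter_based_generator {
public:
    using result_type = std::uint64_t;
    using counter_type = typename RandomFunction::counter_type;

    counter_based_generator(const RandomFunction& f, const counter_type& c)
        : f_(f), c_(c), block_{}, next_(block_.size()) {}

    static constexpr result_type min() { return 0; }
    static constexpr result_type max() {
        return std::numeric_limits<result_type>::max();
    }

    result_type operator()() {
        if (next_ == block_.size()) {  // buffer exhausted: compute the next block
            block_ = f_(c_);
            ++c_.back();               // assumption: last word belongs to the adapter
            next_ = 0;
        }
        return block_[next_++];
    }

private:
    RandomFunction f_;
    counter_type c_;
    typename RandomFunction::result_block block_;
    std::size_t next_;
};

// Three normal variates that depend only on (timestep, atom_id).
inline std::array<double, 3> thermal_kick(std::uint64_t timestep,
                                          std::uint64_t atom_id) {
    toy_random_function rf;
    toy_random_function::counter_type c = {{timestep, atom_id, 0, 0}};
    counter_based_generator<toy_random_function> urng(rf, c);
    std::normal_distribution<double> nd;
    std::array<double, 3> v{};
    for (double& x : v) x = nd(urng);
    return v;
}
```

Because the adapter advances its own part of the counter, a distribution
can consume as many raw values as it needs, which is how this kind of
design could cope with distributions that draw an unspecified number of
values per variate.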

> - A RandomFunction is strictly a PRNG interface.
> It doesn't work very well for expressing other
> sources of random values, such as /dev/urandom,
> specialized devices, and www.random.org.
>

Agreed. But so what? RandomFunction doesn't have to
model /dev/urandom to be useful. The question is whether
it's useful in other contexts.

> I think that we can get most of the benefits of a
> RandomFunction without going away from the Engine
> concept, if we add a couple of extra guarantees
> for engines intended for parallel use:
> - seeding a new engine is fast
> - distinct seeds (within some limits)
> produce uncorrelated sequences.
> Both of these properties are trivial for Engines based
> on RandomFunctions.
>

I think we're getting closer here. If we simply document a few
details about what we expect from a RandomFunction, then we can
provide the guarantees you mention (fast seeding and initialization,
uncorrelated sequences) with a single, generic adapter class along the
lines of the counter_based_generator sketched above. That's basically
what I had in mind when I proposed the RandomFunction "Concept". Its
purpose is to document the requirements of the counter_based_generator
adapter so that it's relatively easy to create new RandomFunctions in
the future, with the expectation that they'll interoperate nicely with
the rest of Boost.Random.
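
As a rough idea of how such documented requirements could also be checked
at compile time, here is a small sketch. The two requirements it tests (a
nested counter_type, and being callable on a const object with a
counter_type argument) are a guess at a minimal set based on the
description above; the is_random_function trait and both example structs
are hypothetical, and the body of toy_function is a placeholder, not a
usable generator.

```cpp
#include <array>
#include <cstdint>
#include <type_traits>
#include <utility>

// void_t substitute that also works on pre-C++17 compilers.
template <typename...> struct make_void { using type = void; };
template <typename... Ts> using void_t = typename make_void<Ts...>::type;

// Primary template: by default a type is not a RandomFunction.
template <typename F, typename = void>
struct is_random_function : std::false_type {};

// Selected only if F has a nested counter_type and a const F can be
// called with a counter_type argument.
template <typename F>
struct is_random_function<F, void_t<
    typename F::counter_type,
    decltype(std::declval<const F&>()(
        std::declval<const typename F::counter_type&>()))
>> : std::true_type {};

// Placeholder type that satisfies the sketched requirements.
struct toy_function {
    using counter_type = std::array<std::uint64_t, 2>;
    counter_type operator()(const counter_type& c) const {
        return {{c[1] * 0x9e3779b97f4a7c15ULL, c[0] ^ c[1]}};  // placeholder only
    }
};

// A type that does not satisfy them.
struct not_a_function { int x; };

static_assert(is_random_function<toy_function>::value,
              "toy_function should model the sketched requirements");
static_assert(!is_random_function<not_a_function>::value,
              "not_a_function should be rejected");
```

An adapter could put a static_assert like this at the top of its class
body, so that a type missing one of the documented pieces fails with a
readable message.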

John

> In Christ,
> Steven Watanabe
>
>