Boost logo

Boost :

Subject: Re: [boost] [random] new threefry random engine
From: John Salmon (john_at_[hidden])
Date: 2014-05-06 09:43:28


There's a new version at:

https://github.com/DEShawResearch/Random123-Boost

that I think addresses your concerns. Click on the link to see the new
README.

On Fri, May 2, 2014 at 11:43 AM, Steven Watanabe <watanabesj_at_[hidden]>wrote:

> AMDG
>
> On 04/29/2014 10:00 AM, John Salmon wrote:
> >
> > I've dusted off the code I wrote a couple of years ago and republished
> it,
> > now on github:
> >
> > https://github.com/DEShawResearch/Random123-Boost
> >
>

> > <snip>
> >
> > CBRNGs overcome all these problems. With CBRNGs, the code looks like
> > this (see libs/random/examples/counter_based_example.cpp for a fully
> > worked example):
> >
> > using namespace boost::random;
> > typedef threefry<4, uint32_t> Prf;
> > normal_distribution nd;
> > Prf::key_type key = {seed};
> > Prf prf(key); // seed may come from command line
> > for(size_t i=0; i<atoms.size(); ++i){
> > float boltzmannfactor = sqrt(kT/atoms[i].mass);
> > Prf::domain_type d = {atoms[i].id, timestep, THERMALIZE_CTXT};
> > counter_based_urng<Prf> cbrng(prf, d);
> > nd.reset();
> > atoms[i].vx = boltzmannfactor*nd(cbrng);
> > atoms[i].vy = boltzmannfactor*nd(cbrng);
> > atoms[i].vz = boltzmannfactor*nd(cbrng);
> > }
> >
>
> Is there a good reason why this shouldn't be written as:
>
> for(...) {
> ...
> std::seed_seq s{ seed, atoms[i].id, timestep, THERMALIZE_CTXT };
> counter_based_engine<Prf> cbrng(s);
> ...
> }
>

Seed_seq is the wrong tool for the job. It's a middleman who charges a
hefty commission (time and space) and then delivers your product slightly
damaged. You give it values that you have carefully chosen so that they're
*guaranteed* to be unique. Seed_seq then transforms them into values that
are *probably* unique, but there are no guarantees. If you use it a lot,
the Birthday Paradox will eat away at that "probably" and after a few
billion uses you'll be in for a nasty surprise: two of your generators
will produce exactly the same output.

Seed_seq is an attempt to work around the problem that many generators (and
some of the most popular ones) are difficult to initialize in a way that
results in independent streams. The CBRNGs are designed to give you
independent streams without the overhead or risk of a seed_seq.

I think you'll like the new example code better. From the new README:

    typedef threefry<4, uint32_t> Prf;
    counter_based_engine<Prf, 32> cbeng(seed);
    for(size_t i=0; i<Natoms; ++i){
        float boltzmannfactor = sqrt(kT/atoms[i].mass);
        normal_distribution nd(0., rmsvelocity);
        Prf::domain_type base = {atoms[i].id, timestep, THERMALIZE_CTXT};
        cbeng.restart(base);
        atoms[i].vx = boltzmannfactor*nd(cbeng);
        atoms[i].vy = boltzmannfactor*nd(cbeng);
        atoms[i].vz = boltzmannfactor*nd(cbeng);
    }

You can seed the counter_based_engine directly with an arithmetic seed, or,
if you prefer,
you can use the the much larger key_type, or you can pre-initialize a Prf
and use that.

In addition to seed()-ing (which meets all the standard expectations),
counter_based_engine
can be 'restart()-ed' with a new 'base counter'. This is how you associate
different, independent
sequences with whatever program elements you like: threads, atoms,
whatever.

> I don't really like making the details of the
> algorithm (i.e. key and domain) part of the
> primary interface.
>
>
I think it's important that they be available, but in the new version, you
don't have to use them if you don't want to. A user *may* limit himself
to using standard seeding methods and avoid restart(). Their one
remaining advantage is that they offer constant-time discard(). The
additional features that rely on the details of the underlying
Prf are entirely optional.

> Let's consider the code changes between the two fragments:
> >
> > <snip>
> >
> > - Since it models a URNG, cbrng can be passed as an argument to the
> > normal_distribution, nd. In order to make each atom independent,
> >
> > nd.reset()
> >
> > is called each time through the loop.
> >
> > <snip>
>
> FWIW, I don't really think that reset was a good idea
> in the first place, and it's currently a no-op in all
> distributions. To run the loop in parallel, you'd need
> to make a copy of nd in every thread, anyway.
>

Fair enough. In the example, I've moved the normal distribution
into the loop and removed reset().

> > counter_based_urng
> > ------------------
> >
> > The counter_based_urng class uses the pseudo-random property of PRFs
> > to perform random number generation in accordance with the
> > requirements of a UniformRandomNumberGenerator. <snip>
> >
> > counter_based_engine
> > --------------------
> >
> > The counter_based_engine class is a templated adapter class that
> > models a bona fide RandomNumberEngine. <snip>
> >
>
> Is there really a good reason to make two separate
> templates? A RandomNumberEngine is a URNG, by definition.
>

There was, but not any more. There is now only a counter_based_engine.
It takes a template parameter that determines the length of individual
sequences and the number of permissible 'base counters'.
It has a 'restart()' method that resets the 'base counter'
to begin a new sequence. And it has a few constructor() and
seed() overloads that extend the capabilities of a generic Engine.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk