
Subject: Re: [boost] [random] new threefry random engine
From: John Salmon (john_at_[hidden])
Date: 2014-04-21 16:28:10

On Fri, Apr 18, 2014 at 4:00 AM, Thijs van den Berg <thijs_at_[hidden]> wrote:

> On 18 Apr 2014, at 06:54, Steven Watanabe <watanabesj_at_[hidden]> wrote:
> > AMDG
> >
> > On 04/17/2014 04:19 PM, Thijs van den Berg wrote:
> >> I've completed a nice new random engine that I would like to propose
> for addition to boost/random.
> >>
> >> The engine is called "threefry" and it's based on the paper "Parallel
> random numbers: as easy as 1, 2, 3" by Salmon, John K. and Moraes, Mark A.
> and Dror, Ron O. and Shaw, David E.
> >>
> >> I've been using it myself a lot for the last couple of years, and now
> I've rewritten it so that it will fit in the boost random framework and
> adhere to the concepts. It has some really nice properties, in particular
> for large scale parallel computing:
> >> * High quality randomness (according to the BigCrush tests) with a
> cycle length of 2^258
> >> * Fast (comparable to Mersenne Twister) & little memory usage (13x 64
> bit integers)
> >> * O(1) discard and guarantees for non-overlapping sequences when the
> (up to 256 bit) seeds are unequal
> >>
> >> I've uploaded the code, description and tests to
> >> The actual engine is at
> >>
> >> I would really appreciate any feedback about the idea for inclusion and
> of course comments about the quality of the code, tests and docs
> >>
> >
> > I /really/ dislike the constraint that
> > rounds shall be 13 or 20. It looks like
> > encrypt_counter follows a regular pattern.
> > I'd strongly prefer it to be implemented
> > as a loop and allow rounds to be any
> > non-negative integer. Not all possible
> > values make sense, but that's normal.
> > You have to know what you're doing when
> > you define a specialization of any random
> > number engine.
> Very good. I’ll have to change things a bit. The algorithm indeed has
> repetitive patterns of multiples of length 4 and 8. I’ll make it an
> unconstrained integer and a loop.
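For reference, the Threefry-4x64 rounds do fit naturally in a loop over an arbitrary round count. The sketch below follows the rotation constants and key schedule given in the Salmon et al. paper; it is a minimal illustration, not the proposed Boost implementation.

```cpp
#include <array>
#include <cstdint>

// Rotation constants for Threefry-4x64 from the Salmon et al. paper;
// the pattern repeats with period 8, which is why a fixed round count
// of 13 or 20 need not be baked into the interface.
static const unsigned THREEFRY_ROT[8][2] = {
    {14, 16}, {52, 57}, {23, 40}, { 5, 37},
    {25, 33}, {46, 12}, {58, 22}, {32, 32}};

static inline std::uint64_t rotl64(std::uint64_t x, unsigned n) {
    return (x << n) | (x >> (64 - n));
}

// Threefry-4x64 with a caller-chosen, unconstrained round count.
std::array<std::uint64_t, 4> threefry4x64(
    std::array<std::uint64_t, 4> x,               // counter
    const std::array<std::uint64_t, 4>& key,
    unsigned rounds)
{
    // Extended key: ks[4] = C240 ^ ks[0] ^ ... ^ ks[3] (Skein parity constant).
    std::uint64_t ks[5] = {key[0], key[1], key[2], key[3],
        0x1BD11BDAA9FC1A22ULL ^ key[0] ^ key[1] ^ key[2] ^ key[3]};

    for (int i = 0; i < 4; ++i) x[i] += ks[i];    // initial key injection

    for (unsigned r = 0; r < rounds; ++r) {
        // Even rounds mix the pairs (0,1) and (2,3); odd rounds (0,3) and (2,1).
        unsigned a = (r % 2 == 0) ? 1 : 3;
        unsigned b = (r % 2 == 0) ? 3 : 1;
        x[0] += x[a]; x[a] = rotl64(x[a], THREEFRY_ROT[r % 8][0]); x[a] ^= x[0];
        x[2] += x[b]; x[b] = rotl64(x[b], THREEFRY_ROT[r % 8][1]); x[b] ^= x[2];
        if (r % 4 == 3) {                         // key injection every 4 rounds
            unsigned s = r / 4 + 1;
            for (int i = 0; i < 4; ++i) x[i] += ks[(s + i) % 5];
            x[3] += s;
        }
    }
    return x;
}
```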
> >
> > Do not use reinterpret_cast in operator().
> > The behavior is undefined, and will certainly
> > differ depending on endianness.
> I did this for speed and I didn’t worry about endianness because it never
> crossed my mind. Changing the order of bits/bytes should not affect the
> quality of the randomness, but it’s indeed not nice to have different
> behaviour on different machines. I’ll fix this.
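One endianness-independent alternative (a sketch only, not the actual fix adopted in the proposed code): derive each smaller output word from the 64-bit state with shifts and masks rather than a reinterpret_cast, so the sequence is identical on any byte order.

```cpp
#include <cstdint>

// Hypothetical helper: extract the n-th 32-bit output word from four
// 64-bit state words. Unlike reinterpret_cast-ing the state to a
// uint32_t*, shifts and masks give the same sequence on big- and
// little-endian machines (low half of each word first, then the high).
inline std::uint32_t output_word(const std::uint64_t state[4], unsigned n) {
    std::uint64_t w = state[n / 2];
    return static_cast<std::uint32_t>(n % 2 == 0 ? w & 0xFFFFFFFFULL : w >> 32);
}
```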
> >
> > I'm not sure that I like the fact that the
> > textual representation interleaves the three
> > arrays. Also output is a function of key and
> > counter and does not need to be stored. (The
> > same observation applies to the comparison
> > operators)
> Yes, I’ve already changed that.
> >
> > Is it really a good idea to expose set_key/set_counter?
> >
> Indeed, it’s not needed, I’ve moved them to private.
> > It would be better to have an explicit width parameter,
> > like mersenne_twister instead of relying on the size of
> > UIntType.
> Ok, this takes a bit of time but it’s not dramatic. It’s something that
> combines nicely with the operator() point.
> > I'd also prefer to see uint_least64_t in place
> > of uint64_t.
> OK.
> > (Actually I'd like it even more if the
> > arrays stored UIntType, but this doesn't look particularly
> > feasible, given the way the algorithm works.)
> This algorithm is what they call the 4x64 variant. The paper also talks
> about 2x64 and 4x32 variants (but e.g. not 8x32 or 2x32). However, I’m not
> sure if we need to implement all the variants they’ve studied (maybe yes,
> maybe no). They recommend 4x64 with 20 rounds as the best choice for CPUs,
> and that’s the main one I want to provide. AFAIK the state and the
> return_type need not be the same. Any 32-bit engine can be turned into a
> 64-bit engine by patching two returns together.
> The 32-bit state type versions have different constants in the repetitive
> operators. The authors also studied variants with different cryptographic
> functions. E.g. one uses AES (but that one has a different name than
> threefry), which can be faster on CPUs that support AES instructions. It
> however calls for assembler. There is also a 3rd variant that targets CUDA
> machines. In a wider setting all these variants use a counter and a
> cryptographic function (which probably all use a key?). I haven’t studied
> the other variants.
> Perhaps we can later refactor the code -without changing the interface-
> when we decide to add more variants? E.g. we could stick to the 4x64
> threefry algorithm function and have two versions that return either 32-
> or 64-bit integers, specified by a width argument. The two other variants,
> 2x64 and 4x32, could perhaps be added later; those are however not
> recommended by the paper. I would make the 4x64 part of the algorithm
> itself, not an NxM with template arguments that can be tweaked by smart
> users.
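The "patching two returns together" idea mentioned above can be sketched generically; Gen32 here is a placeholder name for any engine whose operator() returns full-range 32-bit values.

```cpp
#include <cstdint>

// Sketch: build one 64-bit value from two successive 32-bit draws.
// Gen32 is a placeholder for any generator producing uniformly
// distributed 32-bit values over its whole range.
template <class Gen32>
std::uint64_t draw64(Gen32& g) {
    std::uint64_t lo = g();      // first draw fills the low half
    std::uint64_t hi = g();      // second draw fills the high half
    return (hi << 32) | lo;
}
```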


I'm an author of the original Threefry implementation and the paper
that describes it.

First, let me say that I'm extremely happy to see our work being
considered for boost, and I thank Thijs for initiating the discussion.

But it seems to me that the code under discussion is
too specifically related to Threefry itself. Threefry is one of a
potentially large class of random number generators that derive their
power from a stateless random function rather than a stateful random
generator. We called these "Counter Based Random Number Generators"
in our SC11 paper, but perhaps "stateless" would have been a better
name. We described three new classes of generators: Threefry, Philox
and ARS, and noted that others have been discussed in the past,
including generators based on DES, MD5, SHA-1 and TEA. We hope that
more will be discovered/described in the future.

It is the *statelessness* of the underlying function, which is common
to all of them, that makes them so well-suited to parallelization.
Statelessness allows them all to trivially 'discard' values and to
create practically unlimited numbers of non-overlapping streams. It
allows applications to easily associate statistically independent
random sequences with individual logical computational objects (atoms,
grid cells, ticker symbols, data points, timesteps, etc.), independent
of physical location (thread id, node number, etc).
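That association can be illustrated as follows. This is a sketch only: random_function here is a toy stand-in for a real counter-based function such as Threefry-4x64, and the particular field layout of the counter is an assumption for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using ctr4_t = std::array<std::uint64_t, 4>;
using key4_t = std::array<std::uint64_t, 4>;

// Toy stand-in for a real counter-based function such as Threefry-4x64:
// a splitmix64-style mix, for illustration only (NOT crypto-strength).
static std::uint64_t mix64(std::uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

static ctr4_t random_function(const ctr4_t& c, const key4_t& k) {
    ctr4_t out;
    for (std::size_t i = 0; i < 4; ++i)
        out[i] = mix64(c[i] ^ mix64(k[i] + i));
    return out;
}

// Because the function is stateless, any thread on any node can compute
// the n-th random block for a given atom at a given timestep directly:
// no per-object generator state needs to be stored or communicated.
ctr4_t draws_for(std::uint64_t atom_id, std::uint64_t timestep,
                 std::uint64_t n, const key4_t& key) {
    ctr4_t counter = {atom_id, timestep, n, 0};
    return random_function(counter, key);
}
```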

I would love to see a boost implementation that captured and exposed
this capability for general use. Thus, I think a better approach is
as follows:

- define a new "RandomFunction" concept that describes/enforces the
essential, common features of the randomizing function in Threefry,
Philox, ARS and other counter-based random generators.

- implement Threefry (and perhaps other) function templates that
model RandomFunction.

- implement an adapter that produces a bona fide "Random Number
Engine" (c.f. C++11 standard section [rand.req.eng]) from any
RandomFunction. This is where we bridge the gap between the new
RandomFunction and the conventional "generators" whose interface is
already well-defined. This adapter lets us use a new RandomFunction,
e.g., Threefry, with code that expects to talk to an "Engine".
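The adapter step might look roughly like the sketch below. All names are illustrative, not an actual Boost interface; F is assumed to provide ctr_type, key_type and operator()(ctr_type, key_type) -> ctr_type, and a toy model of the concept is included only to make the sketch self-contained.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Sketch: wrap any model of a hypothetical RandomFunction concept into
// something shaped like a C++11 random number engine.
template <class F>
class counter_engine {
public:
    using ctr_type    = typename F::ctr_type;
    using key_type    = typename F::key_type;
    using result_type = typename ctr_type::value_type;

    explicit counter_engine(key_type k = key_type()) : key_(k), ctr_(), pos_(0) {
        buf_ = f_(ctr_, key_);
    }

    result_type operator()() {
        result_type r = buf_[pos_++];
        if (pos_ == N) {                 // block exhausted: advance the counter
            ++ctr_[0];                   // (carry into higher words elided)
            buf_ = f_(ctr_, key_);
            pos_ = 0;
        }
        return r;
    }

    // O(1) discard, the property that makes counter-based generators
    // attractive in parallel: jump the counter instead of looping z times.
    void discard(unsigned long long z) {
        unsigned long long idx = pos_ + z;
        ctr_[0] += idx / N;              // (carry elided, as above)
        pos_ = static_cast<std::size_t>(idx % N);
        if (idx >= N) buf_ = f_(ctr_, key_);
    }

private:
    static constexpr std::size_t N = std::tuple_size<ctr_type>::value;
    F f_;
    key_type key_;
    ctr_type ctr_, buf_;
    std::size_t pos_;
};

// Minimal model of the concept so the sketch compiles on its own: a toy
// mixing function, for illustration only (NOT crypto-strength).
struct toy_function {
    using ctr_type = std::array<std::uint64_t, 4>;
    using key_type = std::array<std::uint64_t, 4>;
    ctr_type operator()(const ctr_type& c, const key_type& k) const {
        ctr_type out;
        for (std::size_t i = 0; i < 4; ++i) {
            std::uint64_t x = c[i] + k[i] + 0x9E3779B97F4A7C15ULL * (i + 1);
            x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
            x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
            out[i] = x ^ (x >> 31);
        }
        return out;
    }
};
```

A `counter_engine<threefry_function>` would then plug into the existing distribution machinery unchanged, with discard costing a single function evaluation rather than z draws.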

It would be similar to the r123::Engine class documented at
but completely rewritten to avoid the C99/C++03/CUDA/OpenCL baggage
that clutters/obscures that code.

- implement an adapter that produces a bona fide "Uniform Random
Number Generator" (c.f. [rand.req.urng]). Similarly inspired
by the MicroURNG adapter documented at
but again, completely rewritten.

This feels like the right level of generality to me. I'd love to
discuss this further and reach consensus - especially the details of
the RandomFunction concept. If there seems to be interest, I'd
be happy to do some coding as well. In any case, let me reiterate my
appreciation for Thijs' work.

John Salmon

Boost list run by bdawes at, gregod at, cpdaniel at, john at