Subject: Re: [boost] [random] new threefry random engine
From: Thijs van den Berg (thijs_at_[hidden])
Date: 2014-04-21 17:38:16
On 21 Apr 2014, at 22:28, John Salmon <john_at_[hidden]> wrote:
> On Fri, Apr 18, 2014 at 4:00 AM, Thijs van den Berg <thijs_at_[hidden]> wrote:
>> On 18 Apr 2014, at 06:54, Steven Watanabe <watanabesj_at_[hidden]> wrote:
>>> On 04/17/2014 04:19 PM, Thijs van den Berg wrote:
>>>> I've completed a nice new random engine that I would like to propose
>> for addition to boost/random.
>>>> The engine is called "threefry" and it's based on the paper "Parallel
>> random numbers: as easy as 1, 2, 3" by Salmon, John K. and Moraes, Mark A.
>> and Dror, Ron O. and Shaw, David E.
>>>> I've been using it myself a lot for the last couple of years, and now
>> I've rewritten it so that it will fit in the boost random framework and
>> adhere to the concepts. It has some really nice properties, in particular
>> for large scale parallel computing:
>>>> * High quality randomness (according to the BigCrush tests) with a
>> cycle length of 2^258
>>>> * Fast (comparable to Mersenne Twister) & little memory usage (13x 64
>> bit integers)
>>>> * O(1) discard and guarantees for non-overlapping sequences when the
>> (up to 256 bit) seeds are unequal
>>>> I've uploaded the code, description and tests to
>>>> The actual engine is at
>>>> I would really appreciate any feedback about the idea for inclusion and
>> of course comments about the quality of the code,tests,docs
>>> I /really/ dislike the constraint that
>>> rounds shall be 13 or 20. It looks like
>>> encrypt_counter follows a regular pattern.
>>> I'd strongly prefer it to be implemented
>>> as a loop and allow rounds to be any
>>> non-negative integer. Not all possible
>>> values make sense, but that's normal.
>>> You have to know what you're doing when
>>> you define a specialization of any random
>>> number engine.
>> Very good. Ill have to change things a a bit. The algorithm has indeed
>> reppetitive patterns of multiples of length 4 and 8. Ill make it an
>> unconstrained integer and a loop.
>>> Do not use reinterpret_cast in operator().
>>> The behavior is undefined, and will certainly
>>> differ depending on endianness.
>> I did this for speed and I didnt worry about endianness because it never
>> crossed my mind. Changing the order of bits/bytes should not affect the
>> quality of the randomness, but its indeed not nice to have different
>> behaviour on different machines. Ill fix this.
>>> I'm not sure that I like the fact that the
>>> textual representation interleaves the three
>>> arrays. Also output is a function of key and
>>> counter and does not need to be stored. (The
>>> same observation applies to the comparison
>> Yes, Ive already changed that.
>>> Is it really a good idea to expose set_key/set_counter?
>> Indeed, its not needed, Ive moved them to private.
>>> It would be better to have an explicit width parameter,
>>> like mersenne_twister instead of relying on the size of
>> Ok, this takes a bit of time but its not dramatic. Its something that
>> combines nicely with the operator() point.
>>> I'd also prefer to see uint_least64_t in place
>>> of uint64_t.
>>> (Actually I'd like it even more if the
>>> arrays stored UIntType, but this doesn't look particularly
>>> feasible, given the way the algorithm works.)
>> This algorithm is what they call the 4x64 variant. The paper also talks
>> about a 2x64 and 4x32 variants (but e.g. not 8x32, 2x32). However Im not
>> sure if we need to implement all the variant theve studied (maybe yes,
>> maybe no). They recommend 4x64 with 20 rounds as the best choice for CPUs
>> and thats the main one I want to provide. AFAIK the state and the
>> return_type need not be the same. Any 32 bit engine can be turned into a 64
>> engine by patching two returns together.
>> The 32 bit state type versions have different constants in the repetitive
>> operators. The authors also studies variants with different cryptographic
>> functions. E.g. one uses EAS (but that has a different name than threefry)
>> which can be faster on CPU's supports EAS instructions. It however calls
>> for assembler. There is also a 3rd variant that targets CUDA machines. In a
>> wider setting all these variant use a counter and a cryptographic function
>> (which probably all use a key?). I havent studies the other variants.
>> Perhaps we can later refactor -without changing the interface- the code
>> when we decide to add more variants? E.g. can we stick to the 4x64 threefry
>> algorithm function, have two versions that return either 32 or 64 bit
>> integers specified by a width argument. The two other variant 2x64 and 4x32
>> could perhaps be added later, those are however not recommended by the
>> paper. I would call the 4x64 part of the algorithm, not NxM with template
>> arguments that can be tweaked by smart users.
> I'm an author of the original Threefry implementation and the paper
> that describes it
> First, let me say that I'm extremely happy to see our work being
> considered for boost, and I thank Thijs for initiating the discussion.
> But it seems to me that the code under discussion is
> too specifically related to Threefry itself. Threefry is one of a
> potentially large class of random number generators that derive their
> power from a stateless random function rather than a stateful random
> generator. We called these "Counter Based Random Number Generators"
> in our SC11 paper, but perhaps "stateless" would have been a better
> name. We described three new classes of generators: Threefry, Philox
> and ARS, and noted that others have been discussed in the past,
> including generators based on DES, MD5, SHA-1 and TEA. We hope that
> more will be discovered/described in the future.
> It is the *statelessness* of the underlying function, which is common
> to all of them, that makes them so well-suited to parallelization.
> Statelessness allows them all to trivially 'discard' values and to
> create practically unlimited numbers of non-overlapping streams. It
> allows applications to easily associate statistically independent
> random sequences with individual logical computational objects (atoms,
> grid cells, ticker symbols, data points, timesteps, etc.), independent
> of physical location (thread id, node number, etc).
> I would love to see a boost implementation that captured and exposed
> this capability for general use. Thus, I think a better approach is
> as follows:
> - define a new "RandomFunction" concept that dscribes/enforces the
> essential, common features of the randomizing function in Threefry,
> Philox, ARS and other counter-based random generators.
> - implement a Threefry (and perhaps other) function templates that
> model RandomFunction.
> - implement an adapter that produces a bona fide "Random Number
> Engine" (c.f. C++11 standard section 22.214.171.124 [rand.req.eng]) from any
> RandomFunction. This is where we bridge the gap between the new
> RandomFunction and the conventional "generators" whose interface is
> already well-defined. This adapter lets us use a new RandomFunction,
> e.g., Threefry, with code that expects to talk to an "Engine".
> It would be similar to the r123::Engine class documented at
> but completely rewritten to avoid the C99/C++03/CUDA/OpenCL baggage
> that clutters/obscures that code.
> - implement an adapter that produces a bona fide "Uniform Random
> Number Generator" (c.f. 126.96.36.199 [rand.req.urng]). Similarly inspired
> by the MicroURNG adapter documented at
> but again, completely rewritten.
> This feels like the right level of generality to me. I'd love to
> discuss this further and reach consensus - especially the details of
> the RandomFunction concept. If there seems to be interest, I'd
> be happy to do some coding as well. In any case, let me reiterate my
> appreciation for Thijs' work.
Good to hear form you!
In my view we could indeed decompose the design into something like
* a counter, tied to the random engine discard() and have O(1)
* function specific constants and variables (like the key in threefry, or perhaps the rotation constants?) and tie some of those to the seed(). Other options are to tie the seed to the counter, but that would result in something you can already achieve with the discard.
and merge the two together into a boost engine (which is a superset of the c++11 engine, it has an additional generate() member function)
On the other hand, -from a user perspective-, it shouldnt matter much *how* the engines are constructed as long as they provide a random engine interface. My plan was to have a narrow scope and get a good engine in production within a short timeframe. I know that little design details can cost *a lot* of time and those can sometimes cause projects to retire before they even get completed, ..thats my biggest worry (Ive seen quite a few ambitious things in the boost sandbox), I remember you talking about this a long time ago but it didnt materialise. Ive used the threefry a lot myself -wrote my own version-, and I like it a lot -I think its a brilliant idea in many ways-, and so my goal was to have a narrow scope, deliver, and make sure that people could start using it.
The perfect way forward for me would be to come up with a generic crypto counter template interface like you suggest of which the current threefry would be a specialisation (the function, state etc).That way we could finalise the current threefry, and absorb it in a bigger project later. *but* in case the bigger projects ends in the sandbox, we would still has this initial threefry.
..so I *really* like the idea -Ill join the project if you like-, but Im worried about the risks of widening the scope.
> John Salmon
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk