Subject: Re: [boost] [random] new threefry random engine
From: Thijs (M.A.) van den Berg (thijs_at_[hidden])
Date: 2014-04-27 08:58:57


On Apr 27, 2014, at 1:39 PM, Mathias Gaunard <mathias.gaunard_at_[hidden]> wrote:

> On 22/04/14 01:51, Steven Watanabe wrote:
>> AMDG
>>
>> On 04/21/2014 02:28 PM, John Salmon wrote:
>>> <snip>
>>> I would love to see a boost implementation that captured and exposed
>>> this capability for general use. Thus, I think a better approach is
>>> as follows:
>>>
>>> - define a new "RandomFunction" concept that describes/enforces the
>>> essential, common features of the randomizing function in Threefry,
>>> Philox, ARS and other counter-based random generators.
>>
>> I don't think this is a particularly useful
>> abstraction. Most users should not care at
>> all about this, since they should be using
>> the predefined engine typedefs. If more
>> similar generators are added later, then the
>> implementation can be adjusted to factor
>> out the common boilerplate, but I really
>> don't see the point of creating a whole new
>> concept for this.
>
> People usually want to do something like this
>
> #pragma omp for
> for(int i=0; i<n; ++i)
> {
> out[i] = in[i] * random_value();
> }
>
>
> If the random_value generator is shared, not only do you get poor performance due to synchronization, but you also get non-deterministic results.
Yes. That was the problem with the old random generators, but nowadays engines have local state.
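For example, a minimal per-thread setup with the standard engines looks something like this (seeding from the OpenMP thread number is only for illustration, not a recommendation):

#include <random>
#include <vector>
#include <omp.h>

void scale_randomly(std::vector<double>& out, const std::vector<double>& in)
{
    const int n = static_cast<int>(in.size());
    #pragma omp parallel
    {
        // One engine per thread: no synchronization needed, but the
        // output still depends on which thread handles which iterations.
        std::mt19937_64 eng(12345u + omp_get_thread_num());
        std::uniform_real_distribution<double> dist(0.0, 1.0);

        #pragma omp for
        for (int i = 0; i < n; ++i)
            out[i] = in[i] * dist(eng);
    }
}

No shared state, but as you note below, the results still depend on scheduling and on how each thread is seeded.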
>
> If you use a per-thread generator, this is a bit more annoying to set up, manage and reproduce as the results will still depend on which threads get to do what and how each of them are seeded.
>
> With a single, stateless generator, everything is much simpler. One seed, one generator, completely reproducible results regardless of scheduling. The interface I want to have is this:
>
> #pragma omp for
> for(int i=0; i<n; ++i)
> {
> out[i] = in[i] * random_value(i);
> }
I like this interface. I'm experimenting now with something similar.
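To make that concrete, here is roughly the shape of the stateless random_value I'm playing with. The splitmix64-style mixer is only a stand-in for a real random function (Threefry, Philox, ...), and the names and the conversion to double are my own placeholders:

#include <cstdint>

// Stand-in random function: a splitmix64-style integer mixer.
// In the real thing this would be Threefry/Philox on a (seed, counter) block.
inline std::uint64_t mix(std::uint64_t x)
{
    x += 0x9E3779B97F4A7C15ull;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ull;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBull;
    return x ^ (x >> 31);
}

// Stateless: the value depends only on (seed, i), never on call order.
inline double random_value(std::uint64_t seed, std::uint64_t i)
{
    return (mix(seed ^ mix(i)) >> 11) * (1.0 / 9007199254740992.0); // / 2^53
}

With that, your loop is reproducible regardless of scheduling (OpenMP does the split across threads):

#pragma omp parallel for
for (int i = 0; i < n; ++i)
    out[i] = in[i] * random_value(seed, i);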

You would still need to split the range [0,n) into non-overlapping subsets per thread. To do that you need some unique id per thread (either a thread id, or some static counter that threads use when they are initialized). That unique id can also be used to seed regular random engines, so to me that is not a big addition.
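With plain threads instead of an OpenMP scheduler, that split could look like this, reusing the hypothetical random_value(seed, i) from the sketch above; the worker id only decides who computes which indices, the loop index itself is the counter:

#include <cstdint>
#include <thread>
#include <vector>

double random_value(std::uint64_t seed, std::uint64_t i); // from the sketch above

void fill(std::vector<double>& out, std::uint64_t seed, unsigned num_threads)
{
    const std::size_t n = out.size();
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t)
        workers.emplace_back([&out, n, seed, t, num_threads] {
            // Each worker gets a non-overlapping subset of [0, n); the
            // result does not depend on num_threads or on scheduling.
            for (std::size_t i = t; i < n; i += num_threads)
                out[i] = random_value(seed, i);
        });
    for (auto& w : workers) w.join();
}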

The main gains I see are:
1) you can be more efficient with memory;
2) you get better guarantees with respect to independent sequences. If you seed one regular engine with 1 and another with 2, you don't know whether the two generated random sequences will overlap. With a counter you can guarantee that they don't, because you have full control over what goes into the random function.

I'm currently experimenting with "random functions" and a "generic counter engine". The engine patches together a seed+counter input block (the seed goes into the high bits, the counter into the low bits) and hands out random numbers from the output that the random function fills, which can be wider than 64 bits (e.g. 256 bits for Threefry, 160 for SHA1).
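Very roughly, the engine I have in mind looks like the sketch below. The names (counter_engine, toy_random_function) and the toy 4x64 function are my own placeholders, not actual code; the point is the seed in the high word, the counter in the low word, and the buffered wide output:

#include <array>
#include <cstdint>

// Toy stand-in for a random function with a wide (256-bit) output,
// e.g. Threefry-4x64: it maps one input block to one output block.
struct toy_random_function
{
    typedef std::array<std::uint64_t, 4> block_type;

    block_type operator()(block_type in) const
    {
        // Absorb all input words into one running state, then squeeze
        // four output words out of it (toy construction, not crypto).
        std::uint64_t x = 0;
        for (std::size_t i = 0; i < 4; ++i) {
            x += in[i] + 0x9E3779B97F4A7C15ull * (i + 1);
            x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ull;
            x = (x ^ (x >> 27)) * 0x94D049BB133111EBull;
            x ^= x >> 31;
        }
        block_type out;
        for (std::size_t i = 0; i < 4; ++i) {
            x += 0x9E3779B97F4A7C15ull;
            std::uint64_t z = x;
            z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ull;
            z = (z ^ (z >> 27)) * 0x94D049BB133111EBull;
            out[i] = z ^ (z >> 31);
        }
        return out;
    }
};

// Generic counter engine: seed in the high word of the input block,
// counter in the low word; the wide output is buffered and handed out
// one 64-bit result at a time.
template <class RandomFunction>
class counter_engine
{
public:
    typedef std::uint64_t result_type;

    explicit counter_engine(std::uint64_t seed = 0)
        : seed_(seed), counter_(0), pos_(4) {}

    result_type operator()()
    {
        if (pos_ == 4) {   // buffer exhausted: run the random function again
            typename RandomFunction::block_type in = {{ counter_, 0, 0, seed_ }};
            buffer_ = f_(in);
            ++counter_;
            pos_ = 0;
        }
        return buffer_[pos_++];
    }

    static result_type min() { return 0; }
    static result_type max() { return ~result_type(0); }

private:
    RandomFunction f_;
    std::uint64_t seed_, counter_;
    typename RandomFunction::block_type buffer_;
    std::size_t pos_;
};

Usage would be e.g. counter_engine<toy_random_function> eng(42); each call eng() then returns one 64-bit value, four per invocation of the random function.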

There is also overlap between (the interface or concept of) random functions and std::hash; perhaps a generic counter engine should work with the union of both? We could also use an adapter that wraps a hash interface around a random function, which would allow us to construct new hash functions.
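For instance, something along these lines would put a std::hash-like interface on a random function (again just a sketch, reusing the block_type convention from the engine above; the quality of the resulting hash is a separate question):

#include <cstddef>
#include <cstdint>

// Hash adapter: run one block containing the key through the random
// function and keep one word of the output as the hash value.
template <class RandomFunction>
struct random_function_hash
{
    std::size_t operator()(std::uint64_t key) const
    {
        typename RandomFunction::block_type in = {{ key, 0, 0, 0 }};
        return static_cast<std::size_t>(RandomFunction()(in)[0]);
    }
};

e.g. random_function_hash<toy_random_function>()(42) would then give a hash for the key 42.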

