Subject: Re: [boost] [random] new threefry random engine
From: Thijs van den Berg (thijs_at_[hidden])
Date: 2014-04-20 07:17:09


On 20 Apr 2014, at 01:39, Steven Watanabe <watanabesj_at_[hidden]> wrote:

> AMDG
>
> On 04/19/2014 04:35 PM, Thijs van den Berg wrote:
>> What’s your view on limiting the round to <=20 for the template
>
> I still don't like it. If all else fails, you can use the
> optimized version for rounds <= 20, and the slow version
> for rounds > 20.
>
>> and providing only the 20 round as a typedef?
>>
>
> I'd favor providing both. There are plenty of inferior algorithms
> in Boost.Random. I would anticipate that anyone who wants to
> use an algorithm other than mt19937 would have some idea of
> the tradeoffs.
OK, I now have these four typedefs, one for each combination of:
* 13 and 20 rounds
* a 32-bit and a 64-bit engine

>
>> I have addressed most other points you’ve mentioned, but the performance issue of a generic rounds version has failed me.
>>
>
> In theory it could be optimized. What compiler and
> optimization settings are you using? In particular,
> are you using -funroll-loops (GCC)? The version
> you show unrolls the loop 4x. What if you
> unroll 8x and kill the constant arrays? What
> about 20x and eliminate the % 5 in the key addition?
> Or 40x and eliminate both?
>
I’ve done the 40x version now and wrapped a loop around it, so that you can get an arbitrary number of rounds by repeating this block of 40 rounds, in full or in part. The code is a bit repetitive, but the performance is good now. Here are timings for the loop version and the 40x unrolled version at 8, 13, 20 and 99 rounds:

// loop version
threefry4x64_08_64: 10.2318 nsec/loop = 19.13350 CPU cycles
threefry4x64_13_64: 14.3048 nsec/loop = 26.75000 CPU cycles
threefry4x64_20_64: 22.6186 nsec/loop = 42.29680 CPU cycles
threefry4x64_99_64: 100.0110 nsec/loop = 187.02200 CPU cycles

// 40x manual unrolled version
threefry4x64_08_64: 3.7386 nsec/loop = 6.99118 CPU cycles
threefry4x64_13_64: 5.1223 nsec/loop = 9.57870 CPU cycles
threefry4x64_20_64: 7.3078 nsec/loop = 13.66560 CPU cycles
threefry4x64_99_64: 29.3599 nsec/loop = 54.90300 CPU cycles

Do you think this is a good solution?
The code is at https://github.com/sitmo/threefry/blob/master/random/include/boost/random/threefry.hpp
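
For illustration, here is a minimal sketch of the "unrolled block wrapped in a loop" idea on a scaled-down toy mixer. It is not the code at the URL above and it is not Threefry: the round function, the constants R0..R3, and the names toy_loop, toy_unrolled and check_equivalence are all made up for this example. The toy uses 4 rotation constants and injects a key every 2 rounds from a 3-word key schedule, so the pattern repeats every 12 rounds (the analogue of 40 in the real engine). Unrolling a full period makes every rotation amount and key index a compile-time constant, and an early exit handles round counts that are not a multiple of the block size.

```cpp
#include <cassert>
#include <cstdint>

// Toy rotation constants (period 4). Assumed values for illustration only.
constexpr unsigned R0 = 14, R1 = 16, R2 = 52, R3 = 57;

inline std::uint64_t rotl(std::uint64_t v, unsigned n) {
    return (v << n) | (v >> (64 - n));  // n must be in 1..63
}

// One toy mixing round with the rotation amount fixed at compile time.
template <unsigned Rot>
inline void mix(std::uint64_t& x0, std::uint64_t& x1) {
    x0 += x1;
    x1 = rotl(x1, Rot) ^ x0;
}

// Key injection: add two key words plus an injection counter.
inline void inject(std::uint64_t& x0, std::uint64_t& x1,
                   std::uint64_t ka, std::uint64_t kb, std::uint64_t s) {
    x0 += ka;
    x1 += kb + s;
}

// Reference version: generic loop with a constant table and modulo indexing.
template <unsigned Rounds>
std::uint64_t toy_loop(std::uint64_t x0, std::uint64_t x1,
                       const std::uint64_t (&ks)[3]) {
    static const unsigned rot[4] = {R0, R1, R2, R3};
    for (unsigned r = 0; r < Rounds; ++r) {
        if (r % 2 == 0)
            inject(x0, x1, ks[(r / 2) % 3], ks[(r / 2 + 1) % 3], r / 2);
        x0 += x1;
        x1 = rotl(x1, rot[r % 4]) ^ x0;
    }
    return x0 ^ x1;
}

// Unrolled version: one full 12-round period written out, repeated by the
// outer loop and left early once the requested number of rounds is reached.
template <unsigned Rounds>
std::uint64_t toy_unrolled(std::uint64_t x0, std::uint64_t x1,
                           const std::uint64_t (&ks)[3]) {
    for (unsigned base = 0; base < Rounds; base += 12) {
        const unsigned left = Rounds - base;  // rounds still to do (>= 1)
        const std::uint64_t s = base / 2;     // injection counter at block start
        inject(x0, x1, ks[0], ks[1], s + 0); mix<R0>(x0, x1); if (left == 1) break;
                                             mix<R1>(x0, x1); if (left == 2) break;
        inject(x0, x1, ks[1], ks[2], s + 1); mix<R2>(x0, x1); if (left == 3) break;
                                             mix<R3>(x0, x1); if (left == 4) break;
        inject(x0, x1, ks[2], ks[0], s + 2); mix<R0>(x0, x1); if (left == 5) break;
                                             mix<R1>(x0, x1); if (left == 6) break;
        inject(x0, x1, ks[0], ks[1], s + 3); mix<R2>(x0, x1); if (left == 7) break;
                                             mix<R3>(x0, x1); if (left == 8) break;
        inject(x0, x1, ks[1], ks[2], s + 4); mix<R0>(x0, x1); if (left == 9) break;
                                             mix<R1>(x0, x1); if (left == 10) break;
        inject(x0, x1, ks[2], ks[0], s + 5); mix<R2>(x0, x1); if (left == 11) break;
                                             mix<R3>(x0, x1);
    }
    return x0 ^ x1;
}

// True if both variants produce the same value for this round count.
template <unsigned Rounds>
bool same_output() {
    const std::uint64_t ks[3] = {0x0123456789abcdefULL,
                                 0xfedcba9876543210ULL,
                                 0x1bd11bdaa9fc1a22ULL};
    return toy_loop<Rounds>(1, 2, ks) == toy_unrolled<Rounds>(1, 2, ks);
}

// Checks partial blocks, exact multiples of 12, and several passes.
inline void check_equivalence() {
    assert(same_output<8>());
    assert(same_output<13>());
    assert(same_output<20>());
    assert(same_output<99>());
}
```

Because the block starts at a multiple of the full period, the expressions r % 4 and (r / 2) % 3 from the loop version reduce to fixed values at each unrolled position, which is what removes the table lookups and the modulo operations. Whether this gives a speedup comparable to the timings above depends on the compiler and the real round function; the sketch only shows that the two forms compute the same result.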
 
> In Christ,
> Steven Watanabe
>
>
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

