
Boost : 
From: troyer (troyer_at_[hidden])
Date: 20000531 03:29:05
Jens Maurer wrote in reply to my comments:
> troyer wrote:
> > 1.a.) the concept UniformRandomNumberGenerator specifies
> > v.min()
> > v.max()
> >
> > to be strict lower respective upper bounds. This is perfect
> > for integer generators but is problematic for floating point
> > generators that usually produce numbers in an interval
> > [0,1), i.e. 0 <= x < 1.
>
> I acknowledge the problem. In uniform_01, care has been taken
> so that max() indeed returns 1epsilon.
>
> I have no clear preference on how to solve the general problem.
> I think that a max() member function with different semantics
> depending on integer vs. float may not be intuitive. On the
> other hand, it seems to be really practical and easy. For
> integer generators, there is no way around the [min,max]
> interpretation of the borders due to the described range
> issue. And yes, we do have LCGs with a modulus of 2^32.
I agree with you with respect to integer generators. For
floating point generators my point of view is the following:
a) the normal user will not care if the maximum is inclusive
or not.
b) the power user often needs to not explicitly that the maximum
is NOT inclusive, while I do not see any case where an inclusive
maximum would be required.
I tend to think that slightly differing semantics should not be
a problem, since x<max() implkies x<=max() and leads to
no problems for users not aware of the difference.
Also, differing semantics for integral and floating point
types exist also in std::numeric_limits<T>::min(), where
there is more potential for confusion.
If nobody finds any reason to object I would strongly
recommend the exclusive maximum. It will help with other
distributions (see below).
> > The documentation for all these distributions reads:
> > It transforms a uniform distribution into a ....
> >
> > This is not completely correct. For the code to work
> > it must be a distribution in [0,1) (compare 1.a), such
> > as provided by the "uniform_01" generator.
>
> I cannot confirm this observation. All distributions work
> with arbitrary uniform distributions. One data member for the
> distributions often reads
>
> >>> uniform_01<base_type, RealType> _rng;
Sorry, I missed that here and for uniform_sphere.
>
> which means that any uniform distribution can be used, which
> will be implicitly scaled to a [0,1)distribution. I would
> like to hear about distributions as implemented in my library
> which are buggy in this (or any other) respect. I may still
> have a problem regarding whether the max() value is inclusive
> or exclusive, but this is a different issue (see above).
Correct. If you use a uniform_real, which has an inclusive max()
value when you hit the max() value. For that reason I prefer
the exclusive max() value.
> It is a topic worth a discussion whether it is desirable to
> have this freedom of input generator in the first place.
While it does not seem to be really necessary it is still
nice to have and I agree that one should keep it.
> However, as a general remark, I not think it is appropriate to add
> substantial portions of new code during or following a review. Instead,
> I suggest publishing the library with review *changes* incorporated and
> adding the additional portions as a first update, which shall undergo
> another review.
> Beman, you may want to add this to your "Formal Library Review Process"
> document or suggest a different procedure.
I just asked Beman on Monday on how I should proceed. He also suggested
that I send suggested additions to you after the review has ended.
The uniform_in_sphere code we have currently is Fortran 77. We will
convert it next week.
>
> > 3.) Performance issues
> >
> > Even using fast generators we often spend 10%20% of our CPU time on
> > generating random numbers. Using a slow generator like the Mersenne
> > twister this grows to more than 95%.
>
> In my tests, Mersenne Twister seemed to be the fastest generator
> (see the table in the "introduction" section of the "generators" doc
> page). Of course, I've reimplemented it from the pseudocode and
> it's now fully inlined.
>
> Nonetheless, I look forward to seeing your "directly generate
> random floatingpoint number" approach.
The performance comparison was with the Blitz++ implementation
of the Mersenne twister.
I will send some of the floating point generatirs to you today
or tomorrow as soon as I have converted them to your concepts.
>
> > 3.a.) uniform_smallint
>
> > My proposal
> > is to change it to change the division by a bit shift >>
> > and to use a rejectance step to avoid quantization errors.
>
> I have refrained from using rejectance methods in the library,
> because it adds running time uncertainity which may not be
> acceptable for realtime or embedded systems. In order to limit
> the complexity of the boost submission, alternative implementations
> which use rejectance methods but are faster in the general
> case (for normal_distribution, for example) have been expressly
> removed.
>
> I may reconsider this policy if the review shows support for these
> options.
In production codes I always go for the fastest implementation.
I thus usually use floating point generators, even for something
like uniform_smallint. Maybe one could later add a version of
uniform_smallint that works based on floating point generators.
I agree that for the first submission performance should not be the
key criterion, but the design and interface are more important.
I still would like to see optimized implementations in the future.
Maybe an extra template parameter could be
introduced at a later stage specifying whether one wants
implementations with constant execution times (an no rejectance
methods) or the fastest implementations (with maybe variable execution
times).
>
> > 3.b.) uniform_sphere
>
> There are two issues:
>  How to pass the return value efficiently?
>  The current approach uses a runtimeconfigurable dimension. Is it
> useful to have a compiletime configurable dimension (i.e. template
> parameter) as well? If so, we could specialize it for 3 dimensions
> and unroll loops and/or use faster methods.
>
> > template<class UniformRandomNumberGenerator, class RealType>
> > std::vector<RealType>
> > uniform_sphere<UniformRandomNumberGenerator, RealType>::operator()()
> > {
> > result_type result(_dim);
> > //....
> > return result;
> > }
> >
> > might make sense but is extremely inefficient.
>
> Not really. For compilers with advanced return value optimization,
> the return type will be constructed in the caller's stack frame.
That's possible.
>
> > iii) this is the solution I prefer for large dimensions.
> > Provide an extra data member of type std::vectory<RealType>,
> > have the operator() write into that data member and
> > have it return a const reference to this data member.
>
> I like this approach. It avoids copying and dynamic memory
> management, if the caller is cooperative.
Does anyone see a case where this might lead to problems?
>
> > of course one need not return a std::vector but could
> > provide the output container as an additional template
> > parameter.
>
> Without delving into any std::vector vs. std::deque wars, this
> seems to be a usable generalization.
I would also like that, since in our applications I use
a blitz::TinyVector and would not like to be restricted to std::vector.
> > 4.) Quality
> >
> > 4.a) uniform_int
> >
> > This class provides random integers with more bits
> > than the base random number generatot by concatenating
> > several unform random integers. As in LCGs the low bits are
> > usually bad it is useless to concatenate it with another
> > random number. Instead of supporting this bad habit by a
> > library in the class I would remove it and rather encourage
> > the use of generators with more bits.
>
> I do not quite understand. Is it a bad habit to use LCGs
> (seconded, but they're wellknown and easy to understand) or
> is it a bad habit to have the concatenating uniform_int?
> The latter still seems to make sense for nonbraindamaged
> base generators. Sometimes, the orthogonality of the library
> (generators and distributions are orthogonal) hurts slightly.
I would never trust a concatenated uniform_int as long as I have
not tested the low bits very carefully. My point of view is that
if a user ever needs random integers with a large number of bits
he has to know exactly what he is doing an needs to test the
quality carefully. Thus:
>
> > Anyways, at least a strong
> > warning should be written into the documentation.
>
> Consider it done.
this warning is probably all that is needed.
> > 4.b) mersenne_twister
> > Since the Merseene Twister failed the random walk test badly
> > we do not use it anymore.
>
> Was that with my (possibly buggy) implementation or with the
> reference implementation?
It was with the optimized C version on the web page of the
authors. To be fair I have to add that most random number gernators
fail the test. The only one we found to pass it is the highest
level ranlux of the CERN library, which is even slower than the
Mersenne twister. The Fibonacci generators also fail this particular test,
but not as badly as the Mersenne twister. Thus we use the Fibonacci
generators for speed reasons.
>
> > 4.c.) tests of the generators
> > We now plan to also test all the
> > generators in this library and to publish the results once the tests
> > are finished.
>
> It would be tremendously useful to include a test suite with the
> boost random number library.
We will consider doing this. As the test suite is currently
Fortran 77 we will need some time to convert it.
>
> Thank you for your comments!
Thank you for your efforts in writing this library. It saves
me weeks of work!
Matthias Troyer
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk