Boost logo

Boost :

From: Jens Maurer (jmaurer_at_[hidden])
Date: 2000-05-30 17:04:18


The Formal Review Guidelines don't say something about whether
the author of a library may comment on review comments.

I'll try anyway.

First of all, thank you for your helpful comments.

troyer wrote:
> 1.a.) the concept UniformRandomNumberGenerator specifies
>
> X::min_value
> X::max_value

These two are not relevant for floating point generators,
because compile-time constants must be of integral type.

> v.min()
> v.max()
>
> to be strict lower respective upper bounds. This is perfect
> for integer generators but is problematic for floating point
> generators that usually produce numbers in an interval
> [0,1), i.e. 0 <= x < 1.

I acknowledge the problem. In uniform_01, care has been taken
so that max() indeed returns 1-epsilon.

I have no clear preference on how to solve the general problem.
I think that a max() member function with different semantics
depending on integer vs. float may not be intuitive. On the
other hand, it seems to be really practical and easy. For
integer generators, there is no way around the [min,max]
interpretation of the borders due to the described range
issue. And yes, we do have LCGs with a modulus of 2^32.

> 1.b.) validation of floating point generators.
> As solutions I propose a
> modified validation function
>
> bool validation(T x)
>
> that checks if the calculated number x is within the expected numerical
> errors of a precalculated hardcoded one.

This is a more general approach to validation, which I shall
implement unless other reviewers have differing opinions.

> The documentation for all these distributions reads:
> It transforms a uniform distribution into a ....
>
> This is not completely correct. For the code to work
> it must be a distribution in [0,1) (compare 1.a), such
> as provided by the "uniform_01" generator.

I cannot confirm this observation. All distributions work
with arbitrary uniform distributions. One data member for the
distributions often reads

>>> uniform_01<base_type, RealType> _rng;

which means that any uniform distribution can be used, which
will be implicitly scaled to a [0,1)-distribution. I would
like to hear about distributions as implemented in my library
which are buggy in this (or any other) respect. I may still
have a problem regarding whether the max() value is inclusive
or exclusive, but this is a different issue (see above).

It is a topic worth a discussion whether it is desirable to
have this freedom of input generator in the first place.

> 2.b) uniform_sphere
>
> Our local random number expert has written a paper on this topic,
> and we have used his methods extensively. Thus I realized that
> the implementation of uniform_sphere is incorrect as it is.

No. (At least not in this particular respect.)

> It IS correct if the input random number
> generator produces Gaussian random numbers. Either the documentation
> should be changed or the normal_distribution class used inside
> uniform_sphere.

But it is: There is a data member in uniform_sphere which reads
exactly as you propose:

>>> normal_distribution<base_type, RealType> _rng;

> 2.c) more distributions
>
> I propose to replace uniform_sphere by two classes, uniform_on_sphere
> for points on the surface on a sphere and uniform_in_sphere for points
> anywhere inside a sphere.

This distinction clarifies the current "uniform_sphere" class, which
is indeed a "uniform_on_sphere". I shall rename "uniform_sphere" to
"uniform_on_sphere".

However, as a general remark, I not think it is appropriate to add
substantial portions of new code during or following a review. Instead,
I suggest publishing the library with review *changes* incorporated and
adding the additional portions as a first update, which shall undergo
another review.
Beman, you may want to add this to your "Formal Library Review Process"
document or suggest a different procedure.

> 3.) Performance issues
>
> Even using fast generators we often spend 10%-20% of our CPU time on
> generating random numbers. Using a slow generator like the Mersenne
> twister this grows to more than 95%.

In my tests, Mersenne Twister seemed to be the fastest generator
(see the table in the "introduction" section of the "generators" doc
page). Of course, I've re-implemented it from the pseudo-code and
it's now fully inlined.

Nonetheless, I look forward to seeing your "directly generate
random floating-point number" approach.

> 3.a.) uniform_smallint

> My proposal
> is to change it to change the division by a bit shift >>
> and to use a rejectance step to avoid quantization errors.

I have refrained from using rejectance methods in the library,
because it adds running time uncertainity which may not be
acceptable for real-time or embedded systems. In order to limit
the complexity of the boost submission, alternative implementations
which use rejectance methods but are faster in the general
case (for normal_distribution, for example) have been expressly
removed.

I may reconsider this policy if the review shows support for these
options.
 
> 3.b.) uniform_sphere

There are two issues:
 - How to pass the return value efficiently?
 - The current approach uses a runtime-configurable dimension. Is it
useful to have a compile-time configurable dimension (i.e. template
parameter) as well? If so, we could specialize it for 3 dimensions
and unroll loops and/or use faster methods.

> template<class UniformRandomNumberGenerator, class RealType>
> std::vector<RealType>
> uniform_sphere<UniformRandomNumberGenerator, RealType>::operator()()
> {
> result_type result(_dim);
> //....
> return result;
> }
>
> might make sense but is extremely inefficient.

Not really. For compilers with advanced return value optimization,
the return type will be constructed in the caller's stack frame.

> iii) this is the solution I prefer for large dimensions.
> Provide an extra data member of type std::vectory<RealType>,
> have the operator() write into that data member and
> have it return a const reference to this data member.

I like this approach. It avoids copying and dynamic memory
management, if the caller is co-operative.

> of course one need not return a std::vector but could
> provide the output container as an additional template
> parameter.

Without delving into any std::vector vs. std::deque wars, this
seems to be a usable generalization.

> 3.c.) floating point generators

Please send them in.

> 4.) Quality
>
> 4.a) uniform_int
>
> This class provides random integers with more bits
> than the base random number generatot by concatenating
> several unform random integers. As in LCGs the low bits are
> usually bad it is useless to concatenate it with another
> random number. Instead of supporting this bad habit by a
> library in the class I would remove it and rather encourage
> the use of generators with more bits.

I do not quite understand. Is it a bad habit to use LCGs
(seconded, but they're well-known and easy to understand) or
is it a bad habit to have the concatenating uniform_int?
The latter still seems to make sense for non-braindamaged
base generators. Sometimes, the orthogonality of the library
(generators and distributions are orthogonal) hurts slightly.

> Anyways, at least a strong
> warning should be written into the documentation.

Consider it done.

> 4.b) mersenne_twister
> Since the Merseene Twister failed the random walk test badly
> we do not use it anymore.

Was that with my (possibly buggy) implementation or with the
reference implementation?

> 4.c.) tests of the generators
> We now plan to also test all the
> generators in this library and to publish the results once the tests
> are finished.

It would be tremendously useful to include a test suite with the
boost random number library.

Thank you for your comments!

Jens Maurer


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk