Subject: Re: [Boost-users] [random]: several issues
From: Thomas Mang (thomasmang.ng_at_[hidden])
Date: 2009-09-24 09:00:01


Steven Watanabe wrote:
> AMDG
>
> Thomas Mang wrote:
>> Steven Watanabe wrote:
>>> Thomas Mang wrote:
>>>> After some time I came back to the Boost random library, and several
>>>> things I noticed earlier have not changed in years. I would like to
>>>> ask what the current thinking on these issues is.
>>>>
>>>> Here's my list:
>>>>
>>>> a) The library itself provides files for random deviates of
>>>> distributions not covered in the documentation (e.g. poisson, gamma).
>>>> I find it truly sad that implementation and documentation are
>>>> that far out of sync.
>>>>
>>>> b) Several distributions require the engine to return a uniform
>>>> deviate in [0,1), while others don't have this prerequisite. I
>>>> find this extremely error-prone and poorly documented (it is
>>>> documented, but it should be much clearer and, especially, louder).
>>>> Worse, I find it even harder (or impossible) to find out which range
>>>> the engines return. I am not sure if there is any engine returning
>>>> this range per se, or if I always have to go through uniform_01.
>>>> Worst of all, there is neither a compile-time check nor a runtime
>>>> check that the [0,1) result requirement of the engine holds (as far
>>>> as I have seen in the code). I absolutely fail to see why such a
>>>> critical assert is completely missing, especially given the poor
>>>> state of the documentation.
>>>> The poisson and gamma distributions, for example, also fall into this
>>>> category, but are not documented at all. I consider this highly
>>>> dangerous.
>>>
>>> You're not supposed to use the distributions directly.
>>> boost::variate_generator works with any engine and
>>> any distribution.
>>
>> Well, in C++ there are many things one is not supposed to do, but
>> that's not the point here. Does it really hurt that much to implement
>> an assert in the distributions that the random draw is restricted to
>> [0,1)? (For the engines provided by Boost this can, I think, mostly
>> be done even at compile time.)
>
> This should be easy. Patches welcome :).
>
> However, the new standard eliminates variate_generator and requires that
> every random number engine work with every distribution. Eventually,
> Boost.Random needs to be brought into line with this.

I am not up to date with the C++0x proposals. Will the random library
(conceptually) become part of C++0x?

>
>> Also, if I am not supposed to use the distributions directly, then
>> their interface is too public for my taste.
>
> The interface for distributions needs to be specified so that
> new distributions can be created that plug into the framework.
>
>>>> c) Random numbers are tightly linked to statistical distributions,
>>>> which are offered by a library of their own. Wouldn't it be
>>>> convenient to integrate the whole distribution part of the random
>>>> library more closely into that library? At present the two are
>>>> confusingly separate.
>>>
>>> Are you referring to the distributions in Boost.Math?
>>
>> Yes, boost.Math/Statistical Distributions.
>>
>> Note that, in general, a random draw can be obtained from any
>> distribution through the inverse of the cdf; for some distributions
>> the draw is just 'particularly' simple.
>> I find it poor from a design point of view to offer random draws of
>> distributions in one library and distributions without random draws
>> in another library, and to keep them more or less separate (of course
>> I can manually use the inverse of the cdf and a uniform_01 random draw
>> to achieve what I want, so this is less a technical issue than an
>> organizational one).
>> My gut feeling is that the random library should worry only about
>> generating random numbers. How to turn these random numbers
>> into draws from distributions should be something the distribution
>> library worries about. But of course I know the random library was
>> developed long before the distributions library.
>
> Even for distributions for which there is no simple formula for generating
> a random variate, there are algorithms that are much more efficient than
> using the inverse cdf. Also, random distributions may need to maintain
> state that is not needed for any other use of the distributions. There
> needs to be some integration between Boost.Random and Boost.Math, but
> I'm not exactly sure how to go about it.

I am not familiar with the details of all the numerical algorithms for
drawing random numbers, but I fully agree that the draws should be
efficient and that an efficient algorithm should be used whenever one is
available. Just keep in mind that there is a major difference between
going through the numerical calculation of the inverse cdf once you have
a [0,1) input, and taking a [0,1) input and then using any algorithm you
like to convert it into the random draw. You seem to address the former,
while I am addressing the latter, that is, a common interface for the
input used to get a random draw.
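
To make the inverse-cdf route concrete, here is a minimal sketch that uses
only the C++0x-style standard <random> header so that it compiles without
Boost. The helper names (to_unit_interval, exponential_quantile,
draw_exponential) are made up for illustration and are not part of
Boost.Random or Boost.Math. The exponential distribution is used only
because its inverse cdf has a closed form.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical helper: map one raw engine output onto [0,1).
// This plays the role uniform_01 has in the discussion above.
// Assumption: the engine's range fits into the 53-bit mantissa of a
// double (true for std::mt19937, not for std::mt19937_64).
template <class Engine>
double to_unit_interval(Engine& eng)
{
    const double range =
        static_cast<double>(Engine::max() - Engine::min()) + 1.0;
    return static_cast<double>(eng() - Engine::min()) / range;
}

// Inverse cdf of the exponential distribution with rate lambda:
//   F(x) = 1 - exp(-lambda * x)  =>  F^{-1}(u) = -log(1 - u) / lambda
inline double exponential_quantile(double u, double lambda)
{
    assert(0.0 <= u && u < 1.0);  // the [0,1) precondition, checked at run time
    assert(lambda > 0.0);
    return -std::log1p(-u) / lambda;  // log1p(-u) == log(1 - u)
}

// One random draw: a uniform [0,1) input pushed through the inverse cdf.
template <class Engine>
double draw_exponential(Engine& eng, double lambda)
{
    return exponential_quantile(to_unit_interval(eng), lambda);
}
```

With std::mt19937 eng(42u), a call such as draw_exponential(eng, 2.0)
yields one draw with mean 0.5. The point is only that the [0,1) input is
the common interface; which algorithm sits behind it is a separate matter.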

One way I could think of an integration, from a pure design point of
view, is to specify nested classes within the distribution class
(like fisher_f_distribution<>::random) that more or less replace the
distributions in the random library. There could actually be two nested
classes: one that works roughly like variate_generator, and one that
accepts a [0,1) number as raw input. If there is an efficient algorithm
for the distribution, use it. If not, go through the inverse cdf. And
of course invocation should be possible with the generators of the
random library (in which case all the information about the range of the
random draws can be used, occasionally even at compile time), with a
general generator that provides the information about its bounds (either
at run time or at compile time), and finally with a [0,1) draw obtained
from wherever else, but of course with an assert that this precondition
holds.
This approach should not only ease the integration of random draws
for which the random library is not set up yet (via the inverse cdf, for
example; I prefer a slightly less efficient implementation over none at
all), but also provide a shared interface for additional distributions
not yet in math.distributions (like multivariate normal, von Mises etc.).
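
A rough sketch of the nested-class idea, to show the shape of the
interface rather than a finished design. Everything here is hypothetical:
exponential_dist stands in for a Boost.Math-style distribution class, and
the nested names from_unit and random are invented for this example. It
again uses only the standard <random> header and assumes an engine with an
unsigned result type whose range is known at compile time.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical distribution class (not a real Boost type).
class exponential_dist
{
public:
    explicit exponential_dist(double lambda) : lambda_(lambda)
    {
        assert(lambda > 0.0);
    }

    double lambda() const { return lambda_; }
    double quantile(double p) const { return inverse_cdf(p, lambda_); }

    // Nested class 1: accepts a [0,1) number as raw input and asserts
    // that the precondition holds.
    class from_unit
    {
    public:
        explicit from_unit(const exponential_dist& d) : lambda_(d.lambda()) {}

        double operator()(double u) const
        {
            assert(0.0 <= u && u < 1.0);
            // No faster algorithm is used in this sketch, so fall back
            // to the inverse cdf.
            return inverse_cdf(u, lambda_);
        }

    private:
        double lambda_;
    };

    // Nested class 2: works roughly like variate_generator. It binds an
    // engine and uses the engine's range, which is known at compile time.
    template <class Engine>
    class random
    {
        static_assert(Engine::min() < Engine::max(),
                      "engine must have a non-empty range");
        static_assert(static_cast<unsigned long long>(
                          Engine::max() - Engine::min()) < (1ULL << 53),
                      "engine range must fit into a double exactly");

    public:
        // The engine is held by reference and must outlive this object.
        random(Engine& eng, const exponential_dist& d) : eng_(eng), conv_(d) {}

        double operator()()
        {
            const double range =
                static_cast<double>(Engine::max() - Engine::min()) + 1.0;
            const double u =
                static_cast<double>(eng_() - Engine::min()) / range;
            return conv_(u);
        }

    private:
        Engine& eng_;
        from_unit conv_;
    };

private:
    // F^{-1}(u) = -log(1 - u) / lambda
    static double inverse_cdf(double u, double lambda)
    {
        return -std::log1p(-u) / lambda;
    }

    double lambda_;
};
```

Usage would look like exponential_dist::random<std::mt19937> gen(eng, d);
followed by gen(), or exponential_dist::from_unit conv(d); followed by
conv(u) for a [0,1) number u obtained elsewhere.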

Is there any interest in helping out with that, or in developing these
ideas further? (If you know there is none and no changes will be
accepted, I can save the time of thinking about it further.)

Thomas

