|
Boost : |
From: jeetsukumaran_at_[hidden]
Date: 1999-11-05 14:45:19
I got your e-mail, but your ISP bounced my reply -- both to your
original address an your @boost.org address, so I'm posting my reply
here. Apologies to people who might get offended.
Hello Beman:
1. Regarding the copyright
Short answer: no problem, as long as it does not violate the license of
the
original C source of Nishimura.
I am a firm believer that high-quality, peer-reviewed, freely-available
open-source code SHOULD be the way of future programming. The only
reasons
I placed MersenneTwister under the GNU GPL:
(a) the original C source code was under the GNU GPL. My reading of the
license led me to think that any work based on work released under this
license needs to be licensed under it too.
(b) I was aware that I needed to copyright my code, and provide a
license
under which it can be used. Legalese, capitalism, international
copyright
law etc. are not my forte. In fact I am utterly ignorant on this score.
The GNU "copyleft" seemed to provide a ready-made solution that was
reasonably acceptable to me (i.e. I thought it DID mean freely-available
open-source code).
I have no problems placing MersenneTwister, or any other code I write
for
the boost library, under whatever license-scheme is suitable and/or
necessary. I am not aware if code can be copyrighted and placed in the
public-domain at the same time (I thought it couldn't) ... but if you
think
so, or if you have any other suggestions, I'd be happy to hear and
consider
it.
That takes care of (b). (a), however, is a bigger problem. Did I
misinterpret the GNU license? Would it be possible for me to have
released
the MersenneTwister code under whatever license I saw fit? The core
algorithm, i.e. MT19937B, is, of course, identical in both
implementations.
At the same time, the simple fact that MersenneTwister is implementated
as a
class in C++, whereas the original was a C function, means that the two
are
radically different animals.
I leave it to you to figure out the copyright complexities, since you
probably know better :). I wrote the code myself, at home, on my own
time,
using Nishimura's original source code as a guide. If I have the full
freedom to release the code under whatever license that I like, then I
shall
unhesitatingly do so. I've attached Nishimura's C source code for you
to
compare.
Basically, if I can legally re-release the code under the HP or SGI
copyright scheme, or release a different version of it under that
scheme, I
will. I will see if I can find a sample license, or maybe you could
e-mail
me a suitable one to use as a template?
2. The Interface
I really like the idea of a standardized interface for random number
generators (and, for that matter, for any other classes that can be
grouped -- e.g. streams, encryption and so on).
I believe gnu has this common base class that all their rng's derived
from,
as one of your suggestions. This seems like a good solution -- it is
not
some heavy-handed externally-enforced convention, but an internal one
enforced by the mechanisms of the language itself: if you want your prng
class to conform and be used as easily and interchangably as any prng
class
in the library, you need to fit your class in the hiearchy, and to do
so,
you need to provide, at a minimum, methods which implement or override
methods in the base class.
Personally, though, I am leaning toward your second suggestion -- a
template
parameterized on the prng used. The actual prng classes would be (at a
minimum) simple function objects, returning a 32-bit random integer when
their operator() is invoked. The template will provide the various
random
number request protocols, and manipulate the return value from the prng
object to fulfill these requests (e.g. integers, reals, integers within
a
certain range, reals within a certain range and so on). This seems not
only
elegant, but a much more flexible option -- if later, a new random
number
request protocol needs to be added, it can be added to the template,
and the
extant functors need not be changed.
That still leaves the question of who and how to decide what would be a
good
interface -- and I know this might work out to be a sticky point. The
set
of methods I provided with MersenneTwister seemed to me, to quote Meyer:
"minimal but complete":
- I deliberately left out some methods like getInteger(a, b), where the
return value is between a and b -- I figured that I, or anyone else,
could
simply use a + getInteger(n), where n=b-a, instead.
- At the same time, I felt that I needed two real-returning methods,
one
for the range of [0,1] and the other [-1,1]. More than any other, even
more
than getInteger(), I use both of these a LOT.
- The checkProbability(p) is a matter of debate, and I finally decided
to
leave it in -- decision-making circumstances arise frequently enough,
and in
client code it was both simpler as well as very nicely readable to
write "if
(mt.checkProbability(0.5)) { }" than "if (mt.getReal() < 0.5) { }".
- the member template function I included was an experiment. I was
thinking -- maybe I need a random character? Or a random signed integer
etc.. Of course, all the computation to convert the 32-bit integer
random
value to whatever type needed by the client code could be done by the
client
code itself, but since I was writing a class ... why not provide the
method?
And so, I felt that the template function provided a way to cater to
all my
"maybe's" without creating an obese interface.
3. Integer types
I had no idea that cstdint.hpp existed! Well I do now, and I cannot see
any
real reason not to use them. I don't suppose there is a similar set of
standard typedefs for real types, are there? I generally like
typedef'ing
all my built-in types -- it makes maintenance and experimentation a
whole
lot easier.
4. Naming convention
Now THIS might be a bit painful :). But, while I'm quite attached to
the
BiCapital and camelCase conventions, I really don't see any reason why I
cannot write in the all_lower_case_with_underscore style of the
standard C++
library. So yes, I'm willing to rewrite MersenneTwister to be
"mersenne_twister" (or "m_twist" or "mt_19937b" or whatever). Same
goes for
the methods and so on. I presume that this only extends to the
interface --
I am free to retain the trailing underscore for instance variables?
Speaking of names, I used "getInteger" for lack of anything better.
I'm not
entirely happy with it since "get" is traditionally used for retrieving
an
object's properties -- which is not quite what it is happening here.
Furthurmore, I am not happy at all with "getNoise," but could not find a
satisfactory way to distinguishing between something that returns a
real in
[0,1] and something that returns a real in [-1,1]. Perhaps
gen_univariate_11 and gen_univariate_01 (with gen_univariate_i for the
integer)? I don't know -- that is for the person or persons who design
the
template to decide!
As a side-note, regarding prng's in particular, I should note that I am
not
very qualified in this field. It would help if there were people who
were
who could review/make suggestions, especially with regards to
optimization
techniques that don't impact the accuracy or randomness of the
generator. I
believe dependibility and accuracy is of paramount importance, more so
than
speed, and thus, when faced with a choice, not being informed enough to
decide whether I could get more speed and yet retain the uniform random
distribution, took the safe, if more clumsy way. For example, a common
practice is to take the modulus of the randomly-generated integer to
fit it
within a range (genint() % 100 to get a number between 0 and 99). This
is
an extremely BAD thing to do with a linear congruential prng, which are
usually weak in the low bits, and consequently have their uniform random
properties destroyed (or so I've read -- see Numerical Recipes).
However,
is this practice acceptable with other prng methods? More to the point
here, is the MT19937B Mersenne Twister a linear congruential generator
(I
really don't know!! -- shows you how little I understand what's going
on)?
Is the Mersenne Twister's (EXCELLENT) psuedo-random number generation
properties adversely affected if we take the modulus of its output? In
my
implementation, I took the safe route, and went for division instead --
this
neccessiated some hackery to keep the numbers within the desired range.
Then there is a question of generating random numbers with specific
distributions -- e.g. gaussian, poisson etc. These all use uniform
deviate
generators as a basic component, and it would be a good idea to design
the
library's interface and/or inheritance hierarchy (if any) to take this
into
accout. Again, I am out of depth here, and have to leave it to wiser
souls
to figure it out.
Anyway, I am very keen to contribute to boost. I think a standardized,
peer-reviewed, free code repository for portable C++ solutions is a
FANTASTIC idea. I am willing to work along with whatever standards,
conventions and requirements that are in place (or if unhappy with them
-- suggest changes!) to help the effort. I hope to hear from you soon. -- jeet
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk