 # Boost :

From: Kris Thielemans (kris.thielemans_at_[hidden])
Date: 2003-06-04 18:41:01

Hi

I've had a look at random/poisson_distribution. The crucial part there goes

result_type operator()()
{
// TODO: This is O(_mean), but it should be O(log(_mean)) for large
_mean
RealType product = RealType(1);
for(int m = 0; ; ++m) {
product *= _rng();
if(product <= _exp_mean)
return m;
}
}

where _rng is a UniformRandomNumberGenerator and _exp_mean = exp(-_mean).

I must say I don't really understand why this would give you a Poisson
random number. I do think that it is terribly inefficient though as it call
rng potentially a lot of times.

My own code looks rather different. It is based on inverting the comulative
distribution for Poisson statistics. (Also, for large mean I use the usual
'normal' approximation).

Unfortunately, my code does not look like boost::random framework yet (it's
just a function), but I'd be happy to let somebody else to do that for me
:-). Here it is

typedef boost::mt19937 base_generator_type;
// initialize by reproducible seed
static base_generator_type generator(boost::uint32_t(42));
boost::uniform_01<base_generator_type,char> random01(generator);

// Generate a random number according to a Poisson distribution
// with mean mu
int generate_poisson_random(const float mu)
{
// check if mu is large. If so, use the normal distribution
// note: the threshold must be such that exp(threshold) is still a floating point number
if (mu > 60.F)
{
boost::normal_distribution<base_generator_type> randomnormal(generator, mu, sqrt(mu));
const double random = randomnormal();
return random<0 ? 0 :
round(random); // note: this is just floating point rounding
}
else
{
double u = random01();

// prevent problems of n growing too large (or even to infinity)
// when u is very close to 1
if (u>1-1.E-6)
u = 1-1.E-6;

const double upper = exp(mu)*u;
double accum = 1.;
double term = 1.;
int n = 1;

while(accum <upper)
{
accum += (term *= mu/n);
n++;
}

return (n - 1);
}
}

As you see, this calls the random number generator only once. Should be far more efficient I think. Am I wrong?

I realise that one would need a 'proof' that this works. If necessary I can dig up the math. I did also check some histograms with 1 million random numbers or so.

Finally, a good implementation would do something about the more or less arbitrary constants (ok. you could just make them part of the template).

Any reactions/suggestions?

All the best

Kris Thielemans
Imaging Research Solutions Ltd