

From: Paul A Bristow (pbristow_at_[hidden])
Date: 2007-08-16 14:14:03

>-----Original Message-----
>From: boost-bounces_at_[hidden]
>[mailto:boost-bounces_at_[hidden]] On Behalf Of Brook Milligan
>Sent: 15 August 2007 22:48
>To: boost_at_[hidden]
>Subject: [boost] [Probability] version 0.2.3 released
>In order to better motivate the need for the Boost Probability
>library, I have updated the documentation, which is accessible at
>Although this constitutes a new release, the only difference is in
>documentation. As a result, the contents of v0.2.2 in the Boost Vault
>still reflect exactly the most recent release and I haven't uploaded a
>new copy.
>The new motivational example is taken from the problem of ascertaining
>the long-term trend of global climate. One database used to assess
>this is available from the NOAA National Climate Data Center. It
>contains monthly data for thousands of stations worldwide, in many
>cases for decades. Today's version, for example, contains 590,543
>records of mean temperature. A typical likelihood calculation
>evaluating a model of climate would involve a product of likelihoods
>across all of these records, almost certainly yielding a result on the
>order of 10^{-600,000} or less. Such numbers cannot be handled using
>typical floating point representations, so specialized solutions of
>some form are required. The natural method is to accumulate the sum
>of logarithms of likelihoods, rather than the product of likelihoods,
>across the dataset. This keeps the values within suitable bounds, but
>requires keeping track of the fact that different types of values
>(probabilities, likelihoods, and log likelihoods) are being used
>throughout a typical program. If these are all represented using
>native types, such as double, it is easy to lose track of the fact
>that they have different semantics.
>A real solution of this problem would include modules taking care of
>calculating the probability of each individual data record and modules
>taking care of accumulating that information across the records. The
>problem is complex enough that each of these responsibilities would
>realistically be divided across many units and it would not be
>unreasonable to expect development to be divided among many
>programmers. In such situations it is all too easy to lose track of
>which semantics apply to a specific value, because the only information
>available in the code is the data type (e.g., double), which provides
>little help, plus some (perhaps untrustworthy) comments that may or may
>not be read and in any case cannot affect the compiler.
>Using the Probability library, one can encode the exact semantics
>using the type system in a way that lends itself to generic
>programming. The resulting clarity, safety, and maintainability are
>retained regardless of how large the code base becomes and how the
>operations are distributed across modules and/or programmers.
>As a result of these features, I feel that this library makes a
>significant contribution to solving a well-defined set of problems
>that occur in certain types of scientific programming and modeling. I
>hope you will take a serious look at its capabilities and provide me
>with further feedback. I am especially interested in improving the
>portability of the code and need testers with access to compilers
>other than g++.

Thanks for this further motivational example: it had previously seemed a bit of a sledgehammer to crack a nut, but I now see
situations in which it could provide much more safety than I had imagined.

I am busy trying to help John Maddock get the Math Toolkit 'out of the door' and fully into Boost (1.35?) but I will return to look
at this in more detail.


Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS

Boost list run by bdawes at, gregod at, cpdaniel at, john at