
From: Brook Milligan (brook_at_[hidden])
Date: 2007-05-02 14:09:46
This note is to announce the release of a library for handling
probability and likelihood quantities, together with their logarithms,
in a consistent and (when desired) transparent manner. Ultimately, I
think this could be useful to the Boost community (e.g., it may find
application within the recently reviewed math toolkit or be useful
together with the recently accepted units library, from which it draws
certain ideas). I would however appreciate feedback on that point, as
well as on ways to improve its design, implementation, or
documentation. What follows is an overview, and an indication of a
couple of areas I know I need help with. More information is also
available at
http://biology.nmsu.edu/software/probability
The code may be downloaded at
ftp://biology.nmsu.edu/pub/software/probability/probability0.1.tar.gz
OVERVIEW
As you know, both probabilities and likelihoods recur throughout
statistical models. Consequently, those quantities must be
represented by the type system in computational statistical models.
Commonly, this is done by using a suitable native floating point type
(e.g., double). One specific drawback with this approach is that the
compiler cannot enforce the correct usage of types and confusion may
arise between probabilities/likelihoods and other floating point
types.
More insidious, however, is the confusion likely to arise when the
computations involve both these quantities and their logarithms.
Indeed, the solution to many statistical models involves logarithms of
probabilities and likelihoods more than the natural quantities.
However, models that require both are not uncommon.
If a native floating point type is used for probabilities and
likelihoods, the programmer must not only distinguish those from other
floating point quantities (e.g., parameters), but also must
distinguish them from their logarithms. Subtle mistakes easily result
from forgetting which domain (linear or logarithm) a quantity refers
to.
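As a rough illustration of the kind of mistake distinct types can rule out, here is a minimal sketch. The names linear_prob, log_prob, to_log, to_linear, and log_joint are hypothetical and are not the library's interface; they only show how explicit constructors keep a bare double, a linear-domain value, and a log-domain value from being mixed up silently.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Hypothetical wrapper types for illustration only (not the library's API).
struct linear_prob {
    double value;  // stored in the linear domain, expected within [0,1]
    explicit linear_prob(double v) : value(v) { assert(v >= 0.0 && v <= 1.0); }
};

struct log_prob {
    double value;  // stored as a natural logarithm, expected to be <= 0
    explicit log_prob(double v) : value(v) { assert(v <= 0.0); }
};

// Changing domain is always an explicit, named operation.
inline log_prob to_log(linear_prob p) { return log_prob(std::log(p.value)); }
inline linear_prob to_linear(log_prob p) { return linear_prob(std::exp(p.value)); }

// Joint probability of independent events, computed in the log domain.
// Passing a linear_prob or a bare double here does not compile.
inline log_prob log_joint(log_prob a, log_prob b) {
    return log_prob(a.value + b.value);
}

// The compiler rejects the implicit conversions that cause the confusion.
static_assert(!std::is_convertible<double, log_prob>::value,
              "a bare double is not silently treated as a log probability");
static_assert(!std::is_convertible<linear_prob, log_prob>::value,
              "a linear probability is not silently treated as its logarithm");
```

With plain doubles, both of the rejected conversions would compile without complaint.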
Much clearer code would result if the following syntax were available
to express the intents that (i) the likelihood should be calculated as
a product across independent observations, and (ii) it should be
accumulated using logarithms to avoid underflow.
  probability poisson (unsigned int i); // probability model
  log_likelihood l;
  // product of likelihoods across a series of independent observations
  for (observations::const_iterator i = obs.begin(); i != obs.end(); ++i)
    l *= poisson(*i);
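To make the underflow point concrete, the following is a self-contained sketch of how such a loop could behave. The probability and log_likelihood types below are simplified, hypothetical stand-ins rather than the library's actual classes, and the lambda parameter of poisson and the likelihood_of helper are assumptions added so the example compiles on its own.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Simplified, hypothetical stand-ins for the types in the snippet above.
struct probability {
    double value;  // linear domain
    explicit probability(double v) : value(v) {}
};

class log_likelihood {
public:
    log_likelihood() : log_value_(0.0) {}  // log(1), the identity for products
    // A product in the linear domain is a sum of logarithms.
    log_likelihood& operator*=(probability p) {
        log_value_ += std::log(p.value);
        return *this;
    }
    double log_value() const { return log_value_; }
private:
    double log_value_;
};

// Poisson probability mass function; the rate parameter lambda is an
// assumption added for this sketch.
inline probability poisson(unsigned int k, double lambda) {
    assert(lambda > 0.0);
    // exp(k*log(lambda) - lambda - log(k!)), using log(k!) = lgamma(k + 1)
    return probability(std::exp(k * std::log(lambda) - lambda - std::lgamma(k + 1.0)));
}

// Product of likelihoods across independent observations, accumulated as logs.
inline log_likelihood likelihood_of(const std::vector<unsigned int>& obs, double lambda) {
    log_likelihood l;
    for (std::vector<unsigned int>::const_iterator i = obs.begin(); i != obs.end(); ++i)
        l *= poisson(*i, lambda);
    return l;
}
```

For a few thousand observations, the same product formed directly on doubles underflows to zero, while the accumulated logarithm remains an ordinary finite number.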
The purpose of this library is to encapsulate both probability and
likelihood quantities within an appropriate set of types, while
simultaneously achieving the following design goals.
- Provide both convenient default and more flexible advanced types for
  representing all probability and likelihood quantities.
- Maintain type safety between probabilities and likelihoods, and
  between their corresponding native quantities and logarithms.
- Incur no runtime cost. That is, all type manipulations should be
  performed at compile time.
- Impose no limitations on the type used to represent the value of a
  probability or likelihood, beyond the obvious requirement that it
  models a real number within a suitable domain.
- Ensure that the layout of arrays and structures of probability and
  likelihood quantities is identical to the corresponding layout for
  the underlying value type.
- Verify the validity of probability values within the closed domain
  [0,1] and likelihood values within the closed domain [0,infinity],
  both as native quantities and as logarithms.
- Support the validator concept to allow either complete removal of
  the validation checks, thereby eliminating their runtime cost, or
  replacement with an alternative.
- Provide all appropriate arithmetic operations, while limiting
  implicit type conversions to those absolutely necessary and
  naturally expected.
- Provide consistent semantics for the arithmetic operations in terms
  of their definitions for native quantities in the linear domain.
  One consequence of this is that in many contexts the quantities can
  be regarded simply as probabilities or likelihoods without respect
  to their representational domain. Another important consequence is
  that generic algorithms may be constructed based on a common set of
  operators that retain their semantics across the representational
  domains.
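The validator and layout goals in the list above can be sketched roughly as follows. This is one possible policy-based arrangement, not the library's implementation; the names basic_probability, no_validation, asserting_validation, and throwing_validation are all hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical validator policies; each supplies a static check().
struct no_validation {
    // Empty inline function: a compiler is expected to optimize it away.
    template <typename T> static void check(const T&) {}
};

struct asserting_validation {
    // Checked in debug builds only; disabled when NDEBUG is defined.
    template <typename T> static void check(const T& v) {
        assert(v >= T(0) && v <= T(1));
    }
};

struct throwing_validation {
    template <typename T> static void check(const T& v) {
        if (!(v >= T(0) && v <= T(1)))
            throw std::domain_error("probability outside [0,1]");
    }
};

// Hypothetical probability type parameterized on value type and validator.
template <typename Value = double, typename Validator = throwing_validation>
class basic_probability {
public:
    explicit basic_probability(Value v) : value_(v) { Validator::check(v); }
    Value value() const { return value_; }
private:
    Value value_;  // sole data member, so the wrapper adds no storage
};

// Layout matches the underlying value type, singly and in arrays.
static_assert(sizeof(basic_probability<double>) == sizeof(double),
              "wrapper has the same size as its value type");
static_assert(sizeof(basic_probability<float, no_validation>[4]) == sizeof(float[4]),
              "arrays of wrappers have the same size as arrays of values");
```

Choosing the validator as a template parameter means the decision is made at compile time, so code that selects no_validation pays nothing for checks it does not use.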
Given these design goals, when the types provided by the library are
used, the compiler can enforce type safety, provide conversions as
required, and guarantee that the intent is directly expressed in the
source code. Furthermore, any type that models a real number may be
used as the value type for probabilities and likelihoods. If a double
is sufficient, however, the simple default types make the task of
instantiating these quantities easier.
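As a sketch of what a shared set of operators buys, the generic function below works unchanged whether the quantity is held in the linear or the logarithmic domain, and for more than one value type. The types linear_likelihood and log_likelihood_t and the function product are hypothetical names for illustration, not the library's own.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical linear-domain likelihood over an arbitrary real value type.
template <typename Value>
struct linear_likelihood {
    Value v;  // the likelihood itself
    linear_likelihood() : v(1) {}
    explicit linear_likelihood(Value x) : v(x) { assert(x >= Value(0)); }
    linear_likelihood& operator*=(const linear_likelihood& o) { v *= o.v; return *this; }
    Value linear() const { return v; }
};

// Hypothetical log-domain likelihood; constructed from a linear-domain value.
template <typename Value>
struct log_likelihood_t {
    Value lv;  // natural logarithm of the likelihood
    log_likelihood_t() : lv(0) {}
    explicit log_likelihood_t(Value x) : lv(std::log(x)) { assert(x >= Value(0)); }
    // Same linear-domain meaning as above: a product, implemented as a sum of logs.
    log_likelihood_t& operator*=(const log_likelihood_t& o) { lv += o.lv; return *this; }
    Value linear() const { return std::exp(lv); }
};

// Generic algorithm written once against operator*=, independent of domain.
template <typename L, typename Iterator>
L product(Iterator first, Iterator last) {
    L result;  // default-constructed to the multiplicative identity
    for (; first != last; ++first)
        result *= L(*first);
    return result;
}
```

Calling product<linear_likelihood<double> > and product<log_likelihood_t<double> > on the same range should agree up to rounding, which is the sense in which the operators retain their semantics across domains.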
AREAS NEEDING ASSISTANCE
The following are a few items I can identify that I need assistance
with.
- Compiler specifics: This is thoroughly tested with g++ (v3 and v4),
  but I need information on its portability.
- The test cases seem to require BOOST_TEST_DONT_PRINT, even though
  stream operators are present. Although the current code works, it
  seems incorrect to require that macro.
- Some tests cannot use BOOST_CHECK_EQUAL when they otherwise seem
  like they should. The problem appears to be an interaction with the
  printing of test cases and may be related to the previous item.
- Being new to the Boost community, I am unfamiliar with how to
  incorporate the tests and the documentation into the Boost
  framework. Guidance here is welcome.
- Anything else that I may have overlooked.
Thanks for your input. I hope this library is close to Boost
standards and can be improved to the point of being a worthwhile
inclusion within the set of libraries.
Brook Milligan
Internet: brook_at_[hidden]
Department of Biology
New Mexico State University
Las Cruces, New Mexico 88003 U.S.A.
Telephone: (505) 6467980
FAX: (505) 6465665