Subject: Re: [boost] math statistical distribution: multivariate gaussian
From: Stjepan Rajko (stjepan.rajko_at_[hidden])
Date: 2008-11-19 15:51:34


2008/11/19 Thijs van den Berg <thijs_at_[hidden]>:
>
> I see a couple of things that we could start working on; perhaps you have
> other or additional ideas!
>
> * define a name for the sandbox folder & create it.
>

How about sandbox/multivariate_distributions ?

If this is geared towards Boost.Math, we should probably consider that
library's directory structure. Currently, the statistical
distribution files appear to be placed as follows:

include files: ..../boost/math/distributions
docs: ..../libs/math/doc/sf_and_dist
tests: ..../libs/math/test
examples: ..../libs/math/example

I'm not sure whether we should re-use all of those directories or
change some (or all) of them.

> * start a doc in that folder where we collect the details: interfaces,
> functions, equations, algorithms.
> Can that be done in LaTeX and compiled to PDF? What would be a good doc format?

I would recommend QuickBook, as John suggested. I can set up the
basic files for an initial docs build once we decide on the directory
structure.

> ----
> These first two are probably the best way to start. John Maddock suggested
> starting with docs, and I agree; that should be covered by these
> first two points! More things that we will need to do are:
>
> * define a list of generic functions for generic multivariate densities
> (non-member properties) along the lines of this:
> http://www.boost.org/doc/libs/1_37_0/libs/math/doc/sf_and_dist/html/math_toolkit/dist/dist_ref/nmp.html
>

I'd follow John's suggestion and start with the subset of
that list that applies to multivariate distributions. If we start
adding things and this ends up in Boost.Math, then the same things we
add for multivariate distributions should probably also be added for
the univariate distributions (if they apply) for consistency.
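
To make the idea concrete, here is a rough sketch of how the univariate
non-member style (pdf(dist, x), mean(dist)) might carry over to a
multivariate type. The class name diagonal_normal and its interface are
hypothetical, and the covariance is assumed to be diagonal only so that
the sketch needs no matrix library; none of this is existing Boost.Math
code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical distribution type: a normal distribution with independent
// components (diagonal covariance), used here only for illustration.
class diagonal_normal {
public:
    diagonal_normal(const std::vector<double>& mean,
                    const std::vector<double>& variance)
        : mean_(mean), variance_(variance) {
        assert(mean_.size() == variance_.size());
    }
    const std::vector<double>& mean() const { return mean_; }
    const std::vector<double>& variance() const { return variance_; }
private:
    std::vector<double> mean_;
    std::vector<double> variance_;
};

// Non-member accessor: the mean is now a vector rather than a scalar.
inline std::vector<double> mean(const diagonal_normal& d) {
    return d.mean();
}

// Non-member accessor: the density takes a vector argument.
// Accumulates the log-density per component, then exponentiates.
inline double pdf(const diagonal_normal& d, const std::vector<double>& x) {
    assert(x.size() == d.mean().size());
    const double two_pi = 6.283185307179586;
    double log_density = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double diff = x[i] - d.mean()[i];
        const double var = d.variance()[i];
        log_density += -0.5 * std::log(two_pi * var) - 0.5 * diff * diff / var;
    }
    return std::exp(log_density);
}
```

Calling pdf(d, x) and mean(d) then reads the same as for the univariate
distributions; only the argument and return types change.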

> Some things that I need to implement, as a user, for another project
> can be seen in this list:
> http://www.cs.toronto.edu/~roweis/notes/gaussid.pdf
> ...and there are many more things in use related to multivariate
> Gaussians. E.g. a lot of machine learning projects work with multivariate
> Gaussians and need parameter estimation from data. Some of these things
> might be too specific to add to Boost distributions, and could fill up a
> whole "Gaussian lib" by themselves! I don't know.

I think parameter estimation from data would be a very useful thing to
add, but if we do we should keep all distributions in mind.
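
For the Gaussian case, estimation from data could be as simple as the
sample mean and sample covariance. The sketch below assumes the unbiased
(n - 1) covariance estimator and uses plain std::vector containers; the
function names sample_mean and sample_covariance and the typedefs are
hypothetical, not a proposed interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> sample_t;                // one observation
typedef std::vector<std::vector<double> > matrix_t;  // dense square matrix

// Component-wise average of all observations.
inline sample_t sample_mean(const std::vector<sample_t>& data) {
    assert(!data.empty());
    sample_t m(data[0].size(), 0.0);
    for (std::size_t k = 0; k < data.size(); ++k)
        for (std::size_t i = 0; i < m.size(); ++i)
            m[i] += data[k][i];
    for (std::size_t i = 0; i < m.size(); ++i)
        m[i] /= static_cast<double>(data.size());
    return m;
}

// Unbiased sample covariance: sum of outer products of the deviations
// from the mean, divided by (n - 1).
inline matrix_t sample_covariance(const std::vector<sample_t>& data) {
    assert(data.size() >= 2);
    const sample_t m = sample_mean(data);
    const std::size_t dim = m.size();
    matrix_t c(dim, sample_t(dim, 0.0));
    for (std::size_t k = 0; k < data.size(); ++k)
        for (std::size_t i = 0; i < dim; ++i)
            for (std::size_t j = 0; j < dim; ++j)
                c[i][j] += (data[k][i] - m[i]) * (data[k][j] - m[j]);
    for (std::size_t i = 0; i < dim; ++i)
        for (std::size_t j = 0; j < dim; ++j)
            c[i][j] /= static_cast<double>(data.size() - 1);
    return c;
}
```

Other distributions would need their own estimators, which is why the
interface should not be designed around the Gaussian case alone.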

>
> We might also look at other mathematical packages like MATLAB, R, and Octave to
> see what they do with multivariate distributions.
>
> * using that list, we will see what kinds of matrix operations we will need,
> and that will let us decide between taking a dependency on
> ublas or others, or keeping it free of external dependencies and implementing it
> ourselves.

I think the concept-based way is the way to go. We can let the user
provide the matrix type, as long as it provides the operations we
need. Maybe we can use ublas matrices as the default type if they are
sufficient, since ublas is already in Boost (and header-only), and
maybe test with some other libraries just to make sure we're not
requiring syntax that is too ublas-specific.
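
As a rough illustration of the concept-based approach, the function
below computes the quadratic form x^T A x (the kind of term that shows
up in a Gaussian exponent) and requires only element access a(i, j) on
the matrix and x[i] plus size() on the vector. The names quadratic_form
and row_major_matrix are hypothetical; row_major_matrix stands in for
any user-supplied type. ublas::matrix<double> also offers element access
through operator()(i, j), so it should be able to satisfy the same
requirement, but that is not tested here.

```cpp
#include <cstddef>
#include <vector>

// Generic over the matrix and vector types. Requirements (the "concept"):
//   a(i, j) is convertible to double; x[i] is convertible to double;
//   x.size() gives the dimension.
template <class Matrix, class Vector>
double quadratic_form(const Matrix& a, const Vector& x) {
    double sum = 0.0;
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            sum += x[i] * a(i, j) * x[j];
    return sum;
}

// Hypothetical user-supplied matrix type that models the requirement
// without depending on any particular linear algebra library.
class row_major_matrix {
public:
    row_major_matrix(std::size_t rows, std::size_t cols)
        : cols_(cols), data_(rows * cols, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) {
        return data_[i * cols_ + j];
    }
    double operator()(std::size_t i, std::size_t j) const {
        return data_[i * cols_ + j];
    }
private:
    std::size_t cols_;
    std::vector<double> data_;
};
```

Keeping the required operations this small is what would let us check
the library against matrix types other than ublas.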

Best,

Stjepan

