
Boost : 
From: Paul A Bristow (pbristow_at_[hidden])
Date: 2006-07-11 11:02:55
> -----Original Message-----
> From: boost-bounces_at_[hidden]
> [mailto:boost-bounces_at_[hidden]] On Behalf Of Deane Yang
> Sent: 11 July 2006 15:11
> To: boost_at_[hidden]
> Subject: Re: [boost] [math/staticstics/design] How best to
> name statistical functions?

> So let's use the Student's t distribution as an example. The Student's t
> distribution is a *family* of 1-dimensional distributions
> that depend on a single parameter, called "degrees of freedom".
Does the word *family* imply integral degrees of freedom?
Numerically, and perhaps conceptually, it isn't: it's a continuous real.
So could one also regard it as a two-parameter function f(t, v)?
However, I don't think this matters here.
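For reference, here is the standard Student's t density written explicitly as a function of both t and the degrees of freedom ν. Since the gamma function is defined for all positive reals, nothing in the formula requires ν to be an integer:

```latex
f(t, \nu) = \frac{\Gamma\!\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)}
            \left(1 + \frac{t^{2}}{\nu}\right)^{-\frac{\nu + 1}{2}},
            \qquad t \in \mathbb{R},\ \nu > 0
```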
> Given a value, say, D, for the degrees of freedom, you get a density function p_D and
> integrating it gives you the cumulative density function P_D.
What about the Qs (the complements)?
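In the notation of the quoted text, the complement Q is simply the upper-tail probability:

```latex
P_D(t) = \int_{-\infty}^{t} p_D(s)\,ds, \qquad
Q_D(t) = 1 - P_D(t) = \int_{t}^{\infty} p_D(s)\,ds
```

When P_D(t) is very close to 1, evaluating Q_D directly avoids the cancellation error in computing 1 - P_D(t).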
> As I mentioned before, these should be member functions,
> which could be called "density" (also called 'mass')
> and "cumulative".
OTOH many books don't mention either of these words!
The whole nomenclature seems a massive muddle,
with mathematicians, statisticians, and users of all sorts using different terms,
and everyone thinks theirs is the 'Standard' :(
And the highest priority in my book is the END USERS,
not the professionals.
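To make the member-function proposal concrete, here is a minimal sketch for double only. The class name `students_t` and the complement's name `cumulative_complement` are hypothetical, chosen for illustration, and are not an existing Boost API. The CDF uses plain Simpson integration of the density as a stand-in; a real implementation would use the incomplete beta function, so this version is only reasonable for moderate |t|.

```cpp
#include <cmath>

// Hypothetical sketch: "density" and "cumulative" follow the quoted proposal;
// "cumulative_complement" is an assumed name for Q.
class students_t {
public:
    explicit students_t(double degrees_of_freedom) : v_(degrees_of_freedom) {}

    // Probability density p_D(t); v_ may be any real > 0.
    double density(double t) const {
        const double pi = 3.14159265358979323846;
        double log_norm = std::lgamma((v_ + 1) / 2) - std::lgamma(v_ / 2)
                          - 0.5 * std::log(v_ * pi);
        return std::exp(log_norm) * std::pow(1 + t * t / v_, -(v_ + 1) / 2);
    }

    // P_D(t): the density is symmetric about 0, so P = 0.5 +/- integral over [0, |t|].
    double cumulative(double t) const {
        double half = integrate_0_to(std::fabs(t));
        return t >= 0 ? 0.5 + half : 0.5 - half;
    }

    // Q_D(t) = 1 - P_D(t), from the same half-integral. (A production version
    // would evaluate the far tail directly to avoid cancellation.)
    double cumulative_complement(double t) const {
        double half = integrate_0_to(std::fabs(t));
        return t >= 0 ? 0.5 - half : 0.5 + half;
    }

private:
    // Composite Simpson's rule for the density over [0, b].
    double integrate_0_to(double b) const {
        const int n = 2000;  // must be even
        double h = b / n;
        double sum = density(0) + density(b);
        for (int i = 1; i < n; ++i)
            sum += density(i * h) * (i % 2 == 1 ? 4 : 2);
        return sum * h / 3;
    }

    double v_;  // degrees of freedom
};
```

With one degree of freedom (the Cauchy case), `students_t(1.0).cumulative(1.0)` should be very close to 0.75.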
> The cumulative density function is a strictly increasing function and
> therefore can be inverted. The inverse function could be called
> "inverse_cumulative", which is a completely unambiguous name.
But excessively long :(
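Because the CDF is strictly increasing, one simple (if slow) way to invert it is bisection. The sketch below uses hypothetical free-function names (`t_density`, `t_cumulative`, `t_inverse_cumulative`) and repeats a Simpson-rule CDF so that it compiles on its own. It assumes 0 < p < 1 with p not extremely close to either end, because the crude CDF loses accuracy far out in the tails.

```cpp
#include <cmath>

// Student's t density for real degrees of freedom v > 0 (hypothetical helper).
double t_density(double t, double v) {
    const double pi = 3.14159265358979323846;
    double log_norm = std::lgamma((v + 1) / 2) - std::lgamma(v / 2) - 0.5 * std::log(v * pi);
    return std::exp(log_norm) * std::pow(1 + t * t / v, -(v + 1) / 2);
}

// CDF via symmetry plus composite Simpson integration over [0, |t|];
// a stand-in for an incomplete-beta based implementation.
double t_cumulative(double t, double v) {
    const int n = 2000;  // must be even
    double b = std::fabs(t), h = b / n;
    double sum = t_density(0, v) + t_density(b, v);
    for (int i = 1; i < n; ++i)
        sum += t_density(i * h, v) * (i % 2 == 1 ? 4 : 2);
    double half = sum * h / 3;
    return t >= 0 ? 0.5 + half : 0.5 - half;
}

// Inverse of the strictly increasing CDF by bisection (hypothetical name).
double t_inverse_cumulative(double p, double v) {
    double lo = -1, hi = 1;
    while (t_cumulative(lo, v) > p) lo *= 2;  // widen the bracket downwards
    while (t_cumulative(hi, v) < p) hi *= 2;  // widen the bracket upwards
    for (int i = 0; i < 200 && hi - lo > 1e-12; ++i) {
        double mid = 0.5 * (lo + hi);
        if (t_cumulative(mid, v) < p) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

For example, `t_inverse_cumulative(0.75, 1.0)` should return approximately 1.0.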
> I would say that these three member functions should be common to all
> implemented distributions. Other common member functions might include
> "mean", "variance", and possibly others.
Median, mode, variance, skewness and kurtosis are commonly given; see, for example:
http://en.wikipedia.org/wiki/Student%27s_t
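For Student's t these summary statistics all have standard closed forms, but several are defined only for enough degrees of freedom. The hypothetical struct below (`t_moments` is not an existing API) uses NaN for "undefined" and +infinity for "infinite". It reports *excess* kurtosis, which is itself another naming trap.

```cpp
#include <cmath>
#include <limits>

// Hypothetical sketch: summary statistics of Student's t with v > 0
// degrees of freedom. NaN means "undefined", +inf means "infinite".
struct t_moments {
    double v;

    static double nan() { return std::numeric_limits<double>::quiet_NaN(); }
    static double inf() { return std::numeric_limits<double>::infinity(); }

    double mean() const { return v > 1 ? 0.0 : nan(); }
    double median() const { return 0.0; }
    double mode() const { return 0.0; }
    double variance() const {
        if (v > 2) return v / (v - 2);
        return v > 1 ? inf() : nan();
    }
    double skewness() const { return v > 3 ? 0.0 : nan(); }
    // Excess kurtosis (kurtosis minus 3).
    double kurtosis_excess() const {
        if (v > 4) return 6 / (v - 4);
        return v > 2 ? inf() : nan();
    }
};
```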
> Finally, you observe that it is often useful to specify the cumulative
> probability for a given value of the random variable and solve for the
> parameter (the "degrees of freedom" for a Student's t distribution) that
> determines the distribution. Since each family of distributions depends
> on a different set of parameters (for example, normal distributions
> depend on two parameters, the mean and variance), the
> interface for this is trickier to define.
> I can think of two possibilities (I prefer the first):

> 1) Define ad hoc inverse functions for each specific distribution. So
> for the Student's t distribution, you would define a member
> function of the form:
>
> double degrees_of_freedom(double cumulative_probability, double random_variable) const;
I don't like 2 either, so I have snipped it ;)
This seems OK to me.
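One way such an ad hoc inverse could work for Student's t relies on the fact that, for a fixed t > 0, the cumulative probability rises with the degrees of freedom, from 0.5 towards the normal limit Φ(t). That makes bisection on v possible. The sketch below is a free function rather than a member, and it repeats a Simpson-rule CDF so that it compiles on its own. The search range [1e-3, 1e6] and the helper names are assumptions for illustration only.

```cpp
#include <cmath>
#include <limits>

// Student's t density for real v > 0 (hypothetical helper).
double t_density(double t, double v) {
    const double pi = 3.14159265358979323846;
    double log_norm = std::lgamma((v + 1) / 2) - std::lgamma(v / 2) - 0.5 * std::log(v * pi);
    return std::exp(log_norm) * std::pow(1 + t * t / v, -(v + 1) / 2);
}

// CDF via symmetry plus composite Simpson integration over [0, |t|].
double t_cumulative(double t, double v) {
    const int n = 2000;  // must be even
    double b = std::fabs(t), h = b / n;
    double sum = t_density(0, v) + t_density(b, v);
    for (int i = 1; i < n; ++i)
        sum += t_density(i * h, v) * (i % 2 == 1 ? 4 : 2);
    double half = sum * h / 3;
    return t >= 0 ? 0.5 + half : 0.5 - half;
}

// Find v such that P(T <= random_variable) == cumulative_probability.
// t < 0 is mapped to t > 0 by symmetry. Returns NaN if no v in the assumed
// range [1e-3, 1e6] fits (including t == 0, where P is 0.5 for every v).
double degrees_of_freedom(double cumulative_probability, double random_variable) {
    double p = cumulative_probability, t = random_variable;
    if (t < 0) { t = -t; p = 1 - p; }
    double lo = 1e-3, hi = 1e6;
    if (t == 0 || p <= t_cumulative(t, lo) || p >= t_cumulative(t, hi))
        return std::numeric_limits<double>::quiet_NaN();
    for (int i = 0; i < 200 && hi / lo > 1 + 1e-12; ++i) {
        double mid = std::sqrt(lo * hi);  // bisect in log space: v spans many decades
        if (t_cumulative(t, mid) < p) lo = mid; else hi = mid;
    }
    return std::sqrt(lo * hi);
}
```

For instance, `degrees_of_freedom(0.75, 1.0)` should give approximately 1, because P(T ≤ 1) = 0.75 for one degree of freedom. Asking for 0.95 at t = 1 returns NaN, since no Student's t puts that much mass below 1.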
I'd be grateful if you could sketch out how you see the whole Student's t
class looking (just for double, and omitting the equations of course).
This will avoid any confusion about what we are talking about.
However, I am still worried that the whole scheme will lead to much bigger code
compared to a set of (template) function names
(because code that isn't in fact used will be generated).
Can anyone advise on this?
It would also seem that the names will be much longer, perhaps
outweighing the gain in clarity?
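On the code-size question, one relevant C++ rule applies if the distribution is a class template (for example, templated on the real type). A member function of a class template is only implicitly instantiated when it is actually used, so unused members generate no code. The hypothetical class below demonstrates this: one member's body would not even compile for `double`, yet the program builds because nothing calls it.

```cpp
// Hypothetical sketch, not Boost code: unused member functions of a class
// template are never instantiated, so they produce no object code.
template <class RealType>
class students_t_sketch {
public:
    explicit students_t_sketch(RealType v) : v_(v) {}

    RealType degrees_of_freedom() const { return v_; }

    // For RealType = double this body would be ill-formed (double has no
    // members), but it is only instantiated if someone calls never_called().
    RealType never_called() const { return v_.no_such_member(); }

private:
    RealType v_;
};

// Usage: students_t_sketch<double> s(3.0); s.degrees_of_freedom();
// instantiates only the constructor and degrees_of_freedom().
```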
Paul
 Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS pbristow_at_[hidden]
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk