
Boost : 
From: Paul A Bristow (pbristow_at_[hidden])
Date: 2006-07-12 05:11:24
 -----Original Message-----
 From: boost-bounces_at_[hidden]
 [mailto:boost-bounces_at_[hidden]] On Behalf Of Topher Cooper
 Sent: 11 July 2006 17:32
 To: boost_at_[hidden]
 Subject: Re: [boost] [math/statistics/design] How best to
 name statistical functions?

 At 11:02 AM 7/11/2006, Paul A Bristow wrote:


 > So let's use the Student's t distribution as an example. The
 > Student's t distribution is a *family* of 1-dimensional distributions
 > that depend on a single parameter, called "degrees of freedom".
 >
 >Does the word *family* imply integral degrees of freedom?

 No, a "family of distributions" does not imply that the parameters
 are integral. What is frequently referred to as *the* normal
 distribution is also a family, parameterized by the mean and standard
 deviation. Transformation between members of the family is so easy
 that we generally transform everything into and from one member of
 the family: the "standard normal" distribution.

 Keep in mind that a distribution is not a function, although it is
 associated with several functions or function-like entities.

 Standard usage is to consider the distributions in the family to be
 indexed by parameters, and therefore the associated functions to be
 indexed, single-parameter functions. There isn't much difference
 mathematically, though, between p[mu, sigma](x) and p(mu, sigma, x)
 (even when the indexes *are* integral), and sometimes it is useful to
 reframe them in that way. The point is that it is a reframing, and
 the standard (no, I am not imagining that it is standard) usage is to
 treat single-dimensional distributions as being single-dimensional.
Thanks, I think I understand better now.
 >And the highest priority in my book is the END USERS,
 >not the professionals.

 Exactly: the professionals are aware of the non-standard
 usage. Let's give the end users a chance of being able to use what
 they learned in their high school stat class.
My main objective :))
 > Other common member functions might include
 > "mean", "variance", and possibly others.
 >
 >Median, mode, variance, skewness and kurtosis are commonly
 >given, for example, at:
 >
 >http://en.wikipedia.org/wiki/Student%27s_t

 Skewness and kurtosis are generally defined but rarely used for
 distributions. Their computation on small or even moderate samples
 tends to be rather unstable, so comparison with the ideal
 distributions isn't terribly useful. I wouldn't bother with them.
 Mode is not uniquely defined for many distributions, nor is it that
 commonly used in practice for unimodal distributions (even if the
 references give a formula). Except for some specialized uses, these
 are more useful for theory than for computation: more algebraic than
 numerical.

 There are a lot of other possible associated functions, such as
 general quantiles or various confidence intervals, but I don't think
 many of them have general enough use to bother with for all
 distributions. People who need them could use the distribution as a
 template parameter. The only exception I would suggest would be to
 include the convenience of the standard deviation as well as the
 variance. One might stick in an RNG here, but that is redundant at
 this point.
 As to the naming of the probability functions:

 My personal preference would be to use what are probably the most
 common abbreviations for the basic functions. They are simple,
 compact and standard. Maybe a little obscure for those who only took
 statistics in high school, or who only know cookbook statistics, but
 that is what documentation is for. The ignorant are, after all,
 ignorant whatever choice is made, but you can do something about it
 by using the standard terms:

 dist.pdf(x): Probability Density Function; this is what looks like a
     "bell-shaped curve" for a normal distribution, for example. A.k.a. "p".
 dist.cdf(x): Cumulative Distribution Function. A.k.a. "P".
 dist.ccdf(x): Complementary Cumulative Distribution Function;
     ccdf(x) = 1 - cdf(x).
 dist.icdf(p): Inverse Cumulative Distribution Function, P';
     icdf(cdf(x)) = x and vice versa.
 dist.iccdf(p): Inverse Complementary Cumulative Distribution Function;
     iccdf(p) = icdf(1 - p); iccdf(ccdf(x)) = x.
My instinct is that these are too abbreviated, despite their logical consistency.
But this is the key problem: being clear, not curt, and yet concise.
students_t.inverse_complement_cumulative_probability certainly fails! ;))
So we are getting to:
template <class T> // T an integral or real or floating-point type.
T distribution(T x) const; // Probability Density Function or pdf or p
T cumulative_probability(T x) const; // Cumulative Distribution Function. P
cumulative_probability is too long :(
Do we REALLY need the cumulative here?
T probability(T x) const; // Cumulative Distribution Function or cdf or P
T quantile(T probability) const; // Also known as Inverse Cumulative Distribution Function
What do we call
T complementary_cumulative_probability(T x) const; // Complementary Cumulative Distribution Function. Q
??? :((
And, worse, what about the Inverse Complementary Cumulative Distribution Function?
complementary_quantile??? :((
And the ad hoc 'extras':
static T degrees_of_freedom(T quantile, T probability);
So I feel we haven't QUITE got there yet.
But many thanks for your help so far.
Paul
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow_at_[hidden]

PS Since everybody obviously knows far more about stats than I do, can you also suggest fully worked examples that can be used to demonstrate usage in a tutorial? I'm especially keen to show how superior using this would be to the traditional tables and fixed 95% confidence limits.