From: Paul A Bristow (pbristow_at_[hidden])
Date: 2006-07-11 11:02:55


| -----Original Message-----
| From: boost-bounces_at_[hidden]
| [mailto:boost-bounces_at_[hidden]] On Behalf Of Deane Yang
| Sent: 11 July 2006 15:11
| To: boost_at_[hidden]
| Subject: Re: [boost] [math/staticstics/design] How best to
| namestatisticalfunctions?
|
| So let's use the Students T distribution as an example. The
| Students T
| distribution is a *family* of 1-dimensional distributions
| that depend on a single parameter, called "degrees of freedom".

Does the word *family* imply integral degrees of freedom?
Numerically, and perhaps conceptually, it isn't - it's a continuous real.
So could one also regard it as a two parameter function f(t, v) ?
However I don't think this matters here.

| Given a value, say, D,
| for the degrees of freedom, you get a density function p_D and
| integrating it gives you the cumulative density function P_D.

What about the Qs? (complements)
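
One reason the complements may deserve their own functions is numerical: computing Q as 1 - P loses all precision once P is close to 1. A minimal sketch of the effect, using the standard normal as a stand-in only because std::erfc is readily available (the function names here are hypothetical, not a proposed interface):

```cpp
#include <cmath>

// P(X <= x) for the standard normal, written in terms of erfc.
double normal_cumulative(double x)
{
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Q(x) = P(X > x), computed directly rather than as 1 - P(x).
double normal_complement(double x)
{
    return 0.5 * std::erfc(x / std::sqrt(2.0));
}
```

Far out in the upper tail, say at x = 10, the cumulative value rounds to exactly 1 in double precision, so 1 - normal_cumulative(x) gives zero, while normal_complement(x) still returns a small positive number.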

| As I mentioned before, these should be member functions,
| which could be called "density" (also called 'mass')

| and "cumulative".

OHOH many books don't mention either of these words!

The whole nomenclature seems a massive muddle,
with mathematicians, statisticians, and users of all sorts using different
terms, and everyone thinks theirs are the 'Standard' :-(

And the highest priority in my book is the END USERS,
not the professionals.
  
| The cumulative density function is a strictly increasing
| function and
| therefore can be inverted. The inverse function could be called
| "inverse_cumulative", which is a completely unambiguous name.

But excessively long :-(
 
| I would say that these three member functions should be
| common to all
| implemented distributions. Other common member functions
| might include
| "mean", "variance", and possibly others.

Median, mode, variance, skewness and kurtosis are commonly given, for example:

http://en.wikipedia.org/wiki/Student%27s_t
  
| Finally, you observe that it is often useful to specify the
| cumulative
| probability for a given value of the random variable and
| solve for the
| parameter (the "degrees of freedom" for a Students T
| distribution) that
| determines the distribution. Since each family of
| distributions depends
| on a different set of parameters (for example, normal distributions
| depend on two parameters, the mean and variance), the
| interface for this is trickier to define.

| I can think of two possibilities (I prefer the first):
|
| 1) Define ad hoc inverse functions for each specific
| distribution. So
| for the Students T distribution, you would define a member
| function of the form:
|
| double degrees_of_freedom(double cumulative_probability,
|                           double random_variable) const;

I don't like 2 either, so I have snipped it ;-)

This seems OK to me.

I'd be grateful if you could sketch out how you think the whole Student's t
class would look (just for double, and omitting the equations of course).
(This will avoid any confusion about what we are talking about.)
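
To make the question concrete, such a class might look roughly like the sketch below. Everything in it is an assumption for discussion: the class name students_t and the member names are just the ones floated in this thread plus a hypothetical "complement", and the cumulative is done by crude Simpson integration and the inverse by bisection, purely to keep the sketch short. A real implementation would use the incomplete beta function instead.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch only: names and numerical methods are placeholders.
class students_t
{
public:
    explicit students_t(double degrees_of_freedom) : df_(degrees_of_freedom)
    {
        assert(df_ > 0.0);
    }

    double degrees_of_freedom() const { return df_; }

    // Probability density p_D(t).
    double density(double t) const
    {
        const double pi = 3.14159265358979323846;
        const double log_norm = std::lgamma((df_ + 1.0) / 2.0)
                              - std::lgamma(df_ / 2.0)
                              - 0.5 * std::log(df_ * pi);
        return std::exp(log_norm - (df_ + 1.0) / 2.0 * std::log1p(t * t / df_));
    }

    // P_D(t): Simpson's rule on [0, |t|], then symmetry about zero.
    double cumulative(double t) const
    {
        const int n = 2000; // number of intervals, must be even
        const double a = std::fabs(t);
        const double h = a / n;
        double sum = density(0.0) + density(a);
        for (int i = 1; i < n; ++i)
            sum += density(i * h) * ((i % 2 == 1) ? 4.0 : 2.0);
        const double half = sum * h / 3.0;
        return (t >= 0.0) ? 0.5 + half : 0.5 - half;
    }

    // Q_D(t) = 1 - P_D(t), obtained from the symmetry of the density.
    double complement(double t) const { return cumulative(-t); }

    // Value t such that P_D(t) == p, found by bisection.
    double inverse_cumulative(double p) const
    {
        assert(p > 0.0 && p < 1.0);
        double lo = -1.0, hi = 1.0;
        while (cumulative(lo) > p) lo *= 2.0; // widen until the root is bracketed
        while (cumulative(hi) < p) hi *= 2.0;
        for (int i = 0; i < 100; ++i)
        {
            const double mid = 0.5 * (lo + hi);
            if (cumulative(mid) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

private:
    double df_;
};
```

With one degree of freedom this is the Cauchy distribution, so students_t(1.0).cumulative(1.0) should come out very close to 0.75, which makes a handy sanity check.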

However, I am still worried that the whole scheme will lead to much bigger code
compared to a set of names of (template) functions
(because code that isn't in fact used will be generated).
Can anyone advise on this?
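
One language rule that bears on this: if the distribution is a class template, a member function's body is only instantiated when that member is actually used. The sketch below illustrates the rule with a made-up toy_distribution (a uniform density on [0, scale], not a proposed design); the never_used member would not even compile for double if it were instantiated, yet the code builds because it is never called.

```cpp
#include <cassert>

// Hypothetical toy class, for illustration of lazy member instantiation only.
template <class RealType>
class toy_distribution
{
public:
    explicit toy_distribution(RealType scale) : scale_(scale)
    {
        assert(scale_ > 0);
    }

    // Uniform density on [0, scale].
    RealType density(RealType x) const
    {
        return (x >= 0 && x <= scale_) ? 1 / scale_ : 0;
    }

    // Never called: for RealType = double this body is ill-formed,
    // but it is not instantiated unless some code uses it.
    RealType never_used() const
    {
        return RealType::no_such_member;
    }

private:
    RealType scale_;
};
```

This only shows that unused members of a class template generate no code. Whether a particular compiler and linker also discard unused members of an ordinary, non-template class is a separate question that this does not settle.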

It also would seem that the names will be much longer - perhaps
overshadowing the gain in clarity?

Paul

---
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow_at_[hidden]
 

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk