
From: Paul A Bristow (pbristow_at_[hidden])
Date: 2006-07-11 04:38:58

| -----Original Message-----
| From: boost-bounces_at_[hidden]
| [mailto:boost-bounces_at_[hidden]] On Behalf Of Deane Yang
| Sent: 10 July 2006 21:41
| To: boost_at_[hidden]
| Subject: Re: [boost] [math/staticstics/design] How best to
| name statisticalfunctions?
|
| Paul A Bristow wrote:
| > | -----Original Message-----
| > | From: Kevin Lynch
|
| > | Why not hide the functions behind a class interface? After all, the
| > | various functions are "properties" of the distributions. Hence:
| > |
| > | class students_t {
| > | students_t(double mu);
| > | double P(double x);
| > | double Q(double x);
| > | double invP(double p); (or perhaps inverseP or Pinv or
| > | something)
| > | .....
| > | }
| > |
| > | class normal {
| > | normal(double mu, double sigma);
| > | double P(double x);
| > | double Q(double x);
| > | double invP(double x);
| > | ......
| > | }
| >
| > Rather interesting idea.
|
| I support Kevin's proposal rather strongly for exactly the reasons he
| states. But I'm not sure what P, Q, invP mean. I would prefer:
|
| double density(double x);
| double cumulative(double x);
| double inverse_cumulative(double y);
|
| > How would you envisage this working with Fisher, for example which has
| > degrees of freedom 1 and 2, and a variance ratio.
| >
| > Is this a 1D or 2D or 3D?
| >
| > Its inversion will return df1 (given df2 and F and Probability)
| > or df2 (given df1, F and Prob)
| > or F (given Df1 and df2 and Prob)
| >
| > Would you like to flesh out how you suggest handling all these?
| >
|
| Could you clarify your question? Isn't the F distribution still the
| probability distribution of a single real random variable? The
| cumulative and inverse cumulative density functions have a
| consistent mathematical meaning for any 1-dimensional probability
| distribution, do they not?

Well, if you regard the degrees of freedom as fixed, or the probability as
fixed (often at 95%), then yes.

But I would say that they are 2D (and others 3D) distributions, that is,
functions of two (or three) variables.

To keep it simpler, let's go back to the Student's t, which I have
implemented (actually as templates, but ignore that for now) as

double students_t(double degrees_of_freedom, double t)

t is roughly a measure of the difference between two things (means, for
example), and this returns the probability that the things are different.

If degrees_of_freedom is small (you only measured 3 times, say), then t can
be big, but it still doesn't mean much. But if you made 100 measurements, it
probably does.

When you do the inverse, you may want to say: I want to be 95% confident,
and I have already fixed the degrees_of_freedom, so what is the
corresponding value for t? This is what the ubiquitous Student's t tables do.

On the other hand, sometimes you may decide you want 95% confidence, and you
have already made some measurements of t, but you want to know how many
measurements (probably more), and hence degrees_of_freedom, you would have
to make to get this 95%.

This is a common problem, and it often reveals, in drug trials for example,
that there are not enough potential patients available to carry out a trial
and achieve a 95% probability.

If you accept this, then the problem is how to name the two, or three
'inverses' (and complements).

students_t_inv_t and students_t_inv_df ???
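
To make the two inverses concrete, here is a minimal sketch. It is not the
implementation discussed above. It assumes that the forward function is the
one-sided cumulative probability P(T <= t), it evaluates that by brute-force
Simpson integration of the density, and it finds both inverses by bisection.
The names students_t_density and students_t_cdf are hypothetical;
students_t_inv_t and students_t_inv_df are just the tentative names suggested
above. It is only meant for moderate arguments.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Density of Student's t with df degrees of freedom, evaluated at t.
double students_t_density(double df, double t) {
    assert(df > 0.0);
    const double pi = 3.14159265358979323846;
    const double log_norm = std::lgamma((df + 1.0) / 2.0) - std::lgamma(df / 2.0)
                          - 0.5 * std::log(df * pi);
    return std::exp(log_norm - 0.5 * (df + 1.0) * std::log1p(t * t / df));
}

// ASSUMPTION: one-sided cumulative probability P(T <= t), computed with
// composite Simpson's rule on [0, |t|] plus the symmetry of the density.
double students_t_cdf(double df, double t) {
    const int n = 2000;                       // number of intervals (even)
    const double a = std::fabs(t);
    const double h = a / n;
    double sum = students_t_density(df, 0.0) + students_t_density(df, a);
    for (int i = 1; i < n; ++i)
        sum += (i % 2 ? 4.0 : 2.0) * students_t_density(df, i * h);
    const double half = sum * h / 3.0;        // P(0 <= T <= |t|)
    return t >= 0.0 ? 0.5 + half : 0.5 - half;
}

// Inverse in t: df is fixed, find t with students_t_cdf(df, t) == p.
double students_t_inv_t(double df, double p) {
    assert(p > 0.0 && p < 1.0);
    double lo = -1.0, hi = 1.0;
    while (students_t_cdf(df, lo) > p) lo *= 2.0;   // widen the bracket
    while (students_t_cdf(df, hi) < p) hi *= 2.0;
    for (int i = 0; i < 60; ++i) {                  // plain bisection
        const double mid = 0.5 * (lo + hi);
        if (students_t_cdf(df, mid) < p) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

// Inverse in df: t > 0 is fixed, find df with students_t_cdf(df, t) == p.
// For t > 0 the cumulative probability grows with df, so bisection applies.
// Returns NaN if no df in [0.01, 10000] gives probability p.
double students_t_inv_df(double t, double p) {
    assert(t > 0.0 && p > 0.5 && p < 1.0);
    double lo = 0.01, hi = 1.0e4;
    if (students_t_cdf(lo, t) >= p || students_t_cdf(hi, t) <= p)
        return std::numeric_limits<double>::quiet_NaN();
    for (int i = 0; i < 60; ++i) {
        const double mid = 0.5 * (lo + hi);
        if (students_t_cdf(mid, t) < p) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

With 10 degrees of freedom, students_t_inv_t(10, 0.975) should come out
close to the familiar table entry 2.228, and students_t_inv_df(2.228139, 0.975)
should recover roughly 10. Note that the df "inverse" has no answer at all
when p is above the normal-distribution limit for that t, which is why the
sketch returns NaN in that case.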

Paul

PS I also worry about the risk of code bloat. At present, I think that you
don't pay for what you don't use. We certainly don't want all the possible
functions discussed above instantiated, even for one floating-point type, if
only one function is actually used.
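
On the bloat question, a small sketch may help show how the class-style
interface behaves. It relies on the standard C++ rule that a member function
of a class template is only implicitly instantiated when it is actually used.
The names normal_sketch and tiny_real are made up for illustration and are
not part of any proposal. tiny_real deliberately has no erfc or sqrt, so the
code would not compile if cumulative() were instantiated for it.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical class-style distribution, templated on the floating-point type.
template <class T>
struct normal_sketch {
    T mean;
    T sd;

    // Needs only subtraction and division on T.
    T standardize(T x) const { return (x - mean) / sd; }

    // Needs std::erfc and std::sqrt overloads for T.
    T cumulative(T x) const {
        return T(0.5) * std::erfc(-standardize(x) / std::sqrt(T(2)));
    }
};

// Hypothetical minimal number type: supports - and / but nothing else.
struct tiny_real {
    double v;
};
inline tiny_real operator-(tiny_real a, tiny_real b) { return tiny_real{a.v - b.v}; }
inline tiny_real operator/(tiny_real a, tiny_real b) { return tiny_real{a.v / b.v}; }

// Compiles only because normal_sketch<tiny_real>::cumulative is never used,
// and therefore never instantiated.
bool unused_members_are_not_instantiated() {
    normal_sketch<double> n{0.0, 1.0};
    normal_sketch<tiny_real> m{tiny_real{1.0}, tiny_real{2.0}};
    const bool cdf_ok = std::fabs(n.cumulative(0.0) - 0.5) < 1e-12;
    const bool std_ok = m.standardize(tiny_real{3.0}).v == 1.0;
    assert(cdf_ok && std_ok);
    return cdf_ok && std_ok;
}
```

This only covers implicit instantiation. An explicit instantiation such as
"template struct normal_sketch<double>;" would instantiate every member, so
the concern above would still apply there.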

---
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow_at_[hidden]