 # Boost :

From: Deane Yang (deane_yang_at_[hidden])
Date: 2006-07-11 10:11:14

Paul A Bristow wrote:
>
> Well, if you regard the degrees of freedom as fixed, or the probability as
> fixed, often 95%,
>
> then yes,
>
> but, I would say that they are 2D (and others 3D) distributions.
>
> To keep it simpler, let's go back to the Student's t, which I have
> implemented (actually templates but ignore that for now) as
>
> double students_t(double degrees_of_freedom, double t)
>
> t is roughly a measure of difference between two things (means for example)
>
> this returns the probability that the things are different.
>
> If degrees_of_freedom are small (you only measured 3 times, say),
>
> then t can be big, but it still doesn't mean much.
>
> But if you made 100 measurements, it probably does.
>
> When you do the inverse, you may want to say, I want to be 95% confident,
> and I already have fixed the degrees_of_freedom, so what is the
> corresponding
> value for t. This is what the ubiquitous Student's t tables do.
>
> On the other hand, sometimes you may decide you want 95% confidence, and you
> have already made some measurements of t, but you want to know how many
> (more probably) measurements (degrees_of_freedom) you would have to make to
> get this 95%.
>
> This is a common problem - and often reveals in drug trials, for example, that
> there are not enough potential patients available to carry out a trial and
> achieve a 95% probability.
>
> If you accept this, then the problem is how to name the two, or three
> 'inverses' (and complements).
>
> students_t_inv_t and students_t_inv_df ???
>

I think you're confusing *the* inverse cumulative distribution function
with other possible inverse functions that can be defined for each
specific distribution. This is why I really dislike a name like
"students_t_inv_t", which tells me very little about what it is.

So let's use the Students T distribution as an example. The Students T
distribution is a *family* of 1-dimensional distributions that depend on
a single parameter, called "degrees of freedom". Given a value, say, D,
for the degrees of freedom, you get a density function p_D, and
integrating it gives you the cumulative distribution function P_D.

As I mentioned before, these should be member functions, which could be
called "density" and "cumulative".

The cumulative distribution function is a strictly increasing function and
therefore can be inverted. The inverse function could be called
"inverse_cumulative", which is a completely unambiguous name.

I would say that these three member functions should be common to all
implemented distributions. Other common member functions might include
"mean", "variance", and possibly others.
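
A minimal sketch of that common interface, using the normal distribution
only because its cumulative function is available through std::erfc. The
class name, the constructor signature, and the use of bisection for
"inverse_cumulative" are assumptions made for illustration; this is not an
existing Boost interface.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical class showing the proposed common member functions.
class normal_distribution {
public:
    normal_distribution(double mean, double variance)
        : mean_(mean), sd_(std::sqrt(variance)) {
        assert(variance > 0.0);
    }

    // Probability density p(x).
    double density(double x) const {
        const double pi = 3.14159265358979323846;
        const double z = (x - mean_) / sd_;
        return std::exp(-0.5 * z * z) / (sd_ * std::sqrt(2.0 * pi));
    }

    // Cumulative distribution function P(x), via the complementary
    // error function.
    double cumulative(double x) const {
        return 0.5 * std::erfc(-(x - mean_) / (sd_ * std::sqrt(2.0)));
    }

    // Inverse of cumulative(): the x with cumulative(x) == p.
    // Bisection works because cumulative() is strictly increasing.
    double inverse_cumulative(double p) const {
        assert(p > 0.0 && p < 1.0);
        double lo = mean_ - 40.0 * sd_;
        double hi = mean_ + 40.0 * sd_;
        for (int i = 0; i < 200; ++i) {
            const double mid = 0.5 * (lo + hi);
            if (cumulative(mid) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    double mean() const { return mean_; }
    double variance() const { return sd_ * sd_; }

private:
    double mean_;
    double sd_;  // standard deviation, i.e. sqrt(variance)
};
```

With this, normal_distribution(0.0, 1.0).inverse_cumulative(0.975) gives
the familiar two-sided 95% critical value of about 1.96.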

Finally, you observe that it is often useful to specify the cumulative
probability for a given value of the random variable and solve for the
parameter (the "degrees of freedom" for a Students T distribution) that
determines the distribution. Since each family of distributions depends
on a different set of parameters (for example, normal distributions
depend on two parameters, the mean and variance), the interface for this
is trickier to define. I can think of two possibilities (I prefer the
first):

1) Define ad hoc inverse functions for each specific distribution. So
for the Students T distribution, you would define a member function of
the form:

double degrees_of_freedom(double cumulative_probability,
                          double random_variable) const;

2) Always specify distribution parameters (other than the random
variable itself) in the constructor using a tuple (a 1-tuple for the
Students T and a 2-tuple for the normal). You could then define
templated inverse functions:

template <unsigned int index>
double inverse(double cumulative_probability, double random_variable) const;

Each function would hold all other parameters fixed (as set by the
constructor) and solve for the parameter specified by the index.
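
To make option 1 concrete, here is a rough sketch for the Students T
distribution. Everything in it beyond the name and arguments of
degrees_of_freedom is an assumption: the class name, making the solver a
static member (it does not need the object's own parameter), computing
the cumulative function by Simpson's rule on the density, and the search
bracket of 0.1 to 10000 degrees of freedom. A real implementation would
use the incomplete beta function instead of numerical integration.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Hypothetical class; a sketch, not a production implementation.
class students_t {
public:
    explicit students_t(double degrees_of_freedom) : df_(degrees_of_freedom) {
        assert(degrees_of_freedom > 0.0);
    }

    // Density p_D(t), evaluated in log space to avoid overflow in the
    // gamma functions.
    double density(double t) const {
        const double pi = 3.14159265358979323846;
        const double log_norm = std::lgamma(0.5 * (df_ + 1.0))
                              - std::lgamma(0.5 * df_)
                              - 0.5 * std::log(df_ * pi);
        return std::exp(log_norm
                        - 0.5 * (df_ + 1.0) * std::log1p(t * t / df_));
    }

    // Cumulative function P_D(t): 1/2 plus or minus the integral of the
    // density over [0, |t|], approximated with composite Simpson's rule.
    double cumulative(double t) const {
        const int n = 2000;  // number of intervals, must be even
        const double a = std::fabs(t);
        const double h = a / n;
        double sum = density(0.0) + density(a);
        for (int i = 1; i < n; ++i)
            sum += (i % 2 ? 4.0 : 2.0) * density(i * h);
        const double half = sum * h / 3.0;
        return t >= 0.0 ? 0.5 + half : 0.5 - half;
    }

    // Ad hoc inverse: the degrees of freedom D with
    // P_D(random_variable) == cumulative_probability.
    // For fixed t > 0, P_D(t) increases with D, so bisection applies.
    static double degrees_of_freedom(double cumulative_probability,
                                     double random_variable) {
        double p = cumulative_probability;
        double t = random_variable;
        if (t < 0.0) { t = -t; p = 1.0 - p; }  // symmetry about zero
        double lo = 0.1, hi = 1.0e4;           // assumed search bracket
        if (t == 0.0 || students_t(lo).cumulative(t) > p
                     || students_t(hi).cumulative(t) < p)
            throw std::domain_error("no solution in the search bracket");
        for (int i = 0; i < 100; ++i) {
            const double mid = 0.5 * (lo + hi);
            if (students_t(mid).cumulative(t) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

private:
    double df_;
};
```

As a sanity check, one degree of freedom is the Cauchy distribution, for
which P(1) = 0.75, so students_t::degrees_of_freedom(0.75, 1.0) should
come back as approximately 1.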

(I don't like using tuples as an input type, because it means I always
have to be very careful about the order of the parameters.)
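
Option 2 could look roughly like the sketch below for the normal
distribution, with the parameters held in a 2-tuple of (mean, variance).
The class name, the helper functions, and the closed-form solutions in
the two specializations are assumptions for illustration; only the
signature of inverse comes from the declaration above. Note how the
caller has to remember that index 0 means the mean and index 1 the
variance.

```cpp
#include <cmath>
#include <stdexcept>
#include <tuple>

// Standard normal cumulative function.
inline double std_normal_cdf(double z) {
    return 0.5 * std::erfc(-z / std::sqrt(2.0));
}

// Standard normal quantile by bisection (hypothetical helper).
inline double std_normal_quantile(double p) {
    if (!(p > 0.0 && p < 1.0))
        throw std::domain_error("probability must lie in (0, 1)");
    double lo = -40.0, hi = 40.0;
    for (int i = 0; i < 200; ++i) {
        const double mid = 0.5 * (lo + hi);
        if (std_normal_cdf(mid) < p) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

// Hypothetical class: parameters are given as a tuple (mean, variance).
class normal_family {
public:
    typedef std::tuple<double, double> parameters;

    explicit normal_family(const parameters& p) : params_(p) {
        if (!(std::get<1>(p) > 0.0))
            throw std::domain_error("variance must be positive");
    }

    double cumulative(double x) const {
        return std_normal_cdf((x - std::get<0>(params_))
                              / std::sqrt(std::get<1>(params_)));
    }

    // Solve for parameter number `index`, holding the other one fixed
    // at the value given to the constructor.
    template <unsigned int index>
    double inverse(double cumulative_probability,
                   double random_variable) const;

private:
    parameters params_;
};

// index 0: solve Phi((x - mean) / sd) == p for the mean.
template <>
inline double normal_family::inverse<0>(double p, double x) const {
    return x - std::sqrt(std::get<1>(params_)) * std_normal_quantile(p);
}

// index 1: solve Phi((x - mean) / sd) == p for the variance.
template <>
inline double normal_family::inverse<1>(double p, double x) const {
    const double z = std_normal_quantile(p);
    const double d = x - std::get<0>(params_);
    // sd = d / z must be a positive, finite number.
    if (std::fabs(z) < 1e-12 || d / z <= 0.0)
        throw std::domain_error("no variance satisfies the request");
    const double sd = d / z;
    return sd * sd;
}
```

A call then reads normal_family(std::make_tuple(0.0, 1.0)).inverse<0>(0.5, 3.0),
which returns a mean of about 3, since the median of a normal distribution
equals its mean.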

Deane