
From: John Maddock (john_at_[hidden])
Date: 2006-07-08 12:37:28


Paul Bristow has been toiling away producing some statistical functions on
top of some of my Math special functions, and we've encountered a bit of a
naming dilemma that I hope the ever resourceful Boosters can solve for us
:-)

For a given cumulative distribution function (I'm going to use the
Student's t function as an example below) there are two (or maybe three)
variations:

P: this is the regular cumulative distribution function, and is a rising
function in its argument (rises from 0 to 1).

Q: this is 1-P and is also known as the complement of the cumulative
distribution function. It falls from 1 to 0 over the range of its
argument.

A: this is less well used and is P-Q or 1-2Q depending upon your point of
view.
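
To make the three variants concrete, here is a minimal sketch that uses the
standard normal distribution as a stand-in, purely because its P and Q can
be written with the standard library's std::erf and std::erfc. The names
normal_P, normal_Q and normal_A are hypothetical and are not part of any
proposed interface.

```cpp
#include <cmath>

// Hypothetical illustration names. The standard normal stands in for
// Student's t only because <cmath> already provides erf and erfc.

// P: rises from 0 to 1 as x increases.
inline double normal_P(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Q = 1 - P: computed directly from erfc rather than by subtraction,
// so that small tail values are not lost to rounding.
inline double normal_Q(double x) {
    return 0.5 * std::erfc(x / std::sqrt(2.0));
}

// A = P - Q = 1 - 2Q, which for the normal case reduces to erf.
inline double normal_A(double x) {
    return std::erf(x / std::sqrt(2.0));
}
```

The reason for giving Q its own function rather than writing 1-P shows up
in the tail: in double precision normal_P(10.0) rounds to 1, so
1 - normal_P(10.0) comes out as zero, whereas normal_Q(10.0) still returns
a small positive value.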

Naming scheme 1:
~~~~~~~~~~~~~~~~

We have the reasonably obvious:

students_t(df,x) : calculates P
students_t_c(df,x) : calculates Q

However, that varies slightly from the existing practice of erf/erfc, which
if followed here would lead to:

students_t(df,x) : calculates P
students_tc(df,x) : calculates Q

but the lack of the underscore doesn't look right to me.

Naming Scheme 2:
~~~~~~~~~~~~~~~~

How about we call a spade a spade and use:

students_t_P(df,x) : calculates P
students_t_Q(df,x) : calculates Q

Not pretty, but the P and Q notations are universally used in the
literature, and of course we could handle the A case as well if that was
felt to be needed.

It doesn't follow normal Boost all_lower_case_names either, but since lower
case "p" and "q" have slightly different meanings in the literature (they're
for values of P and Q) I'm less keen on:

students_t_p(df,x) : calculates P
students_t_q(df,x) : calculates Q

Wacky Scheme 3:
~~~~~~~~~~~~~~~

Both of the above suffer from a rather spectacular explosion of function
prototypes once you include every variant for each distribution. An
alternative using named parameters might be:

P(dist=students_t, df=4, x=5.2); // P for 4 degrees of freedom and x=5.2
Q(dist=students_t, df=5, x=20.0); // Q for 5 degrees of freedom and x=20.0

But of course internally this would have to forward to something like (1) or
(2) so it doesn't actually save you any implementation effort, just reduces
the number of names.
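
The forwarding idea can be sketched roughly as follows. This is only an
illustration under several assumptions: a small tag struct named students_t
carrying df stands in for true named parameters (which would need a
named-parameter facility), and the Scheme 2 style functions are filled in
with a crude Simpson's rule integration of the density as a placeholder for
a real implementation. None of this is the actual code under discussion.

```cpp
#include <cassert>
#include <cmath>

// Student's t density, evaluated via logs to avoid overflow in the gammas.
inline double students_t_pdf(double df, double t) {
    const double pi = 3.14159265358979323846;
    const double log_norm = std::lgamma((df + 1) / 2) - std::lgamma(df / 2)
                            - 0.5 * std::log(df * pi);
    return std::exp(log_norm - (df + 1) / 2 * std::log1p(t * t / df));
}

// Scheme 2 style name. PLACEHOLDER: composite Simpson's rule from 0 to x,
// added to the value 0.5 at x = 0. Not a production algorithm.
inline double students_t_P(double df, double x) {
    assert(df > 0);
    const int n = 2000;  // number of intervals, must be even
    const double h = x / n;
    double sum = students_t_pdf(df, 0.0) + students_t_pdf(df, x);
    for (int i = 1; i < n; ++i)
        sum += (i % 2 ? 4 : 2) * students_t_pdf(df, i * h);
    return 0.5 + sum * h / 3;
}

// Q = 1 - P, obtained here from the symmetry of the distribution.
inline double students_t_Q(double df, double x) {
    return students_t_P(df, -x);
}

// Hypothetical tag type: one per distribution, carrying its parameters.
struct students_t { double df; };

// Scheme 3 style entry points: only two names, overloaded per distribution,
// each forwarding to the per-distribution function.
inline double P(students_t d, double x) { return students_t_P(d.df, x); }
inline double Q(students_t d, double x) { return students_t_Q(d.df, x); }
```

With this in place a call would read P(students_t{4.0}, 5.2) or
Q(students_t{5.0}, 20.0). Adding another distribution means adding one tag
struct and one overload of each of P and Q, but the per-distribution
functions still have to be written.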

Inverses:
~~~~~~~~~

And if that's not enough, we also have inverses:

* Calculate x given degrees of freedom and P.
* Calculate x given degrees of freedom and Q.
* Calculate degrees of freedom given x and P.
* Calculate degrees of freedom given x and Q.

At present we're looking at something like:

students_t_inv(df,p); // Calculate x given degrees of freedom and P.

But the other variants don't have obvious names under this scheme.
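
To show what the first two inverses compute, here is a minimal sketch that
inverts a monotonic function by bisection. It assumes the special case of
one degree of freedom, where P has the closed form 1/2 + atan(x)/pi, so
that the example stays self-contained. The names invert_monotonic, t1_P,
t1_Q, x_from_P and x_from_Q are hypothetical and are not suggestions for
the final interface. A real implementation would likely use something
better than bisection.

```cpp
#include <cassert>
#include <cmath>

// Find x in [lo, hi] with f(x) == target, for f monotonic (rising or
// falling) on the interval. Plain bisection, for illustration only.
template <class F>
double invert_monotonic(F f, double target, double lo, double hi) {
    double flo = f(lo) - target;
    const double fhi = f(hi) - target;
    if (flo == 0) return lo;
    if (fhi == 0) return hi;
    assert((flo < 0) != (fhi < 0));  // the target must be bracketed
    for (int i = 0; i < 200; ++i) {
        const double mid = lo + (hi - lo) / 2;
        const double fmid = f(mid) - target;
        if (fmid == 0) return mid;
        if ((fmid < 0) == (flo < 0)) { lo = mid; flo = fmid; }
        else                         { hi = mid; }
    }
    return lo + (hi - lo) / 2;
}

// Student's t with df = 1 (ASSUMPTION: chosen only for its closed form).
inline double t1_P(double x) {
    const double pi = 3.14159265358979323846;
    return 0.5 + std::atan(x) / pi;
}
inline double t1_Q(double x) {
    const double pi = 3.14159265358979323846;
    return 0.5 - std::atan(x) / pi;
}

// Calculate x given P, and x given Q, for df = 1.
inline double x_from_P(double p) {
    return invert_monotonic(t1_P, p, -1e6, 1e6);
}
inline double x_from_Q(double q) {
    return invert_monotonic(t1_Q, q, -1e6, 1e6);
}
```

Inverting Q directly, rather than converting it to 1-Q and inverting P,
keeps small tail probabilities intact. In principle the same root finder
could be pointed at the degrees-of-freedom argument instead of x to cover
the other two inverses, although that is not shown here.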

So I'm hoping some Boosters can work their usual naming magic :-)

Many thanks,

John.

