From: Deane Yang (deane_yang_at_[hidden])
Date: 2006-07-11 12:15:08


Paul A Bristow wrote:
> | -----Original Message-----
> | From: boost-bounces_at_[hidden]
> | [mailto:boost-bounces_at_[hidden]] On Behalf Of Deane Yang
> | Sent: 11 July 2006 15:11
> | To: boost_at_[hidden]
> | Subject: Re: [boost] [math/statistics/design] How best to
> | name statistical functions?
> |
> | So let's use the Students T distribution as an example. The
> | Students T
> | distribution is a *family* of 1-dimensional distributions
> | that depend on a single parameter, called "degrees of freedom".
>
> Does the word *family* imply integral degrees of freedom?

No. It's a continuous family of distributions, depending on 1 real
parameter.

> Numerically, and perhaps conceptually, it isn't - it's a continuous real.
> So could one also regard it as a two parameter function f(t, v) ?

Yes.

> However I don't think this matters here.

No, it doesn't.

>
> | Given a value, say, D,
> | for the degrees of freedom, you get a density function p_D and
> | integrating it gives you the cumulative distribution function P_D.
>
> What about the Qs? (complements)

In other words, 1 - P. Right? One response is: why do you need to define
it, given how easy it is to get from the cumulative distribution function?

If you do need it, use a common name for it. Unfortunately, I don't have
a good suggestion.

>
> | As I mentioned before, these should be member functions,
> | which could be called "density" (also called 'mass')
>
> | and "cumulative".
>
> OHOH many books don't mention either of these words!

No? Surely they give the function P a name? I've always seen it referred
to as the cumulative distribution function (CDF for short).

>
> The whole nomenclature seems a massive muddle,
> with mathematicians, statisticians, and users of all sorts using different
> terms
> and everyone thinks they are the 'Standard' :-(
>
> And the highest priority in my book is the END USERS,
> not the professionals.

And that justifies using one-letter names? My highest priority is
designing an interface that facilitates good programming practice by
good programmers.

I am in no way suggesting that the names I've proposed are "standard".
The point is to use names that *are* widely used and do at least suggest
the correct meaning of the functions. If you don't like my suggestions,
please suggest others. But please don't use cryptic abbreviations like
"P", "Q", and "inv_t"; they are no more standard than my suggestions,
and they convey a lot less information.

>
> | The cumulative distribution function is a strictly increasing
> | function and
> | therefore can be inverted. The inverse function could be called
> | "inverse_cumulative", which is a completely unambiguous name.
>
> But excessively long :-(

Compared to what? I personally am very grateful that I no longer see
short, cryptic function and variable names in code.

>
> I'd be grateful if you could sketch out how you see the whole Student's t
> class would look (just for double and omit the equations of course).
> (This will avoid any confusion about what we are talking about).

Here's a first stab (I'm sure it can be improved):

class StudentsT
{
public:
     explicit StudentsT(double degrees_of_freedom);

     double density(double x) const;
     double cumulative_probability(double x) const;
     double quantile(double probability) const; //Also known as inverse
                                                // cumulative

     double degrees_of_freedom(double quantile, double probability) const;

     //Functions below may return NaN, if undefined
     double mean() const;
     double variance() const;
     double skewness() const;
     double kurtosis() const;
};
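
To make the shape of that interface concrete, here is a minimal sketch of how three of the members might be filled in. Everything in it is an assumption for illustration: the name StudentsTSketch, the use of Simpson's rule for the cumulative probability, and bisection for the quantile are stand-ins, not a proposed implementation. A real library would almost certainly evaluate the cumulative probability through the incomplete beta function, and this version is only reasonable for moderate arguments.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical illustration of the interface above; not a library implementation.
class StudentsTSketch
{
public:
    explicit StudentsTSketch(double degrees_of_freedom)
        : dof_(degrees_of_freedom), log_norm_(0.0)
    {
        assert(dof_ > 0.0);
        const double pi = 3.14159265358979323846;
        // Log of the normalising constant Gamma((v+1)/2) / (sqrt(v*pi) * Gamma(v/2)).
        log_norm_ = std::lgamma(0.5 * (dof_ + 1.0)) - std::lgamma(0.5 * dof_)
                    - 0.5 * std::log(dof_ * pi);
    }

    double density(double x) const
    {
        return std::exp(log_norm_
                        - 0.5 * (dof_ + 1.0) * std::log1p(x * x / dof_));
    }

    // Composite Simpson's rule on [0, |x|]; the density is symmetric about 0,
    // so the result is 0.5 plus or minus that integral.
    double cumulative_probability(double x) const
    {
        const int n = 2000; // number of subintervals, must be even
        const double a = std::fabs(x);
        const double h = a / n;
        double sum = density(0.0) + density(a);
        for (int i = 1; i < n; ++i)
            sum += density(i * h) * ((i % 2) ? 4.0 : 2.0);
        const double half = sum * h / 3.0;
        return x >= 0.0 ? 0.5 + half : 0.5 - half;
    }

    // Inverts the cumulative probability by bisection, which works because
    // the cumulative probability is strictly increasing.
    double quantile(double probability) const
    {
        assert(probability > 0.0 && probability < 1.0);
        double lo = -1.0, hi = 1.0;
        while (cumulative_probability(lo) > probability) lo *= 2.0;
        while (cumulative_probability(hi) < probability) hi *= 2.0;
        for (int i = 0; i < 60; ++i)
        {
            const double mid = 0.5 * (lo + hi);
            if (cumulative_probability(mid) < probability) lo = mid;
            else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

private:
    double dof_;
    double log_norm_;
};
```

With one degree of freedom this is the Cauchy distribution, so StudentsTSketch(1.0).cumulative_probability(1.0) should come out close to 0.75 and quantile(0.75) close to 1.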

>
> However:
>
> But I am still worried that the whole scheme will lead to much bigger code
> compared to a set of names of (template) functions
> (because code that isn't in fact used will be generated).
> Can anyone advise on this?

Why exactly do you worry about this? We're just repackaging the same set
of functions. Also, note that by using classes, you can improve the
computational speed, because the class can cache intermediate results
common to the different functions, whereas separate functions need to
recompute everything from scratch each time.
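
A small sketch of the caching point, under stated assumptions: the names log_normalisation, students_t_density and CachedStudentsT are hypothetical, and only the density is shown. The free function has to evaluate two log-gamma calls on every call, while the class evaluates them once in its constructor and reuses the result. Whether the saving is measurable depends on the workload.

```cpp
#include <cassert>
#include <cmath>

// Log of the normalising constant of the Student's t density
// with v degrees of freedom (hypothetical helper).
inline double log_normalisation(double v)
{
    const double pi = 3.14159265358979323846;
    return std::lgamma(0.5 * (v + 1.0)) - std::lgamma(0.5 * v)
           - 0.5 * std::log(v * pi);
}

// Free-function style: the normalising constant is recomputed on every call.
inline double students_t_density(double x, double degrees_of_freedom)
{
    assert(degrees_of_freedom > 0.0);
    const double v = degrees_of_freedom;
    return std::exp(log_normalisation(v)
                    - 0.5 * (v + 1.0) * std::log1p(x * x / v));
}

// Class style: the normalising constant is computed once and cached.
class CachedStudentsT
{
public:
    explicit CachedStudentsT(double degrees_of_freedom)
        : dof_(degrees_of_freedom),
          log_norm_(log_normalisation(degrees_of_freedom))
    {
        assert(dof_ > 0.0);
    }

    double density(double x) const
    {
        // Reuses the cached constant; no log-gamma evaluation here.
        return std::exp(log_norm_
                        - 0.5 * (dof_ + 1.0) * std::log1p(x * x / dof_));
    }

private:
    double dof_;
    double log_norm_;
};
```

Both styles return the same values; the difference is only in how often the shared intermediate result is computed.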

To be honest, you sound like you're more comfortable programming in C
than C++.

>
> It also would seem that the names will be much longer - perhaps
> overshadowing the gain in clarity?

Definitely not for me.

Deane

