
From: Paul A Bristow (pbristow_at_[hidden])
Date: 2006-07-12 05:11:24
 -----Original Message-----
 From: boost-bounces_at_[hidden]
 [mailto:boost-bounces_at_[hidden]] On Behalf Of John Maddock
 Sent: 11 July 2006 17:26
 To: boost_at_[hidden]
 Subject: Re: [boost] [math/staticstics/design] How
 best to name statistical functions?

 >> As I mentioned before, these should be member functions,
 >> which could be called "density" (also called 'mass')

 Or distribution :)
This seems quite clear to me: both density and mass sound too physical,
though they are in common use.
What is important is that the documentation gives ALL the other possible
names.
 >> The inverse function could be called "inverse_cumulative"
 > But excessively long :(
 True, how about "persentile", or is that too ambiguous?
Percentile might be better - it is in the dictionary ;))
But quantile is a more modern term and doesn't raise any questions about
multiplying or dividing by 100, a source of unnecessary confusion, as we
have found with Boost.Test.
So I'm strongly in favour of quantile.
But I also wonder if 'fraction' is a possible name?
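To make the naming question concrete, here is a minimal sketch of the member-function style under discussion. The class name exponential_dist and the member name cumulative are assumptions made for illustration only (the exponential distribution is used simply because its formulas are short); nothing here is an agreed Boost interface. The point is that quantile takes a probability in [0, 1), so no factor of 100 is involved.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical class for illustration only; not a Boost interface.
class exponential_dist {
public:
    explicit exponential_dist(double rate) : rate_(rate) {
        if (!(rate > 0.0)) throw std::domain_error("rate must be positive");
    }

    // Probability density at x (the "density" / "mass" name discussed above).
    double density(double x) const {
        return x < 0.0 ? 0.0 : rate_ * std::exp(-rate_ * x);
    }

    // Cumulative probability P(X <= x).
    double cumulative(double x) const {
        return x < 0.0 ? 0.0 : -std::expm1(-rate_ * x);
    }

    // Inverse of cumulative(): p is a probability in [0, 1), not a percentage.
    double quantile(double p) const {
        if (!(p >= 0.0 && p < 1.0)) throw std::domain_error("p must be in [0, 1)");
        return -std::log1p(-p) / rate_;
    }

private:
    double rate_;
};
```

With this shape, what some would call the 95th percentile is simply d.quantile(0.95).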
 >> 1) Define ad hoc inverse functions for each specific
 >> distribution. So
 >> for the Students T distribution, you would define a member
 >> function of the form:
 >>
 >> double degrees_of_freedom(double cumulative_probability, double
 >> random_variable) const;

 That could be a static member function, since we're solving
 for the degrees of freedom parameter.
OK
 It would also be more natural to me for the
 cumulative_probability parameter to come last in the list.
Why? Quantile is also cumulative?
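As a rough sketch of the "ad hoc inverse" idea, the following solves for a distribution parameter in a static member function. It uses an exponential distribution instead of Student's t, purely because the parameter can then be found in closed form; the names exponential_param_sketch and find_rate are hypothetical. The cumulative probability is placed last, following John's suggestion.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical sketch; names are illustrative, not a Boost interface.
struct exponential_param_sketch {
    double rate;

    // P(X <= x) for this distribution.
    double cumulative(double x) const {
        return x < 0.0 ? 0.0 : -std::expm1(-rate * x);
    }

    // Ad hoc inverse: the rate for which cumulative(random_variable)
    // equals cumulative_probability. It is static because it solves for
    // the parameter itself, so no existing object is needed.
    static double find_rate(double random_variable, double cumulative_probability) {
        if (!(random_variable > 0.0))
            throw std::domain_error("random_variable must be positive");
        if (!(cumulative_probability > 0.0 && cumulative_probability < 1.0))
            throw std::domain_error("cumulative_probability must be in (0, 1)");
        return -std::log1p(-cumulative_probability) / random_variable;
    }
};
```

A degrees_of_freedom function for Student's t would presumably need an iterative root finder rather than a closed form, but the calling pattern would be the same.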
 > But I still worried that the whole scheme will lead to much bigger
 > code compared to a set of names of (template) functions
 > (because code that isn't in fact used will be generated).

 For template classes member functions are only instantiated
 when used, so if
 you only use one member, then that's the only one instantiated.
Well, that's what I thought, but I wanted expert reassurance before driving
into a dead end ;)
So my worry turns into a killer feature: keeping the cost of calling a
single Student's t down to reasonable levels is crucially important.
Compared to linking to an 'All_the_stats_functions_you_could_ever_want.dll',
it should be easily 'affordable', as they say.
Which also means that the cost of a Q or complement function is nothing
unless you use it.
(and you probably won't use the P version as well).
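The instantiate-only-when-used rule can be seen in a small sketch. The holder template below is hypothetical and exists only to show the rule: negated() would not compile for std::string, yet the program is fine because that member is never called for that type.

```cpp
#include <string>

// Hypothetical wrapper, for illustration only.
template <class T>
struct holder {
    T value;

    // Valid for any copyable T.
    T get() const { return value; }

    // Requires unary minus on T. For T = std::string the body is ill-formed,
    // but it is only instantiated (and diagnosed) if negated() is called.
    T negated() const { return -value; }
};

// Using holder<std::string> instantiates the member declarations, but a
// member definition is instantiated only when that member is used.
inline std::string demo() {
    holder<std::string> h{"students_t"};
    return h.get();  // fine: negated() is never instantiated for std::string
}
```

The same reasoning is why an unused complement member of a distribution class template should add nothing to the generated code.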
>> In other words, 1 - P. Right? One response is why do you
>> need to define
>> it, given how easy it is to get from the cumulative density
>> function?
> Perhaps not really needed? Is there an accuracy reason for both?
 It depends how accurate you want to be: calculating 1 - P incurs
 cancellation error if P is very near 1, whereas for most (all?)
 distributions we can calculate Q directly without the subtraction
 from unity.
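A small sketch of the cancellation problem, using the exponential distribution with rate 1 only because P and Q have simple closed forms (the function names are made up for this example). Far out in the tail P rounds to exactly 1 in double precision, so 1 - P loses everything, while Q computed directly keeps full relative accuracy.

```cpp
#include <cmath>

// Exponential distribution with rate 1; names are illustrative only.

// P = 1 - exp(-x), computed without cancellation near x = 0.
inline double cdf(double x) { return -std::expm1(-x); }

// Q = exp(-x), computed directly with no subtraction from unity.
inline double complement(double x) { return std::exp(-x); }

// Q obtained as 1 - P: once P has rounded to 1.0, the result is 0.
inline double complement_via_subtraction(double x) { return 1.0 - cdf(x); }
```

At x = 50 the direct form returns a value near 1.9e-22, whereas the subtraction returns 0 because P is indistinguishable from 1 in a double.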
 I think the "Boostified" name would be in all lower case: students_t or
 whatever.
Agree with this.
Paul
 Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS pbristow_at_[hidden]