
From: Paul A Bristow (pbristow_at_[hidden])
Date: 2006-07-10 09:23:39
 -----Original Message-----
 From: boostbounces_at_[hidden]
 [mailto:boostbounces_at_[hidden]] On Behalf Of Kevin Lynch
 Sent: 09 July 2006 11:49
 To: boost_at_[hidden]
 Subject: Re: [boost] [math/staticstics/design] How best to name statistical functions?

 John Maddock wrote:
 > Paul Bristow has been toiling away producing some statistical functions on
 > top of some of my Math special functions, and we've encountered a bit of a
 > naming dilemma that I hope the ever resourceful Boosters can solve for us :)
 Why not hide the functions behind a class interface? After all, the
 various functions are "properties" of the distributions. Hence:

 class students_t {
 public:
     students_t(double mu);
     double P(double x) const;     // cumulative probability
     double Q(double x) const;     // complement of P
     double invP(double p) const;  // (or perhaps inverseP or Pinv or something)
     // .....
 };

 class normal {
 public:
     normal(double mu, double sigma);
     double P(double x) const;
     double Q(double x) const;
     double invP(double p) const;
     // ......
 };
Rather interesting idea.
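
To make the proposal concrete, here is a minimal sketch of what the Normal case might look like if filled in. It is an illustration only, not the interface we have settled on: the name normal_sketch is hypothetical, P and Q are computed from std::erfc, and invP uses plain bisection purely to keep the example short (a real implementation would use a proper inverse).

```cpp
#include <cmath>

// Hypothetical sketch of the proposed class interface; not a Boost API.
class normal_sketch {
public:
    normal_sketch(double mu, double sigma) : mu_(mu), sigma_(sigma) {}

    // P(x): probability that the variate is <= x.
    double P(double x) const {
        return 0.5 * std::erfc(-(x - mu_) / (sigma_ * std::sqrt(2.0)));
    }

    // Q(x): probability that the variate is > x, computed directly
    // rather than as 1 - P(x) to keep accuracy in the upper tail.
    double Q(double x) const {
        return 0.5 * std::erfc((x - mu_) / (sigma_ * std::sqrt(2.0)));
    }

    // invP(p): the x for which P(x) == p, for 0 < p < 1.
    // Bisection on a wide bracket; P is monotonically increasing.
    double invP(double p) const {
        double lo = mu_ - 40.0 * sigma_;
        double hi = mu_ + 40.0 * sigma_;
        for (int i = 0; i < 200; ++i) {
            const double mid = 0.5 * (lo + hi);
            if (P(mid) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

private:
    double mu_;     // mean
    double sigma_;  // standard deviation
};
```

With this shape, the parameters are given once at construction, for example normal_sketch n(0.0, 1.0), and every later call such as n.P(1.96) or n.invP(0.975) takes a single argument.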

 This interface has a few major benefits over raw functions:

 1) Since Paul is using your C++ special functions library in the
 implementation, there's no argument on the implementation side for C
 compatibility. Without C compatibility as a driving force, you don't
 need to stick with free functions and the corresponding combinatorial
 explosion of hard to remember names.
Agreed.
 2) A class interface also lets you carry around data specific to the
 current "in use" distribution in one place, rather than needing to stuff
 it into every call (the mean in the case of Student's t, the mean and
 deviation for the Normal, etc).
 3) This "normalizes" the interface for the calls to the distribution
 functions: every call for "P" has exactly one argument, and
 not two or three or four depending on the distribution in use.
How would you envisage this working with Fisher, for example, which has
degrees of freedom 1 and 2 and a variance ratio?
Is this 1D, 2D or 3D?
Its inversion will return df1 (given df2, F and probability),
or df2 (given df1, F and probability),
or F (given df1, df2 and probability).
Would you like to flesh out how you suggest handling all these?
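
One possible shape for an answer, offered only as a sketch: the object holds the parameters (df1 and df2 in the Fisher case), P and invP work on the variate, and recovering a parameter becomes a separate free function that searches over candidate distributions. The F distribution needs the incomplete beta function, so the example below uses the exponential distribution as a stand-in with a closed-form P. The names exponential_sketch and find_parameter are hypothetical.

```cpp
#include <cmath>

// Stand-in distribution with a closed-form cumulative function.
// Hypothetical name; the Fisher case would hold df1 and df2 instead.
class exponential_sketch {
public:
    explicit exponential_sketch(double lambda) : lambda_(lambda) {}

    // P(x): probability that the variate is <= x.
    double P(double x) const { return 1.0 - std::exp(-lambda_ * x); }

    // invP(p): the variate x for which P(x) == p.
    double invP(double p) const { return -std::log(1.0 - p) / lambda_; }

private:
    double lambda_;  // rate parameter
};

// Recover a constructor parameter rather than the variate: find the
// parameter value for which Dist(param).P(x) == p, by bisection.
// Assumes P(x) increases with the parameter throughout [lo, hi].
template <class Dist>
double find_parameter(double x, double p, double lo, double hi) {
    for (int i = 0; i < 200; ++i) {
        const double mid = 0.5 * (lo + hi);
        if (Dist(mid).P(x) < p) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

For a two-parameter distribution, the same search could be applied to one parameter at a time with the other held fixed, which would cover the "df1 given df2" and "df2 given df1" cases without adding more member functions.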
 4) The consistent interface is of course easier to document,
 teach and learn, and easier to use.
Yes, usability is a major requirement to allow all and sundry to USE this.
 You might also want to provide a
 function to obtain the non-cumulative distribution value (perhaps
 operator() or dist() or something).
Yes, most desirable, but this project is getting bigger day by day ;)
(As an aside, John has devised a way to avoid the bloat caused by the
expectation that one can provide degrees of freedom as either an integer or a
floating-point value. Without his meta-magic, a serious downside of a fully
templated version would be the instantiation of many variants of each function.)
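
For readers wondering how such bloat can be avoided, one common approach is sketched below. It is not necessarily what John has done. A thin front end promotes integer arguments to double and forwards to an implementation that is only ever instantiated for floating-point types. The names promoted and students_t_variance are hypothetical, and the body (the variance of Student's t, df / (df - 2) for df > 2) is chosen only because it is short.

```cpp
#include <type_traits>

// Hypothetical promotion rule: integer types are treated as double,
// floating-point types keep their own type.
template <class T>
struct promoted {
    typedef typename std::conditional<std::is_integral<T>::value,
                                      double, T>::type type;
};

// The real work: instantiated once per floating-point type only.
// Variance of Student's t with df degrees of freedom (valid for df > 2).
template <class Real>
Real students_t_variance_impl(Real df) {
    return df / (df - 2);
}

// Thin front end: accepts int, long, float, double, ... and forwards
// to the single implementation for the promoted type.
template <class DF>
typename promoted<DF>::type students_t_variance(DF df) {
    typedef typename promoted<DF>::type real;
    return students_t_variance_impl<real>(static_cast<real>(df));
}
```

Calls with int, long or double degrees of freedom all end up in the same double instantiation of the implementation; only the trivial forwarding function is instantiated per argument type.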
 Of course, you would probably templatize and you might want to inherit
 from 1D or 2D abstract base classes if you plan to provide
 multidimensional distributions (or maybe not ...) and functions that
 operate on distributions.

 In any case, I look forward to the results....
Watch this space...
Paul
 Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS pbristow_at_[hidden]