From: Paul A Bristow (pbristow_at_[hidden])
Date: 2006-07-10 09:23:39


 

| -----Original Message-----
| From: boost-bounces_at_[hidden]
| [mailto:boost-bounces_at_[hidden]] On Behalf Of Kevin Lynch
| Sent: 09 July 2006 11:49
| To: boost_at_[hidden]
| Subject: Re: [boost] [math/staticstics/design] How best to
| name statistical functions?
|
| John Maddock wrote:
| > Paul Bristow has been toiling away producing some statistical functions on
| > top of some of my Math special functions, and we've encountered a bit of a
| > naming dilemma that I hope the ever resourceful Boosters can solve for us :-)
|
| Why not hide the functions behind a class interface? After all, the
| various functions are "properties" of the distributions. Hence:
|
| class students_t {
| public:
|     students_t(double mu);
|     double P(double x);
|     double Q(double x);
|     double invP(double p);  // or perhaps inverseP or Pinv or something
|     // ...
| };
|
| class normal {
| public:
|     normal(double mu, double sigma);
|     double P(double x);
|     double Q(double x);
|     double invP(double p);
|     // ...
| };

Rather interesting idea.
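
To make the proposal concrete, here is a minimal sketch of what the quoted `normal` class might look like. The names `normal`, `P`, `Q` and `invP` come from the quoted proposal; everything else is an assumption made for illustration: the use of `std::erfc` for the cumulative probability, the bisection search in `invP`, and the argument checks. It is not the interface that was eventually adopted.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical sketch of the proposed class interface (not a library API).
class normal {
public:
    normal(double mu, double sigma) : mu_(mu), sigma_(sigma) {
        if (!(sigma > 0.0)) throw std::domain_error("sigma must be positive");
    }

    // P(x): probability that a variate is less than or equal to x.
    double P(double x) const {
        return 0.5 * std::erfc(-(x - mu_) / (sigma_ * std::sqrt(2.0)));
    }

    // Q(x): complement of P, computed directly to avoid cancellation in 1 - P.
    double Q(double x) const {
        return 0.5 * std::erfc((x - mu_) / (sigma_ * std::sqrt(2.0)));
    }

    // invP(p): the x for which P(x) == p, found by plain bisection.
    // Bisection is an assumption chosen for brevity, not speed.
    double invP(double p) const {
        if (!(p > 0.0 && p < 1.0)) throw std::domain_error("p must be in (0, 1)");
        double lo = mu_ - 40.0 * sigma_;
        double hi = mu_ + 40.0 * sigma_;
        for (int i = 0; i < 200; ++i) {
            const double mid = 0.5 * (lo + hi);
            if (P(mid) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

private:
    double mu_;
    double sigma_;
};
```

With this sketch, `normal(0.0, 1.0).invP(0.975)` gives the familiar two-sided 95% critical value, and the parameters are supplied once at construction rather than on every call.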

|
| This interface has a few major benefits over raw functions:
|
| 1) Since Paul is using your C++ special functions library in the
| implementation, there's no argument on the implementation side for C
| compatibility. Without C compatibility as a driving force, you don't
| need to stick with free functions and the corresponding combinatorial
| explosion of hard to remember names.

Agreed.

| 2) A class interface also lets you carry around data specific to the
| current "in use" distribution in one place, rather than needing to stuff
| it into every call (the mean in the case of Student's t, the mean and
| deviation for the Normal, etc).

| 3) This "normalizes" the interface for the calls to the distribution
| functions - every call for "P" has exactly one argument, and
| not two or three or four depending on the distribution in use.
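
One consequence of point 3 is that code written against the one-argument `P` works unchanged for any distribution. The sketch below illustrates this with a function template; `probability_between` and the `exponential` struct are hypothetical names introduced only for this example, and the distributions are reduced to bare structs to keep it short.

```cpp
#include <cmath>

// Two hypothetical distributions with different numbers of parameters
// but the same one-argument P(x).
struct normal {
    double mu;
    double sigma;
    double P(double x) const {
        return 0.5 * std::erfc(-(x - mu) / (sigma * std::sqrt(2.0)));
    }
};

struct exponential {
    double lambda;
    double P(double x) const {
        return x <= 0.0 ? 0.0 : 1.0 - std::exp(-lambda * x);
    }
};

// Works for any type that provides P(double): the parameters live in the
// object, so the caller never needs to know how many there are.
template <class Distribution>
double probability_between(const Distribution& d, double a, double b) {
    return d.P(b) - d.P(a);
}
```

The same call, `probability_between(d, a, b)`, then serves a two-parameter normal and a one-parameter exponential alike.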

How would you envisage this working with the Fisher distribution, for example,
which has two degrees-of-freedom parameters (df1 and df2) and a variance ratio F?

Is this 1D, 2D or 3D?

Its inversion could return df1 (given df2, F and the probability),
or df2 (given df1, F and the probability),
or F (given df1, df2 and the probability).

Would you like to flesh out how you suggest handling all these?
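
One possible answer, sketched here purely as an illustration: keep df1 and df2 in the object so that `P` still takes a single argument, and provide the "solve for a parameter" inversions as free functions that build trial distributions. All names (`fisher_f`, `find_f`, `find_df2`, `bisect`, `incomplete_beta`) are hypothetical. The incomplete beta function is approximated by Simpson's rule, which is only reasonable for df1 >= 2 and df2 >= 2, and the bisection assumes P is increasing over the search bracket. That assumption holds for the example in the usage note but is not guaranteed in general; in particular, a `find_df1` would need to choose its search direction according to F, so it is omitted.

```cpp
#include <cmath>
#include <functional>

// Regularized incomplete beta I_z(a, b) by composite Simpson integration.
// Sketch quality only: assumes a >= 1, b >= 1 and 0 <= z < 1.
inline double incomplete_beta(double a, double b, double z) {
    const int n = 2000;  // number of intervals, must be even
    const double h = z / n;
    auto f = [a, b](double t) {
        return std::pow(t, a - 1.0) * std::pow(1.0 - t, b - 1.0);
    };
    double sum = f(0.0) + f(z);
    for (int i = 1; i < n; ++i) sum += f(i * h) * (i % 2 ? 4.0 : 2.0);
    const double beta = std::tgamma(a) * std::tgamma(b) / std::tgamma(a + b);
    return sum * h / 3.0 / beta;
}

// Hypothetical Fisher F distribution: parameters stored, P takes one argument.
class fisher_f {
public:
    fisher_f(double df1, double df2) : df1_(df1), df2_(df2) {}
    double P(double f) const {
        if (f <= 0.0) return 0.0;
        return incomplete_beta(0.5 * df1_, 0.5 * df2_, df1_ * f / (df1_ * f + df2_));
    }
    double Q(double f) const { return 1.0 - P(f); }
private:
    double df1_;
    double df2_;
};

// Find x in [lo, hi] with g(x) == target, assuming g is increasing there.
inline double bisect(const std::function<double(double)>& g,
                     double target, double lo, double hi) {
    for (int i = 0; i < 100; ++i) {
        const double mid = 0.5 * (lo + hi);
        if (g(mid) < target) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

// F given df1, df2 and the probability.
inline double find_f(double df1, double df2, double p) {
    const fisher_f d(df1, df2);
    return bisect([&d](double f) { return d.P(f); }, p, 0.0, 1e6);
}

// df2 given df1, F and the probability (search bracket [2, 100] is an assumption).
inline double find_df2(double df1, double f, double p) {
    return bisect([=](double df2) { return fisher_f(df1, df2).P(f); }, p, 2.0, 100.0);
}
```

For df1 = 2 and df2 = 4 the cumulative probability has the closed form 1 - (1 + F/2)^-2, so P(2) = 0.75; `find_f(2, 4, 0.75)` recovers F = 2 and `find_df2(2, 2, 0.75)` recovers df2 = 4. Seen this way the distribution itself stays one-dimensional in its argument, and the three inversions are separate searches over one unknown each.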

| 4) The consistent interface is of course easier to document,
| teach and learn, and easier to use.

Yes, usability is a major requirement to allow all and sundry to USE this.
 
| You might also want to provide a
| function to obtain the non-cumulative distribution value (perhaps
| operator() or dist() or something).

Yes - most desirable - but this project is getting bigger, day by day ;-)

(As an aside, John has devised a way to avoid the bloat caused by the
expectation that one can provide degrees of freedom as an integer OR a
floating-point value. Without his meta-magic, a serious downside of a fully
templated version would be the instantiation of many variants of each function.)
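
The details of John's technique are not given here, so the following is only a guess at the general idea, using a common approach: promote every integer argument type to `double` in a thin front-end, so that the real implementation is instantiated once per floating-point type rather than once per argument type. The names `promoted_t`, `students_t_variance` and `students_t_variance_imp` are hypothetical, and the variance formula df / (df - 2) is used only because it is short.

```cpp
#include <type_traits>

// Map every integer type to double; leave floating-point types unchanged.
template <class T>
using promoted_t =
    typename std::conditional<std::is_integral<T>::value, double, T>::type;

namespace detail {
// The real work: instantiated only for float, double, long double.
// Variance of Student's t, valid for df > 2.
template <class Real>
Real students_t_variance_imp(Real df) {
    return df / (df - Real(2));
}
}  // namespace detail

// Thin forwarding front-end: int, long, unsigned, ... all share the
// double instantiation of the implementation.
template <class T>
promoted_t<T> students_t_variance(T df) {
    return detail::students_t_variance_imp(static_cast<promoted_t<T>>(df));
}
```

Calling `students_t_variance(4)` and `students_t_variance(4L)` both reach the `double` implementation, while `students_t_variance(4.0f)` keeps `float` precision.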
  
| Of course, you would probably templatize and you might want to inherit
| from 1D or 2D abstract base classes if you plan to provide
| multidimensional distributions (or maybe not ...) and functions that
| operate on distributions.
|
| In any case, I look forward to the results....

Watch this space...

Paul

---
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow_at_[hidden]
  
