Boost :


From: Topher Cooper (topher_at_[hidden])
Date: 2006-07-14 09:45:33

Next message: Wang Weiwei: "[boost] [BGL] Graphviz"
Previous message: John Maddock: "Re: [boost] [math/staticstics/design] How besttonamestatisticalfunctions?"
In reply to: Paul A Bristow: "Re: [boost] [math/staticstics/design] How best tonamestatisticalfunctions?"
Next in thread: Paul A Bristow: "Re: [boost] [math/staticstics/design] How best tonamestatisticalfunctions?"
Reply: Paul A Bristow: "Re: [boost] [math/staticstics/design] How best tonamestatisticalfunctions?"

I'm not sure what you are quoting with your first line, but, of
course, there isn't a single inverse for any distribution.

One task in statistics is hypothesis testing. In traditional
statistics, doing this requires the inverse cumulative distribution
function. It's well and consistently defined for every
one-dimensional distribution. It is also used in setting confidence
limits and for many other purposes. A numerical statistical package
that doesn't include it is worthless.

We also use the inverse cumulative in setting confidence
bounds. Roughly speaking -- given that the underlying process is
controlled by the following distribution, what x[lb] and x[ub] can I
be 95% certain that any specific single sample will lie between? (A
"single sample" actually could be a particular, single statistic for
a set of samples, you just need the right distribution).
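
A minimal sketch of that use, in standard C++ only. The normal
distribution is just an example, and the names normal_cdf,
normal_quantile and central_interval are hypothetical helpers made up
for illustration, not a proposed interface. The inverse is found by
plain bisection on the CDF, which is slow but easy to check:

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// CDF of a normal distribution with mean mu and standard deviation sigma.
double normal_cdf(double x, double mu, double sigma) {
    return 0.5 * std::erfc(-(x - mu) / (sigma * std::sqrt(2.0)));
}

// Inverse CDF: the x for which normal_cdf(x, mu, sigma) == p.
// Found by bisection; the CDF is monotone, so the bracket always holds x.
double normal_quantile(double p, double mu, double sigma) {
    assert(p > 0.0 && p < 1.0 && sigma > 0.0);
    double lo = mu - 40.0 * sigma;
    double hi = mu + 40.0 * sigma;
    for (int i = 0; i < 200; ++i) {
        double mid = 0.5 * (lo + hi);
        if (normal_cdf(mid, mu, sigma) < p) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return 0.5 * (lo + hi);
}

// Central interval [x_lb, x_ub] that a single sample falls inside
// with probability `coverage` (e.g. 0.95).
std::pair<double, double> central_interval(double coverage,
                                           double mu, double sigma) {
    assert(coverage > 0.0 && coverage < 1.0);
    double tail = 0.5 * (1.0 - coverage);
    return {normal_quantile(tail, mu, sigma),
            normal_quantile(1.0 - tail, mu, sigma)};
}
```

Calling central_interval(0.95, 0.0, 1.0) gives the familiar bounds of
roughly -1.96 and +1.96.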

The inverse of the complementary CDF is also useful but trivially
derived from the other. Nice to have both, but not strictly necessary.
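
To illustrate "trivially derived", here is a small sketch using the
exponential distribution, chosen only because both inverses have a
closed form. The function names are hypothetical. The derived version
simply feeds 1 - q to the ordinary inverse:

```cpp
#include <cassert>
#include <cmath>

// Exponential distribution with rate lambda: CDF(x) = 1 - exp(-lambda * x).
// Ordinary inverse CDF: the x with P(X <= x) == p.
double exp_quantile(double p, double lambda) {
    assert(p >= 0.0 && p < 1.0 && lambda > 0.0);
    return -std::log1p(-p) / lambda;
}

// Inverse of the complementary CDF, derived from the ordinary inverse:
// the x with P(X > x) == q is the x with P(X <= x) == 1 - q.
double exp_quantile_complement_derived(double q, double lambda) {
    return exp_quantile(1.0 - q, lambda);
}

// The same quantity computed directly from the complementary CDF,
// exp(-lambda * x) == q.
double exp_quantile_complement_direct(double q, double lambda) {
    assert(q > 0.0 && q <= 1.0 && lambda > 0.0);
    return -std::log(q) / lambda;
}
```

One reason it can still be nice to have both: for very small q the
subtraction 1 - q rounds away digits in floating point, so the direct
form can be more accurate far out in the tail.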

*Some* distributions, such as the negative binomial distribution
(note, NOT a function) and the binomial distribution, are discrete
distributions. It is then also meaningful to define an inverse for
the PDF, especially a "fuzzy" inverse (how long a string of failures
has a probability of 0.05 of occurring). Of course, in that case the
inverse is not generally a function. Once in a great while, this
might be useful.
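
One way such a "fuzzy" inverse could look, as a sketch only. It
assumes the parameterization in which k counts the failures before
the r-th success, each trial succeeding with probability p; the
names, the tolerance and the search limit k_max are all made up for
illustration. It returns every k whose probability is within tol of
the target, which makes the "not generally a function" point
visible: the result can hold zero, one or several values.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Negative binomial probability of exactly k failures before the
// r-th success, with success probability p per trial.
double neg_binomial_pmf(int k, double r, double p) {
    assert(k >= 0 && r > 0.0 && p > 0.0 && p < 1.0);
    double log_pmf = std::lgamma(k + r) - std::lgamma(k + 1.0)
                   - std::lgamma(r)
                   + r * std::log(p) + k * std::log1p(-p);
    return std::exp(log_pmf);
}

// "Fuzzy" inverse: all k in [0, k_max] whose probability lies
// within tol of the target probability.
std::vector<int> fuzzy_inverse_pmf(double target, double r, double p,
                                   double tol, int k_max) {
    std::vector<int> matches;
    for (int k = 0; k <= k_max; ++k) {
        if (std::fabs(neg_binomial_pmf(k, r, p) - target) <= tol) {
            matches.push_back(k);
        }
    }
    return matches;
}
```

With r = 1 and p = 0.5, asking for a probability of 0.05 with a
tolerance of 0.02 matches both k = 3 (probability 0.0625) and k = 4
(probability 0.03125).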

We can also take the distributional parameters and stop treating them
like indexes for the family of CDFs and treat them like function
arguments. We can then speak meaningfully about inverses for each of
them. So, given the CDF for the normal distribution we have, let's
say (this is math, not any proposal for C++ naming):

CDFz[mu, sigma](x) -> P

becomes

CDFz(x, mu, sigma) -> P

The "standard" inverse CDF is then

CDF'z(p, mu, sigma) -> x

And one of the others is:

CDF'z(x, mu, p) -> sigma

I.e., given that I know a sample was generated from the normal
distribution with mean mu, and that the probability that the sample
was no greater than a particular precise value, x, is a particular
precise probability, p, then what is the standard deviation, sigma,
for that distribution?
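
A sketch of that inversion around sigma, done numerically. It takes
p to be the CDF value, P(X <= x), to match CDFz(x, mu, sigma) -> P
above; sigma_from_cdf is a hypothetical name. Note that a solution
only exists when x and p sit on the same side of the centre (x > mu
with p > 0.5, or x < mu with p < 0.5):

```cpp
#include <cassert>
#include <cmath>

// CDF of a normal distribution with mean mu and standard deviation sigma.
double normal_cdf(double x, double mu, double sigma) {
    return 0.5 * std::erfc(-(x - mu) / (sigma * std::sqrt(2.0)));
}

// Solve normal_cdf(x, mu, sigma) == p for sigma by bisection.
double sigma_from_cdf(double x, double mu, double p) {
    assert(p > 0.0 && p < 1.0);
    assert((x > mu && p > 0.5) || (x < mu && p < 0.5));

    // For x > mu the CDF falls from 1 towards 0.5 as sigma grows;
    // for x < mu it rises from 0 towards 0.5.
    const bool decreasing = x > mu;
    auto sigma_too_small = [&](double sigma) {
        double c = normal_cdf(x, mu, sigma);
        return decreasing ? c > p : c < p;
    };

    double lo = 0.0;
    double hi = 1.0;
    while (sigma_too_small(hi)) {
        hi *= 2.0;  // widen the bracket until it contains the answer
    }
    for (int i = 0; i < 200; ++i) {
        double mid = 0.5 * (lo + hi);
        if (sigma_too_small(mid)) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return 0.5 * (lo + hi);
}
```

For the normal distribution there is also a closed form, sigma =
(x - mu) / z, where z is the standard normal inverse CDF at p; the
bisection is shown because it carries over to parameters that have
no such shortcut.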

This is an important question algebraically. It allows us to derive
distributions for parameter estimation, to which we can then apply
the inverse cumulative distribution function to get confidence bounds
for the parameters. For example, given a particular sample drawn
from, say, a chi-square distribution, what is the distribution of
possible values for the number of degrees of freedom?

There may be situations, for some particular distribution, where a
numerical inversion around a parameter is called for, but I can't
think of any. Can you give me a reasonable scenario where these
inverses around the parameters would be widely used? Let's have a
use-case.

I certainly think that after the common structure of the distribution
classes has been put in place it is reasonable to ask what
additional, distribution-specific methods should be added. If you
want to put every formula in the handbooks in, go ahead -- little of
it will ever be used in practice, but it will be there if some
unanticipated need comes up and the user will be able to avoid the
bother of looking up the formula themselves. Some kind of naming
convention for some of this distribution-specific stuff seems
reasonable. Having read accessors for each distribution parameter
seems like a good idea, for example ("(x -
aNormDist.mu)/aNormDist.sigma" where, in this case, aNormDist.mu =
aNormDist.mean and aNormDist.sigma = aNormDist.standardDeviation).
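
As a sketch of what such read accessors might look like (the class
name, member names and the standardize helper are all hypothetical,
not a naming proposal), with the accessors written as member
functions rather than data members:

```cpp
#include <cassert>

// A normal distribution object exposing its parameters read-only,
// under both the symbol names and the descriptive names.
class NormalDist {
public:
    NormalDist(double mu, double sigma) : mu_(mu), sigma_(sigma) {
        assert(sigma > 0.0);
    }

    double mu() const { return mu_; }
    double sigma() const { return sigma_; }

    // For the normal distribution the parameters coincide with the
    // mean and the standard deviation.
    double mean() const { return mu_; }
    double standard_deviation() const { return sigma_; }

private:
    double mu_;
    double sigma_;
};

// The expression from the text: (x - mu) / sigma.
double standardize(double x, const NormalDist& d) {
    return (x - d.mu()) / d.sigma();
}
```

For example, with NormalDist d(10.0, 2.0), standardize(13.0, d)
evaluates to 1.5.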

Topher

At 05:05 AM 7/14/2006, you wrote:
>
>THE inverse?
>
>Another quick question - I'm still in partial disambiguation mode.
>
>With the negative binomial distribution function (or are there more than one
>but one is THE Standard one?), which is **THE** inverse?
>
>the one that tells you the number of failures (MathCAD qnbinom & DCDFLIB)
>
>or the one that tells you the success probability? (Cephes, Wikipedia &
>DCDFLIB)
>
>John's response to this question was faintly blasphemous ;-)
>
>Same question with F and chisqr of course...
>
>Both/all of course are potentially useful :-)
>
>(and I feel all should be provided).
>
>Paul
>
>---
>Paul A Bristow
>Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
>+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
>pbristow_at_[hidden]
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk