 # Boost :

Date: 2008-05-06 06:22:14

Johan Råde wrote:
>
>> This is true of all asymmetric distributions of course, you need to add the
>> two tails calculated separately:
>>
>> cdf(hypergeometric(), n) + cdf(hypergeometric(), total - n)
>>
>> Ah... wait, because it's discrete, that misses out one value from the right
>> tail? So should be:
>>
>>
>> cdf(hypergeometric(), n) + cdf(hypergeometric(), total - n - 1) ???
>
> If the distribution is asymmetric you may want neither total - n nor total - n - 1.
> What you want is
>
> sum/integral of pdf(dist,y) over all y such that pdf(dist,y) <= pdf(dist,x).
>

I consulted some statistics books and realized that the statement above is complete nonsense.
That is not how you do it; it is in fact more complex than that.
Finding the right cut-off for the other tail in a two-sided test can be tricky.
(In many situations there is an exact answer,
given by a so-called UMPU (uniformly most powerful unbiased) test.
But the formulas that give these tests can be rather implicit.)

For symmetric distributions (for instance the t distribution used in the t-test),
you just calculate one tail and multiply by two.
For the F-test, doubling one tail does not give the exact answer,
but is known to give a good approximation.
For the Fisher exact test (hypergeometric distribution),
doubling is often not even a good approximation,
and some other rule should be used to
find the correct cut-off for the other tail.
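
To make the difference concrete, here is a minimal sketch that compares two candidate two-sided rules on a hypergeometric distribution: doubling the smaller tail, and the "sum of pdf(dist,y) over all y with pdf(dist,y) <= pdf(dist,x)" rule quoted above. It does not claim that either rule is the correct cut-off. It uses only the C++ standard library rather than Boost.Math, and the names `log_choose`, `hyper_pmf`, `two_sided_doubling` and `two_sided_pmf_rule` are hypothetical helpers made up for this illustration.

```cpp
#include <algorithm>
#include <cmath>

// Log of the binomial coefficient C(n, k), via lgamma to avoid overflow.
double log_choose(int n, int k) {
    return std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0);
}

// Hypergeometric pmf: probability of seeing k marked items in a sample of
// `draws` items taken without replacement from `total` items, of which
// `marked` are marked. (Parameter names are assumptions for this sketch.)
double hyper_pmf(int k, int marked, int draws, int total) {
    const int lo = std::max(0, draws - (total - marked));
    const int hi = std::min(draws, marked);
    if (k < lo || k > hi) return 0.0;
    return std::exp(log_choose(marked, k)
                    + log_choose(total - marked, draws - k)
                    - log_choose(total, draws));
}

// Candidate rule 1: double the smaller one-sided tail, capped at 1.
double two_sided_doubling(int k, int marked, int draws, int total) {
    const int lo = std::max(0, draws - (total - marked));
    const int hi = std::min(draws, marked);
    double lower = 0.0, upper = 0.0;
    for (int i = lo; i <= hi; ++i) {
        const double p = hyper_pmf(i, marked, draws, total);
        if (i <= k) lower += p;  // P(X <= k)
        if (i >= k) upper += p;  // P(X >= k), includes the observed value
    }
    return std::min(1.0, 2.0 * std::min(lower, upper));
}

// Candidate rule 2: sum the probabilities of all outcomes that are no more
// likely than the observed one. The small relative tolerance guards against
// rounding when two outcomes have mathematically equal probability.
double two_sided_pmf_rule(int k, int marked, int draws, int total) {
    const int lo = std::max(0, draws - (total - marked));
    const int hi = std::min(draws, marked);
    const double p_obs = hyper_pmf(k, marked, draws, total);
    double sum = 0.0;
    for (int i = lo; i <= hi; ++i) {
        const double p = hyper_pmf(i, marked, draws, total);
        if (p <= p_obs * (1.0 + 1e-7)) sum += p;
    }
    return sum;
}
```

For a symmetric case such as `total = 20, marked = 10, draws = 10, k = 2`, the two functions agree. For an asymmetric case such as `total = 20, marked = 5, draws = 8, k = 4`, they differ: doubling gives 14560/125970 (about 0.116), while the pmf rule gives 13715/125970 (about 0.109).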

I'll look some more into this.

--Johan