From: John Maddock (john_at_[hidden])
Date: 2008-05-06 04:39:15


Johan Råde wrote:
> There is one difficulty with the two-sided Fisher exact test.
>
> To calculate a p-value for the left-sided test,
> you take the cdf of the hypergeometric distribution
> for the observed value of the test statistic.
> If you want to calculate a p-value for the right-sided test,
> then you just take the cdf complement for the observed value of the test
> statistic.
>
> But for a two-sided test, well, say that the observed value
> of the test statistic is n. Then you should sum the pdf over all k
> such that pdf(k) <= pdf(n). This means summing over both tails.
> And since the distribution is not symmetric,
> you cannot just sum over one tail and multiply by 2,
> as you do with the 2-sided t-test.
>
> I don't see how to do that in a clean way using the current
> statistical distributions API. (Am I missing something?)

This is true of all asymmetric distributions, of course; you need to add the
two tails, calculated separately:

cdf(hypergeometric(), n) + cdf(hypergeometric(), total - n)

Ah... wait, because it's discrete, that misses out one value from the right
tail? So it should be:

cdf(hypergeometric(), n) + cdf(hypergeometric(), total - n - 1) ???
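
To make the off-by-one question concrete, here is a minimal sketch of the two one-sided tails computed directly from the pmf. It does not use the Boost.Math API; the Hypergeo struct, its parameterisation (population N, K "successes", m draws) and the left_tail/right_tail helpers are all hypothetical names made up for illustration. The point it tries to show is that for a discrete distribution P(X >= n) equals 1 - P(X <= n - 1), so a complement taken at n itself leaves out the observed value.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// log of the binomial coefficient C(n, k); lgamma avoids overflow.
double log_choose(unsigned n, unsigned k) {
    assert(k <= n);
    return std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0);
}

// Hypothetical parameterisation (not Boost's): population of N items,
// K of which are "successes", and m items drawn without replacement.
struct Hypergeo {
    unsigned N, K, m;
    unsigned lo() const { return (m + K > N) ? m + K - N : 0; }  // smallest possible count
    unsigned hi() const { return std::min(K, m); }               // largest possible count
    double pmf(unsigned k) const {
        if (k < lo() || k > hi()) return 0.0;
        return std::exp(log_choose(K, k) + log_choose(N - K, m - k) - log_choose(N, m));
    }
};

// Left tail P(X <= x): includes the observed value x.
double left_tail(const Hypergeo& d, unsigned x) {
    double s = 0.0;
    for (unsigned k = d.lo(); k <= std::min(x, d.hi()); ++k) s += d.pmf(k);
    return s;
}

// Right tail P(X >= x): also includes the observed value x,
// so it equals 1 - P(X <= x - 1), not 1 - P(X <= x).
double right_tail(const Hypergeo& d, unsigned x) {
    double s = 0.0;
    for (unsigned k = std::max(x, d.lo()); k <= d.hi(); ++k) s += d.pmf(k);
    return s;
}
```

With Hypergeo d{8, 4, 4}, left_tail(d, 3) + right_tail(d, 4) sums to one, whereas left_tail(d, 3) + right_tail(d, 3) counts pmf(3) twice.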

> Maybe some extension to the statistical distributions API is needed.
> Something like cdf(symmetric(dist,x)) for the sum/integral of pdf(dist,y)
> over all y such that pdf(dist,y) <= pdf(dist,x).

Hmm, that's a slightly different quantity: I'm not especially familiar with
Fisher's exact test, but from what I've seen there appear to be differences
of opinion on how two sided tests are calculated? For the second "side" you
want to sum the probabilities of all the contingency tables that are "at
least as extreme" as the one you observed but in the other direction. One
way as I've suggested above is to sum all the tables that are as
*asymmetric* as the one you observe, yours is to sum all the tables with
lower or equal *probability*.

Are you certain you require the latter? I'm sure you are... just double
checking :-)

I don't see any easy way of doing this, except by brute force - or maybe
doing a numeric inversion on the PDF to find the correct right tail test
statistic value, and then using the CDFs as above?
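
For what it's worth, the brute-force route might look something like the sketch below: walk the whole support and add up every pmf value that is no larger than the pmf at the observed count. This is plain C++ rather than the statistical distributions API; the function name two_sided_by_probability, the (N, K, m) parameterisation and the 1e-7 relative tolerance for ties are assumptions of mine, picked only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// log of the binomial coefficient C(n, k); lgamma avoids overflow.
static double log_binom(unsigned n, unsigned k) {
    assert(k <= n);
    return std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0);
}

// Brute-force two-sided p-value: sum pmf(k) over all k with pmf(k) <= pmf(observed).
// Hypothetical parameterisation: population N, K "successes", m draws.
double two_sided_by_probability(unsigned N, unsigned K, unsigned m, unsigned observed) {
    assert(K <= N && m <= N);
    const unsigned lo = (m + K > N) ? m + K - N : 0;  // smallest possible count
    const unsigned hi = std::min(K, m);               // largest possible count
    assert(lo <= observed && observed <= hi);

    // Tabulate the pmf over the whole support.
    std::vector<double> pmf;
    for (unsigned k = lo; k <= hi; ++k)
        pmf.push_back(std::exp(log_binom(K, k) + log_binom(N - K, m - k) - log_binom(N, m)));

    // Small relative slack (an arbitrary choice) so that tables with the same
    // probability are not dropped because of floating-point rounding.
    const double cutoff = pmf[observed - lo] * (1.0 + 1e-7);

    double p = 0.0;
    for (double q : pmf)
        if (q <= cutoff) p += q;
    return std::min(p, 1.0);  // guard against rounding slightly above one
}
```

For example, with N = 10, K = 3, m = 4 the pmf over k = 0..3 is 35/210, 105/210, 63/210, 7/210, so an observed count of 0 picks up k = 0 and k = 3 and gives 42/210 = 0.2. The cost is linear in the size of the support, which is why the numeric-inversion idea might still be worth it for large tables.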

John.

