
From: Johan Råde (rade_at_[hidden])
Date: 2008-04-30 01:59:57


John Maddock wrote:
> Johan Råde wrote:
>> A typical data mining scenario might be to calculate the cdf for the
>> t- or F-distribution for each value in an array of, say, 100,000
>> single or double precision floating point numbers.
>> (I tend to use double precision.)
>> Anything that could speed up that task would be interesting.
>
> Nod, the question is what the actual combinations of arguments passed
> to the incomplete beta are: if the data isn't unduly sensitive, what
> would be really useful is to have a log of those values, so we can see
> which parts of the implementation are getting hammered the most.

Here is a detailed test case that I think is typical of the needs
of data mining applications:

Select a t distribution with 10-1000 degrees of freedom,
or an F distribution where the first number of degrees of freedom
is 2-10 and the second is 10-1000.

1. Preparation - generate test data:

Generate 100,000 random numbers with a uniform [0,1] distribution.
Apply the quantile function (i.e. the inverse of the cdf) to each number.
This gives the test data.
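
For concreteness, Step 1 might look something like the sketch below.
The students_t_distribution type and the quantile free function are
Boost.Math's actual API; the std::mt19937 generator, the fixed seed,
and the make_test_data name are just illustrative choices of mine:

#include <boost/math/distributions/students_t.hpp>
#include <cstddef>
#include <random>
#include <vector>

std::vector<double> make_test_data(std::size_t n, double df)
{
    std::mt19937 gen(42);                             // fixed seed so runs are repeatable
    std::uniform_real_distribution<double> u01(0.0, 1.0);
    boost::math::students_t_distribution<double> dist(df);

    std::vector<double> data(n);
    for (double& x : data)
        x = quantile(dist, u01(gen));                 // inverse cdf of a uniform draw
    return data;                                      // (u01 can in principle return exactly 0,
}                                                     //  where quantile diverges; negligible here)

An F-distribution variant would simply swap in
boost::math::fisher_f_distribution<double>(df1, df2).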

2. The test

Apply the cdf or cdf complement to the test data from Step 1.
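
And a sketch of Step 2, reusing make_test_data from the Step 1 sketch
above. cdf and complement are again the library's own free functions;
the timing harness and the checksum are illustrative additions:

#include <boost/math/distributions/students_t.hpp>
#include <chrono>
#include <iostream>
#include <vector>

int main()
{
    boost::math::students_t_distribution<double> dist(100);
    std::vector<double> data = make_test_data(100000, 100);  // from the Step 1 sketch

    auto t0 = std::chrono::steady_clock::now();
    double sum = 0;                                   // checksum keeps the loop from being optimised away
    for (double x : data)
        sum += cdf(dist, x);                          // or cdf(complement(dist, x)) for the upper tail
    auto t1 = std::chrono::steady_clock::now();

    std::cout << "checksum: " << sum << "\n"
              << "elapsed:  "
              << std::chrono::duration<double>(t1 - t0).count() << " s\n";
}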

Is this the kind of suggestion you were asking for?

--Johan

