
Boost : 
From: Johan Råde (rade_at_[hidden])
Date: 2008-04-30 01:59:57
John Maddock wrote:
> Johan Råde wrote:
>> A typical data mining scenario might be to calculate the cdf for the
>> t or F-distribution for
>> each value in an array of say 100,000 single or double precision
>> floating point numbers.
>> (I tend to use double precision.)
>> Anything that could speed up that task would be interesting.
>
> Nod, the question is what the actual combination of arguments that get
> passed to the incomplete beta are: if the data isn't unduly sensitive, what
> would be really useful is to have a log of those values so we can see which
> parts of the implementation are getting hammered the most.
Here is a detailed test case that I think is typical
of the needs of data mining applications:
Select a t distribution with 10-1000 degrees of freedom,
or an F distribution where the first number of degrees
of freedom is 2-10 and the second 10-1000.
1. Preparation - generate test data:
Generate 100,000 random numbers with a uniform [0,1] distribution.
Apply the quantile function (i.e. the inverse of the cdf) to each number.
This gives the test data.
2. The test:
Apply the cdf or cdf complement to the test data from Step 1.
Is this the kind of suggestion you were asking for?
Johan
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk