From: Johan Råde (rade_at_[hidden])
Date: 2008-04-24 10:46:04
John Maddock wrote:
> Johan Råde wrote:
>> Does the project plan include an optimized SSE implementation
>> of the cumulative distribution function
>> for the Student t and Fisher F distributions?
>> That would be very useful for data-mining applications.
>
> Since those ultimately rely on the incomplete beta, and that's one of the
> functions that I hope we'll be focusing on, then yes up to a point :-)
>
> I'm not sure there's all that much scope for SSE optimisation of that
> function (unless we can optimise the infinite series and continued fractions
> used to make use of parallel evaluation). There are some opportunities for
> task-based (i.e. OpenMP or similar) parallelism during the computation, and I
> believe Gautam has some other ideas which might work, but this is all very
> much in the realm of "investigate whether it's worthwhile" at present.
>
> BTW if you (or anyone else for that matter) have any real world test cases
> that Gautam can look into that would be great - I realise there may well be
> confidentiality issues in some cases - but there's not much we can do about
> this I guess?
A typical data-mining scenario might be to calculate the cdf of the t- or F-distribution for
each value in an array of, say, 100,000 single- or double-precision floating-point numbers.
(I tend to use double precision.)
Anything that could speed up that task would be interesting.
SSE parallelism, if possible, would be interesting.
Multi-core parallelism is less interesting,
since that is already easy to do, for instance using OpenMP.
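
To make the scenario concrete, here is a minimal scalar sketch of the kind of loop in question: the Student t cdf evaluated over an array, with the multi-core part handled by a single OpenMP pragma. It is not the Boost.Math implementation. It assumes a textbook continued-fraction evaluation of the regularized incomplete beta (modified Lentz method), and the names betacf, ibeta, student_t_cdf and student_t_cdf_array are hypothetical helpers made up for this illustration. No attempt is made at SSE vectorisation or at tail accuracy.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Continued fraction for the incomplete beta function, evaluated with the
// modified Lentz method. Hypothetical helper, not a Boost.Math function.
double betacf(double a, double b, double x) {
    const int max_iter = 500;
    const double tol = 1e-15;
    const double tiny = 1e-300;
    const double qab = a + b, qap = a + 1.0, qam = a - 1.0;
    double c = 1.0;
    double d = 1.0 - qab * x / qap;
    if (std::fabs(d) < tiny) d = tiny;
    d = 1.0 / d;
    double h = d;
    for (int m = 1; m <= max_iter; ++m) {
        const double m2 = 2.0 * m;
        // Even step of the recurrence.
        double aa = m * (b - m) * x / ((qam + m2) * (a + m2));
        d = 1.0 + aa * d; if (std::fabs(d) < tiny) d = tiny;
        c = 1.0 + aa / c; if (std::fabs(c) < tiny) c = tiny;
        d = 1.0 / d;
        h *= d * c;
        // Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2));
        d = 1.0 + aa * d; if (std::fabs(d) < tiny) d = tiny;
        c = 1.0 + aa / c; if (std::fabs(c) < tiny) c = tiny;
        d = 1.0 / d;
        const double del = d * c;
        h *= del;
        if (std::fabs(del - 1.0) <= tol) break;
    }
    return h;
}

// Regularized incomplete beta I_x(a, b); ln_beta is log B(a, b), passed in
// so that it is computed only once per array.
double ibeta(double a, double b, double x, double ln_beta) {
    if (x <= 0.0) return 0.0;
    if (x >= 1.0) return 1.0;
    const double front =
        std::exp(a * std::log(x) + b * std::log1p(-x) - ln_beta);
    // Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the continued
    // fraction converges faster.
    if (x < (a + 1.0) / (a + b + 2.0))
        return front * betacf(a, b, x) / a;
    return 1.0 - front * betacf(b, a, 1.0 - x) / b;
}

// Student t cdf with nu degrees of freedom, using
// P(|T| > |t|) = I_{nu / (nu + t^2)}(nu / 2, 1 / 2).
double student_t_cdf(double t, double nu, double ln_beta) {
    const double x = nu / (nu + t * t);
    const double tail = 0.5 * ibeta(0.5 * nu, 0.5, x, ln_beta);
    return t > 0.0 ? 1.0 - tail : tail;
}

// Evaluate the cdf for every element of ts. The pragma is only active when
// the compiler is run with OpenMP enabled.
std::vector<double> student_t_cdf_array(const std::vector<double>& ts,
                                        double nu) {
    assert(nu > 0.0);
    const double a = 0.5 * nu, b = 0.5;
    const double ln_beta =
        std::lgamma(a) + std::lgamma(b) - std::lgamma(a + b);
    std::vector<double> out(ts.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(ts.size());
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (std::ptrdiff_t i = 0; i < n; ++i)
        out[i] = student_t_cdf(ts[i], nu, ln_beta);
    return out;
}
```

The per-element work is dominated by the data-dependent loop inside betacf, which is the part that does not map onto SSE in any obvious way. With Boost.Math itself the loop body would instead be a call such as boost::math::cdf(boost::math::students_t(nu), t).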
Then there are the issues brought up by Stephen Nuchia:
http://article.gmane.org/gmane.comp.lib.boost.devel/173840
--Johan Råde