Boost :
From: Stephen Nuchia (snuchia_at_[hidden])
Date: 2008-04-14 09:15:38
> The task of evaluating the beta function for each value
> in a container is of course parallelizable. But doing
> that requires parallelized STL algorithms, and has nothing
> to do with the Boost.Math library.
Doing vectorized math fast often requires that the per-value calculation
be explicitly coded to help the compiler with software pipelining, or
even be manually pipelined. That issue is indeed orthogonal to
multithreading, which requires only reentrancy on the part of the
function implementations.
On processors with deep pipelines, multiple execution units, and
plentiful registers, the difference between a monolithic loop body and one
that can be software pipelined can easily be 2:1.
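To make the distinction concrete, here is a minimal sketch of the two loop
shapes, assuming std::vector<double> inputs; the lgamma-based kernel is a
crude stand-in for a real inlinable per-value implementation, not anything
from Boost.Math itself.

#include <cmath>
#include <cstddef>
#include <vector>
#include <boost/math/special_functions/beta.hpp>

// Black-box form: each iteration makes an opaque library call, so the
// compiler cannot overlap independent iterations across the call boundary.
void eval_blackbox(const std::vector<double>& a,
                   const std::vector<double>& b,
                   std::vector<double>& out)
{
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = boost::math::beta(a[i], b[i]);
}

// Exposed-kernel form: an inlinable, branch-light per-value body lets the
// optimizer software-pipeline or vectorize the loop. This kernel is only a
// placeholder; a real one would reuse the library's own algorithms.
inline double beta_kernel(double x, double y)
{
    return std::exp(std::lgamma(x) + std::lgamma(y) - std::lgamma(x + y));
}

void eval_pipelined(const std::vector<double>& a,
                    const std::vector<double>& b,
                    std::vector<double>& out)
{
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = beta_kernel(a[i], b[i]);
}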
I'm not contradicting the point quoted, just saying that if you want the
container-oriented interfaces to the math functions to be fast, it will
require cooperation from the function implementations on most compilers.
Parallelizing the calls to black-box functions will exploit multiple
cores at low cost, but you'll be leaving a lot of performance on the
table.
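For what it's worth, that low-cost multicore version can be as little as one
pragma. This sketch assumes an OpenMP-capable compiler and relies only on
boost::math::beta being reentrant:

#include <vector>
#include <boost/math/special_functions/beta.hpp>

// Split the independent black-box calls across cores; scheduling and
// chunking are left to the OpenMP defaults.
void eval_parallel(const std::vector<double>& a,
                   const std::vector<double>& b,
                   std::vector<double>& out)
{
#pragma omp parallel for
    for (long i = 0; i < static_cast<long>(out.size()); ++i)
        out[i] = boost::math::beta(a[i], b[i]);
}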