From: Stephen Nuchia (snuchia_at_[hidden])
Date: 2008-04-02 13:21:43
In my experience, there are very few instances where parallelism can be
usefully concealed behind a library interface. OpenMP has very high
overhead and will help only for very long-running functions -- at least
under Microsoft's compiler on x86 and x64 platforms. Algorithms that
are likely to be applied to very large datasets could have the OMP
pragmas inserted optionally, but they would need to be protected by
#ifdef logic (or given distinct names), because otherwise the overhead
will cripple programs that make more frequent calls on smaller datasets.
In VS2005, a parallel for construct whose if() clause is a literal 0
still incurs all the overhead of an enabled one; you can't use the
if() clause in the OMP syntax to do effective data-size algorithm
selection without replicating the code.
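To make the point concrete, here is a minimal sketch of what that
replication looks like in practice; the function name, the USE_OMP
guard macro, and the threshold constant are all illustrative
assumptions, not anything from an actual Boost library:

```cpp
// Hypothetical sketch: choosing a serial or parallel path by data
// size. The loop body is replicated because, per the observation
// above, the OpenMP if() clause alone does not remove the
// parallel-region setup overhead under VS2005.
#include <cstddef>
#include <vector>
#ifdef USE_OMP
#include <omp.h>
#endif

// Tuning assumption: below this size, OpenMP overhead dominates.
static const std::size_t kParallelThreshold = 100000;

void scale(std::vector<double>& v, double factor)
{
#ifdef USE_OMP
    if (v.size() >= kParallelThreshold) {
        // Parallel path: worthwhile only for large inputs.
        #pragma omp parallel for
        for (long i = 0; i < static_cast<long>(v.size()); ++i)
            v[i] *= factor;
        return;
    }
#endif
    // Serial path: no OpenMP setup cost for small inputs.
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] *= factor;
}
```

Compiled without USE_OMP the function is plain serial code, so callers
on small datasets pay nothing for the optional parallelism.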
Identifying interfaces that can be usefully altered to let
applications exploit parallelism more readily may be harder, but it is
a lot more likely to pay off. Also consider that, in an application
that is already parallelized, there are no extra cores for the library
to use.
On a smaller scale, adding "vectorized" and/or "streaming
producer-consumer" interfaces for selected libraries may help a lot by
encouraging use of vector instruction/execution units, enabling loop
unrolling, and improving instruction and data locality.
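A tiny sketch of what such a vectorized interface means; both function
names are hypothetical, invented here for illustration:

```cpp
// Per-element interface vs. batch ("vectorized") interface. The batch
// version hands the compiler a tight loop over contiguous storage,
// which is what auto-vectorizers and unrollers want to see, and it
// amortizes the call overhead across the whole buffer.
#include <cstddef>

// One call per value: call overhead per element, little for the
// optimizer to work with.
double saturate_one(double x)
{
    return x > 1.0 ? 1.0 : (x < 0.0 ? 0.0 : x);
}

// One call per buffer: the simple inner loop can be unrolled and
// mapped onto vector units, with good instruction and data locality.
void saturate_many(const double* in, double* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        double x = in[i];
        out[i] = x > 1.0 ? 1.0 : (x < 0.0 ? 0.0 : x);
    }
}
```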
I must also repeat the age-old, time-tested, capital-T Truth about
optimization: if you do something that is not suggested by, and
validated against, careful analysis of realistic use cases, you are
wasting your time. I strongly advise you not to start hacking without
solid data.
Gathering real-world use cases into a "library" of performance-oriented
application code and datasets would be, in my opinion, a pretty good
summer's work. Produce a final report that others can later mine for
performance improvement opportunities. Wrap the use-case library up as
a performance regression test suite, and plumb it into the Boost
automated testing infrastructure.
Another good use for a library of smaller use cases is as fodder for
the "profile guided optimization" (PGO) offered by many modern
compilers. If you still have time left over, you could add support for
PGO to Boost.Build. For some chips, notably Itanium, PGO makes a very
noticeable difference.
Hard to see how that would help with header-only libs though, except by
giving the application programmers an example to go by.
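For reference, a typical PGO cycle with GCC looks roughly like the
following; the source file, binary name, and input file are
placeholders, and MSVC of that era has an analogous flow via /GL with
/LTCG:PGINSTRUMENT and /LTCG:PGOPTIMIZE:

```shell
# 1. Instrumented build: the compiler inserts profiling counters.
g++ -O2 -fprofile-generate use_case.cpp -o use_case

# 2. Run a realistic workload so the counters capture real behavior.
./use_case < representative_input.dat

# 3. Rebuild using the collected profile to guide optimization.
g++ -O2 -fprofile-use use_case.cpp -o use_case
```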
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk