On 24 Jan 2016, at 16:32, Michael Lehn <michael.lehn@uni-ulm.de> wrote:

On 24 Jan 2016, at 16:17, Oswin Krause <Oswin.Krause@ruhr-uni-bochum.de> wrote:

Hi,

The obvious solution to this is to link to the C bindings (CBLAS, LAPACKE). While BLAS is truly a PITA, most systems have quite sane CBLAS bindings (a notable exception is OpenBLAS...), and often it is enough to check whether the system has some libcblas.so somewhere in the library path.

The big issue with a header-only BLAS is the long, long, long compile times.

Two solutions to this:

- precompiled headers
- from a header-only C++ BLAS library it is trivial to create a compiled library for the common cases where all elements have the same type.
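
The second option can be sketched with explicit template instantiation. This is only a minimal illustration, assuming a hypothetical `hypothetical_blas::gemm` template with a naive reference kernel; neither the namespace, the signature, nor the kernel is taken from any library discussed in this thread. The header declares the same-type instantiations as `extern template`, and a single source file defines them, so user code that only needs `float` or `double` does not instantiate the template again.

```cpp
#include <cassert>
#include <cstddef>

namespace hypothetical_blas {

// --- Header-only part (would live in a header such as gemm.hpp) ------------
// Naive reference kernel: C = beta*C + alpha*A*B, all matrices row-major.
// A is m x k, B is k x n, C is m x n. Names and layout are assumptions.
template <typename TA, typename TB, typename TC>
void gemm(std::size_t m, std::size_t n, std::size_t k,
          TC alpha, const TA* A, const TB* B, TC beta, TC* C)
{
    assert(A != nullptr && B != nullptr && C != nullptr);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            TC acc = TC(0);
            for (std::size_t l = 0; l < k; ++l) {
                acc += A[i * k + l] * B[l * n + j];
            }
            C[i * n + j] = beta * C[i * n + j] + alpha * acc;
        }
    }
}

// Still in the header: tell the compiler that the same-type cases are
// provided by a compiled library, so they are not instantiated per user file.
extern template void gemm<float, float, float>(
    std::size_t, std::size_t, std::size_t,
    float, const float*, const float*, float, float*);
extern template void gemm<double, double, double>(
    std::size_t, std::size_t, std::size_t,
    double, const double*, const double*, double, double*);

// --- Compiled part (would live in a source file such as gemm.cpp) ----------
// Explicit instantiation definitions: the kernel is compiled exactly once.
template void gemm<float, float, float>(
    std::size_t, std::size_t, std::size_t,
    float, const float*, const float*, float, float*);
template void gemm<double, double, double>(
    std::size_t, std::size_t, std::size_t,
    double, const double*, const double*, double, double*);

} // namespace hypothetical_blas
```

Mixed-type calls (say `float` times `double`) still work through the template; they are simply instantiated in the calling file as before.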

Also, the compile time depends on the optimization level.  Maybe somebody else could try this: when I compile my demos with "-O1", they achieve
the same performance.  So this would be something new in the C++ world: reduce the optimization level :-)

I have never tried it, but it seems that at least with gcc you can set this per file, or even per function:

https://gcc.gnu.org/wiki/FunctionSpecificOpt

It also seems that only the two pack functions need some optimization, so the rest can go even lower.
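
As an untested sketch of that idea: GCC's `optimize` function attribute can raise the optimization level for a single function while the rest of the file is built with a lower `-O` flag. The `pack_A` function below is a hypothetical stand-in for one of the pack functions (its name, signature and column-major layout are assumptions), and the macro `HYP_HOT` is made up for this example. Whether this actually preserves the performance would have to be measured, and the GCC manual itself describes the attribute as intended for debugging rather than production use.

```cpp
#include <cassert>
#include <cstddef>

// Apply the attribute only on GCC; other compilers get an empty macro.
#if defined(__GNUC__) && !defined(__clang__)
#  define HYP_HOT __attribute__((optimize("O3")))
#else
#  define HYP_HOT
#endif

// Hypothetical pack function: copies an mc x kc block of a column-major
// matrix A (leading dimension lda) into a contiguous column-major buffer.
HYP_HOT
void pack_A(std::size_t mc, std::size_t kc,
            const double* A, std::size_t lda, double* buffer)
{
    assert(A != nullptr && buffer != nullptr);
    for (std::size_t j = 0; j < kc; ++j) {
        for (std::size_t i = 0; i < mc; ++i) {
            buffer[j * mc + i] = A[j * lda + i];
        }
    }
}
```

The file itself would then be compiled with something like `g++ -O1`, and only `pack_A` would be optimized at the higher level.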