Subject: Re: [ublas] Matrix multiplication performance
From: Michael Lehn (michael.lehn_at_[hidden])
Date: 2016-01-24 10:51:20
On 24 Jan 2016, at 16:32, Michael Lehn <michael.lehn_at_[hidden]> wrote:
> On 24 Jan 2016, at 16:17, Oswin Krause <Oswin.Krause_at_[hidden]> wrote:
>
>> Hi,
>>
>> The obvious solution to this is to link to the C bindings (cBLAS, LAPACKE). While BLAS is truly a PITA, most systems have quite sane cBLAS bindings (the notable exception is OpenBLAS...), and often it is enough to check whether the system has some libcBLAS.so somewhere in the path.
>>
>> The big issue with a header-only BLAS is the long, long, long compile times.
>
> Two solutions to this:
>
> - precompiled headers
> - from a header-only C++ BLAS library it is trivial to create a compiled library for the common cases where all elements have the same type.
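A minimal sketch of the second point, assuming a hypothetical header-only `gemm` template (the name, signature and row-major layout are made up for illustration and are not taken from any existing library). The explicit instantiations at the bottom are what would go into a separately compiled source file:

```cpp
#include <cassert>
#include <cstddef>

// --- would live in a hypothetical header "gemm.hpp" ---
// Naive reference kernel: C += A*B, all matrices row-major.
// A is m x k, B is k x n, C is m x n.
template <typename T>
void gemm(std::size_t m, std::size_t n, std::size_t k,
          const T* A, const T* B, T* C)
{
    assert(A && B && C);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t l = 0; l < k; ++l)
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += A[i * k + l] * B[l * n + j];
}

// --- would live in a hypothetical source file "gemm.cpp" ---
// Explicit instantiation definitions: the compiler generates code for
// these element types once, here, and nowhere else.
template void gemm<float>(std::size_t, std::size_t, std::size_t,
                          const float*, const float*, float*);
template void gemm<double>(std::size_t, std::size_t, std::size_t,
                           const double*, const double*, double*);
```

Client code could then declare `extern template void gemm<double>(...)` (C++11) so that it links against the compiled objects instead of instantiating the template again. Mixed element types would still go through the header.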
Also, the compile time depends on the optimization level. Maybe somebody else could try this: when I compile my demos with -O1, they achieve
the same performance. So this would be something new in the C++ world: reduce the optimization level :-)
I never tried it, but it seems that at least with gcc you can set this per file, or even per function:
https://gcc.gnu.org/wiki/FunctionSpecificOpt
It also seems that only the two pack-functions need some optimization. So the rest can even go lower.
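An untested sketch of how that could look with gcc's `optimize` function attribute. The macro `OPT_LEVEL` and the two functions are hypothetical, not code from my demos, and which level each function really needs would have to be measured. Note that the gcc documentation recommends this attribute for debugging rather than production code, so a per-file `-O` flag in the build system may be the safer route.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper macro: expands to gcc's optimize attribute and
// to nothing on other compilers (clang does not implement it).
#if defined(__GNUC__) && !defined(__clang__)
#  define OPT_LEVEL(x) __attribute__((optimize(x)))
#else
#  define OPT_LEVEL(x)
#endif

// Hypothetical pack function: copies an m x k block of a column-major
// matrix A (leading dimension lda) into a contiguous buffer.
// Requested at a higher level than the rest of the file.
OPT_LEVEL("O3")
void pack_block(std::size_t m, std::size_t k,
                const double* A, std::size_t lda, double* buffer)
{
    assert(lda >= m);
    for (std::size_t j = 0; j < k; ++j)
        for (std::size_t i = 0; i < m; ++i)
            buffer[j * m + i] = A[j * lda + i];
}

// Hypothetical non-critical helper, requested at a lower level.
OPT_LEVEL("O1")
double sum_buffer(std::size_t n, const double* buffer)
{
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        s += buffer[i];
    return s;
}
```

The same effect for a whole region of a file should be possible with `#pragma GCC push_options`, `#pragma GCC optimize ("O1")` and `#pragma GCC pop_options`.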