Ublas :

Subject: Re: [ublas] Matrix multiplication performance
From: Michael Lehn (michael.lehn_at_[hidden])
Date: 2016-01-24 10:51:20


On 24 Jan 2016, at 16:32, Michael Lehn <michael.lehn_at_[hidden]> wrote:

> On 24 Jan 2016, at 16:17, Oswin Krause <Oswin.Krause_at_[hidden]> wrote:
>
>> Hi,
>>
>> The obvious solution to this is to link to the C bindings (cBLAS, LAPACKE). While BLAS is truly a PITA, most systems have quite sane cBLAS bindings (a notable exception is OpenBLAS...), and often it is enough to check whether the system has some libcBLAS.so somewhere in the path.
>>
>> The big issue with a header-only BLAS is the long, long, long compile times.
>
> Two solutions to this:
>
> - precompiled headers
> - from a header-only C++ BLAS library it is trivial to create a compiled library for the common cases where all elements have the same type.
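
To illustrate the second point, here is a minimal sketch using explicit template instantiation. The gemm template, its signature and the file names in the comments are hypothetical stand-ins, not the interface of any existing library; the naive loop only takes the place of a real kernel.

```cpp
#include <cstddef>

// --- hypothetical header part (say, gemm.hpp): generic and header-only ---
// Computes C += A*B for row-major A (m x k), B (k x n), C (m x n).
template <typename T>
void gemm(std::size_t m, std::size_t n, std::size_t k,
          const T *A, const T *B, T *C)
{
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t l = 0; l < k; ++l)
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += A[i * k + l] * B[l * n + j];
}

// Explicit instantiation declarations: users of the header do not
// instantiate these common cases again in every translation unit.
extern template void gemm<float>(std::size_t, std::size_t, std::size_t,
                                 const float *, const float *, float *);
extern template void gemm<double>(std::size_t, std::size_t, std::size_t,
                                  const double *, const double *, double *);

// --- hypothetical source part (say, gemm.cpp): compiled once into a library ---
template void gemm<float>(std::size_t, std::size_t, std::size_t,
                          const float *, const float *, float *);
template void gemm<double>(std::size_t, std::size_t, std::size_t,
                           const double *, const double *, double *);
```

Mixed-type calls would still go through the template as usual; only the same-type cases come out of the compiled library.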

Also, the compile time depends on the optimization level. Maybe somebody else could try this: when I compile my demos with "-O1", they achieve
the same performance. So this would be something new in the C++ world: reduce the optimization level :-)

I have never tried it, but it seems that at least with gcc you can set this at the file level:

        https://gcc.gnu.org/wiki/FunctionSpecificOpt

It also seems that only the two pack functions need some optimization, so the rest can be compiled at an even lower level.
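
As an untested sketch of what that could look like: gcc has an optimize function attribute (and "#pragma GCC optimize" with push_options/pop_options for whole regions of a file). The pack_block routine below is a hypothetical stand-in, not one of the actual pack functions from my demos, and the "O3" level is only an assumption about what they would need. Note that the gcc documentation recommends the optimize attribute for debugging rather than production code, so the per-file compiler flags may be the safer route.

```cpp
#include <cassert>
#include <cstddef>

// The attribute is gcc-specific; with other compilers the macro expands
// to nothing and the function uses the command-line optimization level.
#if defined(__GNUC__) && !defined(__clang__)
#  define PACK_OPT __attribute__((optimize("O3")))
#else
#  define PACK_OPT
#endif

// Hypothetical pack routine: copies an mc x kc block of a column-major
// matrix A (leading dimension lda) into a contiguous row-major buffer.
PACK_OPT
void pack_block(std::size_t mc, std::size_t kc,
                const double *A, std::size_t lda,
                double *buffer)
{
    assert(lda >= mc);  // a column must be at least as long as the block
    for (std::size_t i = 0; i < mc; ++i)
        for (std::size_t l = 0; l < kc; ++l)
            buffer[i * kc + l] = A[i + l * lda];
}
```

The file itself would then be compiled with something low such as -O1, and only the functions marked with PACK_OPT would get the higher level.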