Subject: [ublas] Matrix multiplication performance
From: palik imre (imre_palik_at_[hidden])
Date: 2016-01-23 12:53:18
Hi All,
What's next? I mean, what is the development process for ublas?
Now we have a C-like implementation that outperforms both the mainline and the branch version (axpy_prod). What should we do with it?
As far as I see we have the following options:
1) Create a C++ template magic implementation out of it. But for this, we would at least need compile-time access to the target instruction set. Any idea how to do that?
2) Create a compiled library implementation out of it, and choose the implementation at run time based on the CPU capabilities.
3) Include some good defaults/defines, and hope the user will use them.
4) Don't include it, and do something completely different.
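To make options 1 and 2 a bit more concrete, here is a rough sketch, not a proposal for the actual ublas interface. It assumes GCC or Clang on x86: those compilers predefine __SSE2__, __AVX__ and __AVX2__ according to the -m/-march flags (compile-time access for option 1), and provide __builtin_cpu_supports for a run-time query (option 2). All names here (simd_level, compile_time_level, run_time_level, gemm_blocked, select_gemm) are hypothetical, and the block sizes are placeholders rather than tuned values. The kernel is a plain blocked loop standing in for the real one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical tag for the instruction set we want to target.
enum class simd_level { scalar, sse2, avx, avx2 };

// Option 1: compile-time detection through compiler-predefined macros.
// The result depends on the flags the *user* compiles with (e.g. -mavx2).
constexpr simd_level compile_time_level() {
#if defined(__AVX2__)
    return simd_level::avx2;
#elif defined(__AVX__)
    return simd_level::avx;
#elif defined(__SSE2__)
    return simd_level::sse2;
#else
    return simd_level::scalar;
#endif
}

// Option 2: run-time detection of what the CPU actually supports.
// Assumes the GCC/Clang x86 builtin; otherwise falls back to option 1.
inline simd_level run_time_level() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx2")) return simd_level::avx2;
    if (__builtin_cpu_supports("avx"))  return simd_level::avx;
    if (__builtin_cpu_supports("sse2")) return simd_level::sse2;
    return simd_level::scalar;
#else
    return compile_time_level();
#endif
}

// Placeholder kernel: C += A * B for row-major n x n matrices,
// blocked by a compile-time block size. No intrinsics are used here.
template <std::size_t Block>
void gemm_blocked(std::size_t n, const double* a, const double* b, double* c) {
    assert(a && b && c);
    for (std::size_t ii = 0; ii < n; ii += Block)
        for (std::size_t kk = 0; kk < n; kk += Block)
            for (std::size_t jj = 0; jj < n; jj += Block)
                for (std::size_t i = ii; i < std::min(ii + Block, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + Block, n); ++k)
                        for (std::size_t j = jj; j < std::min(jj + Block, n); ++j)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
}

using gemm_fn = void (*)(std::size_t, const double*, const double*, double*);

// Map a detected level to a kernel instantiation (block sizes are guesses).
inline gemm_fn select_gemm(simd_level level) {
    switch (level) {
        case simd_level::avx2: return &gemm_blocked<8>;
        case simd_level::avx:  return &gemm_blocked<8>;
        case simd_level::sse2: return &gemm_blocked<4>;
        default:               return &gemm_blocked<2>;
    }
}
```

With this, select_gemm(compile_time_level()) would correspond to the header-only route of option 1, and select_gemm(run_time_level()) to the dispatch of option 2. For the latter to pay off, each kernel would have to be compiled in its own translation unit with the matching -m flags, which is what pushes option 2 towards a compiled library.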
What do you think?
Cheers,
Imre