From: Jack Nguyen (bluekite2000_at_[hidden])
Date: 2005-06-07 16:14:01
I have a 1.7 GHz Intel Xeon with a 3-level cache and compile with g++ at
-O4 with -DNDEBUG. My matrices are 1000x1000 squares filled with random
complex float values. A and C are row-major; B is column-major.
uBLAS takes around 17 s for the following matrix multiplication:
- noalias(C)=block_prod< matrix<std::complex<float>, row_major >, 64> (A, B);
A C implementation that calls BLAS level 3 takes around 1.74 s.
What is going on?
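
To help separate the cost of the storage layouts from the cost of uBLAS itself, a hand-written blocked product over the same layouts could serve as a rough baseline. The following is only a minimal sketch under stated assumptions: flat std::vector storage, square n x n matrices, and the helper names blocked_prod and time_blocked_prod, which are hypothetical and not part of uBLAS or BLAS. The tile size of 64 simply mirrors the block_prod call above.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <complex>
#include <cstddef>
#include <random>
#include <vector>

using cf = std::complex<float>;

// Hypothetical helper (not uBLAS code): C = A * B for square n x n matrices.
// A and C are row-major, B is column-major, all stored in flat vectors.
// The product is computed tile by tile, with tiles of at most bs x bs.
void blocked_prod(const std::vector<cf>& A, const std::vector<cf>& B,
                  std::vector<cf>& C, std::size_t n, std::size_t bs = 64) {
    assert(bs > 0);
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    std::fill(C.begin(), C.end(), cf(0.0f, 0.0f));
    for (std::size_t ii = 0; ii < n; ii += bs) {
        const std::size_t i_end = std::min(ii + bs, n);
        for (std::size_t jj = 0; jj < n; jj += bs) {
            const std::size_t j_end = std::min(jj + bs, n);
            for (std::size_t kk = 0; kk < n; kk += bs) {
                const std::size_t k_end = std::min(kk + bs, n);
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t j = jj; j < j_end; ++j) {
                        cf sum = C[i * n + j];
                        // A(i,k) is at i*n + k (row-major);
                        // B(k,j) is at j*n + k (column-major),
                        // so both operands are read contiguously in k.
                        for (std::size_t k = kk; k < k_end; ++k)
                            sum += A[i * n + k] * B[j * n + k];
                        C[i * n + j] = sum;
                    }
                }
            }
        }
    }
}

// Hypothetical timing harness: fills A and B with random complex floats
// and returns the wall-clock seconds spent in blocked_prod.
double time_blocked_prod(std::size_t n, std::size_t bs = 64) {
    std::mt19937 gen(42);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    std::vector<cf> A(n * n), B(n * n), C(n * n);
    for (auto& x : A) x = cf(dist(gen), dist(gen));
    for (auto& x : B) x = cf(dist(gen), dist(gen));
    const auto t0 = std::chrono::steady_clock::now();
    blocked_prod(A, B, C, n, bs);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Calling time_blocked_prod(1000) with the same compiler flags would give a number to set against the two timings above. It is not expected to match a tuned BLAS, only to show how far plain loops get with these layouts.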