Hello there.
I am implementing matrix multiplication and trying to make it run as fast as possible.
But then I noticed something strange.
Using Xcode,
resultmatrix = prod(matrix1, matrix2);
is 10 to 16 times slower (with 100x100 matrices) than simply doing something like this:
for (...)
    for (...)
        for (...)
            multiply_things();
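For reference, here is a minimal self-contained sketch of that naive triple loop. It assumes square n x n matrices stored row-major in a flat std::vector<double> rather than uBLAS matrices, and the function name naive_multiply is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: multiplies two n x n matrices stored row-major in
// flat std::vector<double> buffers, using the plain i-j-k triple loop.
std::vector<double> naive_multiply(const std::vector<double>& a,
                                   const std::vector<double>& b,
                                   std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;  // dot product of row i of a and column j of b
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    return c;
}
```

Accumulating into a local sum avoids repeated writes to c inside the inner loop; the k-loop still walks b column-wise (stride n), which is the cache-unfriendly part of this loop order.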
It is also 20 to 40 times slower than a simple threaded approach.
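The threaded version isn't shown, so the following is only an assumed example of what a "simple threaded approach" could look like: the rows of the result are divided into contiguous bands and each band is computed on its own std::thread. The row-major std::vector<double> layout and the name threaded_multiply are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: row-major n x n product where the result rows are
// split into bands, one band per worker thread.
std::vector<double> threaded_multiply(const std::vector<double>& a,
                                      const std::vector<double>& b,
                                      std::size_t n,
                                      unsigned num_threads) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    if (num_threads == 0) num_threads = 1;
    const std::size_t band = (n + num_threads - 1) / num_threads;  // rows per thread, rounded up

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * band;
        const std::size_t end = std::min(n, begin + band);
        if (begin >= end) break;  // more threads than rows
        // Each thread writes only to its own rows of c, so no locking is needed.
        workers.emplace_back([&a, &b, &c, n, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                for (std::size_t j = 0; j < n; ++j) {
                    double sum = 0.0;
                    for (std::size_t k = 0; k < n; ++k)
                        sum += a[i * n + k] * b[k * n + j];
                    c[i * n + j] = sum;
                }
        });
    }
    for (auto& w : workers) w.join();  // wait for every band before returning
    return c;
}
```

Because each thread owns a disjoint set of result rows, no mutex is needed. On some toolchains std::thread also requires linking with -pthread.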
Using Visual Studio, on the other hand, prod() is 2 times faster than the naive multiplication.
This is on a release build, with
#define BOOST_UBLAS_NDEBUG 1
#define NDEBUG 1
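Timings like these depend a lot on how they are taken. A minimal sketch of one way to time each variant, assuming a hypothetical helper average_ms that repeats a call and reports the mean time per call in milliseconds:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: calls f() `reps` times and returns the average
// wall-clock time per call in milliseconds, measured with the monotonic
// std::chrono::steady_clock. Repeating matters for a 100x100 product,
// where a single call can be too short to time reliably.
template <typename F>
double average_ms(F&& f, int reps) {
    assert(reps > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        f();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / reps;
}
```

For example, average_ms([&] { resultmatrix = prod(matrix1, matrix2); }, 1000) would time the uBLAS call. Whatever is measured, something should use the result afterwards (printing one element is enough) so the optimizer cannot discard the work.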
So what gives?