Hello there.

I am implementing matrix multiplication and am trying to make it run as fast as possible.
But then I noticed something strange.
Using Xcode, the Boost uBLAS call

resultmatrix = prod(matrix1,matrix2);

is 10 to 16 times slower (with 100x100 matrices) than simply doing something like this:


for(...)
    for(...)
        for(...)
            multiply_things();
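
To make the comparison concrete, the naive version is along these lines. This is only a minimal sketch, assuming square matrices stored row-major in a flat std::vector<double>; naive_multiply and its parameters are hypothetical names used for illustration, not the exact code being timed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): multiplies two n x n matrices
// stored row-major in flat vectors and returns the n x n product.
std::vector<double> naive_multiply(const std::vector<double>& a,
                                   const std::vector<double>& b,
                                   std::size_t n)
{
    // Both inputs are assumed to hold exactly n * n elements.
    assert(a.size() == n * n && b.size() == n * n);

    std::vector<double> c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)          // row of a
        for (std::size_t j = 0; j < n; ++j)      // column of b
            for (std::size_t k = 0; k < n; ++k)  // inner dimension
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}
```

Calling naive_multiply(a, b, 100) on two 100x100 inputs would correspond to the matrix size used in the timings here.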

The prod call is also 20 to 40 times slower than a simple threaded approach.


Using Visual Studio, the same prod call is 2 times faster than the naive multiplication.


This is on a release build, with:
#define BOOST_UBLAS_NDEBUG 1
#define NDEBUG 1

So what gives?