Vardan,
Wow, what a fail :( Thanks for pointing that out. I'm getting correct timings now; they were wrong before.
Gunter,
I am not going to solve huge FE problems; it is a real-time application, and that's why I want the highest possible performance.
As usual, I can't get that code to compile when I include vector_of_vector.hpp, and I haven't found a solution yet. I always get:
E:\Boost\boost_1_42_0\boost/numeric/ublas/vector_of_vector.hpp(301):
error C2668: 'boost::numeric::ublas::ref' : ambiguous call to
overloaded function
Dense matrices perform better than any other type, but the problem with them is that they may take way too much memory. Also, during assembly the elements of the global stiffness matrix are accessed randomly because of the local-to-global mapping of each element's node indices, which may lead to cache issues, I guess. If I try it the opposite way, I will have to access the tetrahedra randomly too. Oh well... I've run out of ideas now =)
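To make the access pattern concrete, here is a minimal sketch of the scatter-add step in the dense case. It is not my actual code: the names DenseMatrix, ElementMatrix, ElementNodes and assemble_element are made up for illustration, and it assumes one degree of freedom per node and 4-node tetrahedra. It uses only the standard library, not uBLAS.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only (assumption: 1 DOF per node,
// 4-node tetrahedral elements).
using ElementMatrix = std::array<std::array<double, 4>, 4>; // local stiffness matrix
using ElementNodes  = std::array<std::size_t, 4>;           // local-to-global node indices

// Dense, row-major n x n global matrix, zero-initialized.
struct DenseMatrix {
    std::size_t n;
    std::vector<double> data;

    explicit DenseMatrix(std::size_t size) : n(size), data(size * size, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) { return data[i * n + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * n + j]; }
};

// Scatter-add one element matrix into the global matrix.
// The global row/column indices come from the element's node list, so
// consecutive writes can land far apart in memory.
inline void assemble_element(DenseMatrix& K,
                             const ElementMatrix& ke,
                             const ElementNodes& nodes) {
    for (std::size_t a = 0; a < 4; ++a) {
        const std::size_t row = nodes[a];
        assert(row < K.n);
        for (std::size_t b = 0; b < 4; ++b) {
            const std::size_t col = nodes[b];
            assert(col < K.n);
            K(row, col) += ke[a][b];
        }
    }
}
```

Each element touches 16 entries of K, spread over up to 4 different rows, so unless nodes that share an element are numbered close together the writes jump around the matrix.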
My original code runs slower than that simple sample code: it is 2x slower, even though I think the operations performed and the access patterns are pretty much the same (I wrote the sample code with that in mind). I don't know exactly why; perhaps cache issues? I couldn't find a profiler for Windows that can profile that. Anyway, a 2x speedup is not enough, I think :)