On Wed, Apr 21, 2010 at 6:55 PM, xiss burg <xissburg@gmail.com> wrote:
Sunil,

I tried mapped_matrix, and it performed as badly as compressed_matrix. I can't understand why, because everyone says it is faster, and I saw that myself in a simple app I wrote, like the sparse-fill samples. I can't see what is wrong with my code. People recommended using a generalized_vector_of_vector to build the global stiffness matrix and then copying it to a compressed_matrix (which is a good choice for the multiplications later); I ended up with something like this: http://codepad.org/1Jwx1Lgb.

Just to explain that code better: t->getGlobalIndex(j) and t->getCorotatedStiffness0(j, k) are just regular getters; nothing special happens inside them. t->computeCorotatedStiffness() is not computationally expensive either: it performs one Gram-Schmidt orthonormalization on a 3x3 matrix and then 32 3x3 matrix multiplies (not using ublas matrices there). No matter which matrix type I use for RKR_1 and RK, I get the same poor performance. One piece of evidence that those lines are the bottleneck: if I comment out the two lines where I perform the element-wise sum, one of my samples runs 15-20 times faster (in this specific sample, m_tetrahedrons.size() == 617). So I'm really lost here; I wouldn't like to throw away all my work with ublas because of this. There must be a solution to this problem.
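To be concrete about that step: the orthonormalization is just a plain modified Gram-Schmidt on the columns of a 3x3 array. A hypothetical standalone version (the real one lives inside computeCorotatedStiffness()) would be:

```cpp
#include <cmath>

// Modified Gram-Schmidt: orthonormalize the columns of a 3x3 matrix in place.
void orthonormalize(double m[3][3]) {
    for (int j = 0; j < 3; ++j) {
        // Subtract projections onto the previously orthonormalized columns.
        for (int k = 0; k < j; ++k) {
            double dot = 0.0;
            for (int i = 0; i < 3; ++i) dot += m[i][j] * m[i][k];
            for (int i = 0; i < 3; ++i) m[i][j] -= dot * m[i][k];
        }
        // Normalize the current column.
        double norm = 0.0;
        for (int i = 0; i < 3; ++i) norm += m[i][j] * m[i][j];
        norm = std::sqrt(norm);
        for (int i = 0; i < 3; ++i) m[i][j] /= norm;
    }
}
```

That is a handful of dot products and square roots per tetrahedron, so it really shouldn't dominate the run time.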


Xiss,

Are you saying that it's slow even if you use a plain matrix<double>? Are you compiling with a high enough optimization level to make sure inlineable code actually gets inlined? Do you have NDEBUG defined? If yes to all of the above, then the slowness is not in ublas. Perhaps you should profile your application and see where the real bottleneck is, or come up with a simple, self-contained example that we can run and test too.
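Concretely, the checks above come down to compiler flags; with g++ that would be something like this (file names are just placeholders):

```shell
# -O2 (or -O3) lets uBLAS's expression templates actually get inlined;
# -DNDEBUG turns off uBLAS's debug bounds/consistency checks, which
# can dominate the run time of a non-optimized build.
g++ -O2 -DNDEBUG -o myapp myapp.cpp
```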

-Vardan