In my algorithms I have to read/write from/to each individual element ofа a matrix, and this is making my application really slow. More specifically, I'm assembling a stiffness matrix in Finite Element Method. The code is like this:

ааа for(int i=0; i<m_tetrahedrons.size(); ++i)
ааа {
ааа ааа btTetrahedron* t = m_tetrahedrons[i];
ааа ааа t->computeCorotatedStiffness();

ааа ааа for(unsigned int j=0; j<4; ++j)
ааа ааа ааа for(unsigned int k=0; k<4; ++k)
ааа ааа ааа {
ааа ааа ааа ааа unsigned int jj = t->getNodeIndex(j);
ааа ааа ааа ааа unsigned int kk = t->getNodeIndex(k);

ааа ааа ааа ааа for(unsigned int r=0; r<3; ++r)
ааа ааа ааа ааа ааа for(unsigned int s=0; s<3; ++s)
ааа ааа ааа ааа ааа {
ааа ааа ааа ааа ааа ааа m_RKR_1(3*jj+r, 3*kk+s) += t->getCorotatedStiffness0(3*j+r, 3*k+s);
ааа ааа ааа ааа ааа ааа m_RK(3*jj+r, 3*kk+s) += t->getCorotatedStiffness1(3*j+r, 3*k+s);
ааа ааа ааа ааа ааа }
ааа ааа ааа }
ааа }

Where m_RKR_1 and m_RK are both compressed_matrix<float>, t->getCorotatedStiffness0/1 just returns the (i,j) element of a 12x12 compressed_matrix<float>. If I don't compute the co-rotated matrices the simulation still works but incorrectly (linear strain only), but very fast (basically, by commenting the element wise operations in that code). Whenever I turn the co-rotational stuff on, it gets damn slow, and those element wise operations are guilty.

What am I doing wrong? Is there any faster technique to do that? Well, there must be one...