On Thu, Apr 22, 2010 at 2:36 PM, xiss burg <xissburg@gmail.com> wrote:
Vardan,

I tried matrix<float>. In that same sample, with matrix I get 48 ms/step whether or not I run the element-wise ops section of the code. With compressed_matrix I get 25 ms/step with the element-wise ops and 2 ms/step without them.


Are you sure you don't have the numbers swapped? matrix should be faster than compressed_matrix.
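A rough illustration of why dense access should win: in a dense matrix an element's location is plain index arithmetic, while compressed row storage has to search the row for the column and, if the element is not stored yet, insert it and shift everything after it. This is a simplified model assuming row-major compressed storage, not uBLAS's actual implementation; the names `CrsMatrix`, `DenseMatrix` and `at` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified compressed row storage (CRS) model. NOT uBLAS's code.
struct CrsMatrix {
    std::size_t rows, cols;
    std::vector<std::size_t> row_start;  // rows+1 offsets into col_index/values
    std::vector<std::size_t> col_index;  // column of each stored entry, sorted per row
    std::vector<float> values;           // stored values

    CrsMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), row_start(r + 1, 0) {}

    // Returns a reference to element (i, j), inserting a zero if it is absent.
    float& at(std::size_t i, std::size_t j) {
        assert(i < rows && j < cols);
        auto first = col_index.begin() + row_start[i];
        auto last = col_index.begin() + row_start[i + 1];
        // Binary search within row i: O(log k) for k stored entries in the row.
        auto it = std::lower_bound(first, last, j);
        std::size_t pos = static_cast<std::size_t>(it - col_index.begin());
        if (it == last || *it != j) {
            // New entry: every later entry shifts by one position (O(nnz)).
            col_index.insert(it, j);
            values.insert(values.begin() + pos, 0.0f);
            for (std::size_t k = i + 1; k <= rows; ++k) ++row_start[k];
        }
        return values[pos];
    }
};

// Dense row-major storage: O(1) index arithmetic per access.
struct DenseMatrix {
    std::size_t rows, cols;
    std::vector<float> data;
    DenseMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& at(std::size_t i, std::size_t j) {
        assert(i < rows && j < cols);
        return data[i * cols + j];
    }
};
```

In this model, repeated `+=` on an entry that already exists only pays for the search; the first write to each new position is what triggers the insertion and shift.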
In any case, a 2x performance penalty for compressed_matrix vs. matrix is not surprising. But the bulk of the hit is clearly coming from your code that calculates the values. If you're still in doubt, instead of simply commenting out these lines, change them to something like:
                       m_RKR_1(3*jj+r, 3*kk+s) += 1.0f;
                       m_RK(3*jj+r, 3*kk+s) += 1.0f;
and see how much difference that makes.
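One way to run that comparison with explicit timing, sketched under assumptions: the matrix is a flat row-major `std::vector<float>` rather than a uBLAS type (so it compiles without Boost), and `compute_value` and `time_accumulation` are hypothetical names; `compute_value` stands in for the real per-element computation, which is not shown in the thread.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the expensive per-element computation.
float compute_value(std::size_t i, std::size_t j) {
    return std::sin(static_cast<float>(i)) * std::cos(static_cast<float>(j));
}

// Runs one accumulation pass over an n x n row-major matrix stored in m and
// returns the elapsed wall-clock time in milliseconds. When use_constant is
// true, the computed value is replaced by 1.0f, as in the "+= 1.0f" experiment.
double time_accumulation(std::vector<float>& m, std::size_t n, bool use_constant) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            m[i * n + j] += use_constant ? 1.0f : compute_value(i, j);
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling it once with `use_constant = false` and once with `true` on the same data gives two timings whose difference roughly reflects the cost of the computation alone; with a uBLAS matrix the loop body would use `m(i, j) +=` instead of the flat index.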
BTW, why not assign 3*jj+r and 3*kk+s to local variables and reuse them, so you don't have to compute them twice (same for 3*j+r and 3*k+s)?
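A minimal sketch of that refactoring, assuming the updates sit inside loops over `r` and `s` for a 3x3 block at block position (`jj`, `kk`). `FlatMatrix`, `accumulate_block` and `block` are hypothetical (a flat dense matrix replaces the uBLAS types so the sketch compiles without Boost), and the parameters `rkr1` and `rk` play the roles of `m_RKR_1` and `m_RK`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical dense stand-in for a uBLAS matrix, accessed via m(i, j).
struct FlatMatrix {
    std::size_t n;
    std::vector<float> data;
    explicit FlatMatrix(std::size_t size) : n(size), data(size * size, 0.0f) {}
    float& operator()(std::size_t i, std::size_t j) { return data[i * n + j]; }
};

// Adds a 3x3 block into two matrices at block position (jj, kk).
// Each row/column index is computed once and reused for both updates.
void accumulate_block(FlatMatrix& rkr1, FlatMatrix& rk,
                      std::size_t jj, std::size_t kk,
                      const float (&block)[3][3]) {
    for (std::size_t r = 0; r < 3; ++r) {
        const std::size_t row = 3 * jj + r;      // computed once per r
        for (std::size_t s = 0; s < 3; ++s) {
            const std::size_t col = 3 * kk + s;  // computed once, used twice
            rkr1(row, col) += block[r][s];
            rk(row, col) += block[r][s];
        }
    }
}
```

An optimizing compiler will often eliminate the repeated index arithmetic on its own, so the main gain may be readability; it is worth timing before expecting a speedup.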

-Vardan