Dear ublas-developers,
 
    I switched from compressed_matrix storage to mapped_matrix, and for a range of problems I mostly saw
very good results, in both fill-in and traversal: a performance improvement of roughly a factor of 200. I am
talking here about a problem size of roughly 2.5 million unknowns (still small for my application, but that is
another matter).
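
(For concreteness, the storage change amounted to something like the following; the matrix name and size
here are illustrative, not my actual declarations:)

#include <boost/numeric/ublas/matrix_sparse.hpp>
using namespace boost::numeric::ublas;

// before: CSR-style storage with a non-zero estimate
// compressed_matrix<double> matrix_A(n, n, estimated_nnz);

// after: std::map-backed storage
mapped_matrix<double> matrix_A(n, n);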
 
    However, for some degenerate cases - for example, when a majority (~75-80% or more) of the grid consists of
"inactive" elements - fill-in still takes very little time (the same as for cases without so many inactive elements),
but traversal suddenly blows up by a factor of ~50 compared to problems where there are hardly any "inactive"
elements (~1-5% or less). Of course, one solution is simply not to include "inactive" elements in the solution - a
perfectly valid solution, and I will do that in the experiments to follow - but it still surprised me that equal-sized
problems could show such a huge difference in traversal times just because of a large percentage of inactive
elements! Then again, it may be because this grid contains so many disconnected pieces of elements (like
"islands") that traversal becomes computationally expensive.
 
My fill-in code follows something like this (relevant part pasted):
 
for (unsigned int ic = 0; ic < faces.size(); ++ic) {

      // Assemble the stiffness matrix A in Ax = b for the strict interior
      // of the problem grid from element connectivity.

      double coeff = <"some double">;  // coefficient that enters the matrix

      // c_gid1, c_gid2 are obtained from face ic (lookup code omitted)
      unsigned int uic1 = (unsigned int) c_gid1;  // global cell id of first cell owning face
      unsigned int uic2 = (unsigned int) c_gid2;  // global cell id of second cell owning face

      // Skip inactive connections (physically, there is no flow across them)
      if (check_active_faces && !active_faces->test(ic)) {
            continue;
      }

      // Main diagonal elements
      matrix_A(uic1, uic1) += -coeff;
      matrix_A(uic2, uic2) += -coeff;

      // Off-diagonal elements
      matrix_A(uic1, uic2) += coeff;
      matrix_A(uic2, uic1) += coeff;

}  // end - loop on connections ic

// Assemble RHS
for (unsigned int i = 0; i < vector_b.size(); ++i) {

      vector_b(i) = 0.0;

      // Inactive cells get a trivial, decoupled equation
      if (check_active_cells && !active_cells->test(i)) {
            matrix_A(i, i) = -1.0;
            vector_b(i) = 1.0;
      }

}

(skipping boundary conditions...not relevant here)
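
(For readability, here are hypothetical, simplified declarations for the names used in the two loops above -
my actual types differ, this is just so the snippets parse:)

#include <vector>
#include <utility>
#include <boost/dynamic_bitset.hpp>

std::vector<std::pair<unsigned int, unsigned int> > faces;  // one (c_gid1, c_gid2) pair per connection
bool check_active_faces, check_active_cells;                // enable the "inactive" filtering
const boost::dynamic_bitset<>* active_faces;                // active_faces->test(ic): connection ic is active
const boost::dynamic_bitset<>* active_cells;                // active_cells->test(i): cell i is active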

 
My traversal code follows something like this (relevant part pasted):
 
// cols and vals are allocated once up front, sized to hold the longest row
std::vector<int> cols(max_row_nnz);     // max_row_nnz computed elsewhere
std::vector<double> vals(max_row_nnz);

typedef mapped_matrix<double>::iterator1 itm1;  // iterates over rows
typedef mapped_matrix<double>::iterator2 itm2;  // iterates over a row's non-zeros

for (itm1 i1 = mat_A.begin1(); i1 != mat_A.end1(); ++i1) {

      int nnz = 0;

      // Loop over this row's non-zero elements
      for (itm2 i2 = i1.begin(); i2 != i1.end(); ++i2) {
            cols[nnz] = (int) i2.index2();
            vals[nnz] = *i2;
            ++nnz;
      }

      // Pass cols, vals, nnz along to the "solver-package"... irrelevant here
      // (in fact, commented out when recording the timings below).
}

For most problems without too many inactive elements, the traversal took about 5-6 seconds; for the degenerate
case, it took about 250 seconds! I am hoping someone can explain why this happens and, if possible, point out
something I am doing wrong. It was recommended that I try generalized_vector_of_vector - will that choice
resolve such issues?
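
(For reference, my understanding is that the suggested type would be declared along these lines; the exact
choice of row container here is my guess, and I have not tried it yet:)

#include <boost/numeric/ublas/vector_of_vector.hpp>
#include <boost/numeric/ublas/vector_sparse.hpp>
using namespace boost::numeric::ublas;

// A vector of sparse row vectors: each row's non-zeros live in their own
// small container, so per-row fill-in and traversal touch only that row.
typedef generalized_vector_of_vector<
      double, row_major,
      vector<coordinate_vector<double> > > gvov_matrix;

gvov_matrix matrix_A(n, n);  // n = number of unknowns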

I would greatly appreciate whatever help I can get - thanks a lot in advance for any advice!

Thanks,

Sunil Thomas.