Subject: [ublas] Strange performance hits noticed - help!
From: Sunil Thomas (sgthomas27_at_[hidden])
Date: 2010-05-22 03:47:03


Dear ublas-developers,

    I switched from compressed_matrix storage to mapped_matrix, and for a
range of problems I mostly saw very good performance improvements (a factor
of ~200), both for fill-in and for traversal. I am talking here about a
problem size of about 2.5 million unknowns (still small for my application,
but that is another matter).

    However, for some degenerate cases, for example with a majority
(~75-80% or more) of the grid consisting of "inactive" elements, fill-in
still takes very little time (the same as in cases without so many inactive
elements), but traversal time suddenly blows up by a factor of ~50 compared
to problems with few "inactive" elements (~1-5% or less). Of course, one
solution is simply not to include "inactive" elements in the system, which
is perfectly valid, and I will do that in the experiments to follow. Still,
it surprised me that equal-sized problems would differ so much in traversal
time just because of a large percentage of inactive elements. Then again,
it may be because this grid has so many disconnected pieces of elements
(like "islands") that traversal becomes computationally expensive.

*My fill-in code follows something like this (relevant part pasted):*

for (unsigned int ic = 0; ic < faces.size(); ++ic) {

      // Assemble the stiffness matrix A in Ax = b for the strict interior
      // of the problem grid from element connectivity

      double coeff = <"some double">; // coefficient that enters the matrix

      uic1 = (unsigned int) c_gid1; // global cell id of first cell owning face
      uic2 = (unsigned int) c_gid2; // global cell id of second cell owning face

      // Skip inactive connections (physically, there is no flow across them)
      if( check_active_faces && !active_faces->test(ic) ) {
            continue;
      }

      // Main diagonal elements
      matrix_A(uic1, uic1) += -coeff;
      matrix_A(uic2, uic2) += -coeff;

      // Off-diagonal elements
      matrix_A(uic1, uic2) += coeff;
      matrix_A(uic2, uic1) += coeff;

} // end - loop on connections ic

// Assemble RHS

for (unsigned int i = 0; i < vector_b.size(); ++i) {

      vector_b(i) = 0.0;

      // Inactive cells get a trivial equation: -1 * x_i = 1
      if( check_active_cells && !active_cells->test(i) ) {
            matrix_A(i, i) = -1.0;
            vector_b(i) = 1.0;
      }

}

(skipping boundary conditions...not relevant here)
*My traversal code follows something like this (relevant part pasted):*

// Allocate cols, vals

for(itm1 i1 = mat_A.begin1(); i1 != mat_A.end1(); ++i1) {

      int nnz = 0;

      itm2 i2 = i1.begin();

      // Loop over each row's non-zero elements
      for(; i2 != i1.end(); ++i2) {
           cols[nnz] = (int) i2.index2();
           vals[nnz] = *i2;
           ++nnz;
      }

      // Pass along cols, vals to "solver-package"... irrelevant here
      // (in fact commented out when noting performance times).

}

For most problems without too many inactive elements, the traversal took
about 5-6 seconds; for the degenerate case it took ~250 seconds! I am hoping
someone can give me a reason why this happens and, if possible, identify
something I am doing wrong. It was recommended that I try
generalized_vector_of_vector; will that choice resolve such issues?

I would greatly appreciate whatever help I can get. Thanks a lot in advance
for any useful help/advice.

Thanks,

Sunil Thomas.