Subject: [ublas] Strange performance hits noticed - help!
From: Sunil Thomas (sgthomas27_at_[hidden])
Date: 2010-05-22 03:47:03
Dear ublas-developers,
I switched from compressed_matrix storage to mapped_matrix, and for a range
of problems I mostly saw very good results in terms of performance (an
improvement of a factor of ~200), both for fill-in and for traversal. This is
for a problem size of about 2.5 million unknowns (still small for my
application, but that is another matter).
However, for some degenerate cases, for example where a majority (~75-80% or
more) of the grid consists of "inactive" elements, fill-in still takes very
little time (the same as in cases without so many inactive elements), but
traversal suddenly blows up by a factor of ~50 compared to problems with few
"inactive" elements (~1-5% or less). Of course, one solution is simply not to
include "inactive" elements in the system. That is a perfectly valid solution,
and I will do that in my experiments to follow, but it still surprised me that
equal-sized problems would differ so much in traversal time just because of a
large percentage of inactive elements. Then again, it may be due to the fact
that this grid has many disconnected pieces of elements (like "islands"),
which could make traversal computationally expensive.
*My fill-in code follows something like this (relevant part pasted):*
for (unsigned int ic = 0; ic < faces.size(); ++ic) {
    // Assemble the stiffness matrix A in Ax = b for the strict interior of
    // the problem grid from element connectivity
    double coeff = <"some double">;  // coefficient that enters the matrix
    uic1 = (unsigned int) c_gid1;    // global cell id of first cell owning face
    uic2 = (unsigned int) c_gid2;    // global cell id of second cell owning face
    // Skip inactive connections (physically, there is no flow across them)
    if( check_active_faces && !active_faces->test(ic) ) {
        continue;
    }
    // Main diagonal elements
    matrix_A(uic1, uic1) += -coeff;
    matrix_A(uic2, uic2) += -coeff;
    // Off-diagonal elements
    matrix_A(uic1, uic2) += coeff;
    matrix_A(uic2, uic1) += coeff;
} // end - loop on connections ic
// Assemble RHS
for (unsigned int i = 0; i < vector_b.size(); ++i) {
    vector_b(i) = 0.0;
    // Inactive cells get -1 on the diagonal and 1 on the right-hand side
    if( check_active_cells && !active_cells->test(i) ) {
        matrix_A(i, i) = -1.0;
        vector_b(i) = 1.0;
    }
}
(Skipping boundary conditions; they are not relevant here.)
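To make the assembly pattern above concrete, here is a minimal, self-contained sketch on plain standard-library containers rather than uBLAS. The names Face, SparseRows and assemble are hypothetical and exist only for illustration, and the per-face coefficient is passed in directly because the original computes it elsewhere. It assumes the same convention as the loop above: inactive faces contribute nothing, and each inactive cell gets -1 on the diagonal and 1 on the right-hand side.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical face record: the two cells sharing the face, its coefficient,
// and whether flow across it is allowed.
struct Face {
    unsigned int cell1;
    unsigned int cell2;
    double coeff;
    bool active;
};

// One ordered map per row: column index -> value.
// Illustrative storage only; this is not a uBLAS type.
typedef std::vector<std::map<unsigned int, double> > SparseRows;

// Assemble A and b with the same stencil as the loop above.
void assemble(const std::vector<Face>& faces,
              const std::vector<bool>& cell_active,
              SparseRows& A,
              std::vector<double>& b)
{
    const std::size_t n = cell_active.size();
    A.assign(n, std::map<unsigned int, double>());
    b.assign(n, 0.0);

    for (std::size_t f = 0; f < faces.size(); ++f) {
        const Face& face = faces[f];
        if (!face.active) continue;                 // skip inactive connections
        assert(face.cell1 < n && face.cell2 < n);   // cell ids must be in range
        A[face.cell1][face.cell1] -= face.coeff;    // main diagonal
        A[face.cell2][face.cell2] -= face.coeff;
        A[face.cell1][face.cell2] += face.coeff;    // off-diagonal
        A[face.cell2][face.cell1] += face.coeff;
    }

    for (std::size_t i = 0; i < n; ++i) {
        if (!cell_active[i]) {                      // inactive cell: fix diagonal and RHS
            A[i][i] = -1.0;
            b[i] = 1.0;
        }
    }
}
```

With many inactive cells, most rows produced this way hold a single diagonal entry, which is the degenerate structure described above.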
*My traversal code follows something like this (relevant part pasted):*
// Allocate cols, vals
for(itm1 i1 = mat_A.begin1(); i1 != mat_A.end1(); ++i1) {
    int nnz = 0;
    itm2 i2 = i1.begin();
    // Loop over each row's non-zero elements
    for(; i2 != i1.end(); ++i2) {
        cols[nnz] = (int) i2.index2();
        vals[nnz] = *i2;
        ++nnz;
    }
    // Pass along cols, vals to "solver-package"; irrelevant here (in fact
    // commented out when noting performance times).
}
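The row-by-row extraction of cols and vals above can also be done once, into compressed sparse row (CSR) arrays, so that later passes walk contiguous memory. The following is a minimal sketch on standard-library containers, not uBLAS; Csr, to_csr and SparseRows are hypothetical names, and the per-row map storage is an assumption made only to keep the example self-contained. Whether a layout like this removes the slowdown reported here has not been tested.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// One ordered map per row: column index -> value (illustrative, not uBLAS).
typedef std::vector<std::map<unsigned int, double> > SparseRows;

// Hypothetical CSR container: row i occupies [row_ptr[i], row_ptr[i+1]).
struct Csr {
    std::vector<std::size_t> row_ptr;
    std::vector<int> cols;
    std::vector<double> vals;
};

// Copy every stored entry, row by row, into contiguous CSR arrays.
Csr to_csr(const SparseRows& rows)
{
    Csr out;
    out.row_ptr.reserve(rows.size() + 1);
    out.row_ptr.push_back(0);
    for (std::size_t i = 0; i < rows.size(); ++i) {
        std::map<unsigned int, double>::const_iterator it = rows[i].begin();
        for (; it != rows[i].end(); ++it) {
            out.cols.push_back(static_cast<int>(it->first));  // column index
            out.vals.push_back(it->second);                   // stored value
        }
        out.row_ptr.push_back(out.cols.size());  // end of row i
    }
    assert(out.row_ptr.size() == rows.size() + 1);
    return out;
}
```

An empty row simply repeats the previous offset in row_ptr, so rows with no entries cost nothing during later traversal.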
For most problems without too many inactive elements, the time taken was
about 5-6 seconds; for the degenerate case, it was ~250 seconds! I am hoping
someone can explain why this happens and, if possible, point out anything I
am doing wrong. It was recommended that I try generalized_vector_of_vector.
Will that choice resolve such issues?
I would greatly appreciate any help. Thanks a lot in advance for any advice.
Thanks,
Sunil Thomas.