
Boost : 
From: Joerg Walter (jhr.walter_at_[hidden])
Date: 20030208 05:37:03
Hi Johan,
you wrote:
[snip]
> Let's look at just assembly for now. We start by creating an 1e5 x 1e5
> sparse matrix. We then successively insert 1e5 elements in the diagonal.
> What happens is that the insertion speed is linear for the first ~50000
> elements, and then grinds to a halt. The initial insertion speed is
> somewhere at 1e5 elements/s, after 50000 elements it sharply falls to
> 1000 elements/s (to compare, our naive implementation gets 1e6 elements /
> s).
On 32bit systems the upper dimensions for sparse_matrix are 65535 x 65535
due to the internal mapping of (i, j) > (i * size2+j) (or (i + j * size1).
I've added a corresponding check to the debug version.
For larger dimension consider to use compressed_matrix (FORTRAN CRS/CCS
format) or coordinate_matrix (FORTRAN COO format) please. I'm also assuming
you're checking the boost_1_29_0 release, so you'll probably have to look
into the current CVS version for these (many changes).
> This is quite bizarre.
>
> With an 1e6 x 1e6 matrix, the insertion speed is smoother, but slow
> throughout (on the order 1000 elements / s).
>
> I've tested this on two different computers, one dual Athlon and one PII
> laptop, and the result is identical (aside from the absolute speed
> numbers). Memory is not at all full, so that can't be an issue.
>
> The code for this simple test is at the end.
>
> The compiler used was g++2.95 (in Debian) and Boost 1.29 (also Debian).
>
> I've also observed a quadratic time complexity (in the nonzero elements)
> in sparse matrixvector multiplication. I think this has been brought up
> before though.
Using the experimental (means undocumented ;( axpy_prod() functions one
achieves the correct complexity.
> We've also tested MTL, and while it doesn't produce these kinds of wild
> irregularities, the performance is a factor 2 or 3 worse than our naive
> implementation, and the memory usage is a factor 1.52 worse.
>
> This makes me question the claim that genericity does not add overhead (in
> practice). 1020% overhead is acceptable, but when we have 100200%
> overhead, both in performance and memory, then it makes it impossible to
> justify its use.
Alexei Novakov's benchmarks show an overhead of around 100% for the better
ublas sparse matrix vector multiply implementations currently. He proposed
an improvement of sparse matrix iterators already, which we'll tackle
immediately after the upcoming 1_30_0 release.
Thanks for your feedback,
Joerg
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk