Nice one Jörn,
I will check it later and maybe provide some feedback. I did a quick
run and I wonder if you run the tests with optimization flags on (although with them on I get the same speedup but about 10 times faster overall), usually I use:
-O2 -DNDEBUG -msse3 -mfpmath=sse -ftree-vectorize -ftree-vectorizer-verbose=1
Also in your test you may prefer:
(double) (clock() - start)/CLOCKS_PER_SEC
(that will give you seconds of execution), instead of straight clock() - start
There are tests in boost-trunk/libs/numeric/ublas/test, but I think they will need a rewrite some time in the future. An assertion over an L-n norm (like you have) would probably be a good candidate for such tests.
Very Very nice speedup overall, looking forward to do some more tests!
> From: j.ungermann@fz-juelich.de
> To: ublas@lists.boost.org
> Date: Wed, 17 Mar 2010 16:42:25 +0100
> Subject: [ublas] faster inplace_solve implementation (triangular.hpp)
>
> Hello all!
>
> Abstract:
> One of my use-cases is the solving of equations-systems defined by
> sparse triangular matrices.
> I found the performance of the inplace_solve routines in triangular.hpp
> lacking for compressed_matrix<row_major> matrices. Upon inspection I
> found that row_major matrices are currently parsed column-wise. Attached
> is a fixed implementation and a test file used by me for testing
> purposes
>
> Details:
> The inplace_solve routine has to cope for 24 different cases:
> a) row_major / column_major
> b) upper / lower
> c) dense / packed / sparse
> d) A * x = b / x * A = b
>
> The case
> row_major / lower_tag / A * x = b is currently mapped to row_major /
> upper_tag / x * A = b
> and the case
> row_major / upper_tag / A * x = b is currently mapped to row_major /
> lower_tag / x * A = b
> This is very inefficient for compressed_matrix types!
> The cases column_major / [lower_tag | upper_tag] / x * A = b are
> seemingly handled, but I can't get it to complile.
> The cases of row_major / [lower_tag | upper_tag] / x * A = b *do* have
> an efficient implementation, which is *not* used to a programming bug,
> i.e. making the choice of the function depending on the storage type of
> the vector and not on the matrix.
>
> I rewrote triangular.hpp to a) handle *all* cases of A * x = b and to b)
> "redirect" the x * A = b cases to the aforementioned ones.
>
> This keeps the code about the same size as it currently is. It costs
> some efficiently for the redirected cases, but previously certain cases
> were also redirected (just different ones). It is trivial to add the
> missing cases, but adds quite a bit of code.
> I am not sure what the "packed_storage_tag" means. It previously was
> handled identically to the unknown storage type, and I did it likewise.
> It can probably be removed. If someone could give me a hint...
>
> I checked the correctness of my implementation using some randomized
> matrices as can be seen in test.cc.
>
> The performance has been increased dramatically for compressed_matrix
> and is similar for the matrix type. the timing was done using clock()
> and only once, which is very unscientific, but serves the purpose. The
> test matrices are 10000x10000 matrices with two non-zero diagonals.
>
> old (dense) new (dense) old (sparse) new (sparse)
> col_low x: 3240000 3240000 10000 10000
> row_low x: 5100000 3720000 2590000 10000
> col_upp x: 3320000 3340000 10000 10000
> row_upp x: 5120000 3750000 2790000 10000
> x col_low: - 4660000 - 10000
> x row_low: 3330000 4060000 3750000 20000
> x col_upp: - 5110000 - 20000
> x row_upp: 3230000 3910000 3750000 10000
>
> with dense=ublas::matrix and sparse=ublas::compressed_matrix
>
> The performance improvement for sparse matrices should justify the
> change. The performance for dense matrices (lower four rows) could be
> improved upon, as it is affected by the proxy introduced when applying
> the trans() operation.
>
> I did not find any testcases, if someone could give me a hint, I would
> implement some simple unit-tests.
>
> I checked out boost-trunk today and based my version on that
> triangular.hpp. If someone could review my version and check it in (or
> discuss with me what to improve) I would be thankful.
>
> Kind regards,
> Joern Ungermann
>
> PS: As the files are too large for the reflector I attached a compressed
> zip file.
>
> ------------------------------------------------------------------------------------------------
> ------------------------------------------------------------------------------------------------
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Sitz der Gesellschaft: Juelich
> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
> Vorsitzende des Aufsichtsrats: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrung: Prof. Dr. Achim Bachem (Vorsitzender),
> Dr. Ulrich Krafft (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> Prof. Dr. Sebastian M. Schmidt
> ------------------------------------------------------------------------------------------------
> ------------------------------------------------------------------------------------------------
Hotmail: Trusted email with Microsoft’s powerful SPAM protection. Sign up now.