Subject: [ublas] faster inplace_solve implementation (triangular.hpp)
From: Jörn Ungermann (j.ungermann_at_[hidden])
Date: 2010-03-17 11:42:25
Hello all!
Abstract:
One of my use cases is solving systems of equations defined by sparse
triangular matrices.
I found the performance of the inplace_solve routines in triangular.hpp
lacking for compressed_matrix<row_major> matrices. Upon inspection I
found that row_major matrices are currently traversed column-wise.
Attached are a fixed implementation and a test file I used for testing
purposes.
Details:
The inplace_solve routine has to cope with 24 different cases:
a) row_major / column_major
b) upper / lower
c) dense / packed / sparse
d) A * x = b / x * A = b
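For reference, the two right-hand-side variants in d) are called like
this; a minimal sketch, with a purely illustrative 3x3 matrix:

    #include <boost/numeric/ublas/matrix.hpp>
    #include <boost/numeric/ublas/matrix_sparse.hpp>
    #include <boost/numeric/ublas/vector.hpp>
    #include <boost/numeric/ublas/triangular.hpp>

    namespace ublas = boost::numeric::ublas;

    int main() {
        // small lower-triangular row_major sparse matrix
        ublas::compressed_matrix<double, ublas::row_major> A(3, 3);
        A(0, 0) = 1; A(1, 0) = 2; A(1, 1) = 1; A(2, 1) = 3; A(2, 2) = 1;
        ublas::vector<double> b(3);
        b(0) = 1; b(1) = 2; b(2) = 3;

        ublas::vector<double> x1(b), x2(b);
        ublas::inplace_solve(A, x1, ublas::lower_tag());  // A * x = b, solution overwrites x1
        ublas::inplace_solve(x2, A, ublas::lower_tag());  // x * A = b, solution overwrites x2
        return 0;
    }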
The case
row_major / lower_tag / A * x = b is currently mapped to row_major /
upper_tag / x * A = b,
and the case
row_major / upper_tag / A * x = b is currently mapped to row_major /
lower_tag / x * A = b.
This is very inefficient for compressed_matrix types!
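Roughly, the mapping works by handing the problem over to the x * A = b
solver on trans(A); the following is a simplified sketch of the idea,
not the actual triangular.hpp code:

    #include <boost/numeric/ublas/matrix.hpp>
    #include <boost/numeric/ublas/vector.hpp>
    #include <boost/numeric/ublas/triangular.hpp>

    namespace ublas = boost::numeric::ublas;

    // Solving A * x = b with A lower-triangular is algebraically the same as
    // solving x * trans(A) = b with trans(A) upper-triangular.  For a row_major
    // compressed_matrix the trans() proxy is then traversed against its storage
    // order, which is what makes this path slow.
    template<class M, class E>
    void redirected_lower_solve(const ublas::matrix_expression<M> &A,
                                ublas::vector_expression<E> &b) {
        ublas::inplace_solve(b, ublas::trans(A), ublas::upper_tag());
    }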
The cases column_major / [lower_tag | upper_tag] / x * A = b are
seemingly handled, but I can't get them to compile.
The cases row_major / [lower_tag | upper_tag] / x * A = b *do* have an
efficient implementation, which is *not* used due to a programming bug,
namely that the choice of function depends on the storage type of the
vector instead of that of the matrix.
I rewrote triangular.hpp to a) handle *all* cases of A * x = b and to b)
"redirect" the x * A = b cases to the aforementioned ones.
This keeps the code about the same size as it currently is. It costs
some efficiency for the redirected cases, but previously certain cases
were also redirected (just different ones). It is trivial to add the
missing cases, but doing so adds quite a bit of code.
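To make the storage-order point concrete, a row-wise forward
substitution for the row_major / lower_tag / sparse / A * x = b case can
look roughly like this; a sketch of the idea, not the attached patch:

    #include <boost/numeric/ublas/matrix_sparse.hpp>
    #include <boost/numeric/ublas/vector.hpp>

    namespace ublas = boost::numeric::ublas;

    typedef ublas::compressed_matrix<double, ublas::row_major> matrix_type;

    // Forward substitution that walks each row's non-zeros in storage order,
    // so no column-wise searches through the compressed structure are needed.
    // Assumes a structurally non-zero diagonal.
    void lower_solve_row_major(const matrix_type &A, ublas::vector<double> &b) {
        for (matrix_type::const_iterator1 it1 = A.begin1(); it1 != A.end1(); ++it1) {
            double sum = 0, diag = 0;
            for (matrix_type::const_iterator2 it2 = it1.begin(); it2 != it1.end(); ++it2) {
                if (it2.index2() < it1.index1())
                    sum += *it2 * b(it2.index2());   // strictly lower part
                else if (it2.index2() == it1.index1())
                    diag = *it2;                     // diagonal element
            }
            b(it1.index1()) = (b(it1.index1()) - sum) / diag;
        }
    }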
I am not sure what the "packed_storage_tag" means. It was previously
handled identically to the unknown storage type, and I did likewise.
It can probably be removed. If someone could give me a hint...
I checked the correctness of my implementation using some randomized
matrices as can be seen in test.cc.
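For readers without the attachment, the check is along these lines; a
hedged sketch only, test.cc itself is not reproduced here:

    #include <cassert>
    #include <cstdlib>
    #include <boost/numeric/ublas/matrix.hpp>
    #include <boost/numeric/ublas/matrix_sparse.hpp>
    #include <boost/numeric/ublas/vector.hpp>
    #include <boost/numeric/ublas/triangular.hpp>

    namespace ublas = boost::numeric::ublas;

    int main() {
        const std::size_t n = 100;
        ublas::compressed_matrix<double, ublas::row_major> A(n, n);
        ublas::vector<double> b(n);
        for (std::size_t i = 0; i < n; ++i) {
            if (i > 0)
                A(i, i - 1) = std::rand() / double(RAND_MAX);  // random sub-diagonal
            A(i, i) = 1.0 + std::rand() / double(RAND_MAX);    // non-zero diagonal
            b(i) = std::rand() / double(RAND_MAX);
        }
        ublas::vector<double> x(b);
        ublas::inplace_solve(A, x, ublas::lower_tag());   // x now holds the solution
        ublas::vector<double> r = ublas::prod(A, x) - b;  // residual A * x - b
        assert(ublas::norm_inf(r) < 1e-10);
        return 0;
    }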
The performance has increased dramatically for compressed_matrix and is
similar for the dense matrix type. The timing was done using clock() and
only once, which is very unscientific but serves the purpose. The test
matrices are 10000x10000 matrices with two non-zero diagonals.
             old (dense)   new (dense)  old (sparse)  new (sparse)
col_low x:       3240000       3240000         10000         10000
row_low x:       5100000       3720000       2590000         10000
col_upp x:       3320000       3340000         10000         10000
row_upp x:       5120000       3750000       2790000         10000
x col_low:             -       4660000             -         10000
x row_low:       3330000       4060000       3750000         20000
x col_upp:             -       5110000             -         20000
x row_upp:       3230000       3910000       3750000         10000
with dense=ublas::matrix and sparse=ublas::compressed_matrix
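The numbers were obtained with a plain clock() around the solve; the
sketch below shows the idea for one case ("row_low x") with an
illustrative two-diagonal set-up that may differ in detail from the
original test.cc:

    #include <ctime>
    #include <iostream>
    #include <boost/numeric/ublas/matrix_sparse.hpp>
    #include <boost/numeric/ublas/vector.hpp>
    #include <boost/numeric/ublas/triangular.hpp>

    namespace ublas = boost::numeric::ublas;

    int main() {
        const std::size_t n = 10000;
        ublas::compressed_matrix<double, ublas::row_major> A(n, n);
        ublas::vector<double> b(n);
        for (std::size_t i = 0; i < n; ++i) {
            if (i > 0) A(i, i - 1) = 1.0;   // second non-zero diagonal
            A(i, i) = 2.0;                  // main diagonal
            b(i) = 1.0;
        }
        std::clock_t start = std::clock();
        ublas::inplace_solve(A, b, ublas::lower_tag());   // the "row_low x" case
        std::cout << "clock ticks: " << (std::clock() - start) << std::endl;
        return 0;
    }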
The performance improvement for sparse matrices should justify the
change. The performance for dense matrices (lower four rows) could be
improved upon, as it is affected by the proxy introduced when applying
the trans() operation.
I did not find any test cases; if someone could give me a hint, I would
implement some simple unit tests.
I checked out boost-trunk today and based my version on that
triangular.hpp. If someone could review my version and check it in (or
discuss with me what to improve) I would be thankful.
Kind regards,
Joern Ungermann
PS: As the files are too large for the reflector, I attached a
compressed zip file.