|
Ublas : |
From: Christophe Prud'homme (prudhomm_at_[hidden])
Date: 2005-08-24 16:33:48
[ Mercredi 24 Août 2005 22:06 ]
| Hallo,
|
| one of our students looked at the performance of the cholesky decomposition
| using ublas. The result is quite surprising:
|
| $ for i in 100 200 400 800 1600; do ./cholesky_test $i ; done
| 0: 0 (deco: 0 sec) (prod: 0.02 sec) 100
| 0: 0 (deco: 0.01 sec) (prod: 0.01 sec) 100
| 0: 0 (deco: 0 sec) (prod: 0.01 sec) 100
|
| 0: 0 (deco: 0.03 sec) (prod: 0.13 sec) 200
| 0: 0 (deco: 0.06 sec) (prod: 0.06 sec) 200
| 0: 0 (deco: 0.02 sec) (prod: 0.01 sec) 200
|
| 0: 0 (deco: 0.18 sec) (prod: 1.03 sec) 400
| 0: 0 (deco: 0.43 sec) (prod: 0.41 sec) 400
| 0: 0 (deco: 0.03 sec) (prod: 0.03 sec) 400
|
| 0: 0 (deco: 1.56 sec) (prod: 8.14 sec) 800
| 0: 0 (deco: 3.42 sec) (prod: 3.22 sec) 800
| 0: 0 (deco: 0.09 sec) (prod: 0.05 sec) 800
|
| 0: 0 (deco: 12.28 sec) (prod: 65.42 sec) 1600
| 0: 0 (deco: 26.68 sec) (prod: 25.48 sec) 1600
| 0: 0 (deco: 0.28 sec) (prod: 0.09 sec) 1600
|
| The table lists the times for performing the cholesky decomposition of
| different matrix types and sizes. The first line of each block shows the
| decomposition time of a dense matrix (only accessing the lower triangle).
| The second line uses a (lower) triangular_matrix. This should at least be
| as fast as the dense case but takes 3 times as long. The third line show
| the times for a banded matrix with (at most) 50 bands below the diagonal.
| (If the banded matrix has all maximal possible bands it is even slower than
| the triangular matrix.)
|
| I suspect the matrix rows/columns and vector ranges to be the source of the
| slow down. So I have to do more research on that.
|
| see http://www.bauv.unibw-muenchen.de/~winkler/ublas/examples/
|
| Do you have an idea how to reduce the slow down?
on a 2 years old pentium m I get these times with ublas/cvs
and gcc version 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)
g++ -o cholesky_test cholesky_test.cpp -DNDEBUG -O2
for i in 100 200 400 800 1600; do ./cholesky_test $i ; done
0: 0 (deco: 0 sec) (prod: 0 sec) 100
0: 0 (deco: 0.01 sec) (prod: 0 sec) 100
0: 0 (deco: 0 sec) (prod: 0 sec) 100
0: 0 (deco: 0.01 sec) (prod: 0.02 sec) 200
0: 0 (deco: 0.01 sec) (prod: 0.02 sec) 200
0: 0 (deco: 0.01 sec) (prod: 0.01 sec) 200
0: 0 (deco: 0.04 sec) (prod: 0.38 sec) 400
0: 0 (deco: 0.12 sec) (prod: 0.12 sec) 400
0: 0 (deco: 0.02 sec) (prod: 0.02 sec) 400
0: 0 (deco: 0.52 sec) (prod: 3.38 sec) 800
0: 0 (deco: 1 sec) (prod: 0.94 sec) 800
0: 0 (deco: 0.05 sec) (prod: 0.04 sec) 800
1570: 1.41535e+13 (deco: 5.08 sec) (prod: 24.27 sec) 1600
1570: 1.41535e+13 (deco: 8.07 sec) (prod: 7.54 sec) 1600
0: 0 (deco: 0.22 sec) (prod: 0.07 sec) 1600
Something is fishy as you can see with ublas/cvs for 1600 but otherwise the
timings don't look too bad. Comparing with ublas/lapack binding would be
interesting.
cu
C.
-- MIT Affiliate