From: Michael Stevens (mail_at_[hidden])
Date: 2005-06-17 08:23:39
On Thursday 16 June 2005 22:35, christopher diggins wrote:
> ----- Original Message -----
> From: "Gunter Winkler" <guwi17_at_[hidden]>
> To: "ublas mailing list" <ublas_at_[hidden]>
> Sent: Thursday, June 16, 2005 4:24 PM
> Subject: Re: [ublas] Aliasing for += and -=
>
> > uBLAS is surprisingly slow ...
>
> Makes you wonder if all of that expression template machinery is really
> worth it, doesn't it ;-)
The expression template machinery is rather hairy! Mind you, it is only there
to allow the nice expression syntax without the even larger overhead of a
standard C++ matrix library, where temporaries occur at every step.
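To illustrate the point about temporaries, here is a minimal expression template sketch. It is not the uBLAS implementation: the names `Expr`, `Sum`, `Vec` and `first_of_sum` are hypothetical and exist only for this example. The idea it shows is that `a + b + c` builds a lightweight expression object, and the single loop in the assignment evaluates it element by element, so no intermediate vector is created.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: tags a type as a vector expression (hypothetical, not uBLAS).
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Node representing lhs + rhs; stores references only, computes on demand.
template <class L, class R>
struct Sum : Expr<Sum<L, R>> {
    const L& lhs;
    const R& rhs;
    Sum(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

struct Vec : Expr<Vec> {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assignment from any expression: one loop, no temporary vector.
    template <class E>
    Vec& operator=(const Expr<E>& e) {
        const E& x = e.self();
        assert(x.size() == data.size());
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = x[i];
        return *this;
    }
};

// operator+ only builds the expression node; no arithmetic happens here.
template <class L, class R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}

// Small usage example: returns 6.0 (1 + 2 + 3).
double first_of_sum() {
    Vec a(3, 1.0), b(3, 2.0), c(3, 3.0), v(3);
    v = a + b + c;  // evaluated in a single pass inside operator=
    return v[0];
}
```

A library that returns a full vector from every `operator+` would allocate and fill one temporary per `+`; the price of avoiding that is the extra template machinery above.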
> I'd like to point how few lines of code are in the implementation I
> supplied (when compared to the ublas monstrosity), and the fact that there
> are many optimizations which weren't done (which is why I kept calling it a
> "naive" implementation).
The naive implementation is not faster! I have appended the results of
'bench1 100', which compares uBLAS dense vectors and matrices with naive
implementations, in this case for vector/matrix size 100.
'fast' means without a temporary, 'safe' means with one. It might be worth
running this yourself to see if you get divergent results. I have just
committed changes to boost_CVS which make the bench1-4 results much easier
to read.
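As a rough illustration of the 'safe' versus 'fast' distinction, here is a sketch using plain `std::vector` rather than uBLAS types. It is not taken from bench1; `Mat`, `prod_safe` and `prod_fast` are hypothetical names. The 'safe' version computes into a temporary and copies, which stays correct when the result aliases an operand; the 'fast' version writes straight into the result and is only correct when there is no aliasing. In uBLAS itself the no-temporary form has to be requested explicitly, for example with `noalias(y) = prod(A, x)` in current releases.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical dense n x n matrix, row-major storage.
struct Mat {
    std::size_t n;
    std::vector<double> a;
    double operator()(std::size_t i, std::size_t j) const { return a[i * n + j]; }
};

// 'safe': compute into a temporary, then copy into y.
// Correct even when y and x are the same vector, at the cost of an
// extra allocation and an extra pass over the data.
void prod_safe(Vec& y, const Mat& A, const Vec& x) {
    assert(x.size() == A.n && y.size() == A.n);
    Vec tmp(A.n, 0.0);
    for (std::size_t i = 0; i < A.n; ++i)
        for (std::size_t j = 0; j < A.n; ++j)
            tmp[i] += A(i, j) * x[j];
    y = tmp;
}

// 'fast': write each result element directly into y.
// No temporary, but the result is wrong if y aliases x, because
// y[i] is overwritten while later rows still need the old x[i].
void prod_fast(Vec& y, const Mat& A, const Vec& x) {
    assert(x.size() == A.n && y.size() == A.n);
    for (std::size_t i = 0; i < A.n; ++i) {
        double s = 0.0;
        for (std::size_t j = 0; j < A.n; ++j)
            s += A(i, j) * x[j];
        y[i] = s;
    }
}
```

With the 2x2 swap matrix `{0, 1, 1, 0}` and `x = {1, 2}`, `prod_safe(x, A, x)` leaves `x` as `{2, 1}`, while `prod_fast(x, A, x)` leaves it as `{2, 2}`. With a separate output vector both give `{2, 1}`.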
Here the uBLAS 'c_vector' and 'vector<unbounded_array>' types show almost no
overhead when compared to directly manipulated fixed size C arrays. For
smaller sizes there is some overhead, with unbounded_array suffering the most
as expected.
Of course this does not mean that these operations in uBLAS are comparable with
the best ATLAS or manufacturer BLAS speeds.
There also looks to me to be a big performance problem with the implementation
of axpy multiply in uBLAS.
Michael
DOUBLE, 100
bench_1
inner_prod
C array
elapsed: 0.64 s, 889.599 Mflops
c_vector
elapsed: 0.66 s, 862.642 Mflops
vector<unbounded_array>
elapsed: 0.66 s, 862.642 Mflops
vector + vector
C array
elapsed: 0.64 s, 894.07 Mflops
c_vector safe
elapsed: 1.14 s, 501.934 Mflops
c_vector fast
elapsed: 0.66 s, 866.977 Mflops
vector<unbounded_array> safe
elapsed: 1.23 s, 465.207 Mflops
vector<unbounded_array> fast
elapsed: 0.66 s, 866.977 Mflops
bench_2
outer_prod
C array
elapsed: 0.5 s, 1144.41 Mflops
c_matrix, c_vector safe
elapsed: 1.91 s, 299.584 Mflops
c_matrix, c_vector fast
elapsed: 0.71 s, 805.922 Mflops
matrix<unbounded_array>, vector<unbounded_array> safe
elapsed: 0.89 s, 642.927 Mflops
matrix<unbounded_array>, vector<unbounded_array> fast
elapsed: 0.57 s, 1003.87 Mflops
prod (matrix, vector)
C array
elapsed: 0.65 s, 875.913 Mflops
c_matrix, c_vector safe
elapsed: 0.66 s, 862.642 Mflops
c_matrix, c_vector fast
elapsed: 0.65 s, 875.913 Mflops
matrix<unbounded_array>, vector<unbounded_array> safe
elapsed: 0.66 s, 862.642 Mflops
matrix<unbounded_array>, vector<unbounded_array> fast
elapsed: 0.66 s, 862.642 Mflops
matrix + matrix
C array
elapsed: 0.96 s, 596.046 Mflops
c_matrix safe
elapsed: 2.44 s, 234.51 Mflops
c_matrix fast
elapsed: 1.21 s, 472.896 Mflops
matrix<unbounded_array> safe
elapsed: 1.3 s, 440.157 Mflops
matrix<unbounded_array> fast
elapsed: 1 s, 572.205 Mflops
bench_3
prod (matrix, matrix)
C array
elapsed: 0.7 s, 813.348 Mflops
c_matrix safe
elapsed: 0.75 s, 759.125 Mflops
c_matrix fast
elapsed: 0.73 s, 779.923 Mflops
matrix<unbounded_array> safe
elapsed: 0.76 s, 749.136 Mflops
matrix<unbounded_array> fast
elapsed: 0.76 s, 749.136 Mflops
>
> Christopher Diggins
> http://www.cdiggins.com
--
___________________________________
Michael Stevens Systems Engineering
34128 Kassel, Germany
Phone/Fax: +49 561 5218038
Navigation Systems, Estimation and Bayesian Filtering
http://bayesclasses.sf.net
___________________________________