From: Michael Stevens (mail_at_[hidden])
Date: 2005-06-14 14:57:45
There are many issues here, and I think most of the important points have
already been mentioned:
1. Comparing uBLAS operations with vendor BLAS is not always realistic as the
libraries often take numerical shortcuts.
2. Using uBLAS and ATLAS combined is the way to go. Many people use the
bindings library so they can use uBLAS containers and expressions and still
explicitly call ATLAS. Generally only a very small fraction of code is time
critical, and so the extra coding required is not usually a big issue.
3. uBLAS expressions are far more flexible than BLAS. Sometimes you need this
flexibility and so you can expect to take a performance hit.
On Wednesday 08 June 2005 22:49, Matthias Troyer wrote:
> On Jun 8, 2005, at 10:35 PM, Gunter Winkler wrote:
> > Yes, ublas can not (and will not) compete with atlas. You can play
> > with
> > different matrix products and matrix sizes with the attached sample
> > program.
> > You see atlas is at least 4 times faster than ublas. For more details,
> > please, look at the source.
> During the ublas review at boost, it was mentioned that
> competitiveness with BLAS/ATLAS, e.g. by using those libraries was an
> essential goal. With a factor of four in performance, ublas is
> unfortunately *completely useless* for any serious high performance
> work. With CPU time costs (total cost of ownership) for some
> calculations that we perform of the order of several ten thousand US
> $, we cannot afford spending even a factor 4 more.
Given point 2 above, I think *completely useless* is rather an
overstatement. The factor of four is also very compiler and library dependent.
For dense matrix ops we should get closer, but optimising code for many
compilers is hard. ATLAS does a fantastic job here. For time critical inner
loops on big problems even a factor of 1.5 would be unacceptable. But you
can still use BLAS calls.
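
To give a feel for why tuning dense products is hard, here is a minimal sketch in plain C++ (not uBLAS or ATLAS code). It computes the same product with a straightforward triple loop and with a cache-blocked loop. The names `Matrix`, `multiply_naive` and `multiply_blocked` are made up for illustration, and the default block size of 32 is an arbitrary guess. The best value depends on compiler and hardware, which is the kind of parameter ATLAS tunes empirically.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major n x n matrix in a flat vector (illustrative type, not a uBLAS container).
using Matrix = std::vector<double>;

// Straightforward triple loop: returns A * B.
Matrix multiply_naive(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}

// Same product, accumulated tile by tile so that each tile is reused
// while it is likely still in cache. The block size is a tuning parameter.
Matrix multiply_blocked(const Matrix& a, const Matrix& b, std::size_t n,
                        std::size_t block = 32) {
    assert(a.size() == n * n && b.size() == n * n && block > 0);
    Matrix c(n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += block) {
        const std::size_t i_end = std::min(ii + block, n);
        for (std::size_t kk = 0; kk < n; kk += block) {
            const std::size_t k_end = std::min(kk + block, n);
            for (std::size_t jj = 0; jj < n; jj += block) {
                const std::size_t j_end = std::min(jj + block, n);
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double aik = a[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
    return c;
}
```

Both functions return the same mathematical product; only the traversal order differs. With general floating-point data the two results can differ in the last bits because the summation order changes.
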
> It is thus a pity that ublas will not make use of atlas, although it
> would be so easy: just use blas for any expression (such as dense
> matrix multiplication) that can be mapped to a blas call.
The problem is that it is not easy at all. BLAS only provides a small subset of
the operations uBLAS requires. BLAS routines can also require explicit result
storage as workspace, which does not fit with a general expression syntax. Also, there are
many choices, which operation should be called to convert A = prod(B,C) into
Mostly, explicit BLAS invocation is the way to go. However, domain-specific
libraries on top of uBLAS that provide automatic BLAS invocation can sometimes
be very helpful.
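
To illustrate the mapping problem, here is a rough sketch in plain C++. It uses a hand-written stand-in for the GEMM-style update C := alpha * A * B + beta * C rather than a real BLAS or bindings call, and it leaves out the transposition flags. The names `Dense`, `gemm_like`, `assign_prod` and `add_prod` are hypothetical. The point is that the result matrix is an explicit in/out argument, so an assignment and an accumulation map to different parameter choices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal row-major dense matrix (illustrative type, not a uBLAS container).
struct Dense {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;
};

// Stand-in for a GEMM-style update: c := alpha * a * b + beta * c.
// The caller must supply c with the right shape; it is both input and output.
void gemm_like(double alpha, const Dense& a, const Dense& b, double beta, Dense& c) {
    assert(a.cols == b.rows && c.rows == a.rows && c.cols == b.cols);
    for (std::size_t i = 0; i < c.rows; ++i) {
        for (std::size_t j = 0; j < c.cols; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < a.cols; ++k) {
                sum += a.data[i * a.cols + k] * b.data[k * b.cols + j];
            }
            double& out = c.data[i * c.cols + j];
            // With beta == 0 the old contents of c are ignored, not multiplied.
            out = (beta == 0.0) ? alpha * sum : alpha * sum + beta * out;
        }
    }
}

// A = prod(B, C) corresponds to alpha = 1, beta = 0 on freshly allocated storage.
Dense assign_prod(const Dense& b, const Dense& c) {
    Dense a{b.rows, c.cols, std::vector<double>(b.rows * c.cols)};
    gemm_like(1.0, b, c, 0.0, a);
    return a;
}

// A += prod(B, C) corresponds to alpha = 1, beta = 1 on the existing storage.
void add_prod(Dense& a, const Dense& b, const Dense& c) {
    gemm_like(1.0, b, c, 1.0, a);
}
```

An automatic mapping would have to recognise which of these forms an expression corresponds to, and it would also have to decide where the result storage comes from. That is much easier for the caller to state explicitly.
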
All the best,