From: walter_at_[hidden]
Date: 2001-11-20 10:33:28
--- In boost_at_y..., Peter Schmitteckert (boost) <boost_at_s...> wrote:
> Salut,
>
> I just found libs_ublas.tgz in a previous posting in this list.
>
> In the document section the performance is compared
> to 'C' arrays. From the sources (bench32.cpp) I conclude that
> this means simple for-loops for matrix multiplication.
The various benchmark cases are:

bench11.cpp:
    inner product: inner_prod (v1, v2)
    vector addition: - (v1 + v2)
bench12.cpp:
    outer product: - outer_prod (v1, v2)
    matrix vector product: prod (m, v)
    matrix addition: - (m1 + m2)
bench13.cpp:
    matrix product: prod (m1, m2)

Performance is always compared to fully inlined C loops on C arrays
without any function calls.
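For a sense of what is being timed, here is a minimal sketch of the
two variants for the matrix vector product case (not the actual bench
code; names, sizes and the fill steps are illustrative, and I assume
the boost::numeric::ublas headers):

    #include <boost/numeric/ublas/matrix.hpp>
    #include <boost/numeric/ublas/vector.hpp>

    int main () {
        using namespace boost::numeric::ublas;
        const std::size_t n = 100;

        // ublas variant: one expression, no hand-written loop
        matrix<double> m (n, n);
        vector<double> v (n), r (n);
        // ... fill m and v ...
        r = prod (m, v);

        // C baseline: fully inlined loops on raw arrays
        static double cm[100][100], cv[100], cr[100];
        // ... fill cm and cv ...
        for (std::size_t i = 0; i < n; ++ i) {
            double t = 0.0;
            for (std::size_t j = 0; j < n; ++ j)
                t += cm[i][j] * cv[j];
            cr[i] = t;
        }
        return 0;
    }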
> Are there any blocking techniques in the current ublas library ?
One of the objectives of ublas is to reduce the abstraction penalty
of vector and matrix abstractions. Therefore we use Todd Veldhuizen's
expression template technique to eliminate temporaries and fuse
loops, and the Barton-Nackman trick to avoid virtual function call
overhead.
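To sketch the idea with a toy example (this is not ublas' actual
implementation, and the Barton-Nackman/CRTP base classes are left
out): v1 + v2 builds a lightweight expression object instead of a
temporary vector, and the assignment then runs a single fused loop:

    #include <cstddef>
    #include <vector>

    struct vec;

    // Expression node for a sum: stores references to its operands and
    // evaluates element-wise on demand -- no temporary vector is built.
    template<class L, class R>
    struct vec_sum {
        const L &l;
        const R &r;
        double operator[] (std::size_t i) const { return l[i] + r[i]; }
    };

    struct vec {
        std::vector<double> data;
        explicit vec (std::size_t n): data (n) {}
        double &operator[] (std::size_t i) { return data[i]; }
        double operator[] (std::size_t i) const { return data[i]; }
        // Assignment from any expression: one fused loop.
        template<class E>
        vec &operator= (const E &e) {
            for (std::size_t i = 0; i < data.size (); ++ i)
                data[i] = e[i];
            return *this;
        }
    };

    vec_sum<vec, vec> operator+ (const vec &l, const vec &r) {
        return {l, r};
    }

With this, v3 = v1 + v2 compiles down to one loop computing
v3[i] = v1[i] + v2[i], which is what lets the abstraction approach
the inlined C loops for small expressions.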
For large matrices it is advisable (as you mention) to go the other
way: reintroduce temporaries and split loops. So I imagine that
blocking techniques are a possible extension of ublas.
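For illustration, the core of such a blocking scheme for the matrix
product could look like the following (a sketch only; BS would have
to be tuned to the cache sizes of the target platform, and N is
assumed to be a multiple of BS):

    #include <cstddef>

    const std::size_t N = 512, BS = 64;
    static double a[N][N], b[N][N], c[N][N]; // c starts zero-initialized

    void blocked_matrix_product () {
        // Work on BS x BS tiles so the data touched by the three inner
        // loops stays cache resident.
        for (std::size_t ii = 0; ii < N; ii += BS)
            for (std::size_t jj = 0; jj < N; jj += BS)
                for (std::size_t kk = 0; kk < N; kk += BS)
                    for (std::size_t i = ii; i < ii + BS; ++ i)
                        for (std::size_t j = jj; j < jj + BS; ++ j) {
                            double t = c[i][j];
                            for (std::size_t k = kk; k < kk + BS; ++ k)
                                t += a[i][k] * b[k][j];
                            c[i][j] = t;
                        }
    }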
BTW, netlib BLAS doesn't implement blocking, but LAPACK does AFAIK ;-)
> I'm asking since I would need a performance which is comparable
> to BLAS/ATLAS.
Do you want to get netlib (reference) BLAS performance? Do you want
to get ATLAS performance without any platform-specific optimization?
If we tune for a certain platform, which compiler/operating system
combination do you prefer?
Regards
Joerg