**Subject:** [Boost-bugs] [Boost C++ Libraries] #4033: optimized matrix products

**From:** Boost C++ Libraries (*noreply_at_[hidden]*)

**Date:** 2010-03-22 21:38:28

**Next message:**Boost C++ Libraries: "Re: [Boost-bugs] [Boost C++ Libraries] #4033: optimized matrix products"**Previous message:**Boost C++ Libraries: "Re: [Boost-bugs] [Boost C++ Libraries] #4030: read_graphviz() reads too much"**Next in thread:**Boost C++ Libraries: "Re: [Boost-bugs] [Boost C++ Libraries] #4033: optimized matrix products"**Reply:**Boost C++ Libraries: "Re: [Boost-bugs] [Boost C++ Libraries] #4033: optimized matrix products"**Reply:**Boost C++ Libraries: "Re: [Boost-bugs] [Boost C++ Libraries] #4033: optimized matrix products"

#4033: optimized matrix products

------------------------------------------+---------------------------------

Reporter: guwi17 | Owner: guwi17

Type: Patches | Status: new

Milestone: Boost 1.43.0 | Component: uBLAS

Version: Boost 1.42.0 | Severity: Optimization

Keywords: matrix multiply, performance |

------------------------------------------+---------------------------------

proposal from JÃ¶rn Ungermann:

'''Abstract'''

The (lacking) performance of sparse-matrix products was quite often

noted on this list.

Currently there are several functions which implement matrix products.

Each is efficient under certain conditions only and some of them are not

even documented.

I think it important for users to have one (near-)optimal function only.

The attached file contains an improved axpy_prod that matches the

performance of "prod", "axpy_prod" and "sparse_prod" and is thereby

optimal for all matrix types.

'''Details'''

The optimal choice of kernel for a matrix product depends on the

properties of all involved matrices. The current implementations

specialize only on the type of target matrix. By also determining the

kernel depending on the source matrices, one can choose the most

efficient kernel.

My aim was to create a single function to match the performance of prod,

axpy_prod and sparse_prod for all combinations of

compressed_matrix/matrix and column/row-majority.

The other matrix types should also be handled efficiently by my product,

too, but I did check it only spuriously, as it should be obvious that

those types are more suited for matrix setup, not for actual

calculation.

My axpy_prod implementation (called axpy_prod2 in the attached file)

does not offer support for the undocumented triangular_expression stuff

contained in the original axpy_prod, which however seems to be buggy in

the current code for dense matrices.

The kernels are largely based on existing kernels with one or two new

ones being thrown in. They are as abstract as possible to handle

arbitrary expressions efficiently. Specializing, e.g. directly on

compressed_matrix would give another very significant performance boost,

but also many more additional kernels, possibly more than could be

maintained, especially as the transposedness might have to be handled

explicitly, too.

It would be very nice, if someone could rewrite prod/prec_prod to handle

matrix products in the same way as my axpy_prod2 does, but I did not

look deep enough into the expression-templates to do this myself or to

even know if this were possible.

In fact, I'd propose to have two sets of interfaces:

1) One convenient one, possibly compromising efficiency

[[BR]]

2) One modeled closely after C-BLAS, delivering utmost efficiency.

The latter one could then be *very easily* overloaded by the numeric

bindings for dense matrices. I added a possible generic implementation

for a gemm call that could be trivially overloaded for dense matrices

and forwarded to, e.g., ATLAS numeric bindings.

If one could achieve the same efficiency and automatic (i.e. by

including a header) coupling to numeric bindings using only *one*

interface, I'd prefer that naturally. However currently, we have not

just two, but too many product functions (prod, prec_prod, axpy_prod,

sparse_prod, opb_prod, block_prod).

The following table gives the result for all 64 combinations of

compressed_matrix/matrix and row/column-majorities for the three

involved in this case 2000x2000 matrices.

com_rm is a compressed_matrix of row_major type.

[[BR]]

den_cm is a matrix of column_major type.

[[BR]]

The Â 4th column indicates the used kernel.

[[BR]]

The Â 5th column gives the runtime for axpy_prod2 Â ( clock()/1000 )

[[BR]]

The Â 6th column gives the runtime for sparse_prod ( clock()/1000 )

[[BR]]

The Â 7th column gives the runtime for axpy_prod Â ( clock()/1000 )

[[BR]]

The Â 8th column gives the runtime for prod Â Â Â Â ( clock()/1000 )

[[BR]]

The 10th column gives the speedup of axpy_prod2 compared to sparse_prod.

[[BR]]

The 11th column gives the speedup of axpy_prod2 compared to axpy_prod.

[[BR]]

The 12th column gives the speedup of axpy_prod2 compared to prod.

Larger matrix sizes result in prohibitive runtimes for the "slow"

products, but can be used to analyse pure-sparse products.

The runtime shall be taken only qualitatively.

One can see that the only cases where the new implementation is slower

are of relatively small runtime, so it may be negligible. sparse_prod

uses an optimization that is very efficient if the target matrix has few

off-diagonal elements, but is very inefficient if it does.

The results will therefore vary depending on the test matrices.

It is also obvious to see, why some people complain about product

performance, as especially axpy_prod and prod are sometimes ridiculously

slow for sparse matrices and sparse_prod does not seem to be documented

in the special products section.

My favorite case is "com_cm, com_cm, den_rm" one, which actually

occurred in the diagnostics part of our application and was the reason

why we started looking into this topic.

-- Ticket URL: <https://svn.boost.org/trac/boost/ticket/4033> Boost C++ Libraries <http://www.boost.org/> Boost provides free peer-reviewed portable C++ source libraries.

**Next message:**Boost C++ Libraries: "Re: [Boost-bugs] [Boost C++ Libraries] #4033: optimized matrix products"**Previous message:**Boost C++ Libraries: "Re: [Boost-bugs] [Boost C++ Libraries] #4030: read_graphviz() reads too much"**Next in thread:**Boost C++ Libraries: "Re: [Boost-bugs] [Boost C++ Libraries] #4033: optimized matrix products"**Reply:**Boost C++ Libraries: "Re: [Boost-bugs] [Boost C++ Libraries] #4033: optimized matrix products"**Reply:**Boost C++ Libraries: "Re: [Boost-bugs] [Boost C++ Libraries] #4033: optimized matrix products"

*
This archive was generated by hypermail 2.1.7
: 2017-02-16 18:50:02 UTC
*