Boost logo

Ublas :

From: Michael Stevens (mail_at_[hidden])
Date: 2005-09-23 04:23:27


On Donnerstag 22 September 2005 22:56, Frank Astier wrote:
> Actually, I've now turned to ATLAS, and I want to see if I can get
> better performance there. C or C++, about the same deal for me, I can
> wrap C ATLAS in C++ classes easily, I think.
>
> To reproduce your results, do I need to worry about
> BOOST_UBLAS_CHECK? I think I tried axpy_prod and dismissed it (too
> quickly), as on a par with prod for my specific problem. But I'd like
> to try again with your settings.

You should fix BOOST_UBLAS_CHECK if you use 1.33.0.

I have attached my test code. It uses dense uBLAS vectors which makes more
sense. The results I posted yesterday were for the GENERIC 'axpy_prod'
function in uBLAS. My test code still had some code I used to find the
problem with BOOST_UBLAS_CHECK. This prevented the fast 'axpy_prod'
specialisation for compressed_matrix being used.

Corrected results from gcc-4.01 are:

0 0 0.55
0.1 0.03 0.42
0.2 0.05 0.42
0.3 0.07 0.43
0.4 0.09 0.42
0.5 0.11 0.43
0.6 0.15 0.42
0.7 0.19 0.42
0.8 0.25 0.42
0.9 0.32 0.43
1 0.37 0.42

These results are rather nice!! The uBLAS axpy_prod is always faster then the
naive dense implementation using std::vector, even if there are no zeros. The
compressed format actually allows a very efficient axpy_prod to be
implemented.

It is worth noting the actual implementation of axpy_prod uBLAS uses in this
case. In can work directly using the internal structure of the compressed
matrix thus:

    template<class V, class T1, class IA1, class TA1, class E2>
    BOOST_UBLAS_INLINE
    V &
    axpy_prod (const compressed_matrix<T1, row_major, 0, IA1, TA1> &e1,
               const vector_expression<E2> &e2,
               V &v, row_major_tag) {
        typedef typename V::size_type size_type;
        typedef typename V::value_type value_type;

        for (size_type i = 0; i < e1.filled1 () -1; ++ i) {
            size_type begin = e1.index1_data () [i];
            size_type end = e1.index1_data () [i + 1];
            value_type t (v (i));
            for (size_type j = begin; j < end; ++ j)
                t += e1.value_data () [j] * e2 () (e1.index2_data () [j]);
            v (i) = t;
        }
        return v;
    }

This is what make it much more efficient then prod(). There are similar
implemenations for coordinate matices.

Michael

-- 
___________________________________
Michael Stevens Systems Engineering
34128 Kassel, Germany
Phone/Fax: +49 561 5218038
Navigation Systems, Estimation  and
                 Bayesian Filtering
    http://bayesclasses.sf.net
___________________________________