Subject: Re: [ublas] Optimize: how to speed up prod?
From: oswin krause (oswin.krause_at_[hidden])
Date: 2013-10-04 14:15:35


Hi,

On 04.10.2013 18:52, Reich, Darrell wrote:
>
> Hi---
>
> So far my code below has been unable to beat the Fortran 77 program
> benchmark using a quad core i7.
>
This can happen, especially when the Fortran code uses optimized BLAS
routines or when you don't compile with the proper release-mode settings
(maximum optimizations and #define BOOST_UBLAS_NDEBUG).
>
> While debugging this, I get unexpected results from M.filled1(),
> M.filled2() when compared to M.size1() and M.size2(). I expected that
> filled/size * 100 = % filled but filled2 > size2? v.filled and v.size
> are what I expected.
>
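filled2() is the number of nonzero elements stored in your matrix, so it
lies between 0 and size1()*size2() and can be larger than size2(). The
fill percentage is therefore filled2() / (size1()*size2()) * 100.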
>
> When I switch compressed_vector to vector, the code crashes. I had to
> remove the debug print for v.filled since vector does not implement
> it. I assume here filled = size.
>
There are a few things wrong with your code. Let's start with the most
obvious:

v1[i].vectors.resize(n);

should be

v1[i].vectors[j].resize(n);

This fully explains your problem with noalias: you have to size the
target vector accordingly before assigning to it. The first form only
resizes the outer std::vector, not the uBLAS vectors inside it.

Also use axpy_prod instead of prod. Still, it is likely that sparse
vectors and matrices will give you horrible results. Dense vectors should
be reasonable, though.

Greetings,
Oswin
>
> When I switch to column major form (code commented out below) it takes
> longer. I suspect that is no surprise.
>
> I'm sure it is time to go from basic boost to advanced. Thanks! We're
> using version 1.54.
>
> Extra credit: I'm sure there is an optimized way to do the += and the
> copy too?
>
> I tried noalias(v2) = prod(M,v1) but it crashes with the example below.
>
> I tried writing my own product function using iterators but it crashes
> too.
>
> Boost FOREACH looks interesting but I can't find an example for a
> Matrix and why rewrite prod if we don't have to
>
> I think Plan A get boost prod working faster versus Plan B replace
> prod with rewritten product if the best if you have any suggestions on
> what I could do better. Thanks!
>
> Visual Studio C++ 2012 compiler settings for Win32 build on Windows 7
> 64-bit:
>
> /Yu"StdAfx.h" /MP /GS /GL /analyze- /W3 /Gy- /Zc:wchar_t /Zi /Gm /O2
> /Ob2 /Fd"Release\vc100.pdb" /fp:fast /D "WIN32" /D "NDEBUG" /D
> "_WINDOWS" /D "_USRDLL" /D "TEST_EXPORTS" /D "_WINDLL" /D "_UNICODE"
> /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /Gd /Oy- /Oi /MT
> /Fa"Release\" /EHsc /nologo /Fo"Release\" /Ot /Fp"Release\Test.pch"
>
> #include "stdafx.h" // copied all boost includes into this pre-compiled header file
> #include <boost/numeric/ublas/matrix.hpp>
> #include <boost/numeric/ublas/matrix_sparse.hpp>
> #include <boost/numeric/ublas/matrix_expression.hpp>
> #include <boost/numeric/ublas/matrix_proxy.hpp>
> #include <boost/numeric/ublas/vector.hpp>
> #include <boost/numeric/ublas/vector_sparse.hpp>
> #include <boost/numeric/ublas/vector_expression.hpp>
> #include <boost/numeric/ublas/vector_proxy.hpp>
> #include <boost/numeric/ublas/lu.hpp>
> #include <boost/numeric/ublas/io.hpp>
>
> // make release build faster
> #ifdef NDEBUG
> #define BOOST_UBLAS_NDEBUG
> #endif
>
> struct VECTOR
> {
>     std::vector<boost::numeric::ublas::compressed_vector<float>> vectors;
> };
>
> std::vector<boost::numeric::ublas::compressed_matrix<float>> theMatrix;
> //std::vector<boost::numeric::ublas::compressed_matrix<float, boost::numeric::ublas::column_major>> theMatrix;
> std::vector<VECTOR> v1;
> std::vector<VECTOR> v2;
> // ...
> ilen = 22;
> jlen = 2;
> xlen = 250;
> ylen = 250;
> n = xlen * ylen; // size = 64,000
> theMatrix.reserve(ilen);
> v1.reserve(ilen);
> v2.reserve(ilen);
> // ...
> v2[i].vectors.reserve(jlen);
> v2[i].vectors.reserve(jlen);
> // ...
> theMatrix[i].resize(n,n); // sparse ~7 diagonals filled
> v1[i].vectors.resize(n); // starts sparse, ends filled
> v2[i].vectors.resize(n);
> // ...
> theMatrix[i](j,k) = x; // load matrix once
> // ...
> v2[i].vectors[j] = prod(theMatrix[i], v1[i].vectors[j]);
> // ...
> v2[i].vectors[j](k) += datapoint; // add more to vector each time
> // ...
> // copy values back to save for next time step...
> for (int i = 0; i < ilen; i++)
> {
>     for (int j = 0; j < jlen; j++)
>     {
>         v1[i].vectors[j] = v2[i].vectors[j];
>     }
> }