Boost logo

Ublas :

Subject: Re: [ublas] pointer to matrix structure
From: Marcel Rehberg (Marcel.Rehberg_at_[hidden])
Date: 2011-08-08 08:14:34


On Mon, 08 Aug 2011 11:41:31 +0200, Kraus Philipp
<philipp.kraus_at_[hidden]> wrote:

>
> Am 08.08.2011 um 11:20 schrieb Marcel Rehberg:
>
>> Right, this would copy the data. What I meant was to use
>> ublas::<T,ublas::column_major> for your original matrix, right from the
>> beginning.
>>
>> It does not make a difference from the ublas interface point of view.
>> So it wouldn't require any code changes except in the definition of the
>> matrices*. To make things easier you could use a typedef like
>
> I think there is a difference: I read the matrix data often in the
> row-order, a loop over the rows and a inner loop over the columns, so if
> I use a column-major the loops must / should be changed. If I use a
> column-major matrix and iterate first over the rows and with the inner
> loop over the columns, it can be create a poor performance, so I would
> like to have a row-major matrix.

Ah sorry, I wasn't even aware of the performance penalty* and I don't know
a way to avoid it. My approach would be to choose the ordering based on
what is the most time or memory consuming operation. In my case I use
lapack quite allot and so I went with ublas::column_major just to avoid
the copying. Sorry that I can't help more.

regards

Marcel

* I wrote a little test program:

#include <iostream>
#include "boost/numeric/ublas/matrix.hpp"
#include "boost/numeric/ublas/io.hpp"
#include "boost/progress.hpp"

int main() {
   namespace ublas = boost::numeric::ublas;

   typedef ublas::matrix<double, ublas::column_major> myMatrixColMaj;
   typedef ublas::matrix<double> myMatrixRowMaj;

   unsigned n_rows=5000;
   unsigned n_cols=5000;

   unsigned i,j;

   myMatrixColMaj M1(n_rows,n_cols);
   myMatrixRowMaj M2(n_rows,n_cols);

   // fill both matrices before testing, for more reliable timings.
   for (j=0; j<n_cols;j++) {
     for (i=0; i<n_rows;i++) {
       M1(i,j)=i+j+1;
       M2(i,j)=i+j+1;
     }
   }

   {
     boost::progress_timer t;
     for (i=0; i<n_rows;i++)
       for (j=0; j<n_cols;j++)
        M1(i,j)=i+j+1;
   }

   {
     boost::progress_timer t;
     for (j=0; j<n_cols;j++)
       for (i=0; i<n_rows;i++)
        M1(i,j)=i+j+1;
   }

   {
     boost::progress_timer t;
     for (i=0; i<n_rows;i++)
       for (j=0; j<n_cols;j++)
        M2(i,j)=i+j+1;
   }

   {
     boost::progress_timer t;
     for (j=0; j<n_cols;j++)
       for (i=0; i<n_rows;i++)
        M2(i,j)=i+j+1;
   }
   return 0;
}

Output on my box is:

0.30 s

0.08 s

0.08 s

0.30 s

So traversing the wrong order nearly takes four times as long as
traversing the right order. (in all cases M1==M2).

Regards

Marcel