
From: Joerg Walter (jhr.walter_at_[hidden])
Date: 2003-05-17 06:47:59


Hi Patrick,

you wrote:

> I just want to send you a small table with the time needed for
> multiplications in several cases. The first column represents the storage
> orders used (R = row_major, C = column_major). The last 6 columns are
> classical multiplications with loops; rkc ... ckr are just different
> orders of the loops. I used uBlas there as the container, and I think the
> result is pretty close to using valarray or double*.

Never checked that so completely, so thanks for doing the work.

> Big surprise: take the right configuration and uBlas runs extremely!!!
> fast, but take the wrong one.....
> I will try it for 1500x1500 now as well, but as far as I have seen, it
> will produce more or less the same result.

Probably.

> size of matrices - 500x500
>      ublas  rkc    rck    krc    kcr    crk    ckr
> RRR  2.453  2.218  4.141  2.515  5.344  3.343  4.937
> RRC  1.093  4.36   2.329  3.235  5.375  2.359  4.844
> RCR  3.922  2.391  4.984  2.437  4.5    4.782  2.859
> RCC  1.86   4.235  3.406  3.14   4.437  4.25   2.813
> CRR  1.906  2.875  4.203  4.422  3.282  3.25   4.25
> CRC  1.078  5.109  2.453  5.437  3.125  2.407  4.172
> CCR  3.922  2.937  4.875  4.531  2.563  4.828  2.422
> CCC  2.438  4.89   2.984  5.25   2.516  4.062  2.375
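To make the loop-order labels concrete, here is a minimal sketch of two of them. It assumes the letters name the loops from outermost to innermost (r = row of X, k = summation index, c = column of X); that reading, the Mat type and the mul_rkc/mul_ckr functions are all hypothetical, and the storage is plain row-major std::vector rather than uBlas:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense matrix, row-major: element (i, j) lives at data[i * cols + j].
struct Mat {
    std::size_t rows, cols;
    std::vector<double> data;
    Mat(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Assumed "rkc" order: X += A * B with r outermost and c innermost.
// The inner loop walks a row of X and a row of B with unit stride,
// which suits row-major storage.
void mul_rkc(const Mat& A, const Mat& B, Mat& X) {
    assert(A.cols == B.rows && X.rows == A.rows && X.cols == B.cols);
    for (std::size_t r = 0; r < A.rows; ++r)
        for (std::size_t k = 0; k < A.cols; ++k) {
            const double a = A(r, k);  // constant across the inner loop
            for (std::size_t c = 0; c < B.cols; ++c)
                X(r, c) += a * B(k, c);
        }
}

// Assumed "ckr" order: X += A * B with c outermost and r innermost.
// With row-major storage the inner loop jumps a whole row (stride = cols)
// in both X and A on every step, so the cache helps much less.
void mul_ckr(const Mat& A, const Mat& B, Mat& X) {
    assert(A.cols == B.rows && X.rows == A.rows && X.cols == B.cols);
    for (std::size_t c = 0; c < B.cols; ++c)
        for (std::size_t k = 0; k < A.cols; ++k) {
            const double b = B(k, c);  // constant across the inner loop
            for (std::size_t r = 0; r < A.rows; ++r)
                X(r, c) += A(r, k) * b;
        }
}
```

Both functions accumulate into X, so X must start at zero (the Mat constructor does that). If RRR means all three matrices are row-major, this fits the RRR row of the table above: rkc (2.218) is the fastest loop order and ckr (4.937) one of the slowest, while the CCC row flips the picture (ckr 2.375, rkc 4.89).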

I've added another test for blocked products (the "block" column, using
ublas::block_prod with 64 as the third argument) and got the following results:

size of matrices - 500x500 (times in seconds)

     prod  block rkc   rck   krc   kcr   crk   ckr
RRR  2.44  1.2   2.1   4.32  2.56  6.7   3.44  6.15
RRC  1.06  0.55  4.34  2.07  3.45  6.84  2.11  6.11
RCR  4.18  3.43  2.16  6.11  2.52  4.64  6.2   2.9
RCC  1.73  1.2   4.38  3.35  3.43  4.67  4.52  2.85
CRR  1.73  1.27  3     4.39  4.77  3.33  3.3   4.4
CRC  1.03  0.54  5.67  2.13  6.55  3.4   2.12  4.37
CCR  4.15  3.41  3.07  6.09  4.74  2.53  6.06  2.13
CCC  2.45  1.12  5.61  3.46  6.28  2.55  4.44  2.07
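As a rough illustration of why a blocked product can help, here is a minimal sketch of generic loop tiling for square row-major matrices. It is not the uBlas block_prod implementation, whose internals may differ; the name blocked_prod and the tile size parameter bs are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Generic loop-tiling sketch: X += A * B for n x n row-major matrices,
// processed in bs x bs tiles so each tile of A, B and X can stay in cache
// while it is being reused.
void blocked_prod(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& X, std::size_t n, std::size_t bs = 64) {
    assert(A.size() == n * n && B.size() == n * n && X.size() == n * n && bs > 0);
    for (std::size_t r0 = 0; r0 < n; r0 += bs)
        for (std::size_t k0 = 0; k0 < n; k0 += bs)
            for (std::size_t c0 = 0; c0 < n; c0 += bs) {
                // Clip the last tile in each dimension when bs does not divide n.
                const std::size_t r1 = std::min(r0 + bs, n);
                const std::size_t k1 = std::min(k0 + bs, n);
                const std::size_t c1 = std::min(c0 + bs, n);
                // Inside a tile, use the unit-stride r-k-c order for row-major data.
                for (std::size_t r = r0; r < r1; ++r)
                    for (std::size_t k = k0; k < k1; ++k) {
                        const double a = A[r * n + k];
                        for (std::size_t c = c0; c < c1; ++c)
                            X[r * n + c] += a * B[k * n + c];
                    }
            }
}
```

The default of 64 only mirrors the value passed in the mul3 macro below; it is not a tuned choice, and the best tile size depends on the cache sizes of the machine.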

The RCR and CCR cases, which are the slowest for both prod and block, could
be a reason to add another couple of axpy_prod() overloads.
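For context, an axpy-style product builds X out of BLAS-1 axpy updates (y += alpha * x). The sketch below assumes column-major storage and uses hypothetical names (axpy, axpy_style_prod_colmajor); it illustrates the technique only, not the signature or the code of the uBlas axpy_prod():

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The BLAS-1 "axpy" primitive: y[0..n) += alpha * x[0..n).
void axpy(std::size_t n, double alpha, const double* x, double* y) {
    for (std::size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}

// Hypothetical X += A * B for column-major matrices, where element (i, j)
// of a matrix with `rows` rows lives at data[j * rows + i].
// A is m x kdim, B is kdim x n, X is m x n. Column c of X is accumulated
// as a sum of the columns of A scaled by B(k, c), so every axpy call walks
// contiguous memory in both A and X.
void axpy_style_prod_colmajor(const std::vector<double>& A, const std::vector<double>& B,
                              std::vector<double>& X,
                              std::size_t m, std::size_t kdim, std::size_t n) {
    assert(A.size() == m * kdim && B.size() == kdim * n && X.size() == m * n);
    for (std::size_t c = 0; c < n; ++c)
        for (std::size_t k = 0; k < kdim; ++k)
            axpy(m, B[c * kdim + k], &A[k * m], &X[c * m]);
}
```

With row-major storage the same loop nest would stride through memory instead, so how well an axpy formulation performs depends on the storage orders of the operands.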

Tests ran on my Intel 1.7 GHz P4 box under Linux with GCC 3.2.1. The best
result is around 500 MFlops, which is less than half as fast as ATLAS dgemm,
IIRC.
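As a quick check on the MFlops figure, assuming the table times are in seconds and counting about 2n^3 floating-point operations for an n x n product (n multiply-add pairs for each of the n^2 entries of X), a small hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>

// Approximate throughput of one dense n x n by n x n product:
// 2 * n^3 floating-point operations divided by the run time, in MFlops.
double mflops(std::size_t n, double seconds) {
    assert(seconds > 0.0);
    const double nd = static_cast<double>(n);
    return 2.0 * nd * nd * nd / (seconds * 1e6);
}
```

For n = 500, mflops(500, 0.54) gives roughly 463 MFlops for the fastest entry in the table (CRC, block), and mflops(500, 6.84) roughly 36.5 MFlops for the slowest (RRC, kcr).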

> YES, I FEEL SORRY TO POST CODE SNIPPETS WITH THESE MACROS IN
> THE C++ COMMUNITY, ... anyway.

I've seen incredible things done with the preprocessor here, so probably no
need to worry.

<snip some code>

> #define mul2(xtype) \
> { \
> initmatrix(A); \
> initmatrix(B); \
> t.restart(); \
> X = ublas::prod< xtype >(A,B); \
> time = t.elapsed(); \
> cout << setw(8) << time; \
> }

Changed/added:
----------
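// mul2: times X += prod(A, B); xtype is not used in this version.
// The cerr line checks the result against a freshly computed prod(A, B).
#define mul2(xtype) \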
{ \
  zeromatrix(X); \
  initmatrix(A); \
  initmatrix(B); \
  t.restart(); \
  X.plus_assign(prod(A,B)); \
  time = t.elapsed(); \
  cout << setw(8) << time; \
  cerr << equals(X,prod(A,B)) << endl; \
}

// mul3: times X += block_prod<xtype>(A, B, 64); 64 is presumably the block size.
#define mul3(xtype) \
{ \
  zeromatrix(X); \
  initmatrix(A); \
  initmatrix(B); \
  t.restart(); \
  X.plus_assign(ublas::block_prod<xtype>(A,B,64)); \
  time = t.elapsed(); \
  cout << setw(8) << time; \
  cerr << equals(X,prod(A,B)) << endl; \
}
----------
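If the macros bother anyone, a function template taking two callables can do the same job. The sketch below is hypothetical: the name time_best_of is made up, it uses C++11 std::chrono instead of the timer object t from the snipped code, and it reports the best of several runs, whereas the macros above time a single run:

```cpp
#include <cassert>
#include <chrono>

// Run `setup` untimed, then time `work`; repeat and return the fastest
// run in seconds. Taking the minimum reduces noise from other processes.
template <class Setup, class Work>
double time_best_of(int repeats, Setup setup, Work work) {
    assert(repeats > 0);
    double best = 0.0;
    for (int i = 0; i < repeats; ++i) {
        setup();  // e.g. zero X and fill A and B
        const auto start = std::chrono::steady_clock::now();
        work();   // e.g. X.plus_assign(prod(A, B))
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        if (i == 0 || elapsed.count() < best)
            best = elapsed.count();
    }
    return best;
}
```

Inside a function template parameterized on xtype, the body of mul3 could then become a call such as time_best_of(1, [&] { zeromatrix(X); initmatrix(A); initmatrix(B); }, [&] { X.plus_assign(ublas::block_prod<xtype>(A, B, 64)); }), with zeromatrix and initmatrix taken from the snipped code.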

[snip more code]

Thanks again,

Joerg


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk