Subject: Re: [ublas] uBLAS parallelization
From: Gunter Winkler (guwi17_at_[hidden])
Date: 2009-04-03 16:57:08
On Friday 03 April 2009, Riccardo Rossi wrote:
> Hola gunther,
> what do you mean with dividing in independent working threads?
I ran my tests mainly on my Athlon X2 and on a (4-year-old) 4-way Xeon
system. I tested axpy_prod with a compressed_matrix and dense vectors.
With a single matrix the speedup was noticeable but far from linear.
Multiplying two or four matrices in parallel, each in its own thread,
worked much better. However, I did not investigate this in depth because
my program spent a lot more time in pre- and post-processing than in
solving linear systems. (These operations could be parallelized easily.)
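
A minimal sketch of the "one independent product per thread" setup described above. It is not the code used in the tests: to stay self-contained it uses plain standard C++ (std::thread) and a hand-written CSR struct as a hypothetical stand-in for compressed_matrix. The names CsrMatrix, axpy_prod_csr and parallel_products are made up for illustration; axpy_prod_csr is only meant to mirror the accumulating form y += A * x.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical minimal CSR (compressed sparse row) matrix, standing in
// for a uBLAS compressed_matrix<double>.
struct CsrMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::size_t> row_ptr;  // rows + 1 entries
    std::vector<std::size_t> col_idx;  // column index of each stored value
    std::vector<double> values;        // stored (nonzero) values
};

// Computes y += A * x for a sparse A and dense x, y.
void axpy_prod_csr(const CsrMatrix& A,
                   const std::vector<double>& x,
                   std::vector<double>& y) {
    assert(x.size() == A.cols && y.size() == A.rows);
    for (std::size_t i = 0; i < A.rows; ++i) {
        double sum = 0.0;
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            sum += A.values[k] * x[A.col_idx[k]];
        y[i] += sum;
    }
}

// Runs one independent product per thread: ys[j] += As[j] * xs[j].
// Each thread writes only to its own ys[j], so no locking is needed.
void parallel_products(const std::vector<CsrMatrix>& As,
                       const std::vector<std::vector<double> >& xs,
                       std::vector<std::vector<double> >& ys) {
    assert(As.size() == xs.size() && As.size() == ys.size());
    std::vector<std::thread> workers;
    workers.reserve(As.size());
    for (std::size_t j = 0; j < As.size(); ++j)
        workers.emplace_back([&As, &xs, &ys, j] {
            axpy_prod_csr(As[j], xs[j], ys[j]);
        });
    for (std::size_t j = 0; j < workers.size(); ++j)
        workers[j].join();
}
```

Because the products share no output data, the threads never synchronize until the final join; they still compete for the same memory bus, so the achievable speedup depends on the machine.
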
> i have been testing with mpi over multicores and i also do not get
> any speedup of matrix-vector products due to the limits of the memory
> bus! (speedup is almost linear on multi-cpu systems which do not
> share the memory ... multicores suck!!!!)
I guess you are using dense matrix-vector products? I agree that these
products are limited by memory bandwidth.