Subject: Re: [ublas] uBLAS parallelization
From: Riccardo Rossi (rrossi_at_[hidden])
Date: 2009-04-03 03:16:17
Hello Gunter,
what do you mean by dividing the work into independent worker threads?
I have been testing with MPI on multicore machines, and I also do not get any
speedup of matrix-vector products, because of the limits of the memory bus!
(The speedup is almost linear on multi-CPU systems that do not share
memory... multicores suck!)
greetings
Riccardo
On Fri, 2009-04-03 at 00:52 +0200, Gunter Winkler wrote:
> On Thursday 02 April 2009, Jörn Ungermann wrote:
> > I assume that this is possible at least for some sparse matrix
> > implementations, since certain specialized packages offer it (e.g.
> > PETSc). So:
> > 1) Is there a ready-to-use solution for parallelizing uBLAS sparse
> > matrix operations?
> > 2) If not, is there an ongoing development effort that I could tap
> > into or get involved in?
>
> I made some experiments using OpenMP (OMP) inside axpy_prod. However, the
> distribution of work did not work well, because the OMP overhead
> took more time than expected.
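For illustration, fine-grained parallelism of the kind described here (OpenMP inside the product itself) might look like the sketch below. It is not uBLAS's axpy_prod: CsrMatrix and csr_axpy_prod are hypothetical names for a stand-alone compressed-row matrix, used so the example compiles without Boost. Every call forks and joins a thread team for only about two flops per nonzero, which is one plausible reason the overhead can outweigh the gain on small or medium matrices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR (compressed row) matrix. uBLAS's compressed_matrix
// stores similar arrays internally, but this is NOT its interface.
struct CsrMatrix {
    std::size_t rows = 0;
    std::vector<std::size_t> row_ptr;  // rows + 1 entries
    std::vector<std::size_t> col_idx;  // one entry per nonzero
    std::vector<double> values;        // one entry per nonzero
};

// y += A * x, parallelized over rows. Each iteration writes only y[i], so no
// locking is needed. Built without -fopenmp, the pragma is ignored and the
// loop runs serially with the same result.
void csr_axpy_prod(const CsrMatrix& A, const std::vector<double>& x,
                   std::vector<double>& y) {
    assert(y.size() == A.rows);
    const long n = static_cast<long>(A.rows);
#pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            sum += A.values[k] * x[A.col_idx[k]];
        y[i] += sum;
    }
}
```

The thread team is created (or woken) and synchronized on every call, so this only pays off when each call has enough rows and nonzeros to amortize that cost.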
>
> However, I got very good results when I split the work at a higher level: I
> distributed the finite elements to different worker threads and
> assembled independent matrices. As a result, my big matrix was
> replaced by a set of smaller matrices (with fewer nonzeros) whose sum gives
> the original matrix. Calling axpy_prod on them in parallel then gave a
> nearly linear speedup.
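A minimal sketch of that decomposition, assuming the global matrix A has already been assembled as parts A_0, A_1, ... with A = sum_k A_k, so that A x = sum_k A_k x. It is written without Boost so it stands alone; CsrMatrix, csr_axpy_prod and parallel_sum_prod are hypothetical names, and std::thread stands in for whatever threading was actually used (the post does not say). In uBLAS each part would presumably be a compressed_matrix passed to axpy_prod.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical minimal CSR matrix (not the uBLAS compressed_matrix API).
struct CsrMatrix {
    std::size_t rows = 0;
    std::vector<std::size_t> row_ptr;  // rows + 1 entries
    std::vector<std::size_t> col_idx;  // one entry per nonzero
    std::vector<double> values;        // one entry per nonzero
};

// Serial y += A * x for a single part.
void csr_axpy_prod(const CsrMatrix& A, const std::vector<double>& x,
                   std::vector<double>& y) {
    assert(y.size() == A.rows);
    for (std::size_t i = 0; i < A.rows; ++i)
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            y[i] += A.values[k] * x[A.col_idx[k]];
}

// y = (A_0 + A_1 + ...) * x. One thread per part writes into its own
// private vector, so no synchronization is needed until the final reduction.
std::vector<double> parallel_sum_prod(const std::vector<CsrMatrix>& parts,
                                      const std::vector<double>& x,
                                      std::size_t rows) {
    std::vector<std::vector<double>> partial(parts.size(),
                                             std::vector<double>(rows, 0.0));
    std::vector<std::thread> workers;
    for (std::size_t p = 0; p < parts.size(); ++p)
        workers.emplace_back(csr_axpy_prod, std::cref(parts[p]), std::cref(x),
                             std::ref(partial[p]));
    for (auto& t : workers) t.join();

    // Reduction: y = sum_k (A_k x).
    std::vector<double> y(rows, 0.0);
    for (const auto& yk : partial)
        for (std::size_t i = 0; i < rows; ++i) y[i] += yk[i];
    return y;
}
```

Unlike the per-call OpenMP version, each thread here gets one large, independent chunk of work, so the fork/join cost is paid once per product rather than being spread over many tiny tasks. The price is one extra vector per thread and a final summation.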
>
> I think adapting the overall algorithm is always a better way to
> parallelize programs than simply replacing the backend. (Unfortunately,
> this is usually also the painful way ...)
>
> Best regards
> Gunter
