Subject: Re: [ublas] uBLAS parallelization
From: Riccardo Rossi (rrossi_at_[hidden])
Date: 2009-04-03 03:16:17
What do you mean by dividing the work into independent worker threads?
I have been testing with MPI on multicore machines, and I also do not get
any speedup of matrix-vector products because of the limits of the memory
bus! (The speedup is almost linear on multi-CPU systems that do not share
the memory ... multicores suck!)
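A rough estimate of why a sparse matrix-vector product runs into the memory bus first, assuming CSR storage with 8-byte `double` values and 4-byte column indices (other storage formats change the constants, not the conclusion). Each stored nonzero costs one multiply and one add, but its value and its index have to be streamed from memory; reads of x and writes of y only add to the byte count, so the ratio below is an upper bound:

```latex
\frac{\text{flops}}{\text{bytes}} \;\le\; \frac{2}{8 + 4} \;=\; \frac{1}{6},
\qquad
T_{\text{matvec}} \;\gtrsim\; \frac{12\,\mathrm{nnz}}{B}
```

Here B is the memory bandwidth available to the computation. If all cores sit on one memory bus, B does not grow with the number of cores, and neither does the bound on T; on multi-CPU systems where each processor has its own memory, the aggregate B grows roughly with the number of processors, which fits the near-linear speedup observed there.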
On Fri, 2009-04-03 at 00:52 +0200, Gunter Winkler wrote:
> On Thursday 02 April 2009, Jörn Ungermann wrote:
> > I assume that it is possible to do so, at least for some sparse matrix
> > implementations, as certain specialized packages (e.g. PETSc) offer
> > it. So:
> > 1) Is there a ready-to-use solution for parallelizing uBLAS sparse
> > matrix operations?
> > 2) If not, is there an ongoing development effort that I could tap
> > into or get involved in?
> I experimented with OpenMP (OMP) inside axpy_prod. However, the work
> distribution did not work well, because the OpenMP overhead took more
> time than expected.
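For illustration, a minimal sketch of what "OMP inside the product" amounts to, written against a hypothetical plain CSR struct (`CsrMatrix` and `csr_axpy_prod_omp` are made-up names, not uBLAS's `axpy_prod` or its `compressed_matrix` internals): every call opens a parallel region and splits the rows among threads. The work per call is only about two flops per nonzero, so for moderately sized matrices the per-region synchronization and scheduling cost can eat much of the gain, which would be consistent with the overhead described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR matrix (not a uBLAS type).
struct CsrMatrix {
    std::size_t rows = 0, cols = 0;
    std::vector<std::size_t> row_ptr;  // rows + 1 offsets into col_idx/values
    std::vector<std::size_t> col_idx;  // column index of each stored nonzero
    std::vector<double> values;        // value of each stored nonzero
};

// y += A * x with the row loop split across OpenMP threads.
// Each row is written by exactly one thread, so no locking is needed.
// Without OpenMP enabled, the pragma is ignored and the loop runs serially.
void csr_axpy_prod_omp(const CsrMatrix& A, const std::vector<double>& x,
                       std::vector<double>& y) {
    assert(x.size() == A.cols && y.size() == A.rows);
    // Signed loop counter, since older OpenMP versions require one.
    const long n = static_cast<long>(A.rows);
#pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        const std::size_t r = static_cast<std::size_t>(i);
        double sum = 0.0;
        for (std::size_t k = A.row_ptr[r]; k < A.row_ptr[r + 1]; ++k)
            sum += A.values[k] * x[A.col_idx[k]];
        y[r] += sum;
    }
}
```

Without `-fopenmp` (or your compiler's equivalent) the pragma is ignored and the same result is computed serially, which makes it easy to compare timings of both builds.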
> However, I got very good results when I split the work at a higher
> level: I distributed the finite elements to different worker threads and
> assembled independent matrices. As a result, my one big matrix was
> replaced by a set of smaller matrices (with fewer nonzeros) whose sum
> gives the original matrix. Calling axpy_prod on them in parallel then
> gave a nearly linear speedup.
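A sketch of the product stage of this higher-level scheme, assuming the per-worker assembly has already produced partial matrices A_0, ..., A_{P-1} with A = A_0 + ... + A_{P-1}. It again uses a hypothetical CSR struct rather than uBLAS types, and `split_axpy_prod` is a made-up name. Each thread multiplies one whole partial matrix into a private result vector, and the private vectors are summed at the end, so there is a single parallel region per product and each thread gets a large chunk of work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR matrix (not a uBLAS type).
struct CsrMatrix {
    std::size_t rows = 0, cols = 0;
    std::vector<std::size_t> row_ptr;  // rows + 1 offsets
    std::vector<std::size_t> col_idx;  // column index per nonzero
    std::vector<double> values;        // value per nonzero
};

// Serial out = A * x for one partial matrix (overwrites out).
static void csr_prod(const CsrMatrix& A, const std::vector<double>& x,
                     std::vector<double>& out) {
    out.assign(A.rows, 0.0);
    for (std::size_t i = 0; i < A.rows; ++i)
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            out[i] += A.values[k] * x[A.col_idx[k]];
}

// y += (A_0 + ... + A_{P-1}) * x, one partial matrix per loop iteration.
// Each iteration writes only its own private vector, so no locking is needed.
void split_axpy_prod(const std::vector<CsrMatrix>& parts,
                     const std::vector<double>& x, std::vector<double>& y) {
    std::vector<std::vector<double>> partial(parts.size());
    const long P = static_cast<long>(parts.size());
#pragma omp parallel for schedule(static)
    for (long p = 0; p < P; ++p) {
        const std::size_t q = static_cast<std::size_t>(p);
        assert(parts[q].rows == y.size() && parts[q].cols == x.size());
        csr_prod(parts[q], x, partial[q]);
    }
    // Serial reduction of the private results into y.
    for (const auto& yp : partial)
        for (std::size_t i = 0; i < y.size(); ++i)
            y[i] += yp[i];
}
```

The price is one extra result vector per partial matrix plus a serial O(P·n) reduction, which is usually small next to the products when there is one partial matrix per thread. Since all partial matrices are still streamed through memory, a shared memory bus may still cap the speedup on multicore machines.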
> I think adapting the overall algorithm is always a better way to
> parallelize programs than simply replacing the backend. (Unfortunately,
> this is usually also the painful way ...)