Boost logo

Ublas :

Subject: Re: [ublas] Status of development /Benchmarks
From: Nasos Iliopoulos (nasos_i_at_[hidden])
Date: 2013-12-09 09:08:53

what do you have in mind in 2.2?

On 12/07/2013 04:10 PM, oswin krause wrote:
> Hi,
> I like your ideas. But i am not sure what you mean with 1. and 2.
> The problem I see is, that uBLAS right now doesn't offer the desired
> quality and speed for single core, let alone multi core. Therefore I
> will focus on the internal infrastructure instead of motivating new
> algorithms.
> short description:
> 1. Rock solid implementation of the BLAS algorithms( "compute kernels")
> 2. Rewriting of the expression mechanic.
> 2.1 automatic use of the compute kernels for the operations
> 2.2 automatic optimisation of expressions
> long description
> 1. Rock solid implementation of the BLAS algorithms: matrix x vector
> and matrix x matrix in all useful combinations (sparse/sparse
> dense/dense dense/sparse ... triangular/dense...etc there are a lot),
> but also: matrix assignment. uBLAS will not be compared in #Features
> but in runtime compared to Armadillo/Eigen/Atlas/GotoBlas. Having a
> fast implementation of this is also crucial for multi-core and
> distributed computations. Also all higher level decompositions
> (Cholesky, LU, SVD, RankDecomp,...) and triangular solvers rely on
> fast BLAS.
> 2. Rewriting of the expression mechanic. The way uBLAS is implemented
> right now, writing
> noalias(A)+=prod(B,C)+D
> or
> vec = prod(prod<matrix>(B,C),vec2);
> is in most cases a bad decision. The second expression wouldn't even
> compile without the template parameter.
> What I would like to have is the following:
> 2.1 automatic use of the fast algorithms. That is the above expression
> should be translated to
> axpy_prod(B,C,A,false);
> noalias(A) +=D;
> right now the matrix_assign algorithms are used which do a terrible
> job on prod(B,C).
> 2.2 automatic optimization of expressions. The second expression above
> is really inefficient. First the matrix-matrix prod is evaluated and
> the result is afterwards used for a single matrix-vector prod.
> now compare to this:
> prod(B,prod<vector>(C,d));
> only 2 matrix-vector products are performed which might save 99% of
> the FLOPS of the first expression. So it would be better if this would
> be transformed automatically. Normally the user could do that himself
> but sometimes it is not possible. Consider a Conjugate Gradient Solver:
> x=CG(A,b);
> a usual pattern of A is A=XX^T+c*Identity. CG uses internally only
> matrix-vector products - and possibly not many of them. Therefore you
> don't want to compute A because it is a) slow b) might require more
> memory than is available. Without expression optimizations it would be
> impossible to write a good CG that could do that.
> Greetings,
> Oswin
> On 07.12.2013 11:38, David Bellot wrote:
>> Hi,
>> it has been a long discussion we all had for many months now. Should
>> we rewrite ublas from scratch or simply improve it.
>> Joaquim and Oswin wants a brand new ublas
>> Nasos was more in favor of improving it.
>> I personally find very exciting the idea of writing something new,
>> but ublas is very well known now. On the other hand, Eigen and
>> Armadillo took the crown of the main C++ blas library in users' hearts.
>> On my todo list for ublas, there are things that will require ublas
>> to be deeply changed. At this stage, we can almost talk about a new
>> library.
>> Christmas is very close now, so maybe it's a good time to start
>> talking about the features we wish for ublas and see if they can be
>> implemented with the current version or if a new uBLAS 2.0 is necessary.
>> After all, Boost::signal did the same a few years ago. We can
>> definitively do the transition.
>> I begin:
>> - unified representation of vectors and matrices to represent the
>> fact that a vector IS a matrix. Matlab does the same
>> - automated use of different algorithm to let the compiler "chooses"
>> the best implementation (if possible) and switch on SSE, distributed
>> or whateve we need
>> - implementation of solvers and decompositions algorithms
>> and this is what Nasos and I think should be integrated too:
>> 1. Matrix multiplication
>> 2. Algorithms infrastructure (so that we can have real useful features)
>> 3. Matrix/vector views for interoperability <- I think this is ultra
>> critical because now ublas is monolithic in the sense that you have
>> to use it everywhere you manipulate data. This would really help into
>> letting people for example have a list of vectors (they are plotting)
>> and ublas working on top of that to do for example transformations
>> 4. NEW DOCUMENTATION - examples and the rest
>> 5. Incorporate some critical bindings (i.e. mumps bindings which is
>> currently probably the most efficient smp and distributed open source
>> linalg solver)
>> 6. matlab binding?
>> 7. distributed ublas
>> Please add and ESPECIALLY, please tell me your view on the current
>> infrastructure of uBLAS. It seems many people are not happy with the
>> current "expression template" grammar.
>> I'm open to everything
>> Best,
>> David
>> On Thu, Dec 5, 2013 at 11:14 AM, Joaquim Duran <jduran.gm_at_[hidden]
>> <mailto:jduran.gm_at_[hidden]>> wrote:
>> I think that al stuff pending of merge listed by David, should be
>> merged and migrate to uBlas 2.0 and while uBlas 2.0 is in
>> development/maintenance then design from scratch uBlas 3.0.
>> _______________________________________________
>> ublas mailing list
>> ublas_at_[hidden]
>> Sent to:Oswin.Krause_at_[hidden]
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden]
> Sent to: athanasios.iliopoulos.ctr.gr_at_[hidden]