Boost logo

Ublas :

Subject: Re: [ublas] Status of development /Benchmarks
From: oswin krause (oswin.krause_at_[hidden])
Date: 2013-12-09 12:27:32


Essentially it is optimizing the expression-tree prior to evaluation to
allow for more general algorithms and expressions.

Let's take this example of a function compution A^k*x

vector exponential(matrix_expression<M> const& A,vector_expression<V>
const& x,unsigned int k){
     vector z = x;
     for(int i = 0; i != k; ++i){
         z = prod(matrix,z);
     return z;

And now let's assume we call it like

matrix X(...);//something big
vector x(...);
vector y = exponential(prod(X,trans(X)),c,3); //(XX^T)^3

With current ublas the internal function will produce an error because
there is a nested prod. Even if we fixed prod to return a temporary
this would have a real bad runtime behavior as we carry out the complete
matrix-matrix product of XX^T

How can we fix that?

(i am simplifying as i don't remember the original types)
in the simplest case, prod for vectors would create an optimized tree.
so if it detects that one of its arguments has the form
matrix_prod<E1,E2>, it would normally create

vector_prod< matrix_prod<E1,E2> ,V>

but in this case it would instead create

vector_prod< E1,vector_prod<E2,V> >

the real fun starts when we look at more complex operators like

vector y = exponential(prod(X,trans(X))+D,c,3);

so that the product is hidden by the matrix-matrix addition.

I don't know how you would solve it using boost::proto, but in general
you could use a traits class like

//default case
template<class M, class V>
struct vector_prod_constructor{
      typedef vector_prod<M,V> type;
      type create(M const& m, V const& v){
         return type(m,v);

//specialisation for expressions
//prod(A+B,x)=> prod(A,x)+prod(B,x)
template<class A, class B, class X>
struct vector_prod_constructor<vector_addition<A,B>,X >{
     typedef typename vector_prod_constructor<A,X>::type Atype;
     typedef typename vector_prod_constructor<B,X>::type Btype;
      typedef vector_addition<Atype,Btype> type;
      type create(vector_addition<A,B> const& e, X const& x){
         return type(Atype(e.expression1(),x),Btype(e.expression2(),x));


On 09.12.2013 15:08, Nasos Iliopoulos wrote:
> Oswin,
> what do you have in mind in 2.2?
> On 12/07/2013 04:10 PM, oswin krause wrote:
>> 2.2 automatic optimisation of expressions
>> long description
>> 1. Rock solid implementation of the BLAS algorithms: matrix x vector
>> and matrix x matrix in all useful combinations (sparse/sparse
>> dense/dense dense/sparse ... triangular/dense...etc there are a lot),
>> but also: matrix assignment. uBLAS will not be compared in #Features
>> but in runtime compared to Armadillo/Eigen/Atlas/GotoBlas. Having a
>> fast implementation of this is also crucial for multi-core and
>> distributed computations. Also all higher level decompositions
>> (Cholesky, LU, SVD, RankDecomp,...) and triangular solvers rely on
>> fast BLAS.
>> 2. Rewriting of the expression mechanic. The way uBLAS is implemented
>> right now, writing
>> noalias(A)+=prod(B,C)+D
>> or
>> vec = prod(prod<matrix>(B,C),vec2);
>> is in most cases a bad decision. The second expression wouldn't even
>> compile without the template parameter.
>> What I would like to have is the following:
>> 2.1 automatic use of the fast algorithms. That is the above
>> expression should be translated to
>> axpy_prod(B,C,A,false);
>> noalias(A) +=D;
>> right now the matrix_assign algorithms are used which do a terrible
>> job on prod(B,C).
>> 2.2 automatic optimization of expressions. The second expression
>> above is really inefficient. First the matrix-matrix prod is
>> evaluated and the result is afterwards used for a single
>> matrix-vector prod.
>> now compare to this:
>> prod(B,prod<vector>(C,d));
>> only 2 matrix-vector products are performed which might save 99% of
>> the FLOPS of the first expression. So it would be better if this
>> would be transformed automatically. Normally the user could do that
>> himself but sometimes it is not possible. Consider a Conjugate
>> Gradient Solver:
>> x=CG(A,b);
>> a usual pattern of A is A=XX^T+c*Identity. CG uses internally only
>> matrix-vector products - and possibly not many of them. Therefore you
>> don't want to compute A because it is a) slow b) might require more
>> memory than is available. Without expression optimizations it would
>> be impossible to write a good CG that could do that.
>> Greetings,
>> Oswin
>> On 07.12.2013 11:38, David Bellot wrote:
>>> Hi,
>>> it has been a long discussion we all had for many months now. Should
>>> we rewrite ublas from scratch or simply improve it.
>>> Joaquim and Oswin wants a brand new ublas
>>> Nasos was more in favor of improving it.
>>> I personally find very exciting the idea of writing something new,
>>> but ublas is very well known now. On the other hand, Eigen and
>>> Armadillo took the crown of the main C++ blas library in users' hearts.
>>> On my todo list for ublas, there are things that will require ublas
>>> to be deeply changed. At this stage, we can almost talk about a new
>>> library.
>>> Christmas is very close now, so maybe it's a good time to start
>>> talking about the features we wish for ublas and see if they can be
>>> implemented with the current version or if a new uBLAS 2.0 is necessary.
>>> After all, Boost::signal did the same a few years ago. We can
>>> definitively do the transition.
>>> I begin:
>>> - unified representation of vectors and matrices to represent the
>>> fact that a vector IS a matrix. Matlab does the same
>>> - automated use of different algorithm to let the compiler "chooses"
>>> the best implementation (if possible) and switch on SSE, distributed
>>> or whateve we need
>>> - implementation of solvers and decompositions algorithms
>>> and this is what Nasos and I think should be integrated too:
>>> 1. Matrix multiplication
>>> 2. Algorithms infrastructure (so that we can have real useful features)
>>> 3. Matrix/vector views for interoperability <- I think this is ultra
>>> critical because now ublas is monolithic in the sense that you have
>>> to use it everywhere you manipulate data. This would really help
>>> into letting people for example have a list of vectors (they are
>>> plotting) and ublas working on top of that to do for example
>>> transformations
>>> 4. NEW DOCUMENTATION - examples and the rest
>>> 5. Incorporate some critical bindings (i.e. mumps bindings which is
>>> currently probably the most efficient smp and distributed open
>>> source linalg solver)
>>> 6. matlab binding?
>>> 7. distributed ublas
>>> Please add and ESPECIALLY, please tell me your view on the current
>>> infrastructure of uBLAS. It seems many people are not happy with the
>>> current "expression template" grammar.
>>> I'm open to everything
>>> Best,
>>> David
>>> On Thu, Dec 5, 2013 at 11:14 AM, Joaquim Duran <jduran.gm_at_[hidden]
>>> <mailto:jduran.gm_at_[hidden]>> wrote:
>>> I think that al stuff pending of merge listed by David, should
>>> be merged and migrate to uBlas 2.0 and while uBlas 2.0 is in
>>> development/maintenance then design from scratch uBlas 3.0.
>>> _______________________________________________
>>> ublas mailing list
>>> ublas_at_[hidden]
>>> Sent to:Oswin.Krause_at_[hidden]
>> _______________________________________________
>> ublas mailing list
>> ublas_at_[hidden]
>> Sent to:athanasios.iliopoulos.ctr.gr_at_[hidden]
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden]
> Sent to: Oswin.Krause_at_[hidden]