Hi,

Essentially it is optimizing the expression-tree prior to evaluation to allow for more general algorithms and expressions.

Let's take this example of a function compution A^k*x

vector exponential(matrix_expression<M> const& A,vector_expression<V> const& x,unsigned int k){

vector z = x;

for(int i = 0; i != k; ++i){

z = prod(matrix,z);

}

return z;

}

And now let's assume we call it like

matrix X(...);//something big

vector x(...);

vector y = exponential(prod(X,trans(X)),c,3); //(XX^T)^3

With current ublas the internal function will produce an error because there is a nested prod. Even if we fixed prod to return a temporary this would have a real bad runtime behavior as we carry out the complete matrix-matrix product of XX^T

How can we fix that?

(i am simplifying as i don't remember the original types)

in the simplest case, prod for vectors would create an optimized tree. so if it detects that one of its arguments has the form matrix_prod<E1,E2>, it would normally create

vector_prod< matrix_prod<E1,E2> ,V>

but in this case it would instead create

vector_prod< E1,vector_prod<E2,V> >

the real fun starts when we look at more complex operators like

vector y = exponential(prod(X,trans(X))+D,c,3);

so that the product is hidden by the matrix-matrix addition.

I don't know how you would solve it using boost::proto, but in general you could use a traits class like

//default case

template<class M, class V>

struct vector_prod_constructor{

typedef vector_prod<M,V> type;

type create(M const& m, V const& v){

return type(m,v);

}

};

//specialisation for expressions

//prod(A+B,x)=> prod(A,x)+prod(B,x)

template<class A, class B, class X>

struct vector_prod_constructor<vector_addition<A,B>,X >{

typedef typename vector_prod_constructor<A,X>::type Atype;

typedef typename vector_prod_constructor<B,X>::type Btype;

typedef vector_addition<Atype,Btype> type;

type create(vector_addition<A,B> const& e, X const& x){

return type(Atype(e.expression1(),x),Btype(e.expression2(),x));

}

};

//etc...

Greetings,

Oswin

On 09.12.2013 15:08, Nasos Iliopoulos wrote:

Essentially it is optimizing the expression-tree prior to evaluation to allow for more general algorithms and expressions.

Let's take this example of a function compution A^k*x

vector exponential(matrix_expression<M> const& A,vector_expression<V> const& x,unsigned int k){

vector z = x;

for(int i = 0; i != k; ++i){

z = prod(matrix,z);

}

return z;

}

And now let's assume we call it like

matrix X(...);//something big

vector x(...);

vector y = exponential(prod(X,trans(X)),c,3); //(XX^T)^3

With current ublas the internal function will produce an error because there is a nested prod. Even if we fixed prod to return a temporary this would have a real bad runtime behavior as we carry out the complete matrix-matrix product of XX^T

How can we fix that?

(i am simplifying as i don't remember the original types)

in the simplest case, prod for vectors would create an optimized tree. so if it detects that one of its arguments has the form matrix_prod<E1,E2>, it would normally create

vector_prod< matrix_prod<E1,E2> ,V>

but in this case it would instead create

vector_prod< E1,vector_prod<E2,V> >

the real fun starts when we look at more complex operators like

vector y = exponential(prod(X,trans(X))+D,c,3);

so that the product is hidden by the matrix-matrix addition.

I don't know how you would solve it using boost::proto, but in general you could use a traits class like

//default case

template<class M, class V>

struct vector_prod_constructor{

typedef vector_prod<M,V> type;

type create(M const& m, V const& v){

return type(m,v);

}

};

//specialisation for expressions

//prod(A+B,x)=> prod(A,x)+prod(B,x)

template<class A, class B, class X>

struct vector_prod_constructor<vector_addition<A,B>,X >{

typedef typename vector_prod_constructor<A,X>::type Atype;

typedef typename vector_prod_constructor<B,X>::type Btype;

typedef vector_addition<Atype,Btype> type;

type create(vector_addition<A,B> const& e, X const& x){

return type(Atype(e.expression1(),x),Btype(e.expression2(),x));

}

};

//etc...

Greetings,

Oswin

On 09.12.2013 15:08, Nasos Iliopoulos wrote:

Oswin,

what do you have in mind in 2.2?

On 12/07/2013 04:10 PM, oswin krause wrote:

2.2 automatic optimisation of expressions

long description

1. Rock solid implementation of the BLAS algorithms: matrix x vector and matrix x matrix in all useful combinations (sparse/sparse dense/dense dense/sparse ... triangular/dense...etc there are a lot), but also: matrix assignment. uBLAS will not be compared in #Features but in runtime compared to Armadillo/Eigen/Atlas/GotoBlas. Having a fast implementation of this is also crucial for multi-core and distributed computations. Also all higher level decompositions (Cholesky, LU, SVD, RankDecomp,...) and triangular solvers rely on fast BLAS.

2. Rewriting of the expression mechanic. The way uBLAS is implemented right now, writing

noalias(A)+=prod(B,C)+D

or

vec = prod(prod<matrix>(B,C),vec2);

is in most cases a bad decision. The second expression wouldn't even compile without the template parameter.

What I would like to have is the following:

2.1 automatic use of the fast algorithms. That is the above expression should be translated to

axpy_prod(B,C,A,false);

noalias(A) +=D;

right now the matrix_assign algorithms are used which do a terrible job on prod(B,C).

2.2 automatic optimization of expressions. The second expression above is really inefficient. First the matrix-matrix prod is evaluated and the result is afterwards used for a single matrix-vector prod.

now compare to this:

prod(B,prod<vector>(C,d));

only 2 matrix-vector products are performed which might save 99% of the FLOPS of the first expression. So it would be better if this would be transformed automatically. Normally the user could do that himself but sometimes it is not possible. Consider a Conjugate Gradient Solver:

x=CG(A,b);

a usual pattern of A is A=XX^T+c*Identity. CG uses internally only matrix-vector products - and possibly not many of them. Therefore you don't want to compute A because it is a) slow b) might require more memory than is available. Without expression optimizations it would be impossible to write a good CG that could do that.

Greetings,

Oswin

On 07.12.2013 11:38, David Bellot wrote:

Hi,

it has been a long discussion we all had for many months now. Should we rewrite ublas from scratch or simply improve it.

Joaquim and Oswin wants a brand new ublas

Nasos was more in favor of improving it.

I personally find very exciting the idea of writing something new, but ublas is very well known now. On the other hand, Eigen and Armadillo took the crown of the main C++ blas library in users' hearts.

On my todo list for ublas, there are things that will require ublas to be deeply changed. At this stage, we can almost talk about a new library.

Christmas is very close now, so maybe it's a good time to start talking about the features we wish for ublas and see if they can be implemented with the current version or if a new uBLAS 2.0 is necessary.

After all, Boost::signal did the same a few years ago. We can definitively do the transition.

I begin:

- unified representation of vectors and matrices to represent the fact that a vector IS a matrix. Matlab does the same

- automated use of different algorithm to let the compiler "chooses" the best implementation (if possible) and switch on SSE, distributed or whateve we need

- implementation of solvers and decompositions algorithms

and this is what Nasos and I think should be integrated too:

1. Matrix multiplication

2. Algorithms infrastructure (so that we can have real useful features)

3. Matrix/vector views for interoperability <- I think this is ultra critical because now ublas is monolithic in the sense that you have to use it everywhere you manipulate data. This would really help into letting people for example have a list of vectors (they are plotting) and ublas working on top of that to do for example transformations

4. NEW DOCUMENTATION - examples and the rest

5. Incorporate some critical bindings (i.e. mumps bindings which is currently probably the most efficient smp and distributed open source linalg solver)

6. matlab binding?7. distributed ublas

Please add and ESPECIALLY, please tell me your view on the current infrastructure of uBLAS. It seems many people are not happy with the current "expression template" grammar.

I'm open to everything

Best,

David

On Thu, Dec 5, 2013 at 11:14 AM, Joaquim Duran <jduran.gm@gmail.com> wrote:

I think that al stuff pending of merge listed by David, should be merged and migrate to uBlas 2.0 and while uBlas 2.0 is in development/maintenance then design from scratch uBlas 3.0.

_______________________________________________ ublas mailing list ublas@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/ublas Sent to: Oswin.Krause@ruhr-uni-bochum.de

_______________________________________________ ublas mailing list ublas@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/ublas Sent to: athanasios.iliopoulos.ctr.gr@nrl.navy.mil

_______________________________________________ ublas mailing list ublas@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/ublas Sent to: Oswin.Krause@ruhr-uni-bochum.de