
Ublas : 
Subject: Re: [ublas] Status of development /Benchmarks
From: Nasos Iliopoulos (nasos_i_at_[hidden])
Date: 20131209 09:15:38
On 12/08/2013 04:18 AM, oswin krause wrote:
> Hi,
>
> I agree with you and I think a rewrite wouldn't change the
> "highlevel" syntax. I like the way expressions are formed right now.
> There is no real way to get rid of noalias as there is no real cheap
> solution for aliastests. And honestly i like prod() over op* because
> prod() scrams in your face "this is going to take a while".
As a matter of fact we can easily overcome the assignment noalias
issue by using rvalue references. The only slight problem there is
that in such cases compiler do not optimize the way they do without
rvalue references atm, but I think this will be worked out at some point.
For anything else (i.e +=) as you correctly point out we will need a
more expensive alias test mechanism.
>>
>> for this outer parallelization to be useful, one needs guarantees
>> that openmp is not used in the inner loop... otherwise problems may
>> arise with nested parallelism.
> I don't see the issue here. OpenMP has in most configurations a fixed
> size thread pool and would never spawn more threads than allowed.
> Further it is allowed for an OpenMP thread to spawn more threads.
>
> The real issue is, OpenMP does not and will never support C++. There
> is OpenMP support in gcc but in fact there is no guarantee that this
> works as the C++11 memory model is not compatible with the OpenMP
> model. AFAIK there are no plans for a C++11 compatible OpenMP
> standard. Therefore we can't expect that there ever is OpenMP support
> for clang or other compilers and I really think that uBLAS would need
> to use proper threading models, for example using asio thread_pool
>
>
>
> On 07.12.2013 23:35, Riccardo Rossi wrote:
>> Dear Oswin,
>>
>> my first point is really trivial. i do not doubt that the current
>> expression template mechanism is not optimal.
>>
>> the only thing i propose is to mantain a small backward compatibility
>> layer so that code that works with the old ublas is not unnecessarily
>> broken.
>>
>> for example if you assume that the novel syntax is
>>
>> C += A*B
>>
>> i believe that it could be useful to define some wraper between the
>> old and new syntax. the easiest way (dirty) would be to define two macros
>>
>> "noalias" and " prod"
>>
>> so that
>> noalias(C) += prod(A,B)
>>
>> expands to the new format... indeed using macros is not a good idea
>> but it gives the point
>>
>>
>>
>>
>>
>>
>> my second point is mostly about matrices of size say 30*30, whose
>> size is NOT known at compile time.
>> for such matrix size openmp is not useful, however one may perform in
>> parallel a lot of operations on different small matrices,
>> parallelizing some sort of outer iteration.
>> for this outer parallelization to be useful, one needs guarantees
>> that openmp is not used in the inner loop... otherwise problems may
>> arise with nested parallelism.
>> ...i only implied this...
>>
>>
>> ciao
>> Riccardo
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Sat, Dec 7, 2013 at 10:10 PM, oswin krause
>> <oswin.krause_at_[hidden]
>> <mailto:oswin.krause_at_[hidden]>> wrote:
>>
>> Hi,
>>
>> I like your ideas. But i am not sure what you mean with 1. and 2.
>> The problem I see is, that uBLAS right now doesn't offer the
>> desired quality and speed for single core, let alone multi core.
>> Therefore I will focus on the internal infrastructure instead of
>> motivating new algorithms.
>>
>> short description:
>> 1. Rock solid implementation of the BLAS algorithms( "compute
>> kernels")
>> 2. Rewriting of the expression mechanic.
>> 2.1 automatic use of the compute kernels for the operations
>> 2.2 automatic optimisation of expressions
>>
>>
>> long description
>> 1. Rock solid implementation of the BLAS algorithms: matrix x
>> vector and matrix x matrix in all useful combinations
>> (sparse/sparse dense/dense dense/sparse ...
>> triangular/dense...etc there are a lot), but also: matrix
>> assignment. uBLAS will not be compared in #Features but in
>> runtime compared to Armadillo/Eigen/Atlas/GotoBlas. Having a fast
>> implementation of this is also crucial for multicore and
>> distributed computations. Also all higher level decompositions
>> (Cholesky, LU, SVD, RankDecomp,...) and triangular solvers rely
>> on fast BLAS.
>>
>> 2. Rewriting of the expression mechanic. The way uBLAS is
>> implemented right now, writing
>>
>> noalias(A)+=prod(B,C)+D
>> or
>> vec = prod(prod<matrix>(B,C),vec2);
>>
>> is in most cases a bad decision. The second expression wouldn't
>> even compile without the template parameter.
>> What I would like to have is the following:
>>
>> 2.1 automatic use of the fast algorithms. That is the above
>> expression should be translated to
>> axpy_prod(B,C,A,false);
>> noalias(A) +=D;
>>
>> right now the matrix_assign algorithms are used which do a
>> terrible job on prod(B,C).
>>
>> 2.2 automatic optimization of expressions. The second expression
>> above is really inefficient. First the matrixmatrix prod is
>> evaluated and the result is afterwards used for a single
>> matrixvector prod.
>>
>> now compare to this:
>> prod(B,prod<vector>(C,d));
>>
>> only 2 matrixvector products are performed which might save 99%
>> of the FLOPS of the first expression. So it would be better if
>> this would be transformed automatically. Normally the user could
>> do that himself but sometimes it is not possible. Consider a
>> Conjugate Gradient Solver:
>>
>> x=CG(A,b);
>>
>> a usual pattern of A is A=XX^T+c*Identity. CG uses internally
>> only matrixvector products  and possibly not many of them.
>> Therefore you don't want to compute A because it is a) slow b)
>> might require more memory than is available. Without expression
>> optimizations it would be impossible to write a good CG that
>> could do that.
>>
>> Greetings,
>> Oswin
>>
>>
>>
>>
>> On 07.12.2013 11:38, David Bellot wrote:
>>> Hi,
>>>
>>> it has been a long discussion we all had for many months now.
>>> Should we rewrite ublas from scratch or simply improve it.
>>> Joaquim and Oswin wants a brand new ublas
>>> Nasos was more in favor of improving it.
>>>
>>> I personally find very exciting the idea of writing something
>>> new, but ublas is very well known now. On the other hand, Eigen
>>> and Armadillo took the crown of the main C++ blas library in
>>> users' hearts.
>>>
>>> On my todo list for ublas, there are things that will require
>>> ublas to be deeply changed. At this stage, we can almost talk
>>> about a new library.
>>>
>>> Christmas is very close now, so maybe it's a good time to start
>>> talking about the features we wish for ublas and see if they can
>>> be implemented with the current version or if a new uBLAS 2.0 is
>>> necessary.
>>> After all, Boost::signal did the same a few years ago. We can
>>> definitively do the transition.
>>>
>>>
>>> I begin:
>>>
>>>  unified representation of vectors and matrices to represent
>>> the fact that a vector IS a matrix. Matlab does the same
>>>  automated use of different algorithm to let the compiler
>>> "chooses" the best implementation (if possible) and switch on
>>> SSE, distributed or whateve we need
>>>  implementation of solvers and decompositions algorithms
>>>
>>> and this is what Nasos and I think should be integrated too:
>>> 1. Matrix multiplication
>>> 2. Algorithms infrastructure (so that we can have real useful
>>> features)
>>> 3. Matrix/vector views for interoperability < I think this is
>>> ultra critical because now ublas is monolithic in the sense that
>>> you have to use it everywhere you manipulate data. This would
>>> really help into letting people for example have a list of
>>> vectors (they are plotting) and ublas working on top of that to
>>> do for example transformations
>>> 4. NEW DOCUMENTATION  examples and the rest
>>> 5. Incorporate some critical bindings (i.e. mumps bindings which
>>> is currently probably the most efficient smp and distributed
>>> open source linalg solver)
>>> 6. matlab binding?
>>> 7. distributed ublas
>>>
>>>
>>> Please add and ESPECIALLY, please tell me your view on the
>>> current infrastructure of uBLAS. It seems many people are not
>>> happy with the current "expression template" grammar.
>>>
>>> I'm open to everything
>>>
>>> Best,
>>> David
>>>
>>>
>>> On Thu, Dec 5, 2013 at 11:14 AM, Joaquim Duran
>>> <jduran.gm_at_[hidden] <mailto:jduran.gm_at_[hidden]>> wrote:
>>>
>>> I think that al stuff pending of merge listed by David,
>>> should be merged and migrate to uBlas 2.0 and while uBlas
>>> 2.0 is in development/maintenance then design from scratch
>>> uBlas 3.0.
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> ublas mailing list
>>> ublas_at_[hidden] <mailto:ublas_at_[hidden]>
>>> http://lists.boost.org/mailman/listinfo.cgi/ublas
>>> Sent to:Oswin.Krause_at_[hidden] <mailto:Oswin.Krause_at_[hidden]>
>>
>>
>> _______________________________________________
>> ublas mailing list
>> ublas_at_[hidden] <mailto:ublas_at_[hidden]>
>> http://lists.boost.org/mailman/listinfo.cgi/ublas
>> Sent to: rrossi_at_[hidden] <mailto:rrossi_at_[hidden]>
>>
>>
>>
>>
>> 
>>
>> Dr. Riccardo Rossi, Civil Engineer
>>
>> Member of Kratos Team
>>
>> International Center for Numerical Methods in Engineering  CIMNE
>> Campus Norte, Edificio C1
>>
>> c/ Gran Capitán s/n
>>
>> 08034 Barcelona, España
>>
>> Tel: (+34) 93 401 56 96
>>
>> Fax: (+34) 93.401.6517
>>
>> web:www.cimne.com <http://www.cimne.com/>
>>
>>
>> _______________________________________________
>> ublas mailing list
>> ublas_at_[hidden]
>> http://lists.boost.org/mailman/listinfo.cgi/ublas
>> Sent to:Oswin.Krause_at_[hidden]
>
>
>
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/ublas
> Sent to: athanasios.iliopoulos.ctr.gr_at_[hidden]