Subject: Re: [ublas] Status of development /Benchmarks
From: Nasos Iliopoulos (nasos_i_at_[hidden])
Date: 2013-12-09 09:15:38


On 12/08/2013 04:18 AM, oswin krause wrote:
> Hi,
>
> I agree with you and I think a rewrite wouldn't change the
> "high-level" syntax. I like the way expressions are formed right now.
> There is no real way to get rid of noalias as there is no real cheap
> solution for alias tests. And honestly I like prod() over op*, because
> prod() screams in your face: "this is going to take a while".
As a matter of fact, we can easily overcome the no-alias issue for plain
assignment by using r-value references. The only slight problem is that
compilers currently do not optimize such code as well as they do the version
without r-value references, but I think this will be worked out at some point.
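
Roughly the idea, sketched with a toy matrix type rather than the real uBLAS
expression machinery: prod() evaluates into a fresh matrix and plain
assignment just steals that buffer via a move, so C = prod(C, B) is safe
without noalias() and without an extra deep copy.

#include <cstddef>
#include <vector>

struct matrix {
    std::size_t rows = 0, cols = 0;
    std::vector<double> data;   // row-major storage
    matrix() = default;
    matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// evaluates into a temporary that cannot alias either argument
matrix prod(const matrix& a, const matrix& b)
{
    matrix r(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = 0; k < a.cols; ++k)
            for (std::size_t j = 0; j < b.cols; ++j)
                r(i, j) += a(i, k) * b(k, j);
    return r;   // moved (or elided), never deep-copied
}

int main()
{
    matrix A(3, 3), B(3, 3);
    A = prod(A, B);   // aliasing is harmless: A is only touched by the final move
}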

For anything else (e.g. +=), as you correctly point out, we will need a more
expensive alias-test mechanism.
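
To illustrate what such a test amounts to in the simplest case (a sketch only,
assuming contiguous storage; a real check would have to walk the whole
expression tree and handle strided proxies):

#include <cstddef>

// true if the two element ranges may overlap in memory
inline bool may_alias(const double* a, std::size_t an,
                      const double* b, std::size_t bn)
{
    // note: ordering pointers into unrelated arrays is not even guaranteed
    // to be meaningful by the standard, one more reason a clean and cheap
    // alias test is hard to provide in general
    return a < b + bn && b < a + an;
}
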
>>
>> for this outer parallelization to be useful, one needs guarantees
>> that openmp is not used in the inner loop... otherwise problems may
>> arise with nested parallelism.
> I don't see the issue here. OpenMP has, in most configurations, a fixed-size
> thread pool and would never spawn more threads than allowed. Furthermore, an
> OpenMP thread is allowed to spawn more threads itself.
>
> The real issue is that OpenMP does not and will never support C++. There
> is OpenMP support in gcc, but in fact there is no guarantee that it works,
> as the C++11 memory model is not compatible with the OpenMP model. AFAIK
> there are no plans for a C++11-compatible OpenMP standard. Therefore we
> can't expect that there will ever be OpenMP support in clang or other
> compilers, and I really think that uBLAS would need to use proper threading
> models, for example an asio-based thread pool.
>
>
>
> On 07.12.2013 23:35, Riccardo Rossi wrote:
>> Dear Oswin,
>>
>> My first point is really trivial. I do not doubt that the current
>> expression template mechanism is suboptimal.
>>
>> The only thing I propose is to maintain a small backward-compatibility
>> layer so that code that works with the old ublas is not unnecessarily
>> broken.
>>
>> For example, if you assume that the new syntax is
>>
>> C += A*B
>>
>> I believe that it could be useful to define some wrapper between the
>> old and new syntax. The easiest (dirty) way would be to define two macros,
>>
>> "noalias" and " prod"
>>
>> so that
>> noalias(C) += prod(A,B)
>>
>> expands to the new format... Indeed, using macros is not a good idea,
>> but it makes the point.
>>
>> My second point is mostly about matrices of size, say, 30x30, whose
>> size is NOT known at compile time.
>> For such matrix sizes OpenMP is not useful; however, one may perform in
>> parallel a lot of operations on different small matrices,
>> parallelizing some sort of outer iteration.
>> For this outer parallelization to be useful, one needs guarantees
>> that OpenMP is not used in the inner loop... otherwise problems may
>> arise with nested parallelism.
>> ...I only implied this...
>>
>>
>> ciao
>> Riccardo
>>
>> On Sat, Dec 7, 2013 at 10:10 PM, oswin krause
>> <oswin.krause_at_[hidden]> wrote:
>>
>> Hi,
>>
>> I like your ideas, but I am not sure what you mean by 1. and 2.
>> The problem I see is that uBLAS right now doesn't offer the
>> desired quality and speed for single core, let alone multi-core.
>> Therefore I will focus on the internal infrastructure instead of
>> motivating new algorithms.
>>
>> Short description:
>> 1. Rock-solid implementation of the BLAS algorithms ("compute
>> kernels")
>> 2. Rewriting of the expression mechanics.
>> 2.1 Automatic use of the compute kernels for the operations
>> 2.2 Automatic optimisation of expressions
>>
>>
>> Long description:
>> 1. Rock-solid implementation of the BLAS algorithms: matrix x
>> vector and matrix x matrix in all useful combinations
>> (sparse/sparse, dense/dense, dense/sparse,
>> triangular/dense, etc.; there are a lot), but also matrix
>> assignment. uBLAS will not be compared on number of features but on
>> runtime against Armadillo/Eigen/Atlas/GotoBlas. Having a fast
>> implementation of this is also crucial for multi-core and
>> distributed computations. All higher-level decompositions
>> (Cholesky, LU, SVD, RankDecomp, ...) and the triangular solvers rely
>> on fast BLAS as well.
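>>
>> To be concrete, a kernel for the simplest dense case could look roughly
>> like the following sketch: a cache-blocked gemm computing C += A*B on
>> row-major buffers (the block size is illustrative; a real kernel would
>> add packing and vectorization on top):
>>
>> #include <algorithm>
>> #include <cstddef>
>>
>> void gemm_kernel(const double* A, const double* B, double* C,
>>                  std::size_t n, std::size_t bs = 64)
>> {
>>     for (std::size_t ii = 0; ii < n; ii += bs)
>>       for (std::size_t kk = 0; kk < n; kk += bs)
>>         for (std::size_t jj = 0; jj < n; jj += bs)
>>           // work on one bs x bs block at a time so it stays in cache
>>           for (std::size_t i = ii; i < std::min(ii + bs, n); ++i)
>>             for (std::size_t k = kk; k < std::min(kk + bs, n); ++k) {
>>               const double a = A[i * n + k];
>>               for (std::size_t j = jj; j < std::min(jj + bs, n); ++j)
>>                 C[i * n + j] += a * B[k * n + j];
>>             }
>> }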
>>
>> 2. Rewriting of the expression mechanics. The way uBLAS is
>> implemented right now, writing
>>
>> noalias(A)+=prod(B,C)+D
>> or
>> vec = prod(prod<matrix>(B,C),vec2);
>>
>> is in most cases a bad decision. The second expression wouldn't
>> even compile without the template parameter.
>> What I would like to have is the following:
>>
>> 2.1 Automatic use of the fast algorithms. That is, the first
>> expression above should be translated to
>> axpy_prod(B,C,A,false);
>> noalias(A) += D;
>>
>> Right now the matrix_assign algorithms are used, which do a
>> terrible job on prod(B,C).
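>>
>> A rough illustration of the kind of dispatch I mean (the names are
>> invented, not the real uBLAS internals; axpy_prod is today's kernel):
>>
>> // hypothetical node produced by prod(B, C)
>> template <class E1, class E2>
>> struct prod_expression { const E1& lhs; const E2& rhs; };
>>
>> // generic fallback: the current element-wise matrix_assign-style loop
>> template <class M, class E>
>> void plus_assign(M& target, const E& e) { /* element-wise loop */ }
>>
>> // overload chosen when the right hand side is exactly a product node:
>> // hand the whole node to the fast kernel, accumulating into the target
>> template <class M, class E1, class E2>
>> void plus_assign(M& target, const prod_expression<E1, E2>& e)
>> {
>>     axpy_prod(e.lhs, e.rhs, target, false);
>> }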
>>
>> 2.2 Automatic optimization of expressions. The second expression
>> above is really inefficient: first the matrix-matrix prod is
>> evaluated, and the result is afterwards used for a single
>> matrix-vector prod.
>>
>> Now compare to this:
>> prod(B,prod<vector>(C,d));
>>
>> Only two matrix-vector products are performed, which might save 99%
>> of the FLOPs of the first expression (for 1000x1000 matrices that is
>> roughly 2*10^9 FLOPs versus 4*10^6). So it would be better if this
>> were transformed automatically. Normally the user could do that
>> himself, but sometimes it is not possible. Consider a Conjugate
>> Gradient solver:
>>
>> x=CG(A,b);
>>
>> A usual pattern for A is A = X*X^T + c*Identity. CG internally uses
>> only matrix-vector products, and possibly not many of them.
>> Therefore you don't want to compute A explicitly, because it a) is
>> slow and b) might require more memory than is available. Without
>> expression optimizations it would be impossible to write a good CG
>> that could do that.
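>>
>> One way to get that effect today, just as an illustration, is to let the
>> solver take the operator as a callable, so that A*v is applied as
>> X*(X^T*v) + c*v and A itself is never formed (X and c as above; the
>> cg() interface is hypothetical):
>>
>> #include <boost/numeric/ublas/matrix.hpp>
>> #include <boost/numeric/ublas/vector.hpp>
>> using namespace boost::numeric::ublas;
>>
>> // two matrix-vector products, O(n^2) work, and no n x n temporary
>> auto apply_A = [&](const vector<double>& v) -> vector<double> {
>>     vector<double> t = prod(trans(X), v);   // t = X^T v
>>     return prod(X, t) + c * v;              // X t + c v
>> };
>>
>> // x = cg(apply_A, b);   // a CG written against such a callable never
>> //                       // needs the matrix A at all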
>>
>> Greetings,
>> Oswin
>>
>>
>>
>>
>> On 07.12.2013 11:38, David Bellot wrote:
>>> Hi,
>>>
>>> It has been a long discussion we have all had for many months now:
>>> should we rewrite ublas from scratch or simply improve it?
>>> Joaquim and Oswin want a brand new ublas;
>>> Nasos was more in favor of improving it.
>>>
>>> I personally find the idea of writing something new very exciting,
>>> but ublas is very well known now. On the other hand, Eigen
>>> and Armadillo have taken the crown of the main C++ BLAS libraries in
>>> users' hearts.
>>>
>>> On my todo list for ublas, there are things that will require
>>> ublas to be deeply changed. At this stage, we can almost talk
>>> about a new library.
>>>
>>> Christmas is very close now, so maybe it's a good time to start
>>> talking about the features we wish for ublas and see if they can
>>> be implemented with the current version or if a new uBLAS 2.0 is
>>> necessary.
>>> After all, Boost::signal did the same a few years ago. We can
>>> definitely do the transition.
>>>
>>>
>>> I begin:
>>>
>>> - unified representation of vectors and matrices, to reflect
>>> the fact that a vector IS a matrix. Matlab does the same.
>>> - automated use of different algorithms, to let the compiler
>>> "choose" the best implementation (if possible) and switch to
>>> SSE, distributed computation, or whatever we need
>>> - implementation of solvers and decomposition algorithms
>>>
>>> and this is what Nasos and I think should be integrated too:
>>> 1. Matrix multiplication
>>> 2. Algorithms infrastructure (so that we can have real useful
>>> features)
>>> 3. Matrix/vector views for interoperability <- I think this is
>>> ultra critical, because right now ublas is monolithic in the sense
>>> that you have to use it everywhere you manipulate data. Views would
>>> really help by letting people, for example, keep a list of vectors
>>> (that they are plotting) and have ublas work on top of that data to
>>> do transformations (see the sketch after this list)
>>> 4. NEW DOCUMENTATION - examples and the rest
>>> 5. Incorporate some critical bindings (e.g. MUMPS bindings, which
>>> is currently probably the most efficient SMP and distributed
>>> open-source linear-algebra solver)
>>> 6. Matlab binding?
>>> 7. distributed ublas
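>>>
>>> On point 3, a minimal illustration (the interface is purely hypothetical)
>>> of what such a non-owning view could be: it adapts storage the user
>>> already has, so algorithms can run on it without copying:
>>>
>>> #include <cstddef>
>>> #include <vector>
>>>
>>> // wraps an existing buffer as a size1 x size2 row-major matrix; owns nothing
>>> struct matrix_view {
>>>     double* data;
>>>     std::size_t size1, size2;
>>>     double& operator()(std::size_t i, std::size_t j) const {
>>>         return data[i * size2 + j];
>>>     }
>>> };
>>>
>>> // the user's own storage, e.g. the points they are plotting
>>> std::vector<double> samples(3 * 100);
>>> matrix_view m{samples.data(), 3, 100};   // matrix-style access, no copy, no ownership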
>>>
>>>
>>> Please add to this list and, ESPECIALLY, please tell me your view on
>>> the current infrastructure of uBLAS. It seems many people are not
>>> happy with the current "expression template" grammar.
>>>
>>> I'm open to everything
>>>
>>> Best,
>>> David
>>>
>>>
>>> On Thu, Dec 5, 2013 at 11:14 AM, Joaquim Duran
>>> <jduran.gm_at_[hidden]> wrote:
>>>
>>> I think that all the stuff pending merge listed by David should be
>>> merged and migrated to uBlas 2.0, and while uBlas 2.0 is in
>>> development/maintenance, uBlas 3.0 can be designed from scratch.
>>>
>>
>>
>> --
>>
>> Dr. Riccardo Rossi, Civil Engineer
>>
>> Member of Kratos Team
>>
>> International Center for Numerical Methods in Engineering - CIMNE
>> Campus Norte, Edificio C1
>>
>> c/ Gran Capitán s/n
>>
>> 08034 Barcelona, España
>>
>> Tel: (+34) 93 401 56 96
>>
>> Fax: (+34) 93.401.6517
>>
>> web: www.cimne.com