Subject: Re: [ublas] CI setup
From: Stefan Seefeld (stefan_at_[hidden])
Date: 2018-05-03 13:40:43

Next message: Cem Bassoy: "[ublas] Design of the tensor template class"
Previous message: Cem Bassoy: "Re: [ublas] CI setup"
In reply to: Cem Bassoy: "Re: [ublas] CI setup"
Next in thread: David Bellot: "Re: [ublas] CI setup"

On 03.05.2018 09:16, Cem Bassoy via ublas wrote:
>
>
> On Wed, May 2, 2018 at 10:11 PM, David Bellot via ublas
> <ublas_at_[hidden]> wrote:
>
> - Should we integrate smart expression templates? I think
> there was a gsoc project but I am not sure. What was the output?
>
>
> it was really good
>
>
> okay. I cannot see it in the development branch. Is it intended to be
> integrated into uBLAS?
>
>
> - Are (smart) expression templates really required?
>
>
> but after second thought, I wonder like you.
>
>
> I think smart expression templates could be beneficial in terms of
> selecting and executing high-performance kernels!
> Expression templates seem to be outdated. See
> https://epubs.siam.org/doi/abs/10.1137/110830125

One of the GSoC projects is adding GPU support to uBLAS. And while it
may be useful to have an API layer that lets users explicitly request
the GPU backend to be used (for the supported subset of BLAS functions
for which there are GPU kernels), we also may want to offer full
integration, whereby a user just uses generic uBLAS expressions, and
leaves the selection of the appropriate backend (GPU or other) to the
library itself.
But to be able to implement this selection mechanism, we need some
advanced dispatching technique. I'm not sure I understand enough about
"smart expression templates", but I did implement such dispatching
infrastructure in the past (http://openvsip.org/), which scales well to
a high number of backends (GPU, SIMD vectorization, TBB-style
parallelization, etc.).
I hope we can manage to get to the point where we can discuss what
techniques are most appropriate for Boost.uBLAS, and perhaps even start
to implement them over this summer. We shall see...
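
The selection mechanism described above can be sketched as a compile-time walk over an ordered list of backends. This is only an illustration under stated assumptions: the names `evaluator`, `dispatcher`, `backend_list`, `ct_valid`, `rt_valid` and the backend tags are hypothetical and are not part of Boost.uBLAS, and the "SIMD" kernel is a plain unrolled loop standing in for a real vectorized one. It assumes C++17 (`if constexpr`).

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical backend and operation tags (not part of Boost.uBLAS).
struct gpu_backend {};
struct simd_backend {};
struct generic_backend {};
struct axpy_op {};  // y += a * x

// By default a backend does not implement an operation.
template <typename Op, typename Backend>
struct evaluator { static constexpr bool ct_valid = false; };

// Generic fallback: always usable.
template <>
struct evaluator<axpy_op, generic_backend> {
    static constexpr bool ct_valid = true;
    static bool rt_valid(std::size_t) { return true; }
    static const char* name() { return "generic"; }
    static void exec(double a, const std::vector<double>& x, std::vector<double>& y) {
        for (std::size_t i = 0; i < x.size(); ++i) y[i] += a * x[i];
    }
};

// Stand-in for a vectorized kernel: only accepts sizes divisible by 4.
template <>
struct evaluator<axpy_op, simd_backend> {
    static constexpr bool ct_valid = true;
    static bool rt_valid(std::size_t n) { return n % 4 == 0; }
    static const char* name() { return "simd"; }
    static void exec(double a, const std::vector<double>& x, std::vector<double>& y) {
        for (std::size_t i = 0; i < x.size(); i += 4) {
            y[i]     += a * x[i];
            y[i + 1] += a * x[i + 1];
            y[i + 2] += a * x[i + 2];
            y[i + 3] += a * x[i + 3];
        }
    }
};

template <typename... Backends> struct backend_list {};

template <typename Op, typename List> struct dispatcher;

// End of the list: nothing could handle the call.
template <typename Op>
struct dispatcher<Op, backend_list<>> {
    static const char* run(double, const std::vector<double>&, std::vector<double>&) {
        throw std::runtime_error("no backend available");
    }
};

// Try the first backend; fall through to the rest if it is not applicable.
template <typename Op, typename Head, typename... Tail>
struct dispatcher<Op, backend_list<Head, Tail...>> {
    static const char* run(double a, const std::vector<double>& x, std::vector<double>& y) {
        if constexpr (evaluator<Op, Head>::ct_valid) {      // compile-time check
            if (evaluator<Op, Head>::rt_valid(x.size())) {  // run-time check
                evaluator<Op, Head>::exec(a, x, y);
                return evaluator<Op, Head>::name();
            }
        }
        return dispatcher<Op, backend_list<Tail...>>::run(a, x, y);
    }
};

// Priority order; gpu_backend has no axpy evaluator here, so it is skipped
// at compile time without any run-time cost.
using default_backends = backend_list<gpu_backend, simd_backend, generic_backend>;

// User-facing call: returns the name of the backend that ran.
// Requires y.size() >= x.size().
inline const char* axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    return dispatcher<axpy_op, default_backends>::run(a, x, y);
}
```

Adding a backend in this scheme means adding one `evaluator` specialization and one entry in the list; the user-facing `axpy` call does not change.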

>
>
>
> - How often do expressions like A = B*C + D*D - ... occur in
> numerical applications?
> - Should we provide a fast gemm implementation of the
> Goto-Algorithm like in Eigen?
>
>
> why not.
>
>
> Because tuning algorithms to reach near-peak performance on standard
> processors is, to my mind, a nontrivial task. I am not sure, but we
> could try to integrate and link to existing highly optimized
> kernels.

As with the above, the problem really is to pick the right backend(s)
for the current platform, as the optimal choice depends on many things
(available hardware, data layout, problem size, etc.). Coming up with a
good optimization strategy (and an architecture that supports it) is
non-trivial.
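
To make the dependence on hardware, data layout and problem size concrete, here is a minimal sketch of a run-time selection heuristic for a matrix product. Everything in it is an assumption for illustration: the types `platform` and `gemm_problem`, the function `choose_gemm_backend`, and in particular the two thresholds, which are arbitrary placeholders and not measured values.

```cpp
#include <cstddef>

enum class backend { naive, blocked_cpu, gpu };

// Hypothetical description of the machine we are running on.
struct platform {
    bool has_gpu;
};

// Hypothetical description of one C = A * B call (A is m x k, B is k x n).
struct gemm_problem {
    std::size_t m, n, k;
    bool contiguous;  // operands are densely stored, no strided views
};

// Placeholder thresholds; real values would have to come from benchmarks.
constexpr double blocked_min_flops = 1e5;
constexpr double gpu_min_flops     = 1e9;

inline backend choose_gemm_backend(const platform& p, const gemm_problem& g) {
    // A matrix product costs roughly 2*m*n*k floating-point operations.
    const double flops = 2.0 * static_cast<double>(g.m)
                             * static_cast<double>(g.n)
                             * static_cast<double>(g.k);

    // Offloading only pays off for large, densely stored problems.
    if (p.has_gpu && g.contiguous && flops >= gpu_min_flops) return backend::gpu;
    // Cache blocking has overhead that small problems do not amortize.
    if (flops >= blocked_min_flops) return backend::blocked_cpu;
    return backend::naive;
}
```

Even this toy version shows why the strategy is hard to get right: each new criterion (transfer cost, alignment, thread count) multiplies the number of cases, and the thresholds differ per machine.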

>
>
> And regarding the code infrastructure:
>
> - Do we need iterators within matrix and vector template
> classes? Or can we generalize the concepts?
>
>
> once there's been a discussion about that. Can we factorize all
> this code into one place, one generic concept?
> This would make things so simple and efficient in the end.
>
>
> Yes, I will try to build iterators for tensors so we can discuss this
> by investigating my code.

Sounds good.
>
>
> - Can we maybe simplify/replace the projection function with
> overloaded brackets?
>
>
> Can we do that? That would be awesome!
>
>
>
> Will try to show that it is possible.
>
>
>
> General questions:
> - Shall we build uBLAS as a high-performance library?
>
>
> Yes, I suppose.
> What do you mean exactly by "high-performance"?
>
>
> I wanted to say that uBLAS could serve as an optimizer and dispatcher
> between different types of existing high-performance libraries instead
> of providing high-performance functions.

I agree. The project I referred to above (OpenVSIP) started out as a
library implementing operations itself, until we realized how foolish an
idea that was, at which point it turned more and more into "middleware",
i.e. something like an "algorithm abstraction layer", which makes it
easy to plug in new backends (to support new hardware, say), without the
need for applications to change any code.
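
The "plug in new backends without changing applications" idea can be sketched as a small registry that sits between the application and the kernels. This is a hypothetical illustration, not OpenVSIP's or uBLAS's actual design: `dot_registry`, `reference_dot` and `norm_squared` are made-up names, and the priority scheme is an assumption.

```cpp
#include <functional>
#include <map>
#include <numeric>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using dot_kernel =
    std::function<double(const std::vector<double>&, const std::vector<double>&)>;

// Plain reference kernel; assumes x and y have the same length.
inline double reference_dot(const std::vector<double>& x, const std::vector<double>& y) {
    return std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
}

// Hypothetical middleware layer: backends register kernels, callers just ask
// for the operation and get the highest-priority kernel available.
class dot_registry {
public:
    void add(int priority, std::string name, dot_kernel kernel) {
        kernels_[priority] = std::make_pair(std::move(name), std::move(kernel));
    }

    double dot(const std::vector<double>& x, const std::vector<double>& y) const {
        return best().second(x, y);
    }

    const std::string& active() const { return best().first; }

private:
    const std::pair<std::string, dot_kernel>& best() const {
        if (kernels_.empty()) throw std::logic_error("no dot kernel registered");
        return kernels_.rbegin()->second;  // std::map is ordered: last = highest
    }

    std::map<int, std::pair<std::string, dot_kernel>> kernels_;
};

// "Application" code: written once against the registry and untouched when a
// new backend is registered later.
inline double norm_squared(const dot_registry& r, const std::vector<double>& v) {
    return r.dot(v, v);
}
```

Registering a faster kernel with a higher priority changes which code runs inside `norm_squared` without touching the function itself, which is the property the paragraph above describes.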

Boost has long prided itself on reimplementing wheels. I hope we can
overcome this NIH syndrome and demonstrate how beneficial it is to reuse
existing know-how / technology. Focusing on C++ APIs should be the goal
of Boost, while good optimizations are certainly helpful to increase the
rate of adoption.

Stefan

-- 
      ...ich hab' noch einen Koffer in Berlin...
