On 03.05.2018 09:16, Cem Bassoy via ublas wrote:


On Wed, May 2, 2018 at 10:11 PM, David Bellot via ublas <ublas@lists.boost.org> wrote:
- Should we integrate smart expression templates? I think there was a gsoc project but I am not sure. What was the output?

It was really good.

Okay. I cannot see it in the development branch. Is it intended to be integrated into uBLAS?
 
- Are (smart) expression templates really required?

But on second thought, I wonder about that too.

I think smart expression templates could be beneficial in terms of selecting and executing high-performance kernels!
Expression templates seem to be outdated. See https://epubs.siam.org/doi/abs/10.1137/110830125
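For reference, this is what classic expression templates look like in a minimal form. The sketch below is an illustration only, not uBLAS's implementation; the names `Vec`, `VecExpr` and `Sum` are hypothetical. An expression such as `b + c + d` builds a lightweight tree of types, and the single loop in `Vec::operator=` evaluates it element by element without temporaries. A "smart" variant would additionally inspect that tree to hand recognized patterns (e.g. a matrix product) to a tuned kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: lets any vector expression be viewed as its concrete type.
template <class E>
struct VecExpr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Node representing the (unevaluated) element-wise sum of two expressions.
template <class L, class R>
struct Sum : VecExpr<Sum<L, R>> {
    const L& lhs;
    const R& rhs;
    Sum(const L& l, const R& r) : lhs(l), rhs(r) { assert(l.size() == r.size()); }
    std::size_t size() const { return lhs.size(); }
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

// Concrete vector that owns its storage.
struct Vec : VecExpr<Vec> {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    std::size_t size() const { return data.size(); }
    double operator[](std::size_t i) const { return data[i]; }

    // Assigning from any expression evaluates it in one loop, no temporaries.
    template <class E>
    Vec& operator=(const VecExpr<E>& e) {
        const E& x = e.self();
        assert(x.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data[i] = x[i];
        return *this;
    }
};

// Building an expression node is cheap: it only stores references.
template <class L, class R>
Sum<L, R> operator+(const VecExpr<L>& l, const VecExpr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}
```

With `Vec a(3), b(3, 1.0), c(3, 2.0), d(3, 3.0);`, the statement `a = b + c + d;` fills `a` with 6.0 in a single pass. Because the nodes store references, keeping an expression alive beyond the full statement (e.g. `auto e = b + c + d;`) would leave dangling references, one of the known pitfalls of this technique.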

One of the GSoC projects is adding GPU support to uBLAS. While it may be useful to have an API layer that lets users explicitly request the GPU backend (for the supported subset of BLAS functions for which there are GPU kernels), we may also want to offer full integration, whereby a user simply writes generic uBLAS expressions and leaves the selection of the appropriate backend (GPU or other) to the library itself.
But to be able to implement this selection mechanism, we need some advanced dispatching technique. I'm not sure I understand enough about "smart expression templates", but I did implement such dispatching infrastructure in the past (http://openvsip.org/), which scales well to a high number of backends (GPU, SIMD vectorization, TBB-style parallelization, etc.).
I hope we can manage to get to the point where we can discuss what techniques are most appropriate for Boost.uBLAS, and perhaps even start to implement them over this summer. We shall see...
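One way to approach the dispatching described above, sketched under my own assumptions (C++17; hypothetical `GpuBackend`, `SimdBackend` and `ReferenceBackend` types; this is not OpenVSIP's actual design): each backend declares at compile time which element types it supports and decides at run time whether it accepts a given problem, here simply by a size threshold. The dispatcher tries backends in priority order and falls back to a generic reference implementation. For brevity it returns the chosen backend's name instead of invoking a kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical backends. Each reports at compile time whether it supports
// element type T, and at run time whether it wants a problem of size n.
struct GpuBackend {
    template <class T> static constexpr bool supports = std::is_same<T, float>::value;
    static bool accepts(std::size_t n) { return n >= (std::size_t{1} << 20); }  // large problems only
    static std::string name() { return "gpu"; }
};

struct SimdBackend {
    template <class T> static constexpr bool supports = std::is_floating_point<T>::value;
    static bool accepts(std::size_t n) { return n >= 16; }
    static std::string name() { return "simd"; }
};

struct ReferenceBackend {  // always applicable: the generic fallback
    template <class T> static constexpr bool supports = true;
    static bool accepts(std::size_t) { return true; }
    static std::string name() { return "reference"; }
};

// Try backends in priority order; the first one that supports T and accepts
// the problem wins. Unsupported backends are pruned at compile time.
template <class T, class First, class... Rest>
std::string select_backend(std::size_t n) {
    if constexpr (First::template supports<T>) {
        if (First::accepts(n)) return First::name();
    }
    if constexpr (sizeof...(Rest) > 0) {
        return select_backend<T, Rest...>(n);
    } else {
        return "none";  // no backend in the list applies
    }
}
```

For example, `select_backend<double, GpuBackend, SimdBackend, ReferenceBackend>(1 << 21)` returns `"simd"`, because the hypothetical GPU backend only supports `float`. Supporting new hardware then means adding one type to the list, without changing user code.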


- How often do expressions like A = B*C + D*D - ... occur in numerical applications?
- Should we provide a fast gemm implementation of the Goto algorithm, like in Eigen?

Why not.

Because tuning algorithms to reach near-peak performance on standard processors is, to my mind, a nontrivial task. I am not sure, but we could try to integrate and link to existing highly optimized kernels.

As with the above, the problem really is to pick the right backend(s) for the current platform, as the optimal choice depends on many things (available hardware, data layout, problem size, etc.). Coming up with a good optimization strategy (and an architecture that supports it) is non-trivial.
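To illustrate why near-peak performance is the hard part: the basic idea behind Goto-style GEMM is to partition the operands into blocks that fit into the cache hierarchy and to run an inner kernel on those blocks. The sketch below shows only the loop tiling, with row-major storage and an untuned, hypothetical block size `kBlock`; it omits the operand packing, register-blocked micro-kernel and SIMD code into which libraries such as Eigen, OpenBLAS or BLIS put most of their tuning effort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// C += A * B for row-major matrices: A is m x k, B is k x n, C is m x n.
// Loops are tiled so that a kBlock x kBlock tile of each operand is reused
// while it is (hopefully) still in cache. kBlock is an untuned placeholder.
void gemm_blocked(std::size_t m, std::size_t n, std::size_t k,
                  const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C) {
    constexpr std::size_t kBlock = 64;
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    for (std::size_t i0 = 0; i0 < m; i0 += kBlock)
        for (std::size_t p0 = 0; p0 < k; p0 += kBlock)
            for (std::size_t j0 = 0; j0 < n; j0 += kBlock) {
                const std::size_t i1 = std::min(i0 + kBlock, m);
                const std::size_t p1 = std::min(p0 + kBlock, k);
                const std::size_t j1 = std::min(j0 + kBlock, n);
                // Naive kernel on one tile; the i-p-j order keeps the innermost
                // accesses to B and C contiguous in row-major storage.
                for (std::size_t i = i0; i < i1; ++i)
                    for (std::size_t p = p0; p < p1; ++p) {
                        const double a = A[i * k + p];
                        for (std::size_t j = j0; j < j1; ++j)
                            C[i * n + j] += a * B[p * n + j];
                    }
            }
}
```

Getting from this to the performance of a tuned library is exactly the platform-specific work that argues for linking to existing kernels rather than rewriting them.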

 
And regarding the code infrastructure:

- Do we need iterators within matrix and vector template classes? Or can we generalize the concepts?

There was a discussion about that once. Can we factor all this code into one place, one generic concept?
This would make things so simple and efficient in the end.

Yes, I will try to build iterators for tensors so that we can discuss this on the basis of my code.

Sounds good.
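A hypothetical sketch of how the iterator code could be factored into one place: a single `index_iterator` template works with any container that provides `value_type`, `size()` and element access by linear index, so a matrix, vector or tensor class gets `begin()`/`end()` without its own iterator classes. The names `index_iterator` and `Tensor` are made up for illustration; a real design would also have to handle strides, storage layouts and const iterators.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// One generic forward iterator for any container C that provides
// value_type, size() and operator[](std::size_t) over a linear index.
template <class C>
class index_iterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = typename C::value_type;
    using difference_type = std::ptrdiff_t;
    using pointer = value_type*;
    using reference = value_type&;

    index_iterator() = default;
    index_iterator(C* c, std::size_t i) : c_(c), i_(i) {}
    reference operator*() const { return (*c_)[i_]; }
    index_iterator& operator++() { ++i_; return *this; }
    index_iterator operator++(int) { index_iterator t = *this; ++i_; return t; }
    bool operator==(const index_iterator& o) const { return c_ == o.c_ && i_ == o.i_; }
    bool operator!=(const index_iterator& o) const { return !(*this == o); }

private:
    C* c_ = nullptr;
    std::size_t i_ = 0;
};

// Hypothetical minimal tensor: only storage and linear access are needed
// to obtain begin()/end() from the generic iterator.
template <class T>
struct Tensor {
    using value_type = T;
    std::vector<std::size_t> extents;
    std::vector<T> data;
    explicit Tensor(std::vector<std::size_t> e)
        : extents(e),
          data(std::accumulate(e.begin(), e.end(), std::size_t{1},
                               [](std::size_t a, std::size_t b) { return a * b; })) {}
    std::size_t size() const { return data.size(); }
    T& operator[](std::size_t i) { return data[i]; }
    index_iterator<Tensor> begin() { return {this, 0}; }
    index_iterator<Tensor> end() { return {this, size()}; }
};
```

For example, `Tensor<int> t({2, 3, 4});` followed by `std::iota(t.begin(), t.end(), 0);` fills all 24 elements, after which `std::accumulate(t.begin(), t.end(), 0)` yields 276.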
 
- Can we maybe simplify/replace the projection function with overloaded brackets?

Can we do that? That would be awesome!
 

I will try to show that it is possible.
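As a sketch of the idea (hypothetical `Matrix`, `MatrixView` and `Range` types, not the uBLAS API), the call operator can be overloaded so that index arguments return an element and range arguments return a non-owning view, which is the job a separate projection function such as uBLAS's `project` does today.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open index interval [first, last).
struct Range {
    std::size_t first, last;
    std::size_t size() const { return last - first; }
};

class Matrix;

// Non-owning view on a rectangular block of a Matrix.
class MatrixView {
public:
    MatrixView(Matrix& m, Range r, Range c) : m_(m), r_(r), c_(c) {}
    double& operator()(std::size_t i, std::size_t j);
    std::size_t rows() const { return r_.size(); }
    std::size_t cols() const { return c_.size(); }
private:
    Matrix& m_;
    Range r_, c_;
};

class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}
    // Element access (row-major storage).
    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }
    // Same call operator, but with ranges: returns a view instead of
    // requiring a separate projection function.
    MatrixView operator()(Range r, Range c) {
        assert(r.first <= r.last && r.last <= rows_);
        assert(c.first <= c.last && c.last <= cols_);
        return MatrixView(*this, r, c);
    }
private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};

// Defined once Matrix is complete: map view indices to matrix indices.
inline double& MatrixView::operator()(std::size_t i, std::size_t j) {
    assert(i < rows() && j < cols());
    return m_(r_.first + i, c_.first + j);
}
```

With `Matrix A(3, 4);`, the view `A(Range{1, 3}, Range{2, 4})` covers rows 1 and 2 and columns 2 and 3, so writing `v(0, 0) = 5.0` through the view sets `A(1, 2)`.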

 
General questions:
- Shall we make uBLAS a high-performance library?

Yes, I suppose.
What exactly do you mean by "high-performance"?

I wanted to say that uBLAS could serve as an optimizer and dispatcher between different types of existing high-performance libraries instead of providing high-performance functions.

I agree. The project I referred to above (OpenVSIP) started out as a library implementing operations itself, until we realized how foolish an idea that was, at which point it turned more and more into "middleware", i.e. something like an "algorithm abstraction layer", which makes it easy to plug in new backends (to support new hardware, say), without the need for applications to change any code.

Boost has long prided itself on reimplementing wheels. I hope we can overcome this NIH syndrome and demonstrate how beneficial it is to reuse existing know-how and technology. Focusing on C++ APIs should be the goal of Boost, while good optimizations certainly help to increase the rate of adoption.


Stefan
-- 

      ...I still have a suitcase in Berlin...