From: stefan (stefan_at_[hidden])
Date: 2019-10-10 19:35:40

Hi Olzhas,

this is a very interesting topic! As it happens, I have worked on
HP(Embedded)C software with similar features in the past, with support
for heterogeneous memory management, multiple compute
engines, etc. See

In fact, the idea for the memory management came from a tool I found
years ago, called StarPU:

The idea is that for a given user-facing object (a tensor, an image,
etc.), multiple mappings into different memory spaces exist, which are
updated on-demand, depending on where a computation is performed. In
OpenVSIP we used a "dispatcher" to decide what backend to use for a
given operation, and that dispatch logic was used on entire assignment
expressions (using C++ expression templates), so multiple unary and
binary operations could be fused together for maximum performance.
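
To make the on-demand update idea a bit more concrete, here is a minimal
C++ sketch. It is not OpenVSIP or StarPU code: the class name
mirrored_buffer and all of its members are made up for illustration, and
the "device" mapping is just a second std::vector standing in for GPU
memory. Each mapping carries a validity flag, and a copy only happens
when a stale mapping is actually accessed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only: one logical buffer, two mappings.
// The "device" mapping is a plain std::vector standing in for GPU memory.
class mirrored_buffer {
public:
    explicit mirrored_buffer(std::size_t n) : host_(n, 0.0f), device_(n, 0.0f) {}

    // Read access refreshes the requested mapping only if it is stale.
    const std::vector<float>& host_read() { sync_host(); return host_; }
    const std::vector<float>& device_read() { sync_device(); return device_; }

    // Write access refreshes the mapping, then marks the other one stale.
    std::vector<float>& host_write() { sync_host(); device_valid_ = false; return host_; }
    std::vector<float>& device_write() { sync_device(); host_valid_ = false; return device_; }

    // Number of host<->device copies performed so far.
    std::size_t transfers() const { return transfers_; }

private:
    void sync_host() {
        if (host_valid_) return;
        assert(device_valid_);  // at least one mapping is always current
        host_ = device_;
        host_valid_ = true;
        ++transfers_;
    }
    void sync_device() {
        if (device_valid_) return;
        assert(host_valid_);
        device_ = host_;
        device_valid_ = true;
        ++transfers_;
    }

    std::vector<float> host_;
    std::vector<float> device_;
    bool host_valid_ = true;    // both mappings start zero-filled, so both are current
    bool device_valid_ = true;
    std::size_t transfers_ = 0;
};

// Usage: copies happen only when the computation changes sides.
inline std::size_t example_transfers() {
    mirrored_buffer b(4);
    b.host_write()[0] = 1.0f;                     // device mapping becomes stale, no copy
    b.host_write()[1] = 2.0f;                     // still on the host, no copy
    for (float& x : b.device_write()) x *= 2.0f;  // one host->device copy, then "device" work
    b.device_read();                              // device mapping is current, no copy
    b.host_read();                                // one device->host copy
    return b.transfers();                         // two copies in total
}
```

A dispatcher as described above would sit one level higher and pick
host_* or device_* access depending on which backend it selects for the
whole expression.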

Of course, that's a whole lot of things to take care of, and may be a bit
too much for your needs. I mentored a project for Boost.uBLAS in summer '18
that added GPU support using OpenCL rather than CUDA (but the basic
idea would be the same).

So there are now two similar low-level APIs to handle operations on the
host or the device (in fact, given that uBLAS vectors and matrices
are already parametrized on storage, I simply added a "gpu" storage
type), and you can:

* explicitly run a computation on the host
* explicitly run a computation on the device
* explicitly copy data from one side to the other
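
As a rough sketch of what such a storage-tagged design can look like
(this is NOT the actual uBLAS API; tagged_vector, host_storage,
device_storage, add, to_device and to_host are hypothetical names, and
the "device" side is simulated with ordinary host memory):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical storage tags. A real GPU backend would keep e.g. an OpenCL
// buffer behind device_storage; here both tags wrap a std::vector.
struct host_storage {};
struct device_storage {};

template <typename T, typename Storage = host_storage>
class tagged_vector {
public:
    explicit tagged_vector(std::vector<T> data) : data_(std::move(data)) {}
    const std::vector<T>& raw() const { return data_; }
private:
    std::vector<T> data_;
};

// Shared element-wise loop used by both (simulated) backends.
template <typename T>
std::vector<T> add_raw(const std::vector<T>& a, const std::vector<T>& b) {
    assert(a.size() == b.size());
    std::vector<T> out(a.size());
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = a[i] + b[i];
    return out;
}

// Explicit host computation: only accepts host-tagged operands.
template <typename T>
tagged_vector<T, host_storage> add(const tagged_vector<T, host_storage>& a,
                                   const tagged_vector<T, host_storage>& b) {
    return tagged_vector<T, host_storage>(add_raw(a.raw(), b.raw()));
}

// Explicit device computation: a real backend would enqueue a kernel here.
template <typename T>
tagged_vector<T, device_storage> add(const tagged_vector<T, device_storage>& a,
                                     const tagged_vector<T, device_storage>& b) {
    return tagged_vector<T, device_storage>(add_raw(a.raw(), b.raw()));
}

// Explicit copies between the two sides.
template <typename T>
tagged_vector<T, device_storage> to_device(const tagged_vector<T, host_storage>& v) {
    return tagged_vector<T, device_storage>(v.raw());
}

template <typename T>
tagged_vector<T, host_storage> to_host(const tagged_vector<T, device_storage>& v) {
    return tagged_vector<T, host_storage>(v.raw());
}
```

With this layout, add(a, b) on two host vectors runs on the host,
add(to_device(a), to_device(b)) runs on the (simulated) device, and
mixing a host operand with a device operand does not compile, so every
transfer stays visible in the calling code.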

Something similar may work well for Boost.GIL, too, I suspect. Let me
know if you would like to chat in more detail about any of this...


