From: stefan (stefan_at_[hidden])
Date: 2019-10-10 19:35:40
This is a very interesting topic! As it happens, I have worked on
HP(Embedded)C software with similar features in the past,
with support for heterogeneous memory management, multiple compute
engines, etc. See http://openvsip.org/
In fact, the idea for the memory management came from a tool I found
years ago, called StarPU: http://starpu.gforge.inria.fr/
The idea is that for a given user-facing object (a tensor, an image,
etc.), multiple mappings into different memory spaces exist, which are
updated on demand, depending on where a computation is performed. In
OpenVSIP we used a "dispatcher" to decide what backend to use for a
given operation, and that dispatch logic was used on entire assignment
expressions (using C++ expression templates), so multiple unary and
binary operations could be fused together for maximum performance.
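
To make the mapping idea concrete, here is a minimal sketch. It is not
code from OpenVSIP or StarPU: the names mapped_array, space and
choose_backend are made up for illustration, the "device" is simulated
with a second host buffer so the example runs anywhere, and the size
threshold in the dispatcher is an arbitrary assumption. Expression
fusion is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical memory spaces. The "device" is a second host buffer that
// stands in for GPU memory, so no GPU runtime is needed.
enum class space { host, device };

// One user-facing array with one mapping per memory space. Each mapping
// carries a validity flag and is refreshed only when it is requested.
class mapped_array {
public:
    explicit mapped_array(std::size_t n) : host_(n, 0.0f), device_(n, 0.0f) {}

    std::size_t size() const { return host_.size(); }

    // Read access: bring the requested mapping up to date if it is stale.
    const std::vector<float>& read(space s) {
        sync(s);
        return buffer(s);
    }

    // Write access: bring the mapping up to date, then mark the other
    // mapping stale because it is about to fall behind.
    std::vector<float>& write(space s) {
        sync(s);
        valid(other(s)) = false;
        return buffer(s);
    }

    // Number of simulated transfers performed so far.
    std::size_t copies() const { return copies_; }

private:
    static space other(space s) {
        return s == space::host ? space::device : space::host;
    }
    std::vector<float>& buffer(space s) {
        return s == space::host ? host_ : device_;
    }
    bool& valid(space s) {
        return s == space::host ? host_valid_ : device_valid_;
    }

    void sync(space s) {
        if (valid(s)) return;
        assert(valid(other(s)));       // at least one mapping is always valid
        buffer(s) = buffer(other(s));  // stands in for a host<->device transfer
        valid(s) = true;
        ++copies_;
    }

    std::vector<float> host_, device_;
    bool host_valid_ = true, device_valid_ = true;
    std::size_t copies_ = 0;
};

// Hypothetical dispatcher: use the device only for large arrays. The
// threshold of 1024 elements is an assumption, not a measured value.
inline space choose_backend(std::size_t n) {
    return n >= 1024 ? space::device : space::host;
}

// An operation that lets the dispatcher pick where it runs.
inline void scale(mapped_array& a, float k) {
    for (float& x : a.write(choose_backend(a.size()))) x *= k;
}
```

Reading the array repeatedly in the same space costs no further
transfers; a copy happens only after a write in the other space has made
the requested mapping stale.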
Of course, that's a whole lot of things to take care of, and may be a
bit too much for your needs. In summer '18 I mentored a Boost.uBLAS
project that added GPU support using OpenCL rather than CUDA (but the
basic idea would be the same).
As a result, there are now two similar low-level APIs to handle
operations on the host or the device. (In fact, given that uBLAS vectors
and matrices are already parametrized on storage, I simply added a "gpu"
storage type.) You can now:
* explicitly run a computation on the host
* explicitly run a computation on the device
* explicitly copy data from one side to the other
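
The three operations above can be sketched roughly as follows. This is
not the Boost.uBLAS OpenCL API: basic_vector, host_storage, gpu_storage,
to_device, to_host and add are hypothetical names chosen for this
example, and the "gpu" side is simulated with ordinary host memory so
that the code compiles without OpenCL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical storage tags. The "gpu" storage is simulated on the host.
struct host_storage {};
struct gpu_storage {};

// A vector parametrized on where its elements live.
template <typename Storage>
class basic_vector {
public:
    explicit basic_vector(std::size_t n, float v = 0.0f) : data_(n, v) {}
    std::size_t size() const { return data_.size(); }
    float& operator[](std::size_t i) { return data_[i]; }
    float operator[](std::size_t i) const { return data_[i]; }

private:
    std::vector<float> data_;
};

using host_vector = basic_vector<host_storage>;
using gpu_vector = basic_vector<gpu_storage>;

// Explicit copy from the host to the (simulated) device.
inline gpu_vector to_device(const host_vector& h) {
    gpu_vector d(h.size());
    for (std::size_t i = 0; i < h.size(); ++i) d[i] = h[i];
    return d;
}

// Explicit copy from the (simulated) device back to the host.
inline host_vector to_host(const gpu_vector& d) {
    host_vector h(d.size());
    for (std::size_t i = 0; i < d.size(); ++i) h[i] = d[i];
    return h;
}

// Shared element-wise loop used by both overloads in this simulation.
template <typename Storage>
basic_vector<Storage> add_impl(const basic_vector<Storage>& a,
                               const basic_vector<Storage>& b) {
    assert(a.size() == b.size());
    basic_vector<Storage> r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] + b[i];
    return r;
}

// Host computation: selected when both operands use host storage.
inline host_vector add(const host_vector& a, const host_vector& b) {
    return add_impl(a, b);
}

// Device computation: selected when both operands use gpu storage. A real
// backend would enqueue a kernel here instead of looping on the host.
inline gpu_vector add(const gpu_vector& a, const gpu_vector& b) {
    return add_impl(a, b);
}
```

Because the storage type is part of the vector type, mixing a host
operand with a device operand in add does not compile, so every
transfer stays visible in the calling code as a to_device or to_host
call.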
Something similar may work well for Boost.GIL, too, I suspect. Let me
know if you would like to chat in more detail about any of this...
-- ...ich hab' noch einen Koffer in Berlin...