From: Olzhas Zhumabek (anonymous.from.applecity_at_[hidden])
Date: 2019-10-10 11:45:14
Hi,
I would like to attempt a GPU implementation of some image processing
algorithms for my university project. I would like to be able to call as
much of the original GIL as possible, minus the io extension.
*Problem description*
Memory is no longer uniform when a heterogeneous system is used. The CPU
cannot write to the GPU's on-board VRAM without experiencing a slowdown.
This makes any solution that only writes an allocator impossible, because
some code inside e.g. std::vector does writing of its own. The reverse is
true as well: the GPU cannot write into RAM without experiencing a slowdown
(I'm not sure it is even possible in that direction). One could create a
function that copies the data around when needed, but there is an
additional problem.
Imagine copying a std::vector into GPU memory. The naive approach would be
to copy the top-level std::vector contents and then copy whatever it points
at as well. The problem is that the top-level representation in GPU memory
will still point to RAM that is not supposed to be used. One has to somehow
rewrite that pointer when copying.
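To illustrate the pitfall I mean, here is a rough sketch (only the CUDA
runtime API is assumed; the rest is made up for illustration): copying the
vector object byte-for-byte leaves its internal data pointer referring to
host RAM.

#include <cuda_runtime.h>
#include <vector>

int main()
{
    std::vector<float> host(1024, 1.0f);

    // Shallow copy of the vector *object* only: its internal begin/end/
    // capacity pointers still point into host RAM, which the GPU must
    // not dereference.
    std::vector<float>* device_vec = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&device_vec),
               sizeof(std::vector<float>));
    cudaMemcpy(device_vec, &host, sizeof(std::vector<float>),
               cudaMemcpyHostToDevice);

    // Any kernel reading device_vec's data pointer would chase a host
    // address, so the pointer has to be rewritten somehow.
    cudaFree(device_vec);
}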
*Candidate solution*
I'm thinking about creating an allocator that remembers what pointer it
handed to the caller. Then, allocate GPU memory of size image heap size
plus the contents of the object itself, aligned at a 16-byte boundary (this
part goes first in memory, of course). Copy the object contents into that
memory, then look for the value of the pointer received from the allocator
in a byte view of the object and rewrite it to point to the image-contents
portion of the GPU memory. Then just copy the bytes of the contents. In a
sense, this tightly packs the gil::image representation in GPU memory,
since arguments to CUDA kernels are passed by pointer/reference anyway.
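Roughly what I have in mind, as a sketch (image_header and pack_to_device
are made-up stand-ins for illustration, not GIL types; only the CUDA
runtime API is assumed): one allocation holds the object bytes followed by
the pixel data at a 16-byte-aligned offset, and the data pointer in a
host-side copy of the object is patched to the device address before the
object is uploaded.

#include <cuda_runtime.h>
#include <cstddef>

struct image_header            // stand-in for the object's own bytes
{
    float*      pixels;        // pointer that must be rewritten
    std::size_t size;          // number of pixels
};

image_header* pack_to_device(const image_header& host_img)
{
    const std::size_t header_bytes = (sizeof(image_header) + 15) / 16 * 16;
    const std::size_t data_bytes   = host_img.size * sizeof(float);

    // One allocation: object bytes first, pixel data after the aligned
    // header region.
    unsigned char* block = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&block), header_bytes + data_bytes);

    // Patch a host-side copy of the header so it points into the block,
    // then upload the header and the pixel data.
    image_header patched = host_img;
    patched.pixels = reinterpret_cast<float*>(block + header_bytes);

    cudaMemcpy(block, &patched, sizeof(patched), cudaMemcpyHostToDevice);
    cudaMemcpy(block + header_bytes, host_img.pixels, data_bytes,
               cudaMemcpyHostToDevice);

    return reinterpret_cast<image_header*>(block);
}

The byte-scanning step I described would replace the explicit patching
here, since for a real gil::image I don't know the object layout up front.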
Is there a better approach than this? Are there any other types besides
gil::image and gil::kernel that deal with memory?
Best regards,
Olzhas