Subject: Re: [boost] [compute] Some questions
From: Kyle Lutz (kyle.r.lutz_at_[hidden])
Date: 2014-12-23 11:29:22

Next message: Mathias Gaunard: "[boost] [compute] kernels as strings impairs readability and maintainability"
Previous message: Steven Ross: "Re: [boost] [SORT] Parallel Algorithms"
In reply to: Andrey Semashev: "[boost] [compute] Some questions"
Next in thread: Andrey Semashev: "Re: [boost] [compute] Some questions"
Reply: Andrey Semashev: "Re: [boost] [compute] Some questions"

On Tue, Dec 23, 2014 at 1:20 AM, Andrey Semashev
<andrey.semashev_at_[hidden]> wrote:
> Hi,
>
> I have no experience with OpenCL or GPU computing in general, so bear
> with me if my questions sound silly. I have a few questions regarding
> Boost.Compute:
>
> 1. When you define a kernel (e.g. with the BOOST_COMPUTE_FUNCTION
> macro), is this kernel supposed to be in C? Can it reference global
> (namespace scope) objects and other functions? Other kernels?

Yes, the source code for OpenCL kernels and functions is written in
OpenCL C, which is a dialect of C99 with extensions for vectorized
operations.

There are a few ways to specify kernel functions which reference
global C++ values. One is the BOOST_COMPUTE_CLOSURE() macro [1] which
works similarly to BOOST_COMPUTE_FUNCTION(), but also allows a
lambda-like capture list of C++ values.

Another option is to specify your function with extra arguments for
the global objects and then bind them to the function with
boost::compute::bind() [2].
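
As a rough host-side analogy for the two approaches above, here is a
sketch in plain standard C++. This is not Boost.Compute code: it uses a
lambda capture and std::bind on the CPU only, and the function names
(add_offset_with_capture, add_offset_with_bind, plus_n) are made up for
illustration. See [1] and [2] for the actual device-side interfaces.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Approach 1: capture the outside value, in the way a closure's capture
// list makes a C++ value visible inside the function body.
std::vector<int> add_offset_with_capture(std::vector<int> values, int offset)
{
    std::transform(values.begin(), values.end(), values.begin(),
                   [offset](int x) { return x + offset; });
    return values;
}

// Approach 2: declare the outside value as an extra argument ...
int plus_n(int x, int n) { return x + n; }

// ... and bind it, leaving a one-argument function for the algorithm.
std::vector<int> add_offset_with_bind(std::vector<int> values, int offset)
{
    using namespace std::placeholders;
    std::transform(values.begin(), values.end(), values.begin(),
                   std::bind(plus_n, _1, offset));
    return values;
}
```

Both functions produce the same result; the difference is only in
whether the extra value is captured or passed as a bound argument.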

> 2. When is the kernel compiled and uploaded to the device? Is it
> possible to cache and reuse the compiled kernel?

If writing a custom kernel, the kernel is built when the
"program::build()" method is called. Internally, the higher-level
algorithms compile programs when they're needed and store them in a
global program cache.

And yes, compiled program and kernel objects can be stored and re-used
(this is strongly recommended). Boost.Compute provides the
program_cache class [3], which stores frequently used programs
as compiled objects.
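
To illustrate the idea of building once and re-using afterwards, here
is a minimal sketch of a keyed cache in plain standard C++. It is only
a stand-in for the concept: simple_program_cache and compiled_program
are hypothetical names, no real compilation happens, and this is not
the program_cache interface documented in [3].

```cpp
#include <map>
#include <string>

// Stand-in for a compiled program; a real one would wrap a device binary.
struct compiled_program
{
    std::string source;
};

// Hypothetical cache used only to show the build-once pattern.
class simple_program_cache
{
public:
    // Returns the cached program for `key`, building it on first use only.
    const compiled_program& get_or_build(const std::string& key,
                                         const std::string& source)
    {
        auto it = programs_.find(key);
        if (it == programs_.end()) {
            ++build_count_;  // the expensive compile step would happen here
            it = programs_.emplace(key, compiled_program{source}).first;
        }
        return it->second;
    }

    // Number of times a build was actually performed.
    int build_count() const { return build_count_; }

private:
    std::map<std::string, compiled_program> programs_;
    int build_count_ = 0;
};
```

Requesting the same key twice performs the (simulated) build only once,
which is the saving that caching compiled programs is meant to provide.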

> 3. Why is the library not thread-safe by default? I'd say, we're long
> past single-threaded systems now, and having to always define the
> config macro is a nuisance.

I would very much like to have it thread-safe by default. The
difficulty, however, is keeping the library header-only and usable
with C++03 compilers. The BOOST_COMPUTE_THREAD_SAFE macro basically
just instructs Boost.Compute to use the C++11 "thread_local" specifier
for global objects instead of "static". With C++03 compilers, it
instead uses boost::thread_specific_ptr<>, which then requires users
to also link to Boost.Thread.

That said, I still don't think it's ideal and I am very open to
ideas/patches which improve this.
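
For readers unfamiliar with the difference between "static" and
"thread_local" globals, here is a small sketch in plain C++11. It does
not use Boost.Compute, and the variable and function names are made up
for illustration; it only shows that a thread_local object gets one
independent copy per thread while a static one is shared.

```cpp
#include <thread>

// One counter shared by every thread (the "static" behaviour).
static int shared_counter = 0;

// One independent counter per thread (the C++11 "thread_local" behaviour).
thread_local int per_thread_counter = 0;

// Increments both counters on a worker thread and returns the value the
// worker saw in its own thread_local copy.
int bump_on_worker_thread()
{
    int seen_by_worker = 0;
    std::thread worker([&seen_by_worker] {
        ++shared_counter;
        ++per_thread_counter;
        seen_by_worker = per_thread_counter;
    });
    worker.join();
    return seen_by_worker;
}
```

After one call, the shared counter has changed for everyone, whereas
the calling thread's copy of per_thread_counter is untouched. Depending
on the toolchain, building this may require a threading flag such as
-pthread.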

> 4. Is it possible to upload the data to process to the device's local
> memory from a user-provided buffer, without copying it to
> boost::compute::vector? Same for downloading. What I'd like to do is
> move some of data processing to the GPU while the rest is performed on
> the CPU (possibly with other libraries), and avoid excessive copying.

Yes, that is what the mapped_view class [4] is for. It maps a region
of host-memory to device-memory and provides a std::vector-like
interface on top of it so it may be used with Boost.Compute algorithms
or custom kernels.

> 5. Is it possible to pass buffers in the device-local memory between
> different processes (on the CPU) without downloading/uploading data
> to/from the CPU memory?

This is not supported by OpenCL (at least not in any standard or
portable way). Memory buffers belong to OpenCL contexts, and contexts
are created per-process without any mechanisms to share them with
other processes.

If anyone has experience with or ideas for sharing OpenCL contexts
between processes, I'd be very interested in trying to get this to
work.

> 6. Is it possible to discover device capabilities? E.g. the amount of
> local memory (total/used/free), execution units, vendor and device
> name?

Yes, the device class [5] provides a number of methods for returning
information about the device including the generic get_info()
function.

Specifically, for the cases you listed, you could use:

* Local memory: device.local_memory_size()
* Execution units: device.compute_units()
* Vendor name: device.vendor()
* Device name: device.name()

Thanks for the questions. Let me know if I can explain anything better.

-kyle

[1] http://kylelutz.github.io/compute/BOOST_COMPUTE_CLOSURE.html
[2] http://kylelutz.github.io/compute/boost/compute/bind.html
[3] http://kylelutz.github.io/compute/boost/compute/program_cache.html
[4] http://kylelutz.github.io/compute/boost/compute/mapped_view.html
[5] http://kylelutz.github.io/compute/boost/compute/device.html
