Subject: Re: [boost] [compute] GPGPU Library - Request For Feedback
From: Hartmut Kaiser (hartmut.kaiser_at_[hidden])
Date: 2013-03-02 21:15:28

> A while back I posted a message asking for interest in a GPGPU computing
> library and the response seemed positive. I've been slowly working on it
> for the last few months and it has finally reached a usable state. I've
> made an initial release on GitHub (details below) and would like to get
> feedback from the community.
>
> The Boost Compute library provides a partial implementation of the C++
> standard library for GPUs and multi-core CPUs. It includes common
> containers (vector<T>, flat_set<T>) and standard algorithms (transform,
> sort, accumulate). It also features a number of extensions including
> parallel-computing focused algorithms (exclusive_scan, scatter, reduce)
> along with a number of fancy iterators (transform_iterator,
> permutation_iterator). The library is built around the OpenCL framework
> which allows it to be portable across many types of devices (GPUs, CPUs,
> and accelerator cards) from many different vendors (NVIDIA, Intel, AMD).
>
> The source code and documentation are available from the links below.
>
> Code:
> Documentation:
> Bug Tracker:
>
> I've tested the library with GCC 4.7 and Clang 3.3 on both NVIDIA GPUs and
> Intel CPUs. However, I would not yet consider the library
> production-ready.
> Most of my time has been devoted to reaching a solid and well-tested API
> rather than on performance. Over time this will improve.
>
> Feel free to send any questions, comments or feedback.

Looks interesting. One question: what support does the library provide to
orchestrate parallelism, i.e. doing useful work while the GPGPU is executing
a kernel? Do you have something like:

int main()
{
    // create data array on host
    int host_data[] = { 1, 3, 5, 7, 9 };

    // create vector on device
    boost::compute::vector<int> device_vector(5);

    // copy from host to device; returns immediately
    future<void> f = boost::compute::copy_async(host_data,
                         host_data + 5,
                         device_vector.begin());

    // do other stuff

    f.get(); // wait for transfer to be done

    return 0;
}


All libraries I have seen so far assume that the CPU has to idle while
waiting for the GPU. Is yours different?
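
To make the question concrete, here is a minimal sketch of the overlap
pattern I mean, written against plain std::async as a stand-in for a device
transfer. fake_copy_async and overlap_demo are hypothetical names made up
for illustration; nothing here is Boost.Compute API, and a real device
transfer would of course not be a background thread on the host.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for a device transfer: copies [first, last) into
// dest on a background thread and returns a future that becomes ready
// once the copy has finished.
std::future<void> fake_copy_async(const int* first, const int* last,
                                  std::vector<int>& dest)
{
    return std::async(std::launch::async, [first, last, &dest] {
        // pretend the transfer has some latency
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        dest.assign(first, last);
    });
}

// Starts the transfer, does unrelated host work while it is in flight,
// and blocks only at the point where the transferred data is needed.
int overlap_demo()
{
    int host_data[] = { 1, 3, 5, 7, 9 };
    std::vector<int> device_like; // stands in for a device-side vector

    std::future<void> f =
        fake_copy_async(host_data, host_data + 5, device_like);

    // useful host work that does not touch device_like
    int host_work = 0;
    for (int i = 1; i <= 100; ++i)
        host_work += i;

    f.get(); // wait for the transfer to be done
    assert(device_like.size() == 5);

    // combine the host result with the transferred data
    return host_work
         + std::accumulate(device_like.begin(), device_like.end(), 0);
}
```

Calling overlap_demo() returns the host-side sum plus the sum of the copied
elements; the point is only that the loop runs while the copy is still
pending, and the wait happens as late as possible.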

Regards,
Hartmut
