Subject: Re: [ublas] cublas again
From: Rutger ter Borg (rutger_at_[hidden])
Date: 2011-03-21 03:48:15

On 03/19/2011 10:35 PM, Andrey Asadchev wrote:
> Hello.
> I have incorporated cublas bindings on top of blas bindings, such that
> you can use blas and cublas at once:


Hello Andrey,

Cool stuff! I've been pondering how to do GPU-based computation
for a while now.

I think we could benefit greatly from NVIDIA's support for
compiling C/C++ code directly for the GPU through nvcc. I'm not
much of an expert yet, but CUDA supports running "kernels", which are
similar to functors.

Anyway, first things first, and IMHO that would be to choose the right
computational model to get the most out of both the CPU and GPU. I think
we should follow an asynchronous programming model to achieve that. To
get an idea, here's an example:

boost::asio::io_service ios;
ublas::vector< double > vec1;
cuda::vector< double > vec2( ios );

// this calls NVIDIA's asynchronous copy operation
// (alternatively: vec2.async_assign( ... );)
cuda::async_copy( begin( vec1 ), end( vec1 ), begin( vec2 ),
                  boost::bind( &copy_done, _1 ) );
// ... do stuff on the CPU while the copy to the GPU is in progress ...
blas::gemm( .... );
// ... do more stuff

void copy_done( const boost::system::error_code& error ) {
   if ( !error ) {
     bindings::cuda::async_run( some_kernel(), &kernel_done );
   }
}

void kernel_done( const boost::system::error_code& error ) {
   // ...
}
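
To make the control flow concrete, here is a minimal sketch in plain
standard C++ that I'd expect to compile as-is. Everything in it is an
assumption for illustration: toy_io_service is a single-threaded stand-in
for boost::asio::io_service, async_copy is a hypothetical stand-in for the
cuda::async_copy above, and the "copy" is an ordinary CPU copy that is
merely deferred until run() is called. No GPU or CUDA call is involved.

```cpp
#include <algorithm>
#include <functional>
#include <queue>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

// Hypothetical, single-threaded stand-in for boost::asio::io_service:
// operations are queued by post() and executed later by run().
class toy_io_service {
public:
    void post(std::function<void()> task) { tasks_.push(std::move(task)); }
    void run() {
        while (!tasks_.empty()) {
            std::function<void()> task = std::move(tasks_.front());
            tasks_.pop();
            task();  // a task may post further tasks
        }
    }
private:
    std::queue<std::function<void()>> tasks_;
};

using handler = std::function<void(const std::error_code&)>;

// Hypothetical stand-in for cuda::async_copy: returns immediately and
// performs the copy (on the CPU here) once the service runs.
void async_copy(toy_io_service& ios, const std::vector<double>& src,
                std::vector<double>& dst, handler done) {
    ios.post([&src, &dst, done]() {
        std::error_code ec;  // default-constructed means "no error"
        if (dst.size() < src.size())
            ec = std::make_error_code(std::errc::no_buffer_space);
        else
            std::copy(src.begin(), src.end(), dst.begin());
        done(ec);
    });
}

// Records the order of events: CPU work first, then the two handlers.
std::vector<std::string> demo_async() {
    toy_io_service ios;
    std::vector<double> vec1(4, 1.0), vec2(4, 0.0);
    std::vector<std::string> log;

    async_copy(ios, vec1, vec2, [&](const std::error_code& error) {
        log.push_back(error ? "copy failed" : "copy done");
        if (!error)  // chain the next stage, as copy_done does above
            ios.post([&log]() { log.push_back("kernel done"); });
    });
    log.push_back("cpu work");  // happens before the copy completes
    ios.run();                  // dispatch the queued operations
    return log;
}
```

Calling demo_async() initiates the copy, does the CPU work, and only then
lets the handlers fire in sequence. A real implementation would overlap
the copy with the CPU work instead of deferring it, but the handler
chaining would look the same.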

In some other source file, some_kernel is compiled with nvcc and ends up
as native GPU code:

struct some_kernel {
   void operator()() {
       // ... not 100% of C/C++ is supported yet
       // ... cublas::gemm( .... );
   }
};
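
As a rough illustration of the "kernel as functor" idea, here is a
host-only sketch in plain C++. The names axpy_kernel and run_kernel are
hypothetical, and the launcher is just a serial loop on the CPU standing
in for a parallel GPU launch; it only shows how a functor can describe the
work for a single element.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical kernel functor: one call handles one element, the way a
// single GPU thread would. Computes y[i] = a * x[i] + y[i].
struct axpy_kernel {
    double a;
    const double* x;
    double* y;
    void operator()(std::size_t i) const { y[i] = a * x[i] + y[i]; }
};

// Hypothetical launcher: a serial CPU loop standing in for the parallel
// launch that nvcc-compiled code would perform on the GPU.
template <typename Kernel>
void run_kernel(std::size_t n, Kernel kernel) {
    for (std::size_t i = 0; i < n; ++i) kernel(i);
}

// Applies the functor to three elements and returns the updated y.
std::vector<double> demo_axpy() {
    std::vector<double> x{1.0, 2.0, 3.0};
    std::vector<double> y{10.0, 20.0, 30.0};
    run_kernel(x.size(), axpy_kernel{2.0, x.data(), y.data()});
    return y;
}
```

The point of the design is that the functor carries its own state (the
scalar and the two pointers), so the launcher can stay generic and know
nothing about the operation being performed.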

The next question would be how to get this rolling. Although the
numeric bindings bind to external numeric libraries, the cublas/blas
algorithms might be a good fit. Containers, on the other hand (and the
stuff shown above), might warrant a separate library, or an extension to

What do you guys think?