Subject: Re: [Boost-users] [proto] : Using proto with NVIDIA CUDA nvcc (redux)
From: Manjunath Kudlur (keveman_at_[hidden])
Date: 2010-11-02 13:39:04
>
> That does help, and works like a charm. Seems like the C-lowering of
> the device code is a nasty business, and it's hard to pin point what
> is busted. But this workaround should be sufficient for many use
> cases, I think.
I ran into one more problem with passing a proto expression to code
running on the GPU. Once I make a proto::deep_copy(), the expression
gets copied faithfully to the device. The problem is in evaluating
the expression on the GPU. The "canonical" way of evaluating a
lambda-like language in proto is to stash the parameters of the
operator() function in a fusion::vector and pass it down as state to
the evaluator. But the at_c functions of fusion::vector don't have the
__device__ attribute.

Some background on CUDA: all the functions that you intend to execute
on the GPU device have to be "decorated" with the __device__
attribute, like so:
__device__ void foo(int a) { ... }
This is so that the CUDA compiler can rip them out into a separate
file and compile them for the device. Obviously, a __device__ function
can only call other __device__ functions.

Now, coming back to evaluating a proto expression: I can easily mark
the operator() function of the expression as __device__. But if I use
fusion vectors for passing around parameters, then I can't call the
at_c functions to access the parameters, because they don't have the
__device__ attribute. Technically, I could modify the Boost headers
and add the __device__ attribute everywhere, but that is intrusive
and not desirable. So, is there some other way to evaluate a
lambda-like language in proto?

Another problem I can see is in proto itself: access to the values
stored in terminals. The proto::left() and proto::child_c<0>()
functions also don't have the __device__ attribute, so I think I have
to resort to direct access to the fields of a proto expression. So the
basic question is: how can I write an evaluator that doesn't call
internal proto or other Boost functions in any of the intermediate
steps?
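
To make the question concrete, here is a rough sketch of the kind of
evaluator I have in mind. It is not proto code: args2, get_arg,
placeholder, terminal, plus_node, mul_node and evaluator are all
hypothetical names made up for illustration, and the toy node types
only stand in for proto expression nodes. The parameters live in a
hand-rolled struct instead of a fusion::vector, every accessor carries
the device attribute through the HD macro, and the evaluator reads
node fields directly. For simplicity it assumes every subexpression
evaluates to float.

```cpp
#include <cassert>

// Under nvcc, make functions callable from host and device code.
// Under a plain C++ compiler the macro expands to nothing.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical stand-in for fusion::vector<A0, A1>: a plain aggregate.
template <typename A0, typename A1>
struct args2 { A0 a0; A1 a1; };

// Hypothetical stand-in for fusion::at_c<N>, decorated for the device.
template <int N> struct get_arg;
template <> struct get_arg<0> {
    template <typename A0, typename A1>
    HD static A0 const& apply(args2<A0, A1> const& a) { return a.a0; }
};
template <> struct get_arg<1> {
    template <typename A0, typename A1>
    HD static A1 const& apply(args2<A0, A1> const& a) { return a.a1; }
};

// Toy expression nodes (assumption: these only mimic an expression tree).
template <int N> struct placeholder {};
struct terminal { float value; };
template <typename L, typename R> struct plus_node { L left; R right; };
template <typename L, typename R> struct mul_node  { L left; R right; };

// Evaluator that touches node fields directly and calls no library code.
template <typename Args>
struct evaluator {
    Args args;

    template <int N>
    HD float operator()(placeholder<N> const&) const {
        return get_arg<N>::apply(args);
    }
    HD float operator()(terminal const& t) const { return t.value; }

    template <typename L, typename R>
    HD float operator()(plus_node<L, R> const& e) const {
        return (*this)(e.left) + (*this)(e.right);
    }
    template <typename L, typename R>
    HD float operator()(mul_node<L, R> const& e) const {
        return (*this)(e.left) * (*this)(e.right);
    }
};

// Builds (_1 + 2.0f) * _2 and evaluates it with arguments (x, y).
HD inline float example(float x, float y) {
    typedef plus_node<placeholder<0>, terminal> sum_t;
    terminal two = { 2.0f };
    sum_t sum = { placeholder<0>(), two };
    mul_node<sum_t, placeholder<1> > expr = { sum, placeholder<1>() };
    args2<float, float> a = { x, y };
    evaluator<args2<float, float> > ev = { a };
    return ev(expr);
}

// Host-side sanity check: (3 + 2) * 4 == 20.
inline void host_check() { assert(example(3.0f, 4.0f) == 20.0f); }

#ifdef __CUDACC__
// Device entry point; only compiled when building with nvcc.
__global__ void eval_kernel(float* out, float x, float y) {
    *out = example(x, y);
}
#endif
```

With nvcc, eval_kernel<<<1, 1>>>(d_out, 3.0f, 4.0f) would run the same
evaluation on the device, where d_out is a device pointer allocated by
the caller. The open question is whether something equivalent can be
done on a real proto expression without duplicating this much
machinery.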
Manjunath