
I remember there were postings on this mailing list a while back about trying to use proto with NVIDIA's CUDA compiler, and Joel Falcou has contributed some code to make proto work with nvcc. Well, I have been trying a few experiments with the most recent CUDA compiler release (3.2). One of the exciting things about using proto with CUDA is the ability to write mini programs in different DSELs using proto and ship those mini programs over to the "device" side to execute there. The idiom looks like this:

    template<typename F>
    __global__ void kernel(F f) { f(...); }

    kernel<<<blocksize, gridsize>>>(a+b /* a big proto expression */);

I tried to write the simplest form of the above use case:

    #include <boost/proto/proto.hpp>
    namespace proto = boost::proto;

    template<typename> struct expr;

    struct domain : proto::domain<proto::pod_generator<expr> > {};

    template<typename E>
    struct expr
    {
        BOOST_PROTO_BASIC_EXTENDS(E, expr<E>, domain)
    };

    template<typename F>
    __global__ void kernel(F f) {}

    struct term {};

    expr<proto::terminal<term>::type> a = {{{}}};
    expr<proto::terminal<term>::type> b = {{{}}};
    expr<proto::terminal<term>::type> c = {{{}}};

    int main()
    {
        kernel<<<1,1>>>(proto::deep_copy(b+c));
        return 0;
    }

Note the deep_copy when passing the proto expression to the kernel function. Since proto captures terminals by reference, the host-side pointers that are held in a proto expression won't make sense on the device side, so at the very least we have to make copies of the terminals before passing the expression to the device.

Now, compiling the above program with nvcc (CUDA 3.2, host compiler gcc 4.4.3, Ubuntu 10.04) results in the following error (besides spewing a bunch of warnings):

    $ nvcc -v -keep t.cu -I $HOME/boost/boost-trunk/
    ...
    t.cudafe1.stub.c: In function ‘void __device_stub__Z6kernelI4exprIN5boost5proto7exprns_10basic_exprINS2_3tag4plusENS2_7argsns_5list2IS0_INS4_IS6_NS8_IS0_INS4_INS5_8terminalENS7_4termI4termEELl0EEEESE_EELl2EEEES0_INS4_IS9_NSA_IiEELl0EEEEEELl2EEEEEvT_(_Z4exprIN5boost5proto7exprns_10basic_exprINS1_3tag4plusENS1_7argsns_5list2IS_INS3_IS5_NS7_IS_INS3_INS4_8terminalENS6_4termI4termEELl0EEEESD_EELl2EEEES_INS3_IS8_NS9_IiEELl0EEEEEELl2EEEE&)’:
    t.cudafe1.stub.c:8: error: invalid cast from type ‘const expr<boost::proto::exprns_::basic_expr<boost::proto::tag::address_of, boost::proto::argsns_::list1<expr<boost::proto::exprns_::basic_expr<boost::proto::tag::plus, boost::proto::argsns_::list2<expr<boost::proto::exprns_::basic_expr<boost::proto::tag::plus, boost::proto::argsns_::list2<expr<boost::proto::exprns_::basic_expr<boost::proto::tag::terminal, boost::proto::argsns_::term<term>, 0l> >, expr<boost::proto::exprns_::basic_expr<boost::proto::tag::terminal, boost::proto::argsns_::term<term>, 0l> > >, 2l> >, expr<boost::proto::exprns_::basic_expr<boost::proto::tag::terminal, boost::proto::argsns_::term<int>, 0l> > >, 2l> >&>, 1l> >’ to type ‘char*’
    t.cudafe1.stub.c:8: error: invalid cast from type ‘const expr<boost::proto::exprns_::basic_expr<boost::proto::tag::address_of, boost::proto::argsns_::list1<expr<boost::proto::exprns_::basic_expr<boost::proto::tag::plus, boost::proto::argsns_::list2<expr<boost::proto::exprns_::basic_expr<boost::proto::tag::plus, boost::proto::argsns_::list2<expr<boost::proto::exprns_::basic_expr<boost::proto::tag::terminal, boost::proto::argsns_::term<term>, 0l> >, expr<boost::proto::exprns_::basic_expr<boost::proto::tag::terminal, boost::proto::argsns_::term<term>, 0l> > >, 2l> >, expr<boost::proto::exprns_::basic_expr<boost::proto::tag::terminal, boost::proto::argsns_::term<int>, 0l> > >, 2l> >&>, 1l> >’ to type ‘size_t’
    # --error 0x1 --

The CUDA compiler works by splitting the original C++ code into "host" code and "device" code.
The device code is completely lowered to C, so it appears the error occurs when the proto expression is passed to the C-lowered version of the device code. The proto expression is a simple POD type, so I am perplexed by the error. (BTW, when the host compiler is gcc 4.1.3, the above program causes a segmentation fault in gcc, but that's a topic for another forum.) This might well be the CUDA compiler's problem, but I dug a little deeper and transformed the proto expression into a simpler POD type as follows:

    #include <boost/proto/proto.hpp>
    namespace proto = boost::proto;

    struct filler {};

    template<typename tag, typename C0, typename C1 = filler>
    struct myexpr
    {
        C0 c0;
        C1 c1;
    };

    struct myadd {};
    struct myterm {};

    struct xform_callable : proto::callable
    {
        template<typename> struct result;

        template<typename This, typename LHS>
        struct result<This(LHS&)>
        {
            typedef myexpr<myterm, LHS> type;
        };

        template<typename LHS>
        typename result<xform_callable(LHS&)>::type
        operator()(LHS &l)
        {
            myexpr<myterm, LHS> e = {l};
            return e;
        }

        template<typename This, typename tag, typename LHS, typename RHS>
        struct result<This(tag, LHS, RHS)>
        {
            typedef myexpr<tag, LHS, RHS> type;
        };

        template<typename tag, typename LHS, typename RHS>
        typename result<xform_callable(tag, LHS, RHS)>::type
        operator()(tag, LHS l, RHS r)
        {
            myexpr<tag, LHS, RHS> e = {l, r};
            return e;
        }
    };

    struct xform
      : proto::or_<
            proto::when<proto::plus<proto::_, proto::_>,
                        xform_callable(myadd(), xform(proto::_left), xform(proto::_right))>
          , proto::when<proto::terminal<proto::_>,
                        xform_callable(proto::_value)>
        >
    {};

    template<typename> struct expr;

    struct domain : proto::domain<proto::pod_generator<expr> > {};

    template<typename E>
    struct expr
    {
        BOOST_PROTO_BASIC_EXTENDS(E, expr<E>, domain)
    };

    template<typename F>
    __global__ void kernel(F f) {}

    struct term {};

    expr<proto::terminal<term>::type> a = {{{}}};
    expr<proto::terminal<term>::type> b = {{{}}};
    expr<proto::terminal<term>::type> c = {{{}}};

    int main()
    {
        kernel<<<1,1>>>(xform()(b+c));
        return 0;
    }

The above program compiles fine with CUDA 3.2. That leaves me with the question: what does a proto expression contain, over and above myexpr here, that causes nvcc to fail? Again, this might be the CUDA compiler's problem, and they are the ones who should fix it. However, if someone on this forum can think of a simple fix to proto (or to whichever other boost library is actually causing the problem), that would be great too. Uniform template expansion across device and host code is a powerful feature of CUDA, and I can envision some cool libraries being written using proto with CUDA, so making proto work with CUDA would be a big win.

Manjunath