Boost Users :
From: Joel FALCOU (joel.falcou_at_[hidden])
Date: 2008-06-17 13:41:56
Some more information to get things straight.
Given a pool of user-defined functions or functors, I want to be able to write
code like:
run( seq<Foo>() | seq<Bar>() );
and have this code generate a parallel pipeline using MPI.
To do so, I define a semantic rule for each parallel construction
(called a skeleton) that turns a sub-expression into a process network.
A process_network is a static datatype defined by a triplet <P,I,O>:
- P : a list of processes
- I : a list of input nodes
- O : a list of output nodes
All of these 'lists' are in fact mpl::vector of types or of mpl::long_<>.
The process_network code is roughly this:
template<class P,class I,class O>
struct process_network
{
  typedef P process;
  typedef I inputs;
  typedef O outputs;
  typedef typename bm::size<process>::type cardinal;
  static inline void run();
};
So P holds a list of processes. A process is formally defined by:
- a unique PID
- a descriptor, which is a list of "macro-instructions" for the
process to perform at runtime.
template<class ID,class DESC>
struct processus
{
  typedef ID pid;
  typedef DESC descriptor;
  typedef processus<ID,DESC> type;
  typedef typename descriptor::type code;
  static inline void run();
};
Finally, the descriptor is formally defined by:
- a list of PIDs of preceding processes
- a list of PIDs of following processes
- a list of macro-instructions
template<class IPID, class OPID,class CODE>
struct descriptor
{
  typedef IPID i_pids;
  typedef OPID o_pids;
  typedef CODE macro;
};
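To make the three definitions above concrete, here is a minimal sketch of how they nest for a single-process network such as the one seq<Foo>() would produce. It is only an illustration under assumptions: type_list and pid_ are hand-rolled stand-ins for mpl::vector and mpl::long_<> so that it compiles without Boost, Call<F> is a hypothetical macro-instruction, and the run() members are left out.

```cpp
#include <cstddef>
#include <type_traits>

// Stand-in for mpl::vector (assumption: a bare variadic list is enough here).
template<class... Ts>
struct type_list
{
  static constexpr std::size_t size = sizeof...(Ts);
};

// Stand-in for mpl::long_<N>, used as a PID.
template<long N> using pid_ = std::integral_constant<long, N>;

// Hypothetical macro-instruction: "call the user function F".
template<class F> struct Call {};

// Descriptor: PIDs of preceding processes, PIDs of following processes,
// and the list of macro-instructions.
template<class IPID, class OPID, class CODE>
struct descriptor
{
  typedef IPID i_pids;
  typedef OPID o_pids;
  typedef CODE macro;
};

// A process: a unique PID plus its descriptor.
template<class ID, class DESC>
struct processus
{
  typedef ID   pid;
  typedef DESC descriptor;
};

// A process network: the triplet <P,I,O>.
template<class P, class I, class O>
struct process_network
{
  typedef P process;
  typedef I inputs;
  typedef O outputs;
  typedef std::integral_constant<std::size_t, P::size> cardinal;
};

struct Foo {};  // hypothetical user function

// One process with PID 0, no neighbours, one macro-instruction.
typedef processus< pid_<0>
                 , descriptor< type_list<>, type_list<>, type_list< Call<Foo> > >
                 > p0;

// The network holds that single process; it is both input and output node.
typedef process_network< type_list<p0>
                       , type_list< pid_<0> >
                       , type_list< pid_<0> >
                       > net;

static_assert(net::cardinal::value == 1, "the network holds one process");
```

Everything here is resolved at compile time, which is why the check is a static_assert rather than a runtime test.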
Supported skeletons are:
- pipeline (operator|), as in the example above
- parallel execution (operator&)
- farming, which is tied to a special function called farm and used like:
  farm<N>( some skeleton expression )
All of these have to be built from the expression above. To do so I have
a set of rules. A rule takes two process networks and operates on their
lists of processes. For example, pipelining two process networks concatenates
their lists of processes, adds a Send macro-instruction to all 'outputs'
processes of the first network and a Receive macro-instruction to all
'inputs' processes of the second.
Finally, the global input/output of the new network is set accordingly.
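As an illustration of the pipeline rule just described, here is a rough sketch written with plain variadic templates instead of MPL. It is not my actual rule_seq: the processus type is simplified to a PID plus a list of macro-instructions, and type_list, pid_, Call, Send, Receive, append, prepend and patch are all hypothetical names introduced for the example.

```cpp
#include <type_traits>

template<class... Ts> struct type_list {};                      // stand-in for mpl::vector
template<long N> using pid_ = std::integral_constant<long, N>;  // stand-in for mpl::long_

// Hypothetical macro-instructions.
template<class F>    struct Call    {};
template<class To>   struct Send    {};  // send to the listed PIDs
template<class From> struct Receive {};  // receive from the listed PIDs

// Simplified process: a PID and its macro-instructions.
template<class ID, class CODE> struct processus { typedef ID pid; typedef CODE code; };

template<class P, class I, class O>
struct process_network { typedef P process; typedef I inputs; typedef O outputs; };

// Concatenation of two type lists.
template<class A, class B> struct concat;
template<class... As, class... Bs>
struct concat< type_list<As...>, type_list<Bs...> > { typedef type_list<As..., Bs...> type; };

// Membership test on a type list.
template<class T, class L> struct contains;
template<class T> struct contains< T, type_list<> > : std::false_type {};
template<class T, class H, class... R>
struct contains< T, type_list<H, R...> >
  : std::conditional< std::is_same<T, H>::value
                    , std::true_type
                    , contains< T, type_list<R...> > >::type {};

// Add a macro-instruction at the end or at the start of a process.
template<class Proc, class M> struct append;
template<class ID, class... Cs, class M>
struct append< processus<ID, type_list<Cs...> >, M >
{ typedef processus<ID, type_list<Cs..., M> > type; };

template<class Proc, class M> struct prepend;
template<class ID, class... Cs, class M>
struct prepend< processus<ID, type_list<Cs...> >, M >
{ typedef processus<ID, type_list<M, Cs...> > type; };

// Apply Op to every process whose PID appears in Pids; leave the others alone.
template<template<class, class> class Op, class Procs, class Pids, class M> struct patch;
template<template<class, class> class Op, class... Ps, class Pids, class M>
struct patch< Op, type_list<Ps...>, Pids, M >
{
  typedef type_list<
    typename std::conditional< contains<typename Ps::pid, Pids>::value
                             , typename Op<Ps, M>::type
                             , Ps >::type... > type;
};

// Pipeline rule: outputs of N1 send, inputs of N2 receive, lists are concatenated.
template<class N1, class N2>
struct rule_seq
{
  typedef typename patch< append, typename N1::process, typename N1::outputs
                        , Send<typename N2::inputs> >::type     senders;
  typedef typename patch< prepend, typename N2::process, typename N2::inputs
                        , Receive<typename N1::outputs> >::type receivers;
  typedef process_network< typename concat<senders, receivers>::type
                         , typename N1::inputs      // global inputs come from N1
                         , typename N2::outputs     // global outputs come from N2
                         > type;
};

struct Foo {}; struct Bar {};  // hypothetical user functions

typedef process_network< type_list< processus< pid_<0>, type_list< Call<Foo> > > >
                       , type_list< pid_<0> >, type_list< pid_<0> > > n1;
typedef process_network< type_list< processus< pid_<1>, type_list< Call<Bar> > > >
                       , type_list< pid_<1> >, type_list< pid_<1> > > n2;
typedef rule_seq<n1, n2>::type piped;

static_assert(std::is_same< piped::inputs,  type_list< pid_<0> > >::value, "inputs from n1");
static_assert(std::is_same< piped::outputs, type_list< pid_<1> > >::value, "outputs from n2");
```

In this sketch, process 0 ends up with Call<Foo> followed by a Send to PID 1, and process 1 with a Receive from PID 0 followed by Call<Bar>.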
Currently, I do the following (and yes it is ugly) :
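template<class X, class ID>
struct apply_rule< bp::expr<tag::bitwise_or,X,2>, ID >
{
  // the expression type this specialization matches
  typedef expr<tag::bitwise_or,X,2> base;
  // the two children of the binary node
  typedef typename result_of::arg_c<base, 0>::type::expr arg1;
  typedef typename result_of::arg_c<base, 1>::type::expr arg2;
  // left child: its network and the next free PID
  typedef typename apply_rule<arg1,ID>::type pn1;
  typedef typename apply_rule<arg1,ID>::pid next_pid;
  // right child starts numbering where the left one stopped
  typedef typename apply_rule<arg2,next_pid>::type pn2;
  typedef typename apply_rule<arg2,next_pid>::pid pid;
  // combine both networks with the pipeline rule
  typedef typename rule_seq<pn1, pn2>::type type;
};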
Basically, I extract the arguments of a proto expression, evaluate each
child of the expression to get its process_network, retrieve the next
available PID, and apply the corresponding rule to those intermediate
results.
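The PID threading described above can be shown in isolation with a small sketch that uses no Proto at all. Here seq_node and pipe_node are hypothetical stand-ins for the Proto terminal and the bitwise_or expression, the "network" is reduced to a flat list of processus<ID,F>, and the rule is reduced to a concatenation, so only the numbering logic remains.

```cpp
#include <type_traits>

template<class... Ts> struct type_list {};                      // stand-in for mpl::vector
template<long N> using pid_ = std::integral_constant<long, N>;  // stand-in for mpl::long_

// Hypothetical stand-ins for the expression nodes.
template<class F>          struct seq_node  {};  // plays the role of seq<F>()
template<class L, class R> struct pipe_node {};  // plays the role of operator|

// Simplified process: just a PID and the user function it runs.
template<class ID, class F> struct processus {};

template<class A, class B> struct concat;
template<class... As, class... Bs>
struct concat< type_list<As...>, type_list<Bs...> > { typedef type_list<As..., Bs...> type; };

template<class Expr, class ID> struct apply_rule;

// Leaf: takes the current PID and reports the next free one.
template<class F, class ID>
struct apply_rule< seq_node<F>, ID >
{
  typedef type_list< processus<ID, F> > type;
  typedef pid_<ID::value + 1>           pid;
};

// Binary node: the right child starts where the left child stopped.
template<class L, class R, class ID>
struct apply_rule< pipe_node<L, R>, ID >
{
  typedef apply_rule<L, ID>                 left;
  typedef apply_rule<R, typename left::pid> right;
  typedef typename concat< typename left::type
                         , typename right::type >::type type;
  typedef typename right::pid pid;  // next free PID after the whole sub-tree
};

struct Foo {}; struct Bar {}; struct Baz {};  // hypothetical user functions

// Models ( seq<Foo>() | seq<Bar>() ) | seq<Baz>(), numbering from PID 0.
typedef apply_rule< pipe_node< pipe_node< seq_node<Foo>, seq_node<Bar> >, seq_node<Baz> >
                  , pid_<0> > result;

static_assert(std::is_same< result::pid, pid_<3> >::value, "three PIDs were consumed");
```

The point of the sketch is that each node returns two things, its network and the next free PID, and that the second child must be evaluated with the PID returned by the first.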
Doing so works well for all my rules, even in complex cases. When using
proto transforms and fold_tree, it doesn't. In fact, it even looks like
the next PID and the temporary network are badly computed, which leads me
to think that I am misusing the state and such.
-- Joel FALCOU Research Engineer @ Institut d'Electronique Fondamentale Université PARIS SUD XI France