Proto Compilers 101

 

The purpose of this doc is to bring you up to speed on what proto’s compilers do, and to make you proficient at applying transformations to expressions. Maybe someday we can figure out how to move this tree-manipulation logic into Fusion, but for now, proto’s compilers are the best (only?) tool for the job.

 

Expression Template Parse Tress

 

By now, you are probably familiar enough with the proto::expr<> tree data structure, and how it is generated by proto’s operator overloads. But just to recap, here is the forward declaration of struct expr<>:

 

template<typename Tag, typename Args, long Arity = Args::size>

struct expr;

 

Args is one of proto::argsN<>, and Tag can be anything, but is usually one of the operator tags defined by proto.

 

proto::argsN<> is a compile-time list of types. With one exception, the types are themselves “expression types,” where an expression type is either another instantiation of expr<> or an instantiation of proto::ref<expr<> >.  proto::ref<> is used to hold an expr<> by reference. The exception mentioned above is that when Tag is “tag::terminal,” Args is arg1<T> where T is allowed to be any type.

 

This framework allows us to build heterogeneous tree data structures where the nodes are stored by value or by reference. Often times, those trees are not in an ideal form for processing. Some tree transformations must be applied first. That’s where proto’s compilers come in.

 

Design Goals of Proto’s Compilers

 

The goal of proto’s compilers is to provide a general and somewhat declarative means for expressing transformations on proto parse trees. A secondary goal is to automate as much as possible and move common operations into the library.

 

The top-level function “proto::compile()” is used to apply a transformation to a parse tree. However, it is not an algorithm like “fusion::transform()”. “fusion::transform()” applies a transformation to each element in a flat(tened) sequence. But when transforming a parse tree, one transform is usually not enough. Rather, many small transforms must be applied, where each is only relevant to small parts of the tree. “proto::compile()” is just a shell that dispatches to user-specified transforms (a.k.a., compilers) that are looked up according to the properties of the tree currently being transformed.

 

The only concern of proto’s compilers is: “What transformation do I apply at a particular node in a parse tree?” There are two considerations:

 

  • Tag Dispatching: In all likelihood, the transformation to apply depends on the operator tag; e.g., for right shift, apply transformation A; for bitwise or, apply transformation B. The transformation to apply at any point in the parse tree should be dispatched based on the expression tag.
  • Domain Dispatching: A proto parse tree is an abstract entity. The same parse tree may mean one thing to Spirit and a different thing to xpressive, for instance. Therefore, all tree transformations must be associated with a domain, so that the parse tree is allowed to mean different things in different contexts. Therefore, the transformation to apply at any point in the parse tree should also be dispatched based on the domain tag.

 

These considerations lead naturally to a design where transformations are double-dispatched on the node type and domain tag. The exact form the dispatching has taken in proto is called “compiler<>” and it has the following forward declaration:

 

template<typename OpTag, typename DomainTag, typename EnableIf = void>

struct compiler;

 

You will direct proto to transform a parse tree by specializing struct compiler<> for the operator tags that appear in your DSEL and on your domain tag(s).

 

The Proto “compiler<>” Struct

 

Since proto parse trees are like Fusion data structures in that they are heterogeneous and statically typed, yet contain runtime data, proto compilers have two jobs:

 

  1. Compute a type by applying a compile-time transform
  2. Construct an object of the above type by applying a runtime transform

 

Therefore a compiler<> specialization has both a compile-time meta-function and a runtime function associated with it.

 

As with all statically bound, heterogeneous algorithms, compilers must operate in a “pure functional” environment where new values are computed from old, and state is accumulated, not mutated. Therefore, in addition to accepting the expression to transform, a compiler must also accept a “state” parameter, which represents the state of the transformation so far. Based on the requirements so far, our compiler<> might look like this:

 

template<>

struct compiler< tag::right_shift, spirit2 >

{

    template<typename Expr, typename State … >

    struct apply

    {

        typedef  … type;

    };

 

    template<typename Expr, typename State … >

    static typename apply<Expr, State …>::type

    call( Expr const &expr, State const & state … )

    {

        return …;

    }

};

 

The nested apply<> template is used for the type computation, and the static call() function does the runtime transformation.

 

For many reasons, it is often desirable to pass along a bundle of mutable data along with the immutable expression and state parameters. For this reason, proto compilers also accept a visitor parameter. You are free to use the visitor for anything. So the final form of the proto compilers is:

 

template<>

struct compiler< tag::right_shift, spirit2 >

{

    template<typename Expr, typename State, typename Visitor >

    struct apply

    {

        typedef  … type;

    };

 

    template<typename Expr, typename State, typename Visitor >

    static typename apply<Expr, State, Visitor>::type

    call( Expr const &expr, State const & state, Visitor & visitor )

    {

        return …;

    }

};

 

Note that the visitor is passed by non-const reference. This reflects its role at a bundle of mutable data.

 

The “proto::compile()” Function

 

This function is nothing more than a thin wrapper that dispatched to a appropriate user-specified compiler. Aside from some minor details regarding the handing of literals and references, what it does is essentially this:

 

template<typename Expr, typename State, typename Visitor, typename Domain>

typename meta::compile<Expr, State, Visitor, Domain>::type

compile(

            Expr const & expr,                  // The expr to compile

            State const & state,                 // The state of the transformation so far

            Visitor & visitor,                     // The visitor (is mutable)

            Domain /*tag*/                       // The domain tag

)

{

            typedef typename meta::tag<Expr>::type tag_type;

            return compiler<tag_type, Domain>::call( expr, state, visitor );

}

 

The Canned Compilers

 

The compilers are free to do anything they want. Usually, they rip the expression apart, recursively compile the constituents (possibly in a different domain), and then reassemble the results in some interesting way. Some transformations seem to show up again and again. It makes sense to factor these common patterns out. Here are the compilers that proto defines, along with high-level descriptions of their purpose:

 

fold_compiler:

For binary trees only. Compile the left sub-tree and use the result as “state” while compiling the right sub-tree. This is useful for flattening left-associative binary tree structures into lists.

reverse_fold_compiler:

For binary trees only. Compile the right sub-tree and use the result as “state” while compiling the left sub-tree. This is useful for flattening right-associative binary tree structures into lists.

transform_compiler:

Apply a user-specified transformation to a tree, then continue compiling the result with either the default compiler (the one proto::compile() would dispatch to), or with a user-specified compiler. In continuing with the “compiler” metaphor, the transform_compiler is like a preprocessor.

branch_compiler:

Compile the current tree as a “branch” and replace the branch with the result. This is accomplished in two steps. First, the “state” passed in is ignored, and the tree is compiled instead with a new user-specified “state” (and in a different domain). Then, the result (along with the original state and visitor) is passed to a user-specified compiler for further processing.

conditional_compiler:

Sometimes the double-dispatch on operator type and domain is not enough to pick a transform, and some other properties of the Expr, State and/or Visitor types must be queried in order to select the proper transform to apply. The conditional_compiler handles that. You pass it a compile-time predicate to evaluate against the (Expr,State,Visitor) 3-tuple, an IF compiler to use if the predicate evaluates to “true_” and an ELSE compiler to use otherwise.

pass_through_compiler:

For use when no transformation should be applied at a particular node. It recursively calls compile on the sub-trees and assembles the result back into an expr<>.

list_compiler:

Builds a fusion cons-list by using Expr as the head and State as the tail.

 

Examples

 

Example 1: Flattening a tree into a list

 

The following program uses the list_compiler and the reverse_fold_compiler<> to turn a tree of terminals in sequence with operator+ into a fusion cons-list.

 

#include <boost/xpressive/proto/proto.hpp>

#include <boost/xpressive/proto/compiler/list.hpp>

#include <boost/xpressive/proto/compiler/fold.hpp>

 

using namespace boost;

 

struct my_domain {};

 

namespace boost { namespace proto

{

    template<>

    struct compiler<tag::terminal, my_domain>

      : list_compiler

    {};

 

    template<>

    struct compiler<tag::add, my_domain>

      : reverse_fold_compiler<my_domain>

    {};

}}

 

int main()

{

    int visitor = 0;

    proto::meta::terminal<int>::type _1 = {1};

   

    //  cons<

    //      terminal<int>::type(1)

    //    , cons<

    //          terminal<double>::type(2.0)

    //        , cons<

    //              terminal<char>::type('3')

    //            , nil

    //          >

    //      >

    //  >

    proto::compile( _1 + 2.0 + '3', fusion::nil(), visitor, my_domain() );

 

    return 0;

}

 

 

Example 2: Flattening a tree into a list, with a transform

 

This example is like the above example, except that it uses the transform_compiler<> to pull the arguments out of the terminal nodes. The result is a cons-list of primitives.

 

#include <boost/xpressive/proto/proto.hpp>

#include <boost/xpressive/proto/compiler/fold.hpp>

#include <boost/xpressive/proto/compiler/list.hpp>

#include <boost/xpressive/proto/compiler/transform.hpp>

 

using namespace boost;

 

struct my_domain {};

 

namespace boost { namespace proto

{

    template<>

    struct compiler<tag::terminal, my_domain>

      : transform_compiler<

            arg_transform   // apply the "arg" transform, to extract the arg

          , my_domain       // stay in "my_domain"

          , list_compiler   // send the result to the list compiler

        >

    {};

 

    template<>

    struct compiler<tag::add, my_domain>

      : reverse_fold_compiler<my_domain>

    {};

}}

 

int main()

{

    int visitor = 0;

    proto::meta::terminal<int>::type _1 = {1};

   

    //  cons<

    //      int(1)

    //    , cons<

    //          double(2.0)

    //        , cons<

    //              char('3')

    //            , nil

    //          >

    //      >

    //  >

    proto::compile( _1 + 2.0 + '3', fusion::nil(), visitor, my_domain() );

 

    return 0;

}

 

Note that this example uses the canned “arg_transform” to extract the argument from the terminal nodes. “Arg_transform” is defined in proto as follows:

 

struct arg_transform

{

    template<typename Expr, typename, typename>

    struct apply

    {

        typedef typename meta::arg<Expr>::type type;

    };

 

    template<typename Expr, typename State, typename Visitor>

    static typename apply<Expr, State, Visitor>::type const &

    call(Expr const &expr, State const &, Visitor &)

    {

        return proto::arg(expr);

    }

};

 

Proto also provides a “left_transform” and a “right_trannsform”, as well as a “compose_transforms<>” for applying two transforms in sequence, and a “conditional_transform<>” for applying a transform depending on a compile-time predicate.

 

Example 3: Flattening a tree into lists of lists, with a transform

 

The following example is an extension of the previous. As before, we use the reverse_fold_compiler to flatten a tree of “plus” nodes, but we use a branch_compiler also to flatten nested trees of “multiply” nodes. The result is a list of lists.

 

#include <boost/xpressive/proto/proto.hpp>

#include <boost/xpressive/proto/compiler/fold.hpp>

#include <boost/xpressive/proto/compiler/list.hpp>

#include <boost/xpressive/proto/compiler/transform.hpp>

#include <boost/xpressive/proto/compiler/branch.hpp>

 

using namespace boost;

 

struct plus {};

struct multiply {};

template<typename T> struct my_domain {};

 

struct list_branch : proto::list_compiler

{

    typedef fusion::nil state_type;

};

 

namespace boost { namespace proto

{

    template<typename Domain>

    struct compiler<tag::terminal, my_domain<Domain> >

      : transform_compiler<

            arg_transform       // apply the "arg" transform, to extract the arg

          , my_domain<Domain>   // stay in the same domain

          , list_compiler       // send the result of the transform to the list compiler

        >

    {};

 

    template<>

    struct compiler<tag::add, my_domain<plus> >

      : reverse_fold_compiler<my_domain<plus> >

    {};

 

    template<>

    struct compiler<tag::multiply, my_domain<plus> >

      : branch_compiler<list_branch, my_domain<multiply> >

    {};

 

    template<>

    struct compiler<tag::multiply, my_domain<multiply> >

      : reverse_fold_compiler<my_domain<multiply> >

    {};

}}

 

int main()

{

    int visitor = 0;

    proto::meta::terminal<int>::type _1 = {1};

   

    //  cons<

    //      cons<int(1), cons<int(1), nil> >

    //    , cons<

    //          cons<int(1), cons<double(2.0), nil> >

    //        , cons<

    //              cons<int(1), cons<char('3'), nil> >

    //            , nil

    //          >

    //      >

    //  >

    proto::compile( _1*1 + _1*2.0 + _1*'3', fusion::nil(), visitor, my_domain<plus> () );

 

    return 0;

}

 

Note that when invoking the branch_compiler, we switch to another domain to compile the multiply operator. If we had stayed in the same domain, it would be an error because the branch_compiler would try to invoke itself recursively. When you want to use a branch_compiler to compile a sub-tree independently, you must switch to a different domain.

 

Future Directions

 

As we’ve discussed, the proto compilers are low-level and somewhat difficult to work with. The ultimate goal is to combine the tree transformation facilities with the proto meta-grammar facilities. The idea of “domains” is really just a hack to approximate different production rules within a grammar. In the future, tree transformations will be specified by writing a grammar and attaching transformations to the rules in your grammar.