Boost logo

Boost :

Subject: [boost] Interest in an LLVM library?
From: Andy Jost (Andrew.Jost_at_[hidden])
Date: 2015-11-17 12:35:05


Is there interest in an LLVM library? My motivation is to simplify the dynamic creation of functions, modules, and programs by defining an EDSL modeled after the C language. LLVM provides an API based exclusively on function calls that is verbose and difficult to use. The EDSL is more expressive and easier to use. (This may remind some people of Boost.Python and indeed these libraries are similar in motivation.)

The scope of this proposal is limited. The proposed library does not compile, link, optimize, save, or load anything, for example, since the LLVM API is suitable for that. Also, it may not support obscure features, even if they seem to be within its purview, since one can always make direct calls to the LLVM API.

I have written about 6500 lines of prototype code in the context of another project (a compiler for the language Curry, if anyone is interested). It is not a complete solution, but is by no means trivial, either. I'll discuss that next, but since the discussion is somewhat lengthy, let me first explain what I'm looking for by sending this. I think this EDSL would make a good Boost library. If others agree, then the next step is to review the design. I need second opinions and help from more knowledgeable people to improve the design. There are a few places in particular where I think a redesign is necessary. After that, I'm happy to implement the final library (and would gladly accept help, if anyone is interested).

Now, I'll show a few examples and provide links to the prototype code. First, here's how to create a module and put it in scope:

module const example_module("example");
scope _ = example_module;

The scope object sets the insertion point for types, functions, and global variables.

We can create a type as follows:

                type i32 = int_(32);

i32 is a 32-bit integer. It is easy to build more complex types, e.g., i32[2] is an array type, *i32 is a pointer type, and i32(i32,i32) is a function type. An expression like i32(42) produces a constexpr. As you might guess at this point, the prototype makes heavy use of SFINAE. I've gone to a lot of effort to make this work with everything I can think of, particularly multi-dimensional arrays and initializer_lists. So, an expression like (i32[3][2])({{1,2,3},{4,5,6}}) works! (Note: i32[3][2] is the C++ type int32_t[2][3] because that's the only sensible way to interpret successive calls to operator[]).

We can create a struct from a sequence of types like this:

                type foo = struct_("foo", {i32, *i32});

Or, we can leave out the sequence to get an opaque struct.

A function prototype can be created as follows:

                function const f = extern_(i32(i32), "f", {"i"});

This function, f, takes an i32 (named "i") and returns an i32. The function is placed in example_module, since the code appears in its scope. There are two other linkage-specifying functions: static_ and inline_. extern_ and static_ are also used to create global variables (i.e., if the type is not a function). To define the body, we create another scope:

                {
                  scope _ = f;
                  // code for f.
                }

To understand this, we have to discuss scopes in more depth. There are three nested levels of scope: the current module, the current function, and the current basic block. The code above sets the current function to f and the current basic block to f's entry point. Generally speaking, within a function body we write statements that generate instructions. Those instructions are inserted into the current function at the end of the current basic block. To define the body, we can declare values and operate on them. For example:

                  value i = arg("i");

Here, "arg" is a special function that fetches the named function parameter. We can add instructions to f using operators or special methods. For example:

                value j = i + 1;
                return_(j);

Branches manipulate the current basic block. Here's an if statement:

                label true_path, false_path;
                if_(i, true_path, false_path);
                {
                  scope _ = true_path;
                  // code for true.
                }
/* else */
                {
                  scope _ = false_path;
                  // code for false.
                }
                // new basic block here.

This generates instructions to test i, take the appropriate branch, and then continue to a new basic block in the right places (which depends on whether the paths terminate with a branch instruction). It also updates the scope so that statements following if_() are inserted into the new basic block. Believe me when I say it is not trivial to do this with the LLVM API. Although it is sometimes necessary to pre-declare labels as shown above, we can usually employ C++11 lambdas to simplify the encoding:

                if_(i
  , []{ /* code for true */}
  , []{ /* code for false */ }
  );

The prototype code can be found under https://github.com/andyjost/Sprite-3/tree/master/include/sprite/backend. Also, there are many examples under https://github.com/andyjost/Sprite-3/tree/master/examples. Other parts of the Sprite-3 project are unrelated to this proposal. A good example to start with generates the Fibonacci numbers (https://github.com/andyjost/Sprite-3/blob/master/examples/fib.cpp). Here's the interesting part:

  // Create a new module and associate it with this lexical scope.
  module const fib_module("fib");
  scope _ = fib_module;

  // Declare types.
  auto const char_ = types::char_();
  auto const i32 = types::int_(32);
  auto const i64 = types::int_(64);

  // Declare external functions.
  auto const printf = extern_(i32(*char_, dots), "printf");

  // Define the @fib function.
  auto const fib = extern_(i64(i64), "fib", {"n"});
  {
    scope _ = fib;
    value n = arg("n");
    if_(n <(signed_) (2), []{return_(1);});
    return_(fib(n-2) + fib(n-1));
  }

  // Define the @main function.
  auto const main = extern_(i32(), "main");
  {
    scope _ = main;
    value const x = fib(5);
    printf("fib(5)=%d\n", x);
    return_(0);
  }

After this code runs, fib_module contains an intermediate representation of the program. The LLVM API can be used to save that to disk, compile it to assembly, link in the C library (to get printf), or JIT compile any of the functions, among other things.

Thanks to everyone who made it this far :) I'm looking forward to your feedback.

-Andy


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk