Boost logo

Boost :

Subject: [boost] [metaparse] review
From: Gordon Woodhull (gordon_at_[hidden])
Date: 2015-06-03 17:49:22


Hi all,

I believe that Metaparse should be accepted into Boost.

> - What is your evaluation of the potential usefulness of the library?

As mentioned by others, Metaparse is a library for library developers. In terms of functionality, it is closest to Boost.Proto, although it allows defining arbitrary EDSLs as strings, instead of using the built-in C++ operators.

In order to test the effectiveness of Metaparse, I built a parser for the Graphviz "dot" language [1]. This is of practical importance to me, because I have an interest in graph metaprogramming [2], specifically generating types and networks of types from graph metadata. Having a nice language for defining graphs in metadata, to support a syntax beyond angle brackets, would be very helpful.

One limitation of this class of EDSL is that it can produce arbitrarily complex types, but it can't refer to any existing types. E.g. you might have the string "int" in your EDSL, but in order for it to refer to the C++ int type, you are going to need an mpl::map from type-strings to types. I didn't get this far in my evaluation, but it's really beyond the scope of the library.

> - Are you knowledgeable about the problem domain?

Fairly knowledgeable. I have built an experimental parser for compile-time EDSLs over type expressions called "Angly" [4] (presented at C++Now 2012). I have learned from the considerably simpler design of Metaparse, and hope to apply it to my own library (in particular using recursive descent rather than trying to wrangle a deterministic finite automaton).

> - How much effort did you put into your evaluation? A glance? A quick
> reading? In-depth study?

I spent about 5 hours getting set up and working through the tutorial, and about 7 hours getting up and running with my own parser. Once I "got it", it only took an hour or two to implement the rest of the grammar (and about 4 hours to write this review).

My results (grammar and a few basic tests) are in a Gist available here: [3].

My grammar is able to parse "dot" graph descriptions that look like this

"graph g { A [x=y]; \
 B->C [y=z, w=a]; \
 D->E; \
 F; \
 G->H [a=b, b=c, c=d, d=e, e=f, f=g]; \
 I->J; \
}"

Because I ran out of time, my parser only produces an AST; it does not yet produce MPL.Graph or Fusion.Graph types.

> - Did you try to use the library?
> - With what compiler?
> - Did you have any problems?

I mostly used metashell for my evaluation, but I also tested compilation times with clang 3.5 once I had something running.

Any problems I had were the usual difficulties of TMP.

I was not expecting the compile times to be tolerable, but even in metashell, the turnaround was usually around 15-20s. Compiling my longest test with clang, just shy of 256 characters, took only about 4-5 seconds.

> - What is your evaluation of the documentation?

Excellent. Working through the tutorial was enough for me to absorb the particular constructs needed for building grammars and semantic actions.

I also used the reference documentation when building my own parser. This documentation is more terse, very much in the style of the MPL documentation. But it functions.

I only skimmed the user manual, since the content looks a little bit stale (doesn't mention build_parser) and somewhat redundant with the tutorial.

As Andrzej mentions, there may need to be an introduction concerning the scope of the library and what you can and can't do with this kind of EDSL, where you would use Proto instead of Metaparse, etc.

I think that as the library evolves, the documentation will probably need more general parsing background, of the sort you will find in Spirit documentation. For example, the tutorial covers the unary `-` operator, but it doesn't mention why there is no ambiguity with the binary `-` operator.

Also, it seems (as mentioned in the design discussion below) that the grammar must be unambiguous, and this is mentioned nowhere.

Overall, the documentation is stronger on the meta- than the -parse.

> - What is your evaluation of the implementation?

I did not peer into the code very much, beyond finding some examples I needed. My impression from Abel's walk-through at C++Now 2012 is that the implementation is quite simple and straightforward, based on established techniques from parsers in Haskell.

> - What is your evaluation of the design?

Very good. However, I think the API might be fleshed out a little bit once the library gets more usage. In particular, I found myself writing a few helper metafunctions that looked like this:

template<typename EdgeDecl>
struct declare_edge {
  typedef edge_decl<typename mpl::at_c<EdgeDecl, 0>::type,
                    typename mpl::at_c<EdgeDecl, 2>::type,
                    typename mpl::at_c<EdgeDecl, 3>::type> type;
};

invoked from a transform like this:

using edge_stmt = transform<
  sequence<ID, arrow_token, ID, opt<attr_list, mpl::vector<>>>,
  mpl::quote1<declare_edge>
>;

It may be my inexperience, but it struck me that it would be helpful to have a "spread" or "apply" transform so that the items in the mpl::vector produced by sequence would be supplied as parameters to declare_edge, instead of having to pull them out using mpl::at_c. There may be some trick I don't know here.

I will mention two other points of confusion. These are probably not design flaws in the library, but I thought I would bring them up in case they are.

First, I started out defining my AST nodes like this:

template<typename Tail, typename Head, typename Attrs>
struct edge_decl {};

Seems like a reasonable basic type-holder to me. As shown above, this is instantiated and returned by a helper metafunction invoked by the transform which pulls stuff out of an mpl::vector.

But then somewhere in metaparse, it tries to again "invoke" this result with ::type when appending it to another mpl::vector.

Really? So my transform needs to return not the value, but a metafunction which returns a value?

I couldn't figure this out, so I gave each of my AST nodes a self-referential ::type in the way that mpl::vector has:

template<typename Tail, typename Head, typename Attrs>
struct edge_decl {
  typedef edge_decl type; //why?
};

Something is not right here!

The second problem I encountered is about ambiguous grammars. For the "dot" language it seems natural to define a statement this way:

using stmt = one_of<node_stmt, edge_stmt>;

where a node statement looks like

ID [attr=value, attr=value];

and an edge statement looks like

ID -> ID [attr=value];

But there's an ambiguity here, and once the parser successfully reads a node, it is going to get upset if it sees a `->`. So the rule has to be written the other way:

using stmt = one_of<edge_stmt, node_stmt>;

Similarly, the `;` separator for declarations is optional in the "dot" language, but I couldn't figure out how to implement this, because if you try to have an optional element at the beginning of a rule:

template<typename Parser, typename Default = void>
using opt = one_of<Parser, return_<Default>>;

using stmt_list = foldlp<
  sequence<opt<semi_token>, stmt>,
  transform<stmt, mpl::quote1<mpl::vector1>>,
  // ...

then the parser will greedily eat the epsilon and have nowhere to go (at least according to my amateur understanding of parsers).

I gave up here and made the separator mandatory. Again, this is probably not a flaw in the design, but the documentation should describe what kinds of grammars will work, and perhaps give pointers to other resources about disambiguating grammars. Probably there is a different construct which would help here.

Cheers,
Gordon

[1] http://www.graphviz.org/doc/info/lang.html
[2] http://metagraph.info/wp-content/uploads/2011/05/mpl.graph_.pdf
[3] https://gist.github.com/gordonwoodhull/dcbb332eea06dd3279b9
[4] https://github.com/gordonwoodhull/metagraph


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk