From: Дмитрий Архипов (grisumbras_at_[hidden])
Date: 2024-02-29 13:16:31


This is my review of Boost.Parser.

First of all, a disclaimer: I am employed by the C++ Alliance.

# Are you knowledgeable about the problem domain?

I've had a college course on formal languages and parsing, so I still remember
some fundamentals. I've also played with Spirit.Qi years ago.

# How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

I've spent about 4 hours reading the documentation, and 10 more hours
experimenting with the library.

# What is your evaluation of the potential usefulness of the library?

Parser combinators are useful for quite a lot of applications, and parser DSELs
have proven to be a popular approach. I agree with a previous reviewer that not
many people would prefer using an external tool to a parser library. I
definitely wouldn't.

# What is your evaluation of the documentation?

Currently the documentation looks a bit undercooked. It needs some
restructuring. For example, one page has a table of attributes for
library-provided parsers. Another has a table of attributes for parser
operators. Yet another repeats the content of those two pages and adds a bunch
of extra rules for how sequences and alternatives affect attributes, followed
by a table with examples. This creates some confusion at first about where to
look up information on attributes.

Another complaint I have is that the documentation does not advertise well
enough that the attribute of a parser with a semantic action is none. This is a
very important piece of information, because most parsers use actions. And yet
it is only mentioned in the operator tables, and never in the section on
semantic actions.

Also, I had a lot of success using _locals. But the docs only mention it in
passing. I feel like overall the documentation would have benefited from more
examples which solve some specific tasks using various library components.

# What is your evaluation of the design?

As far as I can tell, this library is largely a new take on Spirit's design.
As such, it won't blow anyone's mind. On the other hand, Spirit's design is
battle-tested: it is known which things work well and which don't. One of the
more significant changes is the handling of Unicode. I can attest to the
simplicity of using the library with Unicode inputs. More on this later.

Overall, I felt that using this library was easier than using Spirit.

One thing I want to note is that the library seems to have copied several
naming conventions from Spirit. I'm not convinced there's much value in naming
context accessors with a leading underscore. I'm also not convinced that a
dependency on Boost.Hana for the sole purpose of using its tuple's operator[]
is warranted.

# What is your evaluation of the implementation?

I didn't look at the implementation.

# Did you try to use the library? With what compiler? Did you have any problems?

I've written several test programs with it. I mostly used GCC 12, but also
tried building with Clang 17. I haven't encountered any issues when using GCC.
The Clang build seemed to work originally, but later it started to fail. I
suspect this is related to the inclusion of transcode_view.hpp.

The first thing I tried to do was implementing a parser for arithmetic
expressions. Converting a BNF to C++ was very straightforward, which is a plus.
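To illustrate why a BNF maps so directly onto code, here is a minimal sketch of
the same idea written by hand in plain C++, without Boost.Parser. The grammar,
the calculator class, and the evaluate function are all assumptions made for
this illustration (they are not taken from my test programs): each production
becomes one member function, whereas a parser library expresses each production
as an expression instead.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical grammar (EBNF), whitespace is skipped between tokens:
//   expr   = term   { ("+" | "-") term } ;
//   term   = factor { ("*" | "/") factor } ;
//   factor = number | "(" expr ")" ;
//   number = digit { digit } ;
class calculator {
public:
    explicit calculator(std::string_view input) : in_(input) {}

    // Returns the value, or nullopt on a syntax error or trailing input.
    std::optional<double> run() {
        auto v = expr();
        skip_ws();
        if (!v || pos_ != in_.size()) return std::nullopt;
        return v;
    }

private:
    std::optional<double> expr() {
        auto lhs = term();
        while (lhs) {
            if (eat('+')) {
                auto rhs = term();
                if (!rhs) return std::nullopt;
                *lhs += *rhs;
            } else if (eat('-')) {
                auto rhs = term();
                if (!rhs) return std::nullopt;
                *lhs -= *rhs;
            } else {
                break;
            }
        }
        return lhs;
    }

    std::optional<double> term() {
        auto lhs = factor();
        while (lhs) {
            if (eat('*')) {
                auto rhs = factor();
                if (!rhs) return std::nullopt;
                *lhs *= *rhs;
            } else if (eat('/')) {
                auto rhs = factor();
                if (!rhs) return std::nullopt;
                *lhs /= *rhs;
            } else {
                break;
            }
        }
        return lhs;
    }

    std::optional<double> factor() {
        if (eat('(')) {
            auto v = expr();
            if (!v || !eat(')')) return std::nullopt;
            return v;
        }
        return number();
    }

    // Unsigned decimal integers only, to keep the sketch short.
    std::optional<double> number() {
        skip_ws();
        std::size_t const start = pos_;
        double value = 0;
        while (pos_ < in_.size() &&
               std::isdigit(static_cast<unsigned char>(in_[pos_]))) {
            value = value * 10 + (in_[pos_] - '0');
            ++pos_;
        }
        if (pos_ == start) return std::nullopt;
        return value;
    }

    // Consumes c (after optional whitespace) if it is the next character.
    bool eat(char c) {
        skip_ws();
        if (pos_ < in_.size() && in_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    void skip_ws() {
        while (pos_ < in_.size() &&
               std::isspace(static_cast<unsigned char>(in_[pos_]))) ++pos_;
    }

    std::string_view in_;
    std::size_t pos_ = 0;
};

inline std::optional<double> evaluate(std::string_view s) {
    return calculator(s).run();
}
```

Calling evaluate("1 + 2 * 3") goes through expr, term, and factor in the same
order the grammar lists them, which is the property that makes the translation
mechanical.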

After that I thought of a task that requires Unicode support. So, I've made a
parser for Russian numerals. What makes this task interesting is that various
parts of a numeral have to be in the same gender, and some parts affect other
parts' plurality category (one, few, or many).

After finishing the second program I suddenly realised that the two parsers
can be combined to create a parser for arithmetic expressions written in
Russian. I managed to do this with only trivial changes in a couple of minutes.
This is a testament to the exceptional composability of the library's parsers.

Finally, I was looking for a way to avoid building and then accumulating
containers. This is where _locals came in handy.

The final program can be seen here:
https://gist.github.com/grisumbras/90d2e99b8eb8b6c82147188c8a6287f6.

While I was writing those parsers, using trace::on was of great help for
debugging errors. I still had a bunch of template errors several pages long,
particularly when parser attributes were not of the type I expected them to be.

A related issue is that I have been using the branch that allowed returning
values from semantic actions, and the library often couldn't correctly figure
out whether I wanted to pass a context to the action, or just the attribute.
In the end, I had to add an explicit return type of void to all actions that
take a context. With actions that take just the attribute, the problem
sometimes manifested when a lambda had several return statements. I predict
that this will be a significant usability issue.
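For readers unfamiliar with the underlying language rules, here is a small
sketch in plain C++ of the two workarounds. It does not use Boost.Parser:
fake_context, add_one, and small_number are hypothetical names invented for
this illustration, and it makes no claim about how the library itself detects
the kind of action.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a parse context; not a Boost.Parser type.
struct fake_context {
    int total = 0;
};

// A generic action that mutates the context. With "-> void" spelled out, the
// return type is part of the declaration, so a trait such as
// std::invoke_result can report it without deducing it from the body.
inline auto const add_one = [](auto& ctx) -> void { ctx.total += 1; };

static_assert(std::is_void_v<
    std::invoke_result_t<decltype(add_one), fake_context&>>);

// An action that returns a value from several return statements. Without the
// trailing return type the statements would deduce int and std::nullopt_t,
// and return type deduction would fail to compile.
inline auto const small_number =
    [](std::string const& word) -> std::optional<int> {
        if (word == "one") return 1;
        if (word == "two") return 2;
        return std::nullopt;
    };
```

In both cases the explicit return type removes the need for the compiler to
deduce anything from the lambda body.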

Several people have asked for a way to define rules without a macro. I have a
different request: I think that there should be a macro which, rather than
relying on a naming convention, allows you to explicitly name the parser that
implements the rule. This would allow people to use their own conventions.

I tried using nocase and discovered that the symbols parser ignores it. Is this
by design? If so, there should be a separate parser that allows case-insensitive
symbol matching.
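To make the request concrete, here is a rough sketch in plain C++ of what I
mean by case-insensitive symbol matching. It is not Boost.Parser code:
ci_symbols and ascii_lower are hypothetical names, the lookup matches whole
keys rather than input prefixes, and the case conversion is ASCII-only (real
Unicode input would need proper case folding).

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// ASCII-only lowercase conversion; non-ASCII bytes are left unchanged in the
// default "C" locale.
inline std::string ascii_lower(std::string_view s) {
    std::string out(s);
    for (char& c : out)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return out;
}

// Hypothetical symbol table that normalizes keys on insertion and on lookup,
// so "Plus", "PLUS", and "plus" all refer to the same entry.
template <class T>
class ci_symbols {
public:
    void add(std::string_view name, T value) {
        table_.insert_or_assign(ascii_lower(name), std::move(value));
    }

    // Returns the stored value, or nullopt if no symbol matches.
    std::optional<T> find(std::string_view name) const {
        auto const it = table_.find(ascii_lower(name));
        if (it == table_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::string, T> table_;
};
```

Either behaviour like this inside symbols when nocase is in effect, or a
separate parser built along these lines, would address the problem.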

Finally, several people asked for benchmarks. Conveniently, the library
includes a JSON parsing example, and conveniently I am maintaining Boost.JSON
and I know how to add new parser implementations to its benchmark runner. The
results aren't very good: Boost.JSON with default allocator is approximately
500 times faster than Parser on apache_builds.json. Niels Lohmann's JSON
library is 100 times faster.

# Do you think the library should be accepted as a Boost library?

I recommend ACCEPTING the library on several CONDITIONS:

* The main requirement is to fix build failures on popular C++ implementations.
  Related to this would be a well-functioning CI setup.
* Either symbols should support case-insensitive mode, or an alternative parser
  should be added to the library.
* There should be a rule definition macro that allows explicitly specifying
  the parser that implements the rule.

Overall, I enjoyed my experience with the library. Given that Spirit is being
sunsetted, Boost needs a replacement.

