Boost logo

Boost :

From: Alan de Freitas (alandefreitas_at_[hidden])
Date: 2024-02-29 03:17:54


Boost.Parser Review

I'd like to thank Zach for his work. Here's my review.

> Does this library bring real benefit to C++ developers for real-world
use-case?

Yes. Boost.Spirit is already a great library that exemplifies what's
possible in C++. But it's pre-C++11 and no longer in active development.
So even if this were only a modern and actively maintained version of
Spirit, that's already good enough considering how important the problem is.
It seems like many people have issues with Spirit. For instance, we avoided
using it in Boost.URL and know of other people who have also been avoiding
it in their projects.

Some things I like about Boost.Parser:
- Boost.Parser uses the same conventions as Spirit to combine parsers.
- Being able to use C++17 vocabulary types in the API
- Unicode support
- Better error reporting
- The Boost.Spirit authors gave it their blessing
- The library name is very direct

The issue with Spirit people seem to mention most frequently for avoiding
it is compile times.
Unfortunately, there's not a lot that can be done by Boost.Parser here, but
users can still mitigate that by only including the library in their source
files if they want.

> Do you have an application for this library?

We have lots of libraries where we need to parse things.
I would consider Boost.Parse for new libraries once it's more stable.
Even if we intend to use custom parsers in the end, Boost.Parser could
provide a good starting point.

> Does the API match with current best practices?

Using the same conventions as Spirit to combine parsers seems like a good
design here.

I was a little confused about the convention to assume unicode for anything
other than char.
I'm not sure that's the convention that minimizes surprise.
I would expect the same convention (unicode or not) to apply to any data
type because:
- It's one more thing to remember.
- I often don't want to interpret things as code points regardless of
whether it's char or unsigned char.
- I noticed many people also don't care about unicode most of the time in
their use cases.
- It also seems to be the case that the relevant keywords in most grammars
are ASCII.
- The documentation says "the Unicode parsing API will not actually cost
you anything" but two sentences later it says "one caveat is that there may
be an extra branch on each char, if the input is UTF-8"
At least, the library documentation could make it very clear right at the
start how the user can explicitly specify if they want to enable/disable
unicode support for whatever data type they wish.

I find the macros to define rules a little unsettling.
- Isn't it possible to at least document what things would look like
without a macro so that readers can have a good idea of what the macro is
doing whenever they use it?
- A single macro to define the rules could also be useful.
- It would also be useful if the cases where macros are not as useful
(non-recursive rules?) could be specified in the documentation.

About Hana:
- I don't like optional dependencies in general, so I also don't like the
idea of supporting std::tuple and hana tuples separately.
- Since the library is boost/standalone and tuple/hana, then that's already
four combinations that are unlikely to be tested for all relevant compilers
and environments.
- That's more possibilities of conflicts for users compiling different
versions of the library.
- I especially don't like the idea of using another type for tuples when we
already have a vocabulary type from the standard.
- I feel like whatever Hana provides can be achieved with std::tuple and
maybe some other library like mp11 in extreme cases.
- It's probably possible to support Hana as a peer-dependency where the
rule would fill any tuple-like object according to some concept (the rules
can already do that for containers, variants and optionals if I'm not
wrong).
- Including all of <boost/hana.hpp> can increase compile times
significantly.
Maybe I'm missing something and Hana enables something great but I don't
see it. If it were that great and important, it wouldn't be an _optional_
dependency.

Maybe the parsers for numbers could use Boost.Charconv under the hood.
I don't know if that's feasible for all cases.

In our applications, we often have charsets defined by functions that
return whether a char is reserved in a grammar.
I couldn't find a good way to express that with Boost.Parser (although
operator|() combining charsets are often enough).
If that's correct, it would be good to have some alternatives here.
It would also be good to have rules for very common charsets if that
doesn't already exist.

> Is the documentation helpful and clear?

Yes. The documentation is clear. It's driven by goals and examples, which
is great.
Knowing Boost.Spirit helped me, so it might be useful to give more weight
to feedback from people who don't know Boost.Spirit yet.

I found the section on "Writing Your Own Parsers" very limited considering
that we know "some people are obsessed with writing everything for
themselves".
Very often, we have ad hoc rules that are hard to express as a combination
of other rules.
At other times, as a library evolves, custom rules can make things less
generic and more efficient even if they could be expressed with the more
generic rules.
For instance, many examples in the documentation would be more easily
represented as functions in my opinion.
So I find this insufficient: "It is a basic orientation to get you familiar
enough with all the moving parts of writing a parser that you can then
learn by reading the Boost.Parser code" because the documentation should be
exposing all the concepts and documenting everything.
Telling such a usual user to read the code to reverse engineer how it
should be done is no good. Especially because the number of concepts and
requirements involved is probably not much larger than what's already
documented.

Are there small examples of binary parsers somewhere?

The reference is just kind of ugly and it's difficult to find individual
symbols.

The difference between this library and Spirit versions should be more
explicit and justified in the documentation.
Some pros and cons only apply to certain versions of Spirit, so that should
also be explicit instead of just mentioning Spirit 2 _or_ Spirit X3.
I think some other people were also confused about that.

The section "The Parse Context" is a little confusing.
Unlike other sections, it's not goal-oriented with lots of examples.

The section "Alternative Parsers" is a little weird.
I don't see a reason to discuss operator|() individually in a single
section and not discuss other operators and the previous or next sections.
I would expect a section listing all operators (as exists later) and
operator|() to just be part of it.

I find the TOC overwhelming (reminds me of the Boost.Asio documentation a
little).
There are too many sections under "Tutorial".
Almost the whole thing is under "Tutorial", which makes categorization
somewhat pointless.
It would be good to use Miller's Law at each level here so that the reader
can always know where he is.
This whole "Tutorial" section doesn't even look much like a tutorial.

One thing I'm also missing is some cheat sheets with all the symbols
provided by the library related to what a given section is discussing.

> Did you try to use it? What problems or surprises did you encounter?

Yes. It worked fine with MSVC over here but CI needs to improve because
people have reported failures with other compiler versions.
It's not a reason to reject the library but it's something that needs to be
fixed and causes a bad first impression.
CI should also include coverage, more warnings, sanitizers, and probably
fuzz tests.
I believe Sam can also help with Drone if that's needed.

> What is your evaluation of the implementation?

The code is very well organized and properly documented.
I found some issues in the CMake scripts but I'll just open issues in the
repository for these.

> Are you knowledgeable about the problem domain?

We have been using custom parsers in almost all of our libraries. I've also
used Boost.Spirit before.
In most cases, we just write custom logic for the parsers now.
But something like Boost.Parser could save us a lot of time in new
libraries even if we intend to use custom parsers later.

> Please explicitly state that you either *accept* or *reject* the
inclusion of this library into boost.

I recommend accepting the library.

> Also please indicate the time & effort spent on the evaluation and give
the reasons for your decision.

I spent time on and off over the last weeks and this week.
In total, I spent something like 2 or 3 days evaluating the library.

> Disclaimer

I'm affiliated with the C++ Alliance.

-- 
Alan Freitas

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk