Boost logo

Boost :

From: Zach Laine (whatwasthataddress_at_[hidden])
Date: 2024-02-23 01:38:39


On Thu, Feb 22, 2024 at 4:36 PM Christian Mazakas via Boost
<boost_at_[hidden]> wrote:
>
> plaintext dump as requested:

Thanks! I was going to have to do it to respond here. :)

[snip]

> ### The Feedback
>
> The differences between Spirit.X3 and Parser need to be made much more
> clear by the
> documentation, by which I mean:
> > * Spirit X3 has rules that do not compose well — the attributes produced
> by a rule can change depending on the context in which you use the rule.
> > * Spirit X3 is missing many of the convenient interfaces to parsers that
> Spirit 2 had. For instance, you cannot add
> parameters to a parser.
> > * All versions of Spirit have Unicode support, but it is quite difficult
> to get working.
>
> These need code samples showing the limitations of X3 and how Parser
> actually solves them. I think many,
> myself included, aren't overwhelmingly clear on what these differences are
> and how Parser solves them
> and _why_ this is a good thing.

Fair enough. The first one I'll need to try to write a good example
for, but the basic idea is that the loose matching of a parser's
default attribute to a user-provided attribute is not something you
can turn off. So when you define a Spirit X3 rule with the attribute
type vector<int>, it can still fill in a deque<int>. This means that
some errors you write don't get diagnosed until way later than they
could, and you can't really write a rule and test it in isolation,
because the context in which you use it can change its behavior -- the
loose matching rules are very loose. That's my memory of the problem
anyway... I'll have to investigate a bit.

The second one refers to parameterized rules, like you find in the YAML spec:

[70] l-empty(n,c) ::=
  (
      s-line-prefix(n,c)
    | s-indent-less-than(n)
  )
  b-as-line-feed

When you write the l_empty parser, you want to be able to take two
arguments to it, n and c, and then pass those to the subparsers like
s_line_prefix. You could do that with Spirit 2 parsers, but not with
Spirit X3 parsers. I wrote a YAML parser, and so felt the loss of
this pretty strongly. I'm not sure how many people write parsers like
this though.

The last one you have dealt with yourself -- I think it might be
enough to say that Unicode support requires you to repeat a lot of
your parser logic.

> Though I have attempted X3's Unicode before and I will say, Parser's
> unified interface is much better.
> For example, I wrote an entire RFC-compliant URL parser in X3, including
> Unicode support, for my library
> Foxy
> <https://github.com/cmazakas/foxy/blob/7f4ac0495ad2ed9cd0eca5994743d677ac1d2636/include/foxy/uri.hpp#L353-L356>

Yep. I had written Unicode-aware Spirit parsers before, and want to
get rid of that kind of repetition.

> Not a criticism but Parser's directives are a dramatic improvement over X3.
> By these, I'm thinking of
> `boost::parser::string_view[]`, `boost::parser::separate[]` and
> `boost::parser::merge[]`. These should be
> highlighted in the comparison vs X3 as I think they add a good bit of
> substantive value.

Glad you liked these. These came out of discussions with Andrzej
about his frustration at having to write rules for some of his
parsers. Rather than embrace the complexity of the attribute rules
from Spirit, I thought lexically-visible control via directives would
be easier to reason about.

> More praise for the library includes its parse API which is a dramatic
> improvement over X3 in terms of
> ergonomics. Personally, I can see myself main using `prefix_parse`,
> similarly to X3's original parse() but
> I think the value of `parse()` itself is obvious and doesn't warrant much
> explanation.

Thanks! It took me about 6 different iterations to settle on a
minimal overload set.

> ---
>
> The library generates a dramatic amount of warnings when compiled with
> `-Wall -Wextra -pedantic`. I had
> originally forgotten to `include_directorie()` Parser as a `SYSTEM`
> directory and was met with 172 warnings.
>
> I think being clean under `-Wall -Wextra -Werror` is reasonable criteria
> for acceptance.

I basically never turn on -Wextra, but a I have a ticket to make sure
Parser is -Wall-clean: https://github.com/tzlaine/parser/issues/107

> When writing the parser for the HTTP version line, I ran into a lot of
> issues separating the major version
> from the minor version when using:
>
> parser::digit >> "." >> parser::digit
>
> It was only when I was lazily browsing the docs that I found the
> `separate[]` directive which I think indicates
> the library's documentation has poor discoverability.

This is described in the page about attribute generation:
https://tzlaine.github.io/parser/doc/html/boost_parser__proposed_/tutorial/attribute_generation.html
. It's definitely difficult to know where to put things where people
will see them.

[snip]

> The documentation doesn't exactly give users any guidelines about when they
> should formally
> define rules with attributes vs semantic actions.
>
> This is one of those things where if users don't already have the intuition
> built up, using the
> library is an exercise in frustration.
>
> Namely, attributes don't compose well when you start nesting rules and want
> to have different out
> params at different levels of the parsers. I suppose this should be obvious
> because the rules themselves
> form a hierarchy so the attributes _must_ mirror that.
>
> I found that the best thing to do for my use-case was to abuse
> `_globals(ctx)` and semantic actions.
>
> I think a good rule of thumb for users can be expressed as something like:
> use rule attributes for simple
> grammars where all you really want is just the `std::vector<double>`. For
> something complex with domain-defined
> validation rules, prefer `_globals(ctx)` and semantic actions for
> validating everything.

I can add something to the Best Practices page about this. Ticket:
https://github.com/tzlaine/parser/issues/110

> I felt like I came up with working but unsatisfactory solutions to parsing
> characters in a specific range.
>
> Namely, I just wrote a simple semantic action around `+parser::char_[f]`
> but this made me feel like I was
> writing unidiomatic code. I'd appreciate some feedback here:
>
> // reason-phrase = 1*( HTAB / SP / VCHAR / obs-text )
> // VCHAR = %x21-7E
> auto const on_reason_phrase = [](auto &ctx) {
> auto ch = _attr(ctx);
> if (ch != '\t' && ch != ' ' && (ch < '\x21' || ch > '\x7e')) {
> _pass(ctx) = false;
> return;
> }

This part could just have been: char_('\x21', '\x7e') | char_('\t') |
char_(' '). You might want to reorder it for the most common stuff to
come first. Oh, since you put it in an omit below, it's even simpler:
char_('\x21', '\x7e') | '\y' | ' '.

> response::metadata &md = _globals(ctx).md_;
> std::ranges::subrange<char const *> sub = _where(ctx);
> if (!md.reason_phrase_begin_) {
> md.reason_phrase_begin_ = sub.begin();
> }
> md.reason_phrase_end_ = sub.end();
> md.status_line_end_ = sub.end();
> };
>
> auto const reason_phrase =
> parser::omit[+parser::char_[on_reason_phrase]];
>
> ---
>
> There should be a debugging utility for discovering what the synthesized
> attribute is for a rule. Perhaps
> this could exist as a semantic action that attempts to instantiate an
> incomplete template.
>
> template<class> struct diagnostic;
>
> auto const debug=[](auto& ctx){diagnostic<decltype(_attr(ctx))> d;};
>
> It took me a long time to realize what the attribute type for `digit >> "."
> >> digit` was (it was `std::string`).

Since you are not the only person asking for this:
https://github.com/tzlaine/parser/issues/111

> The library seems to really emphasize Boost.Hana's `tuple` type but I think
> in practice, this winds up
> being a mistake unless there's substantive compile-time savings by using
> Hana here.
>
> Namely, Hana's `tuple` doesn't support structured bindings. Because Parser
> is a C++17 library, I consider
> this a somewhat large defect of the interface. What's worse, Hana's
> literals aren't compatible with Parser's.
>
> A huge boon for Hana and its `tuple` is that it supports indexing via
> `tup[0_c]` but this doesn't work when
> Parser's literal are involved as it also defines a `_c` so an ambiguity is
> introduced and compilation fails.
>
> Between this and the lack of structured binding support, I think relying on
> Hana here is a strict downgrade unless
> there's compelling reasons I'm simply unaware of.
>
> https://github.com/boostorg/hana/milestone/4

When using Hana, why would you need to use parser::literals? I can
only think of uses in the non-Hana compatibility stuff; when using
Hana you don't need that stuff, I think.

> Stuff like `_globals` in the docs would 404 for me. I also think the docs
> are kind of just incomplete in general
> and because there's not much cross-referencing, discoverability is hurt so
> a user has to essentially read the entire
> docs front-to-back to understand the library which is a large ask for a
> parsing library.

I'll audit the links. I did that at some point, but I guess there's
been breakage since. Ticket:
https://github.com/tzlaine/parser/issues/112

> ### Conclusion
>
> Overall, I think that even with all of the problems I found in the library,
> it's still quite good and worthy
> of being brought into Boost as the successor for Spirit. I don't think
> anything I found was a deal-breaker and
> I think in a few iterations of Boost, this library can be something quite
> great.

Thanks for the review, Christian!

Zach


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk