From: Zach Laine (whatwasthataddress_at_[hidden])
Date: 2023-12-29 19:09:00


On Fri, Dec 29, 2023 at 10:35 AM Peter Dimov via Boost
<boost_at_[hidden]> wrote:
>
> Zach Laine wrote:
> ...
> > I'm calling my proposal Boost.Parser, and it follows many of the conventions
> > of Boost.Spirit 2 and X3, such as the operators used for overloading, the
> > names of many parsers and directives, etc. It requires C++17 or later.
> ...
>
> > The Github page is here: https://github.com/tzlaine/parser
> > The online docs are here: https://tzlaine.github.io/parser
>
> Some observations:
>
> I understand, in principle, the motivation behind asserting at runtime
> instead of failing compilation, but I don't think the same argument applies
> to rejecting *eps parsers. It seems to me that a static assert for any *p or
> +p where p can match epsilon (can succeed while consuming no input)
> would be clear enough. (E.g. +-p, *(p | q | eps), *attr(...), +&p, etc.)

Why? It may be better to static_assert, but it's not clear to me why

> Interestingly, this would reject **p and +*p, because these parsers can
> go into an infinite loop. The current behavior is to collapse them into *p,
> which is useful, but technically wrong. This raises the possibility of, instead
> of rejecting *p or +p when p can match epsilon, just 'fixing' its behavior so
> that when p matches epsilon, the outer parser just exits the loop. This will
> make the current collapsing behavior equivalent to the non-collapsed one.

At first, I thought this was a great idea. Now I'm ambivalent. The
way I might implement this is in repeat_parser (that's the only
looping parser, modulo its subclasses). I could then do a couple of
things:

1) detect that we have not eaten any of the input, but have matched
repeat_parser's subparser, and terminate the repetition; or
2) detect that we have matched repeat_parser's subparser, *and* that
the subparser is an unconditional match, and terminate the repetition.
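
A minimal sketch of what option 1 could look like, using a toy parser
model rather than Boost.Parser's actual repeat_parser. The names
toy_parser, toy_char, toy_eps and toy_star are all hypothetical and exist
only for this illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string_view>

// Toy model, NOT Boost.Parser's real interface: a parser takes the input
// and a position, advances the position on success, and reports a match.
using toy_parser = std::function<bool(std::string_view, std::size_t&)>;

// Matches one specific character.
inline toy_parser toy_char(char c) {
    return [c](std::string_view in, std::size_t& pos) {
        if (pos < in.size() && in[pos] == c) { ++pos; return true; }
        return false;
    };
}

// Always matches and consumes nothing, like eps.
inline toy_parser toy_eps() {
    return [](std::string_view, std::size_t&) { return true; };
}

// Kleene star with the option-1 guard: stop as soon as the subparser
// matches without consuming input. Returns the number of matches.
inline std::size_t toy_star(const toy_parser& p, std::string_view in,
                            std::size_t& pos) {
    std::size_t matches = 0;
    for (;;) {
        std::size_t const before = pos;
        if (!p(in, pos)) { pos = before; break; }  // ordinary failure
        ++matches;
        if (pos == before) break;  // matched epsilon: exit, don't spin
    }
    return matches;
}
```

With this guard, toy_star(toy_eps(), ...) returns after one iteration
instead of looping forever, and no parser type needs to be tagged.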

#1 is nice, because you don't need any way of tagging parser types as
being epsilon-like. Without this or some similar approach you could
end up with a closed set of types that trigger this short-circuiting.
This seems like a maintenance problem for me, but moreover an
extensibility problem for users. #2 suffers from this closed-set
problem.

To fix #2, I could add a template param (or constexpr static member,
same diff) that acts as a tag.
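
One way the constexpr-static-member variant might look, as a sketch only.
The member name matches_empty, the trait always_matches_empty, and the
parser types below are assumptions made up for illustration; none of them
are taken from Boost.Parser:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical parser types; the member name `matches_empty` is an
// assumption for this sketch.
struct toy_eps_parser {
    static constexpr bool matches_empty = true;  // opts in to the tag
};
struct toy_char_parser {
    char expected;  // no tag: treated as "not an unconditional match"
};

// Detection trait: false unless the type declares matches_empty.
template <typename P, typename = void>
struct always_matches_empty : std::false_type {};

template <typename P>
struct always_matches_empty<P, std::void_t<decltype(P::matches_empty)>>
    : std::bool_constant<P::matches_empty> {};

template <typename P>
inline constexpr bool always_matches_empty_v =
    always_matches_empty<P>::value;

// A user-defined parser opts in the same way, so the set stays open.
struct user_noop_parser {
    static constexpr bool matches_empty = true;
};

static_assert(always_matches_empty_v<toy_eps_parser>);
static_assert(!always_matches_empty_v<toy_char_parser>);
static_assert(always_matches_empty_v<user_noop_parser>);
```

Because the trait defaults to false, existing parsers need no changes,
and users can tag their own types without touching the library.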

#1 is problematic though, and anything where the no-input-consuming
match is conditional is equally problematic. Each parser could have
arbitrary side effects, via semantic actions. So this parser:

*(if_(c)[p] | eps[a])

could match the eps first, if 'c' evaluates to false, and later match
'p', depending on what 'a' does. If 'a' flips the value of 'c', then
the parse will always match 'p'. If 'a' increments a counter, then
the parse might eventually match 'p', but just take a long time to do
it; this case might also result in an infinite loop. In the counter
case that does end in a match, 'a' might also perform some other
important side effect along the way. This may be a useful pattern to
someone, somewhere.

This is obviously contrived, but the point is that there are currently
some things that you can express that would become non-expressible.
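
To make the lost expressivity concrete, here is a toy simulation of
*(if_(c)[p] | eps[a]) where 'p' matches the character 'x' and 'a' flips
'c'. This is a hand-rolled loop, not Boost.Parser code; toy_parse, the
guard flag, and the max_iters cap are all hypothetical, and the cap is
there only so the unguarded version cannot spin forever:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Toy model of *(if_(c)[p] | eps[a]); none of this is Boost.Parser code.
// p matches the character 'x'; the action a flips the flag c.
// `guard` enables the "stop on a match that consumed nothing" rule.
// `max_iters` bounds the loop so the unguarded case always terminates.
inline std::size_t toy_parse(std::string_view in, bool guard,
                             int max_iters) {
    bool c = false;
    std::size_t pos = 0;
    for (int i = 0; i < max_iters; ++i) {
        std::size_t const before = pos;
        if (c && pos < in.size() && in[pos] == 'x') {
            ++pos;   // if_(c)[p] matched and consumed one character
        } else {
            c = !c;  // eps[a] matched; the action a flips c
        }
        if (guard && pos == before) break;  // short-circuit on empty match
    }
    return pos;  // number of characters consumed
}
```

On the input "xx", the guarded loop exits right after the first eps match
and consumes nothing, while the unguarded loop goes on to consume both
characters. In this toy model the unguarded loop would also keep matching
eps at the end of the input if it were not for the iteration cap.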

tl;dr I like the idea, but I'm struggling with how to do it so that we
don't limit expressivity.

> Also, errors should definitely go to std::cerr by default, not std::cout. Errors
> aren't program output, and routing them to stdout is script-hostile.

Ach! Yeah, that's just an oversight. I've opened a ticket, thanks.

Zach