Boost logo

Boost :

From: Andrzej Krzemienski (akrzemi1_at_[hidden])
Date: 2024-02-22 18:25:23


czw., 22 lut 2024 o 09:17 Zach Laine via Boost <boost_at_[hidden]>
napisał(a):

> On Wed, Feb 21, 2024 at 12:27 PM Phil Endecott via Boost
> <boost_at_[hidden]> wrote:
> >
> > Boost.Parser Review
> > ===================
>
> [snip]
>
> > - There's an ugly MACRO to define rules! That doesn't look very
> > 2024 to me.
>
> Addressed below.
>
> > Test Case
> > ---------
>
> [snip]
>
> Thanks for sharing the code, and marking it up as you have. It was
> really helpful to have this.
>
> > //const auto degrees_symbol = bp::no_case[
> bp::lit("degrees") | bp::lit("deg") | bp::lit('d') ];
> > //const auto minutes_symbol = bp::no_case[ bp::lit('\'') |
> bp::lit("minutes") | bp::lit("min") | bp::lit('m') ];
> > //const auto seconds_symbol = bp::no_case[ bp::lit('"') |
> bp::lit("seconds") | bp::lit("sec") | bp::lit('s') ];
> > // FIXME those rules don't work and I don't know why. They work if the
> symbol is
> > // the first alternative; it it's any of the other alternatives the
> double returned
> > // by e.g. decimal_degrees below is always zero.
> > // Use these instead:
> > const auto degrees_symbol = bp::lit('d');
> > const auto minutes_symbol = bp::lit('m');
> > const auto seconds_symbol = bp::lit('s'); // Beware, 's' can also mean
> south.
>
> Yes, this is indeed a bug! The sequence parser was wiping out the
> value of a previously-successful parser's attribute, when the next
> parser had an internal failure (like not matching "degrees", but then
> matching 'd'). This is now fixed on the boost_review_changes branch.
> This is code I was messing about with right before the review; sorry
> about that.
>
> > const auto uint_0_60_def = bp::uint_ [( [](auto ctx) { _pass(ctx) =
> _attr(ctx) < 60U; _val(ctx) = _attr(ctx); } )];
> > const auto double_0_60_def = bp::double_ [( [](auto ctx) { _pass(ctx) =
> _attr(ctx) < 60; _val(ctx) = _attr(ctx); } )];
> >
> > const auto decimal_degrees = bp::double_ >> -degrees_symbol;
> >
> > const auto degrees_decimal_minutes_def = (bp::uint_ >> -degrees_symbol
> > >> double_0_60 >>
> -minutes_symbol) [( [](auto ctx) {
> > auto d = _attr(ctx)[0_c];
> > auto m = _attr(ctx)[1_c];
> > _val(ctx) = d + m/60.0;
> > } )];
> > // FIXME this rule matches e.g. 150.5 as 150 .5; we want that input to
> > // be matched by the decimal degrees rule as 150.5. In the qi code, which
> > // has the same fundamental problem, it is avoided because the double_
> > // parser is configured to reject input with a leading decimal point.
> > // It would be more correct to require that either a degrees symbol or
> > // whitepace or both between the degrees and minutes numbers. That's
> > // a bit complicated because the whitespace is handled by the skipper.
> > // In a bottom-up parser I think the ambiguity is resolved by putting
> > // decimal_degrees first in degrees_def below. That doesn't work for a
> > // top-down parser.
> > // FIXME for similar reasons, "150d 0.5m n" fails to parse, because
> > // the degrees_minutes_seconds rule matches 150,0,.5 and then the next
> > // rule up fails to parse the 'm'.
>
> This is really interesting! I did not know that Spirit X3 had made
> this change, but the real/float parser I use is just the one from
> Spirit X3. And sure enough, it's policies class starts with this:
>
> template <typename T>
> struct ureal_policies
> {
> // trailing dot policy suggested by Gustavo Guerra
> static bool const allow_leading_dot = true;
> static bool const allow_trailing_dot = true;
> static bool const expect_dot = false;
>
> So, not only does it allow a leading dot, but it's a constant; you'd
> have to inherit from this policies type and hide the value with a new
> one in the derived class.
>
> Again, I don't know why this was done, but I can understand the desire
> to match .6, 0.6, and 6. with the same parser. I also see how in your
> case the 0.6-only match has more natural semantics. I was able to
> achieve the same goal by changing your parser not to match on double
> if the input starts with '.'. So this:
>
> > const auto degrees_minutes_seconds_def = (bp::uint_ >> -degrees_symbol
> > >> uint_0_60 >> -minutes_symbol
> > >> double_0_60 >>
> -seconds_symbol) [( [](auto ctx) {
> > auto d = _attr(ctx)[0_c];
> > auto m = _attr(ctx)[1_c];
> > auto s = _attr(ctx)[2_c];
> > _val(ctx) = d + m/60.0 + s/3600.0;
> > } )];
>
> Becomes this:
>
> > const auto degrees_minutes_seconds_def = (bp::uint_ >> -degrees_symbol
> > >> uint_0_60 >> -minutes_symbol
> > >> (double_0_60 - '.') >>
> -seconds_symbol) [( [](auto ctx) {
> > auto d = _attr(ctx)[0_c];
> > auto m = _attr(ctx)[1_c];
> > auto s = _attr(ctx)[2_c];
> > _val(ctx) = d + m/60.0 + s/3600.0;
> > } )];
>
> That fixed the parse.
>
> [snip]
>
> > const auto signed_latitude_def = signed_degrees [( [](auto ctx) { auto
> d = _attr(ctx); _pass(ctx) = -90 <= d && d <= 90; } )];
> > const auto signed_longitude_def = signed_degrees [( [](auto ctx) { auto
> d = _attr(ctx); _pass(ctx) = -180 <= d && d <= 180; } )];
>
> One thing I noticed here -- you forgot the "_val(ctx) = _attr(ctx);"
> statement, and so this path of the parse would match, but produce 0.
>
> With those three changes (the fix, the " - '.'", and this missing
> assignment), everything seems to be working as expected. I've turned
> your example into a test:
>
>
> https://github.com/tzlaine/parser/blob/boost_review_changes/test/parse_coords_new.cpp
>
> > BOOST_PARSER_DEFINE_RULES(uint_0_60);
> > BOOST_PARSER_DEFINE_RULES(double_0_60);
> > BOOST_PARSER_DEFINE_RULES(degrees_decimal_minutes);
> > BOOST_PARSER_DEFINE_RULES(degrees_minutes_seconds);
> > BOOST_PARSER_DEFINE_RULES(degrees);
> > BOOST_PARSER_DEFINE_RULES(latitude);
> > BOOST_PARSER_DEFINE_RULES(longitude);
> > BOOST_PARSER_DEFINE_RULES(signed_latitude);
> > BOOST_PARSER_DEFINE_RULES(signed_longitude);
> > BOOST_PARSER_DEFINE_RULES(latlon);
>
> This could be replaced with a single use of BOOST_PARSER_DEFINE_RULES,
> FWIW.
>
> [snip]
>
> > That Ugly Macro
> > ---------------
> >
> > bp::rule<struct latitude, double> latitude = "latitude";
> > const auto latitude_def = (degrees >> northsouth) .... ;
> > BOOST_PARSER_DEFINE_RULES(latitude);
> >
> > Is a macro really the best way to do this? Spirit seems to
> > manage without.
>
> About that. This revision to the way that Spirit 2 did rules is not
> actually my invention. It comes from Spirit X3:
>
>
> https://live.boost.org/doc/libs/1_68_0/libs/spirit/doc/x3/html/spirit_x3/tutorials/minimal.html#spirit_x3.tutorials.minimal.parser_definitions
>
> I liked the way it was done there better than the Spirit 2 way, so I
> opted for the Spirit 3 way. You can't get recursive rules, or even
> mutually recursive rules, without a way to forward declare parsers
> somehow. Rules are what does that. In Spirit 2, they did this by
> including their attribute type as part of their type, and then using
> type erasure to avoid requiring the type of the parser to be part of
> the rule's type as well. In Spirit X3, a rule is instead a function
> template. I like this better as we don't need to build memory
> allocation into the rule to get decoupling. I dislike it because the
> function template is only subtly different from rule to rule, and so
> that leads naturally to the macro we have. I think the macro is the
> lesser of these evils.
>

This should be put in the docs in a section "Design Rationale".

>
> > Despite having a macro this is still very verbose. I've had to
> > write the name of the rule FIVE times in three lines. Once there
> > is a macro you might as well put everything inside it:
> >
> > BOOST_PARSER_DEFINE_RULE(latitude, double, (degrees >> northsouth)
> ....);
> >
> > Of course if you need to declare the rules before defining them
> > due to recursion or other forward references you will need separate
> > macros for declaration and definition.
>
> Yeah, that's what Spriit X3 does. There are DECLARE and DEFINE
> macros. I was trying to simplify. You're not the first one to ask
> for a macro that reduces the repetition needed for a single
> rule/parser pair. If enough others agree, I'll add this. I
> personally tend not to use macros like that much, or when I do, to
> make ad hoc ones that I locally #undef after use. So the lack of the
> DECLARE/DEFINE pair is just my personal taste.
>
> > These rule declarations seem to need to be at global scope
> > (maybe namespace scope?). Is that fundamental? Can't I put them
> > in a class or a function?
>
> The macro defines a pair of function templates for each rule you pass
> to it; this is mentioned in the docs. The implication is that the
> macro should be used at namespace scope. As long as the tag type you
> use is in the same namespace where you write the macro, ADL will do
> the rest.
>

This explanation should go into docs.

Regards,
&rzej;

>
> > Semantic Actions
> > ----------------
> >
> > In Bison I can write semantic actions like this:
> >
> > { $$ = $1 + $2/60.0; }
> >
> > After expanding the $ variables that gets copied verbatim into
> > the bison output, so it can include any code.
> >
> > Spirit clearly wanted to replicate that sort of functionality,
> > but it pre-dates lambdas, so it includes a lot of amazing
> > functionality that allows me to write semantic actions like:
> >
> > [ _val = _1 + _2/60.0 ]
> >
> > This is nice and concise but its limitation is of course that I
> > cannot write arbitrary code in there.
> >
> > (BTW, are we strictly allowed to use identifiers starting with _
> > like _val?)
>
> Are you asking about the Spirit 2 use of Phoenix? There is a naming
> convention there and in Spirit X3, and in Boost.Parser. In the latter
> case though, it's just a lambda, so you can name it whatever you like.
>
> > What should this look like now that we have lambdas in the
> > language? Of course it's going to be a bit more verbose because
> > of the syntax overhead, but I was hoping that it would be
> > something like this:
> >
> > [ [](auto deg, auto min) { return deg + min/60.0; } ]
>
> This is the biggest thing I miss from Spirit 2 code vs. Spirit X3
> code. However, I don't really miss dealing with the verbosity of
> Phoenix in the non-trivial cases. Lambdas are a win on balance I
> find.
>
> > (Aside: That doesn't actually work because [[ can only appear in
> > attributes even if there is a space, so it needs to be written
> > [( [] ... )]. I did wonder if changing from operator[] to
> > operator() might make sense?)
>
> Sure. C++. I don't like the idea of changing the operator to
> operator(), because 1) everyone used to Spirit is used to operator[],
> and 2) operator() is already valid on many parsers and directives.
> Having to deal with a new, anything-goes operator() would be fraught
> at best.
>
> > That's longer than before, but I have given meaningful names to
> > the parameters (which I can also do in Bison BTW) and I can
> > include any code.
> >
> > But that's not the syntax that the current proposal needs. It
> > needs this:
> >
> > [( [](auto ctx) { _val(ctx) = _attr(ctx)[0_c] + _attr(ctx)[1_c]; } )]
>
> Yep. You could also write:
>
> auto [a, b] = _attr(ctx);
> _val(ctx) = a + b;
>
> ... if you were using std::tuple. Neither of those is as pithy as
> _val = _1 + _2. However, as soon as you do anything significantly
> more complicated than that, lambdas are better.
>
> > Frankly I think that combines the worst of all the other options
> > and adds some extra new verbosity of its own.
> >
> > One other advantage of having the lambda return a result could
> > be improving how the semantic type (aka attribute type) of a
> > rule is set. For example, if I have the rule
> >
> > degrees >> minutes
> >
> > where degrees and minutes return doubles, the semantic type of
> > the combined rule is something like tuple<double,double>. But
> > I'm going to add a semantic action that adds the two values as
> > shown above, returning one double. If the return value of the
> > semantic action's lambda were used as the semantic type of the
> > rule then we could write:
> >
> > auto rule = (degrees >> minutes)
> > [( [](auto d, auto m) { return d + m/60.0; } )];
> >
> > and rule would return a double to its parent rule. As it is, the
> > semantic action doesn't change the type and we have to
> > explicitly and verbosely declare the rule as returning a double.
>
> I had not thought of this before, but I really like it. I'll have to
> think pretty deeply about whether there is some reason this might not
> work, but I'm definitely going to try to implement this. I'm thinking
> that the rule would be: if std::apply(f, tup) is well-formed (where f
> is your invocable, and tup is your tuple of values returned by
> _attr()), and decltype(std::apply(f, tup)) is assignable to _val(),
> then do the semantic action as _val(ctx) = std::apply(f, _attr(ctx));
> otherwise, the semantic action gets called with the current context..
> There would be a special case when _attr() is not a tuple, but you get
> the idea. Ticket to track it:
> https://github.com/tzlaine/parser/issues/106
>
> > Of course the context includes more than just the child rule
> > semantic values and a slot for the result; for example it
> > includes the _pass boolean that I use to clamp the range of
> > values in my example:
> >
> > [( [](auto ctx) { _pass(ctx) = _attr(ctx)[0_c] <= 90; } )];
> >
> > I suggest that either a special lambda signature is used to
> > indicate that it is a predicate, or an additional "directive"
> > e.g. predicate[ ... ] is added.
>
> This idea I don't like as much. I think the previous idea makes
> things simpler in simpler cases, and something like _pass(ctx) should
> just use the ctx version of the semantic action.
>
> > Compilation Times and Errors
> > ----------------------------
> >
> > Compilation times for my example were:
> >
> > Boost.Parser: 28.6 s
> > Boost.Spirit: 30.1 s
>
> I get something pretty similar, except that it's about 5 seconds for each.
>
> > but the grammar is not quite identical, so the small difference
> > is not significant. Code size is somewhat larger for Parser than
> > for Spirit.
> >
> > (g++ 12.2.0, -O3, Linux, aarch64.)
> >
> > I did not particularly notice improved error messages compared
> > to Spirit, though they weren't worse. Consider for example this
> > code from my example:
> >
> > return parse(s, input, bp::ws|bp::lit(','));
> >
> > Originally I had this:
> >
> > auto skipper = bp::ws | bp::lit(',');
> > return parse(s, input, skipper);
>
> Oof. Yeah, that's a constraint problem. It's pretty easily fixed;
> see https://github.com/tzlaine/parser/issues/105 . If you make
> skipper const, the problem goes away. The problem is that the
> overload getting selected is the one that takes an out-param, so it's
> trying to assign your result to skipper.
>
> > But that failed to compile, and I got 15 screenfuls of
> > inpenetrable error messages. I fixed it more-or-less by randomly
> > changing things until it worked. If I now change it back I get
> > only two screenfuls that accurately say that I can't return a
> > bool from a function that's supposed to return an optional, but
> > I still don't know why the return type of parse() changes when I
> > move the declaration of the skipper.
> >
> > So I fear that the idea that compile times and error verbosity
> > are improved compared to Spirit are not borne out.
>
> Well, its a mixed bag. The kind of error you ran into is the worst
> kind. When you try to use an out-param for your parser's result that
> does not work, the errors are very messy. It sort of has to be this
> way though, because the loose-matching of parser attribute to
> out-param works becasue we defer any kind of checking until the last
> possible moment -- which is always dozens of instantion stack frames
> in.
>
> Where the library improves over Spiriit is when you write many kinds
> of errors in your own semantic action lambdas, especially when the
> error is some kind of call to ctx, like _attr(ctx). That can often be
> turned into a runtime error that is very quickly diagnosed.
>
> > Bugs
> > ----
> >
> > I found a couple of bugs while writing this review which Zach
> > has resolved. I suspect that this code has not had enough varied
> > use to flush out all the issues.
>
> This well might be. There are a lot of tests, but the combinatorics
> make it hard to catch all the corners.
>
> > Other Issues
> > ------------
> >
> > The double_ parser is not configurable in the way that the
> > Spirit double_ parser is. This is problematic; in my example
> > 3e6n means "3 east 6 north", not "3000000 north", so I need a
> > double_ parser that doesn't recognise exponents. I'd also like
> > to ignore values with leading .s, i.e. 10.1 should not match
> > int_ >> double_ as "10 0.1", and I really don't want a hacker to
> > sneak a nan or inf into my code if it recognises those.
>
> Addressed above.
>
> > The docs are not bad and are reminiscent of the Spirit docs.
> > Personally I would prefer to see an example early on that takes
> > a chunk of real BNF and translates it into a parser.
> >
> > I had issues with Apple Clang (15.0.0); it would not compile
> > "#include <boost/parser.hpp>" until I turned off concepts. Has
> > anyone tried with other Clangs? If this is not fixable maybe the
> > library should detect this and use non-concepts mode
> > automatically.
>
> This appears to be a Clang bug, or very nearly related to one; see
> https://github.com/tzlaine/parser/issues/89 . Andrzej has since said
> that some change I made seems to have fixed/side-stepped the problem.
> I don't have great access to Clang, so I did not verify this myself.
>
> > Compiling with -Wall resulted in lots of unused parameter
> > warnings. These can be fixed by commenting-out the parameter
> > name, leaving just the type. Boost libraries should compile
> > without warnings, some people use -Werror.
>
> I usually do compile with -Wall. Perhaps it got turned off at some
> point. Will fix: https://github.com/tzlaine/parser/issues/107
>
> > Conclusion
> > ----------
> >
> > I was hoping to find a library that was an improvement on
> > Spirit, but I regret to say that that's not what I've found
> > here.
> >
> > The only notable improvement compared to Spirit.Qi that I've
> > found is that no_case[] is propogated to nested rules, which
> > isn't the case with Qi.
> >
> > Rules themselves are essentially the same as Spirit. The method
> > for defining rules is more verbose and now needs an ugly macro.
> > Semantic actions are less readable.
> >
> > It is possible that there are significant advantages compared to
> > Spirit that I have missed, as a result of having used working
> > Spirit code as my starting point, but I have read the doc's
> > comparison with Spirit page.
> >
> > My conclusion is that this library should not be accepted.
> >
> > If others believe it should be accepted then I feel that should
> > be conditional on many of the issues I've mentioned above being
> > resolved, e.g. unused parameter warnings, configuration of the
> > double_ parser, etc.
>
> [snip]
>
> Thanks for spending the time, and for your thoughtful review, Phil.
>
> Zach
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk