From: Andrzej Krzemienski (akrzemi1_at_[hidden])
Date: 2023-12-29 09:52:55


On Thu, 28 Dec 2023 at 22:05, Zach Laine via Boost <boost_at_[hidden]>
wrote:

> I'm trying to gauge interest in a parsing library to replace
> Boost.Spirit 2/Spirit X3. I'm also looking for endorsements.
>
> The library is intended to remedy some shortcomings of Boost.Spirit*.
> I think these are great libraries, but Spirit 2 was written in pre-11
> C++ (I think; certainly its dependencies were). Most-to-all of the
> downsides stem from that -- long compile times, inscrutable
> compilation failures, etc. (Boost.Parser compile times are quite
> low.)
>
> I'm calling my proposal Boost.Parser, and it follows many of the
> conventions of Boost.Spirit 2 and X3, such as the operators used for
> overloading, the names of many parsers and directives, etc. It
> requires C++17 or later.
>
> From the introduction in the online docs:
> """
> Boost.Parser is a parser combinator library. That is, it consists of a
> set of low-level primitive parsers, and operations that can be used to
> combine those parsers into more complicated parsers.
>
> There are primitive parsers that parse epsilon (the empty string),
> chars, ints, floats, etc.
>
> There are operations which combine parsers to create new parsers. For
> instance, the Kleene star operation takes an existing parser p and
> creates a new parser that matches zero or more occurrences of whatever
> p matches. Both callable objects and operator overloads are used for
> the combining operations. For instance, operator*() is used for Kleene
> star, and you can also write repeat(n)[p] to create a parser for
> exactly n repetitions of p.
>
> Boost.Parser also tries to accommodate the multiple ways that people
> often want to get a parse result out of their parsing code. Some
> parsing may best be done by returning an object that represents the
> result of the parse. Other parsing may best be done by filling in a
> preexisting data structure. Yet other parsing may best be done by
> parsing small sections of a large document, and reporting the results
> of subparsers as they are finished, via callbacks. Boost.Parser
> accommodates all these ways of working, and even makes it possible to
> do callback-based or non-callback-based parsing without rewriting any
> code (except by changing the top-level call from parse() to
> callback_parse()).
>
> All of Boost.Parser's public interfaces are sentinel- and
> range-friendly, just like the interfaces in std::ranges.
>
> Boost.Parser is Unicode-aware through and through. When you parse
> ranges of char, Boost.Parser does not assume any particular encoding —
> not Unicode or any other encoding. Parsing of inputs other than plain
> chars assumes that the input is Unicode. In the Unicode-aware code
> paths, all parsing is done by matching code points. This means that
> you can feed UTF-8 strings into Boost.Parser, both as input and within
> your parser, and the right sort of matching occurs. For instance, if
> your parser is trying to match repetitions of the char '\xcc' (which
> is a lead byte from a UTF-8 sequence, and so is malformed UTF-8 if not
> followed by an appropriate UTF-8 code unit), it will not match the
> start of "\xcc\x80" (UTF-8 for the code point U+0300). Boost.Parser
> knows that the matching must be whole-code-point, and so it interprets
> the char '\xcc' as the code point U+00CC.
>
> Error reporting is important to get right, and it is important to make
> errors easy to understand, especially for end-users. Boost.Parser
> produces runtime parse error messages that are very similar to the
> diagnostics that you get when compiling with GCC and Clang (it even
> supports warnings that don't fail the parse). The exact token
> associated with a diagnostic can be reported to the user, with the
> containing line quoted, and with a marker pointing right at the token.
> Boost.Parser takes care of this for you; your parser does not need to
> include any special code to make this happen. Of course, you can also
> replace the error handler entirely, if it doesn't fit your needs.
>
> Debugging complex parsers can be a real nightmare. Boost.Parser makes
> it trivial to get a trace of your entire parse, with easy-to-read (and
> very verbose) indications of where each part of the trace is within
> the parse, the state of values produced by the parse, etc. Again, you
> don't need to write any code to make this happen — you just pass a
> parameter to parse().
>
> Dependencies are still a nightmare in C++, so Boost.Parser can be used
> as a purely standalone library, independent of Boost.
> """
>
> Boost.Parser aims to be a superset of Boost.Spirit* in most ways.
> Major things missing from the set of features in Spirit 2 + Spirit X3
> are:
>
> - A separate lexer.
> - Binary parsers (meaning for parsing bits, not binary numbers written
> as text; the latter is fully supported).
>
> I've been in touch with Joel de Guzman, Hartmut Kaiser, and Michael
> Caisse, to make sure I was not toe-stomping, for those who are
> concerned about that. They gave this new library their blessing. One
> feature comes entirely from them: Boost.Parser is usable in a
> Boost-free environment -- as a standalone library -- at the user's
> option. They said that was the #1 request from users, which surprised
> me a bit.
>
> The Github page is here: https://github.com/tzlaine/parser
> The online docs are here: https://tzlaine.github.io/parser
>
> For an extended example, a JSON parser that passes all the
> published JSON tests, including most of the optional ones, in only
> about 300 lines of code, go here:
>
>
> https://tzlaine.github.io/parser/doc/html/boost_parser__proposed_/extended_examples/parsing_json.html
>
> Finally, for those wanting to know how this lib differs from
> Boost.Spirit* without digging through the docs, here is the doc page
> that explains Boost.Parser's relationship to Boost.Spirit*:
> """
> Boost.Spirit is a library that is already in Boost, and it has been
> around for a long time.
>
> However, it does not suit user needs in some ways.
>
> - Spirit 2 suffers from very long compile times.
> - Spirit 2 has error reporting that requires a lot of user intervention to
> work.
> - Spirit 2 requires user intervention, including a (long) recompile, to
> enable parse tracing.
> - Spirit X3 has rules that do not compose well — the attributes produced
> by a rule can change depending on the context in which you use the
> rule.
> - Spirit X3 is missing many of the convenient interfaces to parsers that
> Spirit 2 had. For instance, you cannot add parameters to a parser.
> - All versions of Spirit have Unicode support, but it is quite difficult
> to get working.
>
> I wanted a library that does not suffer from any of the above
> limitations. It should be noted that while Spirit X3 only has a couple
> of flaws in the list above, the one related to rules is a
> deal-breaker. The ability to write rules, test them in isolation, and
> then re-use them throughout a complex parser is essential.
>
> Though no version of Boost.Spirit (Spirit 2 or Spirit X3) suffers from
> all those limitations, there also does not exist any one version that
> avoids all of them. Boost.Parser does so. However, there are a lot of
> great ideas in Boost.Spirit that have been retained in Boost.Parser.
> Both libraries:
>
> - use the same operator overloads to combine parsers;
> - use approximately the same set of directives to influence the parse
> (e.g. lexeme[]);
> - provide loosely-coupled rules that are separately compilable (at
> least for Spirit X3); and
> - are built around a flexible parse context object that has state
> added to and removed from it during the parse (again, comparing to
> Spirit X3).
> """
>

Hi Zach,
Thank you for writing and sharing this library. I intend to test it on my
mini-language early next year.
For now, let me dig a bit into the high-level differences between
Boost.Parser and Boost.Spirit X3.

Your introduction mentions "a separate lexer" as a feature that
Boost.Spirit is missing.
How does that square with the entire section on Spirit.Lex in the
Boost.Spirit docs?

"Boost.Parser aims to be a superset of Boost.Spirit". But Boost.Spirit is
also a generator.

You mention that "Spirit X3 has rules that do not compose well". I
personally never experienced this. Is there an example somewhere that would
illustrate this problem?

What do the Boost.Spirit authors recommend to programmers who need to do
parsing? Is Boost.Parser simply the newer and improved version, or do the
two libraries have disjoint sets of use cases?

Personally, skimming through the docs, I find the ability to produce
custom error and warning messages very attractive. This is what I have
always missed in parsing libraries.

Thanks again for your effort.

Regards,
&rzej;

