Boost :


From: Peter Dimov (pdimov_at_[hidden])
Date: 2024-03-02 10:47:13

Next message: Marshall Clow: "Reminder: Master branch closes for the 1.85.0 beta release on WEDNESDAY"
Previous message: Zach Laine: "Re: Late review of Boost.Parser"
Maybe in reply to: Peter Turcan: "Re: Reminder: Review for Zach's parsing library starts on Feb 19th"
Next in thread: Christopher Kormanyos: "Re: Reminder: Review for Zach's parsing library starts on Feb 19th"

Zach Laine wrote:
> On Fri, Mar 1, 2024 at 2:15 AM Peter Dimov <pdimov_at_[hidden]> wrote:
> >
> > Zach Laine wrote:
> > > > > I'm not sure what the tests reported earlier were doing, but I
> > > > > definitely don't see orders of magnitude difference.
> > > >
> > > > I used this file:
> > > >
> > > > https://github.com/boostorg/json/blob/develop/bench/data/twitter.json
> > >
> > > Using that file, I get similar results to what I already posted (the
> > > numbers are different, but the ratios between them are the same).
> > >
> > > I managed to make a change that was much smaller -- no template
> > > parameter required. I consistently see a 1.5x slowdown for Parser vs.
> > > Boost.JSON for files around this size, and 2x or so for much larger
> > > files (~25MB).
> >
> > That's probably because you are measuring I/O in addition to parsing.
> >
> > https://github.com/cmazakas/parser-review/blob/main/test/json.cpp
> >
> > only measures the parsing part.
>
> Quite right! After building it locally, I do see an advantage to the template
> parameter approach -- that is, controlling the contents of scoped_trace_t with
> an NTTP that is true/false when trace is enabled/disabled. scoped_trace_t is
> specialized to be empty when the NTTP is false. I'll be committing the template
> param-based approach soon.
>
> For those keeping track, with Christian's test framework, I get 35X worse
> performance of the JSON parser example vs. Boost.JSON. This seems like a
> reasonable factor to me. I wrote that example in a single, and made no
> attempt to optimize it in any way; it is an example. Boost.JSON is a dedicated
> JSON parser emphasizing efficiency.
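The NTTP technique Zach describes can be sketched roughly as follows. This is a hypothetical illustration, not Boost.Parser's actual code: only the name `scoped_trace_t` comes from the thread, while the printing behaviour and the toy `parse_digit` function are assumptions made for the example. The point is that the `false` specialization has no data members and does no work, so with tracing disabled the guard can vanish entirely.

```cpp
#include <cassert>
#include <cstdio>
#include <type_traits>

// Hypothetical sketch (not Boost.Parser's real implementation): a scope
// guard whose contents are selected by a bool non-type template parameter.
template <bool Enabled>
struct scoped_trace_t {
    explicit scoped_trace_t(char const* name) : name_(name) {
        std::printf("enter %s\n", name_);
    }
    ~scoped_trace_t() { std::printf("leave %s\n", name_); }
    scoped_trace_t(scoped_trace_t const&) = delete;
    scoped_trace_t& operator=(scoped_trace_t const&) = delete;

private:
    char const* name_;  // only present when tracing is enabled
};

// Tracing disabled: an empty specialization that stores and does nothing.
template <>
struct scoped_trace_t<false> {
    explicit scoped_trace_t(char const*) {}
};

static_assert(std::is_empty_v<scoped_trace_t<false>>,
              "disabled trace guard must carry no state");

// Hypothetical toy "parser" that opens a trace scope only when Trace is true.
template <bool Trace>
int parse_digit(char c) {
    scoped_trace_t<Trace> trace("parse_digit");
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}
```

Calling `parse_digit<true>('3')` prints the enter/leave lines and returns 3, while `parse_digit<false>('3')` returns 3 with no tracing code left to execute.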

There's probably some room for Boost.Parser to automatically apply certain
optimizations to "naively" written parsers.

For example, a lot of the time here is spent skipping whitespace, using this
rule:

auto const ws_def = '\x09'_l | '\x0a' | '\x0d' | '\x20';

This generates an or_parser that contains four separate omit_parsers, each
with a char_ inside.
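To make the structure just described concrete, here is a toy model of an or-parser over four single-character alternatives. The types `lit_char` and `or_parser` below are hypothetical stand-ins written for illustration and are much simpler than Boost.Parser's real parsers. What the model shows is the shape of the work: each alternative is a separate sub-parser with its own save/restore of the input position, and the first non-whitespace character costs four failed attempts before skipping stops.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <tuple>

// Toy combinators (hypothetical; not Boost.Parser's real types) mirroring
// the shape of '\x09'_l | '\x0a' | '\x0d' | '\x20'.

// Matches exactly one given character and produces no attribute,
// playing the role of omit[char_(c)].
struct lit_char {
    char c;
    bool parse(char const*& first, char const* last) const {
        if (first == last || *first != c) return false;
        ++first;
        return true;
    }
};

// Tries each alternative in order, restoring the position after a failure.
template <typename... Alts>
struct or_parser {
    std::tuple<Alts...> alts;

    bool parse(char const*& first, char const* last) const {
        return std::apply(
            [&](auto const&... alt) {
                // Fold over the alternatives; || stops at the first match.
                return (try_alt(alt, first, last) || ...);
            },
            alts);
    }

private:
    template <typename P>
    static bool try_alt(P const& p, char const*& first, char const* last) {
        char const* const save = first;
        if (p.parse(first, last)) return true;
        first = save;  // backtrack before the next alternative is tried
        return false;
    }
};

// The "naive" whitespace rule: four separate single-character parsers.
inline or_parser<lit_char, lit_char, lit_char, lit_char> const ws_naive{
    {lit_char{'\x09'}, lit_char{'\x0a'}, lit_char{'\x0d'}, lit_char{'\x20'}}};

// Applies the rule until it fails; returns how many characters were skipped.
inline std::size_t skip_naive(std::string_view s) {
    char const* first = s.data();
    char const* const last = s.data() + s.size();
    while (ws_naive.parse(first, last)) {}
    return static_cast<std::size_t>(first - s.data());
}
```

Note that a space, being the last alternative, is only matched after three failed attempts, which is part of why collapsing the alternatives into a single character set helps.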

If I change it by hand to

auto const ws_def = bp::omit[ bp::char_( "\x09\x0a\x0d\x20" ) ];

which I assume is equivalent, time improves from 365ms to 316ms (*). And
that's still far from optimal, because (1) skipping whitespace can in principle
be done with the equivalent of `find_first_not_of` and (2) the character set is
a runtime value in the above, but something like `cxchar<'\x09', '\x0a',
'\x0d', '\x20'>` could be even faster.
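Both ideas in points (1) and (2) can be sketched in plain standard C++. `std::string_view::find_first_not_of` is a real standard function. The `cxchar` template below is only a hypothetical rendering of the suggested compile-time character set, since its name and interface are assumptions and not an existing Boost.Parser facility.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// (1) Skipping as a single search: find the first character that is not
// in the (runtime) whitespace set, with no per-character parser calls.
inline std::size_t skip_ws_find(std::string_view s) {
    std::size_t const pos = s.find_first_not_of("\x09\x0a\x0d\x20");
    return pos == std::string_view::npos ? s.size() : pos;
}

// (2) Hypothetical compile-time character set (name and interface are
// assumptions): the characters are template arguments, so membership is
// a fold of comparisons against constants that the compiler can optimize.
template <char... Cs>
struct cxchar {
    static constexpr bool contains(char c) noexcept {
        return ((c == Cs) || ...);
    }
};

using ws_set = cxchar<'\x09', '\x0a', '\x0d', '\x20'>;

static_assert(ws_set::contains(' ') && !ws_set::contains('x'),
              "membership is checked at compile time");

// Skip loop driven by the compile-time set.
inline std::size_t skip_ws_cx(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size() && ws_set::contains(s[i])) ++i;
    return i;
}
```

Both functions return the index of the first non-whitespace character, or the string length if there is none.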

It should be possible for the library to automatically turn the "naive" ws_def
into something more efficient, and then special-case `skip` for this common
case.
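One way such an automatic rewrite could work is sketched below as a toy. This is a hypothetical design, not how Boost.Parser is implemented. Here `operator|` on attribute-less single characters builds a character set instead of a nested or-parser, and a `skip` overload for that set reduces to one `find_first_not_of` call. The merge is only valid because `lit_char` carries no attribute and no semantic action. A real library would have to check for both before folding alternatives together.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical toy (not Boost.Parser's design): the rewrite happens in
// operator|, which folds attribute-less single-character alternatives
// into one character set instead of building a nested or-parser.

// One literal character with no attribute and no semantic action.
struct lit_char { char c; };

// A set of literal characters; still attribute-less, so merging is safe.
struct char_set { std::string chars; };

inline char_set operator|(lit_char a, lit_char b) {
    return char_set{std::string{a.c, b.c}};
}

inline char_set operator|(char_set s, lit_char b) {
    s.chars.push_back(b.c);
    return s;
}

// skip() special-cased for a char_set: a single find_first_not_of call.
inline std::size_t skip(char_set const& ws, std::string_view s) {
    std::size_t const pos = s.find_first_not_of(ws.chars);
    return pos == std::string_view::npos ? s.size() : pos;
}

// Naive-looking rule; thanks to the overloads above its type is char_set.
inline char_set const ws_def =
    lit_char{'\x09'} | lit_char{'\x0a'} | lit_char{'\x0d'} | lit_char{'\x20'};
```

Here `skip(ws_def, " \t\r\n{")` returns 4, the index of `{`. The user writes the same four-way alternative, but the combined type already encodes the optimized form.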

In the general case this is complicated by the existence of attributes and
semantic actions; transforming parsers should take care to preserve those.
But in this specific case none are present.

(*) Under clang-cl. GCC -O3 has different views of the world and times
there are 165ms and 181ms, respectively.



Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk