From: Peter Dimov (pdimov_at_[hidden])
Date: 2024-03-02 10:47:13


Zach Laine wrote:
> On Fri, Mar 1, 2024 at 2:15 AM Peter Dimov <pdimov_at_[hidden]> wrote:
> >
> > Zach Laine wrote:
> > > > > I'm not sure what the tests reported earlier were doing, but I
> > > > > definitely don't see orders of magnitude difference.
> > > >
> > > > I used this file:
> > > >
> > > > https://github.com/boostorg/json/blob/develop/bench/data/twitter.json
> > >
> > > Using that file, I get similar results to what I already posted (the
> > > numbers are different, but the ratios between them are the same).
> > >
> > > I managed to make a change that was much smaller -- no template
> > > > parameter required. I consistently see 1.5x slowdown for Parser vs.
> > > Boost.JSON for files around this size, and 2x or so for much larger
> > > files (~25MB).
> >
> > That's probably because you are measuring I/O in addition to parsing.
> >
> > https://github.com/cmazakas/parser-review/blob/main/test/json.cpp
> >
> > only measures the parsing part.
>
> Quite right! After building it locally, I do see an advantage to the template
> parameter approach -- that is, controlling the contents of scoped_trace_t with
> an NTTP that is true/false when trace is enabled/disabled. scoped_trace_t is
> specialized to be empty when the NTTP is false. I'll be committing the template
> param-based approach soon.
>
> For those keeping track, with Christian's test framework, I get 35X worse
> performance of the JSON parser example vs. Boost.JSON. This seems like a
> reasonable factor to me. I wrote that example in a single, and made no
> attempt to optimize it in any way; it is an example. Boost.JSON is a dedicated
> JSON parser emphasizing efficiency.

There's probably some room for Boost.Parser automatically applying certain
optimizations to "naively" written parsers.

For example, a lot of the time here is spent in skipping whitespace, using this
rule:

    auto const ws_def = '\x09'_l | '\x0a' | '\x0d' | '\x20';

This generates an or_parser that contains four separate omit_parsers, each
with a char_ inside.

If I change it by hand to

    auto const ws_def = bp::omit[ bp::char_( "\x09\x0a\x0d\x20" ) ];

which I assume is equivalent, time improves from 365ms to 316ms (*). And
that's still far from optimal, because (1) skipping whitespace can in principle
be done with the equivalent of `find_first_not_of` and (2) the characters are
a runtime value in the above, but something like `cxchar<'\x09', '\x0a',
'\x0d', '\x20'>` could be even faster.
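
As a rough sketch of both ideas, outside of Boost.Parser and using only the
standard library: the names skip_ws_find, cx_char_set, skip_ws_cx and json_ws
are made up for illustration and are not part of any Boost API. The first
function skips a whole run of whitespace with one `find_first_not_of` call;
the second bakes the character set into the type, so the membership test
becomes a chain of comparisons against constants.

```cpp
#include <cstddef>
#include <string_view>

// (1) Runtime character set: a single library call skips the whole run.
// Returns the index of the first non-whitespace character at or after pos,
// or in.size() if only whitespace remains.
inline std::size_t skip_ws_find(std::string_view in, std::size_t pos = 0)
{
    std::size_t const next = in.find_first_not_of("\x09\x0a\x0d\x20", pos);
    return next == std::string_view::npos ? in.size() : next;
}

// (2) Compile-time character set (hypothetical helper, C++17 fold expression).
template<char... Cs>
struct cx_char_set
{
    static constexpr bool contains(char c) noexcept
    {
        return ((c == Cs) || ...);
    }
};

// Same contract as skip_ws_find, but the set is a template argument.
template<class Set>
constexpr std::size_t skip_ws_cx(std::string_view in, std::size_t pos = 0) noexcept
{
    while (pos < in.size() && Set::contains(in[pos]))
        ++pos;
    return pos;
}

// The four JSON whitespace characters used by ws_def.
using json_ws = cx_char_set<'\x09', '\x0a', '\x0d', '\x20'>;
```

Whether either form actually beats the parser-based skipper would have to be
measured; this only shows the shape of the code the library could generate.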

It should be possible for the library to automatically turn the "naive" ws_def
into something more optimal, and then special-case `skip` for this common
case.

In the general case this is complicated by the existence of attributes and
semantic actions; transforming parsers should take care to preserve those.
But in this specific case none are present.

(*) Under clang-cl. GCC -O3 has different views of the world: the times
there are 165ms and 181ms, respectively, so the hand-modified version is
slightly slower.

