From: Vinícius dos Santos Oliveira (vini.ipsmaker_at_[hidden])
Date: 2020-09-19 20:47:19


On Fri, 18 Sep 2020 at 05:57, Niall Douglas via Boost
<boost_at_[hidden]> wrote:
> Firstly, that was a great essay on the backing theory Vinicius. I only
> wish more people who write parsers would read that first. I would urge
> you to convert it into a blog post or something similar, post it online
> so people can find it, and all that great explanation of theory doesn't
> get lost forever.

Thanks, Niall. You can share this link if you want:
https://gitlab.com/-/snippets/2016550

> In the very specific case of parsing JSON however, I'm not sure if the
> standard rules of evaluation apply. The author of sajson claims that
> most of his speed comes from not being a pull parser. What you do is
> zero copy DMA the incoming socket data into a memory mapped buffer,
> execute sajson's AST parse upon that known sized memory mapped buffer
> which encodes the AST directly into the source by modifying the buffer
> in place to avoid dynamic memory allocations completely, and voila bam
> there's your JSON parsed with a strict minimum of memory copied or cache
> lines modified. He claims, and I have no reason to doubt him, that
> because he can make these hard coded assumptions about the input buffer,
> he was able to make a very fast JSON parser (amongst the fastest
> non-SIMD parsers). By inference, a pull parser couldn't be as fast.
>
> I find that explanation by sajson's author compelling. The fact he
> completely avoids dynamic memory allocation altogether, and builds the
> AST inline into the original buffer of JSON, is particularly compelling.

Design-wise, sajson has at least two tricks worth discussing:

- It doesn't expose stream events to the user. So the pull/push taxonomy
  doesn't really apply here.
- It modifies the input stream, which is a destructive parsing
  technique. I covered it in the section "a faster DOM tree" of my review.
  Thanks for bringing this project to our attention.

As for Boost.JSON, none of the above matters. Point 1 could matter if
the parser were an implementation detail, but that's not the case.
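For readers who haven't followed the pull/push discussion: in a push parser the parser owns the loop and calls back into user code for every event; in a pull parser the user owns the loop and asks for the next event. The toy sketch below shows both shapes. It assumes C++17, a grammar limited to flat arrays of unsigned integers, and hypothetical names (`Kind`, `Event`, `PullParser`, `push_parse`); it only illustrates the taxonomy and is not Boost.JSON's interface.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Toy grammar (hypothetical): a flat array of unsigned integers such as
// "[1,22,333]". Whitespace, nesting and most validation are ignored.
enum class Kind { ArrayBegin, Integer, ArrayEnd, End, Error };

struct Event {
    Kind kind;
    unsigned long value = 0;  // meaningful only when kind == Kind::Integer
};

// Pull style: the caller owns the loop and asks for one event at a time,
// so it can stop early or hand the parser to another function.
class PullParser {
public:
    explicit PullParser(std::string_view in) : in_(in) {}

    Event next() {
        while (pos_ < in_.size() && in_[pos_] == ',') ++pos_;  // toy: commas only separate
        if (pos_ == in_.size()) return {Kind::End};
        const char c = in_[pos_];
        if (c == '[') { ++pos_; return {Kind::ArrayBegin}; }
        if (c == ']') { ++pos_; return {Kind::ArrayEnd}; }
        if (std::isdigit(static_cast<unsigned char>(c))) {
            unsigned long v = 0;
            while (pos_ < in_.size() &&
                   std::isdigit(static_cast<unsigned char>(in_[pos_]))) {
                v = v * 10 + static_cast<unsigned long>(in_[pos_++] - '0');
            }
            return {Kind::Integer, v};
        }
        return {Kind::Error};
    }

private:
    std::string_view in_;
    std::size_t pos_ = 0;
};

// Push style: the parser owns the loop and hands every event to a callback.
// It is built on PullParser here only to keep the sketch short.
template <class Handler>
bool push_parse(std::string_view in, Handler&& on_event) {
    PullParser p(in);
    for (;;) {
        const Event e = p.next();
        if (e.kind == Kind::End) return true;
        if (e.kind == Kind::Error) return false;
        on_event(e);
    }
}
```

The difference the taxonomy cares about is who owns the loop: with `PullParser` the caller can stop after the first element, while `push_parse` only returns once the whole input has been consumed or rejected.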

Leaving the Boost.JSON review topic aside for a sentence, I share your
assessment.
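To make the in-place, destructive trick concrete: a decoded JSON string is never longer than its escaped source, so a parser can decode it by overwriting the input buffer instead of allocating. The following is a minimal sketch under that assumption, in C++17. It is not sajson's code, `unescape_in_place` is a hypothetical helper, and it handles only the single-character escapes (no `\uXXXX`).

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not sajson's code): decodes the body of a JSON string,
// i.e. the bytes between the quotes, by overwriting the input buffer.
// Decoded output is never longer than the escaped input, so the write cursor
// can never overtake the read cursor and no allocation is needed.
// Only the single-character escapes are handled; \uXXXX is left out.
std::string_view unescape_in_place(char* begin, char* end, bool& ok) {
    assert(begin <= end);
    char* out = begin;
    ok = true;
    for (char* in = begin; in != end; ++in) {
        if (*in != '\\') {
            *out++ = *in;  // plain byte: copy down (a self-assignment until the first escape)
            continue;
        }
        if (++in == end) { ok = false; return {}; }  // dangling backslash
        switch (*in) {
            case '"':  *out++ = '"';  break;
            case '\\': *out++ = '\\'; break;
            case '/':  *out++ = '/';  break;
            case 'b':  *out++ = '\b'; break;
            case 'f':  *out++ = '\f'; break;
            case 'n':  *out++ = '\n'; break;
            case 'r':  *out++ = '\r'; break;
            case 't':  *out++ = '\t'; break;
            default:   ok = false; return {};  // \u and invalid escapes
        }
    }
    // The decoded string is a view into the original (now modified) buffer.
    return std::string_view(begin, static_cast<std::size_t>(out - begin));
}
```

After the call the buffer no longer holds the original JSON text, which is exactly what makes the technique destructive: the input cannot be re-parsed, but no memory was allocated.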

> I haven't looked at Boost.JSON. But it seems to target a more C++
> idiomatic API, be pluggable for other formats like Boost.Serialisation,
> but retain most of the performance of JSON parsers such as sajson or
> simdjson. As Boost reviews primarily review API design, Boost.JSON's
> choice of approach fits well for the process here. Boost prefers purity
> over performance.

`json::value` can be integrated with Boost.Serialization, but its parser
(which would correspond to the archive concept) can't. I could write a
detailed explanation here like the one I wrote for the pull/push taxonomy...
maybe another day.

> I suspect most users of JSON by far would have the exact same attitude
> as I do. For users like us, we really don't care what the parser does,
> or how it is designed, or whatever crappy API it might have, all we care
> about is maximum possible data extraction performance. Never ever
> calling malloc is an excellent sign of the right kind of JSON parser
> design, at least in my book.

A coworker of mine has thoughts similar to yours. Great guy.

Anyway, I feel like resuming the project that I've put on hold, so this is
my goodbye. I'll still keep an eye on the discussions, but I'll try to stay
mostly silent. Have fun, you all.

And nice talking to you again, Niall. How many years since we worked
together (even if only for a brief period)? :)

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk