Subject: Re: [boost] Push/pull parsers & coroutines
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2017-10-14 19:03:34


Vinnie Falco wrote:
> On Fri, Oct 13, 2017 at 11:59 AM, Phil Endecott via Boost
> <boost_at_[hidden]> wrote:
>> A "push" parser,
>> which invokes client callbacks as tokens are processed, is easier to
>> implement but harder to use as the client has to track its state
>> between callbacks with e.g. an explicit FSM. On the other hand, a
>> "pull parser" (possibly using an iterator interface) is easier for
>> the client but instead now the parser may need the explicit state
>> tracking.
>
> That is generally true, and especially true for XML and other
> languages that have a similar structure. Specifically, that there are
> opening and closing tags which determine the validity of subsequent
> grammar, and have a recursive structure (like HTML).
>
> But this is not the case for HTTP. There are no opening and closing
> tags. There is no need to keep a "stack" of "open tags". It is quite
> straightforward. Therefore, when designing an HTTP parser we can place
> less emphasis on the style of parser and instead focus those energies
> to other considerations (as I described in my previous post, regarding
> the separation of concerns for stream algorithms and parser
> consumers).
>
> If you look at the Beast parser derived class, you can see that the
> state is quite minimal:
>
> template<bool isRequest, class Body, class Allocator>
> class parser
>     : public basic_parser<isRequest, parser<isRequest, Body, Allocator>>
> {
>     message<isRequest, Body, basic_fields<Allocator>> m_;
>     typename Body::writer wr_;
>     bool wr_inited_ = false;
>     std::function<...> cb_h_; // for manual chunking
>     std::function<...> cb_b_; // for manual chunking
>     ...

You still have an explicit state machine, i.e. a state enum and a
switch statement in a loop; I'm looking at impl/basic_parser.ipp for
example.
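
To make "a state enum and a switch statement in a loop" concrete, here is
a minimal sketch of that style. It is a hypothetical push parser for
"name: value" lines, not Beast's code; field_parser, feed and the callback
signature are invented for illustration. The point is only that the parser
must remember where it was between calls, because input arrives in
arbitrary fragments.

```cpp
#include <functional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical push parser for "name: value\n" lines (illustration only).
class field_parser {
public:
    using callback =
        std::function<void(std::string_view name, std::string_view value)>;

    explicit field_parser(callback cb) : cb_(std::move(cb)) {}

    // May be called repeatedly with arbitrary fragments of the input.
    void feed(std::string_view chunk) {
        for (char c : chunk) {
            switch (state_) {
            case state::name:
                if (c == ':') state_ = state::space;
                else name_ += c;
                break;
            case state::space:
                if (c == ' ') break;   // skip padding after the colon
                state_ = state::value;
                [[fallthrough]];       // c is the first character of the value
            case state::value:
                if (c == '\n') {
                    cb_(name_, value_); // push one complete field to the client
                    name_.clear();
                    value_.clear();
                    state_ = state::name;
                } else {
                    value_ += c;
                }
                break;
            }
        }
    }

private:
    // The explicit state that survives between calls to feed().
    enum class state { name, space, value };
    state state_ = state::name;
    std::string name_, value_;
    callback cb_;
};
```

A client would construct it with a callback and call feed() once per
received buffer; a field split across two buffers is still reported once,
when its newline arrives.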

But I don't want to dwell on this particular code. I'm just considering,
generally, whether this style of code is soon going to look "antique" -
in the way that 15-year-old code full of explicit new and delete looks
antediluvian now that we're all using smart pointers.

I think it's clear that coroutines can often make the code simpler to
write and/or easier to use. The question is what we lose. That
generator<T> provides only input iterators is the most
significant problem I've spotted so far. This is in some way related
to the whole ASIO "buffer sequence" thing; the code I posted before
read into contiguous buffers, but that was lost before the downstream
code saw it, so it couldn't hope to optimise with e.g. word-sized
copies or compares. Maybe this could be fixed with some sort of segmented
iterator, or something other than generator<T> as the coroutine type,
or something. Or maybe it's unfixable.
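
One way to picture the "something other than generator<T>" option is a
pull interface whose element type is a whole contiguous segment rather
than a single char. The sketch below is an assumption-laden illustration:
segment_source and starts_with are hypothetical names, and a hand-written
class stands in for the coroutine so that the example stays plain C++17.
It shows only that downstream code can compare in bulk once contiguity
survives the interface.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical pull source: each call to next() hands out one whole
// contiguous segment instead of a single char.
class segment_source {
public:
    explicit segment_source(std::vector<std::string_view> segments)
        : segments_(std::move(segments)) {}

    // Returns the next segment, or nullopt when the input is exhausted.
    std::optional<std::string_view> next() {
        if (pos_ == segments_.size()) return std::nullopt;
        return segments_[pos_++];
    }

private:
    std::vector<std::string_view> segments_;
    std::size_t pos_ = 0;
};

// Returns true if the concatenated segments begin with `prefix`.
// Each comparison covers a contiguous range, so the library is free to
// use a bulk compare rather than going one char at a time.
// Note: this sketch discards whatever follows the prefix in the last
// segment it reads.
bool starts_with(segment_source& src, std::string_view prefix) {
    while (!prefix.empty()) {
        std::optional<std::string_view> seg = src.next();
        if (!seg) return false;  // input ended before the prefix did
        std::size_t n = std::min(seg->size(), prefix.size());
        if (seg->substr(0, n) != prefix.substr(0, n)) return false;
        prefix.remove_prefix(n);
    }
    return true;
}
```

Whether a coroutine type can expose segments like this without losing
the convenience of a flat iterator is exactly the open question.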

Do other languages have anything to teach us about this? What do
users of Boost.Coroutine think?

Regards, Phil.

