
Subject: Re: [boost] Push/pull parsers & coroutines (Was: Boost.HTTPKit, a new library from the makers of Beast!)
From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2017-10-13 19:24:11


On Fri, Oct 13, 2017 at 11:59 AM, Phil Endecott via Boost
<boost_at_[hidden]> wrote:
> Dear All,
> A "push" parser,
> which invokes client callbacks as tokens are processed, is easier to
> implement but harder to use as the client has to track its state
> between callbacks with e.g. an explicit FSM. On the other hand, a
> "pull parser" (possibly using an iterator interface) is easier for
> the client but instead now the parser may need the explicit state
> tracking.

That is generally true, and especially true for XML and other
languages with a similar structure: languages whose opening and
closing tags determine the validity of the grammar that follows,
and which nest recursively (like HTML).
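To make the contrast concrete, here is a minimal sketch of the state a push-parser client for a nested, tag-based language has to carry between callbacks: a stack of open tags, consulted on every closing tag. The callback interface is an assumption for illustration; the names `tag_consumer`, `on_open` and `on_close` are hypothetical and do not come from any real XML library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical callbacks that a push parser for a tag-based language
// (XML, HTML) might invoke. Not taken from any real library.
class tag_consumer
{
    // Tags seen but not yet closed, innermost last. This stack is the
    // explicit state the consumer must carry between callbacks.
    std::vector<std::string> open_;

public:
    void on_open(std::string const& name)
    {
        open_.push_back(name);
    }

    // A closing tag is valid only if it matches the innermost open tag.
    void on_close(std::string const& name)
    {
        if(open_.empty() || open_.back() != name)
            throw std::runtime_error("mismatched closing tag: " + name);
        open_.pop_back();
    }

    // Current nesting depth.
    std::size_t depth() const
    {
        return open_.size();
    }

    // The document is well-formed only once every tag has been closed.
    bool complete() const
    {
        return open_.empty();
    }
};
```

For `<html><body></body></html>` a parser would call `on_open("html")`, `on_open("body")`, `on_close("body")`, `on_close("html")`, leaving `complete()` true; a stray `</b>` after `<a>` would throw.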

But this is not the case for HTTP. There are no opening and closing
tags, and there is no need to keep a "stack" of "open tags". The
grammar is quite straightforward. Therefore, when designing an HTTP
parser we can place less emphasis on the style of parser and instead
focus those energies on other considerations (as I described in my
previous post, regarding the separation of concerns between stream
algorithms and parser consumers).

If you look at the Beast parser derived class, you can see that the
state is quite minimal:

    template<bool isRequest, class Body, class Allocator>
    class parser
        : public basic_parser<isRequest, parser<isRequest, Body, Allocator>>
    {
        message<isRequest, Body, basic_fields<Allocator>> m_;
        typename Body::writer wr_;
        bool wr_inited_ = false;
        std::function<...> cb_h_; // for manual chunking
        std::function<...> cb_b_; // for manual chunking
        ...

<https://github.com/boostorg/beast/blob/f09b2d3e1c9d383e5d0f57b1bf889568cf27c39f/include/boost/beast/http/parser.hpp#L45>

Callbacks don't need to store state used by subsequent callbacks to
interpret the incoming structured HTTP data, because HTTP is simple
compared to XML or HTML.
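For comparison, a minimal sketch of a push-style HTTP consumer, assuming a hypothetical callback interface: `simple_request`, `request_consumer`, `on_request`, `on_field` and `on_body` are illustrative names and are not Beast's actual `basic_parser` callbacks. Its only state is the message under construction; no callback needs a stack or similar bookkeeping to interpret what it is given.

```cpp
#include <map>
#include <string>
#include <utility>

// A hypothetical, simplified request type; Beast's message<> is richer.
struct simple_request
{
    std::string method;
    std::string target;
    int version = 11; // assumed encoding: 11 means HTTP/1.1
    std::multimap<std::string, std::string> fields;
    std::string body;
};

// Hypothetical push-parser callbacks for HTTP. Each one only stores
// what it receives into the message being built.
class request_consumer
{
    simple_request m_; // the only state: the message under construction

public:
    // Called once for the request-line.
    void on_request(std::string method, std::string target, int version)
    {
        m_.method = std::move(method);
        m_.target = std::move(target);
        m_.version = version;
    }

    // Called once per header field; duplicates are allowed.
    void on_field(std::string name, std::string value)
    {
        m_.fields.emplace(std::move(name), std::move(value));
    }

    // Called zero or more times as body octets arrive.
    void on_body(std::string const& data)
    {
        m_.body += data;
    }

    simple_request const& get() const
    {
        return m_;
    }
};
```

A parser would call `on_request` once for the request-line, `on_field` once per header field, then `on_body` as the body arrives; the order of calls is fixed by HTTP's flat structure rather than by any nesting the consumer must track.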

> Here's a very very rough sketch of what I have in mind, for the case
> of HTTP header parsing; note that I don't even have a compiler that
> supports coroutines yet so this is far from real code:

I think it is great that you're providing an example, but you have
chosen the simplest, most regular part of HTTP: the headers. I
suspect that if you try to use the iterator model for the start-line
(which is different for requests and responses) and then try to
express the message body using iterators, you will run into
considerable difficulty coming up with a design that is elegant and
feature-rich, especially when you consider the need to decode the
chunked encoding while still providing its metadata to the caller. I
know this because I went through many iterations before settling on
what is in Beast currently.
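To illustrate the chunked-encoding point, here is a minimal push-style sketch that strips the chunk framing, passes payload bytes to one callback and each chunk's size and extensions to another. It is an assumption-laden illustration, not Beast's implementation: `decode_chunked` and its callbacks are hypothetical, the whole body is assumed to be in memory already, and trailers, whitespace before `;`, size limits and overflow are not handled.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical sketch: decode a complete chunked body held in memory.
// Framing (sizes, CRLFs) is removed; payload goes to on_body, and each
// chunk's size and extensions go to on_chunk_header so the caller still
// sees the metadata. Does not handle partial input, trailers or overflow.
template<class OnChunkHeader, class OnBody>
void decode_chunked(std::string_view in,
    OnChunkHeader&& on_chunk_header, OnBody&& on_body)
{
    for(;;)
    {
        // chunk-size [ chunk-ext ] CRLF
        auto const eol = in.find("\r\n");
        if(eol == std::string_view::npos)
            throw std::runtime_error("missing CRLF after chunk size");
        std::string_view line = in.substr(0, eol);
        in.remove_prefix(eol + 2);

        // Split the size from any extensions (which start at ';').
        auto const semi = line.find(';');
        std::string_view size_str = line.substr(0, semi);
        std::string_view ext = semi == std::string_view::npos
            ? std::string_view{} : line.substr(semi);

        // chunk-size is hexadecimal.
        if(size_str.empty())
            throw std::runtime_error("empty chunk size");
        std::size_t size = 0;
        for(char c : size_str)
        {
            int d;
            if(c >= '0' && c <= '9') d = c - '0';
            else if(c >= 'a' && c <= 'f') d = c - 'a' + 10;
            else if(c >= 'A' && c <= 'F') d = c - 'A' + 10;
            else throw std::runtime_error("bad hex digit in chunk size");
            size = size * 16 + static_cast<std::size_t>(d);
        }

        // Report the metadata before the payload it describes.
        on_chunk_header(size, ext);

        // The zero-size last chunk ends the body (trailers not handled).
        if(size == 0)
            return;

        // chunk-data CRLF
        if(in.size() < size + 2 || in.substr(size, 2) != "\r\n")
            throw std::runtime_error("truncated chunk");
        on_body(in.substr(0, size));
        in.remove_prefix(size + 2);
    }
}
```

In the push style the two kinds of output simply become two callbacks. An iterator interface would need some way to yield both chunk metadata and payload bytes from a single sequence, which is one source of the design difficulty described above.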

Thanks

