Subject: Re: [boost] Push/pull parsers & coroutines (Was: Boost.HTTPKit, a new library from the makers of Beast!)
From: Vinícius dos Santos Oliveira (vini.ipsmaker_at_[hidden])
Date: 2018-01-01 21:30:12


I'm excited about this subject (and the Ranges TS). I believe these changes
will shape how we design parsers in C++ in the future.

However, I can only focus on one project at a time. For now, that is this
C++03 parser.

2017-10-13 15:59 GMT-03:00 Phil Endecott via Boost <boost_at_[hidden]>:

> Dear All,
>
> This is related to the ongoing discussion of the Beast HTTP parser.
> I have been thinking in general about how best to implement parser
> APIs in modern and future C++. Specifically, I've been wondering
> whether the imminent arrival of low-overhead coroutines ought to
> change best practice for this sort of interface.
>
> In the past, I have found that there is a trade-off between parser
> implementation complexity and client code complexity. A "push" parser,
> which invokes client callbacks as tokens are processed, is easier to
> implement but harder to use as the client has to track its state
> between callbacks with e.g. an explicit FSM. On the other hand, a
> "pull" parser (possibly using an iterator interface) is easier for
> the client, but now it is the parser that may need the explicit state
> tracking.
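
To make the trade-off above concrete, here is a minimal sketch of both styles for "name:value" lines. Every name in it (header_handler, push_parse, collect_headers, pull_parser) is hypothetical and made up for illustration; none of it comes from Beast or any other library. It assumes the whole input is already in a string and that lines end with a bare '\n'.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical push interface: the parser drives, the client reacts.
struct header_handler {
    virtual ~header_handler() {}
    virtual void on_name(const std::string& name) = 0;
    virtual void on_value(const std::string& value) = 0;
};

// Push parser: a plain loop, its state lives in local variables.
void push_parse(const std::string& text, header_handler& h)
{
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t colon = text.find(':', pos);
        if (colon == std::string::npos) break;
        std::size_t eol = text.find('\n', colon);
        if (eol == std::string::npos) eol = text.size();
        h.on_name(text.substr(pos, colon - pos));
        h.on_value(text.substr(colon + 1, eol - colon - 1));
        pos = eol + 1;
    }
}

// Push client: has to remember the pending name between callbacks.
struct collect_headers : header_handler {
    std::string pending_name;
    std::vector<std::pair<std::string, std::string> > out;
    void on_name(const std::string& name) { pending_name = name; }
    void on_value(const std::string& value)
    {
        out.push_back(std::make_pair(pending_name, value));
    }
};

// Pull parser: the client drives, so the parser itself has to store
// its position and which kind of token it expects next.
class pull_parser {
public:
    enum token_kind { name, value, end };

    explicit pull_parser(const std::string& text)
        : text_(text), pos_(0), expect_(name) {}

    token_kind next(std::string& token)
    {
        if (expect_ == name) {
            if (pos_ >= text_.size()) return end;
            std::size_t colon = text_.find(':', pos_);
            if (colon == std::string::npos) return end;
            token = text_.substr(pos_, colon - pos_);
            pos_ = colon + 1;
            expect_ = value;
            return name;
        }
        std::size_t eol = text_.find('\n', pos_);
        if (eol == std::string::npos) eol = text_.size();
        token = text_.substr(pos_, eol - pos_);
        pos_ = eol + 1;
        expect_ = name;
        return value;
    }

private:
    std::string text_;
    std::size_t pos_;
    token_kind expect_;
};
```

With the push version the explicit state (pending_name) sits in the client; with the pull version it sits in the parser (pos_ and expect_), and the client is an ordinary loop that calls next() until it returns end.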
>
> Now, with stackless coroutines due "real soon now", we can avoid
> needing explicit state on either side. In the parser we can
> co_yield tokens as they are processed and in the client we can
> consume them using input iterators. The use of coroutines doesn't
> need to be explicit in the API; the parser can be documented as returning a
> range<T> while actually returning a generator<T>.
>
> Here's a very very rough sketch of what I have in mind, for the case
> of HTTP header parsing; note that I don't even have a compiler that
> supports coroutines yet so this is far from real code:
>
> generator<char> read_input(int fd)
> {
>   char buf[4096];
>   while (true) {
>     ssize_t r = ::read(fd, buf, sizeof buf);
>     if (r <= 0) co_return;  // end of file, or a read error
>     for (ssize_t i = 0; i < r; ++i) {
>       co_yield buf[i];
>     }
>   }
> }
>
> template <typename INPUT_RANGE>
> generator< pair<string,string> > parse_header_lines(INPUT_RANGE input)
> {
>   // The input is single-pass, so std::find followed by re-reading the
>   // same characters would be broken. Copy into the strings while
>   // scanning for ':' and '\n' instead.
>   auto i = input.begin();
>   auto e = input.end();
>   while (i != e) {
>     string k, v;
>     for (; i != e && *i != ':'; ++i) k += *i;
>     if (i != e) ++i;  // skip ':'
>     for (; i != e && *i != '\n'; ++i) v += *i;
>     if (i != e) ++i;  // skip '\n'
>     co_yield pair<string,string>(k, v);
>   }
> }
>
> void parse_http_headers(int fd)
> {
>   map<string,string> headers;
>   auto g = parse_header_lines( read_input(fd) );
>   for (auto h: g) {
>     headers.insert(h);
>   }
> }
>
> An "exercise for the reader" is to extend that to something that will
> parse headers followed by a body.
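
One possible take on that exercise, without coroutines, is sketched below. read_headers is a hypothetical helper, not part of any library. It assumes "name:value" lines ending in a bare '\n' (no "\r\n" handling, no whitespace trimming) and a blank line between headers and body. It copies characters into the strings while checking for the delimiters, so it works with single-pass input iterators, and it takes the iterator by reference so the caller can keep reading the body from where the headers ended.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: reads "name:value\n" lines from the single-pass
// range [i, e) until a blank line. On return, i points at the first
// character of the body (or equals e if there is no body).
template <typename InputIt>
std::map<std::string, std::string> read_headers(InputIt& i, InputIt e)
{
    std::map<std::string, std::string> headers;
    while (i != e) {
        if (*i == '\n') {      // blank line: the body starts after it
            ++i;
            break;
        }
        std::string k, v;
        // Copy the name while looking for ':' in the same pass.
        for (; i != e && *i != ':' && *i != '\n'; ++i) k += *i;
        if (i != e && *i == ':') {
            ++i;               // skip ':'
            // Copy the value while looking for the end of the line.
            for (; i != e && *i != '\n'; ++i) v += *i;
        }
        if (i != e) ++i;       // skip '\n'
        headers[k] = v;
    }
    return headers;
}
```

With a std::istreambuf_iterator<char> pair over a stream, for example, the call returns the header map and whatever remains in the range afterwards is the body.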
>
> Questions: how efficient is this in practice? Is this really simpler to
> write than a non-coroutine version? Will all of our code use this style
> in the (near?) future? How should we be writing code now so that it is
> compatible with this style in the future?
>
> Thanks for reading,
>
>
> Phil.

-- 
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/
