Subject: Re: [boost] [http] Formal Review
From: Lee Clagett (forum_at_[hidden])
Date: 2015-08-14 01:55:57
On Fri, Aug 14, 2015 at 1:06 AM, Vinícius dos Santos Oliveira <
> 2015-08-13 22:29 GMT-03:00 Lee Clagett <forum_at_[hidden]>:
> > On Thu, Aug 13, 2015 at 11:43 AM, Vinícius dos Santos Oliveira <
> > vini.ipsmaker_at_[hidden]> wrote:
> > > 2015-08-12 19:27 GMT-03:00 Lee Clagett <forum_at_[hidden]>:
> > >
> > > > Anyway - I was thinking along the same lines at various points. A
> > > > function that pre-generates HTTP messages is very useful IMO, and
> > > > should likely be included. Designing a good parsing concept would be
> > > > a bit more work I think, but probably worth it too. I'm not sure how
> > > > the author intends to swap out parsers in the current design. Having
> > > > a fixed parser seems acceptable, but the author almost seemed to
> > > > suggest that it be selectable somehow.
> > > >
> > >
> > > A parser doesn't make sense for all communication channels.
> > >
> > Do you have an example of a communication channel where a parser
> > wouldn't work? They wouldn't necessarily always provide the same output or
> > behave the same way, but a communication channel has a defined format. An
> > implementation reading that format generally has _some_ output, which is
> > pretty much a parser IMO. Sorry for the bikeshedding on this, it's not
> > really necessary, but this stuck out for some reason.
> Well, in my previous answer, I think I ended up focusing on the wrong part
> of the proposal. Let me fix that in this email. And thank you for helping
> me figure out my mistake (or "keep insisting" or "not losing hope on me",
> whichever you prefer).
> To handle an HTTP request, you read the metadata, progressively download the
> body and then the trailers. It's wise to avoid reading partial metadata
> because the request can only be handled after the whole metadata has been
> read. However, the body can be handled as it is received.
> You're arguing that you always (1) fill a buffer and then (2) parse it. CGI
> uses environment variables, not a contiguous chunk of memory or stream of
> bytes. It still could work, though. The headers would be serialized into
> the buffer (not nice).
> Using the buffer/view approach, messages are always in serialized format.
> Unless you store/cache the parsing result (doing allocation or using some
> fixed size, as you do not know the amounts ahead of time), this will
> consume more time to handle, as you'll need to reparse every time a piece
> of information is requested ("give me header host", "give me header
> cookie"). If you do store the parsing result, you're just storing the
> message behind an API masked as not message-based when it actually is. And
> unless your buffer is used to store more info than the real network
> traffic, the view needs to get information from the socket too, not just
> the buffer. It's not a pure parser, there is state not found in the buffer
> (imagine how you'd handle progressive download where lots of the traffic
> was already discarded to handle the rest of the message), so the view needs
> the socket, which is storing information not present in the buffer. It's
> like spending much more CPU time to save some memory. These are just the
> most basic impacts.
> From a high-level point of view, the buffer/view approach makes all
> requests immutable by default. You cannot fake or inject data in the
> headers while you pass the headers along a chain of handlers. There are
> workarounds. Also, if you cannot forge the HTTP message, you cannot create
> it and send it to the socket. You always need to use the generator. The
> problem with the proposed generator is the lack of consideration for other
> HTTP backends. More care needs to be given to capabilities, as happens in
> Boost.Http (is chunking available? 100-continue? can I upgrade? ...?).
> Also, like I've stated in one of the previous emails, it's tricky to get
> interoperability right with these non-explicit generators (is chunking
> going to be implicitly used?).
> I need to think more, maybe I'll send another email with more comments. You
> can have these in the meantime.
Sorry I wasn't clearer on this - I am not advocating the buffer/view
approach mentioned in the OP for HTTP, I just happened to also be thinking
about protocol/messages/parsers. I thought your initial response indicated
that his idea wouldn't work (it should), not that it was inefficient. The
buffer/view approach provided above would require a double-parse of the
HTTP header, since HTTP does not define a length field at a fixed offset
from the start of the message. Protocols where a length field at a fixed
offset indicates the entire size of the message are better suited for that
style of message handling, since the first pass costs very little CPU.
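
To make the difference concrete, here is a rough C++17 sketch. The 4-byte big-endian length prefix is a made-up framing used purely for illustration, and both function names are hypothetical. With the prefix, the first pass reads four bytes; for HTTP/1.x the first pass has to scan for the blank line that ends the header block, and the fields still have to be parsed afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical framing (an assumption, not a real protocol): a 4-byte
// big-endian prefix holding the payload length. The total message size is
// known as soon as the prefix has arrived; the payload is never inspected.
std::optional<std::size_t> framed_message_size(std::string_view buf) {
    if (buf.size() < 4) return std::nullopt;  // prefix not complete yet
    std::uint32_t len = 0;
    for (int i = 0; i < 4; ++i)
        len = (len << 8) | static_cast<unsigned char>(buf[i]);
    return std::size_t{4} + len;
}

// HTTP/1.x has no such field: the size of the header block is only known
// after scanning the received bytes for the terminating blank line.
std::optional<std::size_t> http_header_size(std::string_view buf) {
    const auto pos = buf.find("\r\n\r\n");
    if (pos == std::string_view::npos) return std::nullopt;  // need more bytes
    assert(pos + 4 <= buf.size());  // find() only matches a full terminator
    return pos + 4;
}
```

Both functions return an empty optional while more bytes are needed, so a caller can keep reading from the socket and retry.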
I _was_ thinking about parsers that provided different artifacts. For
instance, a parser could be a Function Object with
"expected<optional<tuple<Header, Body, Trailer>>,
error_code>(Range<Bytes>&)", where the returned optional would be empty
until the entire message was received. HTTP would then be split into smaller
parsers for each part, which could be swapped out depending on the
situation. I was also thinking about potential re-use between parsers of
this style. I am not proposing this still half-baked idea as part of this
review.
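
A very rough C++17 sketch of what one of those smaller parsers might look like. Everything here is an assumption made for illustration: `Result` is a crude stand-in for `expected<T, error_code>` (std::expected needs C++23), a `std::string_view` stands in for `Range<Bytes>`, and `Fields` and `FieldBlockParser` are hypothetical names. It only handles a block of "name: value" lines such as the header or trailer fields, not a whole message.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>
#include <utility>
#include <vector>

// Stand-in for expected<T, error_code>: `error` is set on failure.
template <class T>
struct Result {
    T value{};
    std::error_code error{};
    explicit operator bool() const { return !error; }
};

using Fields = std::vector<std::pair<std::string, std::string>>;

// Parses one block of "name: value" lines terminated by a blank line.
// Until the terminator has arrived it returns an empty optional and leaves
// `input` untouched; on success it consumes the block from `input`.
struct FieldBlockParser {
    Result<std::optional<Fields>> operator()(std::string_view& input) const {
        const auto end = input.find("\r\n\r\n");
        if (end == std::string_view::npos) return {};  // need more bytes

        Fields fields;
        std::string_view block = input.substr(0, end);
        while (!block.empty()) {
            const auto eol = block.find("\r\n");
            const std::string_view line = block.substr(0, eol);
            const auto colon = line.find(':');
            if (colon == std::string_view::npos || colon == 0)
                return {std::nullopt,
                        std::make_error_code(std::errc::protocol_error)};
            std::string_view val = line.substr(colon + 1);
            while (!val.empty() && val.front() == ' ') val.remove_prefix(1);
            fields.emplace_back(std::string(line.substr(0, colon)),
                                std::string(val));
            block.remove_prefix(eol == std::string_view::npos ? block.size()
                                                              : eol + 2);
        }
        assert(end + 4 <= input.size());  // precondition of remove_prefix
        input.remove_prefix(end + 4);     // consume the block and terminator
        return {std::move(fields), {}};
    }
};
```

A message-level parser could then be composed from a start-line parser, this one for the header, a body parser, and this one again for the trailer, each of which could be swapped independently.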