Subject: Re: [boost] [http] Formal Review
From: Lee Clagett (forum_at_[hidden])
Date: 2015-08-15 13:32:28


On Fri, Aug 14, 2015 at 4:31 PM, Vinícius dos Santos Oliveira <
vini.ipsmaker_at_[hidden]> wrote:

> 2015-08-14 2:49 GMT-03:00 Lee Clagett <forum_at_[hidden]>:
>
> > > No. That's a way to avoid memory copies. That's not necessary to achieve
> > > zero allocations.
> > >
> > > You can have a custom backend tied to a single type of message that will
> > > do the HTTP parsing for you. It could be inefficient by maybe forcing the
> > > parsing every time you call headers(), but without an implementation I
> > > don't have much to say.
> > >
> > >
> > But this would only be able to handle one HTTP message type? Or would it
> > drop some useful information? I think it would be difficult to implement a
> > non-allocating HTTP parser unless it was SAX, or stopped at defined points
> > (essentially notifying you like a SAX parser).
> >
>
> As type of message, I was referring to the Message concept:
> https://boostgsoc14.github.io/boost.http/reference/message_concept.html
>
> And yes, this implementation would work with only one type.
>
> The idea is: the message object is just a buffer with an embedded parser
> and the socket will just transfer responsibility to the message. The user
> API stays the same. A buffer the same size would still be interesting in
> the socket to efficiently support HTTP pipelining (we cannot have data from
> different messages in the same message object, as it might be dropped at
> any time by the user).
>
> I'm not even slightly worried about the problem you mention with the parser. I
> know it's possible. It won't show itself as a problem in the future.
>
> > > Like you guessed, you pass a buffer to basic_socket. It won't read more
> > > bytes than the buffer size.
> > >
> >
> > But how can this be combined with higher order functions? For example a
> > `async_read_response(Socket, MaxDataSize, void(uint16_t, std::string,
> > vector<uint8_t>, error_code))`? However such a utility is defined, it will
> > have to be tied to a specific implementation currently, because there's no
> > way to control the max-read size via the socket concept. Or would such a
> > function omit a max read size (several other libraries don't have one
> > either)? Or would it just overread a _bit_ into the container?
> >
>
> The problem isn't "how can this be combined with higher order functions?".
> The problem is "how can this feature be exposed portably among different
> HTTP backends?" and the answer is "it can't because it might not even make
> sense in all HTTP backends". Of course this comment is about the hacky
> solution (use a buffer of limited size in the HTTP backend).
>
Both questions are identical. When a function makes a call to
`async_read_some` with a generic http::Socket concept, it has no way of
knowing or controlling how many bytes will be read into the container
provided by the message concept. It is currently implementation defined.

> On the non-hacky front, some traits exposing extra API could be defined.
> The basic_socket could implement these traits without hampering the
> implementation of other backends that have different characteristics.
>

Ideally the argument to `async_read_some` would just be an ASIO buffer,
which implicitly has a maximum read size. This only appears possible if
the current C parser is abused a bit (moving states manually). However, I
think it's worth providing the best interface, and then doing whatever is
necessary to make those details work. And I think accepting just an ASIO
buffer would be the best possible interface for `async_read_some`.

Adding a size_t maximum read argument should be possible at a minimum. I do
not see how this could hamper any possible backends; its only role is to
explicitly limit how many bytes are appended to the back of the container
in a single call. With this feature, a client could at least reserve bytes
in the container, and prevent further allocations through a max_read
argument.
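
To make the reserve + max_read idea concrete, here is a minimal sketch. It
is not Boost.Http's API: `byte_source` and `read_some` are hypothetical
names, and the read is synchronous only to keep the example short. The
assumption is simply that the operation appends at most `max_read` bytes to
the back of the container.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the transport: a flat byte source with a cursor.
struct byte_source {
    std::vector<std::uint8_t> data;
    std::size_t pos = 0;
};

// Appends at most max_read bytes from src to the back of body and returns
// the number of bytes appended. If the caller reserved enough capacity
// beforehand, this call does not reallocate.
template <class Container>
std::size_t read_some(byte_source& src, Container& body, std::size_t max_read)
{
    assert(src.pos <= src.data.size());
    const std::size_t available = src.data.size() - src.pos;
    const std::size_t n = std::min(available, max_read);
    const auto first = src.data.begin() + static_cast<std::ptrdiff_t>(src.pos);
    body.insert(body.end(), first, first + static_cast<std::ptrdiff_t>(n));
    src.pos += n;
    return n;
}
```

A caller that does `body.reserve(N)` and then always passes
`body.capacity() - body.size()` as `max_read` keeps every append inside the
reserved storage.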

>
> > > About the embedded device situation, it'll be improved when I expose the
> > > parser options, then you'll be able to set max headers size, max header
> > > name size, max header value size and so on. With all upper limits figured
> > > out, you can provide a single chunk of memory for all data.
> > >
> > >
> > >
> > But what if an implementation wanted to discard some fields to really keep
> > the memory low? I think that was the point of the OP. I think this is
> > difficult to achieve with a notifying parser. It might be overkill for
> > Boost.Http, people under this duress can seek out existing HTTP parsers.
> >
>
> Filling HTTP headers is the responsibility of the socket. The socket is the
> communication channel, after all. A blacklist of headers wouldn't always
> work, as the client can easily use different headers. A whitelist of
> allowed headers can work better. A solution that is more generic is a
> predicate. It can go into the parser options later.
>
>
A predicate design would either have to buffer the entire field which would
make it an allocating design, or it would have to provide partial values
which would make it similar to a SAX parser but with the confusion of being
called a predicate. The only point is that a system that needs ultimate
control over memory management would likely need a parser (push or pull)
that notifies the client of pre-defined boundaries.

I think the design of Boost.Http doesn't provide an interface suitable for
zero allocations because either a large block of memory has to be
pre-allocated, or certain _hard_ restrictions need to be placed on the
header. Instead Boost.Http leans towards ease-of-use a bit. I think this is
an acceptable
tradeoff, because environments with extremely strict memory requirements
can use other solutions. Boost.Http is unlikely to suit the needs of
everyone.

But better memory management in a few areas would be helpful (as already
mentioned above). The predicate design that you mentioned, which would
likely buffer the entire field, is interesting. Rejecting a field would
allow for memory re-use by the implementation for the next field. It would
be worth investigating how to provide that interface, and the performance
benefits. Hopefully it would be a low effort way to help people who only
want to store _some_ HTTP fields. There is an unbelievable number of HTTP
fields that generally get ignored anyway.
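
A rough sketch of what I mean by a buffering predicate, under my own
assumptions: `parse_fields` is a hypothetical helper, not anything in
Boost.Http, and it works on a complete header block already in memory. Each
field is buffered whole in two scratch strings, the predicate sees it, and
the same scratch storage is reused for the next field.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Parses "Name: value\r\n" lines from raw and stores in out only the fields
// for which keep(name, value) returns true. Returns the number of rejected
// fields. The name/value scratch strings are reused for every field, so a
// rejected field needs no storage beyond what the scratch strings hold.
// Note: names are passed to keep exactly as received (no case folding).
template <class Predicate>
std::size_t parse_fields(const std::string& raw, Predicate keep,
                         std::multimap<std::string, std::string>& out)
{
    std::string name, value;  // reused scratch buffers
    std::size_t rejected = 0;
    std::size_t pos = 0;
    while (pos < raw.size()) {
        std::size_t eol = raw.find("\r\n", pos);
        if (eol == std::string::npos) eol = raw.size();
        if (eol == pos) break;  // empty line terminates the header block
        const std::size_t colon = raw.find(':', pos);
        if (colon == std::string::npos || colon >= eol) {
            pos = eol + 2;      // skip a line without a colon
            continue;
        }
        name.assign(raw, pos, colon - pos);
        std::size_t vbegin = colon + 1;
        while (vbegin < eol && (raw[vbegin] == ' ' || raw[vbegin] == '\t'))
            ++vbegin;           // drop leading whitespace of the value
        value.assign(raw, vbegin, eol - vbegin);
        if (keep(name, value))
            out.emplace(name, value);
        else
            ++rejected;
        pos = eol + 2;
    }
    return rejected;
}
```

The scratch strings still grow to the largest field seen, so this is not a
zero-allocation design; it only avoids keeping fields nobody asked for.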

> A trait could be defined to also expose the same API in different HTTP
> backends that might not need a parser.
>
> > > A simple use case: You're not directly exposing your application to the
> > > network. You're using proxies with auto load balancing. You're not using
> > > HTTP wire protocol to glue communication among the internal nodes. You're
> > > using ZeroMQ. There is no HTTP wire format usage in your application at
> > > all and Boost.Http still would be used, with the current API, as is. A
> > > different HTTP backend would be provided and you're done.
> > >
> > >
> > > It makes no sense to enforce the use of HTTP wire format in this use case
> > > at all. And if you're an advocate of HTTP wire format, keep in mind that
> > > the format changed in HTTP 2.0. There is no definitive set-in-stone
> > > serialized representation.
> > >
> > >
> > Yes, if HTTP were converted into a different (more efficient) wire format
> > (as I've seen done in various ways - sandstorm/capnproto now does this
> > too), a new implementation of http::ServerSocket could re-read that format
> > and be compatible. It would be useful to state this more clearly in the
> > documentation, unless I missed it (sorry).
>
>
> You can already have any message representation you want: It's the message
> concept. And it was crafted **very** carefully:
> https://boostgsoc14.github.io/boost.http/reference/message_concept.html
>
>
I don't think we are talking about the same thing; the message concept
doesn't define the wire format. The implementation of the http::Socket
concept certainly does, which is what I thought the discussion was about
here. Either way my suggestion was that it might be worth noting
_somewhere_ in the documentation that different wire formats for HTTP can
be supported with a different http::Socket concept implementation. Although
until a different implementation is actually written (fastcgi seems like a
good candidate), it's difficult to say whether the currently defined
abstractions are suitable for other (or even most/all) wire formats. So my
apologies for the bad suggestion.

Lee

