Subject: Re: [boost] [http] Formal Review
From: Lee Clagett (forum_at_[hidden])
Date: 2015-08-15 22:14:52


On Sat, Aug 15, 2015 at 3:06 PM, Vinícius dos Santos Oliveira <
vini.ipsmaker_at_[hidden]> wrote:

> 2015-08-15 14:32 GMT-03:00 Lee Clagett <forum_at_[hidden]>:
>
> > Adding a size_t maximum read argument should be possible at a minimum. I
> > do not see how this could hamper any possible backends, its only role is
> > to explicitly limit how many bytes are inserted to the back of the
> > container in a single call. With this feature, a client could at least
> > reserve bytes in the container, and prevent further allocations through a
> > max_read argument.
> >
>
> This is not a TCP socket, it's an HTTP socket. An HTTP message is not a
> stream of bytes.
>
HTTP has two streaming modes: chunked, and "connection:close" with no
chunked encoding or defined length. It's not always viewed that way, but
certain applications [1] take advantage of these "features" available in
HTTP. The problem for many libraries (as you've partially mentioned before)
is that if data is being sent in that fashion, they will consume memory
without bound if designed to read until the end of the payload.

> There is already a max read size. It's the buffer size you pass to
> basic_socket.
>
>
But this is not defined by the http::Socket concept. It's not possible to
write a function that takes any type modeling the http::Socket concept and
limits the number of bytes being pushed into the container. A conforming
http::Socket implementation is currently allowed to keep resizing the
container as necessary to add data (even until payload end), and I thought
the prevention of that scenario was being touted as a benefit of Boost.Http.
Adding a size_t parameter or a fixed buffer to `async_read_some` is a
strong signal of intent to implementors. A weaker one would be a statement
in the documentation that a conforming implementation of the concept can
only read/insert an unspecified fixed number of bytes before invoking the
callback.
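
To make the max_read idea concrete, here is a minimal sketch of the
contract I have in mind. It is not Boost.Http code: `fake_source` and
`read_some_bounded` are hypothetical names, and the "socket" is just an
in-memory buffer. The only point is that a single call appends at most
max_read bytes, so a caller who reserved that much never sees a
reallocation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for bytes a socket has received but not yet
// delivered to the caller. Not part of Boost.Http.
struct fake_source {
    std::vector<char> pending;
    std::size_t offset = 0;
};

// Appends at most max_read bytes to the back of `out` and returns how many
// bytes were appended. If the caller reserved room for max_read more
// elements beforehand, the insert cannot reallocate.
template <class Container>
std::size_t read_some_bounded(fake_source& src, Container& out,
                              std::size_t max_read) {
    const std::size_t available = src.pending.size() - src.offset;
    const std::size_t n = std::min(available, max_read);
    const auto first =
        src.pending.begin() + static_cast<std::ptrdiff_t>(src.offset);
    out.insert(out.end(), first, first + static_cast<std::ptrdiff_t>(n));
    src.offset += n;
    return n;
}
```

A caller would reserve once, then loop: read up to max_read bytes, process
them, clear the container, and read again, so memory use stays bounded
even for a chunked or connection:close payload that never ends.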

> > > Filling HTTP headers is responsibility of the socket. The socket is the
> > > communication channel, after all. A blacklist of headers wouldn't work
> > > always, as the client can easily use different headers. A whitelist of
> > > allowed headers can work better. A solution that is more generic is a
> > > predicate. It can go into the parser options later.
> > >
> > >
> > A predicate design would either have to buffer the entire field which
> > would make it an allocating design, or it would have to provide partial
> > values which would make it similar to a SAX parser but with the confusion
> > of being called a predicate. The only point is that a system that needs
> > ultimate control over memory management would likely need a parser (push
> > or pull) that notifies the client of pre-defined boundaries.
> >
> >
>
> You should also be able to choose a maximum header name size, so it's
> possible to use a stack-allocated buffer.
>
>
The memory requirements are affected by the max header size. With a
push/pull parser it is possible to rip out information in a fixed amount of
memory. The HTTP parser this library is using is a good example: it never
allocates memory, does not require the client to allocate any memory, and the
#define for the max header size does _not_ change the size requirements of
its data structures. It keeps necessary state and information in a fixed
amount of space, yet is still able to know whether transfer-encoding:
chunked was sent, etc.
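
As a rough illustration of the fixed-state idea (this is my own sketch,
not how the parser used by the library is implemented), the detector below
is fed one byte at a time and never stores header text. All names here
(`chunked_detector`, `feed`, `feed_all`, `ascii_lower`) are hypothetical.
It is deliberately simplified: it does a substring match for "chunked" in
the value, ignores obsolete line folding, and does not stop at the end of
the header block.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

constexpr char kHeaderName[] = "transfer-encoding";
constexpr char kToken[] = "chunked";
constexpr std::size_t kHeaderNameLen = sizeof(kHeaderName) - 1;
constexpr std::size_t kTokenLen = sizeof(kToken) - 1;

inline char ascii_lower(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Hypothetical push-style detector. Its size is the same no matter how
// long the header names or values in the stream are.
struct chunked_detector {
    bool in_value = false;      // true once ':' was seen on the current line
    bool name_ok = true;        // current field name still matches so far
    bool chunked = false;       // result: "chunked" seen in Transfer-Encoding
    std::size_t name_pos = 0;   // matched characters of kHeaderName
    std::size_t token_pos = 0;  // matched characters of kToken

    void feed(char c) {
        if (c == '\n') {  // end of line: reset the per-line state
            in_value = false;
            name_ok = true;
            name_pos = 0;
            token_pos = 0;
            return;
        }
        if (c == '\r') return;
        const char lc = ascii_lower(c);
        if (!in_value) {
            if (c == ':') {
                in_value = true;
                name_ok = name_ok && name_pos == kHeaderNameLen;
            } else if (name_ok && name_pos < kHeaderNameLen &&
                       lc == kHeaderName[name_pos]) {
                ++name_pos;
            } else {
                name_ok = false;
            }
        } else if (name_ok && !chunked) {
            if (lc == kToken[token_pos]) {
                if (++token_pos == kTokenLen) chunked = true;
            } else {
                // 'c' occurs only at the start of the token, so restarting
                // from 0 or 1 is a complete fallback rule for this pattern.
                token_pos = (lc == kToken[0]) ? 1 : 0;
            }
        }
    }

    void feed_all(const std::string& bytes) {
        for (char c : bytes) feed(c);
    }
};
```

The input can arrive split at any byte boundary, which is what lets the
caller reuse one small read buffer for arbitrarily large headers.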

The initial source of my parser thoughts was how to combine ideas from
boost::spirit into an HTTP system. A client could do a POST/PUT/DELETE, and
then issue `msg_socket.async_read(http::parse_response_code(),
void(uint16_t, error_code))` which would construct an HTTP parser that
extracts the response code from the server, tracks a minimal set of headers
(content-length, transfer-encoding, connection), yet still operates in a
fixed memory budget even if max header / max payload were size_t::max. I
still don't see how this is possible without a notification parser exposed
somewhere in the design. Again, I'm not downvoting Boost.Http because it
lacks this capability. I'm not sure of the demand for such a complicated
library just to manipulate HTTP sockets.
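
For what it's worth, the response-code half of that idea needs very little
state. The sketch below is only an illustration under my own assumptions:
`status_code_scanner` and its members are hypothetical names, it is not
the `http::parse_response_code()` imagined above, and it only looks at the
status line (no header tracking).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical push-style scanner: extracts the three-digit status code
// from "HTTP/1.1 404 Not Found" one byte at a time, in a few bytes of state.
struct status_code_scanner {
    enum class phase : std::uint8_t { version, code, done, error };

    phase state = phase::version;
    std::uint16_t code = 0;
    std::uint8_t digits = 0;

    void feed(char c) {
        switch (state) {
        case phase::version:  // skip the HTTP-version up to the first space
            if (c == ' ') state = phase::code;
            else if (c == '\r' || c == '\n') state = phase::error;
            break;
        case phase::code:     // expect exactly three decimal digits
            if (c >= '0' && c <= '9' && digits < 3) {
                code = static_cast<std::uint16_t>(code * 10 + (c - '0'));
                ++digits;
            } else if ((c == ' ' || c == '\r') && digits == 3) {
                state = phase::done;
            } else {
                state = phase::error;
            }
            break;
        case phase::done:     // remaining bytes are ignored by this sketch
        case phase::error:
            break;
        }
    }

    void feed_all(const std::string& bytes) {
        for (char c : bytes) feed(c);
    }

    bool done() const { return state == phase::done; }
    bool failed() const { return state == phase::error; }
};
```

A completion handler with the `void(uint16_t, error_code)` shape could be
invoked as soon as done() or failed() becomes true, without the reason
phrase or any header ever being buffered.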

[1] http://ajaxpatterns.org/HTTP_Streaming

Lee

