Subject: Re: [boost] [http] Formal Review
From: Vinícius dos Santos Oliveira (vini.ipsmaker_at_[hidden])
Date: 2015-08-13 23:16:57


2015-08-13 22:19 GMT-03:00 Lee Clagett <forum_at_[hidden]>:

> > This idea is horrible. An HTTP message is not a std::string. HTTP does
> > have
> > a string representation, the HTTP wire format from HTTP 1.1, which is
> > different than HTTP 2.0, which is different than FastCGI, which would be
> > different than a ZeroMQ-based approach and so on and so on.
> >
> >
> Why would this idea be limited to strings? The generator/parser could
> easily take another type. And std::strings can store binary network
> data, if that's what you're suggesting (it wasn't exactly clear to me).
>

A string, a vector of char or any other buffer is not an HTTP message.
Conceptually you'd be writing a stream in the HTTP/1.1 wire format, when
such a stream might not even exist (suppose you're using HTTP/1.0,
HTTP/2.0, FastCGI or ZeroMQ). If all you want is the wire format, any HTTP
parser will do. This library is a layer above that (and incrementally
reaching for more).
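
To make that concrete, here's a rough sketch (toy types for illustration
only, not the library's actual definitions) of what I mean by a message
being structured data rather than a serialized string:

    #include <cstdint>
    #include <map>
    #include <string>
    #include <vector>

    // Toy message: nothing here depends on how the bytes arrived
    // (HTTP/1.1, HTTP/2.0, FastCGI, ZeroMQ...).
    struct toy_message
    {
        std::multimap<std::string, std::string> headers;
        std::vector<std::uint8_t> body;
        std::multimap<std::string, std::string> trailers;
    };

    // A handler written against this shape never touches the wire format,
    // so the same handler works with any backend that fills the message in.
    bool is_chunked(const toy_message &m)
    {
        auto it = m.headers.find("transfer-encoding");
        return it != m.headers.end() && it->second == "chunked";
    }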

> > I'm not proposing an HTTP parser, I'm proposing an HTTP server library.
> > This library will grow and also include a client. The HTTP parser is the
> > least useful thing this library could provide. I'm aiming at a library
> > that is
> > so useful that you'd use it no matter if you're going to write an
> > application to be used on Raspberry Pi (or even more constrained) or on a
> > cluster of nodes that distribute workload among thousands and thousands
> > of
> > nodes and use different intermediate protocols.
> >
> >
> Defining an HTTP parser (specifically a SAX-style parser) is likely the
> only way to achieve zero allocations, which is exactly what the parser in
> use by this library is doing. Providing a C++ templated SAX parser _would_ be
> novel - it would allow for inlined/direct callbacks as opposed to the
> indirect callbacks provided by the C parser in use. Probably a small
> reduction in CPU cycles though, since parsing time should dominate.
> Providing such a library would likely change the current design
> dramatically.
>

No. That's a way to avoid memory copies. It's not necessary in order to
achieve zero allocations.

You can have a custom backend tied to a single message type that will do
the HTTP parsing for you. It could be inefficient, perhaps forcing a
re-parse every time you call headers(), but without an implementation I
don't have much more to say.
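
Something along these lines is what I have in mind (purely illustrative,
all names made up): the raw bytes live in one fixed buffer and the parsing
happens on demand:

    #include <array>
    #include <cstddef>
    #include <map>
    #include <string>

    // Hypothetical stand-in for whatever HTTP parser the backend uses.
    std::multimap<std::string, std::string>
    parse_headers(const char *data, std::size_t size)
    {
        (void)data; (void)size;
        return {}; // real parsing would go here
    }

    class lazy_message // illustrative only
    {
    public:
        // Possibly wasteful: re-parses the stored bytes on every call.
        std::multimap<std::string, std::string> headers() const
        {
            return parse_headers(raw_.data(), used_);
        }

    private:
        std::array<char, 4096> raw_; // single fixed chunk of memory
        std::size_t used_ = 0;       // bytes of raw_ actually filled
    };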

> > You even assume the HTTP version is present. It may not even make sense
> > and it shouldn't be exposed to the user anyway. The user is interested
> > in features (can I use chunking? can I use X?). It's more portable.
> > You'd suggest a different approach if you had invested time to propose a
> > library that could be used with different backends.
> >
> > About your http::view proposal: It's not much different from the
> > http::Message concept, but:
> >
> >    - Your http::view read headers and read_state. This means you're
> >    mixing/coupling communication channels and HTTP messages. Bad for
> >    different backends. It's bad because you then need to answer
> >    questions such as "how do I pass a pool, custom allocator and this
> >    and that to my backend?".
> >    - Also, how does your proposal handle progressive download and
> >    chunking? v.body() doesn't seem very/any asynchronous at all. The
> >    flush method in response is useful for chunking, but the lack of
> >    information around the capability of chunking by the communication
> >    channel leaves no option but to buffer implicitly, which can exhaust
> >    memory in a live video stream. And this magic node-like API can very
> >    easily hamper interoperability among different implementations (I've
> >    already gone down this road[1]).
> >
> >
> > About the no-buffering approach, I just cannot give this guarantee. Any
> > object fulfilling the ServerSocket concept will have its own guarantees
> > and
> > it is up to the user to check them. At the handling level, you can use
> > custom types fulfilling the message concept that will avoid memory
> > allocation. This possibility is not accidental. Just like you're worried,
> > I'm worried too.
> >
>
> How could a user provide a message concept that satisfies the
> SequenceContainer requirement for the body and still avoid memory
> allocations? The documentation for reading the body doesn't state how the
> SequenceContainer is being used or modified. Can I set a size for the
> container that the http::Socket concept never exceeds while reading
> (effectively making it a fixed buffer like ASIO)? I've read this several
> times (and maybe I'm being thick), but if I were writing an implementation
> of this concept, it would seem that it's implementation-defined how the
> SequenceContainer is being manipulated. Currently the body is being
> automatically resized on reads, with new data being copied in. I _think_
> the only way to control how much is being copied in at a time (and thus
> the max the SequenceContainer is being resized), is through the buffer
> argument to basic_socket, but this is undocumented.
>
> And satisfying the message requirements for the headers/trailers requires
> an AssociateMap<string, string>. Currently the key/value types are
> required to meet the string concept (which I do not recall being defined
> by C++ or Boost), but assuming it means an STL-compatible basic_string,
> the best way to avoid memory allocation is a small string optimization
> combined with a flat_map or an intrusive map. This would reduce, but not
> eliminate, allocations. I'm slightly less concerned about this, I think
> it's a good tradeoff, but I'm not sure how acceptable these loose
> constraints are for the embedded situations that are continually
> mentioned. Any embedded developers following this?
>

Like you guessed, you pass a buffer to basic_socket. It won't read more
bytes than the buffer size.

Fair enough. I didn't document this behaviour. I'll gather all these points
after the review period to implement these improvements.
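
For now, the principle is the same one plain Asio gives you with a
caller-owned buffer. A sketch with a raw TCP socket, just to illustrate
what "won't read more bytes than the buffer size" means (this is not
Boost.Http's API, only the underlying idea):

    #include <array>
    #include <boost/asio.hpp>

    std::size_t read_capped(boost::asio::ip::tcp::socket &sock,
                            std::array<char, 1024> &fixed)
    {
        // `fixed` is caller-owned, fixed-size storage: this read never
        // delivers more than fixed.size() bytes, no matter how much data
        // is pending on the socket.
        return sock.read_some(boost::asio::buffer(fixed));
    }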

About the embedded device situation, it'll be improved when I expose the
parser options; then you'll be able to set the max headers size, max header
name size, max header value size and so on. With all upper limits figured
out, you can provide a single chunk of memory for all data.
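
None of these options exist yet, so take the following as a purely
hypothetical sketch of the shape I have in mind (the names are made up).
The point is that once every limit is known up front, the worst-case memory
per connection is a simple computation:

    #include <cstddef>

    // Hypothetical knobs; the real option names may end up different.
    struct parser_limits
    {
        std::size_t max_headers = 32;
        std::size_t max_header_name_size = 64;
        std::size_t max_header_value_size = 1024;
        std::size_t max_body_chunk_size = 4096;
    };

    // With every limit known up front, the worst-case storage for one
    // connection can be reserved as a single chunk (or a fixed pool).
    std::size_t per_connection_bytes(const parser_limits &l)
    {
        return l.max_headers
                   * (l.max_header_name_size + l.max_header_value_size)
               + l.max_body_chunk_size;
    }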

> > > Also please do not use coroutines, at least in the first example.
> > > Coroutines are a new feature in ASIO, many users use old Boost
> > > versions that have no coroutines, and some users are not familiar with
> > > them. So if you start with coroutines the user will have to go to the
> > > ASIO docs and dig into coroutines to understand the very first
> > > example. An example without coroutines will be more user-friendly.
> > >
> >
> > This is the kind of argument... by the time Boost.Http reaches Boost,
> > Asio
> > users will already be used to coroutines. I don't even depend on the
> > latest
> > Boost release if I'm not mistaken.
> >
> >
> What about people that want to test the library before any (potential)
> Boost inclusion? Or what about people that want a lower-overhead (but more
> limited) ASIO::coroutine approach - is that a possible combination?
>

It's the tutorial, and it's the most readable thing you can get. But the
tutorial is surely far from a reference on teachability, and unfortunately
I'm not skilled enough to improve it much. This review is helping to raise
points and to show where the remaining doubts are. And even if I'm a
terrible teacher when it comes to writing tutorials, I can still write lots
of examples (including a completion-handler-based one).
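
Just to sketch the difference in plain Asio terms (a timer stands in for
the HTTP operations here, so nothing below is Boost.Http's API), the same
wait written in both styles:

    #include <boost/asio.hpp>
    #include <boost/asio/spawn.hpp>
    #include <boost/asio/steady_timer.hpp>
    #include <chrono>
    #include <iostream>

    int main()
    {
        boost::asio::io_service ios;

        // Coroutine style: reads top-to-bottom, but requires
        // Boost.Coroutine and a recent enough Boost.
        boost::asio::steady_timer timer1(ios, std::chrono::seconds(1));
        boost::asio::spawn(ios, [&timer1](boost::asio::yield_context yield) {
            timer1.async_wait(yield);
            std::cout << "coroutine: timer fired\n";
        });

        // Completion-handler style: the same operation, no coroutine
        // dependency, at the cost of pushing the continuation into a
        // callback.
        boost::asio::steady_timer timer2(ios, std::chrono::seconds(1));
        timer2.async_wait([](boost::system::error_code) {
            std::cout << "handler: timer fired\n";
        });

        ios.run();
    }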

> > > 7. How much effort did you put into your evaluation? A glance? A quick
> > > reading? In-depth study?
> > >
> > > A few hours digging into docs, about 3 hours looking into the sources,
> > > about 3 hours formulating the issues and recommending solutions.
> > >
> >
> > Please read the design choices chapter. You seem to be completely
> > ignoring
> > all use cases where a standalone server (e.g. FastCGI, ZeroMQ...) isn't
> > used.
> >
> I read the design choices section, and I don't understand the relevance.
> Could you elaborate?

A simple use case: you're not directly exposing your application to the
network. You're using proxies with automatic load balancing. You're not
using the HTTP wire protocol to glue communication among the internal
nodes; you're using ZeroMQ. There is no HTTP wire format anywhere in your
application, and Boost.Http could still be used, with the current API, as
is. A different HTTP backend would be provided and you're done.

It makes no sense to enforce the use of the HTTP wire format in this use
case at all. And if you're an advocate of the HTTP wire format, keep in
mind that the format changed in HTTP 2.0. There is no definitive,
set-in-stone serialized representation.
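
Roughly, what backend-independence buys you is that a handler can be
written as a template over the socket and message types, so the very same
code could be instantiated with the bundled HTTP/1.1 socket or with a
hypothetical ZeroMQ-backed one. A rough sketch, with error handling elided
and the signatures approximated:

    #include <boost/asio/spawn.hpp>
    #include <string>

    // Works with any types fulfilling the ServerSocket and Message
    // concepts; the handler never sees which wire format (if any) is in
    // use underneath.
    template<class ServerSocket, class Message>
    void handle_request(ServerSocket &socket, Message &request,
                        Message &reply, boost::asio::yield_context yield)
    {
        std::string method, path;
        socket.async_read_request(method, path, request, yield);
        // ... inspect `request`, fill `reply` ...
        socket.async_write_response(200, "OK", reply, yield);
    }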

-- 
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker
