|
Boost : |
Subject: Re: [boost] [http] Formal Review
From: Lee Clagett (forum_at_[hidden])
Date: 2015-08-14 01:49:25
On Thu, Aug 13, 2015 at 11:16 PM, VinÃcius dos Santos Oliveira <
vini.ipsmaker_at_[hidden]> wrote:
> 2015-08-13 22:19 GMT-03:00 Lee Clagett <forum_at_[hidden]>:
>
> [snip]
> > > This library will grow and also include a client. The HTTP parser is
> the
> > > least useful thing this library could provide. I'm aiming a library
> that
> > is
> > > so useful that you'd use it no matter if you're going to write an
> > > application to be used on Raspberry Pi (or even more constrained) or
> on a
> > > cluster of nodes that distribute workload among thousands and thousands
> > of
> > > nodes and use different intermediate protocols.
> > >
> > >
> > Definining a HTTP parser (specifically a SAX-style parser) is likely the
> > only way to achieve zero allocations, which is exactly what the parser in
> > use by this library is doing. Providing C++ templated SAX parser _would_
> be
> > novel - it would allow for inlined/direct callbacks as opposed to the
> > indirect callbacks provided by the C parser in use. Probably a small
> > reduction in CPU cycles though, since parsing time should dominate.
> > Providing such a library would likely change the current design
> > dramatically.
> >
>
> No. That's a way to avoid memory copies. That's not necessary to avoid zero
> allocations.
>
> You can have a custom backend tied to a single type of message that will do
> the HTTP parsing for you. It could be inefficient by maybe forcing the
> parsing every time you call headers(), but without an implementation I
> don't have much to say.
>
>
But this would only be able to handle a one HTTP message type? Or would
drop some useful information? I think it would be difficult to implement a
non-allocating HTTP parser unless it was SAX, or stopped at defined points
(essentially notifying you like a SAX parser).
> > You even assume HTTP version is present. It may not even make sense and
> it
> > > shouldn't be exposed to the user anyway. The user is interested in
> > features
> > > (can I use chunking? can I use X?). It's more portable. You'd suggest a
> > > different approach if you had invested time to propose a library that
> > could
> > > be used with different backends.
> > >
> > > About your http::view proposal: It's not much different from
> > http::Message
> > > concept, but:
> > >
> > > - Your http::view read headers and read_state. This means you're
> > > mixing/coupling communication channels and HTTP messages. Bad for
> > > different
> > > backends. It's bad because you then need to answer questions such as
> > > "how
> > > do I pass a pool, custom alocator and this and that to my backend?".
> > > - Also, how does your proposal handle progressive download and
> > chunking?
> > > v.body() doesn't seem very/any asynchronous at all. The flush method
> > in
> > > response is useful for chunking, but the lack of information around
> > the
> > > capability of chunking by the communication channel leaves no option
> > > but to
> > > buffer implicitly, which can exhaust memory in video live stream.
> And
> > > this
> > > magic node-like API can very easily hamper interoperability among
> > > different
> > > implementations (I've already went into this road[1].
> > >
> > >
> > > About the no-buffering approach, I just cannot give this guarantee. Any
> > > object fulfilling the ServerSocket concept will have its own guarantees
> > and
> > > it is up to the user to check them. At the handling level, you can use
> > > custom types fulfilling the message concept that will avoid memory
> > > allocation. This possibility is not accidental. Just like you're
> worried,
> > > I'm worried too.
> > >
> >
> > How could a user provide a message concept that satifies the
> > SequenceContainer for the body avoid memory allocations? The
> documentation
> > for reading the body doesn't state how the SequenceContainer is being
> used
> > or modified. Can I set a size for the container, that the http::Socket
> > concept never exceeds while reading (effectively making it a fixed buffer
> > like ASIO)? I've read this several times (and maybe I'm being thick), but
> > if I were writing an implementation of this concept, it would seem that
> its
> > implementation defined how the SequenceContainer is being manipulated.
> > Currently the body is being automatically resized on reads, with new data
> > being copied in. I _think_ the only way to control how much is being
> copied
> > in at a time (and thus the max the SequenceContainer is being resized),
> is
> > through the buffer argument to basic_socket, but this is undocumented.
> >
> > And to satifies the message requirements for the header/trailers requires
> > an AssociateMap<string, string>. Currently the key/value types are
> required
> > to meet the string concept (which I do not recall being defined by C++ or
> > boost), but assuming it meant an stl compatible basic_string, the best
> way
> > to avoid memory allocation is a small string optimization and combine
> with
> > flat_map or an intrusive map. This would reduce, but not eliminate
> > allocations. I'm slightly less concerned about this, I think its a good
> > tradeoff, but I'm not sure how acceptable these loose constraints are for
> > embedded situations that are continually mentioned. Any embedded
> developers
> > following this?
> >
>
> Like you guessed, you pass a buffer to basic_socket. It won't read more
> bytes than the buffer size.
>
But how can this be combined with higher order functions? For example a
`async_read_response(Socket, MaxDataSize, void(uint16_t, std::string,
vector<uint8_t>, error_code))`? However such a utility is defined, it will
have to be tied to a specific implementation currently, because theres no
way to control the max-read size via socket concept. Or would such a
function omit a max read size (several other libraries don't have one
either)? Or would it just overread a _bit_ into the container?
>
> Fair enough. I didn't document this behaviour. I'll gather all these points
> after the review period to implement these improvements.
>
> About the embedded device situation, it'll be improved when I expose the
> parser options, then you'll be able to set max headers size, max header
> name size, mas header value size and so on. With all upper limits figured
> out, you can provide a single chunk of memory for all data.
>
>
>
But what if an implementation wanted to discard some fields to really keep
the memory low? I think that was the point of the OP. I think this is
difficult to achieve with a notifying parser. It might be overkill for
Boost.Http, people under this durress can seek out existing Http parsers.
[snip]
> 7. How much effort did you put into your evaluation? A glance? A quick
> > > > reading? In-depth study?
> > > >
> > > > A few hours digging into docs, about 3 hours looking into the
> sources,
> > > > about 3 hours formulising the issues and recommending solutions.
> > > >
> > >
> > > Please read the design choices chapter. You seem to be completely
> > ignoring
> > > all use cases where a standalone server (e.g. FastCGI, ZeroMQ...) isn't
> > > used.
> > >
> > > I read the design choices section, and I don't understand the
> relevance.
> > Could you elaborate?
>
>
> A simple use case: You're not directly exposing your application to the
> network. You're using proxies with auto load balancing. You're not using
> HTTP wire protocol to glue communication among the internal nodes. You're
> using ZeroMQ. There is no HTTP wire format usage in your application at all
> and Boost.Http still would be used, with the current API, as is. A
> different HTTP backend would be provided and you're done.
>
>
It makes no sense to enforce the use of HTTP wire format in this use case
> at all. And if you're an advocate of HTTP wire format, keep in mind that
> the format changed in HTTP 2.0. There is no definitive set-in-stone
> serialized representation.
>
>
Yes, if HTTP were converted into a different (more efficient) wire format
(as I've seen done in various ways - sandstorm/capnproto now does this
too), a new implementation of http::ServerSocket could re-read that format
and be compatible. It would be useful to state this more clearly in the
documentation, unless I missed it (sorry).
Lee
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk