
Subject: Re: [boost] [beast] Supporting ICY 200 OK
From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2017-10-04 17:52:58


On Wed, Oct 4, 2017 at 9:37 AM, Vinícius dos Santos Oliveira
<vini.ipsmaker_at_[hidden]> wrote:
> There's a big misunderstanding between us here.
> ...
> I thought you were trying to generalize `beast::http::read`.

Hmm... I'll restate it. The beast::http::read algorithm is a generic
algorithm which operates on any object meeting the SyncReadStream
requirements. A typical setup for unencrypted connections might look
like this:

    beast::http::read <-- boost::asio::ip::tcp::socket

The read algorithm receives data directly from the socket and provides
it to the parser. My solution to recognizing "ICY 200 OK\r\n" is
simply to add a stream adapter into the middle of this pipeline:

    beast::http::read <-- icy_stream <-- boost::asio::ip::tcp::socket

The stream adapter replaces "ICY" with "HTTP/1.1" when those are the
first three characters of the input. This adapter would work with any
algorithm which operates on the SyncReadStream or AsyncReadStream
concepts.
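The core of that rewrite can be sketched independently of Beast. This is a minimal illustration with a made-up helper name, not the actual icy_stream implementation:

```cpp
#include <string>

// Hypothetical helper (the name and shape are mine, not Beast's
// icy_stream) showing the core idea: if the buffered response line
// begins with "ICY", splice in "HTTP/1.1" so that a standard
// HTTP/1.x parser accepts the rest of the input unchanged.
std::string rewrite_icy_prefix(std::string input)
{
    if (input.compare(0, 3, "ICY") == 0)
        input.replace(0, 3, "HTTP/1.1");
    return input;
}
```

A real icy_stream would perform this substitution lazily inside read_some/async_read_some, so the read algorithm downstream never knows it happened.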

>> I'm not sure what an "HTTP backend" means, but for practical purposes
>> there are three versions of HTTP:
>>
>> HTTP/1.0
>> HTTP/1.1
>> HTTP/2
>
> Add ZeroMQ to the list.

Are you saying that HTTP has four versions: 1.0, 1.1, 2, and ZeroMQ? A
casual search of rfc2616 [1], rfc7230 [2], and rfc7540 [3] does not
turn up any matches for "ZeroMQ." Could you please provide a link to
the document which describes the ZeroMQ version of HTTP?

> The point is making applications that answer HTTP requests (if you're writing a server).

Okay, so I think by this statement you are defining "HTTP backend" as follows:

    An algorithm which calculates the HTTP response for a given HTTP
    request and optional, additional state information associated with the
    connection or application.

The style I am promoting for these types of algorithms is evidenced in
the example HTTP servers which come with Beast [4]. If you look at
those servers, you will notice that although each server offers
different features (plain, SSL, synchronous, asynchronous, support for
websocket), they all contain an identical copy of a function with this signature:

    // This function produces an HTTP response for the given
    // request. The type of the response object depends on the
    // contents of the request, so the interface requires the
    // caller to pass a generic lambda for receiving the response.
    template<
        class Body, class Allocator,
        class Send>
    void
    handle_request(
        boost::beast::string_view doc_root,
        http::request<Body, http::basic_fields<Allocator>>&& req,
        Send&& send);

More formally, the style of request handling is expressed as a pure function:

    {m', s'} = f(m, s)

where

    f is the function
    m is the HTTP request
    s is the initial state
    m' is the HTTP response
    s' is the final state

Authors can write pure functions such as `handle_request` above, and since they
use beast::message as a common container for HTTP messages, those pure
functions can then be composed. This allows higher-level libraries to be
composed from lower-level ones to an arbitrary degree.
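As a toy illustration of that pure-function form (the types below are simple stand-ins, not Beast's message container or any real application state):

```cpp
#include <string>
#include <utility>

// Toy model of {m', s'} = f(m, s): the handler takes the request and
// the current state, and returns the response plus the new state,
// touching nothing else. Types are stand-ins, not Beast's.
struct app_state { int requests_served = 0; };

std::pair<std::string, app_state>
handle(std::string const& request, app_state s)
{
    ++s.requests_served;
    std::string response =
        (request == "GET /") ? "200 OK" : "404 Not Found";
    return {response, s};
}
```

Because `handle` depends only on its arguments, two such handlers can be chained or wrapped without hidden coupling, which is what makes the composition described above possible.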

That all the different Beast example servers work using the same logic
for processing requests is evidence that the design achieves its goal.

> For some, this meant writing plugins for existing HTTP servers.

Of course, the `handle_request` signature I provided above can only be
called directly by code that executes within the same process. In order
to delegate request processing to another process, it is necessary to
serialize the HTTP message, deliver it to the other process, and then
deserialize it back into the message container. That is why Beast
stream algorithms operate on concepts. You can implement an
"ostream_socket" which meets the requirements of SyncWriteStream, and
then use that stream with beast::http::write to deliver an HTTP message
to another process connected via an output file descriptor. The design
remains the same; only a layer to deliver the message to the other
process is added.

> From my point of view, there is no need restrict to this set of HTTP versions.
> You should abstract the meaning/behaviour beyond that, Just like I wrote above:

I must disagree. The semantics of an HTTP message can change depending on
the version. I also disagree that a representation of HTTP-version needs to be
broad enough to include ZeroMQ. What does that even mean? ZeroMQ is not
an HTTP version.

I will restate: There are currently three meaningful HTTP versions:

    HTTP/1.0
    HTTP/1.1
    HTTP/2

These are documented in RFCs and well defined.

> A lack of abstract thinking.

No, my design does not represent a lack of abstract thinking. Quite the
opposite: I have used abstractions *where appropriate*. Examples:

Inappropriate abstraction:

    template<class HTTPMessage>
    void handle_message(HTTPMessage const&);

Good abstraction:

    template<bool isRequest, class Body, class Fields>
    void handle_message(beast::http::message<isRequest, Body, Fields> const&);

I think your preoccupation with backends has good intentions. And it
should be clear to you now, given my explanation and example code, that
Beast was specifically designed to allow unlimited flexibility in how
users choose to consume HTTP messages. If you still disagree, I would
kindly ask that you provide a counter-example in the form of
pseudo-code that demonstrates your case.

> The “In order to [...] in a message, the container must reflect the HTTP
> version” sentence is very interesting. It means I need to expose exactly the
> same message received from the wire to the user... like you need different
> API to support HTTP/2.0. Is my understanding of your sentence correct?

"support HTTP/2.0" is a vague phrase, so I will decompose it into
two questions:

* Does Beast require a different container to represent HTTP/2 messages?

* Does Beast require different interfaces to serialize and deserialize
  HTTP/2 messages on streams?

Note that I've answered these questions already, both on the list and
in the Beast documentation, but for the sake of being fully informed:
the answers to those questions are "No" and "Yes."

> Because the container doesn't need to reflect the HTTP version.

Again I have to disagree. The interpretation of HTTP field values can
be different depending on the HTTP-version (which can be 1.0, 1.1, or
2).

> In my view, there are capabilities, like `native_stream`.

Okay. I don't know what that is. Can you provide a link to where the
"native_stream" HTTP feature is explained?

> Even the HTTP/2.0 multiplexing behaviour can be supported under the same
> simple API with no additional complexities in the message container.

Are you changing the subject from message containers to stream
algorithms? When you refer to "simple API", are you talking about a
stream algorithm? A serialization/deserialization algorithm?

I agree that it is possible to design an interface to stream operations
which is agnostic to whether the underlying connection uses HTTP/1 or
HTTP/2. However, such a library would by definition not be a low-level
library. It could be decomposed into four parts:

    1. A universal HTTP message container
    2. Serialization and stream operations for HTTP/1
    3. Serialization and stream operations for HTTP/2
    4. A unified interface using parts 1, 2, and 3

Beast provides 1 and 2 above, and I have plans to provide 3. I have no
plans to provide 4, although with 1, 2, and 3 in hand it could
certainly be implemented, and it would be much easier than having to
write everything from scratch.
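As a hedged sketch of how part 4 might sit on top of parts 2 and 3 (the serializers below are trivial placeholders of my own invention, not real Beast or HTTP/2 framing code):

```cpp
#include <string>

// Placeholder version-specific serializers standing in for
// parts 2 and 3 of the decomposition above.
std::string serialize_http1(std::string const& body)
{
    return "HTTP/1.1 200 OK\r\n\r\n" + body;
}

std::string serialize_http2(std::string const& body)
{
    return "[h2 frame] " + body; // stand-in for binary HTTP/2 framing
}

// Part 4, the unified layer: dispatch to the version-specific
// serializer based on the negotiated protocol.
std::string serialize(std::string const& body, bool use_http2)
{
    return use_http2 ? serialize_http2(body) : serialize_http1(body);
}
```

The point of the decomposition is that the dispatch layer is thin and optional: anyone who needs only HTTP/1 can call part 2 directly.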

My intuition is that users who say they want a unified interface, and
get one, will later realize that they didn't need it at all. The pitch
of a unified stream operations interface that works with both HTTP/1
and HTTP/2 is of course quite appealing, because it creates the
illusion of getting HTTP/2 support "for free." But as I said, I think
there are problems with it which will only become apparent after
someone offers such an interface and users actually try to use it.

Regardless, a unified interface can be decomposed. And any library which CAN
be decomposed, SHOULD be decomposed strictly on the principle of separation
of concerns.

> “that is incorrect” + “message container is designed to be HTTP/2-ready”
> seems to me that you didn't understand my complaint.

Perhaps I did not understand your complaint. If you could repeat it in
clear terms, that would help.

> Talking about parsers, what is your opinion on the design of parsers like
> showed in this talk: https://vimeo.com/channels/ndcoslo2016/171704565

It looks like an implementation detail. My experience with Beast is that the
vast majority of users don't care about how the parser is implemented other
than that it works and that it is reasonably fast or at least does not produce
a visible slowdown in the application. Users want these operations:

    parse_header() // parse just the header
    parse() // parse whatever is left

Thus far no one has expressed a desire to interact with HTTP message
tokens as they appear. However, should someone wish to do so the Beast
parser abstracts the post-processing of tokens by delegating them to a
derived class using CRTP. So for example, when the day comes that someone
wants to discard fields they don't recognize or care about, they can do so.
And they will find ample documentation on this subject along with examples:

<http://www.boost.org/doc/libs/master/libs/beast/doc/html/beast/using_http/custom_parsers.html>
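The delegation pattern can be sketched in miniature (this is an illustrative CRTP skeleton with invented names, not Beast's actual basic_parser interface):

```cpp
#include <string>
#include <utility>
#include <vector>

// Illustrative CRTP skeleton: the base "parser" hands each field it
// extracts to the derived class, which decides what to keep.
template<class Derived>
struct toy_parser
{
    void emit_field(std::string name, std::string value)
    {
        static_cast<Derived&>(*this).on_field(
            std::move(name), std::move(value));
    }
};

// A derived parser that discards every field except Content-Length.
struct selective_parser : toy_parser<selective_parser>
{
    std::vector<std::string> kept;
    void on_field(std::string name, std::string value)
    {
        if (name == "Content-Length")
            kept.push_back(name + ": " + value);
    }
};
```

The dispatch is resolved at compile time, so a derived class that ignores fields pays nothing for the fields it discards.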

> Virtual field is not to be confused with HTTP field.

Ahh... okay. Yes, I thought you meant HTTP field. Thanks for clarifying.

> However, like I stated in the beginning of the email, I thought you were
> talking about the Beast message model. However, you're talking about the
> parser. In this case (reusing the parser), I'll limit myself to Bjørn words:
> “Sounds like a good solution.”

Oh... yeah. The conversion from ICY to HTTP/1.1 is just a rewriting of
the buffer by an object inserted into the pipeline between the socket
and the read algorithm.

> But I'm still curious about what you think about this talk:
> https://vimeo.com/channels/ndcoslo2016/171704565

I don't really think much of it. Seems like a case of over-engineering to
me. HTTP parsing is relatively straightforward and in the case of Beast
it is a solved problem. Beast's parser works, has extensive tests and
code coverage, has been through a gauntlet of fuzzed inputs, and is
currently undergoing a security audit by a third party company.

Note that HTTP grammar does not require backtracking so I am not
seeing an immediate benefit from implementing a parser combinator.
This might not be true for some of the fields though, so perhaps a
parser combinator might be useful there. However, parsing the
values of fields (other than Connection, Proxy-Connection,
Upgrade, Transfer-Encoding, and Content-Length) is strictly
out-of-scope for Beast and would be the subject of a different library.

URIs require backtracking for authority elements missing the slash,
but Beast doesn't parse the request-target; it just presents it as a
string_view to the caller. So the same rationale about being out of
scope applies.

I always welcome these discussions since they offer the possibility
of improvements. Or in this case they further cement the justifications
for the design decisions made in Beast.

Thanks

[1] <https://www.ietf.org/rfc/rfc2616.txt>

[2] <https://tools.ietf.org/html/rfc7230>

[3] <https://tools.ietf.org/html/rfc7540>

[4] <https://github.com/boostorg/beast/blob/7fe74b1bf544a64ecf8985fde44abf88f9902251/example/http/server/async/http_server_async.cpp#L97>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk