|
Boost : |
Subject: [boost] [beast] Formal review
From: Bjorn Reese (breese_at_[hidden])
Date: 2017-07-09 20:57:57
I am going to focus exclusively on the HTTP API in the review below
because that needs most attention.
Furthermore, to keep this review within reasonable bounds, I am omitting
issues already reported by others.
I. DESIGN
---------
Beast offers a well-designed WebSocket API along with a minimal and
deficient HTTP API.
Beast claims to offer low-level building-blocks that can be used to
produce higher-level HTTP server and client libraries.
The low-level approach makes Beast minimal. I do not have a problem with
a minimal API, other than it diminishes the usefulness of the library.
Although this can be solved by adding more functionality over time, if
I was to evaluate Beast with its current feature-level, I would only
choose it if I needed WebSocket. This, however, is not a precondition
for library acceptance.
But the low-level approach also means the abstractions must be right.
The abstractions that Beast currently offers are deficient, which is
best shown by the poor support for HTTP chunking. Chunking can and
should be represented more naturally by a HTTP API.
HTTP chunking splits a message into several parts, where each part can
contain metadata (chunk extensions) and data (chunk.) The metadata is a
set of key-value pairs, just like the HTTP headers.
During the review period I have described several use cases for HTTP
chunking that does require metadata (e.g. radio metadata,) as well as
the preservation of chunk boundaries (e.g. event-based publish-subscribe
channels.)
Conceptually, a normal HTTP message has the following structure:
message = start-line metadata* body?
A chunked HTTP message can be described in a similar manner as:
message = start-line metadata* ( metadata* chunk? )*
The Beast message container is designed for the structure of the normal
HTTP message, and assumes that chunks are appended the body. The chunk
metadata does not fit into this model, and therefore Beast discards them
by default. The repetitive structure of chunked messages is not
accommodated by the Beast message container. This means that we have to
make our own chunk decorator to generate chunk metadata, and our own
custom parser to receive chunk metadata.
The chunk decorator inserts metadata via a callback mechanism. Its
design is poor for two reasons. First, the user must keep track of
context to ensure that the correct metadata is applied to the correct
chunks. Second, the user must write the metadata in correct HTTP syntax.
The latter may sound trivial, but does anybody here know if
" ;key=value\r\n"
is correctly formatted? (notice the initial whitespace.) The answer is
that it is correct according to RFC 2616, incorrect according to RFC
7230, and correct according to RFC 7230 errata. The best approach for
handling this situation is to never insert whitespaces, but to always
accept whitespaces from others. Users should not have to be bothered
with such detail.
Receiving chunk metadata rather than having them discarded by default
requires a custom parser. The Beast message parser is a push parser
(SAX) where callbacks are invoked for each parsed message part. This
means that the user must keep track of which metadata belongs to which
chunk. The user must also parse the metadata manually. Beast does not
even check if the syntax is correct.
The custom parser is also necessary if the user needs to receive
individual chunks rather than having them appended to the body. I would
have expected that a low-level API returns the next message part as soon
as it is identified. The current collect-entire-message design is a
higher-level design.
In summary,
1. Beast should not discard anything by default, but should faithfully
forward all information received.
2. Beast should handle all low-level syntax and not leave parts of it
to the user.
3. Beast should not break the end-to-end principle unless allowed by
the user.
As a way forward I am going to sketch another simple message model that
does encompass HTTP chunking. Conceptually it looks like this (I am
going to ignore that, say, metadata may require different syntax
depending on context)
message = start-line section*
section = metadata* body-part?
Building an API around this model should offer read/write functions that
handles partial bodies (whether chunked or progressively downloaded.)
The current collect-entire-message API could be build on top of that.
With this approach there is no need to mess around with custom parsers.
This design does require state information to be able to select the
proper syntax for both metadata and data.
On another topic, the HTTP status codes can be roughly categorized as
instructions (the 1xx and 2xx series) and errors (the 4xx and 5xx
series.) I recommend that you create two error categories and pass
status codes as error_code. Notice that I have no comment about whether
or not they should be stored in the message container. What I really
want to achieve is to enable the user to easily propagate HTTP errors
in the same way that non-HTTP errors (including beast::http::error) are
propagated.
II. IMPLEMENTATION
------------------
I read the implementation to understand the design, so I did not look
for QA issues. My overall impression is that much of the code is
well-written, but there are parts that are convoluted and difficult
to verify.
There are too many goto statements for my taste; I do not mind the goto
pattern used in the parser, but the state-machines in read_some_op and
serializer looks more like micro-optimization. Furthermore, "case
do_body + 1:" looks like an awful hack.
The dynamic buffers are useful even by themselves.
The code contains several foreign files with different licensing
conditions:
* include/beast/core/detail/base64.hpp
* include/beast/zlib/zlib.hpp
* include/beast/websocket/detail/utf8_checker.hpp
* include/beast/http/detail/basic_parser.hpp
All these include both the BSL and the original license. This should be
checked by the Boost legal council.
III. DOCUMENTATION
------------------
The documentation is generally well-written (and it looks like the
DynamicBuffer section got a little help from Networking TS.)
The Reference is missing some concepts, e.g. SyncReadStream.
IV. NAMING
----------
The library name must change. "Boost.Beast" sounds like a bad joke. I do
not have strong opinions about the new name.
V. VERDICT
----------
I recommend to REJECT the library because the HTTP API provides the
wrong low-level abstractions as explained in the design section above.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk