|
Boost : |
Subject: [boost] [review][beast] Late review
From: Christopher Kohlhoff (chris_at_[hidden])
Date: 2017-07-16 12:41:40
I realise this is quite late, but Vinnie asked for my opinion, so I asked for special dispensation from the review manager. Hopefully that is ok!
In a nutshell, I think Beast should be accepted.
Background: Although I may have some use for websockets in the future, my use case right now requires only HTTP (sorry Vinnie!). The use case involves incorporating HTTP protocol support into network applications whose primary purpose is something otherwise unrelated to HTTP. HTTP is used to support monitoring, configuration, and debugging facilities. Unlike typical HTTP use, the protocol is implemented not just over TCP sockets but also over local (UNIX domain) sockets. It is essential that the HTTP implementation minimises runtime overhead to avoid negatively impacting the quality of service for the other protocols.
- What is your evaluation of the design?
Overall the design is good.
The library seems to be structured into several layers:
1. low level stateful parsers and serializers
2. operations to send/receive partial messages (*_some operations)
3. operations to send/receive header
4. operations to send/receive complete messages, using parser/serializer supplied by me
5. operations to send/receive complete messages
This is a good thing, because it lets me select the appropriate layer for a particular use case. Whichever of layers 2 - 4 I choose, the best thing is that they all follow the same model for asynchronous operations as in Asio. All four of those layers work with arbitrary Asio streams, which was a requirement of my use case. This made using the library seamless, easily replacing my otherwise quick and dirty HTTP encoding and decoding approaches. The benefit of course is that someone else is taking care of ensuring the messages are well-formed, and doing a much better job of it.
Layer 5 is good for simplicity and clarity. Based on my benchmarking, layer 4 seems to be the sweet spot for performance sensitive use cases, as it gives sufficient control of the placement of the "transient" objects used by the asynchronous operations.
I use HTTP/1.0 almost exclusively, and one reason for this is that HTTP/1.0 allows the response to be sent without a Content-Length, with the end of content delimited by stream closure. One use for this is an infinite stream of data (e.g. monitoring output). Another use case is the ability to use the file descriptor as stdout for a child process. In both these cases, layer 3 fits the bill where I can send just the response header, and then take over the socket for my own needs.
As for layer 2, the documentation for these functions still refers in places to a return value (or additional size_t argument to the completion handler), which is no longer in the code: synchronous read_some has a void return and async_read_some has a void(error_code) handler. Although this is probably vestigial documentation, actually I would prefer it if these lower level operations *did* return something, like the number of bytes transferred in the operation. This would assist in detecting badly behaved or pathological clients (like ones doing everything one byte at a time), and this is probably the main reason I would consider using the layer 2 operations.
I haven't yet needed layer 1, but I can imagine how it might be used in transport agnostic code.
I guess my main point is that by supplying all these layers, Beast is both powerful and flexible, and meets pretty much all of my likely needs.
The body/reader/writer design feels a little awkward, and it seems like there is a more elegant design struggling to get out. However, my primary reason for looking at implementing a custom body was to support serving static files. As Vinnie now plans to include file support in the library itself, I have little reason to work directly with body/reader/writer directly any longer. Consequently, I am less concerned about the design here. However, I'd encourage Vinnie to keep exploring this.
On a related note, once file body was added I was pleased to see how straightforward it was to incorporate platform-specific file transmission (TransmitFile, sendfile, etc) in a way that was transparent to the library user.
There are some minor inconsistencies in template parameters, e.g.: parser has Allocator where serializer has Fields. I would have expected some symmetry here.
The library includes reimplementations of a number of facilities from the networking TS. However, now the TS is done I plan to update Boost.Asio to match. If Beast is accepted then hopefully by the time it is incorporated, it will no longer need to include these at all.
I'm not a huge fan of the library including typedefs for names from other boost libraries as part of the documented public interface, e.g. beast::error_code, beast::string_view. However, if it is to retain these, I'd at least like to see them given their own section in the documentation quick reference to make it clear that they are just aliases for external functionality. (Note: I'm just fine with the aliases themselves as a convenience for the library implementer.)
- What is your evaluation of the implementation?
The implementation is high quality.
I do wonder about the implementation approach taken in the basic_fields storage. It uses a set, which is great asymptotically, but does mean allocation of a lot of small objects. In my experience (and in a totally unscientific survey of commonly used websites) the number of headers is usually small -- a dozen or two. In addition, when constructing or decoding a message, one would normally construct the set of fields in one go. Deletion of a field seems like it would be rare. Thus I wonder whether basic_fields could use a storage strategy geared towards the common case, e.g. a single storage blob for the fields themselves (vector<char>) plus an index (vector<size_t> of offsets to each field's beginning) for quick iteration. I don't think this would impact the API, so it's not essential.
- What is your evaluation of the documentation?
The documentation appears to be comprehensive. As mentioned above I did spot a few stray things that seemed to be left over from previous design iterations. There's also the occasional incomplete specification, e.g. basic_fields::at doesn't way what it does with multiple fields. Probably best to just raise these as issues.
- What is your evaluation of the potential usefulness of the library?
Very useful. It fits very well with my use cases.
- Did you try to use the library? With which compiler(s)? Did you
have any problems?
Using gcc 6.1 on linux, I spent some time writing and rewriting several small http servers to evaluate the library, and Vinnie has now incorporated some of these into the beast repository as examples. Along the way I did discover some minor opportunities for improvement in the library, and Vinnie was quick to respond.
- How much effort did you put into your evaluation? A glance? A quick
reading? In-depth study?
I spent many hours examining the library over a period of a few weeks, both before and during the review period. I gave the implementation and examples in-depth study, less so the documentation.
- Are you knowledgeable about the problem domain?
I hope Vinnie asked me to review Beast because I know a thing or two. However, I must qualify that: my experience in HTTP is almost entirely 1.0 and I only have a passing knowledge of websockets.
Many thanks to Vinnie for creating Beast and proposing it for inclusion in boost. It is nice to see the addition of protocol implementations that build on top of Asio's asynchronous model so seamlessly.
Cheers,
Chris
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk