Boost logo

Boost :

Subject: Re: [boost] [http] Formal Review
From: Vinícius dos Santos Oliveira (vini.ipsmaker_at_[hidden])
Date: 2015-08-13 11:38:37


2015-08-12 15:58 GMT-03:00 Antony Polukhin <antoshkka_at_[hidden]>:

> That's the fatal problem of the library. It seems like the library attempts
> to get best from ASIO and cpp-netlib and unfortunately fails. Now let's get
> into details:
>
> * ASIO is a low level abstraction over protocol that does not take care
> about memory management of buffers (in most cases it's the responsibility
> of user to allocate and keep alive the required buffers).
> * cpp-netlib takes care of memory management, hides low-level stuff.
>

I'm actually competing with cpp-netlib, but cpp-netlib has more
abstractions. I finished a strong core abstraction and submitted it for
review.

This strong core can already be used to build some high level abstractions
(or applications). There is a file server making use of all this low-level
abstraction and there are lots of type erasers to allow the use of this
library into plugins, so you can change the handlers at runtime without
restarting the server.

I just expect to provide more algorithms and abstractions on top of this
strong core and anything that is high-level will use the same main players
from this core proposal (http messages and sockets). And the generic
algorithms that are provided (e.g. file server) will work on any
alternative high-level API or HTTP backend that is added in the future.

Boost.Http is stuck somewhere in the middle: sometimes it allocates and
> extends buffers, sometimes not. Because of that the design does not seem
> solid. Any attempt to hide all the memory management inside the library
> will end in an another implementation of cpp-netlib. So it seems right to
> remove all the memory management from the library internals and make the
> library closer to ASIO.
>

I don't think it's fair to suggest that any attempt I make to improve the
high-level side of the library will end up in another way to implement the
cpp-netlib API. The cpp-netlib API has several flaws and I avoid them all.
The advantage of cpp-netlib is that cpp-netlib has more abstractions.

Here are my recommendations for separation of flies and dishes:
>
> * Stick to the idea that http socket must ONLY manage connection without
> extending the ASIO's send/receive function signatures. That means that
> http1.0 and http1.1 sockets must be almost a typedef for tcp/ssl socket.
> Http2.0 socket must be very close to SSL socket. No additional methods
> (like async_read_trailers or async_write_response) must be provided.
>

TCP is stream-oriented, UDP is datagram-oriented and HTTP is
message-oriented (request-reply actually).

HTTP messages are structured and should remain type-safe. Your application
may be using an HTTP backend that doesn't even do network (shared memory
protocol for load-balance high-load of HTTP traffic) and this library will
still be useful. This library is sticking to HTTP concepts, not some hacky
adaption of tcp/udp sockets. I don't even provide a parser (I made sure
it's completely hidden from user).

Additional methods like async_read_trailers and async_write_response is one
of the main reasons why it's possible to abstract HTTP communication
channels. You hide communication-specific details into communication
channel code.

If you don't want memory exhaustion trying to read a possibly infinite
message body that is **very** easy to craft you should start to thinking
about not making so many things implicit/magic. From the http::view
proposal you outline below, I assume you are considering
parsing/destructuring the message at a different point in time, which is
just changing the abstraction from one point to another and providing no
benefit over current design at all. Actually, your proposal seem to be
highly coupled to HTTP wire format, which is terrible for different
backends and may become the cause of major inefficiencies.

* It must be a user's responsibility to provide buffers for messages. Never
> extend buffers in socket-related methods.
>

Implementation is improving:

   -
   https://github.com/BoostGSoC14/boost.http/blob/0fc8dd7a594bb5ebb676d2d55621aedf75521556/include/boost/http/socket.hpp#L269
   - Place for improvement:
   https://github.com/BoostGSoC14/boost.http/blob/0fc8dd7a594bb5ebb676d2d55621aedf75521556/include/boost/http/socket.hpp#L288

message.insert(make_pair("host", "www.example.com")) kind of
allocate/extend buffers, but the message type is completely controlled by
the user and I even was careful to make sure that a single block of memory
could be used to the whole message (headers, body and trailers).

* Provide helper classes that parse and generate http
>

That's actually a good idea. Akin to Asio's async_write, I assume. It
wouldn't change any already provided API, though.

* Provide out-of-the-box completion conditions for async_read operation:
> http::completions::full, http::completions::headers,
> http::completions::body...
>

Okay.

Those bullets will allow to write following code (based on
>
> http://www.boost.org/doc/libs/1_58_0/doc/html/boost_asio/tutorial/tutdaytime3/src.html
> with minimal modifications):
>
> #include <ctime>
> #include <iostream>
> #include <string>
> #include <boost/bind.hpp>
> #include <boost/shared_ptr.hpp>
> #include <boost/enable_shared_from_this.hpp>
> #include <boost/asio.hpp>
>
> std::string make_daytime_string() {
> using namespace std; // For time_t, time and ctime;
> time_t now = time(0);
> return ctime(&now);
> }
>
> class http_connection : public
> boost::enable_shared_from_this<http_connection> {
> public:
> typedef boost::shared_ptr<http_connection> pointer;
>
> static pointer create(boost::asio::io_service& io_service) {
> return pointer(new http_connection(io_service));
> }
>
> http::socket& socket() {
> return socket_;
> }
>
> void start() {
> message_.resize(4 * 1024);
>
> boost::asio::async_read(socket_, boost::asio::buffer(message_),
> http::completions::full(message_), // read until all the whole
> request is in `message_`
> boost::bind(&http_connection::handle_read, shared_from_this(),
> boost::asio::placeholders::error,
> boost::asio::placeholders::bytes_transferred));
> }
>
> private:
> http_connection(boost::asio::io_service& io_service)
> : socket_(io_service)
> {}
>
> void handle_write(const boost::system::error_code& /*error*/,
> size_t /*bytes_transferred*/)
> {}
>
> void handle_read(const boost::system::error_code& error, size_t
> bytes_transferred) {
> assert(!error); // for shortness
> // `message_` contains the whole request, including headers and body
> // because http::completions::full was provided
>
> {
> boost::system::error_code e;
> http::view v(message_, e); // represent buffer as http message
> assert(!e); // no error occured during parsing
> assert(v.read_state() == http::read_state::empty);
> std::cerr << v.version(); // "HTTP1.1"
> std::cerr << v["Content-type"]; // "plain/text"
> std::cerr << v.body(); // string_ref("Hello! This is a body
> content. What time is it?")
> message_.clear(); // `v` is now invalid to use!
> }
>
> {
> http::generator resp(message_); // use `message_` for an
> output
> resp << make_daytime_string(); // append daytime to body
> resp["Content-Type"] = "plain/text"; // set some headers
> resp.version(http::version::HTTP_1_1); // explicitly set HTTP
> version
> resp.code(200); // "200 OK"
> resp.flush(); // explicitly flush to
> buffer. Done automatically in destructor
> }
>
> boost::asio::async_write(socket_, boost::asio::buffer(message_),
> boost::bind(&http_connection::handle_write, shared_from_this(),
> boost::asio::placeholders::error,
> boost::asio::placeholders::bytes_transferred));
> }
>
> http::socket socket_; // manages the connection only
> std::string message_;
> };
>
> class http_server {
> public:
> http_server(boost::asio::io_service& io_service)
> : acceptor_(io_service, http::endpoint(http::all_versions, tcp::v4(),
> 80))
> {
> start_accept();
> }
>
> private:
> void start_accept() {
> http_connection::pointer new_connection =
> http_connection::create(acceptor_.get_io_service());
>
> acceptor_.async_accept(new_connection->socket(),
> boost::bind(&http_server::handle_accept, this, new_connection,
> boost::asio::placeholders::error));
> }
>
> void handle_accept(http_connection::pointer new_connection,
> const boost::system::error_code& error)
> {
> if (!error) {
> if (new_connection->socket().version() == http::version::HTTP_2_0) {
> // do some more stuff
> }
>
> new_connection->start();
> }
>
> start_accept();
> }
>
> http::acceptor acceptor_;
> };
>
> int main() {
> try {
> boost::asio::io_service io_service;
> http_server server(io_service);
> io_service.run();
> } catch (std::exception& e) {
> std::cerr << e.what() << std::endl;
> }
>
> return 0;
> }
>

This idea is horrible. An HTTP message is not a std::string. HTTP does have
a string representation, the HTTP wire format from HTTP 1.1, which is
different than HTTP 2.0, which is different than FastCGI, which would be
different than a ZeroMQ-based approach and so on and so on.

I'm not proposing an HTTP parser, I'm proposing an HTTP server library.
This library will grow and also include a client. The HTTP parser is the
least useful thing this library could provide. I'm aiming a library that is
so useful that you'd use it no matter if you're going to write an
application to be used on Raspberry Pi (or even more constrained) or on a
cluster of nodes that distribute workload among thousands and thousands of
nodes and use different intermediate protocols.

You even assume HTTP version is present. It may not even make sense and it
shouldn't be exposed to the user anyway. The user is interested in features
(can I use chunking? can I use X?). It's more portable. You'd suggest a
different approach if you had invested time to propose a library that could
be used with different backends.

About your http::view proposal: It's not much different from http::Message
concept, but:

   - Your http::view read headers and read_state. This means you're
   mixing/coupling communication channels and HTTP messages. Bad for different
   backends. It's bad because you then need to answer questions such as "how
   do I pass a pool, custom alocator and this and that to my backend?".
   - Also, how does your proposal handle progressive download and chunking?
   v.body() doesn't seem very/any asynchronous at all. The flush method in
   response is useful for chunking, but the lack of information around the
   capability of chunking by the communication channel leaves no option but to
   buffer implicitly, which can exhaust memory in video live stream. And this
   magic node-like API can very easily hamper interoperability among different
   implementations (I've already went into this road[1].

About the no-buffering approach, I just cannot give this guarantee. Any
object fulfilling the ServerSocket concept will have its own guarantees and
it is up to the user to check them. At the handling level, you can use
custom types fulfilling the message concept that will avoid memory
allocation. This possibility is not accidental. Just like you're worried,
I'm worried too.

This will also untie Boost.HTTP from ASIO. Now users could use
> http::generator/http::view to read/generate HTTP data. This significantly
> extends the use cases of your library (for example on a previous work we
> had our own network transfer implementation, but we'd gladly use a http
> parser and generator).
>

The library concepts are already untied from Asio network code. But the
library remains tied to Asio's concepts for asynchronous coding. The
library is used to interact with external parties, so I use an asynchronous
API to provide scalability. I need some framework for asynchronous
foundations and I use Asio. I'm choosing to reuse a Boost component to
submit a Boost library. But I'm not using Asio just because it's already
into Boost. Asio is flexible enough to allow me to use any architecture I
want (single-threaded, one thread per operation, multiple operations per
thread and one thread per CPU core...) using any of multiple styles
(completion handlers, futures, coroutines, blocking...) and much more. I
won't provide this foundation myself into Boost.Http as Asio already does
it for me.

Such code is also less intrusive: only minimal chages were required to
> change a tcp example into the http example. Users will appreciate that,
> especially in cases when ASIO is already used for HTTP.
>

I do appreciate the "less intrusive" guideline. But I dislike the attempt
to make http::socket as close to tcp::socket (or any other socket) as
possible. It'll just remove features and you're ignoring that some users
want alternative HTTP backends. Your proposal is **less** useful because it
can be used in **less** applications. You need to elaborate more if you
want to convince that this approach will make **more** users happy.

BTW, after that you could easily implement http::stream (that must be close
> to tcp::stream by functionality). http::stream would be a killer feature
> that would be widely used by utility programs (as I remember current
> Boost's regression testing requires it, but uses tcp::stream with a bunch
> of workarounds).
>

An http::stream would be an adapter, not decrease the current API quality,
so I'm less reluctant against this point. Anyway, the more I read your
review, the more I think you're completely ignoring that network code might
not even be involved in passing HTTP messages around. This would explain
why you could prefer cpp-netlib, but it is just less useful.

Please, implement the HTTP2.0 protocol. There are some open source
> solutions that are licensed under the Boost license and are very close to
> full implementation of HTTP2.0. Just reuse the code and chat with the
> authors for better understanding of the problems.
>

I'll try to provide a prototype tonight, but the intent of the prototype is
just to prove that this core API is okay.

Also please do not use coroutines at least in first example. coroutines is
> a new feature in ASIO, many users use old Boost versions that have no
> coroutines and some users are not familiar with them. So if you start with
> coroutines user will have to go to ASIO docs and dig into coroutines to
> understand the very first example. Example without coroutines will be more
> user friendly.
>

This is the kind of argument... by the time Boost.Http reaches Boost, Asio
users will already be used to coroutines. I don't even depend on the latest
Boost release if I'm not mistaken.

7. How much effort did you put into your evaluation? A glance? A quick
> reading? In-depth study?
>
> A few hours digging into docs, about 3 hours looking into the sources,
> about 3 hours formulising the issues and recommending solutions.
>

Please read the design choices chapter. You seem to be completely ignoring
all use cases where a standalone server (e.g. FastCGI, ZeroMQ...) isn't
used.

I urge other reviewers to take a look at the cpp-netlib. If you treat
> Boost.Http as an out-of-box solution, then cpp-netlib is more mature,
> user-friendly and solid. If you treat Boost.Http as a basic building block
> for HTTP (just like ASIO is a basic building block for networking), then
> it's better to stick to the ASIO's design and do not *partially* manage
> memory!
>

I'm puzzled by the meaning of solid. If it's something old, then Boost.Http
will never beat cpp-netlib.

Anyway, thanks for such time-devoted review, Antony.
=)

[1] https://github.com/vinipsmaker/tufao/issues/41

-- 
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk