Boost logo

Boost :

Subject: Re: [boost] [http] Formal Review
From: Lee Clagett (forum_at_[hidden])
Date: 2015-08-13 21:19:47


On Thu, Aug 13, 2015 at 11:38 AM, Vinícius dos Santos Oliveira <
vini.ipsmaker_at_[hidden]> wrote:

> 2015-08-12 15:58 GMT-03:00 Antony Polukhin <antoshkka_at_[hidden]>:
>
> > That's the fatal problem of the library. It seems like the library
> attempts
> > to get best from ASIO and cpp-netlib and unfortunately fails. Now let's
> get
> > into details:
> >
> > * ASIO is a low level abstraction over protocol that does not take care
> > about memory management of buffers (in most cases it's the responsibility
> > of user to allocate and keep alive the required buffers).
> > * cpp-netlib takes care of memory management, hides low-level stuff.
> >
>
> [snip]

>
> Those bullets will allow to write following code (based on
> >
> >
> http://www.boost.org/doc/libs/1_58_0/doc/html/boost_asio/tutorial/tutdaytime3/src.html
> > with minimal modifications):
> >
> > #include <ctime>
> > #include <iostream>
> > #include <string>
> > #include <boost/bind.hpp>
> > #include <boost/shared_ptr.hpp>
> > #include <boost/enable_shared_from_this.hpp>
> > #include <boost/asio.hpp>
> >
> > std::string make_daytime_string() {
> > using namespace std; // For time_t, time and ctime;
> > time_t now = time(0);
> > return ctime(&now);
> > }
> >
> > class http_connection : public
> > boost::enable_shared_from_this<http_connection> {
> > public:
> > typedef boost::shared_ptr<http_connection> pointer;
> >
> > static pointer create(boost::asio::io_service& io_service) {
> > return pointer(new http_connection(io_service));
> > }
> >
> > http::socket& socket() {
> > return socket_;
> > }
> >
> > void start() {
> > message_.resize(4 * 1024);
> >
> > boost::asio::async_read(socket_, boost::asio::buffer(message_),
> > http::completions::full(message_), // read until all the whole
> > request is in `message_`
> > boost::bind(&http_connection::handle_read, shared_from_this(),
> > boost::asio::placeholders::error,
> > boost::asio::placeholders::bytes_transferred));
> > }
> >
> > private:
> > http_connection(boost::asio::io_service& io_service)
> > : socket_(io_service)
> > {}
> >
> > void handle_write(const boost::system::error_code& /*error*/,
> > size_t /*bytes_transferred*/)
> > {}
> >
> > void handle_read(const boost::system::error_code& error, size_t
> > bytes_transferred) {
> > assert(!error); // for shortness
> > // `message_` contains the whole request, including headers and body
> > // because http::completions::full was provided
> >
> > {
> > boost::system::error_code e;
> > http::view v(message_, e); // represent buffer as http message
> > assert(!e); // no error occured during parsing
> > assert(v.read_state() == http::read_state::empty);
> > std::cerr << v.version(); // "HTTP1.1"
> > std::cerr << v["Content-type"]; // "plain/text"
> > std::cerr << v.body(); // string_ref("Hello! This is a
> body
> > content. What time is it?")
> > message_.clear(); // `v` is now invalid to use!
> > }
> >
> > {
> > http::generator resp(message_); // use `message_` for an
> > output
> > resp << make_daytime_string(); // append daytime to body
> > resp["Content-Type"] = "plain/text"; // set some headers
> > resp.version(http::version::HTTP_1_1); // explicitly set HTTP
> > version
> > resp.code(200); // "200 OK"
> > resp.flush(); // explicitly flush to
> > buffer. Done automatically in destructor
> > }
> >
> > boost::asio::async_write(socket_, boost::asio::buffer(message_),
> > boost::bind(&http_connection::handle_write, shared_from_this(),
> > boost::asio::placeholders::error,
> > boost::asio::placeholders::bytes_transferred));
> > }
> >
> > http::socket socket_; // manages the connection only
> > std::string message_;
> > };
> >
> > class http_server {
> > public:
> > http_server(boost::asio::io_service& io_service)
> > : acceptor_(io_service, http::endpoint(http::all_versions, tcp::v4(),
> > 80))
> > {
> > start_accept();
> > }
> >
> > private:
> > void start_accept() {
> > http_connection::pointer new_connection =
> > http_connection::create(acceptor_.get_io_service());
> >
> > acceptor_.async_accept(new_connection->socket(),
> > boost::bind(&http_server::handle_accept, this, new_connection,
> > boost::asio::placeholders::error));
> > }
> >
> > void handle_accept(http_connection::pointer new_connection,
> > const boost::system::error_code& error)
> > {
> > if (!error) {
> > if (new_connection->socket().version() == http::version::HTTP_2_0)
> {
> > // do some more stuff
> > }
> >
> > new_connection->start();
> > }
> >
> > start_accept();
> > }
> >
> > http::acceptor acceptor_;
> > };
> >
> > int main() {
> > try {
> > boost::asio::io_service io_service;
> > http_server server(io_service);
> > io_service.run();
> > } catch (std::exception& e) {
> > std::cerr << e.what() << std::endl;
> > }
> >
> > return 0;
> > }
> >
>
> This idea is horrible. An HTTP message is not a std::string. HTTP does have
> a string representation, the HTTP wire format from HTTP 1.1, which is
> different than HTTP 2.0, which is different than FastCGI, which would be
> different than a ZeroMQ-based approach and so on and so on.
>
>
Why would this idea be limited to strings? The generator/parser could
easily take another type. And std::strings can store binary data network
data, if thats what your suggesting (it wasn't exactly clear to me).

I'm not proposing an HTTP parser, I'm proposing an HTTP server library.
> This library will grow and also include a client. The HTTP parser is the
> least useful thing this library could provide. I'm aiming a library that is
> so useful that you'd use it no matter if you're going to write an
> application to be used on Raspberry Pi (or even more constrained) or on a
> cluster of nodes that distribute workload among thousands and thousands of
> nodes and use different intermediate protocols.
>
>
Definining a HTTP parser (specifically a SAX-style parser) is likely the
only way to achieve zero allocations, which is exactly what the parser in
use by this library is doing. Providing C++ templated SAX parser _would_ be
novel - it would allow for inlined/direct callbacks as opposed to the
indirect callbacks provided by the C parser in use. Probably a small
reduction in CPU cycles though, since parsing time should dominate.
Providing such a library would likely change the current design
dramatically.

> You even assume HTTP version is present. It may not even make sense and it
> shouldn't be exposed to the user anyway. The user is interested in features
> (can I use chunking? can I use X?). It's more portable. You'd suggest a
> different approach if you had invested time to propose a library that could
> be used with different backends.
>
> About your http::view proposal: It's not much different from http::Message
> concept, but:
>
> - Your http::view read headers and read_state. This means you're
> mixing/coupling communication channels and HTTP messages. Bad for
> different
> backends. It's bad because you then need to answer questions such as
> "how
> do I pass a pool, custom alocator and this and that to my backend?".
> - Also, how does your proposal handle progressive download and chunking?
> v.body() doesn't seem very/any asynchronous at all. The flush method in
> response is useful for chunking, but the lack of information around the
> capability of chunking by the communication channel leaves no option
> but to
> buffer implicitly, which can exhaust memory in video live stream. And
> this
> magic node-like API can very easily hamper interoperability among
> different
> implementations (I've already went into this road[1].
>
>
> About the no-buffering approach, I just cannot give this guarantee. Any
> object fulfilling the ServerSocket concept will have its own guarantees and
> it is up to the user to check them. At the handling level, you can use
> custom types fulfilling the message concept that will avoid memory
> allocation. This possibility is not accidental. Just like you're worried,
> I'm worried too.
>

How could a user provide a message concept that satifies the
SequenceContainer for the body avoid memory allocations? The documentation
for reading the body doesn't state how the SequenceContainer is being used
or modified. Can I set a size for the container, that the http::Socket
concept never exceeds while reading (effectively making it a fixed buffer
like ASIO)? I've read this several times (and maybe I'm being thick), but
if I were writing an implementation of this concept, it would seem that its
implementation defined how the SequenceContainer is being manipulated.
Currently the body is being automatically resized on reads, with new data
being copied in. I _think_ the only way to control how much is being copied
in at a time (and thus the max the SequenceContainer is being resized), is
through the buffer argument to basic_socket, but this is undocumented.

And to satifies the message requirements for the header/trailers requires
an AssociateMap<string, string>. Currently the key/value types are required
to meet the string concept (which I do not recall being defined by C++ or
boost), but assuming it meant an stl compatible basic_string, the best way
to avoid memory allocation is a small string optimization and combine with
flat_map or an intrusive map. This would reduce, but not eliminate
allocations. I'm slightly less concerned about this, I think its a good
tradeoff, but I'm not sure how acceptable these loose constraints are for
embedded situations that are continually mentioned. Any embedded developers
following this?

>
> This will also untie Boost.HTTP from ASIO. Now users could use
> > http::generator/http::view to read/generate HTTP data. This significantly
> > extends the use cases of your library (for example on a previous work we
> > had our own network transfer implementation, but we'd gladly use a http
> > parser and generator).
> >
>
> The library concepts are already untied from Asio network code. But the
> library remains tied to Asio's concepts for asynchronous coding. The
> library is used to interact with external parties, so I use an asynchronous
> API to provide scalability. I need some framework for asynchronous
> foundations and I use Asio. I'm choosing to reuse a Boost component to
> submit a Boost library. But I'm not using Asio just because it's already
> into Boost. Asio is flexible enough to allow me to use any architecture I
> want (single-threaded, one thread per operation, multiple operations per
> thread and one thread per CPU core...) using any of multiple styles
> (completion handlers, futures, coroutines, blocking...) and much more. I
> won't provide this foundation myself into Boost.Http as Asio already does
> it for me.
>
> Such code is also less intrusive: only minimal chages were required to
> > change a tcp example into the http example. Users will appreciate that,
> > especially in cases when ASIO is already used for HTTP.
> >
>
> I do appreciate the "less intrusive" guideline. But I dislike the attempt
> to make http::socket as close to tcp::socket (or any other socket) as
> possible. It'll just remove features and you're ignoring that some users
> want alternative HTTP backends. Your proposal is **less** useful because it
> can be used in **less** applications. You need to elaborate more if you
> want to convince that this approach will make **more** users happy.
>
>
I asked this elsewhere: is there a plan to select different HTTP parsers?
How would that be achieved in terms of the existing socket concepts? Would
a new Parser concept be introduced? Or would the parser be tied to a
specific implementation of the existing socket concepts? i.e. basic_socket
is always the current parser, while basic_spirit_socket is a spirit based
approach ?

> BTW, after that you could easily implement http::stream (that must be close
> > to tcp::stream by functionality). http::stream would be a killer feature
> > that would be widely used by utility programs (as I remember current
> > Boost's regression testing requires it, but uses tcp::stream with a bunch
> > of workarounds).
> >
>
> An http::stream would be an adapter, not decrease the current API quality,
> so I'm less reluctant against this point. Anyway, the more I read your
> review, the more I think you're completely ignoring that network code might
> not even be involved in passing HTTP messages around. This would explain
> why you could prefer cpp-netlib, but it is just less useful.
>
> Please, implement the HTTP2.0 protocol. There are some open source
> > solutions that are licensed under the Boost license and are very close to
> > full implementation of HTTP2.0. Just reuse the code and chat with the
> > authors for better understanding of the problems.
> >
>
> I'll try to provide a prototype tonight, but the intent of the prototype is
> just to prove that this core API is okay.
>
> Also please do not use coroutines at least in first example. coroutines is
> > a new feature in ASIO, many users use old Boost versions that have no
> > coroutines and some users are not familiar with them. So if you start
> with
> > coroutines user will have to go to ASIO docs and dig into coroutines to
> > understand the very first example. Example without coroutines will be
> more
> > user friendly.
> >
>
> This is the kind of argument... by the time Boost.Http reaches Boost, Asio
> users will already be used to coroutines. I don't even depend on the latest
> Boost release if I'm not mistaken.
>
>
What about people that want to test the library before any (potential)
boost inclusion? Or what about people that want a lower overhead (but more
limited) ASIO::coroutine approach - is that a possible combination?

> 7. How much effort did you put into your evaluation? A glance? A quick
> > reading? In-depth study?
> >
> > A few hours digging into docs, about 3 hours looking into the sources,
> > about 3 hours formulising the issues and recommending solutions.
> >
>
> Please read the design choices chapter. You seem to be completely ignoring
> all use cases where a standalone server (e.g. FastCGI, ZeroMQ...) isn't
> used.
>
> I read the design choices section, and I don't understand the relevance.
Could you elaborate?

Lee


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk