Subject: Re: [boost] [asio-users] [http] Formal review of Boost.Http
From: Vinícius dos Santos Oliveira (vini.ipsmaker_at_[hidden])
Date: 2015-08-13 07:56:19


First of all, sorry to all members of the list for my unavailability. I
was planning to write an experimental HTTP 2.0 backend, so that I could
give more confidence that the Boost.Http core I present for review is the
right abstraction.

Anyway, it looks like I took the wrong approach. I should have answered the
easiest questions first and implemented the HTTP 2.0 backend (experimental,
using an existing library and not Boost-quality) later.

2015-08-11 14:35 GMT-03:00 Niall Douglas <s_sourceforge_at_[hidden]>:

> Where I'm really at is I think if Http is accepted you're going to
> either have to ditch it and reimplement atop the Networking TS as
> Chris folds the substantial changes WG21 will force onto ASIO into
> Boost.ASIO, or end up refactoring Http to cleave more closely to the
> Networking TS anyway.
>

I'm okay with that.

> > And then you aren't following C++ rule number #1 anymore: You only pay
> > for what you use. That's why Asio itself doesn't solve this problem for
> > you. You can use boost::http::basic_socket<queue_socket> if you need to
> > work around Asio composed operations at this level. All customization
> > points are there for anyone.
>
> I am finding myself unconvinced by your arguments here. What stands
> in the way of a two layer API design? Bottom layer is racy but lowest
> latency. Top layer is not racy, but adds some latency.
>
> I think for the majority of HTTP users they just want it to work
> without surprises to a high default performance level. If you look at
> the history of the HTTP library support in Python you'll see what I
> mean - firstly, it's surprisingly easy to get a HTTP library API
> design wrong, even in a v2 refactor. And secondly that people need
> both a stupid-simple API and a more bare metal API *simultaneously*
> with HTTP, and therein lies the design gotcha.
>

If you want a high-level API, you're going to use coroutines. There is
nothing in the wild as readable as coroutines. Coroutines are **the**
solution to spaghetti code in asynchronous abstractions. Lambdas and
futures will never be as readable as coroutines. Anyway...

If you want a high-level API, you're going to use coroutines, and the use of
coroutines will already suspend your code until the completion of the
previous operation. You end up not scheduling too many operations at once
(fewer resources consumed) and you are using the API the right way.

If you do not want to use coroutines and still want a somewhat high-level
API, just change the underlying socket. It's not a problem. The only problem
I see here is the lack of a page documenting composed operations, given
the confusion that arose on this matter.
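
To illustrate what "changing the underlying socket" can mean, here is a
minimal sketch in standard C++ only. It is not Boost.Asio or Boost.Http
code: raw_channel, queued_channel and complete_one are hypothetical names
made up for this example. The low-level channel allows only one write in
flight; the wrapper queues extra writes and starts the next one when the
previous one completes, so only users who pick the wrapper pay for the
queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical low-level channel: at most one write may be in flight.
// It stands in for a raw socket; it is not a Boost.Asio type.
class raw_channel {
public:
    using handler = std::function<void()>;

    void async_write(std::string data, handler on_done) {
        assert(!pending_ && "only one outstanding write is allowed");
        sent_.push_back(std::move(data));
        pending_ = std::move(on_done);
    }

    // Simulates the event loop completing the in-flight write.
    // Returns false when nothing was in flight.
    bool complete_one() {
        if (!pending_) return false;
        handler h = std::move(pending_);
        pending_ = nullptr;  // allow the handler to start a new write
        h();
        return true;
    }

    const std::vector<std::string>& sent() const { return sent_; }

private:
    std::vector<std::string> sent_;
    handler pending_;
};

// Hypothetical wrapper: same async_write interface, but callers may issue
// several writes at once. Extra writes wait in a queue.
template <class Channel>
class queued_channel {
public:
    using handler = std::function<void()>;

    explicit queued_channel(Channel& next) : next_(next) {}

    void async_write(std::string data, handler on_done) {
        queue_.push_back({std::move(data), std::move(on_done)});
        if (!busy_) start_next();
    }

    std::size_t queued() const { return queue_.size(); }

private:
    struct item {
        std::string data;
        handler on_done;
    };

    void start_next() {
        busy_ = true;
        item current = std::move(queue_.front());
        queue_.pop_front();
        handler user = std::move(current.on_done);
        next_.async_write(std::move(current.data), [this, user] {
            busy_ = false;
            if (!queue_.empty()) start_next();  // keep the channel busy
            if (user) user();                   // then notify the caller
        });
    }

    Channel& next_;
    std::deque<item> queue_;
    bool busy_ = false;
};
```

Issuing three writes back to back on a queued_channel<raw_channel> puts one
write on the raw channel and leaves two in the queue; each simulated
completion then starts the next write before the caller's handler runs.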

Really, you should NOT pay for what you do NOT use. Eventually we'll have
coroutines in the language, so you will not even be able to write something
more efficient and will end up just using coroutines. And with the design I
propose, you won't even pay for scheduling/storing multiple operations that
just cannot be used right now. If the "pay for what I don't use" design was
what Boost wanted, I believe Boost.Asio would do things differently (fair
enough, Boost.Asio is a low-level library).

Now, about a real high-level API. Boost.Http is somewhat low-level, but not
for the reasons given. I can provide a higher-level API and it won't change
the points you're against. If you look at the Boost.Http roadmap, you'll
notice where Boost.Http is really low-level (lack of a request router, form
parsing, HTTP session management...). This kind of stuff can get hugely
controversial and I think it would be very unwise to integrate it all at
once. It would be like comparing frameworks that are completely different
(like Python's Django or Flask) and asking "hey, what is the **right**
choice?". This kind of question is really unhelpful. These frameworks
continue to evolve, and sometimes they break API or completely new
approaches arise (like the recent rise in popularity of web
microframeworks). I'd rather provide really generic and flexible building
blocks than state that my view of web development is correct. Not too long
ago, LAMP was a very popular solution, and it assumed you would use a MySQL
database, but sometimes you do not even need a database.

You should check the answer to "Why isn't a router available?" in the
Boost.Http FAQ: https://boostgsoc14.github.io/boost.http/design_choices.html

It's very controversial and I need to develop a NEW approach, not import the
design from some place or another. I need to reconcile current approaches
(I'd like to use the word paradigm here). Not only reconcile them, but I
need to allow some kind of collaboration. And it's C++, so it's harder. It's
not harder because of the language. It's harder because the C++ community
takes software development very seriously. And then, it's not just C++, it's
Boost, the small group within C++ that is known among the community as the
group that strives to deliver even higher quality software.

If we stick to what we need for now, I believe the correct question to
focus on is "If I need to communicate HTTP messages, what is the correct
approach?". That's what we need now: to pass HTTP messages around. And then,
the HTTP protocol may not even be involved, which is why I focused so much
on allowing alternative HTTP backends. Extremely detailed and careful
requirements like the ones written for Message[1] and ServerSocket[2] are
not something you'll see anyone doing. And it takes a lot of care because
you need to be careful about who you're excluding. I chose to not exclude
HTTP 1.0 and I've made HTTP upgrade and HTTP chunking optional features that
the user must check. I also chose to not exclude alternative backends. I
also chose to allow **lightweight** implementations, so embedded devices
would not be left out. cpp-netlib and pion have their own thread pools and
aren't very friendly to embedded devices. I also went fully async, allowing
really fine-grained control by the user. Even while trying to be so
ambitious, I'd not say that the design went so low-level that it becomes
unusable, as the spawn example proves[3] (163 lines, and a good part of
that is Boost.Asio boilerplate, not Boost.Http).
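
As a rough illustration of "optional features that the user must check",
here is a minimal sketch in standard C++ only. The names (fake_connection,
can_stream, write_chunk, write_whole, send_parts) are hypothetical and are
not the Boost.Http API. The point is only the shape of the code: the caller
asks the connection whether chunked streaming is available (an HTTP 1.0
peer cannot receive chunks) and falls back to sending the whole body at
once otherwise.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical connection type used only for this sketch.
struct fake_connection {
    bool supports_chunking;         // false models an HTTP 1.0 peer
    std::vector<std::string> wire;  // one entry per write that was issued

    bool can_stream() const { return supports_chunking; }

    void write_whole(const std::string& body) {
        wire.push_back("whole:" + body);
    }

    void write_chunk(const std::string& part) {
        // Chunks are only valid when the peer supports them.
        assert(supports_chunking);
        wire.push_back("chunk:" + part);
    }

    void write_end() { wire.push_back("end"); }
};

// Sends the parts as chunks when the connection allows it, and as a single
// body otherwise. The decision is explicit, never implicit.
template <class Connection>
void send_parts(Connection& conn, const std::vector<std::string>& parts) {
    if (conn.can_stream()) {
        for (const auto& p : parts) conn.write_chunk(p);
        conn.write_end();
        return;
    }
    std::string body;
    for (const auto& p : parts) body += p;
    conn.write_whole(body);
}
```

Because the check is explicit, the same handler behaves predictably against
both kinds of peers instead of depending on a decision made silently by the
library.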

I have some experience with HTTP libraries (one of the most important being
the Tufão project[4]). This experience is what makes me capable of judging
which design decisions can be a mistake. Not considering alternative HTTP
backends from the beginning is one of them. Other mistakes are less
obvious. The Node.js API is really tricky to get right (it decides
implicitly whether to use chunking or not) if you're concerned with
portability among applications (there is an old and fixed Tufão bug related
to just this[5]). The rush to get high-level APIs can also become a problem
because you can very easily lose the possibility of making fine-grained
adjustments. First you're fine with single-threaded, and then you want the
handling to be split among threads. Then you want to distribute among
threads not whole connections but each request-response pair, with a
scheduler that is cleverer than round-robin[6]. Eventually you'll end up
losing the interoperability and having to rewrite large parts of the
application.

I'm very concerned about interoperability, that's why I mentioned it in the
very initial GSoC proposal last year, along the following lines:

> "In fact, there is a lot of higher-level abstractions competing with each
> other, providing mostly incompatible abstractions. By not targeting this
> field, this library can actually become a new base for all these
> higher-level abstractions and allow greater interoperability among them."
> -- https://github.com/vinipsmaker/gsoc2014-boost#non-goals
>

The more I think about it, the more I believe that such a "middle-level"
API is underestimated.

I mentioned at random places how much I appreciate the message-oriented
approach I'm using. Now I think I should have dedicated a whole chapter to
the topic. If it weren't for this message-oriented approach, the design
would be much more complex. It'd be like trying to solve problems of a
high-level API (set_timeout, set_scheduler, set_handler_factory,
set_allocator, set_pool, set_feature_xyz) at this level already and
thinking really hard to not miss **ANY** feature that the user could
possibly want to customize.

The message-oriented approach has a single-first-immediate impact:
Communication channels and message representations are decoupled.

With this simple separation, you do not need to complicate communication
channels so much. And you also gain the ability to use your own allocators,
pools, data structures, non-allocating buffers and so on. I've read some
messages from people concerned that the API is too low-level and yet will
still allocate sometimes. All the examples I've written allocate, and the
reason is that none of them puts a bound on the number of HTTP connections
(there isn't a single big chunk of stack-allocated pool/buffer/...). A new
"handler" is created as each new connection appears. But if you set a
limit, implement your own data structures and read the documentation (it
doesn't even need to be an extremely careful read), you're done. There are
some gritty decisions on implementation details that made me allocate at
some spots (use of functors, lack of dynarray...), but then I should
discuss the implementation and not mix it with API design (unless you
propose an interface to specify allocators).
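
To make the decoupling concrete, here is a minimal sketch in standard C++
only. None of the names below (fixed_body, basic_message as defined here,
loopback_channel, read_into) are taken from Boost.Http; they are
assumptions made for illustration. The channel only requires that the
message has a headers member with emplace() and a body member with
push_back(), so the same channel code fills a heap-backed message and a
message whose body lives in a fixed-size buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical fixed-capacity body: no heap allocation for the payload.
template <std::size_t N>
class fixed_body {
public:
    void push_back(char c) {
        assert(size_ < N && "fixed_body capacity exceeded");
        data_[size_++] = c;
    }
    std::size_t size() const { return size_; }
    char operator[](std::size_t i) const { return data_[i]; }

private:
    std::array<char, N> data_{};
    std::size_t size_ = 0;
};

// The message representation is chosen by the user, not by the channel.
template <class Headers, class Body>
struct basic_message {
    Headers headers;
    Body body;
};

using header_map = std::multimap<std::string, std::string>;
using heap_message = basic_message<header_map, std::vector<char>>;
// Headers still use std::multimap here only to keep the sketch short.
using small_message = basic_message<header_map, fixed_body<64>>;

// Hypothetical channel: it knows how to produce the parts of a message,
// but nothing about how the message stores them.
struct loopback_channel {
    std::string pending_body;

    template <class Message>
    void read_into(Message& msg) {
        msg.headers.emplace("content-length",
                            std::to_string(pending_body.size()));
        for (char c : pending_body) msg.body.push_back(c);
    }
};
```

Swapping heap_message for small_message changes where the body bytes live
without touching loopback_channel, which is the kind of freedom the
separation is meant to give.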

It's not really that Boost.Http is too low-level. It's more that only the
least controversial building blocks are available now. You'll see the same
main players in a higher-level abstraction, and the main difference is most
likely that you won't manage the main players yourself.

And I advocate for a Boost.Http instead of a Boost.NetExtensions because I
allow alternative backends. Asio will always be involved because of its
purely asynchronous nature, but you might not even use the network: an
alternative backend could communicate using shared memory or other means.

There are many more thoughts, but I think I'm diverging too much from
Niall's concern, so I'll stop here and answer the rest of the questions. If
you guys have any questions about specific points, just raise them.

[1] https://boostgsoc14.github.io/boost.http/reference/message_concept.html
[2] https://boostgsoc14.github.io/boost.http/reference/server_socket_concept.html
[3] https://github.com/BoostGSoC14/boost.http/blob/0fc8dd7a594bb5ebb676d2d55621aedf75521556/example/spawn.cpp
[4] https://github.com/vinipsmaker/tufao
[5] https://github.com/vinipsmaker/tufao/issues/41
[6] https://en.wikipedia.org/wiki/Round-robin_scheduling

-- 
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker
