Boost logo

Boost :

Subject: Re: [boost] [http] Formal review of Boost.Http
From: Vinícius dos Santos Oliveira (vini.ipsmaker_at_[hidden])
Date: 2015-08-07 18:28:14


2015-08-07 12:53 GMT-03:00 Niall Douglas <s_sourceforge_at_[hidden]>:

> I do not believe Http is ready for peer review for these reasons, and
> I personally would urge you withdraw it until you have fixed these
> issues and then come back to us. If you do not withdraw it, I will
> vote for rejection.
>

Can I persuade you to review the Boost.Http design? I'd reject myself if
you find any fundamental flaw on the design and even if you're going to
reject the library because the lack of tooling (not directly related to
design or documentation, which are the most important items on any
library), it'd be useful to know what needs to be addressed on the design
before I submit again (in the case the submission is rejected).

1. You require a C++ 11 compiler and are very much an extension of
> ASIO throughout, and then don't take advantage of ASIO's extensive
> internal C++ 11 vs Boost switching infrastructure.

It's not my intention to make Boost.Http work with C++. I use Boost as
dependency and I'm not willing to change that (unless the pieces I use from
Boost enter on the C++ standard).

About C++03 support, it's not required for Boost acceptance, so I can add
later. I want to increase tests coverage a lot before porting code to
C++03, so I don't introduce accidental bugs in the process.

In the case the abstractions I need enter on the C++ standard, maybe your
APIbind will help me.

On the second page
> of your documentation you state:
>
> "std::error_code, for instance, cannot be transparently replaced by
> boost::system::error_code within ASIO-using code"
>
> This is utterly untrue, and I told you that several months ago. ASIO
> itself maps some error_code implementation into namespace asio - go
> look at its config.hpp. Therefore you simply use whatever error_code
> ASIO itself uses, using the same internal STL abstraction
> infrastructure ASIO uses. The same goes for every other instance
> where you are using Boost dependencies.
>

I tried to find such macros on Asio and I couldn't find any for error_code:
http://www.boost.org/doc/libs/1_58_0/doc/html/boost_asio/overview/cpp2011.html

Now that you mentioned config.hpp, I used my grep/find skills and I found:
BOOST_ASIO_HAS_STD_SYSTEM_ERROR

But I just don't find any reference to BOOST_ASIO_HAS_STD_SYSTEM_ERROR
outside of config.hpp. Even if it was used, it is undocumented, which means
it is an implementation detail that won't be guaranteed to be there forever
(accidental behaviour versus a documented guarantee).

I said several months ago that you should properly finish the ASIO
> integration by ripping out all your current hard coded usage of Boost
> and using ASIO's own STL abstraction layer, and thereby making Http
> identical in supported platforms and configurations as ASIO (i.e.
> standalone C++ 11 or with-Boost) reusing the same ASIO configuration
> macros. I then suggested it would be a very short step to port Http
> to C++ 03 as you are not using anything which particularly demands
> C++ 11 outside the examples and tests and which ASIO hasn't
> abstracted for you.
>
> Http should be a model EXTENSION of ASIO in the cleanest possible
> sense in my opinion. You're not far from achieving it, and I really
> think you need to finish it.
>

Delay acceptance of the library will only delay its usage. Design of core
abstractions is finished and I fail to see why its acceptance should be
delayed while I improve implementation details.

Standalone Boost.Http isn't as exciting as a WebSocket implementation.
There is a roadmap of the features:
https://boostgsoc14.github.io/boost.http/design_choices/roadmap.html

C++03 is on the roadmap, but standalone Boost.Http is not.

2. You claim Boost.Build doesn't work outside the Boost tree. This is
> untrue - Boost.Build doesn't need Boost. I don't mind cmake for the
> unit testing alone as a temporary stopgap until you bite the
> Boost.Build bullet before it enters Boost, but I draw the line at
> requiring cmake for normal builds. Before presenting for review
> normal library builds cannot use cmake OR should be header only like
> Hana.
>

CMake will be replaced by Boost.Build before any integration.

3. ... which leads me to asking why is Http not a header only library
> like ASIO? That would eliminate your Boost.Build requirement in the
> strictest sense. Standalone ASIO doesn't need Boost.Build. Neither
> should Http.
>
> You mention in the FAQ the reason why not header only is mainly due
> to the use of a third party HTTP C parser. It could be insufficient
> familiarity on my part with the vaguaries of parsing HTTP, but it has
> never struck me as difficult to parse - you don't even need to
> tokenise, and indeed my first instinct would be it is better to NOT
> tokenise HTTP as that is both slower and opens a wider attack surface
> to a hand written "stupid" parser.
>
> (BTW static custom error code categories work just fine header only
> if correctly set up. Indeed Boost.System can be configured to be
> header only)
>

This parser is well tested and largely used. I want to increase test
coverage a lot before replacing the parser with one of my own. HTTP spec
used to be very confusing before RFC7230. Just now it's easy to implement
all corner-cases. Previously, you needed a lot of experience.

4. Http is the first line of parsing code in a world full of
> malicious exploitation - just this week all Android devices were made
> powned with a single malformed MMS message causing a buffer overflow.
> Anything parsing untrusted input needs to be regularly input fuzz
> tested period, and very especially Http. Until Http is running at
> least one of the below fuzz tools (american fuzzy lop or LLVM
> libfuzzer) at least weekly on a CI it is not safe to allow into
> production code.
>
> https://svn.boost.org/trac/boost/wiki/BestPracticeHandbook#a7.SAFETY:S
> tronglyconsideranightlyorweeklyinputfuzzautomatedtestifyourlibraryisab
> letoacceptuntrustedinput
>

That's why this first implementation uses a proven-parser that is very well
tested. All tests from the Boost.Http socket will run with varying buffer
sizes (from 1 to 2047). Of course I do enjoy tests and I'm constantly
implementing more (the last experiment was using sanitizers).

Until you address a minimum of items 1 and 4, I am very sorry but I
> must vote for rejection.
>
> Other notes:
>
> * I don't understand why you cannot issue more than one async read at
> a time nor async write at a time. In something like pipelined HTTP
> that seems like a very common pattern. I smell a potential design
> flaw, and while the docs mention a "queue_socket" I can't see such a
> thing in the reference section.
>

Too much synchronization and buffering.

The pipeline has the following requests:

A => B => C

If I were to allow replies to request B, the design would be more complex,
increasing the difficult to implement alternative backends and everyone's
life. Also, the reply to request B cannot actually be sent while the reply
to request A is on hold. This means that the whole response to request B
would need buffering and if response B is a video live stream, this means
an implicit (the user cannot know) crash after an unfortunate memory
exhaustion. I couldn't claim a **lightweight** server if this excessive
buffering was to occur. I couldn't claim the project is appropriate for
embedded devices if I had chosen the buffering approach. It's a trade-off.
I should write about it in the "design choice" of the documentation.

HTTP/2.0 does have proper multiplexing and it can be more useful to respond
several requests without "head of line blocking".

A queue socket is a concept, I don't provide one. I can look into how make
the documentation less confusing by adding a page exclusively dedicated to
explain the problem it solves. Would that work for you?

* Your tutorial examples use "using namespace std;" at global level.
> That's a showstopper of bad practice, and you need to purge all of
> those. Same for "using namespace boost;" at global level.
>

The tutorials should be as small as possible to not scary the users.

But I can put a comment "you should avoid using namespace". Does it help?

* Your reference section is one very long thin list. You might want
> to use more horizontal space.
>

I'd be happy already if I knew hot to remove the TOC from the reference
page. It'd be much more clean already.

* I personally found the design rationale not useful. I specifically
> want to know:
>
> 1. Why did Http choose ASIO as my base over leading competitor X
> and Y?
> 2. What sacrifices occurred due to that choice? i.e. things I'd
> love weren't the case.
>

I assume I don't need to reply these questions during the review, given the
reviewers may already be experienced on the answers. However, these are
indeed good questions that I could add to the "design choices" page. Thank
you.

  3. What alternatives may be better suited for the user examining
> Http for suitability and why? i.e. make your documentation useful to
> people with a problem to solve, not just a howto manual.
>

I don't quite understand this point.

* Benchmarks, especially latency ones which are important for many
> uses of HTTP are missing. I'd particularly like to know confidence
> intervals on the statistical distribution so I know worst case
> outcomes.
>

These values will change when I replace the parser used now. Also, they can
change based on the buffer sizes and parser options, which the user is/will
be able to change.

Nevertheless, I do want to do some serious benchmarking, specially because
it'll give me a reference point when I write my own parser and want to
compare performance speedup. I can do some poor benchmark this weekend and
show the results.

* Personally speaking, I'd like if in your tutorial you took ten
> self-contained steps in building a simple HTTP fileserver, each with
> a finished working program at the end of each section. Like the ASIO
> tutorial does. In particular, I'd like to see layers which add on
> etags and other of the fancy facilities in stages.
>

That's a good idea, thank you.

I won't replace the current tutorial, though, as it teaches the main
players on HTTP. I can, however, add a second tutorial showing how to
implement a **simple** file server.

* I would personally worry about investing on developing code based
> on Http until you have HTTP 2.0 support in there, only because I
> would suspect you might need to break the API a bit to get the
> all-binary format of HTTP 2.0 working as it's such a change from HTTP
> 1.0. It may merely be paranoia, but I suspect I wouldn't be alone on
> this silent assumption.
>

You're right about being paranoid here. I was too. But I've spent some time
researching HTTP/2.0 and I got confident about the current design. But you
could vote for conditional acceptance based on HTTP/2.0 and I'd think it is
fair.

* You're still missing CI testing with valgrind, thread sanitiser,
> coveralls.io coverage testing, etc etc all the stuff from
> https://svn.boost.org/trac/boost/wiki/BestPracticeHandbook.
>

I already started the experiments with the sanitizers, but I got a problem
with Boost Test and the experiment was put on hold. I'll come back to it
later.

I hope all the above isn't too disheartening Vinícius.

Not at all. I think you're the second person to dedicate more time
reviewing my work and this isn't disheartening at all. Just that your
feedback is usually around tooling and not the design itself (everybody has
its preferences).

If you took
> six weeks extra to finish the port of Http to ASIO proper,

I disagree about this point. Boost.Http is properly using Asio. Just
because it won't support standalone Asio, it doesn't mean it's improperly
using Asio.

added a
> Jamfile for the main library build,

Consider this done. I won't integrate into Boost without removing CMake
first. This was already partially stated in the Bjorn's announcement (but
somehow implicit, to be fair).

-- 
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk