Boost logo

Boost :

Subject: Re: [boost] [asio-users] [http] Formal review of Boost.Http
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2015-08-08 10:00:46


On 7 Aug 2015 at 19:29, Vinícius dos Santos Oliveira wrote:

> > I do not believe Http is ready for peer review for these reasons, and
> > I personally would urge you withdraw it until you have fixed these
> > issues and then come back to us. If you do not withdraw it, I will
> > vote for rejection.
>
> Can I persuade you to review the Boost.Http design? I'd reject myself if
> you find any fundamental flaw on the design and even if you're going to
> reject the library because the lack of tooling (not directly related to
> design or documentation, which are the most important items on any
> library), it'd be useful to know what needs to be addressed on the design
> before I submit again (in the case the submission is rejected).

I think my earlier email was poorly phrased now I've slept on it. I
made my objections seem like they were about portability, when they
were really about design and intent of Http.

What I should have said was this: Standalone ASIO is the reference
Networking TS implementation for standardisation. Http is currently a
set of extensions to Boost.ASIO specifically when it *should* be a
set of extensions to the reference Networking TS implementation for
C++ standardisation. This is why your current design and architecture
is flawed, but luckily there isn't too much work to fix this by
finishing the port of Http to the Networking TS.

The Internet of Things tends to involve lots of small somewhat
underpowered internet devices where C++ is likely going to play a
much more important role than historically in embedded systems. The
chances are that any Networking TS in C++ will be very extensively
used in such IoT applications. If there were also a high quality set
of extensions adding HTTP support to the Networking TS, I think your
chances are good that your Http library would be seriously considered
by WG21 for standardisation.

*This* is why I very strongly believe that Http needs to cleave
itself tightly to the Networking TS reference implementation, and not
to Boost.ASIO. Moreover, if you make Http an extension of the
Networking TS, you get Boost support with that for free - Http
becomes dual use.

> About C++03 support, it's not required for Boost acceptance, so I can add
> later. I want to increase tests coverage a lot before porting code to
> C++03, so I don't introduce accidental bugs in the process.

As a strong advocate for C++ 11 over 03 I can understand. However,
IoT developers tend to be stuck on some pretty ancient toolsets for
longer than others. 03 support I think would greatly increase your
potential userbase.

> In the case the abstractions I need enter on the C++ standard, maybe your
> APIbind will help me.

I actually would advise you to choose ASIO's STL abstraction
machinery over my own alternative APIBind in this instance.

> But I just don't find any reference to BOOST_ASIO_HAS_STD_SYSTEM_ERROR
> outside of config.hpp. Even if it was used, it is undocumented, which means
> it is an implementation detail that won't be guaranteed to be there forever
> (accidental behaviour versus a documented guarantee).

I'm pretty sure that implementation detail won't change in a hurry.
STL abstraction machinery changes as quickly as a STL does i.e. every
decade or so.

> > https://svn.boost.org/trac/boost/wiki/BestPracticeHandbook#a7.SAFETY:S
> > tronglyconsideranightlyorweeklyinputfuzzautomatedtestifyourlibraryisab
> > letoacceptuntrustedinput
>
> That's why this first implementation uses a proven-parser that is very well
> tested.

The buck always stops with the top most layer, not internally used
third party libraries.

You NEED to fuzz test untrusted inputs when parsing HTTP. For all you
know, your STL implementation could be the thing with the overflow
bug, or the way in which you use it.

> > * I don't understand why you cannot issue more than one async read at
> > a time nor async write at a time. In something like pipelined HTTP
> > that seems like a very common pattern. I smell a potential design
> > flaw, and while the docs mention a "queue_socket" I can't see such a
> > thing in the reference section.
> >
>
> Too much synchronization and buffering.
>
> The pipeline has the following requests:
>
> A => B => C
>
> If I were to allow replies to request B, the design would be more complex,
> increasing the difficult to implement alternative backends and everyone's
> life. Also, the reply to request B cannot actually be sent while the reply
> to request A is on hold. This means that the whole response to request B
> would need buffering and if response B is a video live stream, this means
> an implicit (the user cannot know) crash after an unfortunate memory
> exhaustion. I couldn't claim a **lightweight** server if this excessive
> buffering was to occur. I couldn't claim the project is appropriate for
> embedded devices if I had chosen the buffering approach. It's a trade-off.
> I should write about it in the "design choice" of the documentation.
>
> HTTP/2.0 does have proper multiplexing and it can be more useful to respond
> several requests without "head of line blocking".

Ok, let's accept that HTTP before 2.0 can't multiplex (not my
experience personally, but I agree it's got big gotchas before 2.0).

You need a user facing API for Http which lets user code multiplex
where that is available, and not multiplex when that is not
available. In other words, identical code written against Http must
automatically be optimal in either connection case.

You're probably going to ask how I might implement that right? I
think my first guess is you need two completion handler layers and
therefore two completion dispatcher reactors: the bottom layer is the
same as ASIO's and uses the ASIO reactor, whilst the top layer which
users of Http sees is a reactor operated by Http and is disconnected,
though operated by, the ASIO reactor.

> A queue socket is a concept, I don't provide one. I can look into how make
> the documentation less confusing by adding a page exclusively dedicated to
> explain the problem it solves. Would that work for you?

It could be this queue socket concept of yours is exactly what I just
proposed as two completion handler and reactor layers. In either
case, I don't think queue sockets should be a concept, I think they
need to be the core of your library's user facing API.

If power users wish to bypass that user facing API layer for less
hand holding and less latency, that should also be possible. But I
think HTTP multiplexing should be assumed to be the default starting
point for Http applications. Once HTTP 2.0 is more common, it will
seem madness that Http's user facing API is _not_ 100% multiplexing.

> * Your tutorial examples use "using namespace std;" at global level.
> > That's a showstopper of bad practice, and you need to purge all of
> > those. Same for "using namespace boost;" at global level.
> >
>
> The tutorials should be as small as possible to not scary the users.
>
> But I can put a comment "you should avoid using namespace". Does it help?

using namespace is fine at local scope. It's not fine at global
scope.

Replacing all global using namespace with local using namespace is
all you need to do.

> * Your reference section is one very long thin list. You might want
> > to use more horizontal space.
> >
>
> I'd be happy already if I knew hot to remove the TOC from the reference
> page. It'd be much more clean already.

Personally in AFIO I hack the CSS to make the list into columns. ASIO
employs a table. There are many solutions.

> * I personally found the design rationale not useful. I specifically
> > want to know:
> >
> > 1. Why did Http choose ASIO as my base over leading competitor X
> > and Y?
> > 2. What sacrifices occurred due to that choice? i.e. things I'd
> > love weren't the case.
> >
>
> I assume I don't need to reply these questions during the review, given the
> reviewers may already be experienced on the answers.

Well ... I'm not sure that's actually the case.

It's not we don't have opinions on the answers, it's whether our
answers are different to your answers. You'll see this during the
AFIO review later this month - I have some really weird opinions no
one else shares like choosing my own custom lightweight monadic
future-promise implementation over async_result. That will be the
source of much criticism I'm sure.

> 3. What alternatives may be better suited for the user examining
> > Http for suitability and why? i.e. make your documentation useful to
> > people with a problem to solve, not just a howto manual.
> >
>
> I don't quite understand this point.

What I'm really asking is for details of when before starting Http
you reviewed the options available and the designs they used, and
exactly why you chose the design choices in Http you did with
reference and comparision to prior art.

Like a literature review, but for programming libraries. Does this
make sense? It's partially for us to help us understand your choices,
but also to help those examining Http to see if it's useful to them
to figure out the best solution to their problem (which may be to use
an alternative to Http).

Documentation which is useful over documentation which is reference.

> * Benchmarks, especially latency ones which are important for many
> > uses of HTTP are missing. I'd particularly like to know confidence
> > intervals on the statistical distribution so I know worst case
> > outcomes.
> >
>
> These values will change when I replace the parser used now. Also, they can
> change based on the buffer sizes and parser options, which the user is/will
> be able to change.
>
> Nevertheless, I do want to do some serious benchmarking, specially because
> it'll give me a reference point when I write my own parser and want to
> compare performance speedup. I can do some poor benchmark this weekend and
> show the results.

That would be interesting, but don't go crazy on it.

Also, please do enable the undefined behaviour sanitiser permanently
including in release mode and with full stack smashing protection
before running benchmarks. Anything parsing malicious input should
have all those turned on where available.

> I hope all the above isn't too disheartening Vinícius.
>
> Not at all. I think you're the second person to dedicate more time
> reviewing my work and this isn't disheartening at all. Just that your
> feedback is usually around tooling and not the design itself (everybody has
> its preferences).

That's because I like the design a lot and have little bad to say
about it.

I've also been subscribed to your github since the beginning, and I
have been reviewing your code on and off as you wrote it. You're a
talented programmer.

> > Jamfile for the main library build,
>
> Consider this done. I won't integrate into Boost without removing CMake
> first. This was already partially stated in the Bjorn's announcement (but
> somehow implicit, to be fair).

I have no problem at all with dual build systems if you're happy
keeping both maintained.

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/



Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk