Boost logo

Boost :

Subject: [boost] My Beast Review
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2017-07-09 18:45:38


Dear All,

This review is not very thorough - I've only looked at the docs and
some small parts of the code, and read the discussion on the list. I've
not tried to use the library. But I think I've seen enough to express an
opinion.

I've never used ASIO. I've studied it in the past, and concluded that
it offers portability to platforms that I don't use but doesn't work
well with other platforms that I do use. It also seems more complicated
than I would have liked, perhaps in order to achieve some optimised
level of performance that I don't need (or perhaps for no good reason).
I have however quite often used HTTP (hasn't everyone?) and I think it's
an embarrassment that std::c++ can't download something from the 'net
with one or two lines of code in the way that almost every other language
can. So, my first reaction is that it's a disappointment to see that a
proposed HTTP library is tied to ASIO.

On further investigation, however, I think this is no great loss to me
as the functionality that this library provides is so limited in scope.
Parsing HTTP headers, for example, is not much work. Even my own
hacked-together-in-an-afternoon HTTP classes do more than this - to
mention just one thing, parsing and formatting of dates in the correct
format. Everyone who uses this library will have to implement this sort
of thing themselves on top of it. (And many of them will implement it
wrongly.)

Vinnie says that he hopes that more functionality will be layered on
top of this work. The danger is that the person who comes along wanting
to implement the other 75% will not like Vinnie's starting point, and
re-implement it all in their own style. (I know I would.)

Unless, of course, Vinnie is going to offer this "other 75%" himself
eventually. *That would be great*. If that is the plan, my review is
"this is a fine start but please remove the ASIO dependency and come
back when it's finished".

But I think Vinnie has said that the library is to be reviewed "as is".
In that case, my review is "just add this to ASIO and call it boost.asio.http
and boost.asio.websocket, no need for a review" (assuming that the ASIO
developer is happy with that).

Regarding what I have seen of the implementation:

- Security must be one of the most important properties of an HTTP
library, and I'd really hope to find that such a library had been
implemented by someone with significant experience in this area.
I never cease to be amazed by the ingenuity of attackers.

- I fear that there may be much premature optimisation in the parser
code that I've looked at. For example:

    static
    bool
    is_digit(char c)
    {
        return static_cast<unsigned char>(c-'0') < 10;
    }

Is that right? Say c='!' (ASCII 33). '0' is ASCII 48. So c-'0' is -15.
But you cast that to an unsigned char, so -15 = 241, and 241 > 10, so the
function returns false. So it is correct - but (1) it needs a comment to
explain it, (2) it's the sort of thing that makes me worry from a security
viewpoint, and (3) is it actually an improvement on the more obvious
c >= '0' && c <= '9'? Well I've done a quick test and I get the same assembler
for both - so why bother?

(Except on close inspection actually I don't get exactly the same assembler.
To get the same assembler I have to replace the constant 10 in your code with
10U, which changes the final comparison from signed to unsigned. I don't think
that matters, but it again makes me wonder whether this security-critical code
is ideal.)

But hey, why even do c >= '0' && c <= '9' when we have std::isdigit()? But
it would be much slower, right? Wrong, it's actually one instruction shorter;
somehow it manages to avoid one of the zero-extension instructions. So this is
not a premature optimisation, it's a premature pesimisation. (My test code is
at the end of this message.)

Now, is_digit is just about the simplest thing in the parser; Vinnie has
also implemented his own code for parsing numbers, and then we come to
parse_field. Please have a look at this (around line 666 of basic_parser.hpp).
Note that it calls find_fast() near the top of that file, which uses some
Intel SIMD instruction to, I think, skip forward over the header field name,
and then over the field value to the CRLF. Not that there are any comments in
find_fast to explain what it does, for those of us who don't know what
_mm_loadu_si128 and _mm_cmpestri do. Has this code actually been
benchmarked against a much simpler and more obviously-correct version, and
found to have a worthwhile performance benefit?

(I believe that some of the discussion has used this "optimised parser" as
a justification for not making the code more generic, i.e. to work with
any iterator pairs. I believe that to be flawed, since (a) I don't think
this optimisation is useful, and (b) even if it were, the code could
use it for any contiguous iterators, and fall back to "unoptimised" code
in other cases. I think that having the "unoptimised" code present would
be useful for documenting what the parser is trying to do, in any case.)

Could it be that the reason why the scope of the library seems so limited
is that Vinnie has spent his time fiddling with this optimisation, rather
than adding useful features?

My opinion: having looked at the implementation, reject.

Test for is_digit:

#include <cctype>

bool is_digit_1(char c)
{
  return static_cast<unsigned char>(c-'0') < 10U;
}

bool is_digit_2(char c)
{
  return c >= '0' && c <= '9';
}

bool is_digit_3(char c)
{
  return std::isdigit(c);
}

Aarch64 assembler output from g++ 6.3.0 -O3, with the uninteresting bits
removed:

_Z10is_digit_1c:
        uxtb w0, w0
        sub w0, w0, #48
        uxtb w0, w0
        cmp w0, 9
        cset w0, ls
        ret

_Z10is_digit_2c:
        uxtb w0, w0
        sub w0, w0, #48
        uxtb w0, w0
        cmp w0, 9
        cset w0, ls
        ret

_Z10is_digit_3c:
        uxtb w0, w0
        sub w0, w0, #48
        cmp w0, 9
        cset w0, ls
        ret

Regards, Phil.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk