Subject: Re: [boost] [beast] Formal review
From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2017-07-11 00:50:17


On Mon, Jul 10, 2017 at 5:34 PM, Phil Endecott via Boost
<boost_at_[hidden]> wrote:
> ...if I'm right that it is undefined behaviour then it might
> stop working with some future compiler update somewhere.
>
>> The reinterpret_cast<> can be trivially changed to std::memcpy:
>> ...
> Yes, I believe that's the right thing to do.

That hurts 32-bit ARM. So I am faced with a choice: penalize an
existing platform today to benefit some possible future platform?
Hmmmm......let me think about that....

...probably not such a good idea! And the 32-bit ARM users might
revolt with such a change. I've heard from a couple of users running
Beast on constrained hardware.
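
For anyone following along, here is a minimal sketch of the kind of
std::memcpy load being discussed. It is not Beast's code: the names
all_ascii4 and ascii_prefix are made up for illustration, and the
4-byte ASCII fast path is only an assumed use case. Whether a given
compiler turns the memcpy into a single load on a particular target
(the 32-bit ARM concern above) has to be checked in the generated
code; the sketch only shows the well-defined form.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not part of Beast: true if the 4 bytes at p
// all have the high bit clear (i.e. are ASCII).
inline bool all_ascii4(unsigned char const* p)
{
    std::uint32_t w;
    // memcpy is well-defined for any alignment of p, unlike
    // *reinterpret_cast<std::uint32_t const*>(p).
    std::memcpy(&w, p, sizeof(w));
    // The mask is the same in every byte, so endianness does not matter.
    return (w & 0x80808080u) == 0;
}

// Hypothetical helper: number of leading ASCII bytes in [p, p + n).
inline std::size_t ascii_prefix(unsigned char const* p, std::size_t n)
{
    std::size_t i = 0;
    while (n - i >= 4 && all_ascii4(p + i))   // word-at-a-time fast path
        i += 4;
    while (i < n && p[i] < 0x80)              // byte-at-a-time tail
        ++i;
    return i;
}

// Small self-check for the sketch.
inline void ascii_prefix_self_check()
{
    unsigned char const s[] = {'a', 'b', 'c', 'd', 'e', 0xC3, 0xA9, 'f'};
    assert(ascii_prefix(s, sizeof(s)) == 5);  // stops at the 0xC3 lead byte
}
```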

> Note that this is only about 50 lines of code; Beast's utf8_checker.hpp is
> maybe 5 times as long. Code follows.

Nice, this is great! I like where you are headed with your function,
and thanks for investing the time to write it. Perhaps the Beast utf8
validator could be improved; there's nothing more satisfying than
removing lines of code!

There's just an eensy teensy problem: the Beast validator is an
"online" algorithm. It processes the input sequence one chunk at a
time, in order, so a code point can be split across a buffer
boundary. All of that extra code you see in Beast is designed to
handle that case, by saving the bytes at the end of the buffer for
when it gets called again (after the validator returns it will never
see that buffer again).
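
To make the split-code-point case concrete, here is a rough sketch of
an online validator that saves the trailing bytes of a chunk. It is
not Beast's utf8_checker; every name in it (seq_len, valid_seq,
utf8_stream_checker, valid_at_every_split) is hypothetical, and it
makes no attempt at the fast paths a real implementation would want.
It only illustrates carrying up to three saved bytes from one call to
the next.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch, not Beast's utf8_checker.

// Sequence length implied by a lead byte, or 0 if b cannot start one.
inline std::size_t seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;   // continuation byte, overlong lead (C0, C1), or > F4
}

// Validate one complete sequence whose lead byte gave length len.
inline bool valid_seq(unsigned char const* p, std::size_t len)
{
    if (len == 1) return true;
    unsigned char lo = 0x80, hi = 0xBF;   // allowed range for byte 2
    switch (p[0]) {
    case 0xE0: lo = 0xA0; break;          // reject overlong 3-byte forms
    case 0xED: hi = 0x9F; break;          // reject surrogates
    case 0xF0: lo = 0x90; break;          // reject overlong 4-byte forms
    case 0xF4: hi = 0x8F; break;          // reject values above U+10FFFF
    default: break;
    }
    if (p[1] < lo || p[1] > hi) return false;
    for (std::size_t i = 2; i < len; ++i)
        if ((p[i] & 0xC0) != 0x80) return false;
    return true;
}

class utf8_stream_checker
{
    unsigned char saved_[4] = {};   // bytes of a split code point
    std::size_t have_ = 0;          // how many bytes are saved
    std::size_t need_ = 0;          // full length of the split sequence
    bool ok_ = true;

public:
    // Feed the next chunk; returns false once the input is known invalid.
    bool write(unsigned char const* p, std::size_t n)
    {
        if (!ok_) return false;
        std::size_t i = 0;
        // First finish any code point left over from the previous chunk.
        while (have_ > 0 && i < n) {
            saved_[have_++] = p[i++];
            if (have_ == need_) {
                if (!valid_seq(saved_, need_)) return ok_ = false;
                have_ = 0;
            }
        }
        // Then validate whole sequences that lie entirely in this chunk.
        while (i < n) {
            std::size_t const len = seq_len(p[i]);
            if (len == 0) return ok_ = false;
            if (n - i < len) {              // split at the chunk boundary
                need_ = len;
                while (i < n) saved_[have_++] = p[i++];
                return true;                // verdict deferred to next call
            }
            if (!valid_seq(p + i, len)) return ok_ = false;
            i += len;
        }
        return true;
    }

    // Call after the last chunk: a dangling partial sequence is an error.
    bool finish() const { return ok_ && have_ == 0; }
};

// Feed [p, p + n) as two chunks, trying every possible split position.
inline bool valid_at_every_split(unsigned char const* p, std::size_t n)
{
    for (std::size_t k = 0; k <= n; ++k) {
        utf8_stream_checker c;
        if (!c.write(p, k) || !c.write(p + k, n - k) || !c.finish())
            return false;
    }
    return true;
}

inline void self_check()
{
    unsigned char const euro[] = {0xE2, 0x82, 0xAC};       // U+20AC
    assert(valid_at_every_split(euro, sizeof(euro)));
    unsigned char const surrogate[] = {0xED, 0xA0, 0x80};  // U+D800
    assert(!valid_at_every_split(surrogate, sizeof(surrogate)));
}
```

Note that this sketch defers the verdict on a split sequence until
its last byte arrives, so an invalid prefix is reported one call
later than it strictly could be.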

I admit that there is a surprisingly large amount of code required
just to handle this case. The good news is that those extra lines only
execute in the special case where the code point is split. The bulk of
the loop works on the parts of the buffer where code points can't
possibly be split. And the unit test is exhaustive: it tries all
possible code points and split positions.

But who knows? I have never claimed to be a great coder. I consider
myself average at best, so it's entirely possible that this could all
be done in far fewer lines. Maybe you can update your function to
handle this case? I am always happy to accept improvements into Beast.
You might need to turn the function into a class to save those bytes
at the end.

Thanks

