Subject: Re: [boost] [beast] Formal review
From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2017-07-10 21:23:27


On Mon, Jul 10, 2017 at 2:06 PM, Artyom Beilis via Boost
<boost_at_[hidden]> wrote:
> It looks for me you are running into premature optimization.

Hmm, no, I don't think so. Correct UTF-8 validation is a bottleneck for
every WebSocket program. I have used a profiler, so this comes from
measurement, not opinion.

It looks like you are using source inputs that contain high-ASCII
characters. In this case, Beast switches to the "slow" algorithm, which
is similar to what Locale does. Try using an input file that consists
only of low-ASCII characters.
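
To illustrate the fast/slow split described above, here is a minimal sketch of a UTF-8 checker that skips runs of low-ASCII bytes eight at a time and falls back to byte-by-byte validation only when it meets a byte with the high bit set. This is not Beast's actual implementation; the names `all_low_ascii8`, `validate_sequence` and `is_valid_utf8` are hypothetical and chosen only for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: true if the 8 bytes at p are all below 0x80.
inline bool all_low_ascii8(unsigned char const* p)
{
    std::uint64_t word;
    std::memcpy(&word, p, sizeof(word)); // unaligned-safe load
    return (word & 0x8080808080808080ULL) == 0;
}

// Slow path: validate one multi-byte sequence starting at p (*p >= 0x80).
// Returns the number of bytes consumed, or 0 if the sequence is invalid.
inline std::size_t validate_sequence(unsigned char const* p,
                                     unsigned char const* end)
{
    unsigned char const b0 = p[0];
    std::size_t len;
    std::uint32_t cp;
    if (b0 >= 0xC2 && b0 <= 0xDF)      { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0)      { len = 3; cp = b0 & 0x0F; }
    else if (b0 >= 0xF0 && b0 <= 0xF4) { len = 4; cp = b0 & 0x07; }
    else return 0; // stray continuation byte, overlong C0/C1, or above F4

    if (static_cast<std::size_t>(end - p) < len)
        return 0; // truncated sequence

    for (std::size_t i = 1; i < len; ++i)
    {
        if ((p[i] & 0xC0) != 0x80)
            return 0; // not a continuation byte
        cp = (cp << 6) | (p[i] & 0x3Fu);
    }
    // Reject overlong 3-byte forms and UTF-16 surrogates.
    if (len == 3 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF)))
        return 0;
    // Reject overlong 4-byte forms and code points above U+10FFFF.
    if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF))
        return 0;
    return len;
}

inline bool is_valid_utf8(void const* data, std::size_t size)
{
    auto p = static_cast<unsigned char const*>(data);
    auto const end = p + size;
    while (p < end)
    {
        // Fast path: consume 8 low-ASCII bytes per iteration.
        while (static_cast<std::size_t>(end - p) >= 8 && all_low_ascii8(p))
            p += 8;
        if (p == end)
            break;
        if (*p < 0x80) { ++p; continue; } // single ASCII byte
        std::size_t const n = validate_sequence(p, end);
        if (n == 0)
            return false;
        p += n;
    }
    return true;
}

inline bool is_valid_utf8(std::string const& s)
{
    return is_valid_utf8(s.data(), s.size());
}
```

With an all-low-ASCII input the inner loop never leaves the fast path, which is why such inputs are the best case for this kind of checker. An input with frequent high-bit bytes spends most of its time in `validate_sequence`.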

Your results are quite different from mine, even with std::memcpy:

beast: 1,124,515,969 char/s
beast: 1,336,074,093 char/s
beast: 1,494,183,562 char/s
beast: 1,506,365,044 char/s
beast: 1,533,419,187 char/s
locale: 75,457,683 char/s
locale: 81,358,140 char/s
locale: 80,413,657 char/s
locale: 81,635,114 char/s
locale: 67,234,619 char/s

Ubuntu VM:
beast.benchmarks.utf8_checker
beast: 2894806032 char/s
beast: 2874126708 char/s
beast: 2890616214 char/s
beast: 2017890885 char/s
beast: 2785087614 char/s
locale: 574731777 char/s
locale: 571439694 char/s
locale: 242245477 char/s
locale: 511534158 char/s
locale: 574121386 char/s

Travis
<https://travis-ci.org/vinniefalco/Beast/jobs/252155928#L1334>
beast: 1155900653 char/s
beast: 1146058480 char/s
beast: 1162309551 char/s
beast: 1151093660 char/s
beast: 1159334387 char/s
locale: 218684840 char/s
locale: 220357048 char/s
locale: 208476005 char/s
locale: 224853783 char/s
locale: 209990002 char/s

On every machine I try, Locale is slower than Beast on all-low-ASCII
inputs by at least a factor of 5.

Code:
<https://github.com/vinniefalco/Beast/blob/da7946b6e5f8bda225ff122984e945b9e088a196/test/benchmarks/utf8_checker.cpp#L78>
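
For readers who want a rough idea of how a char/s figure like the ones above can be produced, the following is a minimal sketch of a throughput measurement. It is not the linked benchmark; `chars_per_second` is a hypothetical helper, and the checker passed to it is assumed to be any callable that takes a `std::string const&` and returns `bool`.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical harness: runs `check` over `input` `repeat` times and
// returns the number of characters processed per second.
// Returns 0.0 if the checker never accepted the input or if no time
// could be measured.
template<class Checker>
double chars_per_second(Checker check, std::string const& input,
                        std::size_t repeat)
{
    using clock = std::chrono::steady_clock;
    std::size_t accepted = 0;
    auto const start = clock::now();
    for (std::size_t i = 0; i < repeat; ++i)
        if (check(input))
            ++accepted; // use the result so the call is not discarded
    auto const elapsed =
        std::chrono::duration<double>(clock::now() - start).count();
    if (elapsed <= 0.0 || accepted == 0)
        return 0.0;
    return static_cast<double>(input.size()) * repeat / elapsed;
}
```

A single run is noisy, as the spread in the numbers above shows, so it is worth repeating the measurement several times per checker and comparing the runs rather than relying on one figure.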

