

From: Alexander Grund (alexander.grund_at_[hidden])
Date: 2020-06-18 11:38:01


On 18.06.20 at 13:10, Phil Endecott via Boost wrote:
> Alexander Grund wrote:
>> I've seen other SIMD UTF-8 conversions around, and they basically all
>> focus on converting as much ASCII as possible and fall back to
>> one-by-one decoding once a non-ASCII byte is found
>
> The question is, do they do that because they've determined that
> that gives the best performance (for some benchmark input), or
> have they not tried to do more with the SIMD code?
I guess the former, which matches my intuition. It is easy to detect the
first byte of a multi-byte UTF-8 sequence with SIMD, and also easy to
bulk-convert single-byte UTF-8 sequences. Once you get to converting a
multi-byte sequence, SIMD no longer makes sense. There is too much
checking to do: how many bytes to "squash", end of input, shortest
form, legal value, ...
So in summary: once it requires branching, it doesn't make sense to use
SIMD anymore.
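
To make that concrete, here is a rough sketch of the pattern I mean. It
is not taken from any particular library; the names utf8_to_utf32 and
decode_one are made up for illustration, and replacing malformed input
with U+FFFD one byte at a time is just one assumed error policy. Only
the "is this whole chunk ASCII?" test uses SSE2 (with a plain 64-bit
word test as a fallback where SSE2 is not available); the widening loop
is left to the compiler, and everything with a branch in it lives in
the scalar decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Scalar path: decode one code point starting at p (hypothetical helper).
// Returns the number of bytes consumed. Malformed input yields U+FFFD and
// consumes a single byte (an assumed policy, others are possible).
inline std::size_t decode_one(const unsigned char* p, const unsigned char* end,
                              char32_t& out) {
    assert(p < end);
    const unsigned char b0 = *p;
    std::size_t len;
    char32_t cp, min;
    // How many bytes to "squash", and the smallest value that needs that many.
    if (b0 < 0x80) { out = b0; return 1; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else { out = 0xFFFD; return 1; }  // stray continuation or invalid lead byte
    // End-of-input check.
    if (static_cast<std::size_t>(end - p) < len) { out = 0xFFFD; return 1; }
    for (std::size_t i = 1; i < len; ++i) {
        if ((p[i] & 0xC0) != 0x80) { out = 0xFFFD; return 1; }
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    // Shortest-form and legal-value checks (no overlongs, no surrogates).
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        out = 0xFFFD;
        return 1;
    }
    out = cp;
    return len;
}

inline std::u32string utf8_to_utf32(std::string_view in) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(in.data());
    const unsigned char* end = p + in.size();
    std::u32string out;
    out.reserve(in.size());
    while (p < end) {
#if defined(__SSE2__)
        if (end - p >= 16) {
            const __m128i chunk =
                _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
            // movemask collects the top bit of each byte: 0 means all ASCII.
            if (_mm_movemask_epi8(chunk) == 0) {
                for (int i = 0; i < 16; ++i) out.push_back(p[i]);
                p += 16;
                continue;
            }
        }
#else
        if (end - p >= 8) {
            std::uint64_t w;
            std::memcpy(&w, p, 8);
            if ((w & 0x8080808080808080ULL) == 0) {
                for (int i = 0; i < 8; ++i) out.push_back(p[i]);
                p += 8;
                continue;
            }
        }
#endif
        // Chunk contains a non-ASCII byte (or is a short tail): go one by one.
        char32_t cp;
        p += decode_one(p, end, cp);
        out.push_back(cp);
    }
    return out;
}
```

Note that this sketch retries the chunk test after every scalar code
point, so text with many non-ASCII characters gets little benefit from
the fast path. That is exactly the trade-off being discussed.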



