
Subject: Re: [boost] [review] Review of Nowide (Unicode) starts today
From: Peter Dimov (lists_at_[hidden])
Date: 2017-06-13 23:33:48


Artyom Beilis wrote:

> Now as you have seen there are many possible "non-standard" UTF-8
> variants.
>
> What should I accept?

Others have made their case for WTF-8, which has the desirable property of
only allowing one encoding for any uint16_t sequence (taken in isolation).
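To make the WTF-8 property concrete, here is a minimal sketch (not from Nowide; the name `to_wtf8` is hypothetical) that encodes an arbitrary, possibly ill-formed, sequence of 16-bit code units. A properly paired lead/trail surrogate always becomes a single 4-byte sequence, and only unpaired surrogates get the 3-byte form. That rule is what leaves each uint16_t sequence with exactly one encoding.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encode 16-bit code units (possibly ill-formed UTF-16) as WTF-8.
std::string to_wtf8(const std::vector<std::uint16_t>& units)
{
    std::string out;
    for (std::size_t i = 0; i < units.size(); ++i) {
        std::uint32_t cp = units[i];
        // A lead surrogate directly followed by a trail surrogate is one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < units.size()
            && units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // Also covers unpaired surrogates D800..DFFF (the WTF-8 extension).
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, the pair D83D DE00 comes out as F0 9F 98 80 (U+1F600), never as the CESU-8 style ED A0 BD ED B8 80, while a lone D800 comes out as ED A0 80.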

My personal choice here is to be more lenient and accept any combination of
valid UTF-8 and UTF-8 encoded surrogates (a superset of CESU-8 and WTF-8),
but I'm not going to argue very strongly for it over WTF-8, if at all.
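A rough sketch of the more lenient policy, assuming the decoder's target is a sequence of 16-bit code units as on Windows (the name `decode_lenient` is hypothetical, not Nowide's API). It accepts valid UTF-8 plus 3-byte encoded surrogates in any arrangement, whether paired as in CESU-8 or lone as in WTF-8. It still rejects overlongs, values above U+10FFFF, truncated sequences and stray continuation bytes.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical lenient decoder: UTF-8 + encoded surrogates -> 16-bit code units.
std::optional<std::vector<std::uint16_t>> decode_lenient(std::string_view s)
{
    // Smallest code point that needs each sequence length; below that is overlong.
    static const std::uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::vector<std::uint16_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t len;
        std::uint32_t cp;
        if (b < 0x80)                   { len = 1; cp = b; }
        else if (b >= 0xC0 && b < 0xE0) { len = 2; cp = b & 0x1F; }
        else if (b >= 0xE0 && b < 0xF0) { len = 3; cp = b & 0x0F; }
        else if (b >= 0xF0 && b < 0xF8) { len = 4; cp = b & 0x07; }
        else return std::nullopt;                     // stray continuation or bad lead byte
        if (s.size() - i < len) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return std::nullopt;
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp[len] || cp > 0x10FFFF) return std::nullopt;
        if (cp >= 0x10000) {
            cp -= 0x10000;  // split into a surrogate pair
            out.push_back(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            // Encoded surrogates D800..DFFF are deliberately passed through.
            out.push_back(static_cast<std::uint16_t>(cp));
        }
        i += len;
    }
    return out;
}
```

The trade-off against WTF-8 is visible here: both F0 9F 98 80 and ED A0 BD ED B8 80 decode to D83D DE00, so the mapping back to bytes is no longer one-to-one.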

I think that overlong encodings should NOT be accepted, including the overlong zero (C0 80).
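For reference, overlongs can be spotted from the first two bytes of a sequence. This is a small sketch with a hypothetical name `is_overlong`. It assumes the caller has already verified that `b0` is a lead byte and `b1` a continuation byte. C0 80 is the overlong zero that Java's Modified UTF-8 uses for NUL.

```cpp
// Hypothetical check: is the sequence starting with b0 b1 a non-shortest form?
bool is_overlong(unsigned char b0, unsigned char b1)
{
    if (b0 == 0xC0 || b0 == 0xC1) return true;  // 2-byte form of U+0000..U+007F (e.g. C0 80)
    if (b0 == 0xE0 && b1 < 0xA0) return true;   // 3-byte form of a code point below U+0800
    if (b0 == 0xF0 && b1 < 0x90) return true;   // 4-byte form of a code point below U+10000
    return false;
}
```

Note that ED A0 (the start of an encoded surrogate) is not an overlong, so this check is independent of which surrogate policy is chosen.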

