Subject: Re: [boost] [review] Review of Nowide (Unicode) starts today
From: Groke, Paul (paul.groke_at_[hidden])
Date: 2017-06-14 08:12:11


Peter Dimov wrote:
> Others have made their case for WTF-8, which has the desirable property of
> only allowing one encoding for any uint16_t sequence (taken in isolation).
>
> My personal choice here is to be more lenient and accept any combination of
> valid UTF-8 and UTF-8 encoded surrogates (a superset of CESU-8 and WTF-8),
> but I'm not going to argue very strongly for it over WTF-8, if at all.

I have to admit that I had confused WTF-8 with CESU-8. When converting from wide to narrow I would favor CESU-8, mostly because it makes the rules simpler and creates fewer special cases (e.g. when concatenating strings where one ends in a lone high surrogate and the next starts with a lone low surrogate).
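To make the distinction concrete, here is a minimal C++17 sketch of both wide-to-narrow conversions. It assumes the wide side is UTF-16 held in `std::u16string`. The function names `to_cesu8` and `to_wtf8` are hypothetical and not part of Nowide's interface. CESU-8 encodes every UTF-16 unit independently, so a surrogate pair becomes two 3-byte sequences. WTF-8 merges a valid pair into one 4-byte sequence and writes only unpaired surrogates as 3-byte sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append the shortest UTF-8 form of a value (1 to 4 bytes).
// Surrogate values 0xD800..0xDFFF come out as 3-byte sequences.
inline void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical CESU-8 encoder: each UTF-16 unit is encoded on its own,
// so a surrogate pair becomes two 3-byte sequences (6 bytes total).
std::string to_cesu8(const std::u16string& in) {
    std::string out;
    for (char16_t u : in) append_utf8(out, u);
    return out;
}

// Hypothetical WTF-8 encoder: a well-formed high+low pair is combined
// into one 4-byte sequence; unpaired surrogates stay 3-byte sequences.
std::string to_wtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            std::uint32_t lo = in[i + 1];
            append_utf8(out, 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00));
            ++i;  // the low surrogate has been consumed
        } else {
            append_utf8(out, u);
        }
    }
    return out;
}
```

With CESU-8, `to_cesu8(a) + to_cesu8(b) == to_cesu8(a + b)` holds for any inputs. With WTF-8 it fails when `a` ends in a lone high surrogate and `b` starts with a lone low surrogate, because the joined input now forms a pair that has to be re-encoded as a single 4-byte sequence.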

When converting from narrow to wide I agree that accepting any UTF-8/WTF-8/CESU-8 mix would be good.

> I think that overlongs should NOT be accepted, including overlong zero.

Agreed.
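A minimal sketch of the lenient narrow-to-wide direction, assuming UTF-16 output in `std::u16string` and C++17 `std::optional` to signal failure. `from_utf8_lenient` is a hypothetical name, not Nowide's API. It accepts standard 4-byte UTF-8 sequences as well as 3-byte encoded surrogates, so any UTF-8/WTF-8/CESU-8 mix decodes, and it rejects any sequence longer than the minimal form for its value, which covers the overlong zero `C0 80`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical lenient decoder: UTF-8, WTF-8 and CESU-8 in any mix.
// Returns std::nullopt on malformed or overlong input.
std::optional<std::u16string> from_utf8_lenient(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len;
        std::uint32_t cp, min_cp;  // min_cp: smallest value allowed for len
        if (b0 < 0x80)                { len = 1; cp = b0;        min_cp = 0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
        else return std::nullopt;  // stray continuation byte or 0xF8..0xFF
        if (i + len > in.size()) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation
            cp = (cp << 6) | (b & 0x3F);
        }
        // Overlong forms (including C0 80 for U+0000) and values past
        // U+10FFFF are rejected.
        if (cp < min_cp || cp > 0x10FFFF) return std::nullopt;
        if (cp < 0x10000) {
            // BMP values, including 3-byte encoded surrogates, map to a
            // single UTF-16 unit.
            out += static_cast<char16_t>(cp);
        } else {
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += len;
    }
    return out;
}
```

Because surrogates decoded from 3-byte sequences are passed through as individual UTF-16 units, a CESU-8 pair and the equivalent 4-byte UTF-8 sequence yield the same wide string, and a lone surrogate is preserved rather than rejected.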

