Subject: Re: [boost] [review] Review of Nowide (Unicode) starts today
From: Yakov Galka (ybungalobill_at_[hidden])
Date: 2017-06-12 09:29:21


On Mon, Jun 12, 2017 at 12:20 PM, Groke, Paul via Boost <
boost_at_[hidden]> wrote:

> I know modified UTF-8 is (can be) invalid UTF-8, that's why I asked. I
> think it could make sense to support it anyway though. Round tripping
> (strictly invalid, but possible) file names on Windows, easier
> interoperability with stuff like JNI, ...
>

Don't you mean WTF-8 then? AFAIK "Modified UTF-8" is UTF-8 that encodes the
null character with an overlong sequence, and thus is incompatible with
standard UTF-8, unlike WTF-8, which is a compatible extension.
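
To make the incompatibility concrete, here is a minimal sketch of the NUL case. The helper names (`encode_small`, `is_overlong_two_byte_lead`) are hypothetical and not part of Nowide; the encoder only handles values below U+0800, which is enough to show the difference.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode a value below U+0800 as UTF-8.
// With modified == true, U+0000 is written as the overlong pair C0 80
// (the "Modified UTF-8" convention) instead of a single 00 byte.
std::string encode_small(std::uint32_t cp, bool modified) {
    assert(cp < 0x800);  // sketch only covers one- and two-byte forms
    std::string out;
    if (cp < 0x80 && !(modified && cp == 0)) {
        out.push_back(static_cast<char>(cp));
    } else {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}

// A strict UTF-8 decoder rejects the lead bytes C0 and C1, because any
// two-byte sequence starting with them is an overlong encoding.
bool is_overlong_two_byte_lead(unsigned char b) {
    return b == 0xC0 || b == 0xC1;
}
```

A strict decoder would therefore reject the C0 80 pair that the modified encoder emits for NUL, which is the sense in which Modified UTF-8 is not a compatible extension.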

> OTOH it would add overhead for systems with native UTF-8 APIs, because
> Nowide would at least have to check every string for "modified UTF-8
> encoded" surrogate pairs and convert the string if necessary. Which of
> course is a good argument for not supporting modified UTF-8, because then
> Nowide could just pass the strings through unmodified on those systems.
>

Implementing WTF-8 removes a check in UTF-8 → UTF-16 conversion, and
doesn't change anything in the reverse direction when the input is valid
UTF-16. I suspect it isn't slower.
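
A rough sketch of which check is meant, assuming a hand-written decoder for three-byte sequences only. The function name `decode_three_byte` and the `allow_surrogates` flag are hypothetical, not Nowide's API. The sketch also ignores the WTF-8 rule that a surrogate pair must not be encoded as two three-byte sequences.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decode one three-byte UTF-8 sequence at p into a
// single UTF-16 code unit. With allow_surrogates == false this behaves like
// a strict UTF-8 decoder; with true, the surrogate-range check is skipped,
// which is the WTF-8-style behaviour discussed above.
bool decode_three_byte(const unsigned char* p, bool allow_surrogates,
                       std::uint16_t& unit) {
    assert(p != nullptr);
    if ((p[0] & 0xF0) != 0xE0) return false;  // not a three-byte lead
    if ((p[1] & 0xC0) != 0x80 || (p[2] & 0xC0) != 0x80) return false;
    const std::uint32_t cp = ((p[0] & 0x0Fu) << 12) |
                             ((p[1] & 0x3Fu) << 6) |
                             (p[2] & 0x3Fu);
    if (cp < 0x800) return false;  // overlong encoding
    // The one check that strict UTF-8 needs and WTF-8 does not:
    if (!allow_surrogates && cp >= 0xD800 && cp <= 0xDFFF) return false;
    unit = static_cast<std::uint16_t>(cp);
    return true;
}
```

With the bytes ED A0 80 (a lone U+D800), the strict mode fails while the permissive mode yields the code unit 0xD800, so an ill-formed Windows file name could round-trip.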

-- 
Yakov Galka
http://stannum.co.il/
