Subject: Re: [boost] [review] Review of Nowide (Unicode) starts today
From: Yakov Galka (ybungalobill_at_[hidden])
Date: 2017-06-12 18:12:23


On Mon, Jun 12, 2017 at 12:55 PM, Groke, Paul via Boost <
boost_at_[hidden]> wrote:

> Supporting modified UTF-8 or WTF-8 adds overhead on systems where the
> native OS API accepts UTF-8, but only strictly valid UTF-8.
> When some UTF-8 enabled function of the library is called on such a
> system, it would have to check for WTF-8 encoded surrogates and
> convert them to "true" UTF-8 before passing the string to the OS API.
> Because you would expect and want the "normal" UTF-8 encoding for
> a string to refer to the same file as the WTF-8 encoding of the same
> string.
>

That's the point: if the string is at all representable in UTF-8, then its
WTF-8 representation is already valid UTF-8, and no conversion has to be
done. Thus you don't have to check anything at all. This is how WTF-8 is
'more compatible' with UTF-8 than Modified UTF-8 is.

By analogy, you don't need to do special checks if you want to pass a UTF-8
string to an ASCII-only API, because UTF-8 is a strict superset of ASCII.
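
To make the superset claim concrete, here is a minimal sketch of a UTF-16 to
WTF-8 encoder. It is not Nowide code: the names wtf8_from_utf16 and
append_generalized_utf8 are made up for illustration, and the sketch assumes
the usual WTF-8 rule that a properly paired surrogate is combined into one
code point while an unpaired surrogate is written out as its own 3-byte
sequence.

```cpp
#include <cstddef>
#include <string>

// Append one value using the UTF-8 bit layout. Unlike a strict UTF-8
// encoder, this does not reject values in the surrogate range D800..DFFF.
static void append_generalized_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: potentially ill-formed UTF-16 -> WTF-8.
std::string wtf8_from_utf16(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool high = cp >= 0xD800 && cp <= 0xDBFF;
        const bool low_follows = i + 1 < in.size() &&
                                 in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF;
        if (high && low_follows) {
            // Proper pair: combine into one supplementary code point,
            // exactly as a UTF-8 encoder would.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        // Unpaired surrogates fall through and are encoded as-is.
        append_generalized_utf8(out, cp);
    }
    return out;
}
```

For well-formed UTF-16 input the output is byte-for-byte the ordinary UTF-8
encoding, so it can go to a strict UTF-8 API unchanged. Only an unpaired
surrogate produces a sequence (ED A0..BF xx) that a strict decoder rejects,
and such a string has no UTF-8 representation to begin with.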

-- 
Yakov Galka
http://stannum.co.il/
