Subject: Re: [boost] [review] Review of Nowide (Unicode) starts today
From: degski (degski_at_[hidden])
Date: 2017-06-13 06:20:23
On 12 June 2017 at 17:57, Peter Dimov via Boost <boost_at_[hidden]> wrote:
> degski wrote:
>> Question: "Shouldn't the passing of invalid UTF-8/16 sequences be defined
>> as UB?"
> Of course not. Why would one need to use the library then? It defeats the
> whole purpose of it.
From WP (I have read up on it now): "RFC 3629 states 'Implementations of the
decoding algorithm MUST protect against decoding invalid sequences.'
<https://en.wikipedia.org/wiki/UTF-8#cite_note-rfc3629-13> *The Unicode
Standard* requires decoders to '...treat any ill-formed code unit sequence
as an error condition. This guarantees that it will neither interpret nor
emit an ill-formed code unit sequence.'"
So not UB then, but invalid sequences should not be passed through either.
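
To make "treat as an error condition" concrete, here is a minimal sketch of a strict well-formedness check. The name is_well_formed_utf8 is hypothetical and is not part of Nowide's interface; the rules it applies (lead/continuation byte patterns, no overlong forms, no surrogates, nothing above U+10FFFF) are my reading of RFC 3629.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not Nowide API): true only if `s` is well-formed UTF-8.
inline bool is_well_formed_utf8(const std::string& s) {
    const auto* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = p[i];
        std::size_t len = 0;       // total bytes in this sequence
        std::uint32_t cp = 0;      // code point being decoded
        std::uint32_t min_cp = 0;  // smallest code point allowed for this length
        if (lead < 0x80)                { len = 1; cp = lead;        min_cp = 0x0; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (n - i < len) return false;  // sequence truncated at end of input
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = p[i + k];
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}

// A few illustrative checks; the caller decides how to report the error.
inline void well_formed_examples() {
    assert(is_well_formed_utf8("h\xC3\xA9"));       // U+00E9, two bytes
    assert(!is_well_formed_utf8("\xC0\x80"));       // overlong form of U+0000
    assert(!is_well_formed_utf8("\xED\xA0\x80"));   // encoded surrogate U+D800
    assert(!is_well_formed_utf8("\xC3"));           // truncated sequence
}
```

The sketch only reports validity. Whether a library then throws, returns an error code, or substitutes U+FFFD is a separate design choice.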
Are we talking FAT32 or NTFS? Which Windows versions are affected? I also
think, as some posters below (and in another thread) state, that Windows
should not be treated differently. A new Boost library should not
accommodate Windows' sloppy historical quirks. The library *can* require
that the system and its users adhere to the standard.
Then WP on Overlong encodings: "The standard specifies that the correct
encoding of a code point use only the minimum number of bytes required to
hold the significant bits of the code point. Longer encodings are called
*overlong* and are not valid UTF-8 representations of the code point. This
rule maintains a one-to-one correspondence between code points and their
valid encodings, so that there is a unique valid encoding for each code
point."
The key being: "... are not valid UTF-8 representations ...", i.e. we're
back to the case above.
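
As an illustration of the overlong rule, the sketch below shows why the two-byte sequence C0 AF is not a valid spelling of '/' (U+002F): a decoder that skips the minimality check recovers 0x2F from it, but the shortest encoding of 0x2F is a single byte. All function names here are hypothetical and only handle the two-byte case.

```cpp
#include <cassert>
#include <cstdint>

// Number of bytes in the shortest UTF-8 encoding of a code point.
inline int shortest_utf8_length(std::uint32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Deliberately naive: extracts the payload bits of a two-byte sequence
// without checking that two bytes were actually needed.
inline std::uint32_t naive_decode2(unsigned char b0, unsigned char b1) {
    return (static_cast<std::uint32_t>(b0 & 0x1F) << 6) | (b1 & 0x3F);
}

// A two-byte sequence is overlong if its code point fits in fewer bytes.
inline bool is_overlong2(unsigned char b0, unsigned char b1) {
    return shortest_utf8_length(naive_decode2(b0, b1)) < 2;
}

inline void overlong_examples() {
    assert(naive_decode2(0xC0, 0xAF) == 0x2F);  // same value as plain '/'
    assert(is_overlong2(0xC0, 0xAF));           // but two bytes is too many
    assert(!is_overlong2(0xC3, 0xA9));          // U+00E9 genuinely needs two
}
```

Without the minimality check, "/" and C0 AF would compare as different byte strings yet decode to the same code point, which is exactly the one-to-one correspondence the quoted rule protects.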
-- "*Ihre sogenannte Religion wirkt bloß wie ein Opiat reizend, betäubend, Schmerzen aus Schwäche stillend.*" - Novalis 1798