
Subject: Re: [boost] [review] Review of Nowide (Unicode) starts today
From: degski (degski_at_[hidden])
Date: 2017-06-13 06:20:23

On 12 June 2017 at 17:57, Peter Dimov via Boost <boost_at_[hidden]> wrote:

> degski wrote:
>> Question: "Shouldn't the passing of invalid UTF-8/16 sequences be defined
>> as UB?"
>
> Of course not. Why would one need to use the library then? It defeats the
> whole purpose of it.

From Wikipedia (I have read up on it now): RFC 3629 states that
"Implementations of the decoding algorithm MUST protect against decoding
invalid sequences." [13] *The Unicode Standard* requires decoders to
"...treat any ill-formed code unit sequence as an error condition. This
guarantees that it will neither interpret nor emit an ill-formed code unit
sequence."

So not UB then, but it should not pass either.
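To make "it should not pass" concrete: a minimal C++17 sketch of a strict
decoder that reports ill-formed input as an error instead of passing it
through. The name decode_utf8_strict is hypothetical and is not part of
Nowide's API; it rejects stray or missing continuation bytes, truncated
sequences, overlong forms, surrogates (U+D800..U+DFFF) and anything above
U+10FFFF by returning std::nullopt.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper (not Boost.Nowide API): decode UTF-8 strictly.
// Any ill-formed sequence is treated as an error condition (std::nullopt),
// so nothing invalid is ever interpreted or emitted.
std::optional<std::u32string> decode_utf8_strict(const std::string& in) {
    std::u32string out;
    const std::size_t n = in.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        char32_t min_cp;  // smallest code point allowed for this length
        if (b0 < 0x80)                { cp = b0;        len = 1; min_cp = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min_cp = 0x10000; }
        else return std::nullopt;               // stray continuation byte or 0xF8..0xFF
        if (n - i < len) return std::nullopt;   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // expected 10xxxxxx
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp) return std::nullopt;                   // overlong form
        if (cp > 0x10FFFF) return std::nullopt;                 // outside Unicode
        if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt;  // surrogate
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

A caller then has to decide explicitly what to do with the error (fail,
report, or substitute U+FFFD), which is the opposite of UB.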

Are we talking FAT32 or NTFS? Which Windows versions are affected? I also
think, as some posters below (and in another thread) state, that Windows
should not be treated differently. A new Boost library should not
accommodate Windows' bad, sloppy historic quirks. The library *can* require
that the system and its users adhere to the standard.

Then Wikipedia on overlong encodings: "The standard specifies that the
correct encoding of a code point use only the minimum number of bytes
required to hold the significant bits of the code point. Longer encodings
are called *overlong* and are not valid UTF-8 representations of the code
point. This rule maintains a one-to-one correspondence between code points
and their valid encodings, so that there is a unique valid encoding for
each code point."
The key being: "... are not valid UTF-8 representations ...", i.e. we're
back to the case above.
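The overlong rule boils down to comparing a sequence's length with the
shortest length its code point needs. A minimal sketch, assuming the input
is already a well-shaped lead byte followed by its continuation bytes; both
helper names are hypothetical and not taken from Nowide.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: minimum number of UTF-8 bytes needed for `cp`
// (the "shortest form" the standard requires).
std::size_t utf8_min_length(char32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Hypothetical helper: true if `seq` (lead byte + continuation bytes,
// assumed already checked for shape) uses more bytes than its code point
// requires, i.e. it is an overlong, and therefore invalid, encoding.
bool is_overlong(const std::string& seq) {
    const std::size_t len = seq.size();
    if (len < 2 || len > 4) return false;  // single bytes cannot be overlong
    const unsigned char lead = static_cast<unsigned char>(seq[0]);
    // The lead byte carries 5, 4 or 3 payload bits for len 2, 3 or 4.
    char32_t cp = lead & (0x7F >> len);
    for (std::size_t k = 1; k < len; ++k)
        cp = (cp << 6) | (static_cast<unsigned char>(seq[k]) & 0x3F);
    return utf8_min_length(cp) < len;
}
```

For example, 0xC0 0xAF decodes to U+002F ('/') using two bytes where one
suffices, so is_overlong flags it; a decoder that accepted it could let a
'/' slip past a byte-level filter.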



"*Their so-called religion acts merely like an opiate: stimulating,
numbing, stilling pain out of weakness.*" - Novalis 1798
