Subject: Re: [boost] [review] Review of Nowide (Unicode) starts today
From: degski (degski_at_[hidden])
Date: 2017-06-13 06:20:23

Next message: Andrzej Krzemienski: "Re: [boost] Noexcept"
Previous message: Emil Dotchevski: "Re: [boost] Noexcept"
In reply to: Peter Dimov: "Re: [boost] [review] Review of Nowide (Unicode) starts today"
Next in thread: Artyom Beilis: "Re: [boost] [review] Review of Nowide (Unicode) starts today"

On 12 June 2017 at 17:57, Peter Dimov via Boost <boost_at_[hidden]>
wrote:

> degski wrote:
>
>> Question: "Shouldn't the passing of invalid UTF-8/16 sequences be defined
>> as UB?"
>
> Of course not. Why would one need to use the library then? It defeats the
> whole purpose of it.

From WP (I have read up on it now): RFC 3629 states, "Implementations of the
decoding algorithm MUST protect against decoding invalid sequences." *The
Unicode Standard* requires decoders to "...treat any ill-formed code unit
sequence as an error condition. This guarantees that it will neither
interpret nor emit an ill-formed code unit sequence."

So not UB then, but an invalid sequence should not pass either.
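
To make "treat any ill-formed sequence as an error condition" concrete, here
is a minimal sketch of a strict check. The name is_valid_utf8 is hypothetical
and not part of Nowide or any Boost API; it only assumes the well-formedness
rules of RFC 3629 (no stray or missing continuation bytes, no overlong forms,
no surrogates, nothing above U+10FFFF).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a Nowide function): returns true only if `s`
// is well-formed UTF-8 in the sense of RFC 3629.
inline bool is_valid_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;      // total bytes in this sequence
        std::uint32_t cp = 0;     // decoded code point
        std::uint32_t min_cp = 0; // smallest code point allowed for `len`

        if (b0 < 0x80)                { len = 1; cp = b0;        min_cp = 0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
        else return false;            // stray continuation or invalid lead byte

        if (n - i < len) return false; // sequence truncated at end of input

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (c & 0x3Fu);
        }

        if (cp < min_cp) return false;                   // overlong encoding
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate

        i += len;
    }
    return true;
}
```

A caller would report an error (exception, error code, or replacement
character; which one is a design choice for the library) whenever this
returns false, rather than passing the bytes on.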

Are we talking FAT32 or NTFS? Which Windows versions are affected? I also
think, as some posters below (and in another thread) state, that Windows
should not be treated differently. A new Boost library should not
accommodate Windows' sloppy historic quirks. The library *can* require
that its use depends on the system and its users adhering to the standard.

Then WP on Overlong encodings: "The standard specifies that the correct
encoding of a code point use only the minimum number of bytes required to
hold the significant bits of the code point. Longer encodings are called
*overlong* and are not valid UTF-8 representations of the code point. This
rule maintains a one-to-one correspondence between code points and their
valid encodings, so that there is a unique valid encoding for each code
point."

The key being: "... are not valid UTF-8 representations ...", i.e. we're
back to the case above.
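
The "minimum number of bytes" rule can be illustrated from the encoder side.
The sketch below is only an illustration with a hypothetical name
(encode_utf8_shortest is not a Nowide function): it always emits the shortest
form, so '/' (U+002F) comes out as the single byte 0x2F, and the two-byte
form 0xC0 0xAF is something a conforming encoder never produces.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encodes `cp` using the minimum number of bytes.
// Returns an empty string for values that are not Unicode scalar values
// (surrogates, or anything above U+10FFFF).
inline std::string encode_utf8_shortest(std::uint32_t cp)
{
    std::string out;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return out; // not encodable

    if (cp < 0x80) {                 // 7 significant bits: 1 byte
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // up to 11 bits: 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // up to 16 bits: 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // up to 21 bits: 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Because each code point has exactly one output here, comparing byte strings
is equivalent to comparing code point sequences, which is the one-to-one
property the quoted passage describes.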

degski

WP: https://en.wikipedia.org/wiki/UTF-8

-- 
"*Ihre sogenannte Religion wirkt bloß wie ein Opiat reizend, betäubend,
Schmerzen aus Schwäche stillend.*" - Novalis 1798


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk