Subject: Re: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet
From: Peter Dimov (lists_at_[hidden])
Date: 2015-10-09 11:41:58

Next message: Andrey Semashev: "Re: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet"
Previous message: Peter Dimov: "Re: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet"
In reply to: Andrey Semashev: "Re: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet"
Reply: Andrey Semashev: "Re: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet"
Reply: Artyom Beilis: "Re: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet"

Andrey Semashev wrote:

> WTF-8 and CESU-8 are not UTF-8 but different encodings. Dealing with them
> should be the user's explicit choice (e.g. the user should write
> utf16_to_wtf8 instead of utf16_to_utf8).

In addition to what I wrote earlier, the choices here are not representable
in a single U or W letter. When taking UTF-8, you need to decide whether to

- accept code points above U+10FFFF
- accept code points encoded with more bytes than necessary (overlong forms)
- accept surrogates
- probably more because Unicode is hard

and then for each rejected byte sequence whether to

- throw
- ignore and skip
- replace with U+FFFD
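
To make the point concrete, here is a rough sketch of a decoder that takes those choices as a policy object instead of a letter in the function name. All names (Utf8Policy, OnError, decode_utf8) are hypothetical and are not part of Boost.Nowide or any existing Boost API. It assumes lead bytes are limited to the 1 to 4 byte forms, and it makes one more decision the lists above do not cover: a well-formed but rejected sequence is consumed as a whole, while a structurally broken one is consumed a byte at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical names for illustration only; not an existing Boost interface.
enum class OnError { Throw, Skip, Replace };

struct Utf8Policy {
    bool allow_above_10ffff = false;  // code points over U+10FFFF
    bool allow_overlong     = false;  // more bytes than necessary
    bool allow_surrogates   = false;  // U+D800..U+DFFF
    OnError on_error        = OnError::Replace;
};

std::u32string decode_utf8(const std::string& in,
                           const Utf8Policy& p = Utf8Policy())
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;  // stays 0 for stray continuation or invalid lead bytes
        char32_t cp = 0;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }

        // Structural check: enough bytes, and all of them continuation bytes.
        bool well_formed = len != 0 && i + len <= in.size();
        for (std::size_t k = 1; well_formed && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) well_formed = false;
            else cp = (cp << 6) | (b & 0x3F);
        }

        // Policy checks: the "whether to accept" decisions.
        bool accepted = well_formed;
        if (well_formed) {
            static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
            assert(len <= 4);
            if (!p.allow_overlong && cp < min_for_len[len]) accepted = false;
            if (!p.allow_surrogates && cp >= 0xD800 && cp <= 0xDFFF) accepted = false;
            if (!p.allow_above_10ffff && cp > 0x10FFFF) accepted = false;
        }
        if (accepted) { out.push_back(cp); i += len; continue; }

        // Error handling: the "what to do with a rejected sequence" decisions.
        switch (p.on_error) {
        case OnError::Throw:   throw std::invalid_argument("invalid UTF-8");
        case OnError::Skip:    break;
        case OnError::Replace: out.push_back(static_cast<char32_t>(0xFFFD)); break;
        }
        i += well_formed ? len : 1;
    }
    return out;
}
```

Even this small sketch has three independent accept flags and three error modes, i.e. 24 combinations, which is why a function name like utf16_to_wtf8 versus utf16_to_utf8 cannot carry the whole decision.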


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk