Subject: Re: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet
From: Artyom Beilis (artyomtnk_at_[hidden])
Date: 2015-10-09 17:16:57


----- Original Message -----

> From: Peter Dimov <lists_at_[hidden]>
> To: boost_at_[hidden]
> Cc:
> Sent: Friday, October 9, 2015 11:40 PM
> Subject: Re: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet
>
> Artyom Beilis wrote:
>
>> Codepoints above 10FFFF are like "let's assume Pi=3.15"...
>
> No, sorry. This is not at all the same. The reason we're in this mess is
> precisely because codepoints above 0xFFFF were like pi=3.15. And then it
> turned out they weren't.
>

Yeah, but for UTF-16 it is over: surrogate pairs top out at U+10FFFF, so you can't go past it ;-)
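
To make the limit concrete, here is a minimal sketch of the surrogate-pair arithmetic. The function name from_surrogates is only illustrative (it is not part of Boost or nowide), and it assumes the caller has already checked that both units are in the surrogate ranges.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only.
// Combines a UTF-16 surrogate pair into a code point.
// Assumes hi is in [0xD800, 0xDBFF] and lo is in [0xDC00, 0xDFFF];
// no range checking is done here.
constexpr std::uint32_t from_surrogates(std::uint16_t hi, std::uint16_t lo)
{
    return 0x10000u
         + ((static_cast<std::uint32_t>(hi) - 0xD800u) << 10)
         + (static_cast<std::uint32_t>(lo) - 0xDC00u);
}

// The lowest and highest pairs bound the range reachable through surrogates.
static_assert(from_surrogates(0xD800, 0xDC00) == 0x10000u, "lowest pair");
static_assert(from_surrogates(0xDBFF, 0xDFFF) == 0x10FFFFu, "highest pair");
```

Each surrogate carries 10 payload bits, so a pair covers 2^20 code points starting at 0x10000, and there is no bit pattern left to reach anything above 0x10FFFF.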

>
>> > - probably more because Unicode is hard
>>
>> Unicode isn't hard - it is just treated with ignorance even by big
>> organizations, to say nothing of average programmers.
>
> What I meant by that is for instance
>
> - is 0xCC 0x81 a valid UTF-8 string?
> - is 0x65 0xCC 0x81 0xCC 0x81 a valid UTF-8 string?
>

Both are valid strings, and both are meaningless on their own: the first is a
combining accent (U+0301) with no base letter, the second is the letter "e" with
the same accent applied twice.

Being illogical in human terms or in how they render does not make them illegal UTF-8.
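
As a rough illustration that validity is purely a byte-level question, here is a minimal well-formedness check. The name is_valid_utf8 is hypothetical, and this is a sketch rather than the codecvt facet under discussion: it rejects stray continuation bytes, overlong forms, surrogates and anything above U+10FFFF, and knows nothing about what the code points mean.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, for illustration only.
// Returns true if the bytes form well-formed UTF-8.
bool is_valid_utf8(const std::vector<std::uint8_t>& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const std::uint8_t b = s[i];
        std::size_t len;
        std::uint32_t cp, min;
        if (b < 0x80)      { len = 1; cp = b;        min = 0; }
        else if (b < 0xC2) { return false; }  // stray continuation, or overlong lead C0/C1
        else if (b < 0xE0) { len = 2; cp = b & 0x1F; min = 0x80; }
        else if (b < 0xF0) { len = 3; cp = b & 0x0F; min = 0x800; }
        else if (b < 0xF5) { len = 4; cp = b & 0x07; min = 0x10000; }
        else               { return false; }  // F5..FF never appear in UTF-8
        if (n - i < len) return false;        // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            if ((s[i + k] & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min) return false;                       // overlong encoding
        if (cp > 0x10FFFF) return false;                  // beyond the Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate code point
        i += len;
    }
    return true;
}
```

With this check, both is_valid_utf8({0xCC, 0x81}) and is_valid_utf8({0x65, 0xCC, 0x81, 0xCC, 0x81}) return true, while something like {0xF4, 0x90, 0x80, 0x80} (which would decode to 0x110000) is rejected.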

UTF-8 is simple; human language processing is complex.

Artyom

