
Subject: Re: [boost] [locale] Review results for Boost.Locale library
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2011-04-26 07:38:11


On 25/04/2011 21:50, Ryou Ezoe wrote:
> On Tue, Apr 26, 2011 at 3:55 AM, Artyom<artyomtnk_at_[hidden]> wrote:
>>> From: Ryou Ezoe<boostcpp_at_[hidden]>
>>>
>>> Sort by code point is not the best solution.
>>> But at least, it's consistent if we use one encoding.
>>>
>>
>> No it is not, UCS encoding has different order
>> in different representations:
>>
>> UTF-8 and UTF-32 order is consistent i.e.
>>
>> for each a,b in utf8(a)< utf8(b) iff utf32(a)< utf32(b)
>>
>> However this is not correct for UTF-16 where codepoints
>> outside of BMP has different ordering. i.e.
>>
>> It may be that codepoint (a)> codepoint(b) but UTF-16(a) sorted before
>> UTF-16(b)
>
> What do you mean?
> No matter what UTF you use.
> Code point is same.
> You can't compare UTF-8 string by comparing each octet.

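Actually, you can. Comparing well-formed UTF-8 strings octet by octet (as
unsigned values) gives the same order as comparing them by code point. And
you should do it at the octet level for efficiency.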
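
To make the point concrete, here is a minimal sketch. The helper names
(encode_utf8, encode_utf16, utf8_less, utf16_less) are hypothetical and not
part of Boost.Locale, and the encoders assume valid, non-surrogate code
points. It compares U+FF5E and U+10000: octet order in UTF-8 agrees with
code point order, while code unit order in UTF-16 does not.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: encode one code point as UTF-8.
// Assumes cp is a valid scalar value (no surrogate check).
std::string encode_utf8(char32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: encode one code point as UTF-16 code units.
std::u16string encode_utf16(char32_t cp) {
    assert(cp <= 0x10FFFF);
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        cp -= 0x10000;  // supplementary plane: split into a surrogate pair
        out += static_cast<char16_t>(0xD800 | (cp >> 10));
        out += static_cast<char16_t>(0xDC00 | (cp & 0x3FF));
    }
    return out;
}

// Octet-level comparison of UTF-8; bytes are compared as unsigned values.
bool utf8_less(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](char x, char y) {
            return static_cast<unsigned char>(x) < static_cast<unsigned char>(y);
        });
}

// Code-unit-level comparison of UTF-16.
bool utf16_less(const std::u16string& a, const std::u16string& b) {
    return a < b;
}
```

With these, utf8_less(encode_utf8(0xFF5E), encode_utf8(0x10000)) is true,
matching the code point order, because EF BD 9E sorts before F0 90 80 80.
utf16_less on the same pair is false, because the single unit FF5E is
greater than the lead surrogate D800.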


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk