Boost Users :

Subject: Re: [Boost-users] Inconsistent unicode encoding between boost and wx on mac osx
From: Sachin Garg (schngrg_at_[hidden])
Date: 2010-02-13 16:44:21


On Sun, Feb 14, 2010 at 3:06 AM, Lars Viklund <zao_at_[hidden]> wrote:
> On Sun, Feb 14, 2010 at 02:50:43AM +0530, Sachin Garg wrote:
>> My project uses both Boost and wxWidgets, and the two encode Unicode
>> differently on Mac OS X. Everything works fine on Windows.
>>
>> Problem: Boost and WX do end up encoding the strings differently when
>> converting to unicode on OSX. I am detailing an example:
>>
>> WX's encoding is the same on Windows and OS X, but Boost's encoding
>> differs between the two platforms. It is probably not a bug, but I am
>> unable to figure out the reason or how to make the two work together.
>> Hex dumps of Unicode encodings of this string
>
> Unicode has a bunch of different Normalization Forms [1].
> A normalization form tells how diacritics and composite codepoints
> should be composed or decomposed when represented.
>
> The choice of NF is up to the OS; most importantly, OSX and Windows do
> it differently. The encoding of your strings seems to be the same;
> they're just composed differently.
>
> Boost likely uses OS functions to convert between encodings while I
> assume that WX uses its own internally consistent transcoding.
>
> [1] http://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms
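
To make the composed/decomposed distinction above concrete, here is a
minimal sketch, not taken from the thread. It hard-codes the two UTF-8
byte sequences for "é": precomposed (NFC, U+00E9) and decomposed (NFD,
'e' followed by U+0301). The names hex_dump, e_acute_nfc and
e_acute_nfd are made up for illustration. The sketch does not perform
any normalization itself; the C++ standard library has no facility for
that, so a real fix would need something like ICU.

```cpp
#include <string>

// Return the bytes of s as space-separated uppercase hex, e.g. "C3 A9".
// (Hypothetical helper, for illustration only.)
std::string hex_dump(const std::string& s) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (i != 0) out += ' ';
        out += digits[c >> 4];    // high nibble
        out += digits[c & 0x0F];  // low nibble
    }
    return out;
}

// "é" precomposed (NFC): the single code point U+00E9,
// which is the two UTF-8 bytes C3 A9.
const std::string e_acute_nfc = "\xC3\xA9";

// "é" decomposed (NFD): 'e' (U+0065) followed by
// U+0301 COMBINING ACUTE ACCENT, which is the UTF-8 bytes 65 CC 81.
const std::string e_acute_nfd = "e\xCC\x81";
```

Both strings are valid UTF-8 and display as the same character, yet
e_acute_nfc == e_acute_nfd is false because std::string compares raw
bytes. If two libraries hand back different normalization forms, this
is the kind of mismatch a hex dump would show.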

Thanks, this explains a lot. Is there some std/boost way to specify
which encoding/normalization to use? Or to find out which encoding
boost defaults to?

I will bring this up on WX list too, but in case there is no 'correct'
way to decide which encoding to use, I will still need to make them
compatible to make my software work.

SG

