Subject: Re: [boost] [locale] [filesystem] Windows local 8 bit encoding
From: Yakov Galka (ybungalobill_at_[hidden])
Date: 2012-11-30 05:41:35


On Fri, Nov 30, 2012 at 12:29 PM, Ryo IGARASHI <rigarash_at_[hidden]> wrote:

> Hi Artyom,
>
> On Thu, Nov 29, 2012 at 10:45 PM, Artyom Beilis <artyomtnk_at_[hidden]>
> wrote:
> > If so there is no such a locale under windows that works with
> Shift_JIS...
>
> [...] See the reference information from Microsoft:
> http://support.microsoft.com/default.aspx?scid=kb;en-us;Q170559
> (Note that 'Shift JIS' in the above link means CP932)
>
> This means that in order to handle the Japanese string properly under
> Windows,
> the programmers are encouraged not to convert at all. [...]
>

As I understand from the page, the problem with CP932 is that it has duplicate
code points, so a CP932 → UTF-8 → CP932 round trip will produce text that is
binary different but semantically identical.
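
To make the round trip concrete, here is a minimal Python sketch. It assumes that Python's built-in "cp932" codec follows Microsoft's mapping for the two byte sequences below, both of which the Microsoft table assigns to U+2252; the names NEC_ROW13, JIS_X0208 and round_trip are made up for illustration.

```python
# Two CP932 byte sequences that map to the same character, U+2252
# (APPROXIMATELY EQUAL TO OR THE IMAGE OF). Assumption: Python's "cp932"
# codec agrees with Microsoft's table for both of them.
NEC_ROW13 = b"\x87\x90"  # duplicate in the NEC row 13 extension area
JIS_X0208 = b"\x81\xe0"  # position inherited from JIS X 0208


def round_trip(raw: bytes) -> bytes:
    """Convert CP932 -> UTF-8 -> CP932 and return the resulting bytes."""
    utf8 = raw.decode("cp932").encode("utf-8")
    return utf8.decode("utf-8").encode("cp932")


if __name__ == "__main__":
    for raw in (NEC_ROW13, JIS_X0208):
        back = round_trip(raw)
        same_text = raw.decode("cp932") == back.decode("cp932")
        print(raw.hex(), "->", back.hex(), "| same text:", same_text)
```

Both inputs decode to the same character and re-encode to a single byte sequence, so at least one of them does not survive the round trip byte for byte, while the decoded text is unchanged.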

I do not see a problem with this. Unicode itself has *many more* ways to
encode the same thing, including, but not limited to, duplicate code points
and combining characters, and we have been living with this just fine for
years. The solution is to normalize if this *really* matters. And where
it matters (comparison, most likely; what else?) you will be forced to
normalize your CP932 too...
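
As an illustration of the Unicode side, here is a small sketch using Python's standard unicodedata module. The helper same_text and the constant names are hypothetical, and NFC is just one possible choice of normalization form.

```python
import unicodedata

# Three different code point sequences for what a reader sees as the same letter.
ANGSTROM_SIGN = "\u212b"          # ANGSTROM SIGN, a duplicate code point
A_WITH_RING = "\u00c5"            # LATIN CAPITAL LETTER A WITH RING ABOVE
A_PLUS_COMBINING_RING = "A\u030a"  # 'A' followed by COMBINING RING ABOVE


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    # A raw comparison sees different strings ...
    print(ANGSTROM_SIGN == A_WITH_RING)
    print(A_WITH_RING == A_PLUS_COMBINING_RING)
    # ... while a normalized comparison treats them as the same text.
    print(same_text(ANGSTROM_SIGN, A_WITH_RING))
    print(same_text(A_WITH_RING, A_PLUS_COMBINING_RING))
```

The same idea would apply to CP932: decode to Unicode first (which already folds the duplicate byte sequences together), then compare.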

-- 
Yakov
