Subject: Re: [boost] [locale] Review results for Boost.Locale library
From: Ryou Ezoe (boostcpp_at_[hidden])
Date: 2011-04-26 22:29:57


On Tue, Apr 26, 2011 at 9:27 PM, Artyom <artyomtnk_at_[hidden]> wrote:
>> From: Mathias Gaunard <mathias.gaunard_at_[hidden]>
>>
>> On 26/04/2011 11:17, Sebastian Redl wrote:
>>
>> > GCC has options to control both the source (-finput-charset) and the
>> > execution character set (-fexec-charset). They both default to UTF-8.
>> > However, MSVC is more complicated. It will try to auto-detect the source
>> > character set, but while it can detect UTF-16, it will treat everything
>> > else as the system narrow encoding (usually a Windows-xxxx codepage)
>> > unless the file starts with a UTF-8-encoded BOM. The worse problem is
>> > that, except for a very new, poorly documented, and probably
>> > experimental pragma, there is *no way* to change MSVC's execution
>> > character set away from the system narrow encoding.
>>
>> A long time ago, I asked Vladimir Prus to help me add an option to
>> Boost.Build that would automatically prepend the BOM to source files
>> when using MSVC, but unfortunately he was never able to help me do this.
>>
>
>
> The problem is that even if the source is UTF-8 with a BOM, "שלום" would
> be encoded in the locale's 8-bit codepage, such as 1255 or 936,
> and not as a UTF-8 string (codepage 65001).
>
> It is rather stupid, but this is how MSVC works, or how it understands
> the place of UTF-8 in this world.
>
> Unicode support in Visual Studio is just broken...
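
To make the byte-level difference concrete, here is a small C++ sketch. The helpers to_utf8, to_cp1255 and hex_bytes are hypothetical, written only for illustration. It encodes the four letters of "שלום" (U+05E9, U+05DC, U+05D5, U+05DD) once as UTF-8, which is what a narrow literal holds under GCC's default -fexec-charset=UTF-8, and once as Windows-1255, which is what MSVC stores on a system whose narrow codepage is 1255. The CP1255 encoder is deliberately partial: it assumes only ASCII and the Hebrew letters U+05D0 to U+05EA, which that codepage places contiguously at 0xE0 to 0xFA.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Encode one code point as UTF-8: the bytes a narrow literal holds when
// the execution character set is UTF-8 (GCC's default).
std::string to_utf8(char32_t cp) {
    assert(cp <= 0x10FFFF && "not a Unicode code point");
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode one code point in Windows-1255. Assumption of this sketch: only
// ASCII and the Hebrew letters U+05D0..U+05EA (bytes 0xE0..0xFA) are handled.
std::string to_cp1255(char32_t cp) {
    if (cp < 0x80)
        return std::string(1, static_cast<char>(cp));
    if (cp >= 0x05D0 && cp <= 0x05EA)
        return std::string(1, static_cast<char>(0xE0 + (cp - 0x05D0)));
    return "?";  // outside the subset covered here
}

// Render bytes as space-separated uppercase hex, e.g. "D7 A9".
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[3];
    for (unsigned char c : s) {
        if (!out.empty())
            out += ' ';
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(c));
        out += buf;
    }
    return out;
}
```

For the four letters above, to_utf8 produces D7 A9 D7 9C D7 95 D7 9D (8 bytes) and to_cp1255 produces F9 EC E5 ED (4 bytes), so code that receives the literal cannot treat it as UTF-8 unless the compiler actually emitted UTF-8.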

The real obstacle to localization is not software that wasn't
written with localization in mind.
We can replace hard-coded text (whether or not the source code is available).
It's a tedious but straightforward task.
Most programs don't need to switch languages at runtime anyway,
so hard-coded text is fine.

The real obstacle is ASCII.
If a program uses ASCII instead of UTF-8, UTF-16, or UTF-32, we have to use
an ASCII-compatible encoding.
On Windows, for Japanese, that is CP932 (Microsoft's variant of Shift-JIS).

In that case, we can't translate a program simply by replacing the text.
Because Windows can't tell which encoding a string is in (it could be
anything, and it can't be detected heuristically), we have to specify it
explicitly.

For example, the fdwCharSet argument of every call to the CreateFont API
must be changed to SHIFTJIS_CHARSET.
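
Here "ASCII-compatible" means that bytes 0x00 to 0x7F keep their ASCII meaning, while a Japanese character in CP932 takes two bytes: a lead byte in 0x81 to 0x9F or 0xE0 to 0xFC, followed by a trail byte that may itself fall in the ASCII range. The C++ sketch below walks a CP932 string character by character; the function names are hypothetical and trail bytes are not validated. It shows one practical consequence: 表 is encoded as 0x95 0x5C, and 0x5C is the backslash, so a byte-oriented search for '\\' finds a false match inside that character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if byte b starts a two-byte CP932 character. CP932 lead bytes are
// 0x81-0x9F and 0xE0-0xFC; any other byte is a single-byte character
// (ASCII 0x00-0x7F or half-width katakana 0xA1-0xDF).
bool is_cp932_lead(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Byte length of the character starting at s[i]. A lead byte at the very
// end of the string (truncated input) is treated as a single byte.
std::size_t cp932_char_len(const std::string& s, std::size_t i) {
    bool two = is_cp932_lead(static_cast<unsigned char>(s[i])) && i + 1 < s.size();
    return two ? 2 : 1;
}

// Count characters rather than bytes.
std::size_t cp932_char_count(const std::string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); i += cp932_char_len(s, i))
        ++n;
    return n;
}

// Position of the first real backslash character, or npos. Trail bytes are
// skipped, so the 0x5C inside a two-byte character is never reported,
// unlike a naive s.find('\\').
std::size_t cp932_find_backslash(const std::string& s) {
    for (std::size_t i = 0; i < s.size(); i += cp932_char_len(s, i))
        if (s[i] == '\\')
            return i;
    return std::string::npos;
}
```

For the two-byte string "\x95\x5C" (表), s.find('\\') returns 1, pointing at the trail byte, while cp932_find_backslash returns std::string::npos and cp932_char_count returns 1.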

On Windows, software should use UTF-16.
If a locale library expects ASCII input, even though it supports wchar_t
output, I wonder how many people will actually use it.
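
For a program that follows this advice, narrow input has to be converted to UTF-16 at the boundary. On Windows that is normally done with MultiByteToWideChar, whose code-page argument must name the input encoding (CP_UTF8, 932, and so on). The portable sketch below does the same job by hand for UTF-8 input only, producing a std::u16string (on Windows, wchar_t is also 16 bits). The function name utf8_to_utf16 is hypothetical, and the sketch assumes the input really is UTF-8, which is exactly what cannot be known when a library accepts narrow strings in an unspecified codepage.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Convert UTF-8 to UTF-16. Code points above U+FFFF become surrogate pairs.
// Malformed input (bad lead or continuation bytes, truncated sequences,
// overlong forms, encoded surrogates, values above U+10FFFF) throws
// std::invalid_argument instead of being guessed at.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)                    { cp = b;        len = 1; }
        else if (b >= 0xC2 && b <= 0xDF) { cp = b & 0x1F; len = 2; }
        else if (b >= 0xE0 && b <= 0xEF) { cp = b & 0x0F; len = 3; }
        else if (b >= 0xF0 && b <= 0xF4) { cp = b & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (len > in.size() - i)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if ((len == 3 && cp < 0x800) || (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");

        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {  // split into a high and a low surrogate
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += len;
    }
    return out;
}
```

For example, the UTF-8 bytes D7 A9 D7 9C D7 95 D7 9D come back as the four UTF-16 units of "שלום", and U+1F600 (F0 9F 98 80) becomes the surrogate pair D83D DE00. Malformed input such as the overlong sequence C0 80 is rejected rather than silently reinterpreted.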

By taking ASCII input, this library encourages the use of ASCII,
which is another obstacle to real-world localization.

>
> Artyom

-- 
Ryou Ezoe
