Subject: Re: [boost] [locale] Review results for Boost.Locale library
From: Ryou Ezoe (boostcpp_at_[hidden])
Date: 2011-04-26 08:41:23


On Tue, Apr 26, 2011 at 9:27 PM, Artyom <artyomtnk_at_[hidden]> wrote:
>> From: Mathias Gaunard <mathias.gaunard_at_[hidden]>
>>
>> On 26/04/2011 11:17, Sebastian Redl wrote:
>>
>> > GCC has options to control both the source (-finput-charset) and the
>> > execution character set (-fexec-charset). They both default to UTF-8.
>> > However, MSVC is more complicated. It will try to auto-detect the source
>> > character set, but while it can detect UTF-16, it will treat everything
>> > else as the system narrow encoding (usually a Windows-xxxx codepage)
>> > unless the file starts with a UTF-8-encoded BOM. The worse problem is
>> > that, except for a very new, poorly documented, and probably
>> > experimental pragma, there is *no way* to change MSVC's execution
>> > character set away from the system narrow encoding.
>>
>> A long time ago, I asked Vladimir Prus to help me add an option to
>> Boost.Build that would allow to automatically prepend the BOM
>> to source files when using MSVC, but unfortunately he was never able to help
>> me do this.
>>
>
>
> The problem is that even if the source is UTF-8 with a BOM, "שלום" would
> be encoded according to the locale's 8-bit codepage, like 1255 or 936,
> and not as a UTF-8 string (codepage 65001).
>
> It is rather stupid, but this is how MSVC works or understands
> the place of UTF-8 in this world.

It's not stupid.
It's because the ANSI versions of the Win32 API functions expect these encodings.

To me, having ordinary string literals use the source file's encoding
is a stupid idea.
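
For anyone who wants to see which execution character set their compiler
actually picked, here is a minimal sketch that dumps the bytes of a narrow
literal and compares them with the same word spelled as explicit UTF-8
bytes. The names to_hex, shalom_utf8 and literal_bytes are made up for
illustration and are not part of Boost.Locale or any other library. The
Windows-1255 bytes in the comment are what I would expect under that
codepage, not something I have measured with MSVC.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: render each byte of a narrow string as two
// uppercase hex digits, separated by single spaces.
std::string to_hex(const char* s) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0, n = std::strlen(s); i < n; ++i) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (!out.empty()) out += ' ';
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}

// The word from the thread written as explicit UTF-8 bytes. Hex escapes
// do not depend on the source or execution character set.
const char* const shalom_utf8 = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

// The same word as an ordinary literal. Its bytes depend on the compiler's
// execution character set: a UTF-8 execution charset should give
// "D7 A9 D7 9C D7 95 D7 9D", while Windows-1255 would be expected to give
// "F9 EC E5 ED" (assumption, not measured).
std::string literal_bytes() {
    return to_hex("שלום");
}
```

Printing literal_bytes() next to to_hex(shalom_utf8) shows at a glance
whether the compiler re-encoded the literal. If the two differ, writing the
bytes as escapes, as in shalom_utf8, is one way to get UTF-8 regardless of
the execution character set.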

>
> Unicode and Visual Studio is just broken...
>
> Artyom

-- 
Ryou Ezoe