Subject: Re: [boost] [locale] Review results for Boost.Locale library
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-04-26 08:27:06


> From: Mathias Gaunard <mathias.gaunard_at_[hidden]>
>
> On 26/04/2011 11:17, Sebastian Redl wrote:
>
> > GCC has options to control both the source (-finput-charset) and the
> > execution character set (-fexec-charset). They both default to UTF-8.
> > However, MSVC is more complicated. It will try to auto-detect the source
> > character set, but while it can detect UTF-16, it will treat everything
> > else as the system narrow encoding (usually a Windows-xxxx codepage)
> > unless the file starts with a UTF-8-encoded BOM. The worse problem is
> > that, except for a very new, poorly documented, and probably
> > experimental pragma, there is *no way* to change MSVC's execution
> > character set away from the system narrow encoding.
>
> A long time ago, I asked Vladimir Prus to help me add an option to
> Boost.Build that would allow to automatically prepend the BOM
> to source files when using MSVC, but unfortunately he was never able to help
> me do this.
>

The problem is that even if the source is UTF-8 with a BOM, "שלום" would be
encoded according to the locale's 8-bit codepage, like 1255 or 936, and not as
a UTF-8 string (codepage 65001).

It is rather stupid, but this is how MSVC works, or how it understands the
place of UTF-8 in this world.

Unicode and Visual Studio are just broken...

Artyom
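P.S. To make the difference concrete, here is a minimal sketch of what the two encodings of "שלום" look like at the byte level. It is only an illustration, not code from Boost.Locale: the helper names (shalom_utf8, shalom_cp1255, to_hex, literal_is_utf8) are made up for this example. The bytes are written with hex escapes, which do not go through the execution character set, so they should come out the same under GCC and MSVC. The codepage 1255 values assume the standard Windows-1255 layout, where the Hebrew letters alef to tav occupy 0xE0 to 0xFA.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "שלום" spelled out as UTF-8 bytes (U+05E9 U+05DC U+05D5 U+05DD).
// Hex escapes bypass the compiler's execution character set.
inline std::string shalom_utf8() {
    return "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";
}

// The same word as it would be stored under Windows codepage 1255
// (assumption: standard Windows-1255 mapping, one byte per letter).
inline std::string shalom_cp1255() {
    return "\xF9\xEC\xE5\xED";
}

// Render a byte string as space-separated uppercase hex, for inspection.
inline std::string to_hex(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}

// Hypothetical check: pass in a narrow literal that was typed as שלום in the
// source file; returns true only if the compiler stored it as UTF-8.
inline bool literal_is_utf8(const std::string& literal) {
    return literal == shalom_utf8();
}

// Sanity check on the two reference encodings: 2 bytes per letter in UTF-8,
// 1 byte per letter in codepage 1255.
inline void self_check() {
    assert(shalom_utf8().size() == 8);
    assert(shalom_cp1255().size() == 4);
}
```

Calling literal_is_utf8("שלום") from a UTF-8 source file is then a quick way to see which encoding a given compiler and locale actually produced, and to_hex() on the literal shows the raw bytes. Spelling the literal with escapes, as in shalom_utf8(), is ugly but sidesteps the execution character set entirely.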

