
Subject: Re: [boost] [locale] Review results for Boost.Locale library
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-05-01 14:44:19


> From: Mathias Gaunard <mathias.gaunard_at_[hidden]>
> On 30/04/2011 18:45, Vladimir Prus wrote:
> >> On 26/04/2011 11:17, Sebastian Redl wrote:
> >>
> >>> GCC has options to control both the source (-finput-charset) and the
> >>> execution character set (-fexec-charset). They both default to UTF-8.
> >>> However, MSVC is more complicated. It will try to auto-detect the source
> >>> character set, but while it can detect UTF-16, it will treat everything
> >>> else as the system narrow encoding (usually a Windows-xxxx codepage)
> >>> unless the file starts with a UTF-8-encoded BOM. The worse problem is
> >>> that, except for a very new, poorly documented, and probably
> >>> experimental pragma, there is *no way* to change MSVC's execution
> >>> character set away from the system narrow encoding.
> >>
> >> A long time ago, I asked Vladimir Prus to help me add an option to
> >> Boost.Build that would allow to automatically prepend the BOM to source
> >> files when using MSVC, but unfortunately he was never able to help me do
> >> this.
> >
> > Well, if you have a command that can prepend BOM to a file, you can
> > easily modify 'actions compile-c-c++' in msvc.jam to run that command.
>
> It would be nice if I could only do this when the source files have been
> tagged as utf-8 or something like that.
>
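Vladimir's suggestion needs a command that prepends a BOM to a file. A minimal C++ sketch of such a command follows. It assumes the file is already UTF-8 encoded (it does not check that), and the names `has_utf8_bom`, `with_utf8_bom` and `add_bom_to_file` are hypothetical, not part of Boost.Build or msvc.jam.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// UTF-8 encoding of U+FEFF, the byte order mark MSVC uses to recognize UTF-8 source.
const std::string kUtf8Bom = "\xEF\xBB\xBF";

// True if the buffer already starts with a UTF-8 BOM.
bool has_utf8_bom(const std::string& data) {
    return data.size() >= kUtf8Bom.size() &&
           data.compare(0, kUtf8Bom.size(), kUtf8Bom) == 0;
}

// Returns the data with a UTF-8 BOM in front, adding one only if it is missing.
std::string with_utf8_bom(const std::string& data) {
    return has_utf8_bom(data) ? data : kUtf8Bom + data;
}

// Hypothetical helper a build action could call: rewrites the file in place.
// Returns false if the file cannot be read or written.
bool add_bom_to_file(const std::string& path) {
    std::string data;
    {
        std::ifstream in(path, std::ios::binary);
        if (!in) return false;
        data.assign(std::istreambuf_iterator<char>(in),
                    std::istreambuf_iterator<char>());
    }
    if (has_utf8_bom(data)) return true;  // already tagged, leave untouched
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out << with_utf8_bom(data);
    out.close();  // close() sets failbit if flushing or closing fails
    return !out.fail();
}
```

Because the helper is idempotent, running it on every compile would be harmless; restricting it to files explicitly tagged as UTF-8, as Mathias asks, would still have to be done on the Boost.Build side.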

A few points:

1. -fexec-charset can be simulated in MSVC with

   #pragma setlocale(".XXXX"), where XXXX is the code page.

   However, 65001 (UTF-8) can't be used!

2. The equivalent of -finput-charset can be set either with the same
   setlocale pragma, which again can't be 65001 (UTF-8), or the source
   is read as UTF-8 if you add a BOM.

   In practice, the BOM is needed for any file that contains non-ASCII
   characters.

But the bigger question is what exactly you want to do with the BOM,
and how would it help you write "cross-platform" software?

If you write for MSVC only, add the BOM in the first place. If you work
on cross-platform/cross-compiler software, MSVC's incompatibility with
the rest of the world actually makes it impossible to use UTF-8
portably, because the only real Unicode strings with MSVC are L""
literals, and those are encoded as UTF-16, while the non-Windows world
uses UTF-32 as the wide character encoding.
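The UTF-16 versus UTF-32 difference is easy to observe: a character outside the Basic Multilingual Plane occupies two `wchar_t` units in an MSVC `L""` literal (a surrogate pair) but one unit with GCC on Linux. A minimal sketch, assuming a C++11 compiler; the names `wide_units_for_u1F600` and `kWcharBytes` are hypothetical.

```cpp
#include <cstddef>
#include <cwchar>

// Code units a wide literal needs for U+1F600, a character outside the
// Basic Multilingual Plane. Expected: 2 where wchar_t holds UTF-16 code
// units (MSVC on Windows), 1 where it holds UTF-32 (e.g. GCC on Linux).
std::size_t wide_units_for_u1F600() {
    return std::wcslen(L"\U0001F600");
}

// Width of wchar_t in bytes: typically 2 on Windows and 4 elsewhere.
constexpr std::size_t kWcharBytes = sizeof(wchar_t);
```

Any code that indexes or measures `L""` strings "per character" therefore behaves differently on the two platforms.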

So basically, until the Microsoft Visual Studio team takes UTF-8
seriously and either supports the 65001 code page as expected or
provides GCC-like options for the input and execution encodings, I
don't see how this BOM would be useful.
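To check which execution character set a compiler actually used (for example, GCC with and without -fexec-charset, or MSVC with the setlocale pragma from point 1), one can dump the bytes stored for a narrow literal. A minimal sketch; `kEAcute`, `hex_bytes` and `exec_charset_is_utf8` are hypothetical names, and the expected byte values assume UTF-8 and Windows-1252 respectively.

```cpp
#include <cstdio>
#include <string>

// Bytes the compiler stored for U+00E9 (e with acute accent) in a narrow literal.
// With a UTF-8 execution charset (GCC's default) this is "c3 a9"; with
// Windows-1252 it would be the single byte "e9".
const std::string kEAcute = "\u00e9";

// Formats bytes as space-separated lowercase hex, e.g. "\xC3\xA9" -> "c3 a9".
std::string hex_bytes(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        char buf[4];  // optional space, two hex digits, terminating NUL
        std::snprintf(buf, sizeof buf, "%s%02x", out.empty() ? "" : " ",
                      static_cast<unsigned>(c));
        out += buf;
    }
    return out;
}

// True if narrow literals in this translation unit are UTF-8 encoded.
bool exec_charset_is_utf8() {
    return hex_bytes(kEAcute) == "c3 a9";
}
```

Printing `hex_bytes(kEAcute)` from a small test program under each compiler configuration shows directly whether narrow strings come out as UTF-8.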

Does anybody know how to open a bug report or feature request for
MSVC, so that MSVC11 / 201[^0] would support it properly?

My $0.02

Artyom

