Subject: Re: [boost] [locale] Review results for Boost.Locale library
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-05-02 02:47:43
> From: Mathias Gaunard <mathias.gaunard_at_[hidden]>
> On 01/05/2011 20:44, Artyom wrote:
> > But the bigger question is what exactly do you want to do with BOM
> > and how it would help you to make the "cross-platform" software?
> The goal is to allow all compilers to recognize that the source is encoded in
> This is what you need to write cross-platform source that contains non-ASCII
It is not enough.
You can't do it properly in a cross-platform way, because you
currently can't get a UTF-8, UTF-16 or UTF-32 string
literal portably until all compilers
support the C++0x u8/u/U literals,
and at this point NONE of the existing popular compilers
supports them (I checked MSVC, GCC, Intel, SunCC).
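
For reference, a minimal sketch of what those C++0x literals look like
once a compiler accepts them. The helper name code_units is made up for
this illustration, and the characters are written as universal character
names so the example does not depend on the source file encoding:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// not counting the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// U+05E9 (Hebrew letter shin) lies inside the BMP.
static_assert(code_units(u8"\u05E9") == 2, "two UTF-8 bytes");
static_assert(code_units(u"\u05E9") == 1, "one UTF-16 unit");
static_assert(code_units(U"\u05E9") == 1, "one UTF-32 unit");

// U+1F600 lies outside the BMP, so UTF-16 needs a surrogate pair.
static_assert(code_units(u8"\U0001F600") == 4, "four UTF-8 bytes");
static_assert(code_units(u"\U0001F600") == 2, "surrogate pair");
static_assert(code_units(U"\U0001F600") == 1, "one UTF-32 unit");
```

The same helper applied to L"\U0001F600" would be expected to give
different counts depending on the width of wchar_t on the platform.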
> > the only
> > real Unicode strings with MSVC would be L"" and they
> > would actually be encoded as UTF-16,
> > while the whole non-Windows world uses UTF-32 as the wide character
> > encoding.
> How is that a problem at all?
> And using narrow string literals with UTF-8 content
> masquerading as ANSI is a hack, sorry.
> That's not the C++-endorsed solution.
First of all, the "ANSI" code page exists only on Windows
and has nothing to do with cross-platform software.
The C++ standard does not know what an "ANSI" encoding is.
> > So basically I can say that until the Microsoft Visual Studio
> > team takes UTF-8 seriously and either supports the 65001
> > code page as expected or provides GCC-like options
> > for input and exec encodings, I don't see how
> > this BOM would be useful.
> I don't really care about what the execution character set is.
> I definitely do not want to change it, it should be the user locale.
No, you never want it to be the user's locale, because that makes
compilation locale-dependent! Take the literal "שלום-سلام-Мир"
in source.cpp, saved with a UTF-8 BOM:
In Israel it would be "שלום-???-????" in CP1255
In Egypt it would be "????-سلام-???" in CP1256
In Russia it would be "????-????-Мир" in CP1251
In France it would be "????-???-???" in CP1252
So no, you always want the execution character set
to be well defined, unless all your sources are
written in US-ASCII, which is a subset of
all these character sets.
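
A rough sketch of the effect, not what any compiler does internally: if
the execution character set follows the user's code page, every
character the code page cannot represent is lost. The helper names
(to_code_page, to_cp1251, to_cp1255, sample) are made up for this
illustration, and only the contiguous letter blocks of each code page
are modelled (U+0410..U+044F at 0xC0 in CP1251, U+05D0..U+05EA at 0xE0
in CP1255); everything else outside ASCII is assumed to become '?'.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: converts UTF-32 text to a single-byte code page
// that covers ASCII plus one contiguous block [first, last] stored
// starting at byte `base`. Anything else is replaced by '?'.
std::string to_code_page(const std::u32string& text,
                         char32_t first, char32_t last,
                         unsigned char base) {
    std::string out;
    for (char32_t c : text) {
        if (c < 0x80)
            out += static_cast<char>(c);                    // plain ASCII
        else if (c >= first && c <= last)
            out += static_cast<char>(base + (c - first));   // in the block
        else
            out += '?';                                     // not representable
    }
    return out;
}

// Cyrillic letters only (simplified model of Windows-1251).
std::string to_cp1251(const std::u32string& t) {
    return to_code_page(t, 0x0410, 0x044F, 0xC0);
}

// Hebrew letters only (simplified model of Windows-1255).
std::string to_cp1255(const std::u32string& t) {
    return to_code_page(t, 0x05D0, 0x05EA, 0xE0);
}

// The Hebrew-Arabic-Russian sample, spelled with universal character
// names so this file itself stays pure ASCII.
const std::u32string sample =
    U"\u05E9\u05DC\u05D5\u05DD-\u0633\u0644\u0627\u0645-\u041C\u0438\u0440";
```

With this model, to_cp1251(sample) keeps only the Russian word and
to_cp1255(sample) keeps only the Hebrew one, so the same source line
would yield different bytes in the binary depending on who compiled it.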