Subject: Re: [boost] [locale] Review results for Boost.Locale library
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-05-02 02:47:43
> From: Mathias Gaunard <mathias.gaunard_at_[hidden]>
>
> On 01/05/2011 20:44, Artyom wrote:
>
> > But the bigger question is what exactly do you want to do with BOM
> > and how it would help you to make the "cross-platform" software?
>
> The goal is to allow all compilers to recognize that the source is encoded
> in UTF-8.
> This is what you need to write cross-platform source that contains
> non-ASCII characters.
>
It is not enough.
You can't do it properly in a cross-platform way, because
you currently can't get a UTF-8, UTF-16 or UTF-32 string
literal in portable code until all compilers support
the C++0x u/U/u8 literals, and at this point NONE of
the popular compilers support them
(I checked MSVC, GCC, Intel and SunCC).
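For reference, here is a minimal sketch of what those literals look like on a compiler that does implement them (an assumption, given the state of support described above). The names `shin_utf8_units`, `shin_utf16` and `shin_utf32` are made up for illustration, and the character is written as a universal-character-name so that the source file itself stays pure ASCII:

```cpp
#include <cstddef>
#include <string>

// U+05E9 (Hebrew letter shin) in each of the three Unicode literal forms.
// The encoding of each literal is fixed by its prefix, not by the compiler's
// code page or the user's locale.

// Number of UTF-8 code units, excluding the terminating NUL.
constexpr std::size_t shin_utf8_units = sizeof(u8"\u05E9") - 1;

// UTF-16 and UTF-32 strings holding the same single character.
const std::u16string shin_utf16 = u"\u05E9";
const std::u32string shin_utf32 = U"\u05E9";
```

With such a compiler the character takes two UTF-8 code units, one UTF-16 code unit and one UTF-32 code unit on every platform, which is exactly the guarantee that L"" does not give.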
>
> > the only
> > real Unicode strings with MSVC would be L"", and they
> > would actually be encoded as UTF-16,
> > while the whole non-Windows world uses UTF-32 as its wide character
> > encoding.
>
> How is that a problem at all?
>
> And using narrow string literals with UTF-8 content
> masquerading as ANSI is a hack, sorry.
> That's not the C++-endorsed solution.
>
First of all, ANSI code pages exist only on Windows
and have nothing to do with cross-platform software.
The C++ standard does not know what an "ANSI" encoding is.
>
> > So basically I can say that until the Microsoft Visual Studio
> > team takes UTF-8 seriously and either supports the 65001
> > code page as expected or provides GCC-like options
> > for the input and execution encodings, I don't see how
> > this BOM would be useful.
>
> I don't really care about what the execution character set is.
> I definitely do not want to change it, it should be the user locale.
>
No, you never want it to be the user's locale, because that makes
compilation locale-dependent! Consider:
source.cpp / With UTF-8 BOM
--------------------------------
std::string test="שלום-سلام-Мир";
In Israel it would be "שלום-????-???" in CP1255
In Egypt it would be "????-سلام-???" in CP1256
In Russia it would be "????-????-Мир" in CP1251
In France it would be "????-????-???" in CP1252
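The effect can be imitated with a rough sketch. This is not what the compiler actually does internally: `ScriptRange`, `simulate_narrowing` and `sample` are hypothetical names, and each single-byte code page is approximated by the Unicode block of the one non-ASCII script it covers, which is a deliberate simplification.

```cpp
#include <string>

// Hypothetical stand-in for a single-byte code page: only the Unicode range
// of the one non-ASCII script it can represent (a simplification).
struct ScriptRange { char32_t first, last; };

// Keep ASCII as is, mark representable non-ASCII characters with '#',
// and replace everything else with '?'.
inline std::string simulate_narrowing(const std::u32string& text, ScriptRange cp) {
    std::string out;
    for (char32_t c : text) {
        if (c < 0x80)
            out += static_cast<char>(c);
        else if (c >= cp.first && c <= cp.last)
            out += '#';
        else
            out += '?';
    }
    return out;
}

// The Hebrew-Arabic-Russian sample string, spelled with
// universal-character-names so this file stays pure ASCII.
inline std::u32string sample() {
    return U"\u05E9\u05DC\u05D5\u05DD-\u0633\u0644\u0627\u0645-\u041C\u0438\u0440";
}
```

Under these assumptions, `simulate_narrowing(sample(), ScriptRange{0x0590, 0x05FF})` (Hebrew block) gives "####-????-???", the Arabic block `{0x0600, 0x06FF}` gives "????-####-???", and the Cyrillic block `{0x0400, 0x04FF}` gives "????-????-###": the same source yields three different strings.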
So no: you always want the execution character set
to be well defined, unless all your sources are
written in US-ASCII, which is a subset of
all of these character sets.
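As an illustration of the US-ASCII case, here is a minimal sketch of one possible workaround (not something any compiler or library requires): the UTF-8 bytes are spelled out with hex escapes, so the source contains only ASCII and the resulting bytes do not depend on the code page used at compile time. The function name `shalom_utf8` is made up for this example.

```cpp
#include <string>

// The Hebrew word U+05E9 U+05DC U+05D5 U+05DD written as raw UTF-8 bytes.
// Each backslash ends the previous hex escape, so every escape is one byte.
inline std::string shalom_utf8() {
    return "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";
}
```

The string always holds the same 8 bytes, at the price of an unreadable source file.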
Artyom Beilis.