Subject: Re: [boost] [locale] Review results for Boost.Locale library
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-05-02 02:47:43


> From: Mathias Gaunard <mathias.gaunard_at_[hidden]>
>
> On 01/05/2011 20:44, Artyom wrote:
>
> > But the bigger question is what exactly do you want to do with BOM
> > and how it would help you to make the "cross-platform" software?
>
> The goal is to allow all compilers to recognize that the source is
> encoded in UTF-8.
> This is what you need to write cross-platform source that contains
> non-ASCII characters.
>

It is not enough. You can't do this properly in a cross-platform way,
because you currently can't get a UTF-8, UTF-16 or UTF-32 string
literal reliably in cross-platform code until all compilers support
the C++0x u/U/u8 literals, and at this point NONE of the existing
popular compilers support them (checked MSVC, GCC, Intel, SunCC).

>
> > the only
> > real Unicode strings with MSVC would be L"" and they
> > are actually would be encoded with UTF-16 encoding
> > while all non-Windows world uses UTF-32 as wide character
> > encodings.
>
> How is that a problem at all?
>
> And using narrow string literals with UTF-8 content
> masquerading as ANSI is a hack, sorry.
> That's not the C++-endorsed solution.
>

First of all, the ANSI code page exists only on Windows and has nothing
to do with cross-platform software. The C++ standard does not know what
an "ANSI" encoding is.

>
> > So basically I can say that until Microsoft Visual Studio
> > team would take UTF-8 seriously and either support 65001
> > codepage as expected or provide GCC's like options
> > for input and exec encodings I don't see how
> > this BOM would be useful.
>
> I don't really care about what the execution character set is.
> I definitely do not want to change it, it should be the user locale.
>

No, you never want it to be the user's locale, because that makes
compilation locale-dependent!

Because:

source.cpp / With UTF-8 BOM
--------------------------------
std::string test="שלום-سلام-Мир";

In Israel it would be "שלום-????-???" in CP1255
In Egypt it would be "????-سلام-???" in CP1256
In Russia it would be "????-????-Мир" in CP1251
In France it would be "????-????-???" in CP1252

So no, you always want the execution character set to be well
defined, unless all your sources are written in US-ASCII, which
is a subset of all of these character sets.

Artyom Beilis.
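
P.S. To make the locale dependence above concrete, here is a rough sketch of what happens when a UTF-8 source literal is narrowed to a single-byte execution character set. It is only a model, not what any compiler does internally: the `CodePage` struct, the four `cpNNNN` constants and `narrow_to_code_page` are hypothetical names, and each code page is approximated as "ASCII plus one contiguous Unicode range", whereas real code pages are table-driven. The result is returned as UTF-8 with '?' in place of every character the code page cannot hold.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified model of a single-byte code page:
// ASCII plus one contiguous range of Unicode code points.
struct CodePage {
    const char* name;
    char32_t first;  // first non-ASCII code point treated as representable
    char32_t last;   // last non-ASCII code point treated as representable
};

// Approximate ranges, chosen for illustration only.
const CodePage cp1255{"CP1255", 0x05D0, 0x05EA};  // Hebrew letters
const CodePage cp1256{"CP1256", 0x0621, 0x064A};  // Arabic letters
const CodePage cp1251{"CP1251", 0x0410, 0x044F};  // basic Cyrillic letters
const CodePage cp1252{"CP1252", 0x00A0, 0x00FF};  // Latin-1 supplement

// Walk a well-formed UTF-8 string (no validation is done) and keep each
// character only if the modelled code page can represent it; otherwise
// substitute a single '?', as a lossy narrowing conversion would.
std::string narrow_to_code_page(const std::string& utf8, const CodePage& page) {
    assert(page.first <= page.last);
    std::string out;
    for (std::size_t i = 0; i < utf8.size();) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        // Sequence length from the lead byte: 1 to 4 bytes.
        std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        // Payload bits of the lead byte, then 6 bits per continuation byte.
        char32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
        for (std::size_t k = 1; k < len && i + k < utf8.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        bool representable = cp < 0x80 || (cp >= page.first && cp <= page.last);
        out += representable ? utf8.substr(i, len) : std::string("?");
        i += len;
    }
    return out;
}
```

Under these assumptions, calling `narrow_to_code_page(test, cp1255)` on the string from source.cpp keeps only the Hebrew word, `cp1256` keeps only the Arabic one, `cp1251` only the Russian one, and `cp1252` turns every non-ASCII letter into '?'. The same source gives four different programs.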

