Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-01-14 10:27:49


> -1
>
> I'm opposed to this strategy simply because it differs from the way
> existing libraries treat narrow strings. Not least the STL. If you open
> an fstream with a narrow filename, for instance, this isn't treated as a
> UTF-8 string. It's treated as being in the local codepage.

First of all, neither in C++03 nor in C++0x can you open a file stream
with a wide file name. MSVC provides a non-standard extension, but it does
not exist in other compilers like GCC/MinGW.

So using standard C++ you can't open a file called
"שלום-سلام-pease-Мир.txt" under Microsoft Windows.

You can use an OS-level API like _wfopen to do this job with a wide
string. But you can't do this in C++. Period.

The idea is the following:

1. Provide a replacement for system libraries that actually use text and
   treat it as text in some encoding. For the STL and the standard C
   library that would be the filesystem API, so you need to provide
   something like boost::filesystem::fstream.
2. Make all Boost libraries use the Wide API only and never call the
   ANSI API.
3. Treat narrow strings as UTF-8 and convert them to wide strings prior
   to system calls.

> While this behaviour isn't great, it is standard.

If the standard is bad and leads to unportable, platform-incompatible
code, it should not be used!

You can always provide a fallback like boost::utf8_to_locale_encoding if
you have to use the ANSI API. But generally you should just use something
like boost::utf8_to_utf16 and always call the Wide API.

You must not use the ANSI API under Windows.

Artyom
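To illustrate point 3, here is a minimal sketch of "treat narrow strings as UTF-8, convert, then call the Wide API". Both function names are hypothetical: utf8_to_utf16 is hand-rolled here and is not an existing Boost function, and open_utf8 is only an example wrapper. The sketch assumes a C++11 compiler (for char16_t), which goes beyond the C++03 discussed above, and it assumes that wchar_t on Windows holds UTF-16 code units.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a Boost API): decode a UTF-8 narrow string
// into UTF-16 code units. Throws std::invalid_argument on malformed input.
std::u16string utf8_to_utf16(const std::string& in) {
    // Smallest code point allowed for each sequence length (rejects overlongs).
    static const std::uint32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Fold in the continuation bytes, 6 payload bits each.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}

// Hypothetical wrapper: the caller always passes UTF-8. On Windows the
// path is converted and the wide call (_wfopen) is used; elsewhere the
// narrow string is passed through unchanged.
std::FILE* open_utf8(const std::string& path, const std::string& mode) {
#ifdef _WIN32
    const std::u16string p = utf8_to_utf16(path);
    const std::u16string m = utf8_to_utf16(mode);
    const std::wstring wp(p.begin(), p.end());
    const std::wstring wm(m.begin(), m.end());
    return _wfopen(wp.c_str(), wm.c_str());
#else
    return std::fopen(path.c_str(), mode.c_str());
#endif
}
```

With a wrapper like this, calling code would pass the UTF-8 bytes of a name such as "Мир.txt" to open_utf8 on every platform, and the ANSI API would never be reached on Windows.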

