Subject: Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]
From: Beman Dawes (bdawes_at_[hidden])
Date: 2011-01-20 17:15:17
On Thu, Jan 20, 2011 at 1:22 PM, Sergey Cheban <s.cheban_at_[hidden]> wrote:
> The problems with MSVC and multilingual filenames are not boost-related.
> Even the following code doesn't work correctly:
> #include <stdio.h>
> int main( int argc, char *argv[] )
> {
>     printf("%s", argv[1]);
>     return 0;
> }
> >1.exe asdfÑÑÐ²Ð°
You lost me. That example has nothing to do with filenames.
> As you can see, the Cyrillic characters are broken (this is an ANSI vs OEM
> issue and is not related to Unicode at all).
> Please note that the cygwin compiler/libc has no such problems because it
> uses utf-8 (by default, at least). The fopen() uses the utf-8 for filenames,
> So, we may choose one of the following:
> 1. Wait until MS fixes the problem on their side. For now, Windows
> users may use short filenames (i.e. GetShortPathName() ) for
> multilingual filenames.
> 2. Provide a char * interface that will allow Windows developers to
> work with multilingual filenames.
> 3. Provide a WCHAR * interface specifically for Windows developers and allow
> them to write non-portable code. Leave the char * interface unusable for
> windows/msvc and wait until MS fixes it on their side.
> 4. Create an almost-portable wchar_t * interface.
> 5. Create our own type (boost::native_t or boost::utf8_t) and conversion
> routines for it. Please note that independent libraries will NEVER use
> foreign non-standard types.
> I think only the 2nd and 3rd options are realistic.
Why not just use Boost.Filesystem V3 for dealing with files and filenames?
You can work with char strings in the encoding of your choice, including
utf-8 encoding. You can use wchar_t strings in utf-16 encoding. If your
compiler supports C++0x char16_t and char32_t, you will be able to also use
strings based on those as C++0x support matures. Class
boost::filesystem::path provides a single non-template class that works fine
with all of those types and encodings. Your code can be written to be
reasonably portable too, particularly if all you are concerned with is
either Windows systems or POSIX-like systems that use utf-8 for filenames.
If you want wider portability, you would have to avoid narrow strings so
that on POSIX-like systems the wide strings could be converted to whatever
narrow encoding the system uses.
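
To make the "wide strings converted to a narrow encoding" step concrete, here
is a minimal sketch that assumes the target narrow encoding is utf-8 and that
the wide string is a C++0x std::u32string. The helpers append_utf8 and to_utf8
are hypothetical names made up for this illustration; they are not part of
Boost.Filesystem, which performs its conversions through a codecvt facet
rather than hand-written code like this.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a Boost API): append one Unicode code point
// to a utf-8 encoded std::string.
void append_utf8(std::string& out, char32_t cp)
{
    // Surrogates and values above U+10FFFF are not valid code points.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx followed by 3 x 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a whole UTF-32 string to utf-8.
std::string to_utf8(const std::u32string& wide)
{
    std::string narrow;
    for (char32_t cp : wide)
        append_utf8(narrow, cp);
    return narrow;
}
```

Calling to_utf8(U"asdf") returns the same four ASCII bytes, while each
Cyrillic letter comes out as two bytes. A system whose narrow filename
encoding is something other than utf-8 would need a different conversion at
this point, which is why keeping the program's own strings wide is the more
portable choice.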