Subject: Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]
From: Sergey Cheban (s.cheban_at_[hidden])
Date: 2011-01-20 13:22:15


19.01.2011 18:34, Alexander Lamaison wrote:

> Even if I bought the UTF-8ed-Boost idea, what would we do about the STL
> implementation on Windows which expects local-codepage narrow strings? Are
> we hoping MS etc. change these to match? Because otherwise we'll be
> converting between narrow encodings for the rest of eternity.
The problems with MSVC and multilingual filenames are not Boost-related.
Even the following code doesn't work correctly:

#include <stdio.h>
int main( int argc, char *argv[])
{
     if (argc < 2)
         return 1;   /* no argument given */
     printf("%s", argv[1]);
     return 0;
}

>1.exe asdfфыва
asdfЇ√тр

As you can see, the Cyrillic characters are broken: argv[] arrives in the
ANSI code page, but the console renders output in the OEM code page. This
is an ANSI vs OEM issue and is not related to Unicode at all.
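
The mismatch can be reproduced without Windows. The sketch below assumes
a Russian-locale machine, where the ANSI code page is Windows-1251 and the
OEM code page is CP866 (an assumption that is consistent with the output
shown, but not stated in it). The function names are made up for this
illustration, and only ASCII plus the Cyrillic ranges are decoded; the
CP866 box-drawing bytes are left out.

```c
/* Decode one byte of Windows-1251 (assumed ANSI code page).
   Only ASCII and the letters at 0xC0..0xFF are covered by this sketch. */
unsigned cp1251_to_unicode(unsigned char b)
{
    if (b < 0x80) return b;
    if (b >= 0xC0) return 0x0410u + (b - 0xC0u);   /* U+0410..U+044F */
    return 0xFFFDu;                                /* not covered here */
}

/* Decode one byte of CP866 (assumed OEM code page).
   Box-drawing bytes 0xB0..0xDF are not covered by this sketch. */
unsigned cp866_to_unicode(unsigned char b)
{
    /* Code points for bytes 0xF0..0xFF. */
    static const unsigned tail[16] = {
        0x0401, 0x0451, 0x0404, 0x0454, 0x0407, 0x0457, 0x040E, 0x045E,
        0x00B0, 0x2219, 0x00B7, 0x221A, 0x2116, 0x00A4, 0x25A0, 0x00A0
    };
    if (b < 0x80) return b;
    if (b <= 0xAF) return 0x0410u + (b - 0x80u);   /* U+0410..U+043F */
    if (b >= 0xF0) return tail[b - 0xF0u];
    if (b >= 0xE0) return 0x0440u + (b - 0xE0u);   /* U+0440..U+044F */
    return 0xFFFDu;                                /* not covered here */
}
```

In Windows-1251 the argument "фыва" is the bytes F4 FB E2 E0. Read as
Windows-1251 they decode to U+0444 U+044B U+0432 U+0430 (фыва); read as
CP866 the same bytes decode to U+0407 U+221A U+0442 U+0440, i.e. "Ї√тр",
which is exactly the broken output above.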

Please note that the Cygwin compiler/libc has no such problem because
it uses UTF-8 (by default, at least). Its fopen() uses UTF-8 for
filenames, too.

So, we may choose one of the following:

1. Wait until MS fixes the problem on their side. For now, Windows
users may use short filenames (i.e. GetShortPathName()) for
multilingual filenames.

2. Provide a char * interface that will allow Windows developers to
work with multilingual filenames.

3. Provide a WCHAR * interface specifically for Windows developers and
allow them to write non-portable code. Leave the char * interface
unusable for Windows/MSVC and wait until MS fixes it on their side.

4. Create an almost-portable wchar_t * interface.

5. Create our own type (boost::native_t or boost::utf8_t) and conversion
routines for it. Please note that independent libraries will NEVER use
foreign non-standard types.

I think only the 2nd and 3rd options are realistic.
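
A minimal sketch of what option 2 might look like, assuming the char *
interface is defined to take UTF-8 (the list above does not fix an
encoding, so that is an assumption). fopen_utf8 is a hypothetical name,
not an existing Boost or CRT function. On Windows it converts to UTF-16
with MultiByteToWideChar() and calls _wfopen(); elsewhere it simply
forwards to fopen().

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: open a file whose name is given as UTF-8. */
FILE *fopen_utf8(const char *path, const char *mode)
{
#ifdef _WIN32
    FILE *f = NULL;
    wchar_t *wpath, *wmode;
    /* With a NULL output buffer the call returns the required size
       in wchar_t units, including the terminating zero. */
    int plen = MultiByteToWideChar(CP_UTF8, 0, path, -1, NULL, 0);
    int mlen = MultiByteToWideChar(CP_UTF8, 0, mode, -1, NULL, 0);
    if (plen <= 0 || mlen <= 0)
        return NULL;                     /* invalid input */
    wpath = malloc((size_t)plen * sizeof *wpath);
    wmode = malloc((size_t)mlen * sizeof *wmode);
    if (wpath && wmode &&
        MultiByteToWideChar(CP_UTF8, 0, path, -1, wpath, plen) > 0 &&
        MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, mlen) > 0)
        f = _wfopen(wpath, wmode);       /* wide-character CRT open */
    free(wpath);
    free(wmode);
    return f;
#else
    /* Assumes the platform's fopen() passes the bytes through unchanged. */
    return fopen(path, mode);
#endif
}
```

The caller keeps writing portable char * code, and the conversion to the
wide API stays hidden in one place.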

-- 
Best regards,
Sergey Cheban
