Subject: Re: [boost] "Best so far" for C level i/o of Unicode text with Windows console
From: Emil Dotchevski (emildotchevski_at_[hidden])
Date: 2011-11-03 19:43:21

On Thu, Nov 3, 2011 at 4:14 PM, Stephan T. Lavavej
<stl_at_[hidden]> wrote:
> [Alf P. Steinbach]
>> I found that the Visual C++ implementation of the C library i/o
>> generally does not support console input of international characters. It
>> can deal with narrow character input from the current codepage, if that
>> codepage is not UTF-8.
> Changing the console's codepage isn't the right magic. See
> With _O_U16TEXT, VC8+ can write Unicode to the console perfectly. However, I believe that input was broken up to and including VC10, and that it's been fixed in VC11.
> (I don't know about UTF-8. For reasons that are still mysterious to me, UTF-8 typically isn't handled as well as people expect it to be. Windows really really likes UTF-16 for Unicode. In practice, this is not a big deal, because UTF-8 and UTF-16 are losslessly convertible.)

I've found that for a multi-platform library, the most
straightforward strategy for handling Unicode is to use UTF-8
everywhere and, when running on Windows, convert it to UTF-16 just
before calling a SomethingSomethingW function.
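
To make the "convert at the boundary" idea concrete, here is a minimal
sketch of such a conversion. It is only an illustration, not code from
any particular library: the name utf8_to_utf16 is hypothetical, and on
Windows one would more likely call MultiByteToWideChar with CP_UTF8
instead of decoding by hand. The portable version below just shows what
happens to the bytes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (the name is an assumption for this sketch):
// decodes a UTF-8 byte string and re-encodes it as UTF-16 code units.
// Throws std::invalid_argument on malformed input.
std::u16string utf8_to_utf16(const std::string& in) {
    // Smallest code point that legitimately needs a sequence of each length;
    // used to reject overlong encodings.
    static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;

        // The lead byte determines the sequence length and the initial bits.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += len;

        // Reject overlong forms, surrogate code points, and out-of-range values.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The rest of the library keeps passing std::string around; only the thin
Windows-specific layer calls something like this right before handing
the text to a W function (wchar_t is 16 bits wide there, so the code
units map over directly).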

Why not the other way around (use UTF-16 and convert to UTF-8 before
calling POSIX functions)? Because:

- most portable Unicode-aware libraries use UTF-8,
- many Unicode-unaware libraries just work with UTF-8,
- even on Windows, last time I checked, MinGW still doesn't support
std::wstring, which makes it difficult to manage UTF-16 strings
(assuming portability is important).

Emil Dotchevski
Reverge Studios, Inc.
