Subject: Re: [boost] "Best so far" for C level i/o of Unicode text with Windows console
From: Emil Dotchevski (emildotchevski_at_[hidden])
Date: 2011-11-03 19:43:21


On Thu, Nov 3, 2011 at 4:14 PM, Stephan T. Lavavej
<stl_at_[hidden]> wrote:
> [Alf P. Steinbach]
>> I found that the Visual C++ implementation of the C library i/o
>> generally does not support console input of international characters. It
>> can deal with narrow character input from the current codepage, if that
>> codepage is not UTF-8.
>
> Changing the console's codepage isn't the right magic. See http://blogs.msdn.com/b/michkap/archive/2008/03/18/8306597.aspx
>
> With _O_U16TEXT, VC8+ can write Unicode to the console perfectly. However, I believe that input was broken up to and including VC10, and that it's been fixed in VC11.
>
> (I don't know about UTF-8. For reasons that are still mysterious to me, UTF-8 typically isn't handled as well as people expect it to be. Windows really really likes UTF-16 for Unicode. In practice, this is not a big deal, because UTF-8 and UTF-16 are losslessly convertible.)
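A minimal sketch of the `_O_U16TEXT` approach from the quoted message. It assumes the Microsoft CRT's `_setmode`, `_fileno` and `_O_U16TEXT` on Windows. The helper name `prepare_wide_stream` and the non-Windows branch (a best-effort `setlocale` plus `fwide`) are hypothetical additions so that the sketch also compiles elsewhere. Because the quoted message says console input was broken up to VC10, the sketch covers output only.

```cpp
#include <cstdio>
#include <cwchar>
#ifdef _WIN32
#  include <fcntl.h>  // _O_U16TEXT
#  include <io.h>     // _setmode
#else
#  include <clocale>  // std::setlocale
#endif

// Hypothetical helper: prepare a C stream for wide-character output.
// Returns true if the stream is ready for fputws/wprintf-style output.
bool prepare_wide_stream(std::FILE* stream) {
#ifdef _WIN32
    // Put the stream into UTF-16 text mode; _setmode returns -1 on failure.
    return _setmode(_fileno(stream), _O_U16TEXT) != -1;
#else
    // Assumed non-Windows fallback: convert wide characters using the
    // user's locale (stays "C" if that fails) and request wide orientation.
    std::setlocale(LC_ALL, "");
    return std::fwide(stream, 1) > 0;
#endif
}

// Usage: write a non-ASCII string through the wide-character C API.
void greet() {
    if (prepare_wide_stream(stdout)) {
        std::fputws(L"Gr\u00FC\u00DFe, \u03BA\u03CC\u03C3\u03BC\u03B5\n", stdout);
    }
}
```

Once a stream is in `_O_U16TEXT` mode it should be written only with wide functions such as `fputws` or `wprintf`; mixing narrow `printf` calls into the same stream is not supported by the CRT.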

I've found that for a multi-platform library, the most
straightforward strategy for handling Unicode is to use UTF-8
throughout and, when running on Windows, convert it to UTF-16 just
before calling a SomethingSomethingW function.
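A minimal sketch of that strategy, assuming C++11: strings stay UTF-8 in `std::string`, and a small decoder produces UTF-16 only at the Windows API boundary. The names `utf8_to_utf16` and `open_file_utf8` are hypothetical, and the CRT's `_wfopen` merely stands in for "a SomethingSomethingW function". On Windows alone, `MultiByteToWideChar` with `CP_UTF8` could replace the hand-written decoder.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Decode UTF-8 and re-encode it as UTF-16 (surrogate pairs above U+FFFF).
// Throws std::runtime_error on malformed, overlong, or out-of-range input.
std::u16string utf8_to_utf16(const std::string& in) {
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (in.size() - i < len) throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80) throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        if (cp < min_for_len[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::runtime_error("overlong, surrogate, or out-of-range code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // split into a high/low surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}

#ifdef _WIN32
#include <stdio.h>  // _wfopen

// Windows: convert at the last moment and call the wide API.
// Relies on wchar_t being a 16-bit UTF-16 code unit on Windows.
std::FILE* open_file_utf8(const std::string& path, const char* mode) {
    const std::u16string p16 = utf8_to_utf16(path);
    const std::wstring wpath(p16.begin(), p16.end());
    const std::string narrow_mode(mode);  // e.g. "rb"; assumed ASCII
    const std::wstring wmode(narrow_mode.begin(), narrow_mode.end());
    return _wfopen(wpath.c_str(), wmode.c_str());
}
#else
// POSIX: file names are byte strings, so the UTF-8 bytes pass through unchanged.
std::FILE* open_file_utf8(const std::string& path, const char* mode) {
    return std::fopen(path.c_str(), mode);
}
#endif
```

Keeping the decoder platform-neutral means the conversion logic can be exercised on any platform, while only the thin `open_file_utf8` wrapper differs per OS.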

Why not the other way around (use UTF-16 and convert to UTF-8 before
calling POSIX functions)? Because:

- most portable Unicode-aware libraries use UTF-8,
- many libraries that are not Unicode-aware just work with UTF-8,
- even on Windows, last time I checked, MinGW still doesn't support
std::wstring, which makes it difficult to manage UTF-16 strings
(assuming portability is important).

Emil Dotchevski
Reverge Studios, Inc.
http://www.revergestudios.com/reblog/index.php?n=ReCode

