Subject: Re: [boost] Silly Boost.Locale default narrow string encoding in Windows
From: Alf P. Steinbach (alf.p.steinbach+usenet_at_[hidden])
Date: 2011-10-29 13:07:56


On 29.10.2011 18:23, Daniel James wrote:
> On Saturday, 29 October 2011, Peter Dimov wrote:
>>
>>
>> The "dir" command has no problem displaying arbitrary file names directly
>> to the console (presumably via WriteConsoleW), but once it has to write to
>> a file, it needs to convert to narrow and no code page other than 65001 can
>> express the above file name.
>>
>
> This is not that relevant to the wider issue, but wide streams will work
> for console output if you first do this:
>
> if (_isatty(_fileno(stdout))) _setmode(_fileno(stdout), _O_U16TEXT);
> if (_isatty(_fileno(stderr))) _setmode(_fileno(stderr), _O_U16TEXT);
>
> i.e. set the output mode to UTF-16 when writing to the console. This only
> works for recent versions of Visual C++. Obviously doesn't fix piped output.

Right.

But the added 'if's introduce another problem: redirection to a file no
longer works.

<example>
P:\test> chcp 65001
Active code page: 65001

P:\test> type jam.cpp
#include <stdio.h>
#include <io.h> // _setmode
#include <fcntl.h> // _O_U8TEXT, _O_U16TEXT

int main()
{
     //_setmode( _fileno( stdout ), _O_U8TEXT );
     if( _isatty( _fileno( stdout ) ) )
     {
         _setmode( _fileno( stdout ), _O_U16TEXT );
     }
     ::wprintf( L"Blåbærsyltetøy! 日本国 кошка!\n" );
}

P:\test> cl jam.cpp
jam.cpp

P:\test> jam
Blåbærsyltetøy! 日本国 кошка!

P:\test> jam >x

P:\test> type x
Bl�b�rsyltet�y!
P:\test> _
</example>

Without the added 'if's, and instead adding a Unicode BOM to the start
of the text, it works fine for redirection:

<example>
P:\test> chcp 65001
Active code page: 65001

P:\test> type jam.cpp
#include <stdio.h>
#include <io.h> // _setmode
#include <fcntl.h> // _O_U16TEXT

int main()
{
     _setmode( _fileno( stdout ), _O_U16TEXT );
     ::wprintf( L"\uFEFF" L"Blåbærsyltetøy! 日本国 кошка!\n" );
}

P:\test> cl jam.cpp
jam.cpp
jam.cpp(8) : warning C4428: universal-character-name encountered in source

P:\test> jam
Blåbærsyltetøy! 日本国 кошка!

P:\test> jam >x

P:\test> type x
Blåbærsyltetøy! 日本国 кошка!

P:\test> chcp 437
Active code page: 437

P:\test> type x
Blåbærsyltetøy! 日本国 кошка!

P:\test> _
</example>
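
To make concrete what lands in the redirected file, here is a minimal sketch of the byte layout, assuming the runtime writes little-endian UTF-16 code units in _O_U16TEXT mode. The helper name utf16le_with_bom is hypothetical and only for illustration; it is not part of the CRT or of the program above, and it is portable so it can be tried anywhere.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (illustration only, not a CRT function): serialize
// UTF-16 code units as little-endian bytes, preceded by the U+FEFF
// byte order mark.
std::vector<std::uint8_t> utf16le_with_bom( const std::u16string& text )
{
    std::vector<std::uint8_t> bytes;
    bytes.reserve( 2*(text.size() + 1) );

    // Append one 16-bit code unit, low byte first.
    auto put = [&bytes]( char16_t unit )
    {
        bytes.push_back( static_cast<std::uint8_t>( unit & 0xFF ) );
        bytes.push_back( static_cast<std::uint8_t>( (unit >> 8) & 0xFF ) );
    };

    put( static_cast<char16_t>( 0xFEFF ) );     // The BOM.
    for( char16_t unit : text ) { put( unit ); }
    return bytes;
}
```

For the single character "å" (U+00E5) this gives the four bytes FF FE E5 00. As far as I know, the leading FF FE is what lets `type` recognize the file as UTF-16 regardless of the active code page.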

UTF-8 is even more forgiving as an external format: there you don't see
the BOM. I see that the BOM has disappeared from the output pasted above
(it's difficult to copy-paste), but in the direct output it shows up as
a rectangle.

Cheers & hth.,

- Alf

