Boost Users :

From: Kirit Sælensminde (kirit.saelensminde_at_[hidden])
Date: 2007-05-28 05:25:17


Filip Konvička wrote:
> Kirit Sælensminde 26.5.2007 5:35:
>> Filip Konvička wrote:
>>
>>> Hi,
>>>
>>> in MSVC 8.0 _UNICODE build, after I send a ptime to wcout, I can no
>>> longer print "international" characters. When I use a temporary
>>> wostringstream for printing the ptime, everything is OK. Minimal repro
>>> see below (the "2" at the end is never printed). With some characters
>>> like the "š" in the example, the output is totally cut off; with others,
>>> like "á", the codepage is changed, so the characters are displayed
>>> incorrectly.
>>>
>>> Any suggestions?
>>>
>> This is a problem in the MSVC libraries. If you print a character above
>> code 255 then the stream crashes and is good for nothing afterwards.
>> Only std::wstringstream doesn't have this problem.
>>
>> I think if you buy the Dinkumware libraries this works. The _cputws
>> function does work as you would expect, but it can't be piped. The
>> behaviour also changes if you run the program from a command shell with
>> Unicode turned on ( cmd.exe /u).
>>
>> I did talk to PJ Plauger about it on clc. This is what he explained:
>>
>> "When you write to a wofstream, the wchar_t sequence you write gets
>> converted to a byte sequence written to the file. How that conversion
>> occurs depends on the codecvt facet you choose. Choose none and you get
>> some default. In the case of VC++ the default is pretty stupid -- the
>> first 256 codes get written as single bytes and all other wide-character
>> codes fail to write. "
>>
>> http://groups.google.com/group/comp.lang.c++/browse_thread/thread/3c203253708befb5/1bc5d68887f1a72d?lnk=st&q=&rnum=107
>>
> How do you explain that the workaround works, then? When I don't send
> any ptime to wcout, all wcin / wcout i/o works as expected, including
> international characters (all I do is call setlocale(LC_ALL, ".OEM"); at
> startup).

I'm afraid I'm not sure which workaround you're referring to. I didn't
notice any call to setlocale in your example. As for why the \x161
displays, I don't know (if that is what you are saying happens). Is it a
character available in the code page of the machine you are using?

The console's Unicode support should be able to display the full range of
characters, restricted only by the font in use. The streams implementation
isn't as widely applicable, though, because it narrows its output to eight
bits. All the experimentation I've done leads me to conclude that _cputws
displays the widest range of characters properly, but only if you start
the command shell with Unicode handling turned on.

I'm not suggesting that this is the only possible explanation for what
you are seeing, but it seemed a reasonable possibility given your
description.
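
To illustrate why a single failed write could leave a stream "good for
nothing afterwards", here is a minimal sketch of how iostream error state
sticks. It does not reproduce the MSVC conversion failure itself: the call
to setstate is only a stand-in for it, and sticky_failure_demo is a
hypothetical name used for illustration.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: shows that output sent to a failed stream is
// discarded until the error flags are cleared.
std::wstring sticky_failure_demo()
{
    std::wostringstream out;
    out << L"before ";

    // Stand-in for the failed wide-to-narrow conversion discussed above.
    out.setstate(std::ios_base::badbit);
    assert(!out.good());

    out << L"lost ";   // ignored, because the stream is no longer good()

    out.clear();       // reset the error flags
    out << L"after";   // written normally again

    return out.str();
}
```

If this is what is happening to wcout, then calling std::wcout.clear()
after the failing output might let later output through again, although
the character that failed to convert would still be lost.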

K

This is the program I was using to test things (again, it must be compiled
with _UNICODE):

#include <iostream>
#include <conio.h>
#include <fstream>

int wmain(int /*argc*/, wchar_t* /*argv*/[])
{
        std::wcout << L"Hello world!" << std::endl;
        // Surname with AE ligature
        std::wcout << L"Hello Kirit S\x00e6lensminde" << std::endl;
        // Kirit transliterated (probably badly) into Greek
        std::wcout << L"Hello \x039a\x03b9\x03c1\x03b9\x03c4" << std::endl;
        // Kirit transliterated into Thai
        std::wcout << L"Hello \x0e04\x0e35\x0e23\x0e34\x0e17" << std::endl;

        //if ( std::wcout )
        // std::cout << "\nstd::wcout still good" << std::endl;
        //else
        // std::cout << "\nstd::wcout gone bad" << std::endl;

        _cputws( L"\n\n\n" );
        _cputws( L"Hello Kirit S\x00e6lensminde\n" ); // AE ligature
        _cputws( L"Hello \x039a\x03b9\x03c1\x03b9\x03c4\n" ); // Greek
        _cputws( L"Hello \x0e04\x0e35\x0e23\x0e34\x0e17\n" ); // Thai

        std::wofstream wout1( "test1.txt" );
        wout1 << L"12345" << std::endl;

        //if ( wout1 )
        // std::cout << "\nwout1 still good" << std::endl;
        //else
        // std::cout << "\nwout1 gone bad" << std::endl;

        std::wofstream wout2( "test2.txt" );
        wout2 << L"Hello world!" << std::endl;
        wout2 << L"Hello Kirit S\x00e6lensminde" << std::endl;
        wout2 << L"Hello \x039a\x03b9\x03c1\x03b9\x03c4" << std::endl;
        wout2 << L"Hello \x0e04\x0e35\x0e23\x0e34\x0e17" << std::endl;

        //if ( wout2 )
        // std::cout << "\nwout2 still good" << std::endl;
        //else
        // std::cout << "\nwout2 gone bad" << std::endl;

        return 0;
}
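
For the wofstream cases, the remark about codecvt facets suggests one
possible way around the default conversion: imbue the stream with a facet
that can encode every character. The following is only a sketch under some
assumptions. It uses std::codecvt_utf8 from the <codecvt> header, which
arrived with C++11 (so it was not available in MSVC 8.0) and has been
deprecated since C++17. The names write_utf8 and read_bytes are
hypothetical helpers, not part of any library.

```cpp
#include <cassert>
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: writes text to path as UTF-8 and reports whether
// the stream is still good afterwards.
bool write_utf8(const char* path, const std::wstring& text)
{
    assert(path != nullptr);

    std::wofstream out;
    // Imbue before opening so the facet is in place for the first write.
    // The locale takes ownership of the facet and deletes it.
    out.imbue(std::locale(out.getloc(), new std::codecvt_utf8<wchar_t>));
    out.open(path, std::ios::binary);

    out << text;
    out.flush();
    return out.good();
}

// Hypothetical helper: reads the file back as raw bytes for inspection.
std::string read_bytes(const char* path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

With a facet like this, a call such as
write_utf8("test3.txt", L"Hello \x039a\x03b9\x03c1\x03b9\x03c4") should
leave the stream good and produce a multi-byte sequence for each Greek
letter, rather than failing at the first character above 255. The file
name test3.txt is just an example.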

