Subject: Re: [boost] Silly Boost.Locale default narrow string encoding in Windows
From: Alf P. Steinbach (alf.p.steinbach+usenet_at_[hidden])
Date: 2011-10-28 11:47:20
On 28.10.2011 14:41, Yakov Galka wrote:
> On Fri, Oct 28, 2011 at 13:58, Peter Dimov<pdimov_at_[hidden]> wrote:
>
>> Alf P. Steinbach wrote:
>>
>>> How do I make the following program work with Visual C++ in Windows, using
>>> narrow character strings?
>>>
>>> <code>
>>> #include <stdio.h>
>>> #include <fcntl.h>      // _O_U8TEXT
>>> #include <io.h>         // _setmode, _fileno
>>> #include <windows.h>
>>>
>>> int main()
>>> {
>>>     //SetConsoleOutputCP( 65001 );
>>>     //_setmode( _fileno( stdout ), _O_U8TEXT );
>>>     printf( "Blåbærsyltetøy! 日本国 кошка!\n" );
>>> }
>>> </code>
>>>
>>
>> Output to a console wasn't our topic so far (and is not one of my strong
>> points), but the specific problem with this program is that the embedded
>> literal is not UTF-8, as the warning C4566 tells us, so there is no way for
>> you to get UTF-8 in the output. (You should be able to set VC++'s code page
>> to 65001, but I don't think you can.)
>>
>> int main()
>> {
>>     printf( utf8_encode( L"кошка" ).c_str() );
>> }
>>
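For reference, `utf8_encode` is not defined anywhere in the quoted snippet. Below is a minimal portable sketch of what such a helper might look like. It assumes well-formed input, and that `wchar_t` holds UTF-16 when it is 16 bits wide (as in Windows) and UTF-32 otherwise. It is an illustration only, not Peter's actual code:

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the utf8_encode helper named in the quoted snippet.
// Assumption: the input is well-formed UTF-16 (16-bit wchar_t) or UTF-32.
std::string utf8_encode( std::wstring const& ws )
{
    std::string out;
    for( std::size_t i = 0; i < ws.size(); ++i )
    {
        unsigned long cp = static_cast<unsigned long>( ws[i] );

        // With 16-bit wchar_t, combine a surrogate pair into one code point.
        if( sizeof( wchar_t ) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < ws.size() )
        {
            unsigned long const low = static_cast<unsigned long>( ws[i + 1] );
            if( low >= 0xDC00 && low <= 0xDFFF )
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }

        if( cp < 0x80 )             // 1 byte: plain ASCII
        {
            out += static_cast<char>( cp );
        }
        else if( cp < 0x800 )       // 2 bytes
        {
            out += static_cast<char>( 0xC0 | (cp >> 6) );
            out += static_cast<char>( 0x80 | (cp & 0x3F) );
        }
        else if( cp < 0x10000 )     // 3 bytes
        {
            out += static_cast<char>( 0xE0 | (cp >> 12) );
            out += static_cast<char>( 0x80 | ((cp >> 6) & 0x3F) );
            out += static_cast<char>( 0x80 | (cp & 0x3F) );
        }
        else                        // 4 bytes
        {
            out += static_cast<char>( 0xF0 | (cp >> 18) );
            out += static_cast<char>( 0x80 | ((cp >> 12) & 0x3F) );
            out += static_cast<char>( 0x80 | ((cp >> 6) & 0x3F) );
            out += static_cast<char>( 0x80 | (cp & 0x3F) );
        }
    }
    return out;
}
```

With that in place, `printf( "%s\n", utf8_encode( L"\u043A\u043E\u0448\u043A\u0430" ).c_str() )` writes the UTF-8 bytes for "кошка" without depending on the source file's encoding. Passing the text via "%s" rather than as the format string itself avoids trouble with any '%' in it.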
>
> You don't need to configure anything, in fact you cannot do it properly in
> VS. What you can do is:
>
> 1) don't use wide-char literals with non ascii characters
> 2) use UTF-8 literals for narrow-char.
>
> All you need is to save the source as UTF-8 WITHOUT BOM. Works like a charm on
> VS2005 and VS2010. Apparently it's portable. The IDE can detect UTF-8 even
> without BOM ("Auto-detect UTF-8 encoding without signature").
This is interesting in a perverse sort of way.

In order to make Visual C++ produce UTF-8 encoded compiled narrow
strings, one must /lie/ to the compiler. The source code is UTF-8. And
one lies and tells the Visual C++ compiler that it's ANSI.

And in order to make g++ produce ANSI encoded compiled narrow strings,
one must /lie/ to the compiler. The source code is ANSI. And one lies and
tells the g++ compiler that it's UTF-8.

As I see it, there's something wrong here.

And that's notwithstanding the limitation that codepage 65001 is
impractical in the Windows command interpreter -- e.g. the 'more'
command CRASHES.
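One way to see which bytes a compiler actually put into a narrow literal, whatever it was told about the source encoding, is to dump them. A minimal sketch follows. `hex_bytes` is a helper name made up for this illustration, and the example literal is spelled with `\x` escapes so that it does not depend on the encoding of the source file:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: renders each byte of s as two lowercase hex digits,
// with single spaces between bytes.
std::string hex_bytes( std::string const& s )
{
    static char const digits[] = "0123456789abcdef";
    std::string out;
    for( std::size_t i = 0; i < s.size(); ++i )
    {
        unsigned char const byte = static_cast<unsigned char>( s[i] );
        if( i > 0 ) { out += ' '; }
        out += digits[byte >> 4];       // high nibble
        out += digits[byte & 0x0F];     // low nibble
    }
    return out;
}
```

For example, `hex_bytes( "Bl\xC3\xA5" )` yields `42 6c c3 a5`, the UTF-8 form of "Blå". A Windows-1252 (ANSI) literal would instead carry the single byte `e5` for the 'å'.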
>> This is not a practical problem for "proper" applications because Russian
>> text literals should always come from the equivalent of gettext and never be
>> embedded in code.
>
> +1
I find that a very narrow-minded view.

Would you like to be the one telling Norwegian student Åshild Bjørnson
that you favor the notion that she should waste hours or days installing
Boost and some other *nix-oriented library and use 'gettext', in order to
be able to display her name in her first C++ program?

That text representation and output in C++ has been designed (with your
not just willing but enthusiastic vote) to be so inherently complex that
it requires hours and days of effort just to display your name?
> Personally I'm happy with
>
> printf( "Blåbærsyltetøy! 日本国 кошка!\n" );
>
> writing UTF-8. Even if I cannot configure the console, I still can redirect
> it to a file, and it will correctly save this as UTF-8. Preventing data-loss
> is more important for me.
I find it thoroughly disgusting to have to lie to your tools, and to
rely on an assumption that the tools will not wisen up in the future.

However, I concede the point that IF one is happy with output that's
encoded so that most Windows command line tools fail (e.g. `more`
crashes), and IF one is happy with lying to the compiler about the
source encoding, and IF one is happy assuming that the compiler won't
wisen up about encodings in a future version, then -- the UTF-8 scheme
allows literals with national language characters, not just A through Z.

However, those are pretty constricting conditions.
Cheers & hth.,
- Alf