Subject: Re: [boost] Silly Boost.Locale default narrow string encoding in Windows
From: Alf P. Steinbach (alf.p.steinbach+usenet_at_[hidden])
Date: 2011-10-28 11:47:20

Next message: Alf P. Steinbach: "Re: [boost] Silly Boost.Locale default narrow string encoding in Windows"
Previous message: Andrey Semashev: "Re: [boost] [atomic] comments"
In reply to: Yakov Galka: "Re: [boost] Silly Boost.Locale default narrow string encoding in Windows"
Next in thread: Peter Dimov: "Re: [boost] Silly Boost.Locale default narrow string encoding in Windows"
Reply: Peter Dimov: "Re: [boost] Silly Boost.Locale default narrow string encoding in Windows"
Reply: Anders Dalvander: "Re: [boost] Silly Boost.Locale default narrow string encoding in Windows"

On 28.10.2011 14:41, Yakov Galka wrote:
> On Fri, Oct 28, 2011 at 13:58, Peter Dimov <pdimov_at_[hidden]> wrote:
>
>> Alf P. Steinbach wrote:
>>
>>> How do I make the following program work with Visual C++ in Windows, using
>>> narrow character strings?
>>>
>>> <code>
>>> #include <stdio.h>
>>> #include <fcntl.h>      // _O_U8TEXT
>>> #include <io.h>         // _setmode, _fileno
>>> #include <windows.h>
>>>
>>> int main()
>>> {
>>>     //SetConsoleOutputCP( 65001 );
>>>     //_setmode( _fileno( stdout ), _O_U8TEXT );
>>>     printf( "Blåbærsyltetøy! 日本国 кошка!\n" );
>>> }
>>> </code>
>>>
>>
>> Output to a console wasn't our topic so far (and is not one of my strong
>> points), but the specific problem with this program is that the embedded
>> literal is not UTF-8, as the warning C4566 tells us, so there is no way for
>> you to get UTF-8 in the output. (You should be able to set VC++'s code page
>> to 65001, but I don't think you can.)
>>
>> int main()
>> {
>>     printf( utf8_encode( L"кошка" ).c_str() );
>> }
>>
>
> You don't need to configure anything, in fact you cannot do it properly in
> VS. What you can do is:
>
> 1) don't use wide-char literals with non ascii characters
> 2) use UTF-8 literals for narrow-char.
>
> All you need is to save the source as UTF-8 WITHOUT BOM. Works as charm on
> VS2005 and VS2010. Apparently it's portable. The IDE can detect UTF-8 even
> without BOM ("☑ Auto-detect UTF-8 encoding without signature").

This is interesting in a perverse sort of way.

In order to make Visual C++ produce UTF-8 encoded compiled narrow
strings, one must /lie/ to the compiler. The source code is UTF-8. And
one lies and tells the Visual C++ compiler that it's ANSI.

And in order to make g++ produce ANSI encoded compiled narrow strings,
one must /lie/ to the compiler. The source code is ANSI. And one lies and
tells the g++ compiler that it's UTF-8.

As I see it, there's something wrong here.

And that is notwithstanding the limitation that codepage 65001 is
impractical in the Windows command interpreter -- e.g. the 'more' command
CRASHES.
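
To check what a given compiler really put into a narrow literal, one can
dump the bytes. A minimal sketch; `hex_dump` is a hypothetical helper name
of mine, not something from Boost or the standard library:

```cpp
#include <string>

// Hypothetical helper: returns the bytes of a narrow string as
// space-separated upper-case hex, e.g. "42 6C C3 A5".
std::string hex_dump( const std::string& s )
{
    static const char digits[] = "0123456789ABCDEF";
    std::string result;
    for( std::string::size_type i = 0; i < s.size(); ++i )
    {
        // Go via unsigned char so that bytes above 0x7F don't turn negative.
        const unsigned char byte = static_cast<unsigned char>( s[i] );
        if( i > 0 ) { result += ' '; }
        result += digits[byte >> 4];
        result += digits[byte & 0x0F];
    }
    return result;
}
```

For example, `hex_dump( "Bl\xC3\xA5" )` yields `42 6C C3 A5`, which is
"Blå" as UTF-8, while the Windows-1252 form of the same text would dump as
`42 6C E5`. Feeding it a literal typed with the actual national characters
then shows which of the two the compiler chose.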

>> This is not a practical problem for "proper" applications because Russian
>> text literals should always come from the equivalent of gettext and never be
>> embedded in code.
>
> +1

I find that a very narrow-minded view.

Would you like to be the one telling Norwegian student Åshild Bjørnson
that you favor the notion that she should waste hours or days installing
Boost and some other *nix-oriented library and use 'gettext', in order to
be able to display her name in her first C++ program?

That text representation and output in C++ has been designed (with your
not just willing but enthusiastic vote) to be so inherently complex that
it requires hours or days of effort just to display your name?

> Personally I'm happy with
>
> printf( "Blåbærsyltetøy! 日本国 кошка!\n" );
>
> writing UTF-8. Even if I cannot configure the console, I still can redirect
> it to a file, and it will correctly save this as UTF-8. Preventing data-loss
> is more important for me.

I find it thoroughly disgusting to have to lie to your tools, and to
rely on an assumption that the tools will not wise up in the future.

However, I concede the point that IF one is happy with output that's
encoded so that most Windows command line tools fail (e.g. `more`
crashes), and IF one is happy with lying to the compiler about the
source encoding, and IF one is happy assuming that the compiler won't
wise up about encodings in a future version, then -- the UTF-8 scheme
allows literals with national language characters, not just A through Z.

However, those are pretty constricting conditions.
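
For comparison, Peter's wide-literal route depends on a `utf8_encode` that
his snippet doesn't show. The following is only my guess at the kind of
function meant, not his code: a sketch that assumes the wide string holds
UTF-16 (as with Visual C++) or UTF-32 (as with g++ on Linux), and that
substitutes U+FFFD for anything invalid. `append_utf8` is a hypothetical
helper as well.

```cpp
#include <string>

// Hypothetical helper: appends one code point to 'out' as 1 to 4 UTF-8 bytes.
// Assumes cp is a valid code point (the caller below guarantees that).
void append_utf8( std::string& out, unsigned long cp )
{
    if( cp < 0x80 )
    {
        out += static_cast<char>( cp );
    }
    else if( cp < 0x800 )
    {
        out += static_cast<char>( 0xC0 | (cp >> 6) );
        out += static_cast<char>( 0x80 | (cp & 0x3F) );
    }
    else if( cp < 0x10000 )
    {
        out += static_cast<char>( 0xE0 | (cp >> 12) );
        out += static_cast<char>( 0x80 | ((cp >> 6) & 0x3F) );
        out += static_cast<char>( 0x80 | (cp & 0x3F) );
    }
    else
    {
        out += static_cast<char>( 0xF0 | (cp >> 18) );
        out += static_cast<char>( 0x80 | ((cp >> 12) & 0x3F) );
        out += static_cast<char>( 0x80 | ((cp >> 6) & 0x3F) );
        out += static_cast<char>( 0x80 | (cp & 0x3F) );
    }
}

// Sketch of a utf8_encode (an assumption, not the function from the quote).
// Accepts UTF-16 code units or UTF-32 code points in a std::wstring.
std::string utf8_encode( const std::wstring& ws )
{
    std::string result;
    for( std::wstring::size_type i = 0; i < ws.size(); ++i )
    {
        unsigned long cp = static_cast<unsigned long>( ws[i] );

        // Combine a UTF-16 surrogate pair into a single code point.
        if( cp >= 0xD800 && cp <= 0xDBFF && i + 1 < ws.size() )
        {
            const unsigned long low = static_cast<unsigned long>( ws[i + 1] );
            if( low >= 0xDC00 && low <= 0xDFFF )
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }

        // Unpaired surrogates and out-of-range values become U+FFFD.
        if( (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF )
        {
            cp = 0xFFFD;
        }
        append_utf8( result, cp );
    }
    return result;
}
```

With that, `printf( "%s", utf8_encode( L"..." ).c_str() )` writes UTF-8
bytes regardless of the narrow execution character set. Passing the text
via `"%s"` rather than as the format string itself avoids trouble if it
ever contains a `%`.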

Cheers & hth.,

- Alf

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk