Subject: Re: [boost] Silly Boost.Locale default narrow string encoding in Windows
From: Alf P. Steinbach (alf.p.steinbach+usenet_at_[hidden])
Date: 2011-10-28 07:17:06


On 28.10.2011 12:36, Yakov Galka wrote:
> On Fri, Oct 28, 2011 at 04:23, Alf P. Steinbach<
> alf.p.steinbach+usenet_at_[hidden]> wrote:
>
>> On 27.10.2011 23:56, Peter Dimov wrote:
>>>
>>> The advantage of using UTF-8 is that, apart from the border layer that
>>> calls the OS (and that needs to be ported either way), the rest of the
>>> code is happily char[]-based.
>>
>> Oh.
>>
>> I would be happy to learn this.
>>
>> How do I make the following program work with Visual C++ in Windows, using
>> narrow character string?
>>
>>
>> <code>
>> #include <stdio.h>
>> #include <fcntl.h>      // _O_U8TEXT
>> #include <io.h>         // _setmode, _fileno
>> #include <windows.h>
>>
>> int main()
>> {
>>     //SetConsoleOutputCP( 65001 );
>>     //_setmode( _fileno( stdout ), _O_U8TEXT );
>>     printf( "Blåbærsyltetøy! 日本国 кошка!\n" );
>> }
>> </code>
>>
>
> How will you make this program portable?

Well, that was *my* question.

The claim that this minimal "Hello, world!" program puts to the test
is that "the rest of the [UTF-8 based] code is happily char[]-based".

Apparently that is not so.

>> The out-commented code is from my random efforts to Make It Work(TM).
>>
>> It refused.
>>
>
> This is because windows narrow-chars can't be UTF-8. You could make it
> portable by:
>
> int main()
> {
>     boost::printf("Blåbærsyltetøy! 日本国 кошка!\n");
> }

Thanks, TIL boost::printf.

The idea of UTF-8 as a universal encoding seems now to be to use some
workaround such as boost::printf for each and every case where it turns
out that it doesn't work portably.

Once every portability problem has been diagnosed and special-cased to
use functions that translate to/from UTF-8, and ignoring the efficiency
cost of that translation, UTF-8 just magically works, hurray.

E.g., if 'fopen( "rød.txt", "r" )' fails in the universal UTF-8 code,
then just replace with 'boost::fopen', or 'my_special_casing::fopen'.

However, with these workaround details made manifest, it is /much less/
convincing than the original general vague claim that UTF-8 just works.
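
For concreteness, here is a minimal sketch of what such a
'my_special_casing::fopen' might look like. It assumes the caller passes
a UTF-8 path; the helper name utf16_from_utf8 is made up for
illustration, and nothing here is taken from an actual Boost library.
On Windows it translates to UTF-16 and calls _wfopen, elsewhere it just
forwards to the standard fopen.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>   // MultiByteToWideChar

// Hypothetical helper: convert UTF-8 to UTF-16 for the wide Windows APIs.
// Returns an empty string for empty or invalid input.
static std::wstring utf16_from_utf8(const std::string& s)
{
    if (s.empty()) return std::wstring();
    const int len = static_cast<int>(s.size());
    const int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                      s.data(), len, nullptr, 0);
    if (n <= 0) return std::wstring();
    std::wstring w(static_cast<std::size_t>(n), L'\0');
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        s.data(), len, &w[0], n);
    return w;
}
#endif

namespace my_special_casing {

// Open a file whose name is given as UTF-8.
// Returns a null pointer on failure, like the standard fopen.
inline std::FILE* fopen(const char* utf8_path, const char* mode)
{
    assert(utf8_path != nullptr && mode != nullptr);
#ifdef _WIN32
    // The narrow Windows runtime does not treat char paths as UTF-8,
    // so translate and use the wide-character variant instead.
    const std::wstring wpath = utf16_from_utf8(utf8_path);
    const std::wstring wmode = utf16_from_utf8(mode);
    if (wpath.empty() || wmode.empty()) return nullptr;
    return _wfopen(wpath.c_str(), wmode.c_str());
#else
    // Assumption: the platform passes the bytes through unchanged.
    return std::fopen(utf8_path, mode);
#endif
}

} // namespace my_special_casing
```

A call such as 'my_special_casing::fopen( "rød.txt", "r" )' then behaves
the same on both kinds of platform, provided the source file itself is
stored as UTF-8 and the compiler keeps those bytes in the literal.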

[snip]
> You judge from a non-portable code point-of-view. How about:
>
> #include <cstdio>
> #include "gtkext/message_box.h" // for gtkext::message_box
>
> int main()
> {
>     char buffer[80];
>     sprintf(buffer, "The answer is %d!", 6*7);
>     gtkext::message_box(buffer, "This is a title!", gtkext::icon_blah_blah,
>         ...);
> }
>
> And unlike your code, it's magically portable! (thanks to gtk using UTF-8 on
> windows)

Aha. When you use a library L that translates in platform-specific ways
to/from UTF-8 for you, then UTF-8 is magically portable. For use of L.

However, try to pass a `main` argument over to gtkext::message_box.

Then you have involved some /other code/ (namely the runtime library
code that calls 'main') that may not necessarily translate for you, and
in fact in Windows is extremely unlikely to translate for you.

Such code is prevalent.

Most code does not translate to/from UTF-8.
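
For the `main` argument case, the translation has to be added by hand.
A rough sketch, where utf8_args is a hypothetical helper name: on
Windows it ignores the narrow argv (typically encoded in the ANSI code
page) and rebuilds the arguments from the wide command line, and
elsewhere it assumes, without checking, that argv is already UTF-8.

```cpp
#include <cassert>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>  // CommandLineToArgvW (link with shell32)
#endif

// Hypothetical helper: return the program arguments as UTF-8 strings.
inline std::vector<std::string> utf8_args(int argc, char** argv)
{
    std::vector<std::string> args;
#ifdef _WIN32
    (void)argc; (void)argv;  // not used: re-read the wide command line
    int wargc = 0;
    LPWSTR* wargv = CommandLineToArgvW(GetCommandLineW(), &wargc);
    if (wargv == nullptr) return args;
    for (int i = 0; i < wargc; ++i) {
        // First call: required size in bytes, including the final NUL.
        const int n = WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1,
                                          nullptr, 0, nullptr, nullptr);
        std::string s;
        if (n > 1) {
            s.resize(static_cast<std::size_t>(n));
            WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1,
                                &s[0], n, nullptr, nullptr);
            s.resize(static_cast<std::size_t>(n) - 1);  // drop the NUL
        }
        args.push_back(s);
    }
    LocalFree(wargv);
#else
    // Assumption: the environment already delivers UTF-8 in argv.
    assert(argc >= 0);
    for (int i = 0; i < argc; ++i) args.push_back(argv[i]);
#endif
    return args;
}
```

With this, `main` would call utf8_args( argc, argv ) first and hand
args[1].c_str() on to the UTF-8 based library, which is exactly the kind
of per-case special casing discussed above.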

Cheers & hth., & thanks for mention of boost::printf,

- Alf

PS: With C++11 there is no longer any reason to use <cstdio> instead of
<stdio.h>, because <cstdio> no longer formally guarantees to not pollute
the global namespace (and in practice it has never honored its C++98
guarantee). The code above is a good example of why <stdio.h> is preferable
-- it is too easy to write non-portable code with <cstdio>, such as
using unqualified sprintf (not to mention size_t!).
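
To illustrate the qualification point: if <cstdio> is used anyway, only
the std::-qualified names are guaranteed to be declared. A small sketch
of the quoted formatting code written that way, with answer_message as a
made-up function name and snprintf substituted for sprintf to bound the
write:

```cpp
#include <cassert>
#include <cstddef>   // std::size_t
#include <cstdio>    // std::snprintf
#include <string>

// Hypothetical helper: format the message from the quoted example,
// using only names that <cstdio> and <cstddef> guarantee in namespace std.
inline std::string answer_message(int answer)
{
    char buffer[80];
    const std::size_t size = sizeof buffer;
    const int n = std::snprintf(buffer, size, "The answer is %d!", answer);
    // A negative result is an encoding error; n >= size means truncation.
    assert(n >= 0 && static_cast<std::size_t>(n) < size);
    (void)n;  // silence the unused-variable warning when NDEBUG is set
    return std::string(buffer);
}
```

Calling answer_message( 6*7 ) yields the same text as the sprintf call
in the quoted code.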

