Subject: Re: [boost] Silly Boost.Locale default narrow string encoding in Windows
From: Yakov Galka (ybungalobill_at_[hidden])
Date: 2011-10-28 06:36:58


On Fri, Oct 28, 2011 at 04:23, Alf P. Steinbach <
alf.p.steinbach+usenet_at_[hidden]> wrote:

> On 27.10.2011 23:56, Peter Dimov wrote:
>
>> Alf P. Steinbach wrote:
>>
>>> On 27.10.2011 21:07, Peter Dimov wrote:
>>> > Alf P. Steinbach wrote:
>>>
>> ...
>>
>>> >> Right, that's one reason why modern Windows programs should best be
>>> >> wchar_t based.
>>> >
>>> > This is one of the two options. The other is using UTF-8 for
>>> > representing paths as narrow strings. The first option is more natural
>>> > for Windows-only code, and the second is better, in practice, for
>>> > portable code because it avoids the need to duplicate all path-related
>>> > functions for char/wchar_t. The motivation for using UTF-8 is
>>> practical,
>>> > not political or religious.
>>>
>>> Thanks for that clarification of the current thinking at Boost.
>>>
>>
>> My opinion is not representative of all of Boost, although I've found
>> that there is substantial agreement between people who write portable
>> software that needs to deal with paths (#2, UTF-8, as the way to go).
>>
>> 3. the most natural sufficiently general native encoding, 1 or 2
>>> depending on the platform that the source is being built for.
>>>
>>
>> Yes, with its various suboptions. 3a, TCHAR, 3b, template on char_type,
>> 3c, providing both char and wchar_t overloads. They all have their
>> problems; people don't move to UTF-8 merely out of spite.
>>
>> Prior art in this direction, includes Microsoft's [tchar.h].
>>>
>>
>> This works, more or less, once you've accumulated the appropriate
>> library of _T macros, _t functions and T/t typedefs. I've never heard of
>> it actually being used for a portable code base,
>>
>
> [tchar.h], plus the similar support in <windows.h>, was heavily used for
> porting applications between Windows 9x ANSI and Windows NT Unicode, before
> Microsoft introduced the Layer for Unicode in 2001 or thereabouts (the layer
> allowed wchar_t-apps to run in Windows 9x).
>
> I'm not saying it's a good C++ approach for that porting -- it's not,
> since it was designed for the C language.
>
> I just gave it as an example of prior art, which includes a neat header
> where the names of the relevant functions to wrap (or whatever) can be
> extracted by a small Python script. ;-)
>
>
>
> but I admit that it's
>> possible to do things this way, even if it's somewhat alien to POSIX
>> people.
>>
>> The advantage of using UTF-8 is that, apart from the border layer that
>> calls the OS (and that needs to be ported either way), the rest of the
>> code is happily char[]-based.
>>
>
> Oh.
>
> I would be happy to learn this.
>
> How do I make the following program work with Visual C++ in Windows, using
> narrow character string?
>
>
> <code>
> #include <stdio.h>
> #include <fcntl.h> // _O_U8TEXT
> #include <io.h> // _setmode, _fileno
> #include <windows.h>
>
> int main()
> {
> //SetConsoleOutputCP( 65001 );
> //_setmode( _fileno( stdout ), _O_U8TEXT );
> printf( "Blåbærsyltetøy! 日本国 кошка!\n" );
> }
> </code>
>

How will you make this program portable?

> The out-commented code is from my random efforts to Make It Work(TM).
>
> It refused.
>

This is because narrow-char strings on Windows can't be UTF-8: the runtime
interprets them in the ANSI code page. You could make it portable by:

int main()
{
    boost::printf("Blåbærsyltetøy! 日本国 кошка!\n");
}
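
For illustration, here is a rough sketch of what such a UTF-8-aware printf
might do at the border layer. Note that boost::printf above is hypothetical,
and the names u8_format and u8_printf below are made up for this sketch. It
assumes a C++11 compiler with a conforming vsnprintf. On Windows it converts
the UTF-8 bytes to UTF-16 and writes them with WriteConsoleW when stdout is a
console; everywhere else it passes the bytes through unchanged.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Format printf-style arguments into a std::string holding UTF-8 bytes.
// (Hypothetical helper, not part of Boost.)
std::string u8_vformat(const char* fmt, va_list args) {
    assert(fmt != nullptr);
    va_list copy;
    va_copy(copy, args);
    int n = std::vsnprintf(nullptr, 0, fmt, copy);  // measure only
    va_end(copy);
    if (n <= 0) return std::string();
    std::vector<char> buf(static_cast<std::size_t>(n) + 1);
    std::vsnprintf(buf.data(), buf.size(), fmt, args);
    return std::string(buf.data(), static_cast<std::size_t>(n));
}

std::string u8_format(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    std::string s = u8_vformat(fmt, args);
    va_end(args);
    return s;
}

// Print UTF-8 text; returns the number of UTF-8 bytes produced.
int u8_printf(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    std::string s = u8_vformat(fmt, args);
    va_end(args);
#ifdef _WIN32
    // The border layer: only here does the code touch wchar_t.
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    if (!s.empty() && GetConsoleMode(out, &mode)) {  // true only for a console
        int wlen = MultiByteToWideChar(CP_UTF8, 0, s.data(),
                                       static_cast<int>(s.size()), nullptr, 0);
        if (wlen > 0) {
            std::vector<wchar_t> w(static_cast<std::size_t>(wlen));
            MultiByteToWideChar(CP_UTF8, 0, s.data(),
                                static_cast<int>(s.size()), w.data(), wlen);
            DWORD written = 0;
            WriteConsoleW(out, w.data(), static_cast<DWORD>(wlen),
                          &written, nullptr);
            return static_cast<int>(s.size());
        }
    }
#endif
    // POSIX, or redirected output on Windows: emit the bytes as they are.
    std::fwrite(s.data(), 1, s.size(), stdout);
    return static_cast<int>(s.size());
}
```

The caller would simply write u8_printf("Blåbærsyltetøy! 日本国 кошка!\n"),
provided the literal reaches the executable as UTF-8 bytes.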

>
> By the way, I'm hoping Boost isn't supporting old versions of g++.
>
> Because old versions of g++ choked on a BOM at start of UTF-8 encoded
> source code, while Visual C++ requires that BOM... So, UTF-8 source code
> ungood with old versions of g++, if Visual C++ is also used.

If you don't use wide chars, you can trick VC++ into using UTF-8 string
literals. Just save the file as UTF-8 *without* BOM. The compiler will then
embed the literal's bytes verbatim into the executable.
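
To show what "verbatim" means here, a small sketch: the literal below spells
"Blåbærsyltetøy" with explicit byte escapes, which are the same bytes a
BOM-less UTF-8 source file would contribute. The names kJam and
count_code_points are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// "Blåbærsyltetøy" as UTF-8 bytes. The literal is split after \xA5 so that
// the following 'b' is not swallowed as another hex digit of the escape.
const char kJam[] = "Bl\xC3\xA5" "b\xC3\xA6rsyltet\xC3\xB8y";

// Count code points by skipping UTF-8 continuation bytes (10xxxxxx).
// Assumes the input is valid UTF-8; no validation is done.
std::size_t count_code_points(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

strlen(kJam) counts bytes (17 here), while count_code_points(kJam) gives 14,
because each of å, æ and ø takes two bytes. Plain char[] code that only
copies, compares or concatenates never needs to know the difference.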

>> There's no need to be aware of the fact
>> that literals need to be quoted or that strlen should be spelled
>> _tcslen. There's no need to convert paths to an external representation
>> when writing them into a portable config/project file.
>>
>
> Hm, I'm not so sure.
>
> I'd like to see this magic in action before believing in it, e.g., the
> program above working with narrow chars and printf, with Visual C++.

See above and see
http://permalink.gmane.org/gmane.comp.lib.boost.devel/225036

>
> That's an unrelated issue, really, but I think Boost could use a "get
>>> undamaged program arguments in portable strings" thing, if it isn't
>>> there already?
>>>
>>
>> We'll be back to the question of what constitutes a portable string. I'd
>> prefer UTF-8 on Windows and whatever was passed on POSIX. You'd prefer
>> TCHAR[].
>>
>
> No, not TCHAR, which was designed for the C language (and is an ugly
> uppercase name to boot).
>
> Instead, like this:
>
>
> <code>
> #include "u/stdio_h.h" // u::CodingValue, u::sprintf, U
>
> #undef UNICODE
> #define UNICODE
> #include <windows.h> // MessageBox
>
> int main()
> {
> u::CodingValue buffer[80];
>
> sprintf( buffer, U( "The answer is %d!" ), 6*7 ); // Koenig lookup.
> MessageBox(
> 0,
> buffer->rawPtr(),
> U( "This is a title!" )->rawPtr(),
> MB_ICONINFORMATION | MB_SETFOREGROUND
> );
> }
> </code>
>

You judge from a non-portable code point-of-view. How about:

#include <cstdio>
#include "gtkext/message_box.h" // for gtkext::message_box

int main()
{
    char buffer[80];
    sprintf(buffer, "The answer is %d!", 6*7);
    gtkext::message_box(buffer, "This is a title!", gtkext::icon_blah_blah,
                        ...);
}

And unlike your code, it's magically portable! (thanks to GTK using UTF-8 on
Windows)

Sincerely,

-- 
Yakov
