Subject: Re: [boost] Silly Boost.Locale default narrow string encoding in Windows
From: Alf P. Steinbach (alf.p.steinbach+usenet_at_[hidden])
Date: 2011-10-27 22:23:01


On 27.10.2011 23:56, Peter Dimov wrote:
> Alf P. Steinbach wrote:
>> On 27.10.2011 21:07, Peter Dimov wrote:
>> > Alf P. Steinbach wrote:
> ...
>> >> Right, that's one reason why modern Windows programs should best be
>> >> wchar_t based.
>> >
>> > This is one of the two options. The other is using UTF-8 for
>> > representing paths as narrow strings. The first option is more natural
>> > for Windows-only code, and the second is better, in practice, for
>> > portable code because it avoids the need to duplicate all path-related
>> > functions for char/wchar_t. The motivation for using UTF-8 is practical,
>> > not political or religious.
>>
>> Thanks for that clarification of the current thinking at Boost.
>
> My opinion is not representative of all of Boost, although I've found
> that there is substantial agreement between people who write portable
> software that needs to deal with paths (#2, UTF-8, as the way to go).
>
>> 3. the most natural sufficiently general native encoding, 1 or 2
>> depending on the platform that the source is being built for.
>
> Yes, with its various suboptions. 3a, TCHAR, 3b, template on char_type,
> 3c, providing both char and wchar_t overloads. They all have their
> problems; people don't move to UTF-8 merely out of spite.
>
>> Prior art in this direction includes Microsoft's [tchar.h].
>
> This works, more or less, once you've accumulated the appropriate
> library of _T macros, _t functions and T/t typedefs. I've never heard of
> it actually being used for a portable code base,

[tchar.h], plus the similar support in <windows.h>, was heavily used for
porting applications between Windows 9x ANSI and Windows NT Unicode,
before Microsoft introduced the Layer for Unicode in 2001 or thereabouts
(the layer allowed wchar_t-apps to run in Windows 9x).

I'm not saying it's a good C++ approach for that porting -- it's not,
since it was designed for the C language.

I just gave it as an example of prior art, which includes a neat header
where the names of the relevant functions to wrap (or whatever) can be
extracted by a small Python script. ;-)
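
A minimal sketch of the mechanism that [tchar.h] relies on, for readers who have not met it: one macro switch selects the character type, the literal prefix and the string function together. All `demo_`/`DEMO_` names are invented for illustration and are not the real header's names (those are `TCHAR`, `_T` and `_tcslen`); `title_length` is a hypothetical caller.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for TCHAR, _T and _tcslen. Define DEMO_UNICODE
// before this point to select the wchar_t variants.
#ifdef DEMO_UNICODE
    typedef wchar_t demo_tchar;
    #define DEMO_T( s ) L##s
    inline std::size_t demo_tcslen( demo_tchar const* s ) { return std::wcslen( s ); }
#else
    typedef char demo_tchar;
    #define DEMO_T( s ) s
    inline std::size_t demo_tcslen( demo_tchar const* s ) { return std::strlen( s ); }
#endif

// Client code is written once against the demo_* names and compiles
// unchanged in either mode.
inline std::size_t title_length()
{
    demo_tchar const* const title = DEMO_T( "This is a title!" );
    return demo_tcslen( title );
}
```

The cost is the one Peter points at: every literal has to be wrapped and every string function needs a mapped name.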

> but I admit that it's
> possible to do things this way, even if it's somewhat alien to POSIX
> people.
>
> The advantage of using UTF-8 is that, apart from the border layer that
> calls the OS (and that needs to be ported either way), the rest of the
> code is happily char[]-based.

Oh.

I would be happy to learn this.

How do I make the following program work with Visual C++ in Windows,
using narrow character strings?

<code>
#include <stdio.h>
#include <fcntl.h> // _O_U8TEXT
#include <io.h> // _setmode, _fileno
#include <windows.h>

int main()
{
     //SetConsoleOutputCP( 65001 );
     //_setmode( _fileno( stdout ), _O_U8TEXT );
     printf( "Blåbærsyltetøy! 日本国 кошка!\n" );
}
</code>

The commented-out code is from my random efforts to Make It Work(TM).

It refused.
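
The border layer Peter mentions would presumably have to turn the UTF-8 bytes into UTF-16 before handing them to a wide Windows API. A minimal, portable sketch of that conversion step only, assuming C++11 `char16_t`; the name `utf8_to_utf16` is invented here, overlong forms and encoded surrogates are not rejected, and the actual console output (for example via `WriteConsoleW`) is left out:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes UTF-8 bytes into UTF-16 code units.
// Throws std::runtime_error on a bad lead byte, a bad continuation
// byte, a truncated sequence or a value above U+10FFFF.
inline std::u16string utf8_to_utf16( std::string const& in )
{
    std::u16string out;
    std::size_t i = 0;
    while( i < in.size() )
    {
        unsigned char const lead = static_cast<unsigned char>( in[i] );
        char32_t    cp      = 0;
        std::size_t n_trail = 0;    // Number of continuation bytes expected.

        if(      lead < 0x80 )            { cp = lead;        n_trail = 0; }
        else if( (lead & 0xE0) == 0xC0 )  { cp = lead & 0x1F; n_trail = 1; }
        else if( (lead & 0xF0) == 0xE0 )  { cp = lead & 0x0F; n_trail = 2; }
        else if( (lead & 0xF8) == 0xF0 )  { cp = lead & 0x07; n_trail = 3; }
        else { throw std::runtime_error( "invalid UTF-8 lead byte" ); }

        if( n_trail > in.size() - i - 1 )
        {
            throw std::runtime_error( "truncated UTF-8 sequence" );
        }
        for( std::size_t k = 1; k <= n_trail; ++k )
        {
            unsigned char const c = static_cast<unsigned char>( in[i + k] );
            if( (c & 0xC0) != 0x80 )
            {
                throw std::runtime_error( "invalid UTF-8 continuation byte" );
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        i += n_trail + 1;

        if( cp < 0x10000 )
        {
            out.push_back( static_cast<char16_t>( cp ) );
        }
        else if( cp <= 0x10FFFF )
        {
            // Outside the BMP: emit a surrogate pair.
            cp -= 0x10000;
            out.push_back( static_cast<char16_t>( 0xD800 + (cp >> 10) ) );
            out.push_back( static_cast<char16_t>( 0xDC00 + (cp & 0x3FF) ) );
        }
        else
        {
            throw std::runtime_error( "code point out of range" );
        }
    }
    return out;
}
```

On Windows, where `wchar_t` is 16 bits, the same code units could be copied into a `std::wstring` for the wide API call; that step is an assumption about the surrounding code, not something shown in this thread.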

By the way, I'm hoping Boost isn't supporting old versions of g++.

Because old versions of g++ choked on a BOM at the start of UTF-8 encoded
source code, while Visual C++ requires that BOM... So UTF-8 source code
is no good with old versions of g++ if Visual C++ is also used.
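
One conceivable workaround is a build step that strips the BOM from a copy of each source file before an old g++ sees it. A minimal sketch of the detection and stripping, operating on file contents already read into a string; the function names are hypothetical, and whether such a step is worth the trouble is a separate question:

```cpp
#include <string>

// Returns true if the text starts with the three bytes EF BB BF,
// which is the UTF-8 encoding of U+FEFF (the byte order mark).
inline bool has_utf8_bom( std::string const& text )
{
    return text.size() >= 3
        && static_cast<unsigned char>( text[0] ) == 0xEF
        && static_cast<unsigned char>( text[1] ) == 0xBB
        && static_cast<unsigned char>( text[2] ) == 0xBF;
}

// Returns the text with a leading UTF-8 BOM removed, or an unchanged
// copy if there is no BOM.
inline std::string without_utf8_bom( std::string const& text )
{
    return has_utf8_bom( text ) ? text.substr( 3 ) : text;
}
```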

> There's no need to be aware of the fact
> that literals need to be quoted or that strlen should be spelled
> _tcslen. There's no need to convert paths to an external representation
> when writing them into a portable config/project file.

Hm, I'm not so sure.

I'd like to see this magic in action before believing in it, e.g., the
program above working with narrow chars and printf, with Visual C++.

>> That's an unrelated issue, really, but I think Boost could use a "get
>> undamaged program arguments in portable strings" thing, if it isn't
>> there already?
>
> We'll be back to the question of what constitutes a portable string. I'd
> prefer UTF-8 on Windows and whatever was passed on POSIX. You'd prefer
> TCHAR[].

No, not TCHAR, which was designed for the C language (and is an ugly
uppercase name to boot).

Instead, like this:

<code>
#include "u/stdio_h.h" // u::CodingValue, u::sprintf, U

#undef UNICODE
#define UNICODE
#include <windows.h> // MessageBox

int main()
{
     u::CodingValue buffer[80];

     sprintf( buffer, U( "The answer is %d!" ), 6*7 ); // Koenig lookup.
     MessageBox(
         0,
         buffer->rawPtr(),
         U( "This is a title!" )->rawPtr(),
         MB_ICONINFORMATION | MB_SETFOREGROUND
         );
}
</code>

I coded up that support after reading the article I'm responding to now,
because I felt that without coding it up I would be just spewing gut
feelings and hunches. Well-informed ones, but still. So I coded. :-)

Cheers & hth.,

- Alf


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk