Subject: Re: [boost] Silly Boost.Locale default narrow string encoding in Windows
From: Yakov Galka (ybungalobill_at_[hidden])
Date: 2011-10-28 06:36:58
On Fri, Oct 28, 2011 at 04:23, Alf P. Steinbach <
> On 27.10.2011 23:56, Peter Dimov wrote:
>> Alf P. Steinbach wrote:
>>> On 27.10.2011 21:07, Peter Dimov wrote:
>>> > Alf P. Steinbach wrote:
>>> >> Right, that's one reason why modern Windows programs should best be
>>> >> wchar_t based.
>>> > This is one of the two options. The other is using UTF-8 for
>>> > representing paths as narrow strings. The first option is more natural
>>> > for Windows-only code, and the second is better, in practice, for
>>> > portable code because it avoids the need to duplicate all path-related
>>> > functions for char/wchar_t. The motivation for using UTF-8 is
>>> > not political or religious.
>>> Thanks for that clarification of the current thinking at Boost.
>> My opinion is not representative of all of Boost, although I've found
>> that there is substantial agreement between people who write portable
>> software that needs to deal with paths (#2, UTF-8, as the way to go).
>> 3. the most natural sufficiently general native encoding, 1 or 2
>>> depending on the platform that the source is being built for.
>> Yes, with its various suboptions. 3a, TCHAR, 3b, template on char_type,
>> 3c, providing both char and wchar_t overloads. They all have their
>> problems; people don't move to UTF-8 merely out of spite.
>> Prior art in this direction includes Microsoft's [tchar.h].
>> This works, more or less, once you've accumulated the appropriate
>> library of _T macros, _t functions and T/t typedefs. I've never heard of
>> it actually being used for a portable code base,
> [tchar.h], plus the similar support in <windows.h>, was heavily used for
> porting applications between Windows 9x ANSI and Windows NT Unicode, before
> Microsoft introduced the Layer for Unicode in 2001 or thereabouts (the layer
> allowed wchar_t-apps to run in Windows 9x).
> I'm not saying it's a good C++ approach for that porting -- it's not,
> since it was designed for the C language.
> I just gave it as an example of prior art, which includes a neat header
> where the names of the relevant functions to wrap (or whatever) can be
> extracted by a small Python script. ;-)
> but I admit that it's
>> possible to do things this way, even if it's somewhat alien to POSIX
>> The advantage of using UTF-8 is that, apart from the border layer that
>> calls the OS (and that needs to be ported either way), the rest of the
>> code is happily char-based.
> I would be happy to learn this.
> How do I make the following program work with Visual C++ in Windows, using
> narrow character string?
> #include <stdio.h>
> #include <fcntl.h> // _O_U8TEXT
> #include <io.h> // _setmode, _fileno
> #include <windows.h>
> int main()
> {
>     //SetConsoleOutputCP( 65001 );
>     //_setmode( _fileno( stdout ), _O_U8TEXT );
>     printf( "Blåbærsyltetøy! 日本国 кошка!\n" );
> }
How will you make this program portable?
> The out-commented code is from my random efforts to Make It Work(TM).
> It refused.
This is because Windows narrow chars can't be UTF-8. You could make it
boost::printf("Blåbærsyltetøy! 日本国 кошка!\n");
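For illustration, here is a minimal sketch of the conversion a border
layer of this kind might perform before handing text to a wide-character
Windows function. The name utf8_to_utf16 is hypothetical; it is not taken
from Boost or from any library mentioned in this thread, and the sketch
does not reject overlong forms or encoded surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (an assumption for illustration, not a Boost API):
// decode a UTF-8 narrow string into UTF-16 code units.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;

        // The lead byte tells us how many bytes the sequence occupies.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes six payload bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On Windows the resulting code units could be copied into a wchar_t buffer
and passed to a wide API; on POSIX the UTF-8 string would normally be
passed through untouched, so only this thin layer differs per platform.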
> By the way, I'm hoping Boost isn't supporting old versions of g++.
> Because old versions of g++ choked on a BOM at start of UTF-8 encoded
> source code, while Visual C++ requires that BOM... So, UTF-8 source code
> ungood with old versions of g++, if Visual C++ is also used.
If you don't use wide chars, you can trick VC++ into using UTF-8 string
literals. Just save the file as UTF-8 *without* BOM. The compiler will then
embed the bytes of each literal verbatim into the executable.
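A small sketch of what "verbatim" means for the program, assuming the
compiler really does pass the bytes through unchanged. The literal below
spells out the UTF-8 bytes with escapes so that the example does not depend
on how this message or the source file is encoded; the names blabaer_utf8
and code_point_count are made up for the illustration.

```cpp
#include <cstddef>
#include <cstring>

// "Blåbær" as the bytes a BOM-less UTF-8 source file would contain:
// å is C3 A5 and æ is C3 A6. The literal is split so that the letter 'b'
// is not swallowed by the preceding hex escape.
const char* const blabaer_utf8 = "Bl\xC3\xA5" "b\xC3\xA6" "r";

// Count code points by skipping UTF-8 continuation bytes (10xxxxxx).
// Assumes the input is valid UTF-8.
std::size_t code_point_count(const char* s)
{
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

Here std::strlen(blabaer_utf8) counts 8 bytes while code_point_count
reports 6 characters, so char-based code keeps working on the string as
long as it treats the contents as opaque bytes.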
There's no need to be aware of the fact
>> that literals need to be quoted or that strlen should be spelled
>> _tcslen. There's no need to convert paths to an external representation
>> when writing them into a portable config/project file.
> Hm, I'm not so sure.
> I'd like to see this magic in action before believing in it, e.g., the
> program above working with narrow chars and printf, with Visual C++.
See above and see
> That's an unrelated issue, really, but I think Boost could use a "get
>>> undamaged program arguments in portable strings" thing, if it isn't
>>> there already?
>> We'll be back to the question of what constitutes a portable string. I'd
>> prefer UTF-8 on Windows and whatever was passed on POSIX. You'd prefer
> No, not TCHAR, which was designed for the C language (and is an ugly
> uppercase name to boot).
> Instead, like this:
> #include "u/stdio_h.h" // u::CodingValue, u::sprintf, U
> #undef UNICODE
> #define UNICODE
> #include <windows.h> // MessageBox
> int main()
> u::CodingValue buffer;
> sprintf( buffer, U( "The answer is %d!" ), 6*7 ); // Koenig lookup.
> U( "This is a title!" )->rawPtr(),
> MB_ICONINFORMATION | MB_SETFOREGROUND
You judge from a non-portable code point of view. How about:
#include "gtkext/message_box.h" // for gtkext::message_box
sprintf(buffer, "The answer is %d!", 6*7);
gtkext::message_box(buffer, "This is a title!", gtkext::icon_blah_blah,
And unlike your code, it's magically portable! (thanks to gtk using UTF-8 on