Subject: Re: [boost] Silly Boost.Locale default narrow string encoding in Windows
From: Peter Dimov (pdimov_at_[hidden])
Date: 2011-10-27 17:56:59

Alf P. Steinbach wrote:
> On 27.10.2011 21:07, Peter Dimov wrote:
> > Alf P. Steinbach wrote:
> >> Right, that's one reason why modern Windows programs should best be
> >> wchar_t based.
> >
> > This is one of the two options. The other is using UTF-8 for
> > representing paths as narrow strings. The first option is more natural
> > for Windows-only code, and the second is better, in practice, for
> > portable code because it avoids the need to duplicate all path-related
> > functions for char/wchar_t. The motivation for using UTF-8 is practical,
> > not political or religious.
> Thanks for that clarification of the current thinking at Boost.

My opinion is not representative of all of Boost, although I've found that
there is substantial agreement among people who write portable software
that needs to deal with paths: option 2, UTF-8, is the way to go.

> 3. the most natural sufficiently general native encoding, 1 or 2
> depending on the platform that the source is being built for.

Yes, with its various suboptions. 3a, TCHAR, 3b, template on char_type, 3c,
providing both char and wchar_t overloads. They all have their problems;
people don't move to UTF-8 merely out of spite.
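
To make 3b and 3c concrete, here is a minimal sketch. parent_dir and
parent_dir_3c are made-up helpers for illustration only, not Boost or
Windows functions. It shows that the character type leaks into every
path-related signature, and that even a separator literal has to be
spelled in a char-type-neutral way:

```cpp
#include <string>

// 3b: one body, templated on the character type. Literals cannot be
// written as "/" or L"/" here; they have to be built from Ch.
template <class Ch>
std::basic_string<Ch> parent_dir(const std::basic_string<Ch>& path)
{
    const Ch seps[] = { Ch('/'), Ch('\\'), Ch(0) };
    typename std::basic_string<Ch>::size_type pos = path.find_last_of(seps);
    if (pos == std::basic_string<Ch>::npos)
        return std::basic_string<Ch>();   // no directory component
    return path.substr(0, pos);
}

// 3c: every path-related function appears twice in the interface.
// Here both overloads simply forward to the template above.
inline std::string parent_dir_3c(const std::string& path)
{
    return parent_dir(path);
}

inline std::wstring parent_dir_3c(const std::wstring& path)
{
    return parent_dir(path);
}
```

With UTF-8 as the single narrow encoding, only the std::string version
would be needed.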

> Prior art in this direction, includes Microsoft's [tchar.h].

This works, more or less, once you've accumulated the appropriate library of
_T macros, _t functions and T/t typedefs. I've never heard of it actually
being used for a portable code base, but I admit that it's possible to do
things this way, even if it's somewhat alien to POSIX people.
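
For readers who have not seen the approach, the following is a rough
sketch of what such a library starts to look like. All names here (tchar,
T_, tstrlen, tstring and the MY_WIDE_BUILD switch) are hypothetical
stand-ins chosen so as not to clash with the real tchar.h, which uses
TCHAR, _T and _tcslen:

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <string>

#ifdef MY_WIDE_BUILD              // hypothetical build switch
typedef wchar_t tchar;
#define T_(x) L##x                // wide literal
inline std::size_t tstrlen(const tchar* s) { return std::wcslen(s); }
#else
typedef char tchar;
#define T_(x) x                   // narrow literal, unchanged
inline std::size_t tstrlen(const tchar* s) { return std::strlen(s); }
#endif

typedef std::basic_string<tchar> tstring;

// Client code has to use the neutral spellings everywhere.
inline std::size_t config_name_length()
{
    const tchar* name = T_("config.ini");
    return tstrlen(name);
}
```

Every C string function the code base uses needs a wrapper of this kind,
and every literal needs the macro.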

The advantage of using UTF-8 is that, apart from the border layer that calls
the OS (and that needs to be ported either way), the rest of the code is
happily char[]-based. There's no need to be aware of the fact that literals
need to be quoted or that strlen should be spelled _tcslen. There's no need
to convert paths to an external representation when writing them into a
portable config/project file.
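
A minimal sketch of such a border layer, assuming a hypothetical wrapper
named utf8_fopen (not an existing Boost function). On Windows it converts
UTF-8 to UTF-16 with MultiByteToWideChar and calls _wfopen; elsewhere it
passes the bytes straight to fopen:

```cpp
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>

// UTF-8 -> UTF-16, used only inside the border layer.
static std::wstring widen(const std::string& s)
{
    int n = MultiByteToWideChar(CP_UTF8, 0, s.c_str(), -1, NULL, 0);
    if (n <= 0) return std::wstring();
    std::wstring w(n, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, s.c_str(), -1, &w[0], n);
    w.resize(n - 1);              // drop the terminating null the API wrote
    return w;
}
#endif

// Hypothetical border function: callers always pass UTF-8 in a plain
// std::string and never see wchar_t.
std::FILE* utf8_fopen(const std::string& path, const char* mode)
{
#ifdef _WIN32
    return _wfopen(widen(path).c_str(), widen(mode).c_str());
#else
    return std::fopen(path.c_str(), mode);   // path bytes passed as-is
#endif
}
```

The rest of the program can then call utf8_fopen("data/\xC3\xA9.txt", "r")
with an ordinary narrow literal, and write the same bytes unchanged into
a config file.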

> That's an unrelated issue, really, but I think Boost could use a "get
> undamaged program arguments in portable strings" thing, if it isn't there
> already?

We'll be back to the question of what constitutes a portable string. I'd
prefer UTF-8 on Windows and whatever was passed on POSIX. You'd prefer
