Subject: Re: [boost] Silly Boost.Locale default narrow string encoding in Windows
From: Alf P. Steinbach (alf.p.steinbach+usenet_at_[hidden])
Date: 2011-10-27 14:41:09

On 27.10.2011 20:01, Peter Dimov wrote:
> Alf P. Steinbach wrote:
>> On 27.10.2011 18:47, Peter Dimov wrote:
>> > Alf P. Steinbach wrote:
>> >
>> >> However, I still ask:
>> >>
>> >> why FORCE INEFFICIENCY & AWKWARDNESS on Boost users -- why not just do
>> >> it right, using the platforms' native encodings.
>> >
>> > Comment out the imbue line.
>> But that line is much of the point, isn't it?
> There wouldn't be much point in calling imbue if you didn't want a
> change in the boost::filesystem default behavior, which is to convert
> using the ANSI CP (or the OEM CP if AreFileApisAnsi() returns false, if
> I'm not mistaken).

Oh there is.

It is a level of indirection.

You want Boost.Filesystem to assume /the same/ narrow character encoding
as Boost.Locale, whatever it is.

And to quote the docs where I found that program,

"Boost Locale fully supports both narrow and wide API. The default
character encoding is assumed to be UTF-8 on Windows."

>> > (The platform's native encoding is UTF-16. The "ANSI" code page, which
>> > is not necessarily ANSI or ANSI-like at all, despite your assertion,
>> The article you responded to did not contain the word "ANSI".
>> Thus, when you refer to an assertion about "ANSI", you have fantasized
>> something.

That's a different context and a different discussion, where it was
neither necessary nor natural to dot the i's and cross the t's to

Talk about dragging in things from out of the blue.

If you wanted to point out the possibility of e.g. a Japanese code page
as ANSI, then you should have done that over there, in that thread. I
mean in the context where it could make sense and where it could help
prevent readers from getting a wrong impression. If it was that important.


> Under Windows (NT+ and NTFS), the narrow character API is a wrapper over
> the wide character API. The system converts from/to the ANSI code page
> as needed. The narrowing conversion may lose data.

OK, we're just talking about two different meanings of "native", for two
different contexts: Windows internals and Windows applications.

The relevant context for discussing Boost.Locale's treatment of narrow
strings is the application level.

>> > [the program] will work fine until it's given a file name that is not
>> > representable in the ANSI CP.)
>> Nope, sorry, for any /reasonable interpretation/ of what you're writing.
> File names on NTFS are not necessarily representable in the ANSI code
> page. A program that uses narrow strings in the ANSI code page to
> represent paths will not necessarily be able to open all files on the
> system.

Right, that's one reason why modern Windows programs are best written
to be wchar_t-based. Other reasons include efficiency (avoiding
conversions) and simple convenience. Some API functions do not even have
narrow wrappers.
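The narrowing loss can be sketched portably. The helper `narrow_to_latin1` below is hypothetical (not a Windows or Boost API); it assumes ISO-8859-1 (Latin-1) as a simplified stand-in for a Western "ANSI" code page such as Windows-1252, and it uses `'?'` for unrepresentable characters, which is the typical default replacement in Windows' own wide-to-narrow conversion (that conversion may also apply "best fit" mappings for some characters).

```cpp
#include <string>

// Hypothetical helper: narrows a sequence of Unicode code points (UTF-32)
// to a single-byte encoding. ISO-8859-1 (Latin-1) stands in for a Western
// "ANSI" code page here (an assumption, not the real Windows-1252 table).
// Code points above U+00FF have no representation and become '?', so the
// conversion cannot be reversed. If `lossy` is non-null, it reports whether
// any character was replaced.
std::string narrow_to_latin1(const std::u32string& name, bool* lossy = nullptr)
{
    std::string out;
    out.reserve(name.size());
    bool lost = false;
    for (char32_t cp : name) {
        if (cp <= 0xFF) {
            // Latin-1 byte values equal their code points.
            out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
        } else {
            out.push_back('?');  // unrepresentable: information is lost
            lost = true;
        }
    }
    if (lossy) {
        *lossy = lost;
    }
    return out;
}
```

Two distinct names such as `日本.txt` and `中国.txt` both narrow to `??.txt`, and since `?` is not a legal character in Windows file names, a program that holds only the narrow string cannot open either file.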

However, a default assumption of UTF-8 encoding for narrow strings, as
in Boost.Locale, seems to me to clash with most uses of narrow strings.

For example, if you output UTF-8 on standard output, and then try to
pipe that through `more` in Windows' [cmd.exe], you get this:

d:\dave> chcp 65001
Active code page: 65001

d:\dave> echo "imagine this is utf8" | more
Not enough memory.

d:\dave> _

So UTF-8 is, to put it less than strongly, not very practical as a
general narrow-character encoding in Windows.

The example that I gave at the top of the thread was passing a `main`
argument further on, when using Boost.Locale. It causes trouble because
in Windows `main` arguments are by convention encoded as ANSI, while
Boost.Locale has UTF-8 as default. Treating ANSI as UTF-8 generally
yields gobbledygook, except for the pure ASCII common subset.
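Why ANSI text read as UTF-8 turns into gobbledygook can be checked mechanically. The sketch below, with the hypothetical name `is_valid_utf8`, validates a byte string against the UTF-8 encoding rules. The Windows-1252 encoding of "café" ends in the byte 0xE9, which UTF-8 treats as the lead byte of a three-byte sequence, so the string is rejected; pure ASCII passes unchanged.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8
// (no stray continuation bytes, no truncated or overlong sequences,
// no UTF-16 surrogates, nothing above U+10FFFF).
bool is_valid_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        char32_t cp;
        if (c < 0x80) { ++i; continue; }                  // ASCII: one byte
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                                // invalid lead byte
        if (n - i < len) return false;                    // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;        // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Smallest code point each sequence length may legally encode.
        static const char32_t min_for_len[5] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[len]) return false;          // overlong form
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate
        if (cp > 0x10FFFF) return false;                  // out of range
        i += len;
    }
    return true;
}
```

The same bytes that fail this check are the ones a UTF-8-assuming library must either reject or replace, which is where the garbage in the ANSI-encoded `main` argument case comes from.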

But with ANSI as the Boost.Locale default, which is the more reasonable
choice, the imbue call would not cause trouble; it would instead help to
avoid trouble, which is surely the original intention.
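With UTF-8 as the default, the burden falls on the program to transcode ANSI-encoded strings such as `main` arguments before handing them on. A minimal sketch, again assuming ISO-8859-1 as a simplified stand-in for the ANSI code page (real Windows-1252 differs in the range 0x80-0x9F), with the hypothetical name `latin1_to_utf8`; a real Windows program would instead let the system convert from the active ANSI code page, for example via `MultiByteToWideChar` with `CP_ACP` followed by `WideCharToMultiByte` with `CP_UTF8`.

```cpp
#include <string>

// Hypothetical helper: transcodes ISO-8859-1 bytes to UTF-8.
// Every Latin-1 byte maps to the code point with the same value,
// so bytes below 0x80 are copied as-is and the rest become
// two-byte UTF-8 sequences.
std::string latin1_to_utf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size() * 2);
    for (char ch : in) {
        const unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            out.push_back(ch);                                    // ASCII subset
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));    // 110xxxxx
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));  // 10xxxxxx
        }
    }
    return out;
}
```

Only after such a conversion does the string match the UTF-8 assumption; skipping it is exactly the mismatch described above.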

Cheers & hth.,

- Alf