Subject: Re: [boost] Silly Boost.Locale default narrow string encoding in Windows
From: Peter Dimov (pdimov_at_[hidden])
Date: 2011-10-29 10:41:03


Alf P. Steinbach wrote, about chcp 65001:

> it breaks a hell of a lot more than batch files. Try `more`.

Yes. Life isn't perfect. Incidentally, 'more' demonstrates once again the
superiority of UTF-8 (if it worked):

C:\Projects\testbed\tmp>dir
Volume in drive C has no label.
Volume Serial Number is 34C7-A38D

Directory of C:\Projects\testbed\tmp

29.10.2011 17:28 <DIR> .
29.10.2011 17:28 <DIR> ..
29.10.2011 17:25 0 Blåbærsyltetøy! 日本国 кошка!.txt
               1 File(s) 0 bytes
               2 Dir(s) 856,726,167,552 bytes free

C:\Projects\testbed\tmp>dir | more
Volume in drive C has no label.
Volume Serial Number is 34C7-A38D

Directory of C:\Projects\testbed\tmp

29.10.2011 17:28 <DIR> .
29.10.2011 17:28 <DIR> ..
29.10.2011 17:25 0 Blåbærsyltetoy! ??? ?????!.txt
               1 File(s) 0 bytes
               2 Dir(s) 856,726,167,552 bytes free

The "dir" command has no problem displaying arbitrary file names directly to
the console (presumably via WriteConsoleW), but once its output is redirected
to a file or a pipe, as in "dir | more", it needs to convert to a narrow
encoding, and no code page other than 65001 can express the above file name.
(My default console code page is 437, which doesn't even have ø, hence the
plain "o" in the piped output. The Consolas font doesn't have glyphs for 日本国,
but the characters are present, just not displayable, which is why I could
copy and paste them here.)
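
To make the narrow conversion concrete, here is a minimal Python sketch of what happens to that file name when it is squeezed into code page 437 versus UTF-8. The helper name to_narrow is made up for illustration, and Python's "replace" error handler is only a stand-in for whatever Windows does internally: it turns every unmappable character into "?", whereas the piped output above shows Windows substituting a best-fit "o" for ø.

```python
# The file name from the directory listing above.
NAME = "Blåbærsyltetøy! 日本国 кошка!.txt"


def to_narrow(text: str, code_page: str) -> str:
    """Hypothetical helper: encode to a narrow code page and decode back.

    Characters the code page cannot express are replaced with '?', so the
    return value shows what would survive the trip through narrow bytes.
    """
    return text.encode(code_page, errors="replace").decode(code_page)


via_437 = to_narrow(NAME, "cp437")    # lossy: no ø, no CJK, no Cyrillic
via_utf8 = to_narrow(NAME, "utf-8")   # lossless: UTF-8 covers all of Unicode

# Each lost character becomes exactly one '?', so the strings stay aligned.
lost = [ch for ch, out in zip(NAME, via_437) if ch != out]

print(len(lost))          # number of characters cp437 could not express
print(via_utf8 == NAME)   # True if the UTF-8 round trip preserved the name
```

Under these assumptions å and æ survive code page 437 while ø, the three kanji and the five Cyrillic letters do not, which matches the pattern of question marks in the piped listing apart from the best-fit "o".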

It would've been nice for Microsoft to set all the narrow code pages to
UTF-8 in Windows NT (or Windows 64 bit, the other transition point), but
they didn't, so here we are.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk