Subject: Re: [boost] [General] Always treat std::strings as UTF-8? (was [Process] List of small issues)
From: Peter Dimov (pdimov_at_[hidden])
Date: 2011-01-14 09:09:02


Alexander Lamaison wrote:

> > On Fri, 14 Jan 2011 00:48:43 -0800 (PST), Artyom wrote:
...
> > Two problems with this approach:
> >
> > - Even if the encoding under POSIX platforms is not UTF-8 you will
> > still be able to open files, close them, stat them and do any
> > other operations regardless of encoding, as the POSIX API is encoding
> > agnostic; this is why it works well.
>
> This isn't a problem, right? This is exactly why it _does_ work :D
> Assume the strings are in OS-default encoding, don't mess with them,
> hand them to the OS API which knows how to treat them.

It doesn't always work. On Mac OS X, the paths must be UTF-8; the OS isn't
encoding-agnostic, because the HFS+ file system stores file names as UTF-16
(much like NTFS). You can achieve something similar on Linux by mounting an
HFS+ or NTFS file system; the encoding is then specified at mount time and
should also be observed. Of course, file systems that store file names as
arbitrary null-terminated byte sequences are typically encoding-agnostic.

For my own code, I've gradually reached the conclusion that I should always
use UTF-8 encoded narrow paths. This may not be feasible for a library (yet)
because people still insist on using other encodings on Unix-like OSes,
usually koi8-r. :-) I'm anxiously awaiting the day everyone in the
Linux/Unix world will finally switch to UTF-8 so we can be done with this
question once and for all.
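
To illustrate what "always use UTF-8 encoded narrow paths" could mean in
practice, here is a minimal sketch of a check that a narrow string is
well-formed UTF-8 before it is treated as a path. The function name
is_valid_utf8 is hypothetical; it is not part of Boost or of any library
discussed in this thread, and the check is only one possible way to enforce
such a convention.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not a Boost API: returns true if s is well-formed
// UTF-8 (no stray or missing continuation bytes, no overlong forms, no
// surrogates, nothing above U+10FFFF).
bool is_valid_utf8(const std::string& s)
{
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };

    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;

        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;            // continuation byte or invalid lead byte

        if (i + len > n) return false; // sequence truncated at end of string

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (len > 1 && cp < min_cp[len]) return false; // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                // outside Unicode

        i += len;
    }
    return true;
}
```

A file name written in koi8-r will usually fail this check, because its
bytes rarely form valid UTF-8 sequences. A check like this can only reject
input; it cannot tell which legacy encoding the bytes were actually in.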

