Subject: Re: [boost] [filesystem]Extracting path as string from wpath
From: Peter Dimov (pdimov_at_[hidden])
Date: 2008-10-20 09:06:48


Beman Dawes:

> Yes. The situation on POSIX systems is quite messy.

I'm not sure that it is as messy as is usually claimed. There are basically
two cases:

1. The filesystem is "8 bit neutral", that is, it stores the NTBS that is
passed exactly as-is (and returns it unmodified);

2. The filesystem uses UTF-16 (NTFS and HFS+). In this case, the OS
translates the NTBS to UTF-16 for storage (using the system codepage in
Windows, UTF-8 in Mac OS X, and the codepage specified at mount time on
Linux) and translates the UTF-16 name from the FS back when returning it to
the application. Note that the roundtrip on HFS+ may not produce the
original NTBS even for valid UTF-8 inputs because of the Unicode
normalization that occurs (but it does produce the original string, as read
by the user).
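
To illustrate the normalization point in (2), here is a minimal sketch of why
a name can come back as a different NTBS while still reading the same to the
user. The file name and the helper functions are hypothetical, chosen only for
illustration. The code does not touch any filesystem; it only compares the two
UTF-8 spellings of the same visible name, one precomposed (NFC) and one
decomposed (NFD).

```cpp
#include <string>

// Hypothetical file name "café.txt" with "é" precomposed:
// U+00E9 is encoded in UTF-8 as the two bytes C3 A9.
inline std::string name_nfc() { return "caf\xC3\xA9.txt"; }

// The same visible name with "é" decomposed:
// U+0065 ('e') followed by U+0301 (combining acute accent, bytes CC 81).
inline std::string name_nfd() { return "cafe\xCC\x81.txt"; }

// Byte-wise equality, which is all that strcmp or
// std::string::operator== can check on an NTBS.
inline bool same_bytes(const std::string& a, const std::string& b)
{
    return a == b;
}
```

An application that passes one spelling and gets the other back sees two
NTBSs that differ in both length and content, so same_bytes() returns false,
even though both render as the same name.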

Most of the perceived complexity comes from the fact that people living in
the (1) world can't comprehend that non-neutral filesystems exist and expect
to be able to (a) pass arbitrary byte strings to the OS and (b) get them
back unchanged. This leads to other mistaken beliefs, such as that it's
possible for the user to choose the encoding of the input.

