Subject: Re: [boost] boost filesystem path as utf-8?
From: Beman Dawes (bdawes_at_[hidden])
Date: 2012-01-23 14:52:48
On Mon, Jan 23, 2012 at 9:28 AM, Yakov Galka <ybungalobill_at_[hidden]> wrote:
> On Mon, Jan 23, 2012 at 14:47, Beman Dawes <bdawes_at_[hidden]> wrote:
> > On Mon, Jan 23, 2012 at 4:46 AM, Yakov Galka <ybungalobill_at_[hidden]>
> > wrote:
> > [...]
> > > Unfortunately it boils to the interface whence you can
> > > get a c_str() to a UTF-16 string only.
> > That's not correct.
> It's correct. I state that path::c_str() returns UTF-16 on Windows. It's a
> fact. So the encoding isn't an implementation detail but a part of the
As quoted above, you said only that "...the interface whence you can get a
c_str() to a UTF-16 string only."
The interface includes multiple observers that return values in encodings
other than UTF-16. The types those observers return provide their own
c_str(), which gives access to those values.
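As a rough illustration of the point about observers, here is a minimal sketch. It assumes C++17 std::filesystem::path as a stand-in for boost::filesystem::path (the observer names string(), wstring(), and c_str() are the same in both), and the helper names narrow_form, wide_form, and native_length are hypothetical, introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <string>

// Assumption: std::filesystem stands in for boost::filesystem here.
namespace fs = std::filesystem;

// Narrow (char) form of the path, obtained through the string() observer.
// Calling c_str() on the returned std::string yields a const char*,
// whatever the native character type of the platform is.
std::string narrow_form(const fs::path& p) { return p.string(); }

// Wide (wchar_t) form of the path, obtained through the wstring() observer.
std::wstring wide_form(const fs::path& p) { return p.wstring(); }

// Length of the native form as seen through path::c_str().
// path::value_type is wchar_t on Windows and char on POSIX.
std::size_t native_length(const fs::path& p) {
    return std::char_traits<fs::path::value_type>::length(p.c_str());
}
```

Only path::c_str() itself is tied to the native type; the other observers convert on request, and the caller then uses c_str() on whatever string type comes back.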
Two other alternatives came up during the design discussions. (1)
Always hold the path internally in a char string encoded UTF-8. The cost on
Windows is that a conversion has to be done before every file system
operation. The cost on POSIX is that a double conversion has to be done
before every file system operation if the encoding is not UTF-8. (2) Hold
two strings internally, one in the native type and encoding, the other in
UTF-8. The cost is trying to keep them in sync, with the conversions that
implies, for some definition of "in sync".
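To make the cost of alternative (1) concrete, here is a minimal sketch. Everything in it is hypothetical and not part of Boost.Filesystem: the class utf8_path, the helper utf8_to_utf16, and the conversion counter exist only for illustration, and the decoder assumes well-formed UTF-8 input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: decode UTF-8 (assumed well-formed) into UTF-16.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte determines the sequence length and the initial bits.
        if (b < 0x80)      { cp = b;        len = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }
        else               { cp = b & 0x07; len = 4; }
        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical path type for alternative (1): UTF-8 is the only stored form.
class utf8_path {
public:
    explicit utf8_path(std::string utf8) : utf8_(std::move(utf8)) {}

    // Reading the UTF-8 form costs nothing.
    const std::string& utf8() const { return utf8_; }

    // A Windows-style file system call needs UTF-16, so every call converts.
    std::u16string native_utf16() const {
        ++conversions_;
        return utf8_to_utf16(utf8_);
    }

    // Number of conversions performed so far (for illustration only).
    std::size_t conversions() const { return conversions_; }

private:
    std::string utf8_;
    mutable std::size_t conversions_ = 0;
};
```

Alternative (2) would cache the converted string next to the UTF-8 one, which removes the repeated conversion but means every mutating operation has to update or invalidate both copies.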
If class std::basic_string itself had better support for string
interoperability, class path would be able to sidestep at least some of
the conversion headaches.