Subject: Re: [boost] boost filesystem path as utf-8?
From: Beman Dawes (bdawes_at_[hidden])
Date: 2012-01-23 14:52:48


On Mon, Jan 23, 2012 at 9:28 AM, Yakov Galka <ybungalobill_at_[hidden]> wrote:

> On Mon, Jan 23, 2012 at 14:47, Beman Dawes <bdawes_at_[hidden]> wrote:
>
> > On Mon, Jan 23, 2012 at 4:46 AM, Yakov Galka <ybungalobill_at_[hidden]>
> > wrote:
> > [...]
> >
> > > Unfortunately it boils to the interface whence you can
> > > get a c_str() to a UTF-16 string only.
> >
> > That's not correct.
> >
>
> It's correct. I state that path::c_str() returns UTF-16 on Windows. It's a
> fact. So the encoding isn't an implementation detail but a part of the
> interface.
>

As quoted above, you said only that "...the interface whence you can get a
c_str() to a UTF-16 string only."

The interface includes multiple observers that return values in various
encodings other than UTF-16. Those observers return string objects, so
calling c_str() on the returned object gives access to the value in that
encoding.
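
A minimal sketch of the point about observers, using std::filesystem::path
(the C++17 descendant of Boost.Filesystem) as a stand-in so that it builds
without Boost. The observers string() and wstring() also exist on
boost::filesystem::path; the struct `observed` and the function `observe`
are hypothetical names made up for this illustration.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical holder for the results of two observers.
struct observed {
    std::string  narrow;  // result of path::string()
    std::wstring wide;    // result of path::wstring()
};

// Each observer returns a string object by value, converting from the
// native representation if needed. Calling c_str() on the returned object
// then yields a pointer of the matching character type, independent of
// what path::c_str() itself returns on the platform.
inline observed observe(const fs::path& p) {
    return observed{p.string(), p.wstring()};
}
```

With this, observe(p).narrow.c_str() is a const char* and
observe(p).wide.c_str() is a const wchar_t* on every platform, whereas
p.c_str() is a const wchar_t* on Windows and a const char* on POSIX.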

During the design discussions, two other alternatives were considered. (1)
Always hold the path internally in a char string encoded as UTF-8. The cost
on Windows is that a conversion has to be done before every file system
operation. The cost on POSIX is that a double conversion has to be done
before every file system operation if the encoding is not UTF-8. (2) Hold
two strings internally, one in the native type and encoding, the other in
UTF-8. The cost is trying to keep them in sync, with the conversions that
implies, for some definition of "in sync".
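
To make the cost of alternative (1) concrete, here is a rough sketch of a
path-like class that always stores UTF-8 and converts to UTF-16 on demand.
Everything in it is hypothetical: `utf8_path`, `utf8_to_utf16`, and
`native_utf16` are names invented for this illustration, not part of
Boost.Filesystem, and the converter assumes well-formed UTF-8 input and does
no validation.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: convert well-formed UTF-8 to UTF-16.
// Malformed input is not detected; this is a sketch, not a validator.
inline std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i++]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes to read
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        for (std::size_t k = 0; k < extra && i < in.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical path type for alternative (1): UTF-8 is the only storage.
class utf8_path {
public:
    explicit utf8_path(std::string utf8) : utf8_(std::move(utf8)) {}

    // Cheap: returns the stored string, no conversion.
    const std::string& utf8() const { return utf8_; }

    // Costly: a fresh conversion on every call, which is what a Windows
    // implementation would have to do before each file system operation.
    std::u16string native_utf16() const { return utf8_to_utf16(utf8_); }

private:
    std::string utf8_;
};
```

The sketch uses std::u16string rather than std::wstring only to stay
portable; on Windows the native type would be wchar_t. Alternative (2) would
add a second, cached member and the logic to invalidate it on every
modification.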

If class std::basic_string itself had better support for string
interoperability, class path would be able to sidestep at least some of
the conversion headaches.

--Beman

