Subject: Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]
From: Peter Dimov (pdimov_at_[hidden])
Date: 2011-01-19 16:02:02
Dave Abrahams wrote:
> > OK. You're designing a portable library that talks to the OS. It has
> > the following functions:
> > T get_path( ... );
> > void process_path( T );
> > What do you use for T? string or utf8_string?
> I'm even less of an expert on encodings at the OS boundary than I am
> an expert on encodings in general, but I'll take a shot at this.
> OK, according to all the experts (like you), we should be trafficking
> in UTF-8 everywhere, so I guess I'd say T is utf8_string (well, T is
> boost::filesystem::path, but that begs the same questions, ultimately).
My answer is different. T is std::string, and:
- on POSIX OSes, this string is taken directly from the OS and given
directly to the OS, without any conversion;
- on Windows, this string is UTF-8 and is converted to UTF-16 before being
given to the OS, and converted from UTF-16 after being received from it.
This conversion should tolerate broken UTF-16 because the OS does so as