Subject: Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]
From: Dave Abrahams (dave_at_[hidden])
Date: 2011-01-19 16:32:54


At Wed, 19 Jan 2011 23:02:02 +0200,
Peter Dimov wrote:
>
> Dave Abrahams wrote:
> ...
> > > OK. You're designing a portable library that talks to the OS. It has
> > > the following functions:
> > >
> > > T get_path( ... );
> > > void process_path( T );
> > >
> > > What do you use for T? string or utf8_string?
> >
> > I'm even less of an expert on encodings at the OS boundary than I am
> > an expert on encodings in general, but I'll take a shot at this
> > one.
> >
> > OK, according to all the experts (like you), we should be trafficking
> > in UTF-8 everywhere, so I guess I'd say T is utf8_string (well, T is
> > boost::filesystem::path, but that begs the same questions, ultimately).
>
> My answer is different. T is std::string, and:
>
> - on POSIX OSes, this string is taken directly from the OS and given
> directly to the OS, without any conversion;
>
> - on Windows, this string is UTF-8 and is converted to UTF-16 before
> being given to the OS, and converted from UTF-16 after being received
> from it. This conversion should tolerate broken UTF-16 because the OS
> does so as well.
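Here is a minimal C++ sketch of the Windows half of Peter's proposal, the UTF-16 to UTF-8 direction, written so that it tolerates ill-formed input. It assumes that "broken UTF-16" means unpaired surrogates, which Windows accepts in file names. It keeps each one by encoding it on its own as a 3-byte sequence (the convention sometimes called WTF-8) instead of rejecting or replacing it. The name utf16_to_utf8_lossless is hypothetical and is not a Boost or standard library API. std::u16string stands in for whatever wide string type the OS layer actually uses.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Boost or std API): UTF-16 -> UTF-8 that never fails.
// Assumption: "tolerate broken UTF-16" means lone surrogates are kept, each
// encoded as its own 3-byte sequence (WTF-8 style), rather than rejected.
std::string utf16_to_utf8_lossless(const std::u16string& in) {
    std::string out;
    out.reserve(in.size() * 3);
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        // A high surrogate immediately followed by a low surrogate forms one
        // supplementary code point (U+10000..U+10FFFF).
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        // Otherwise cp is a BMP code point or a lone surrogate; both are
        // encoded by value, so no input unit is dropped.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Every input unit maps to distinct output bytes, so a matching reverse conversion could hand the OS back exactly the name it returned. The price is that the output is not strictly valid UTF-8 whenever a lone surrogate is present.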

A fine answer if:

a. you think the interface to std::string is a good one for posterity,
   and

b. every other std::string that might be used along with your portable
   library is guaranteed to be utf-8 encoded.

But I don't agree with a), and the interface to std::string makes a
future where b) holds look highly unlikely to me.

I prefer to have semantic constraints/invariants like "this is UTF-8
encoded" represented in the type system and enforced by public library
interfaces. I'm arguing for a future like that.
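Dave's preference can be sketched as a type whose only entry point for raw bytes is a checked constructor. The class below is a hypothetical utf8_string, not a Boost or std type, and only the part that carries the invariant is shown. It assumes strict UTF-8 as defined in RFC 3629. It rejects truncated sequences, overlong forms, encoded surrogates, and code points above U+10FFFF, and it throws std::invalid_argument in those cases.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical type (not Boost or std): bytes guaranteed to be well-formed UTF-8.
// The checked constructor is the only way in, and there are no mutators, so any
// function taking a utf8_string can rely on the invariant without re-checking.
class utf8_string {
public:
    explicit utf8_string(std::string bytes) : bytes_(std::move(bytes)) {
        if (!is_valid_utf8(bytes_))
            throw std::invalid_argument("utf8_string: input is not valid UTF-8");
    }

    const std::string& bytes() const noexcept { return bytes_; }

    // Strict validation (assumed RFC 3629 rules).
    static bool is_valid_utf8(const std::string& s) {
        std::size_t i = 0;
        while (i < s.size()) {
            unsigned char c = static_cast<unsigned char>(s[i]);
            if (c < 0x80) { ++i; continue; }  // ASCII
            std::size_t len;
            char32_t cp;
            if (c >= 0xC2 && c <= 0xDF)      { len = 2; cp = c & 0x1F; }
            else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
            else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
            else return false;  // stray continuation byte, 0xC0/0xC1, or 0xF5..0xFF
            if (i + len > s.size()) return false;  // truncated sequence
            for (std::size_t k = 1; k < len; ++k) {
                unsigned char cc = static_cast<unsigned char>(s[i + k]);
                if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
                cp = (cp << 6) | (cc & 0x3F);
            }
            // Reject overlong 3- and 4-byte forms, surrogates, and values past U+10FFFF.
            if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
                (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
                return false;
            i += len;
        }
        return true;
    }

private:
    std::string bytes_;
};
```

A strict check like this rejects the 3-byte lone-surrogate encodings that a tolerant Windows conversion would have to produce. A design along these lines therefore needs a separate answer for path names that are not valid Unicode.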

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
