Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Peter Dimov (pdimov_at_[hidden])
Date: 2011-01-14 12:37:23


Dave Abrahams wrote:
> Let me try asking it differently: how do I program in an environment that
> has both "right" and "wrong" libraries?

There's really no good answer to that; it's, basically, a mess. You could
use UTF-8 everywhere in your code, pass that to "right" libraries as-is, and
only pass wchar_t[] to "wrong" libraries and the OS. This doesn't work when
the "wrong" libraries or the OS don't have a wide API though. And there's no
standard way of being wrong; some libraries use the OS narrow API, some
convert to wchar_t[] internally and use the wide API, using a variety of
encodings - the OS default (and there can be more than one), the C locale,
the C++ locale, or a global encoding that can be set per-library. It's even
more fun when supposedly portable libraries use different decoding
strategies depending on the platform.
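
As a rough illustration of the "UTF-8 everywhere, widen only at the boundary" approach, here is a minimal sketch of the conversion step. The name widen_utf8 is hypothetical, not part of any library mentioned in this thread, and the sketch assumes wchar_t holds UTF-16 when it is 16 bits wide and UTF-32 otherwise. It does not help with libraries that only offer a narrow API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from any library discussed here): decode a UTF-8
// std::string into wchar_t[] for handing to a "wrong" library or the OS.
// Assumption: 16-bit wchar_t means UTF-16, anything wider means UTF-32.
inline std::wstring widen_utf8(const std::string& in) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        unsigned long cp = 0;
        std::size_t len = 0;

        // The lead byte determines the sequence length.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes six payload bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }

        // Reject overlong forms, surrogates, and out-of-range values.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid UTF-8 code point");

        i += len;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t: emit a UTF-16 surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}
```

The application keeps std::string in UTF-8 throughout and calls something like this only at the point where a wide API is invoked.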

> Also, is there any use in trying to get the difference into the type
> system, e.g. by using some kind of wrapper over std::string that gives it
> a distinct "utf-8" type?

This could help; a hybrid right+wrong library ought probably to be able to
take either utf8_string or non_utf8_string, with the latter using
who-knows-what encoding. :-)
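
To make the type-system idea concrete, here is a small sketch of what such wrappers might look like. Only the names utf8_string and non_utf8_string come from the discussion; the members and the describe_encoding function are assumptions made up for illustration, and no validation of the bytes is attempted.

```cpp
#include <string>
#include <utility>

// Thin wrappers that carry the encoding in the type. The explicit
// constructors mean a bare std::string cannot be passed by accident.
class utf8_string {
public:
    explicit utf8_string(std::string bytes) : bytes_(std::move(bytes)) {}
    const std::string& bytes() const { return bytes_; }
private:
    std::string bytes_;  // the caller asserts these bytes are UTF-8
};

class non_utf8_string {
public:
    explicit non_utf8_string(std::string bytes) : bytes_(std::move(bytes)) {}
    const std::string& bytes() const { return bytes_; }
private:
    std::string bytes_;  // encoding unknown: OS default, C locale, ...
};

// Hypothetical entry points of a hybrid right+wrong library: overload
// resolution picks the decoding strategy, so the choice is visible at the
// call site instead of being hidden behind a global setting.
inline std::string describe_encoding(const utf8_string&) {
    return "utf-8";
}

inline std::string describe_encoding(const non_utf8_string&) {
    return "unknown";
}
```

With this, describe_encoding(utf8_string("name")) selects the UTF-8 overload, while passing a plain std::string does not compile, so the caller has to say which encoding is meant.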

The "bite the bullet" solution is just to demand "right" libraries and use
UTF-8 throughout.

