Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2011-01-15 08:54:35

On 15/01/2011 14:29, Artyom wrote:
>>> namespace boost {
>>> std::string utf8_to_ansi(std::string const&s);
>>> std::string ansi_to_utf8(std::string const&s);
>>> std::wstring utf8_to_wide(std::string const&s);
>>> std::string wide_to_utf8(std::wstring const&s);
>>> }
>> ANSI doesn't really mean much.
>> It's purely a windows thing.
>> utf8_to_locale, which would take a std::locale object, would make more
>> sense.
> 1. std::locale-based conversion using the std::codecvt facet depends strongly
> on the current implementation, and this is a bad point to start from.

It is "reasonably reasonable" to assume that the locale's wide-character
encoding is UTF-16 or UTF-32.
As far as I know, some IBM mainframes are the only platforms where this
is not the case.

Therefore you can portably convert a string from a locale's encoding to
UTF-8 by using std::codecvt<wchar_t, char, std::mbstate_t> to convert it
to UTF-16 or UTF-32, converting that UTF-16 to UTF-32 if needed, and then
converting the result to UTF-8.

That is, of course, not very efficient, especially when you're unable to
pipeline those conversions.

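The two-step route described above can be sketched roughly as follows.
This is only an illustration under the stated assumption that the wide
encoding is UTF-16 or UTF-32; the names locale_to_wide, wide_to_utf8 and
locale_to_utf8 are made up for this example and are not a proposed Boost
interface. The UTF-8 encoder is hand-written to keep the sketch
self-contained.

```cpp
#include <cassert>
#include <cwchar>      // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>

// Step 1: narrow string in the encoding of `loc` -> wide string.
std::wstring locale_to_wide(std::string const& s, std::locale const& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    facet_type const& cvt = std::use_facet<facet_type>(loc);

    std::wstring out(s.size(), L'\0');  // each wide char consumes >= 1 byte
    if (s.empty())
        return out;

    std::mbstate_t state = std::mbstate_t();
    char const* from_next = 0;
    wchar_t* to_next = 0;
    std::codecvt_base::result r = cvt.in(
        state, s.data(), s.data() + s.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("input is not valid in this locale");
    out.resize(to_next - &out[0]);
    return out;
}

// Append one Unicode code point to `out` as UTF-8.
void append_utf8(std::string& out, unsigned long cp)
{
    assert(cp <= 0x10FFFFul);  // callers validate the range first
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Step 2: wide string (assumed UTF-16 or UTF-32) -> UTF-8.
// Surrogate pairs are combined, which covers the UTF-16 case.
std::string wide_to_utf8(std::wstring const& w)
{
    std::string out;
    for (std::wstring::size_type i = 0; i < w.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(w[i]);
        bool high = cp >= 0xD800 && cp <= 0xDBFF;
        bool low = cp >= 0xDC00 && cp <= 0xDFFF;
        if (high && i + 1 < w.size()) {
            unsigned long next = static_cast<unsigned long>(w[i + 1]);
            if (next >= 0xDC00 && next <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (next - 0xDC00);
                ++i;
                high = false;
            }
        }
        if (high || low || cp > 0x10FFFF)
            throw std::range_error("invalid wide character");
        append_utf8(out, cp);
    }
    return out;
}

// Both steps chained; the intermediate wide string is the cost noted above.
std::string locale_to_utf8(std::string const& s, std::locale const& loc)
{
    return wide_to_utf8(locale_to_wide(s, loc));
}
```

A call would look like locale_to_utf8(text, std::locale("")), with the
caveat that which named locales exist depends on the system.
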
> 2. utf8_to_ansi and its inverse should not be used outside of Windows,
> where ANSI means the narrow Windows API (a.k.a. the ANSI API).

Good code is code that doesn't expose platform-specific details.

The name ANSI is so bad (it means American National Standards Institute,
even though Windows locales have nothing to do with that body) that I'd
rather not put that in any function I'd use in real code.

> 3. On non-Windows platforms these should not do anything to strings and
> should pass them through as is, since the native POSIX API is narrow,
> not wide.

Yet you still need to convert between UTF-8 and the POSIX locale encodings.
Even if most recent POSIX systems use UTF-8 as their locale encoding,
there is no guarantee of that.
Indeed, quite a few still run in Latin-1.

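As a rough illustration of why a pass-through is not safe on POSIX, a
program can ask the environment's locale which character set it uses
before deciding whether a conversion is needed. The helper names
is_utf8_codeset and current_codeset below are invented for this sketch;
nl_langinfo(CODESET) is the POSIX call, and the spelling of the name it
returns varies between systems, hence the normalisation.

```cpp
#include <cctype>
#include <clocale>
#include <string>
#include <langinfo.h>  // POSIX only: nl_langinfo, CODESET

// True if a codeset name denotes UTF-8. Only letters and digits are
// compared, case-insensitively, so "UTF-8", "utf8" and "UTF8" all match.
bool is_utf8_codeset(std::string const& name)
{
    std::string norm;
    for (std::string::size_type i = 0; i < name.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(name[i]);
        if (std::isalnum(c))
            norm += static_cast<char>(std::tolower(c));
    }
    return norm == "utf8";
}

// Codeset of the locale selected by the environment (LANG, LC_ALL, ...).
// Note: this changes the process-wide LC_CTYPE setting.
std::string current_codeset()
{
    std::setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}
```

If is_utf8_codeset(current_codeset()) is false, for example on a system
running in Latin-1, UTF-8 strings have to be converted before they are
handed to the narrow API.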