Subject: Re: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet
From: Beman Dawes (bdawes_at_[hidden])
Date: 2015-10-09 10:17:51

> To be honest I don't know what guys who designed <codecvt> in first place

It was done in the early and mid 1990's, with primary input coming from
Asian national bodies and the now long gone Unix vendors who had a big
presence in that market.

> thought of - I feel string influence of broken MS Unicode policies

This was years before Microsoft folks started to participate in the LWG.

> So I'm not going to implement C++11 <codecvt> because IMHO it is broken by
> design in first
> place.

Header <codecvt> isn't what we need, as you point out below.

> Boost.Locale provides one but currently it is deep internal and complex
> part of library.
> The code I written for Boost.Nowide or one I suggest to put into
> Boost.Locale header-only part
> is codecvt that converts between utf8 and utf-16/32 according to size of
> character:
> boost::(nowide|or locale)::utf8_facet<wchar_t> - utf-8 to utf-16 (windows)
> utf-32 (posix)

Don't forget utf-8 to utf-8 (some embedded systems).
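
To make the size-based selection concrete, here is a minimal sketch of
choosing the wide-side encoding from sizeof(CharT). All names below
(wide_encoding, encoding_for_size, encoding_of) are hypothetical and are
not the Boost.Nowide or Boost.Locale API; the sketch only assumes that a
1-byte character means utf-8, a 2-byte character utf-16, and a 4-byte
character utf-32.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration only; not an existing Boost API.
enum class wide_encoding { utf8, utf16, utf32 };

// Primary template is left undefined so that an unsupported character
// size fails at compile time instead of silently picking an encoding.
template <std::size_t Size>
struct encoding_for_size;

template <>
struct encoding_for_size<1> {  // e.g. char: utf-8 to utf-8
    static constexpr wide_encoding value = wide_encoding::utf8;
};

template <>
struct encoding_for_size<2> {  // e.g. wchar_t on Windows: utf-16
    static constexpr wide_encoding value = wide_encoding::utf16;
};

template <>
struct encoding_for_size<4> {  // e.g. wchar_t on POSIX: utf-32
    static constexpr wide_encoding value = wide_encoding::utf32;
};

// Maps a character type to the encoding implied by its size.
template <typename CharT>
constexpr wide_encoding encoding_of()
{
    return encoding_for_size<sizeof(CharT)>::value;
}
```

A facet template could branch on encoding_of<CharT>() to pick its
conversion routine, so the same utf8_facet<wchar_t> spelling would work
on both platforms.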

IMO, a critical aspect of all of those, including utf-8 to utf-8, is that
they detect all utf-8 errors, since ill-formed utf-8 is used as an attack
vector.

See Markus Kuhn's

I can contribute a Boost regression test friendly version of Kuhn's
malformed tests.
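
As a rough illustration of the kind of error detection meant above, here
is a minimal sketch of a strict utf-8 well-formedness check. The function
name is_well_formed_utf8 is hypothetical; this is not code from
Boost.Nowide, Boost.Locale, or Kuhn's test file. It assumes the usual
Unicode rules: reject stray or missing continuation bytes, overlong
forms, surrogates, and code points above U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration; not part of any Boost library.
// Returns true only if every byte of s belongs to a well-formed utf-8
// sequence.
inline bool is_well_formed_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t trail = 0;     // continuation bytes expected
        unsigned long cp = 0;      // decoded code point
        unsigned long min_cp = 0;  // smallest value legal at this length

        if (lead < 0x80)                { cp = lead;        trail = 0; min_cp = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; trail = 1; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; trail = 2; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; trail = 3; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (n - i - 1 < trail) return false;  // sequence truncated at end

        for (std::size_t k = 1; k <= trail; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min_cp) return false;                    // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate
        if (cp > 0x10FFFF) return false;                  // beyond Unicode range

        i += trail + 1;
    }
    return true;
}
```

The overlong check is the one most relevant to attacks: the two-byte
sequence 0xC0 0xAF decodes to '/' in a lax decoder, which can let a path
separator slip past a filter that only looks for the single byte 0x2F.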

