Subject: Re: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet
From: Peter Dimov (lists_at_[hidden])
Date: 2015-10-09 10:41:12


Beman Dawes wrote:

> IMO, a critical aspect of all of those, including utf-8 to utf-8, is that
> they detect all utf-8 errors since ill-formed utf-8 is used as an attack
> vector.

That is what I alluded to earlier with my bikeshedding comment - I
personally find this policy a bit too firm for my taste. Sure, sometimes I
do want to reject any invalid UTF-8 with extreme prejudice, but at other
times I do not. For instance, when I get a Windows file name, it can well be
invalid UTF-16, which when converted will become invalid UTF-8 but which
will roundtrip correctly back to its original invalid UTF-16 form and refer
to the same file. That's why things like CESU-8 or WTF-8 exist.
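
To make the round trip concrete, here is a minimal sketch of a WTF-8 style conversion. It is not taken from nowide or Boost.Locale; the function names utf16_to_wtf8 and wtf8_to_utf16 are hypothetical, and the decoder assumes its input was produced by the encoder, so it does no validation beyond a truncation check. The only departure from strict UTF-8 is that an unpaired surrogate is encoded as a three-byte sequence instead of being rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: encode UTF-16 code units in a WTF-8 style.
// Proper surrogate pairs become one 4-byte sequence, as in UTF-8.
// An unpaired surrogate is kept and encoded as a 3-byte sequence.
std::string utf16_to_wtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        bool high = cp >= 0xD800 && cp <= 0xDBFF;
        if (high && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a valid high/low pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // Lone surrogates land here and are encoded like any BMP value.
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: inverse of utf16_to_wtf8.
// Assumes well-formed input from the encoder above; no error detection.
std::u16string wtf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t n; // number of continuation bytes that follow
        if (b < 0x80)      { cp = b;        n = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; n = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; n = 2; }
        else               { cp = b & 0x07; n = 3; }
        ++i;
        assert(i + n <= in.size()); // precondition: input is not truncated
        for (; n > 0; --n, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        if (cp >= 0x10000) {
            // Split a supplementary code point back into a surrogate pair.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<char16_t>(cp);
        }
    }
    return out;
}
```

With these two functions, a name such as { 'a', 0xD800, 'b' } converts to the bytes 61 ED A0 80 62, which a strict UTF-8 validator would reject, and converts back to the same three UTF-16 code units. CESU-8 differs in that it encodes every surrogate separately, including properly paired ones.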

So I like the "method" argument of locale::conv::utf_to_utf, except that I
think that it doesn't offer enough control.

