Subject: Re: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet
From: Peter Dimov (lists_at_[hidden])
Date: 2015-10-08 15:14:48


Artyom Beilis wrote:
> The code I have written for Boost.Nowide, or the one I suggest putting
> into the header-only part of Boost.Locale, is a codecvt that converts
> between UTF-8 and UTF-16/32 according to the size of the character type:
>
> boost::(nowide|or locale)::utf8_facet<wchar_t>
>     - utf-8 to utf-16 (Windows) or utf-32 (POSIX)
> boost::(nowide|or locale)::utf8_facet<char16_t>
>     - utf-8 to utf-16 on any platform
> boost::(nowide|or locale)::utf8_facet<char32_t>
>     - utf-8 to utf-32 on any platform
>
> That's it. It isn't <codecvt> because C++11 <codecvt> does not actually do
> the job needed.

I agree that this makes the most sense. I only brought up <codecvt> because
if we used the standard interface and names we wouldn't have needed a full
review of the hypothetical libs/codecvt.

As this stands, libs/utility seems the best bet, although I'm not overly
fond of the practice of putting everything that doesn't fit elsewhere into
Utility. :-) But it's better than Detail because it's documented and tested.

One could make the case for libs/utf8 which would contain utf8_facet and the
"obvious"

    bool is_valid_utf8( string const & s );
    wstring utf8_decode( string const & s );
    string utf8_encode( wstring const & s );

but this is already well into full review/bikeshed territory.
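
For illustration only, here is a minimal sketch of what those three
functions might look like. Nothing here is an existing Boost interface: the
helper next_code_point is hypothetical, throwing std::invalid_argument on
bad input is an assumption, and the wstring side follows the same rule as
the facet (UTF-16 when wchar_t is 16 bits wide, UTF-32 otherwise).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the thread): decodes the code point that
// starts at s[i] and advances i past it. Returns false on a bad lead byte,
// truncation, an overlong form, a surrogate, or a value above U+10FFFF.
inline bool next_code_point(std::string const& s, std::size_t& i, char32_t& cp)
{
    unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t len;
    char32_t lowest; // smallest value allowed for this sequence length
    if (b0 < 0x80)                { cp = b0;        len = 1; lowest = 0; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; lowest = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; lowest = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; lowest = 0x10000; }
    else return false;
    if (len > s.size() - i) return false; // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return false; // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < lowest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;
    i += len;
    return true;
}

inline bool is_valid_utf8(std::string const& s)
{
    char32_t cp;
    for (std::size_t i = 0; i < s.size();)
        if (!next_code_point(s, i, cp)) return false;
    return true;
}

// Assumption: invalid input throws rather than being replaced by U+FFFD.
inline std::wstring utf8_decode(std::string const& s)
{
    std::wstring out;
    char32_t cp;
    for (std::size_t i = 0; i < s.size();) {
        if (!next_code_point(s, i, cp))
            throw std::invalid_argument("invalid UTF-8");
        if (sizeof(wchar_t) == 2 && cp >= 0x10000) { // UTF-16 surrogate pair
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}

inline std::string utf8_encode(std::wstring const& s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t cp = static_cast<char32_t>(s[i]);
        // With a 16-bit wchar_t, join a high/low surrogate pair first.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < s.size()) {
            char32_t lo = static_cast<char32_t>(s[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Whether invalid input should throw, be replaced, or be reported through an
error code is exactly the kind of question a review would have to settle;
the sketch just picks one option.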

