Subject: [boost] [Locale] New utf-8 codecvt facet in master (replacement of boost/detail/utf8_codecvt_facet)
From: Artyom Beilis (artyomtnk_at_[hidden])
Date: 2015-10-21 02:02:31


Hello,

This is a follow-up to the previous discussion regarding the UTF-8 facet in Boost.

I have merged the changes into the master branch. It now contains a new UTF-8 codecvt
facet that properly handles both UTF-16 and UTF-32 encodings for wchar_t (or
char16_t/char32_t).

The major goal is to replace the broken (*) UTF-8 facet that currently lives in
boost/detail/utf8_codecvt_facet.hpp/ipp.

It is header-only, so all you need to do is include:

    #include <boost/locale/utf8_facet.hpp>

And install it as usual:

    std::locale new_locale(std::locale(),new boost::locale::utf8_codecvt<wchar_t>());
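
For readers less familiar with the "as usual" part, here is a minimal sketch of the
installation pattern itself. It uses a hypothetical stand-in facet (demo_codecvt)
instead of boost::locale::utf8_codecvt<wchar_t> so that it compiles without Boost;
the helper names with_demo_codecvt and uses_demo_codecvt are likewise made up for
this illustration.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>

// Hypothetical stand-in for boost::locale::utf8_codecvt<wchar_t>. It keeps the
// default std::codecvt behaviour and exists only to show how a facet is installed.
struct demo_codecvt : std::codecvt<wchar_t, char, std::mbstate_t> {
    explicit demo_codecvt(std::size_t refs = 0)
        : std::codecvt<wchar_t, char, std::mbstate_t>(refs) {}
};

// Copy `base` and replace its codecvt<wchar_t, char, mbstate_t> facet.
// The locale takes ownership of the facet because it was created with refs == 0.
inline std::locale with_demo_codecvt(const std::locale& base = std::locale()) {
    return std::locale(base, new demo_codecvt());
}

// True when the wchar_t codecvt facet stored in `loc` is our stand-in facet.
inline bool uses_demo_codecvt(const std::locale& loc) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> base_facet;
    return dynamic_cast<const demo_codecvt*>(&std::use_facet<base_facet>(loc)) != 0;
}
```

A wide stream starts using such a facet once the resulting locale is passed to its
imbue() member, typically before the file is opened.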

The new facet *does not require* a separately compiled part, unlike the one in boost/detail.

Note that it is implemented in terms of boost::locale::generic_codecvt:

    template<typename CharType,typename CodecvtImpl,int CharSize=sizeof(CharType)>
    class generic_codecvt;

This template has non-trivial specializations for CharSize=2 and CharSize=4, which
handle UTF-16 and UTF-32 respectively for wchar_t/char16_t/char32_t.
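
To illustrate the idea of selecting the encoding from sizeof(CharType), here is a
small sketch. It is not the Boost.Locale code: code_point_writer is a hypothetical
helper that only mirrors the shape of the template above, with one specialization
per character size.

```cpp
#include <string>

// Hypothetical helper, not part of Boost.Locale: appends one Unicode code point
// to a string of CharType, picking the encoding from sizeof(CharType).
template<typename CharType, int CharSize = sizeof(CharType)>
struct code_point_writer;  // primary template intentionally left undefined

// 2-byte characters are treated as UTF-16 code units:
// code points above U+FFFF are written as a surrogate pair.
template<typename CharType>
struct code_point_writer<CharType, 2> {
    static void append(std::basic_string<CharType>& out, unsigned long cp) {
        if (cp < 0x10000UL) {
            out.push_back(static_cast<CharType>(cp));
        } else {
            cp -= 0x10000UL;
            out.push_back(static_cast<CharType>(0xD800UL + (cp >> 10)));     // high surrogate
            out.push_back(static_cast<CharType>(0xDC00UL + (cp & 0x3FFUL))); // low surrogate
        }
    }
};

// 4-byte characters are treated as UTF-32: one unit per code point.
template<typename CharType>
struct code_point_writer<CharType, 4> {
    static void append(std::basic_string<CharType>& out, unsigned long cp) {
        out.push_back(static_cast<CharType>(cp));
    }
};
```

With this layout, code_point_writer<wchar_t> resolves to the UTF-16 variant where
wchar_t is 2 bytes wide and to the UTF-32 variant where it is 4 bytes wide, without
any change in the calling code.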

boost::locale::generic_codecvt provides an interface for creating a range
of facets for various character encodings. For example, Boost.Locale
uses it to implement several facets:

- the UTF-8 codecvt facet
- facets for single-byte character sets such as ISO-8859-* or Windows-125*
- wrappers around the ICU ucnv_* and POSIX iconv APIs that expose them as standard codecvt facets

That is why I decided to keep the implementation within the Boost.Locale library,
as it is the place that actually deals with different encodings.

-----------

Once Boost 1.60 is released, I encourage every library maintainer whose library
incorporates the broken boost/detail/utf8_codecvt_facet.*pp to replace
it with the proper one from Boost.Locale.

Note: it is a HEADER-ONLY part and does not require any part of the compiled
library.
 

Artyom Beilis

(*) The current implementation does not handle UTF-16 properly and can actually produce
    invalid UTF-8.
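
As background for the footnote, here is a minimal sketch of what correct UTF-16
handling involves when producing UTF-8: a surrogate pair has to be combined into a
single code point before it is encoded, because encoding each surrogate on its own
yields byte sequences that are not valid UTF-8. The functions append_utf8 and
utf16_to_utf8 are hypothetical names for this illustration; this is not the
Boost.Locale implementation, and it makes no claim about how exactly the old facet
fails.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Encode one code point (at most U+10FFFF, not a surrogate) as UTF-8.
inline void append_utf8(std::string& out, unsigned long cp) {
    if (cp < 0x80UL) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800UL) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000UL) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Convert UTF-16 to UTF-8, rejecting unpaired surrogates instead of encoding them.
inline std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long cp = in[i];
        if (cp >= 0xD800UL && cp <= 0xDBFFUL) {
            // A high surrogate must be followed by a low surrogate.
            if (i + 1 >= in.size())
                throw std::invalid_argument("truncated surrogate pair");
            unsigned long low = in[i + 1];
            if (low < 0xDC00UL || low > 0xDFFFUL)
                throw std::invalid_argument("high surrogate without low surrogate");
            cp = 0x10000UL + ((cp - 0xD800UL) << 10) + (low - 0xDC00UL);
            ++i;  // the low surrogate has been consumed
        } else if (cp >= 0xDC00UL && cp <= 0xDFFFUL) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        append_utf8(out, cp);
    }
    return out;
}
```

A real codecvt facet additionally has to cope with a pair that is split across two
calls, which means carrying state between them; the sketch leaves that out.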

