
Subject: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet
From: Artyom Beilis (artyomtnk_at_[hidden])
Date: 2015-10-07 10:49:39


Some updates regarding Boost.Nowide:
1. The library has moved to GitHub and has been converted to the modular Boost layout: https://github.com/artyom-beilis/nowide
2. Fixed the std::ios::ate flag, which was unsupported by boost::nowide::fstream
3. Added some C++11 interfaces to boost::nowide::fstream
4. Added integration functionality with boost::filesystem: https://github.com/artyom-beilis/nowide/blob/master/include/boost/nowide/integration/filesystem.hpp

Another important update is that I implemented a proper UTF-8 to UTF-16/UTF-32 codecvt facet:
https://github.com/artyom-beilis/nowide/blob/master/include/boost/nowide/utf8_codecvt.hpp

It is implemented as a template that works with wchar_t, char16_t and char32_t.
Let me explain why.
There is a widely used UTF-8 codecvt facet in various parts of the code:
https://github.com/boostorg/detail/blob/master/include/boost/detail/utf8_codecvt_facet.hpp
https://github.com/boostorg/detail/blob/master/include/boost/detail/utf8_codecvt_facet.ipp

However, it is buggy and actually broken, for 3 reasons:
1. It supports UCS-2 instead of UTF-16, i.e. it does not properly encode Unicode characters outside the BMP (code points with values above 0xFFFF)
2. It allows invalid code points in UTF-32/UCS-4, i.e. values above 0x10FFFF or values that are reserved for UTF-16 surrogate pairs
3. It actually allows UTF-8 sequences longer than 4 bytes (which is wrong)
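The checks that points 2 and 3 call for can be sketched roughly as follows. This is only a minimal illustration: the function names are hypothetical and are not taken from Boost.Nowide or boost/detail, and a real facet also has to validate continuation bytes and reject overlong encodings.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only (not Boost API).

// A valid Unicode scalar value is at most 0x10FFFF and is not in the
// range reserved for UTF-16 surrogates (0xD800..0xDFFF).
inline bool is_valid_code_point(std::uint32_t cp)
{
    if (cp > 0x10FFFF)
        return false;
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;
    return true;
}

// Length of a UTF-8 sequence judged by its lead byte, or 0 if the byte
// cannot start a valid sequence. Nothing longer than 4 bytes is accepted.
inline int utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80)
        return 1;
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;
    // Continuation bytes, the overlong leads 0xC0/0xC1, and 0xF5..0xFF
    // (which includes the legacy 5- and 6-byte lead bytes).
    return 0;
}

// Small self-check showing the intended behaviour.
inline void check_validation_examples()
{
    assert(is_valid_code_point(0x1D49E));   // outside the BMP, still valid
    assert(!is_valid_code_point(0x110000)); // above the Unicode range
    assert(!is_valid_code_point(0xD800));   // surrogate, not a scalar value
    assert(utf8_sequence_length(0xF0) == 4);
    assert(utf8_sequence_length(0xF8) == 0); // would start a 5-byte sequence
}
```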
As a result you can't, for example, use boost::filesystem::path with characters like "𝒞" (U+1D49E), and you may actually create wrong encodings when trying to read or write filesystem objects.
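To make the UCS-2 versus UTF-16 difference concrete, here is a small sketch of how a code point such as U+1D49E has to be split into a surrogate pair. The function name encode_utf16_units is hypothetical and only illustrates the arithmetic; it is not the interface of any Boost facet.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only (not Boost API).
// Writes the UTF-16 encoding of cp into out and returns the number of
// 16-bit units written (1 or 2), or 0 if cp is not a valid code point.
inline int encode_utf16_units(std::uint32_t cp, std::uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0; // out of range, or reserved for surrogates
    if (cp < 0x10000) {
        // Inside the BMP: one unit, identical to UCS-2.
        out[0] = static_cast<std::uint16_t>(cp);
        return 1;
    }
    // Outside the BMP: split the 20-bit offset into two 10-bit halves.
    std::uint32_t offset = cp - 0x10000;
    out[0] = static_cast<std::uint16_t>(0xD800 + (offset >> 10));   // high surrogate
    out[1] = static_cast<std::uint16_t>(0xDC00 + (offset & 0x3FF)); // low surrogate
    return 2;
}

// Small self-check using the character mentioned above.
inline void check_surrogate_example()
{
    std::uint16_t units[2] = {0, 0};
    assert(encode_utf16_units(0x1D49E, units) == 2);
    assert(units[0] == 0xD835);
    assert(units[1] == 0xDC9E);
}
```

A UCS-2 facet has no equivalent of the second branch, so a 16-bit wchar_t (as on Windows) simply cannot represent such a character.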
Independently of Boost.Nowide, I would like to propose replacing boost/detail/utf8_codecvt_facet with one that actually takes proper Unicode handling into account.

Artyom Beilis
--------------
CppCMS - C++ Web Framework:   http://cppcms.com/
CppDB - C++ SQL Connectivity: http://cppcms.com/sql/cppdb/
