Subject: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet
From: Artyom Beilis (artyomtnk_at_[hidden])
Date: 2015-10-07 10:49:39

Some updates regarding Boost.Nowide:
1. The library has moved to GitHub and has been converted to the modular Boost layout:
2. Fixed the std::ios::ate flag, which was not supported by boost::nowide::fstream
3. Added some C++11 interfaces to boost::nowide::fstream
4. Added integration functionality with boost::filesystem:
Another important update is that I have implemented a proper UTF-8 to UTF-16/UTF-32 codecvt facet.

It is implemented as a template that works with wchar_t, char16_t and char32_t.
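To illustrate why a single template can serve all three character types, here is a minimal sketch. It is not the Boost.Nowide implementation: append_code_point is a hypothetical helper, and it assumes the code point passed in is already valid. The output encoding is chosen from sizeof(CharType): 2-byte types get UTF-16 (with surrogate pairs above 0xFFFF), 4-byte types get UTF-32.

```cpp
#include <string>

// Hypothetical helper, not part of Boost.Nowide: append one Unicode code
// point to a string made of 2-byte (UTF-16) or 4-byte (UTF-32) code units.
// Assumption: cp is a valid code point (<= 0x10FFFF, not a surrogate).
template<typename CharType>
void append_code_point(std::basic_string<CharType> &out, char32_t cp)
{
    static_assert(sizeof(CharType) == 2 || sizeof(CharType) == 4,
                  "expected a 2-byte or 4-byte character type");
    if (sizeof(CharType) == 4 || cp < 0x10000) {
        // UTF-32, or a BMP character in UTF-16: one code unit.
        out.push_back(static_cast<CharType>(cp));
    } else {
        // UTF-16 outside the BMP: split into a surrogate pair.
        cp -= 0x10000;
        out.push_back(static_cast<CharType>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<CharType>(0xDC00 + (cp & 0x3FF)));
    }
}
```

With std::u16string the code point U+1D49E becomes the two units 0xD835 0xDC9E; with std::u32string it stays a single unit. For wchar_t the result depends on the platform's wchar_t size.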
Let me explain.
There is a widely used UTF-8 codecvt facet in various parts of the code:

However, it is buggy, and in fact broken, for three reasons:
1. It supports UCS-2 instead of UTF-16, i.e. it does not correctly encode Unicode characters outside the BMP (code points with values above 0xFFFF).
2. It allows invalid code points in UTF-32/UCS-4, i.e. values above 0x10FFFF or values reserved for UTF-16 surrogate pairs.
3. It accepts UTF-8 sequences longer than 4 bytes, which is wrong.
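The three points above correspond to checks that a conforming converter has to make. The following is only a sketch of those checks, with hypothetical function names; it is not taken from utf8_codecvt_facet or from Boost.Nowide.

```cpp
#include <cstdint>

// Hypothetical names, for illustration only.

// Point 2: a code point is valid only if it is at most U+10FFFF and is
// not in the range reserved for UTF-16 surrogates (U+D800..U+DFFF).
inline bool is_valid_code_point(std::uint32_t cp)
{
    if (cp > 0x10FFFF) return false;
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;
    return true;
}

// Point 3: length of a UTF-8 sequence deduced from its lead byte, or 0 if
// the byte cannot start a valid sequence. Nothing longer than 4 is allowed.
inline int utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1;   // ASCII
    if (lead < 0xC2) return 0;   // continuation byte or overlong 2-byte lead
    if (lead < 0xE0) return 2;
    if (lead < 0xF0) return 3;
    if (lead <= 0xF4) return 4;
    return 0;                    // 0xF5..0xFF: beyond U+10FFFF or 5/6-byte forms
}

// Point 1: number of UTF-16 code units a valid code point needs.
// UCS-2 has no way to represent the two-unit case.
inline int utf16_units(std::uint32_t cp)
{
    return cp >= 0x10000 ? 2 : 1;
}
```

A lead byte such as 0xF8 or 0xFC, which would start a 5- or 6-byte sequence in the obsolete form of UTF-8, is rejected here.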
As a result, you cannot, for example, use boost::filesystem::path with characters like "𝒞" (U+1D49E), and you may even produce wrong encodings when trying to read or write filesystem objects.
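To make the U+1D49E case concrete, here is a small sketch with hypothetical helpers (decode_first and truncate_to_ucs2 are not Boost functions). It decodes the UTF-8 bytes of "𝒞" and shows what is left when the result is forced into a single 16-bit unit, which is the kind of loss a UCS-2-only conversion produces. The decoder is deliberately minimal: it does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode the first UTF-8 sequence in s.
// Returns 0xFFFFFFFF on malformed or truncated input.
inline std::uint32_t decode_first(const std::string &s)
{
    const std::uint32_t invalid = 0xFFFFFFFFu;
    if (s.empty()) return invalid;
    unsigned char lead = static_cast<unsigned char>(s[0]);
    std::size_t len;
    std::uint32_t cp;
    if (lead < 0x80)                { len = 1; cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
    else return invalid;            // nothing longer than 4 bytes
    if (s.size() < len) return invalid;
    for (std::size_t i = 1; i < len; ++i) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if ((c & 0xC0) != 0x80) return invalid;  // not a continuation byte
        cp = (cp << 6) | (c & 0x3F);
    }
    return cp;
}

// What a UCS-2-style conversion keeps: only the low 16 bits.
inline char16_t truncate_to_ucs2(std::uint32_t cp)
{
    return static_cast<char16_t>(cp & 0xFFFF);
}
```

Decoding the bytes F0 9D 92 9E gives 0x1D49E, while truncate_to_ucs2 of that value gives 0xD49E, a different character; correct UTF-16 needs the surrogate pair 0xD835 0xDC9E.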
Independently of Boost.Nowide, I would like to propose replacing boost/detail/utf8_codecvt_facet with one that actually takes proper Unicode handling into account.

Artyom Beilis
--------------
CppCMS - C++ Web Framework: - C++ SQL Connectivity:
