Subject: Re: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet
From: Artyom Beilis (artyomtnk_at_[hidden])
Date: 2015-10-08 16:32:43


> I meant:
>
> https://github.com/boostorg/serialization/blob/develop/test/test_utf8_codecvt.cpp
 
I adopted it and it passes - no problem there.

> creating of code convert facets including documentation about what they
> are and how to use them.  Examples of usage would be the new utf8codecvt
> facet we're talking about.
 
How can I say this gently? The codecvt facet is used there because it is the
"Standard" way of doing things. It is far from flawless [1], but it exists, and
it is the ultimate way to convert between encodings in C++.

std::locale is complex stuff with many issues by design, and so is its codecvt
facet - both are hard to create and hard to use.
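
To illustrate the "hard to use" part, here is a minimal sketch of decoding UTF-8 through a standard codecvt facet. It assumes the codecvt<char32_t, char, std::mbstate_t> specialization that C++11 requires every locale to provide (deprecated since C++20); the helper name utf8_to_utf32 is made up for this example and is not part of Boost or the standard library.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 string into UTF-32 code points
// using the codecvt facet stored in the classic "C" locale.
std::u32string utf8_to_utf32(const std::string& in)
{
    if (in.empty())
        return std::u32string();

    typedef std::codecvt<char32_t, char, std::mbstate_t> facet_t;
    const facet_t& cvt = std::use_facet<facet_t>(std::locale::classic());

    std::mbstate_t state = std::mbstate_t();
    // UTF-8 never yields more code points than it has bytes.
    std::u32string out(in.size(), U'\0');

    const char* from_next = in.data();
    char32_t* to_next = &out[0];

    std::codecvt_base::result r = cvt.in(
        state,
        in.data(), in.data() + in.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);

    // Anything other than "ok" means invalid or truncated input here.
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("invalid or incomplete UTF-8 input");

    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

Even this small case needs a conversion state, six pointer arguments and a manual resize, which is roughly the burden the paragraph above refers to.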

Most users don't really need them - ideally you just run

std::locale::global(std::locale(""))

And everything just works for anything that needs to handle encodings.
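
A slightly more defensive version of that one-liner might look like the sketch below. The function name use_environment_locale is an assumption made for illustration; the only library facts relied on are that std::locale("") throws std::runtime_error when the environment names a locale the runtime does not know, and that streams created earlier keep their old locale until imbued.

```cpp
#include <iostream>
#include <locale>
#include <stdexcept>

// Hypothetical helper: install the user's environment locale globally.
// Returns false and leaves the global locale untouched if the runtime
// cannot construct the locale named by the environment.
bool use_environment_locale()
{
    try {
        std::locale loc("");
        std::locale::global(loc);
        // The standard streams already exist, so they need an explicit imbue.
        std::cout.imbue(loc);
        std::wcout.imbue(loc);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Whether the resulting locale actually uses UTF-8 still depends on the platform and its standard library, which is the catch described next.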

But in reality it does not. So you need workarounds and things like a UTF-8 facet,
because some standard libraries on a very well-known operating system do
not support UTF-8 locales, and consider it "confusing" that a std::string
accidentally becomes a UTF-8 encoded string.

All the char const *str = u8"привет-שלום" stuff for encoding UTF-8 strings was born in sin and will probably
die that way, because some specific vendors ignore what the rest of the world has learned well.

So ideally end users should not care about codecvt - that is why nowide originally
has just a function called boost::nowide::nowide_filesystem()

And magic happens.

The problem is that to understand how the magic works you need to learn a lot of things, and
a simple tutorial isn't enough - even an entire library like Boost.Nowide isn't always enough.

Regards,

  Artyom Beilis

[1] One of the signs: it does not allow implementing stateful encodings (ones that can compose and decompose some characters
the way iconv does)

