Subject: Re: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet
From: Robert Ramey (ramey_at_[hidden])
Date: 2015-10-08 12:07:26


On 10/8/15 7:54 AM, Artyom Beilis wrote:
> ----- Original Message -----
>
> [BEGIN: Long description regarding <codecvt> ]
>
...
> So... Boost community - please do yourself a favor: don't use <codecvt> unless you really
> understand what you are doing.

Well, I use <codecvt> and boost::utf8_codecvt and I definitely don't
know what I'm doing. That (and the fact that I don't have any extra
time) is the reason for using a library in the first place.

The whole locale/facet/codecvt saga is long and very difficult to
fathom. To make things worse, it has a tortured history of library
writers not getting it right. If one looks at the utf_codecvt facet
there are lots of workarounds for older compilers and libraries. So it's
high time this was rationalized. I think the concept has merit and would
do well with a good library and educational documentation to match.

>
> [END: Long description regarding <codecvt> ]
>
>
> If you want to convert utf8 files properly to native wide characters, for example for boost::filesystem,
> boost::serialization or std::fstream, you need to use a facet that converts to utf-16 or utf-32
> according to what wchar_t holds, and <codecvt> does not provide one (without platform-specific tricks).

I see that, but we could easily select which codecvt facet to use depending on
the size of wchar_t on the specific platform. I dislike libraries
which do "too much" in order to "just" work. A codecvt library should be

a) A tool kit to create codecvt facets
b) some generated examples which will cover what most users need
c) a bunch of tutorial information about how codecvt can be used -
especially outside of stream i/o
d) anything else which is useful.
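
To illustrate the "select by size of wchar_t" idea, here is a minimal
sketch, not a proposal for the actual library. It assumes the standard
std::codecvt_utf8 and std::codecvt_utf8_utf16 facets (C++11, deprecated
since C++17) are good enough for the job, which is exactly the point under
dispute in this thread. The names native_utf8_facet, widen_utf8,
narrow_to_utf8 and self_check are made up for the example.

```cpp
#include <cassert>
#include <codecvt>      // std::codecvt_utf8, std::codecvt_utf8_utf16
#include <locale>       // std::wstring_convert
#include <string>
#include <type_traits>  // std::conditional

// Hypothetical alias: pick the UTF-8 facet that matches what wchar_t holds.
//   2-byte wchar_t (e.g. Windows)    -> UTF-8 <-> UTF-16
//   wider wchar_t (e.g. most POSIX)  -> UTF-8 <-> UTF-32/UCS-4
using native_utf8_facet = std::conditional<
    sizeof(wchar_t) == 2,
    std::codecvt_utf8_utf16<wchar_t>,
    std::codecvt_utf8<wchar_t>
>::type;

// Hypothetical helper: UTF-8 bytes -> native wide string.
inline std::wstring widen_utf8(const std::string& utf8)
{
    std::wstring_convert<native_utf8_facet, wchar_t> conv;
    return conv.from_bytes(utf8);  // throws std::range_error on bad input
}

// Hypothetical helper: native wide string -> UTF-8 bytes.
inline std::string narrow_to_utf8(const std::wstring& wide)
{
    std::wstring_convert<native_utf8_facet, wchar_t> conv;
    return conv.to_bytes(wide);
}

// Small usage check: U+20AC (euro sign) is inside the BMP, so it is a
// single wide unit whether wchar_t holds UTF-16 or UTF-32.
inline void self_check()
{
    const std::string euro = "\xE2\x82\xAC";
    assert(widen_utf8(euro).size() == 1);
    assert(narrow_to_utf8(widen_utf8(euro)) == euro);
}
```

The selection happens entirely at compile time, so user code calls
widen_utf8 the same way on every platform. Whether the two standard facets
behave consistently across standard libraries is a separate question.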

Note I'm aware that this is a huge task to do right - I certainly
wouldn't blame anyone for not taking it on.

>
> So I'm not going to implement C++11 <codecvt> because IMHO it is broken by design in the first
> place.

Hmm - I'd have to think more about this. If <codecvt> is ill-conceived
- I'm sure one could propose an alternative.

>
> Boost.Locale provides one, but currently it is a deep internal and complex part of the library.

Hmmm - very interesting. Maybe it's a question of factoring out this
part and repackaging it in a more digestible form. That would be
interesting.

> The code I have written for Boost.Nowide, or the one I suggest putting into the header-only part of Boost.Locale,
> is a codecvt that converts between utf8 and utf-16/32 according to the size of the character:

> boost::(nowide|or locale)::utf8_facet<wchar_t> - utf-8 to utf-16 (windows) utf-32 (posix)
> boost::(nowide|or locale)::utf8_facet<char16_t> - utf-8 to utf-16 on any platform
> boost::(nowide|or locale)::utf8_facet<char32_t> - utf-8 to utf-32 on any platform
>
> That's it. It isn't <codecvt> because C++11 <codecvt> does not actually do the job needed.

I'll have to take your word for it.
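
For readers following along, the behaviour described above (target
encoding chosen by the size of the character type) can be sketched without
any facet machinery. This is an illustrative sketch only, not the
Boost.Nowide or Boost.Locale code. The names decode_one and utf8_to are
hypothetical, and the decoder is simplified: it rejects malformed lead and
continuation bytes and values above U+10FFFF, but does not check for
overlong encodings or encoded surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode one code point from UTF-8 starting at s[i]
// and advance i past it.
inline char32_t decode_one(const std::string& s, std::size_t& i)
{
    assert(i < s.size());
    const unsigned char lead = static_cast<unsigned char>(s[i++]);
    if (lead < 0x80) return lead;  // plain ASCII

    int extra = 0;     // number of continuation bytes expected
    char32_t cp = 0;   // code point being assembled
    if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
    else throw std::invalid_argument("invalid UTF-8 lead byte");

    for (; extra > 0; --extra) {
        if (i >= s.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if ((c & 0xC0) != 0x80)
            throw std::invalid_argument("invalid UTF-8 continuation byte");
        ++i;
        cp = (cp << 6) | (c & 0x3F);
    }
    if (cp > 0x10FFFF)
        throw std::invalid_argument("code point out of Unicode range");
    return cp;
}

// Hypothetical converter: 2-byte CharT gets UTF-16, 4-byte CharT gets UTF-32.
template <typename CharT>
std::basic_string<CharT> utf8_to(const std::string& utf8)
{
    static_assert(sizeof(CharT) == 2 || sizeof(CharT) == 4,
                  "CharT must be a 2-byte or 4-byte character type");
    std::basic_string<CharT> out;
    for (std::size_t i = 0; i < utf8.size();) {
        char32_t cp = decode_one(utf8, i);
        if (sizeof(CharT) == 4 || cp < 0x10000) {
            out.push_back(static_cast<CharT>(cp));  // fits in one unit
        } else {
            cp -= 0x10000;                          // split into a surrogate pair
            out.push_back(static_cast<CharT>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<CharT>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With this, utf8_to<char16_t> always yields UTF-16, utf8_to<char32_t> always
yields UTF-32, and utf8_to<wchar_t> follows whatever the platform's wchar_t
holds, mirroring the three cases listed in the quoted message.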
>
>
>
> Artyom Beilis