Subject: Re: [boost] [locale] Composing asymmetric locale for character encoding conversion
From: Andrey Semashev (andrey.semashev_at_[hidden])
Date: 2013-03-05 03:13:50


On Mon, Mar 4, 2013 at 1:32 AM, Jan Hudec <bulb_at_[hidden]> wrote:
> On Sat, Mar 02, 2013 at 14:56:52 +0400, Andrey Semashev wrote:
>> Suppose I have a logging application that writes log records in wide
>> (wchar_t, UTF-16)
>
> wchar_t does not have to be UTF-16. On most non-Windows platforms it is
> UCS-4.
>
> The standard also seems to expect each wchar_t to contain a complete code point,
> which isn't the case with UTF-16, so UTF-16 isn't supported. That said,
> everybody uses it as UTF-16 on Windows, because Microsoft jumped on the
> Unicode bandwagon too fast and baked a 2-byte wchar_t into the API, so
> using UTF-16 is now the only option to support Unicode after 2.0 there.

Yes, I'm aware of that. I have Windows in mind.

>> However, it seems that the locale should be the same as the one imbued
>> into the stream (basic_ostream::imbue makes sure of that).
>
> Now why do you think so? basic_ios::imbue makes it the *default*, but I don't
> think it forbids overriding the buffer locale.

Come to think of it, you may be right. I cannot find any indication
that the same locale is expected.
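
To illustrate the point, here is a minimal sketch, assuming only the
standard library. It imbues a stream first and then overrides the
locale of its buffer. The names marker_facet, locale_pair and
imbue_separately are hypothetical; the marker facet exists only to
make the two locales distinguishable.

```cpp
#include <locale>
#include <sstream>

// Hypothetical marker facet: carries no behaviour, it only lets us tell
// whether a given locale is the one we constructed.
struct marker_facet : std::locale::facet
{
    static std::locale::id id;
};
std::locale::id marker_facet::id;

// Hypothetical result type for the sketch.
struct locale_pair
{
    bool stream_has_marker;
    bool buffer_has_marker;
};

inline locale_pair imbue_separately()
{
    std::wostringstream strm;
    std::locale with_marker(std::locale::classic(), new marker_facet);

    // basic_ios::imbue sets the stream locale and forwards it to the buffer.
    strm.imbue(with_marker);

    // Imbuing the buffer afterwards changes the buffer's locale only.
    strm.rdbuf()->pubimbue(std::locale::classic());

    locale_pair result = {
        std::has_facet<marker_facet>(strm.getloc()),
        std::has_facet<marker_facet>(strm.rdbuf()->getloc())
    };
    return result;
}
```

The order matters: calling strm.imbue() after pubimbue() would
overwrite the buffer's locale again.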

>> What this
>> leads to is that in order to achieve my goal the locale should be able
>> to convert narrow characters of UTF-8 to wide characters of UTF-16 and
>> wide characters of UTF-16 to narrow characters representing byte
>> sequence of UTF16LE. Is it possible to make such an asymmetric locale
>> with Boost.Locale? Or maybe there is another way of doing this?
>
> It's not needed. Just imbue two different locales. You only have to be
> careful about the order, because the stream overwrites the buffer's locale.
>
> As I said above, wchar_t does not have to be UTF-16, so the buffer needs to
> use a locale with a codecvt_utf16 facet and the stream needs to use a locale
> with a codecvt_utf8 facet.
>
> Alternatively you can use boost::iostreams::file_sink wrapped in explicit
> boost::iostreams::code_converter using codecvt_utf16 and imbue the outer
> stream with codecvt_utf8.

All of these assume that codecvt_utf16 from C++11 is available
(codecvt_utf8 can be replaced with a Boost.Locale-generated facet, I
guess). Also, there seems to be no codecvt_utf32 for some reason, in
case I wanted to write UTF-32 encoded files.
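
For reference, a minimal sketch of the C++11 variant, assuming a
standard library that ships <codecvt> (the header was later deprecated
in C++17). The helper name write_utf16le is hypothetical. Note that
std::codecvt_utf16<wchar_t> treats a 16-bit wchar_t as UCS-2, so this
sketch should only be relied on for characters in the Basic
Multilingual Plane on Windows.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: writes `text` to `path` as UTF-16LE bytes (no BOM).
inline bool write_utf16le(const char* path, const std::wstring& text)
{
    std::wofstream out;

    // Imbue before opening the file, so that the file buffer uses the
    // UTF-16 facet from the very first character written.
    out.imbue(std::locale(
        out.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>));

    // Binary mode avoids newline translation corrupting the byte stream.
    out.open(path, std::ios::binary | std::ios::trunc);
    if (!out)
        return false;

    out << text;
    out.close();
    return !out.fail();
}
```

Since the facet is installed through the stream here, the stream and
its buffer share one locale; a separate locale for the stream would
have to be imbued first, with the buffer's locale set afterwards via
rdbuf()->pubimbue().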

As far as I can see, Boost.Locale does not provide C++11 codecvt
facets. Is that right? Is this support planned?

