Subject: Re: [boost] [locale] Composing asymmetric locale for character encoding conversion
From: Andrey Semashev (andrey.semashev_at_[hidden])
Date: 2013-03-02 16:42:53


On Sun, Mar 3, 2013 at 12:30 AM, Artyom Beilis <artyomtnk_at_[hidden]> wrote:
>>________________________________
>> From: Andrey Semashev <andrey.semashev_at_[hidden]>
>>To: boost_at_[hidden]
>>Sent: Saturday, March 2, 2013 12:56 PM
>>Subject: [boost] [locale] Composing asymmetric locale for character encoding conversion
>>
>>Hi,
>>
>>Suppose I have a logging application that writes log records in wide
>>(wchar_t, UTF-16) and narrow (char, UTF-8) encodings and I want these
>>logs to be stored in a UTF-16LE encoded file. For simplicity, let's
>>assume that I write log files with std::wofstream. Now, the standard
>>says that the file stream buffer is supposed to convert wide
>>characters to byte sequences using the locale imbued into the buffer.
>
> In general this is done by the codecvt facet, but it is designed to convert
> wide characters to an 8-bit encoding and vice versa.
>
>>However, it seems that the locale should be the same as the one imbued
>>into the stream (basic_ostream::imbue makes sure of that).
>
> No, you can install your own codecvt facet into an existing locale object
> and then imbue it into the stream.

I'm not sure you understood. I was pointing out that there are two locales
involved: the one imbued into the stream itself and the one imbued into its
stream buffer. And apparently, they should be the same.
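
To make the two-locale point concrete, here is a minimal sketch using only the standard library. `marker_facet` and both function names are hypothetical, introduced only so the two locales can be told apart. It shows that `basic_ios::imbue` forwards the locale to the stream buffer, while `pubimbue` on the buffer alone leaves the stream's own locale unchanged.

```cpp
#include <cstddef>
#include <locale>
#include <sstream>

// Hypothetical marker facet, used only to tell locales apart.
struct marker_facet : std::locale::facet {
    static std::locale::id id;
    explicit marker_facet(std::size_t refs = 0) : std::locale::facet(refs) {}
};
std::locale::id marker_facet::id;

// Imbuing the stream also imbues its stream buffer.
bool stream_imbue_reaches_buffer() {
    std::wostringstream out;
    out.imbue(std::locale(std::locale::classic(), new marker_facet));
    return std::has_facet<marker_facet>(out.getloc())
        && std::has_facet<marker_facet>(out.rdbuf()->getloc());
}

// Imbuing only the buffer leaves the stream's own locale untouched.
bool buffer_imbue_skips_stream() {
    std::wostringstream out;
    out.rdbuf()->pubimbue(std::locale(std::locale::classic(), new marker_facet));
    return !std::has_facet<marker_facet>(out.getloc())
        && std::has_facet<marker_facet>(out.rdbuf()->getloc());
}
```

So the two locales can technically diverge, but any later call to `imbue` on the stream brings them back in sync.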

>> What this
>>leads to is that in order to achieve my goal the locale should be able
>>to convert narrow characters of UTF-8 to wide characters of UTF-16 and
>>wide characters of UTF-16 to narrow characters representing byte
>>sequence of UTF-16LE. Is it possible to make such an asymmetric locale
>>with Boost.Locale? Or maybe there is another way of doing this?
>>
>
> No, the stuff you are probably looking for is in an interface
> that provides both `std::basic_ostream<char>` and `std::basic_ostream<wchar_t>`

Hmm, why two streams? This will add operator<< ambiguity, won't it?

I can already output narrow strings to wide streams, which results in
character conversion (to UTF-16 wchar_t). The remaining problem is to
convert the result to a UTF-16LE byte sequence. I tried to create a locale
that would perform this conversion (I tried
boost::locale::generator()("en_US.UTF-16LE");) and it didn't work.
Does Boost.Locale support this kind of conversion?
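
For reference, the final step I am after (UTF-16 code units to a UTF-16LE byte sequence) can be sketched by hand. This is not a Boost.Locale facility and does not answer the question above. `to_utf16le_bytes` is a hypothetical helper, and the sketch assumes the text is already held as 16-bit UTF-16 code units (`char16_t` here, standing in for a 16-bit `wchar_t`).

```cpp
#include <string>

// Serialize UTF-16 code units as bytes, low byte first (UTF-16LE).
// Surrogate pairs need no special handling: each code unit is written
// independently, in order. No byte order mark is emitted.
std::string to_utf16le_bytes(const std::u16string& text) {
    std::string bytes;
    bytes.reserve(text.size() * 2);
    for (char16_t unit : text) {
        bytes.push_back(static_cast<char>(unit & 0xFF));         // low byte
        bytes.push_back(static_cast<char>((unit >> 8) & 0xFF));  // high byte
    }
    return bytes;
}
```

The resulting bytes could then be written through a narrow stream opened in binary mode. What I would prefer is for a locale imbued into the wide stream to do this step for me.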

