Subject: Re: [boost] [locale] Composing asymmetric locale for character encoding conversion
From: Jan Hudec (bulb_at_[hidden])
Date: 2013-03-03 16:32:23


On Sat, Mar 02, 2013 at 14:56:52 +0400, Andrey Semashev wrote:
> Suppose I have a logging application that writes log records in wide
> (wchar_t, UTF-16)

wchar_t does not have to be UTF-16. On most non-Windows platforms it is
32 bits wide and holds UCS-4.

The standard also seems to expect each wchar_t to hold a complete code point,
which is not the case with UTF-16, so strictly speaking UTF-16 is not
supported. That said, everybody uses wchar_t as UTF-16 on Windows, because
Microsoft jumped on the Unicode bandwagon too early and baked a 2-byte
wchar_t into the API, so UTF-16 is now the only option for supporting
Unicode after version 2.0 there.
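
To make the "complete code point" problem concrete, here is a minimal sketch
(the function name utf16_units is made up for illustration). It only counts
how many 16-bit units UTF-16 needs for a given code point:

```cpp
#include <cstddef>

// Hypothetical helper: number of 16-bit code units UTF-16 needs for one
// Unicode code point. Code points above U+FFFF do not fit into a single
// unit and are encoded as a surrogate pair.
constexpr std::size_t utf16_units(char32_t code_point) {
    return code_point > 0xFFFF ? 2 : 1;
}
```

With a 2-byte wchar_t, any code point for which this returns 2 cannot be
stored in a single wchar_t.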

> and narrow (char, UTF-8) encodings and I want these
> logs to be stored in a UTF-16LE encoded file. For simplicity, let's
> assume that I write log files with std::wofstream. Now, the standard
> says that the file stream buffer is supposed to convert wide
> characters to byte sequences using the locale imbued into the buffer.

Yes, right. And `operator<<(std::wostream &, const char *)` widens the
narrow characters using the locale imbued in the stream.

> However, it seems that the locale should be the same as the one imbued
> into the stream (basic_ostream::imbue makes sure of that).

Why do you think so? basic_ios::imbue makes that locale the buffer's
*default*, but I don't think it forbids overriding the buffer's locale
afterwards.

> What this
> leads to is that in order to achieve my goal the locale should be able
> to convert narrow characters of UTF-8 to wide characters of UTF-16 and
> wide characters of UTF-16 to narrow characters representing byte
> sequence of UTF16LE. Is it possible to make such an asymmetric locale
> with Boost.Locale? Or maybe there is another way of doing this?

It's not needed. Just imbue two different locales. You only have to be
careful about the order: imbue the stream first and the buffer second,
because imbuing the stream overwrites the buffer's locale.
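
The ordering issue can be checked with a small sketch. The assumptions are
mine: a std::wostringstream stands in for the file stream, the helper names
are made up, and the facets come from <codecvt> (deprecated since C++17, but
current when this was written). It only inspects which locale each object
ends up holding; it does not convert anything.

```cpp
#include <codecvt>
#include <locale>
#include <sstream>

// Hypothetical helpers building two distinct, unnamed locales.
inline std::locale make_utf8_locale() {
    return std::locale(std::locale::classic(),
                       new std::codecvt_utf8<wchar_t>);
}

inline std::locale make_utf16le_locale() {
    return std::locale(
        std::locale::classic(),
        new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>);
}

// Returns true if, after imbuing both, the stream holds the UTF-8 locale
// and its buffer still holds the UTF-16LE locale.
bool buffer_keeps_own_locale(bool imbue_buffer_last) {
    const std::locale stream_loc = make_utf8_locale();
    const std::locale buffer_loc = make_utf16le_locale();

    std::wostringstream out;
    if (imbue_buffer_last) {
        out.imbue(stream_loc);              // also sets the buffer's locale
        out.rdbuf()->pubimbue(buffer_loc);  // then override the buffer only
    } else {
        out.rdbuf()->pubimbue(buffer_loc);
        out.imbue(stream_loc);              // overwrites the buffer's locale
    }
    return out.getloc() == stream_loc &&
           out.rdbuf()->getloc() == buffer_loc;
}
```

Only the order with the buffer imbued last leaves the two locales different.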

As I said above, wchar_t does not have to be UTF-16, so the buffer needs to
use a locale with a codecvt_utf16 facet and the stream needs to use a locale
with a codecvt_utf8 facet.
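
A minimal sketch of the buffer half only, under my own assumptions: the
function name and the idea of reading the file back are just for
illustration, and the file is opened in binary mode so that no newline
translation interferes. The locale is imbued into the buffer before the file
is opened.

```cpp
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: writes `text` through a std::wofstream whose buffer
// uses a little-endian codecvt_utf16 facet, then returns the raw bytes that
// ended up in the file.
std::string write_utf16le_and_read_back(const std::string& path,
                                        const std::wstring& text) {
    const std::locale utf16le(
        std::locale::classic(),
        new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>);
    {
        std::wofstream out;
        out.rdbuf()->pubimbue(utf16le);  // set the buffer locale before open
        out.open(path, std::ios::binary | std::ios::trunc);
        out << text;
    }  // the destructor flushes and closes the file

    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

No byte order mark is written here, because the facet is not instantiated
with std::generate_header.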

Alternatively, you can use boost::iostreams::file_sink wrapped in an explicit
boost::iostreams::code_converter using codecvt_utf16, and imbue the outer
stream with a locale containing codecvt_utf8.

> An additional question. Is it possible to achieve my goal with
> std::ofstream (as opposed to std::wofstream)? I have a very strong
> suspicion that the answer is no because the narrow characters will
> pass on unconverted to the file instead of being translated from UTF-8
> to UTF-16LE, but maybe I'm missing something.

All streams accept their own character type and plain char, but not other
character types. So you can't write a wide string into a narrow stream at
all.
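
If you really wanted to stay with std::ofstream, the conversion would have
to happen before the bytes reach the stream. The following is a rough sketch
of that idea, not something the standard library or Boost.Locale provides
under this name: utf8_to_utf16le is a hypothetical helper, and its validation
is deliberately minimal (it does not reject overlong forms or encoded
surrogates).

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 and re-encodes it as UTF-16LE bytes.
std::string utf8_to_utf16le(const std::string& in) {
    std::string out;
    // Appends one 16-bit code unit, low byte first.
    auto put = [&out](std::uint32_t unit) {
        out.push_back(static_cast<char>(unit & 0xFF));
        out.push_back(static_cast<char>((unit >> 8) & 0xFF));
    };
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)              { cp = lead;        len = 1; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; len = 2; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; len = 3; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;
        if (cp >= 0x10000) {           // needs a surrogate pair
            cp -= 0x10000;
            put(0xD800 + (cp >> 10));
            put(0xDC00 + (cp & 0x3FF));
        } else {
            put(cp);
        }
    }
    return out;
}
```

The result could then be written with std::ofstream::write on a stream
opened in binary mode, since the narrow stream passes the bytes through
unchanged.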

-- 
						 Jan 'Bulb' Hudec <bulb_at_[hidden]>