From: Daryle Walker (darylew_at_[hidden])
Date: 2000-08-22 01:19:09


on 8/18/00 9:37 PM, Karl Nelson at kenelson_at_[hidden] wrote:

> With the recent conversion of gtk+ to UTF-8 for i18n issues, I have been
> writing some classes to assist in use of C++ libraries. (As I am the
> maintainer of Gtk-- the C++ bindings for gtk+.) The issue specifically is that
> streams do not work well for i18n.
>
> Streams have a bad tendency to break text up into pieces. This presents
> a problem for translation, as phrases often must be translated as one unit with
> gettext. Further, the use of a stream fixes the order in which variables
> will be displayed, which makes translation extremely awkward for languages
> where the order does not match that of the original program. As a result, most
> code designed for i18n can't use streams.
[SNIP]

The fact that you're reading UTF-8 has nothing to do with your real problem:
how to print messages so that the program accounts for differences in word
order between languages. As plenty of people have already responded about
that problem, I'm going to discuss the UTF-8 situation.

To simplify matters, let's use the popular computer convention of "char"
being one octet and "wchar_t" being two octets in size. If they're not
those sizes, use more appropriate types (look at cstdint.hpp). The UTF-8
stream you have works in "char", but you should really be working with the
final representation in "wchar_t". By the time we get to your formatting
problem, the UTF-8 decoding should already be taken care of, with your
formatting solution ignorant of it.

To get a true Unicode stream, make a basic_streambuf based on "wchar_t"
and (possibly) a Unicode-customized char_traits class. Have this streambuf
take any "char"-based streambuf as a constructor parameter and keep it as a
member. Your custom streambuf should read "char" data from the inner
streambuf, assumed to be UTF-8 octets, and convert it to Unicode "wchar_t"
data. Make the reverse provisions for writing. You could also add a locale
with "wchar_t"-based and Unicode-customized facets.
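
A rough sketch of the reading half of such a streambuf might look like the
following. The names utf8_wstreambuf and decode_all are made up for
illustration. It uses the default char_traits, handles only one- to
three-octet sequences (enough for a two-octet "wchar_t"), maps anything
malformed or out of range to U+FFFD, and does not reject overlong encodings.
Treat it as an outline under those assumptions, not a finished decoder.

```cpp
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical name: a read-only wchar_t streambuf that decodes UTF-8
// octets pulled from an inner char-based streambuf.
class utf8_wstreambuf : public std::wstreambuf {
public:
    explicit utf8_wstreambuf(std::streambuf* inner) : inner_(inner), ch_(0) {
        // Start with an empty get area so the first read calls underflow().
        setg(&ch_, &ch_ + 1, &ch_ + 1);
    }

protected:
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());

        typedef std::streambuf::traits_type ctraits;
        std::streambuf::int_type c = inner_->sbumpc();
        if (ctraits::eq_int_type(c, ctraits::eof())) return traits_type::eof();

        unsigned long lead = static_cast<unsigned char>(ctraits::to_char_type(c));
        unsigned long cp = 0xFFFD;  // replacement character by default
        int extra = 0;              // continuation octets still expected
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        // Anything else (stray continuation octet, 4-octet lead) stays U+FFFD.

        for (; extra > 0; --extra) {
            std::streambuf::int_type n = inner_->sgetc();  // peek first
            unsigned long b = ctraits::eq_int_type(n, ctraits::eof())
                ? 0UL
                : static_cast<unsigned char>(ctraits::to_char_type(n));
            if ((b & 0xC0) != 0x80) { cp = 0xFFFD; break; }  // truncated or malformed
            inner_->sbumpc();                                // consume the octet
            cp = (cp << 6) | (b & 0x3F);
        }

        ch_ = static_cast<wchar_t>(cp);
        setg(&ch_, &ch_, &ch_ + 1);  // expose exactly one decoded character
        return traits_type::to_int_type(ch_);
    }

private:
    std::streambuf* inner_;  // the "char"-based source of UTF-8 octets
    wchar_t ch_;             // one-character get area
};

// Usage helper: run a UTF-8 string through the streambuf via a wistream.
std::wstring decode_all(const std::string& utf8) {
    std::istringstream bytes(utf8);
    utf8_wstreambuf buf(bytes.rdbuf());
    std::wistream in(&buf);
    std::wstring out;
    wchar_t c;
    while (in.get(c)) out += c;
    return out;
}
```

The writing direction would override overflow() in the same way, encoding
each "wchar_t" into one to three octets and passing them to the inner
streambuf with sputc().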

The streams presented to your formatting solution will then be of type
"wchar_t", using the special streambuf, char_traits, and locale described
above. You may want to make sure that the formatting solution is templated
enough to deal with any character type.
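
To show what "enough templates" could mean, here is a small sketch of a
routine that is generic over the character type. The name write_positional
and the "%1".."%9" placeholder syntax are my own inventions for the example,
not anything proposed in this thread. The only point is that the same code
compiles for both "char" and "wchar_t" streams.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: copy `pattern` to `out`, replacing %1..%9 with the
// corresponding entry of `args`. Works for any character type the stream
// supports because it uses only widen()/narrow() and the stream's own types.
template <class CharT, class Traits>
void write_positional(std::basic_ostream<CharT, Traits>& out,
                      const std::basic_string<CharT, Traits>& pattern,
                      const std::vector<std::basic_string<CharT, Traits> >& args)
{
    const CharT percent = out.widen('%');
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i] == percent && i + 1 < pattern.size()) {
            // Map the next character back to plain char for the digit test;
            // characters with no narrow equivalent become '\0'.
            const char d = out.narrow(pattern[i + 1], '\0');
            if (d >= '1' && d <= '9' &&
                static_cast<std::size_t>(d - '1') < args.size()) {
                out << args[static_cast<std::size_t>(d - '1')];
                ++i;  // skip the digit as well
                continue;
            }
        }
        out.put(pattern[i]);  // ordinary character: copy through
    }
}
```

Because the placeholders are numbered, a translated pattern can reorder the
arguments without any change to the calling code.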
