From: Karl Nelson (kenelson_at_[hidden])
Date: 2000-08-22 10:49:26


>
> The fact that you're reading UTF-8 has nothing to do with your real problem:
> how to print messages such that the programming takes international
> differences in order into account. As plenty of people have responded about
> the real problem, I'm going to discuss the UTF-8 situation.

My problem may not be representative. The library I am writing a wrapper for
uses UTF-8, and all of its interfaces expect char*. There is no concept of wide
strings here at all; everything is in multibyte format. To make
matters worse, users of the wrapper will get this data passed back
to them as strings and will be expected to pass strings back to all interfaces.
Unless the users choose to widen a string to wstring for their own purposes,
none of my code will ever see a wide string. Thus, at least for my problem, the
users will have to deal with UTF-8 in their internal strings or
call a converter to interact with mine. Of course, if they widen the
strings themselves and wish to use format, that should work properly as
well, and thus some template version is needed.

> To simplify matters, let's use the popular computer convention of "char"
> being one octet and "wchar_t" being two octets in size. If they're not
> those sizes, use more appropriate types (look at cstdint.hpp). The UTF-8
> stream you have works in "char," but you should really be working with the
> final representation in "wchar_t;" by the time we get to your formatting
> problem, the UTF-8 stuff should already be taken care of with your
> formatting solution ignorant of it.

Fortunately, UTF-8 is close enough to ASCII that my code is basically
ignorant of it. If this were Shift-JIS or some other encoding where
the meaning of the text may change based on switches, then the format
code would likely have to be more aware.
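
A minimal sketch of why byte-wise scanning is safe here: in UTF-8, every
byte of a multi-byte sequence has its high bit set, so a byte equal to '%'
can only ever be a real ASCII percent sign. The function name
find_directives is hypothetical and is not part of any library mentioned
in this thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the byte offsets of every '%' in a UTF-8
// encoded string. No decoding is needed, because lead and continuation
// bytes of multi-byte sequences all lie in 0x80..0xFF and therefore can
// never compare equal to an ASCII character such as '%' (0x25).
std::vector<std::size_t> find_directives(const std::string& utf8)
{
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        if (utf8[i] == '%')
            offsets.push_back(i);
    }
    return offsets;
}
```

The same loop would not be safe for an encoding in which trailing bytes
can fall inside the ASCII range.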

 
> To get the true Unicode stream, make a basic_streambuf based of "wchar_t"
> and (possibly) a Unicode-customized char_traits class. Have this streambuf
> take any "char"-based streambuf as a parameter member. Your custom
> streambuf should read "char" data from the inner streambuf, assumed to be
> UTF-8 octets, and convert it to Unicode "wchar_t" data. Make the reverse
> provisions for writing. You could also add a locale with "wchar_t"-based
> and Unicode-customized facets.

Neither of these is a good solution at this time in the gcc/Unix world.
There are other STL implementations that you can drop in, but
the STL that the majority of users have is totally defective. This
is temporary, in that gcc will have charT streams within the next year
(or at least that is my current understanding).

> The streams that will be presented to your formatting problem will be for
> type "wchar_t" with the special streambuf, char_traits, and locale given
> above. You may want to make sure that the formatting solution has enough
> templates to deal with any character type.

I agree this will be true when the streams come to gcc land.
However, for my kit, because of the interaction with the lower layer,
mine will really be using basic_format<char> with no fancy options.
I am perfectly happy to alter my code to include the templates
for other boost users, though obviously because of my application
I won't be using the template version myself.
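
A rough sketch of what templating on the character type could look like.
Everything here is an assumption made for illustration: the members arg()
and str(), the %1..%9 placeholder syntax, and the typedef names are
hypothetical and are not the actual interface of the format class under
discussion.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch of a formatter parameterized on the character type.
template <typename CharT>
class basic_format {
public:
    typedef std::basic_string<CharT> string_type;

    explicit basic_format(const string_type& pattern) : pattern_(pattern) {}

    // Supply the next positional argument (%1 first, then %2, ...).
    basic_format& arg(const string_type& value)
    {
        args_.push_back(value);
        return *this;
    }

    // Replace each %N (N = 1..9) with the N-th argument. "%%" yields '%';
    // any other character after '%' is copied without the '%'. A %N with
    // no matching argument expands to nothing.
    string_type str() const
    {
        string_type out;
        for (std::size_t i = 0; i < pattern_.size(); ++i) {
            const CharT c = pattern_[i];
            if (c != CharT('%') || i + 1 == pattern_.size()) {
                out += c;  // ordinary character, or a lone trailing '%'
                continue;
            }
            const CharT next = pattern_[++i];
            if (next >= CharT('1') && next <= CharT('9')) {
                const std::size_t index =
                    static_cast<std::size_t>(next - CharT('1'));
                if (index < args_.size())
                    out += args_[index];
            } else {
                out += next;
            }
        }
        return out;
    }

private:
    string_type pattern_;
    std::vector<string_type> args_;
};

typedef basic_format<char> format;      // UTF-8 or plain char strings
typedef basic_format<wchar_t> wformat;  // for users who widen their strings
```

With the char instantiation, UTF-8 arguments pass through untouched as
bytes, which is all the wrapper needs; users who widen their strings
would pick the wchar_t instantiation instead.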

--Karl

