From: Edward Diener (eddielee_at_[hidden])
Date: 2003-04-23 12:56:17

Beman Dawes wrote:
> At 09:35 PM 4/22/2003, David Abrahams wrote:
> >Beman Dawes <bdawes_at_[hidden]> writes:
> >
> >> Remember that the C++ committee includes active long-time members
> >> from Japan, and that as one of the ten or twelve voting delegations
> >> to the WG21 ISO portion of the committee, their views carry a great
> >> deal of weight. Even if everyone else is asleep, the Japanese
> >> delegation will politely remind us of the importance of
> >> internationalization, and particularly character-width, issues.
> >
> >On the other hand, everyone I've spoken to who's had to do serious
> >internationalization for Japanese environments has said that Unicode
> >was ultimately useless for them -- what they really needed was to be
> >able to deal with the various variable-length encoding schemes that
> >have evolved there over the years.
>
> I think what they mean is that the UTF-32 or UTF-16 encodings are
> useless to them as an external encoding of Unicode. Rather, they need
> UTF-8, shift-JIS, or other MBCS external encodings of Unicode.
>
> That doesn't mean that UTF-32 or UTF-16 Unicode encodings are useless
> as internal program data types. In fact, the main complaint about
> wchar_t is that it doesn't handle UTF-32 or UTF-16 reliably. I'm
> under the impression that Mori, the author of the C language Unicode
> TR proposal which is supposed to remedy that, is from Japan and that
> his proposal has wide support there.

My offhand guess, from my own limited experience with internationalization
issues for a program which had to run in Japan, is that MBCS and other
non-Unicode encodings were used in Japanese programming for many years
before Unicode encodings like UTF-16 and UTF-32 became a reality. Now that
Unicode can handle the various Japanese ideographic sets, Japanese
programmers still do not want to move to Unicode encodings in their programs
because they have become so used to their previous encodings. But I am no
expert in this area, and I am sure others can give the definitive reason why
UTF-16 and UTF-32 are useless as external encodings.

I still feel that a fixed-width Unicode encoding has to be an advance over
variable-width encodings like MBCS for any character set.
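
To make the fixed-width versus variable-width point concrete, here is a
minimal sketch, not a definitive implementation. It assumes a compiler that
provides the C++11 types char32_t and std::u32string, it assumes well-formed
UTF-8 input, and the function names are hypothetical ones chosen for
illustration. With UTF-32 the n-th code point is a plain array index, while
with a variable-width encoding such as UTF-8 it has to be found by scanning
from the start of the string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Fixed-width case: in UTF-32 every code point is exactly one char32_t,
// so the n-th code point is found by plain indexing in constant time.
char32_t nth_code_point_utf32(const std::u32string& s, std::size_t n) {
    assert(n < s.size());
    return s[n];
}

// Variable-width case: a UTF-8 code point occupies 1 to 4 bytes, so the
// byte offset of the n-th code point is only known after scanning every
// byte before it. Assumes well-formed UTF-8 (no validation is done).
// Returns s.size() if the string has n or fewer code points.
std::size_t nth_code_point_offset_utf8(const std::string& s, std::size_t n) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Continuation bytes look like 10xxxxxx; any other byte starts
        // a new code point.
        const bool starts_code_point =
            (static_cast<unsigned char>(s[i]) & 0xC0) != 0x80;
        if (starts_code_point) {
            if (seen == n) {
                return i;
            }
            ++seen;
        }
    }
    return s.size();
}
```

The same scanning cost applies to MBCS encodings such as shift-JIS, although
the rule for recognizing a lead byte differs from the UTF-8 rule used here.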
