Subject: Re: [boost] [locale] Formal review of Boost.Locale library EXTENDED
From: Soares Chen Ruo Fei (crf_at_[hidden])
Date: 2011-04-19 10:31:37


Ryou Ezoe wrote:
> What I want is translate() accept wchar_t const * and std::wstring as
> a parameter. just like it accept char const * and std::string.
> Then, it return the corresponding translated text.
> Although the encoding of wchar_t is unspecified in the Standard.
> In the current MS-Windows environment, it should be treated as UTF-16.
>
> Converting it to UTF-8 is a implementation details.
> I don't care which UTF it internally use.
> As long as it support real UCS(all code points defined in UCS)
>
> But treating it as UCS rather than binary string is better.
>
> Assuming we have C++0x compiler and encoding of wchar_t is UTF-16,
> translate(u8"text"), translate(u"text"), translate(U"text") and
> translate(L"text")
> all returns the same mapped translated text according to the locale.
> This is a good.

I suppose you are fine with the requirement that the supplied text
must be in one of the Unicode encodings, because translating from
text in Shift-JIS or other arbitrary encodings would probably be a
mess from a technical perspective.

I think that what we really need is to enforce the character set used
in Boost.Locale, not the language. It just happens that Artyom chose
the ASCII character set, which doesn't support most other languages. I
don't see any technical reason to enforce the language used for the
translation keys, but there are many technical reasons to enforce a
particular encoding. We can just change the encoding used from ASCII
to UCS, and that wouldn't technically make much difference. The only
problem with using Unicode as the translation key is normalization.
Since normalization is too heavyweight, the translation system should
probably operate at the code point level, though lookups of text that
looks identical but is encoded with different code points will then
fail.
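
To make the normalization issue concrete, here is a minimal sketch
of a lookup that compares keys byte by byte. The names `catalog`,
`make_demo_catalog` and `lookup` are hypothetical and are not part of
Boost.Locale or GNU Gettext; the fallback of returning the key when
nothing matches follows the usual gettext convention.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical message catalog keyed by the raw UTF-8 bytes of the
// original text. This is an illustration, not the Boost.Locale API.
using catalog = std::map<std::string, std::string>;

// "café" with a precomposed U+00E9 (UTF-8 bytes C3 A9).
const std::string precomposed = "caf\xC3\xA9";
// "café" as 'e' followed by combining acute U+0301 (UTF-8 bytes CC 81).
const std::string decomposed = "cafe\xCC\x81";

catalog make_demo_catalog() {
    catalog c;
    c[precomposed] = "coffee";  // the translator used the precomposed form
    return c;
}

// Return the translation, or the key itself when there is no entry.
std::string lookup(const catalog& c, const std::string& key) {
    catalog::const_iterator it = c.find(key);
    return it == c.end() ? key : it->second;
}

// Both keys render identically, but only the byte-identical one matches.
void check_normalization_mismatch() {
    const catalog c = make_demo_catalog();
    assert(lookup(c, precomposed) == "coffee");
    assert(lookup(c, decomposed) == decomposed);  // lookup falls through
}
```

The two keys differ only in how the accented letter is encoded, so a
system that works at the code point (or byte) level treats them as
different messages unless both sides agree on one form in advance.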

I have one suggestion to overcome GNU Gettext's limitation. Perhaps we
can automatically convert the text into Unicode escape sequences
before passing it to GNU Gettext, so "日本語" in UTF-8 will become
"\\u65E5\\u672C\\u8A9E" in ASCII.
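
A rough sketch of that conversion, under some assumptions: the
function name `escape_non_ascii` is hypothetical, the input is
assumed to be UTF-8, each escape is written with a single backslash,
and code points above U+FFFF use a C-style `\UXXXXXXXX` form, which
is my own choice and not something the proposal specifies. The
decoder is deliberately minimal and does not reject overlong
encodings or surrogate code points.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of Boost.Locale or GNU Gettext.
// Copies ASCII bytes through and replaces every other code point
// with a \uXXXX (or \UXXXXXXXX) escape, producing pure ASCII.
std::string escape_non_ascii(const std::string& utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        // Work out the sequence length from the lead byte.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > utf8.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Fold in the 6 payload bits of each continuation byte.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont =
                static_cast<unsigned char>(utf8[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        i += len;

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else {
            assert(cp <= 0x1FFFFF);  // most that 4 bytes can carry
            char buf[11];            // "\U" + 8 hex digits + NUL
            if (cp <= 0xFFFF)
                std::snprintf(buf, sizeof buf, "\\u%04X",
                              static_cast<unsigned>(cp));
            else
                std::snprintf(buf, sizeof buf, "\\U%08X",
                              static_cast<unsigned>(cp));
            out += buf;
        }
    }
    return out;
}
```

Passing the nine UTF-8 bytes of "日本語" to this function yields the
three escapes for U+65E5, U+672C and U+8A9E, and plain ASCII text
comes back unchanged. The same escaping would have to be applied to
the keys in the message catalog for the lookup to match.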

