Subject: Re: [boost] [locale] Review results for Boost.Locale library
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-04-26 02:56:26
> From: Jeremy Maitin-Shepard <jeremy_at_[hidden]>
>
> The most significant complaint seems to be the fact that
> the translation interface is limited to ASCII (or maybe UTF-8
> is also supported, it isn't entirely clear).
>
> [snip]
>
> I imagine relative to the work required for the whole library,
> these changes would be quite trivial, and might very well
> transform the library from completely unacceptable to
> acceptable for a number of objectors on the list,
> while having essentially no impact on those that
> are happy to use the library as is.
>
I can say a few words about what can be done and what will never be done.
I will never support wide, char16_t or char32_t strings as keys.
The current interface provides a facet that has:
template<typename CharType>
class messages_facet {
    ...
    virtual CharType const *get(int domain_id, char const *msg) const = 0;
    ...
};
Two or four instances of it are installed: messages_facet<char> and messages_facet<wchar_t>,
plus messages_facet<char16_t> and messages_facet<char32_t> when the compiler supports them.
Supporting
    virtual CharType const *get(int domain_id, char const *msg) const = 0;
    virtual CharType const *get(int domain_id, wchar_t const *msg) const = 0;
    virtual CharType const *get(int domain_id, char16_t const *msg) const = 0;
    virtual CharType const *get(int domain_id, char32_t const *msg) const = 0;
is just a waste of memory: for the fastest comparison, each source string
would have to be stored in all 4 variants, or else converted at run time... Wasteful.
Thus I would only consider supporting "char const *" literals.
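To make the point concrete, here is a minimal sketch (not Boost.Locale's actual implementation) of a catalog that accepts only narrow keys. The class name narrow_key_catalog is hypothetical, and a std::map stands in for whatever lookup structure a real facet would use. Whatever CharType the facet has, there is exactly one key table, so no per-key-type copies of the source strings are needed.

```cpp
#include <map>
#include <string>

// Hypothetical sketch: keys are always narrow (char const *) strings,
// while translations are stored in the facet's own character type.
template <typename CharType>
class narrow_key_catalog {
public:
    using string_type = std::basic_string<CharType>;

    // Register a translation for a narrow key.
    void add(std::string const &key, string_type const &translation) {
        table_[key] = translation;
    }

    // Look up a narrow key. Returns nullptr when there is no translation,
    // so the caller can fall back to converting the key itself.
    CharType const *get(char const *msg) const {
        auto it = table_.find(msg);
        return it == table_.end() ? nullptr : it->second.c_str();
    }

private:
    std::map<std::string, string_type> table_;  // one key table per facet
};
```

A wchar_t catalog is fed exactly the same narrow keys as a char catalog; only the stored translations differ in type.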
One possibility is to provide, on a per-domain basis, a key in the .po file,
"X-Boost-Locale-Source-Encoding", so the user would be able to specify in
the special header record (which exists in all message catalogs) something
like:
"X-Boost-Locale-Source-Encoding: windows-936"
or
"X-Boost-Locale-Source-Encoding: UTF-8"
Then, when the catalog is loaded, its keys would be converted
to the X-Boost-Locale-Source-Encoding.
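The proposed header could be handled at load time roughly as follows. This is a hypothetical sketch: source_encoding and rekey_catalog are illustrative names, the header is assumed to be the unescaped msgstr of the empty msgid, sources are assumed to be UTF-8 when the field is absent, and the character-set converter is passed in as a parameter instead of being tied to a particular library.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>

// Hypothetical: extract the proposed X-Boost-Locale-Source-Encoding value
// from a PO header (one "Name: value" field per line). Sources are assumed
// to be UTF-8 when the field is absent or empty.
inline std::string source_encoding(std::string const &header) {
    static std::string const name = "X-Boost-Locale-Source-Encoding:";
    std::istringstream in(header);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, name.size(), name) != 0)
            continue;
        std::string const value = line.substr(name.size());
        // Trim blanks and a possible trailing '\r'.
        auto const first = value.find_first_not_of(" \t\r");
        if (first == std::string::npos)
            return "UTF-8";
        auto const last = value.find_last_not_of(" \t\r");
        return value.substr(first, last - first + 1);
    }
    return "UTF-8";
}

// Converter signature: convert(text, to_encoding, from_encoding).
// A real implementation might wrap boost::locale::conv::between; it is a
// parameter here so the sketch has no library dependency.
using converter = std::function<std::string(std::string const &,
                                            std::string const &,
                                            std::string const &)>;

// Re-encode every msgid of a UTF-8 catalog into the declared source
// encoding, so narrow literals from the sources match byte for byte.
template <typename Value>
std::map<std::string, Value> rekey_catalog(
        std::map<std::string, Value> const &utf8_catalog,
        std::string const &header,
        converter const &convert) {
    std::string const enc = source_encoding(header);
    if (enc == "UTF-8")
        return utf8_catalog;  // keys already match UTF-8 sources
    std::map<std::string, Value> result;
    for (auto const &entry : utf8_catalog)
        result.emplace(convert(entry.first, enc, "UTF-8"), entry.second);
    return result;
}
```

Converting the keys once, when the catalog is loaded, keeps the per-lookup cost the same as today: a plain narrow-string comparison.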
So if you are an MSVC user and you really want to have localized (non-ASCII) keys,
you have the following options:
Option A:
---------
source.cpp: // without BOM, windows-936 encoded
#pragma setlocale("Japanese_Japan.936")
translate("平和"); // L"平和" works well
wcout << translate("「平和」"); // converted at run time from cp936 to UTF-16
cout << translate("「平和」"); // converted at run time from cp936 to UTF-8
myprogram.po:
msgid ""
msgstr ""
"X-Boost-Locale-Source-Encoding: windows-936\n"
"Content-Type: charset=UTF-8\n"
msgid "平和"
msgstr "שלום"
# not translated
msgid "「平和」"
msgstr ""
Option B:
---------
source.cpp: // with BOM, UTF-8 encoded, still windows-936 locale
#pragma setlocale("Japanese_Japan.936")
translate("平和"); // MSVC would actually store it as cp936
// L"平和" works well
wcout << translate("「平和」"); // converted at run time from cp936 to UTF-16
cout << translate("「平和」"); // converted at run time from cp936 to UTF-8
myprogram.po:
msgid ""
msgstr ""
"X-Boost-Locale-Source-Encoding: windows-936\n"
"Content-Type: charset=UTF-8\n"
msgid "平和"
msgstr "שלום"
# not translated
msgid "「平和」"
msgstr ""
Option C (in the future, with C++11):
---------
source.cpp: // with BOM, UTF-8 encoded
translate(u8"平和"); // would be UTF-8
// L"平和" works well
wcout << translate(u8"「平和」"); // converted at run time from UTF-8 to UTF-16
cout << translate(u8"「平和」"); // just copied to the stream as is
myprogram.po:
msgid ""
msgstr ""
"Content-Type: charset=UTF-8\n"
# it would assume UTF-8 sources
msgid "平和"
msgstr "שלום"
# not translated
msgid "「平和」"
msgstr ""
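Note that since C++20, u8"..." literals have type char8_t const[N] rather than char const[N], so they no longer convert implicitly to char const *. A small adapter keeps a char const * interface usable with both language versions; as_narrow is a hypothetical helper, not part of Boost.Locale.

```cpp
// Hypothetical adapter for Option C.
// Before C++20, u8"..." is char const[N]: pass it through unchanged.
inline char const *as_narrow(char const *s) { return s; }

#if defined(__cpp_char8_t)
// Since C++20, u8"..." is char8_t const[N]. Reading the same UTF-8 bytes
// through a char pointer is permitted, so a reinterpret_cast suffices.
inline char const *as_narrow(char8_t const *s) {
    return reinterpret_cast<char const *>(s);
}
#endif
```

With it, the Option C call would read translate(as_narrow(u8"平和")) under either standard.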
Option D (works now):
---------
source.cpp: // without BOM, UTF-8 encoded
translate("平和"); // MSVC would use it as is, i.e. as UTF-8
// L"平和" does not work!!
wcout << translate("「平和」"); // converted at run time from UTF-8 to UTF-16
cout << translate("「平和」"); // just copied to the stream as is
myprogram.po:
msgid ""
msgstr ""
"Content-Type: charset=UTF-8\n"
# it would assume UTF-8 sources
msgid "平和"
msgstr "שלום"
# not translated
msgid "「平和」"
msgstr ""
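Because Option D depends on how the file was saved rather than on anything visible in the code, a program might want to verify it once at startup. A hypothetical check (raw_literals_are_utf8 is an illustrative name) compares a raw literal with its expected UTF-8 bytes:

```cpp
#include <cstring>

// Hypothetical sanity check for Option D (source saved as UTF-8 without BOM).
// The raw literal must reach the binary as the UTF-8 bytes of U+5E73 U+548C.
// If the file were saved with a BOM, MSVC (without /utf-8) would re-encode
// the literal to the ANSI code page, e.g. cp936, and the comparison would fail.
inline bool raw_literals_are_utf8() {
    static char const utf8_bytes[] = "\xE5\xB9\xB3\xE5\x92\x8C";
    return std::strcmp("平和", utf8_bytes) == 0;
}
```

Calling it from an assertion in main() would catch an editor that silently added a BOM.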
This can be done and I can implement it.
But do not expect anything beyond this.
Also note that converting a message from cp936 to, for example,
windows-1255 (the narrow Windows encoding for Hebrew) would lose all
of its non-ASCII characters, since windows-1255 cannot represent them...
But that is the problem of the developer who chose to use non-ASCII
keys.
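The loss can be illustrated with a toy converter. This is a hypothetical sketch that uses UTF-8 to ISO-8859-1 as a stand-in for cp936 to windows-1255 (a real conversion would need a library such as iconv or ICU); every code point the target cannot represent becomes '?', and the input is assumed to be valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical lossy converter: UTF-8 -> ISO-8859-1, '?' for anything > U+00FF.
inline std::string utf8_to_latin1_lossy(std::string const &in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char const c = static_cast<unsigned char>(in[i]);
        // Length of the UTF-8 sequence introduced by the lead byte.
        std::size_t const len = c < 0x80 ? 1 : c < 0xE0 ? 2 : c < 0xF0 ? 3 : 4;
        char32_t cp = 0;
        if (len == 1) {
            cp = c;
        } else {
            cp = c & (0x7F >> len);  // payload bits of the lead byte
            for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
                cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        out += cp <= 0xFF ? static_cast<char>(cp) : '?';
        i += len;
    }
    return out;
}
```

ASCII keys survive any such conversion unchanged, while a key like "「平和」" degrades to "????".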
Artyom