Subject: Re: [boost] [locale] Review results for Boost.Locale library
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-04-26 02:56:26


> From: Jeremy Maitin-Shepard <jeremy_at_[hidden]>
>
> The most significant complaint seems to be the fact that
> the translation interface is limited to ASCII (or maybe UTF-8
> is also supported, it isn't entirely clear).
>
> [snip]
>
> I imagine relative to the work required for the whole library,
> these changes would be quite trivial, and might very well
> transform the library from completely unacceptable to
> acceptable for a number of objectors on the list,
> while having essentially no impact on those that
> are happy to use the library as is.
>

I can say a few words on what can be done and what will never be done.

I will never support wide, char16_t or char32_t strings as keys.

The current interface provides a facet that has

    template<typename CharType>
    class messages_facet {
       ...
       virtual CharType const *get(int domain_id,char const *msg) const = 0;
       ...

and 2 or 4 instantiations of it are installed:

    messages_facet<char>, messages_facet<wchar_t>,
    messages_facet<char16_t> and messages_facet<char32_t>

Supporting

    virtual CharType const *get(int domain_id,char const *msg) const = 0;
    virtual CharType const *get(int domain_id,wchar_t const *msg) const = 0;
    virtual CharType const *get(int domain_id,char16_t const *msg) const = 0;
    virtual CharType const *get(int domain_id,char32_t const *msg) const = 0;

is just a waste of memory: for fastest comparison each source string
would have to be stored in 4 variants, or else be converted at runtime...
Wasteful.

Thus I would only consider supporting "char const *" literals.

One possibility is to provide, on a per-domain basis, a key in the po file,
"X-Boost-Locale-Source-Encoding", so the user would be able to specify
in the special header record (which exists in all message catalogs)
something like:

    "X-Boost-Locale-Source-Encoding: windows-936"
  or
    "X-Boost-Locale-Source-Encoding: UTF-8"

Then, when the catalog is loaded, its keys would be converted to the
encoding named by X-Boost-Locale-Source-Encoding.
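A minimal sketch of how a loader could read such a record, assuming the
header record (the msgstr of the empty msgid) is already available as one
string with real newlines between fields. source_encoding is a hypothetical
helper, not part of Boost.Locale, and falling back to UTF-8 when the key is
missing is an assumption of this sketch. The conversion of the keys
themselves is not shown.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not a Boost.Locale API): scan the catalog header
// record line by line for the proposed key and return its value, or
// `fallback` when the key is absent or has an empty value.
std::string source_encoding(std::string const &header,
                            std::string const &fallback = "UTF-8")
{
    static std::string const key = "X-Boost-Locale-Source-Encoding:";
    static char const *const blanks = " \t\r";

    std::istringstream lines(header);
    std::string line;
    while (std::getline(lines, line)) {
        // Skip every header field that does not start with the key.
        if (line.compare(0, key.size(), key) != 0)
            continue;
        std::string value = line.substr(key.size());
        // Trim blanks (and a possible trailing '\r') around the value.
        std::string::size_type first = value.find_first_not_of(blanks);
        if (first == std::string::npos)
            return fallback;
        std::string::size_type last = value.find_last_not_of(blanks);
        return value.substr(first, last - first + 1);
    }
    return fallback;
}
```

Called with a header that contains the line
"X-Boost-Locale-Source-Encoding: windows-936" it returns "windows-936";
called with a header that only has a Content-Type field it returns the
fallback.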
So if you are an MSVC user and you really want to have localized keys,
you have the following options:

Option A:
---------

source.cpp: // without BOM, windows-936 encoded

    #pragma setlocale("Japanese_Japan.936")
    translate("平和");
    // L"平和" works well

    wcout << translate("「平和」"); // converted at runtime from cp936 to UTF-16
    cout << translate("「平和」");  // converted at runtime from cp936 to UTF-8

myprogram.po:

    msgid ""
    msgstr ""
    "X-Boost-Locale-Source-Encoding: windows-936\n"
    "Content-Type: text/plain; charset=UTF-8\n"

    msgid "平和"
    msgstr "שלום"

    # not translated
    msgid "「平和」"
    msgstr ""

Option B:
---------

source.cpp: // with BOM, UTF-8 encoded, still windows-936 locale

    #pragma setlocale("Japanese_Japan.936")
    translate("平和"); // with MSVC this would actually be cp936
    // L"平和" works well

    wcout << translate("「平和」"); // converted at runtime from cp936 to UTF-16
    cout << translate("「平和」");  // converted at runtime from cp936 to UTF-8

myprogram.po:

    msgid ""
    msgstr ""
    "X-Boost-Locale-Source-Encoding: windows-936\n"
    "Content-Type: text/plain; charset=UTF-8\n"

    msgid "平和"
    msgstr "שלום"

    # not translated
    msgid "「平和」"
    msgstr ""

Option C (in the future, with C++11):
---------

source.cpp: // with BOM, UTF-8 encoded

    translate(u8"平和"); // would be UTF-8
    // L"平和" works well

    wcout << translate(u8"「平和」"); // converted at runtime from UTF-8 to UTF-16
    cout << translate(u8"「平和」");  // just copied to the stream as is

myprogram.po:

    msgid ""
    msgstr ""
    "Content-Type: text/plain; charset=UTF-8\n"

    # it would assume UTF-8 sources
    msgid "平和"
    msgstr "שלום"

    # not translated
    msgid "「平和」"
    msgstr ""

Option D (works now):
---------

source.cpp: // without BOM, UTF-8 encoded

    translate("平和"); // MSVC would use it as UTF-8
    // L"平和" does not work!!

    wcout << translate("「平和」"); // converted at runtime from UTF-8 to UTF-16
    cout << translate("「平和」");  // just copied to the stream as is

myprogram.po:

    msgid ""
    msgstr ""
    "Content-Type: text/plain; charset=UTF-8\n"

    # it would assume UTF-8 sources
    msgid "平和"
    msgstr "שלום"

    # not translated
    msgid "「平和」"
    msgstr ""

This can be done and I can implement it.

But do not expect anything beyond this.

Also note that converting a message from cp936 to, for example,
windows-1255 (the Hebrew narrow Windows encoding) would lose all
non-ASCII characters... But this is the problem of the developer
who has chosen to use non-ASCII keys.

Artyom

