Subject: Re: [boost] [locale] Review results for Boost.Locale library
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-04-26 02:56:26

Next message: Kohei Takahashi: "[boost] [Config] A proposal to Boost.Config."
Previous message: Rutger ter Borg: "Re: [boost] [Memory Managed Pointer] Review Request"
In reply to: Jeremy Maitin-Shepard: "Re: [boost] [locale] Review results for Boost.Locale library"
Next in thread: Jeremy Maitin-Shepard: "Re: [boost] [locale] Review results for Boost.Locale library"
Reply: Jeremy Maitin-Shepard: "Re: [boost] [locale] Review results for Boost.Locale library"

> From: Jeremy Maitin-Shepard <jeremy_at_[hidden]>
>
> The most significant complaint seems to be the fact that
> the translation interface is limited to ASCII (or maybe UTF-8
> is also supported, it isn't entirely clear).
>
> [snip]
>
> I imagine relative to the work required for the whole library,
> these changes would be quite trivial, and might very well
> transform the library from completely unacceptable to
> acceptable for a number of objectors on the list,
> while having essentially no impact on those that
> are happy to use the library as is.
>

I can say a few words on what can be done and what will never be done.

I will never support wide, char16_t or char32_t strings as keys.

The current interface provides a facet that has

    template<typename CharType>
    class messages_facet {
    ...
        virtual CharType const *get(int domain_id,char const *msg) const = 0;
    ...

and two or four instantiations of it are installed: messages_facet<char>, messages_facet<wchar_t>, messages_facet<char16_t> and messages_facet<char32_t>.

Supporting

    virtual CharType const *get(int domain_id,char const *msg) const = 0;
    virtual CharType const *get(int domain_id,wchar_t const *msg) const = 0;
    virtual CharType const *get(int domain_id,char16_t const *msg) const = 0;
    virtual CharType const *get(int domain_id,char32_t const *msg) const = 0;

is just a waste of memory: for the fastest comparison, each source string would have to be stored in four variants, or else converted at run time... Wasteful.

Thus I would only consider supporting "char const *" literals.

One possibility is to provide, on a per-domain basis, a key in the po file, "X-Boost-Locale-Source-Encoding", so that the user would be able to specify in the special record (which exists in all message catalogs) something like:

    "X-Boost-Locale-Source-Encoding: windows-936"

or

    "X-Boost-Locale-Source-Encoding: UTF-8"

Then, when the catalog is loaded, its keys would be converted to the X-Boost-Locale-Source-Encoding.
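As a rough illustration of the lookup this proposal implies, the sketch below reads the X-Boost-Locale-Source-Encoding key from the text of a catalog's header record. The function name source_encoding and the fallback to UTF-8 when the key is missing are assumptions made for this example only; nothing like it exists in Boost.Locale, and the actual re-encoding of the keys is left out.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, not part of Boost.Locale: scans the header record
// of a message catalog (the msgstr of the empty msgid, one "Key: value"
// pair per line) for the proposed X-Boost-Locale-Source-Encoding key.
// Assumption: when the key is absent or empty, sources are taken as UTF-8.
std::string source_encoding(std::string const &header)
{
    static std::string const key = "X-Boost-Locale-Source-Encoding:";
    std::istringstream lines(header);
    std::string line;
    while (std::getline(lines, line)) {
        // Skip every line that does not start with the key.
        if (line.compare(0, key.size(), key) != 0)
            continue;
        // Trim blanks and a possible '\r' around the value.
        std::string::size_type first =
            line.find_first_not_of(" \t\r", key.size());
        if (first == std::string::npos)
            break; // key present but empty: use the default below
        std::string::size_type last = line.find_last_not_of(" \t\r");
        return line.substr(first, last - first + 1);
    }
    return "UTF-8";
}
```

For a header that contains the line "X-Boost-Locale-Source-Encoding: windows-936" the function returns "windows-936"; for a header that carries only a Content-Type line it returns the assumed default, "UTF-8".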
So if you are an MSVC user and you really want to have localized keys, you have the following options:

Option A:
---------

source.cpp:

    // without BOM, windows-936 encoded

    #pragma setlocale("Japanese_Japan.936")
    translate("平和"); // L"平和" works well

    wcout << translate("「平和」"); // convert at run time from cp936 to UTF-16
    cout << translate("「平和」"); // convert at run time from cp936 to UTF-8

myprogram.po:

    msgid ""
    msgstr ""
    "X-Boost-Locale-Source-Encoding: windows-936\n"
    "Content-Type: charset=UTF-8\n"

    msgid "平和"
    msgstr "שלום"

    # not translated
    msgid "「平和」"
    msgstr ""

Option B:
---------

source.cpp:

    // with BOM, UTF-8 encoded, still windows-936 locale

    #pragma setlocale("Japanese_Japan.936")
    translate("平和"); // MSVC would actually use cp936
                       // L"平和" works well

    wcout << translate("「平和」"); // convert at run time from cp936 to UTF-16
    cout << translate("「平和」"); // convert at run time from cp936 to UTF-8

myprogram.po:

    msgid ""
    msgstr ""
    "X-Boost-Locale-Source-Encoding: windows-936\n"
    "Content-Type: charset=UTF-8\n"

    msgid "平和"
    msgstr "שלום"

    # not translated
    msgid "「平和」"
    msgstr ""

Option C (in future C++11):
---------

source.cpp:

    // with BOM, UTF-8 encoded

    translate(u8"平和"); // would be UTF-8
                         // L"平和" works well

    wcout << translate(u8"「平和」"); // convert at run time from UTF-8 to UTF-16
    cout << translate(u8"「平和」"); // just copied to the stream as is

myprogram.po:

    msgid ""
    msgstr ""
    "Content-Type: charset=UTF-8\n"
    # it would assume UTF-8 sources

    msgid "平和"
    msgstr "שלום"

    # not translated
    msgid "「平和」"
    msgstr ""

Option D (works now):
---------

source.cpp:

    // without BOM, UTF-8 encoded

    translate("平和"); // MSVC would use it as UTF-8
                       // L"平和" does not work!!

    wcout << translate("「平和」"); // convert at run time from UTF-8 to UTF-16
    cout << translate("「平和」"); // just copied to the stream as is

myprogram.po:

    msgid ""
    msgstr ""
    "Content-Type: charset=UTF-8\n"
    # it would assume UTF-8 sources

    msgid "平和"
    msgstr "שלום"

    # not translated
    msgid "「平和」"
    msgstr ""

This can be done and I can implement it. But do not expect anything beyond this.

Also note that converting a message from cp936 to, for example, windows-1255 (the Hebrew narrow Windows encoding) would lose all the non-ASCII characters... But that is the problem of the developer who chose to use non-ASCII keys.

Artyom

