Subject: Re: [boost] [locale] Support of non US-ASCII character set for messages keys
From: Ryou Ezoe (boostcpp_at_[hidden])
Date: 2011-04-28 12:57:52


On Thu, Apr 28, 2011 at 5:17 AM, Artyom <artyomtnk_at_[hidden]> wrote:
> Hello,
>
> After reviewing the whole discussion I've decided
> to make the following changes to the interface to
> provide better support for non-US-ASCII keys.
>
> The actual thing that convinced me is a requirement
> to be able to include chars like © into the text...
>
> Currently there are the following classes:
>
>   template<typename CharType>
>   class message_format :  public std::locale::facet {
>   public:
>     ...
>     typedef CharType char_type;
>     virtual char_type const *get(int domain_id,
>                                  char const *context,
>                                  char const *id) const = 0;
>     ...
>   };
>
>   class message {
>   public:
>     ...
>     explicit message(char const *id);
>     ...
>     // convert message to localized message
>     template<typename CharType>
>     std::basic_string<CharType> str(std::locale const &locale) const;
>
>   };
>
>   ...
>   inline message translate(char const *id);
>   inline std::string gettext(char const *id,
>                              std::locale const &loc=std::locale());
>   inline std::wstring wgettext(char const *id,
>                                std::locale const &loc=std::locale());
>   ...
>
> Basically, a message is created from a narrow id only and can be
> converted to multiple output formats: narrow, wide and so on.
>
>   std::cout << translate("Hello") << std::endl;
>   std::wcout << translate("Hello") << std::endl;
>
> And you could call:
>
>   message msg = translate("Hello");
>   std::string hello = msg.str<char>();
>   std::wstring whello = msg.str<wchar_t>();
>
> Both work from the same message object.
>
> I'll change it in the following way:
>
>   template<typename CharType>
>   class message_format :  public std::locale::facet {
>   public:
>     ...
>     typedef CharType char_type;
>     virtual char_type const *get(int domain_id,
>                                  char_type const *context,
>                                  char_type const *id) const = 0;
>     ...
>   };
>
>   template<typename CharType>
>   class basic_message {
>   public:
>     typedef CharType char_type;
>     typedef std::basic_string<char_type> string_type;
>     ...
>     explicit basic_message(char_type const *id);
>     ...
>     // convert message to localized message
>     string_type str(std::locale const &locale) const;
>
>   };
>   typedef basic_message<char> message;
>   typedef basic_message<wchar_t> wmessage;
>   typedef basic_message<char16_t> u16message;
>   typedef basic_message<char32_t> u32message;
>
>   ...
>   inline message translate(char const *id);
>   inline wmessage translate(wchar_t const *id);
>   inline std::string gettext(char const *id,
>                              std::locale const &loc=std::locale());
>   inline std::wstring wgettext(wchar_t const *id,
>                                std::locale const &loc=std::locale());
>   ...
>
>
> Now you would have to write:
>
>   std::cout << translate("Hello") << std::endl;
>   std::wcout << translate(L"Hello") << std::endl;
>
> And you should call:
>
>   message msg = translate("Hello");
>   wmessage wmsg = translate(L"Hello");
>   std::string hello = msg.str();
>   std::wstring whello = wmsg.str();
>
>
> Additionally, you would be able to specify the encoding
> of the source strings when adding a domain.
>
>  boost::locale::generator gen;
>  gen.add_messages_domain("myprogram","windows-936");
>
> While the default would always be UTF-8.
>
> So if you write in the program:
>
>  std::cout << translate("平和") << std::endl;
>
> Under GCC with UTF-8 sources you don't have to do anything.
>
> If you are using MSVC then you'll have to provide
> a charset name as shown above or use u8"平和"
>
> Of course this would break the API for users who
> currently use Boost.Locale (and I know of at least several
> projects that will suffer).
>
> But this would probably bring it to a logical
> conclusion and prevent these questions from arising.
>
> Of course you should remember that untranslated
> non-US-ASCII strings would be converted at
> run time to the current locale's encoding.
>
> Regards,
>
>  Artyom Beilis
>
> P.S.: Of course the documentation will still discourage
>      programmers from using non-US-ASCII keys as they
>      may not be displayed properly in local character
>      sets and may confuse users.

Just some thoughts about design.

What is the format of the string that specifies the encoding, such as
"windows-936"? Does it have to be a string rather than, say, an enum?
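
To make the enum alternative concrete, here is a rough sketch. The names
source_encoding and encoding_name are made up for this example and are not
part of Boost.Locale; the sketch only shows how a closed set of enumerators
could be mapped to the charset name strings the interface takes today.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum (not part of Boost.Locale): a closed set of
// source encodings known at compile time.
enum class source_encoding { utf8, windows_936, shift_jis };

// Map each enumerator to a charset name string such as "windows-936".
inline std::string encoding_name(source_encoding e)
{
    switch (e) {
    case source_encoding::utf8:        return "UTF-8";
    case source_encoding::windows_936: return "windows-936";
    case source_encoding::shift_jis:   return "Shift_JIS";
    }
    // Reached only if an out-of-range value was cast to the enum.
    assert(!"unhandled source_encoding value");
    return "UTF-8";
}
```

The trade-off as I see it: an enum catches a misspelled encoding at compile
time, while a string lets the library accept any name its conversion backend
happens to know.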

Why is there no "translate" overload that takes a char16_t or char32_t
string? Although I think it will take years before other compilers support
C++0x's new encoding prefixes.
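
A rough sketch of what I have in mind: a single function template could
cover all four character types, including char16_t and char32_t. The
basic_message below is a stripped-down stand-in modeled on the proposal
quoted above. It does no catalog lookup and simply returns the key, so it
is an assumption for illustration and not Boost.Locale's implementation.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for the proposed basic_message: it only keeps
// a pointer to the key and performs no translation.
template<typename CharType>
class basic_message {
public:
    typedef CharType char_type;
    typedef std::basic_string<char_type> string_type;

    explicit basic_message(char_type const *id) : id_(id)
    {
        assert(id != nullptr); // a null key is a caller error
    }

    // Without a catalog, return the key itself, which is also what
    // a user would see for an untranslated message.
    string_type str() const { return string_type(id_); }

private:
    char_type const *id_;
};

typedef basic_message<char>     message;
typedef basic_message<wchar_t>  wmessage;
typedef basic_message<char16_t> u16message;
typedef basic_message<char32_t> u32message;

// One template instead of one overload per character type:
// CharType is deduced from the literal's prefix (none, L, u, U).
template<typename CharType>
basic_message<CharType> translate(CharType const *id)
{
    return basic_message<CharType>(id);
}
```

With this, translate(u"Hello") would yield a u16message and
translate(U"Hello") a u32message, on compilers that accept the u and U
prefixes.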

-- 
Ryou Ezoe
