[Boost-bugs] [Boost C++ Libraries] #9827: Missing support for some code page(e.g 949, 950) in windows conversion with std backend

From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2014-04-02 09:55:42


#9827: Missing support for some code page(e.g 949, 950) in windows conversion with
std backend
-------------------------------------------------+-------------------------
 Reporter: hucaonju@… | Owner: artyom
     Type: Bugs | Status: new
Milestone: To Be Determined | Component: locale
  Version: Boost 1.55.0 | Severity: Problem
 Keywords: locale,code page,Korean,Traditional |
  Chinese,exception |
-------------------------------------------------+-------------------------
 The table windows_encoding all_windows_encodings[] in wconv_codepage.ipp
 defines the code pages that the Windows conversion code recognizes.
 However, it lacks entries for some code page names, such as those for the
 Korean code page (949) or the Traditional Chinese Big5 code page (950). On
 a Windows system that uses one of those code pages, the following code
 throws an invalid_charset_error:

 {{{
 // Assuming we are using the std backend so it supports ansi encodings
 boost::locale::generator gen;
 gen.use_ansi_encoding(true);

 std::locale loc(gen(""));
 // Throws invalid_charset_error on Korean Windows, but works on English
 // Windows. The charset is "windows949" on Korean Windows, which is not
 // in the table.
 std::string us = boost::locale::conv::to_utf<char>("abcdefg", loc);
 }}}

 The root cause is that the generated code page string is not in the table.
 When the locale generator with the std backend generates a locale on
 Windows, it calls boost::locale::util::get_system_locale(bool use_utf8).
 That function builds the locale string with the following code (in
 default_locale.cpp):
 {{{
 if(GetLocaleInfoA(LOCALE_USER_DEFAULT,LOCALE_IDEFAULTANSICODEPAGE,buf,sizeof(buf))!=0)
 {
     if(atoi(buf)==0)
         lc_name+=".UTF-8";
     else {
         lc_name +=".windows-";
         lc_name +=buf;
     }
 }
 }}}
 So the encoding part of lc_name is "windows-" followed by the code page
 number. On a system with the Korean (949) or Traditional Chinese (950)
 code page, this produces an encoding string such as "windows-949" or
 "windows-950". However, when wconv_from_utf::open() initializes, it
 searches the array all_windows_encodings[] for "windows949" or
 "windows950" (the name without the hyphen). Neither string is in the
 table, so open() fails and the exception is thrown.
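
 The failure path described above can be reproduced in a small, portable
 sketch. Everything in it is a hypothetical stand-in rather than the Boost
 source: sample_table is an abbreviated imitation of
 all_windows_encodings[], normalize_name assumes that lookup names are
 reduced to lowercase letters and digits, and the binary search assumes
 the table is kept sorted by name.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstring>
#include <iterator>
#include <string>

// Hypothetical, abbreviated stand-in for all_windows_encodings[].
// Entries are sorted by name (strcmp order) so binary search works.
struct encoding_entry {
    char const *name;
    int codepage;
};

static encoding_entry const sample_table[] = {
    { "big5",        950   },
    { "utf8",        65001 },
    { "windows1252", 1252  },
    { "windows936",  936   },
};

// Mirrors the logic quoted from default_locale.cpp: code page 0 maps to
// UTF-8, anything else becomes "windows-<number>".
std::string system_encoding_name(int ansi_codepage)
{
    if(ansi_codepage == 0)
        return "UTF-8";
    return "windows-" + std::to_string(ansi_codepage);
}

// Assumed normalization: keep letters and digits only, lowercased.
std::string normalize_name(std::string const &raw)
{
    std::string out;
    for(unsigned char c : raw) {
        if(std::isalnum(c))
            out += static_cast<char>(std::tolower(c));
    }
    return out;
}

static bool entry_less(encoding_entry const &a, encoding_entry const &b)
{
    return std::strcmp(a.name, b.name) < 0;
}

// Returns the code page for an encoding name, or -1 if it is not listed.
int lookup_codepage(std::string const &raw)
{
    std::string const name = normalize_name(raw);
    encoding_entry const *begin = std::begin(sample_table);
    encoding_entry const *end = std::end(sample_table);
    assert(std::is_sorted(begin, end, entry_less));

    encoding_entry const *ptr = std::lower_bound(
        begin, end, name,
        [](encoding_entry const &e, std::string const &n) {
            return std::strcmp(e.name, n.c_str()) < 0;
        });
    if(ptr != end && name == ptr->name)
        return ptr->codepage;
    return -1;
}
```

 With this table, lookup_codepage(system_encoding_name(1252)) finds the
 entry, while lookup_codepage(system_encoding_name(949)) returns -1
 because "windows949" is not listed.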

 For a quick fix, I suggest adding the missing code pages to the table:
 {{{
 { "cp949", 949, 0 }, // Korean
 { "uhc", 949, 0 }, // From "iconv -l"
 { "windows949", 949, 0 }, // Korean
 // "big5" already in the table
 { "windows950", 950, 0 }, // TC, big5
 }}}

 However, this list may still be incomplete, and the same problem will
 occur on a system whose code page is not in the list. So we could also
 add the following code to the function int
 encoding_to_windows_codepage(char const *ccharset) in wconv_codepage.ipp:

 {{{
 --- E:\Build1\boost_1_55_0\libs\locale\src\encoding\wconv_codepage.ipp 2014-04-02 16:34:52.000000000 +0800
 +++ E:\Build2\boost_1_55_0\libs\locale\src\encoding\wconv_codepage.ipp 2014-04-02 17:31:37.000000000 +0800
 @@ -206,12 +206,18 @@
                  return ptr->codepage;
              }
              else {
                  return -1;
              }
          }
 +        if(ptr==end && charset.size()>7 && charset.substr(0,7)=="windows") {
 +            int cp = atoi(charset.substr(7).c_str());
 +            if(IsValidCodePage(cp)) {
 +                return cp;
 +            }
 +        }
          return -1;

      }

      template<typename CharType>
      bool validate_utf16(CharType const *str,unsigned len)
 }}}

 This code parses the encoding string directly and validates the resulting
 code page. One concern is that the call to IsValidCodePage may reduce
 performance (not tested).
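
 If the validation call does turn out to be costly, one option is to
 remember the result for each name. The following is a minimal sketch of
 that idea, not a patch against Boost: parse_windows_codepage and
 codepage_resolver are hypothetical names, the is_valid callback stands in
 for IsValidCodePage so that the sketch stays portable, and the upper
 bound of 65535 on code page numbers is an assumption. The cache is not
 thread-safe as written.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Parses names of the form "windows<digits>". Returns the number, or -1
// if the prefix is missing, no digits follow, or a non-digit appears.
int parse_windows_codepage(std::string const &name)
{
    static std::string const prefix = "windows";
    if(name.size() <= prefix.size() ||
       name.compare(0, prefix.size(), prefix) != 0)
        return -1;

    int cp = 0;
    for(std::size_t i = prefix.size(); i < name.size(); ++i) {
        char const c = name[i];
        if(c < '0' || c > '9')
            return -1;
        cp = cp * 10 + (c - '0');
        if(cp > 65535)   // assumed upper bound, also prevents overflow
            return -1;
    }
    return cp;
}

// Resolves "windows<digits>" names and caches each answer, so the
// validity check runs at most once per distinct name.
class codepage_resolver {
public:
    explicit codepage_resolver(std::function<bool(int)> is_valid)
        : is_valid_(std::move(is_valid))
    {
        assert(is_valid_);
    }

    int resolve(std::string const &name)
    {
        std::map<std::string, int>::const_iterator it = cache_.find(name);
        if(it != cache_.end())
            return it->second;

        int cp = parse_windows_codepage(name);
        if(cp != -1 && !is_valid_(cp))
            cp = -1;
        cache_[name] = cp;   // negative results are cached as well
        return cp;
    }

private:
    std::function<bool(int)> is_valid_;
    std::map<std::string, int> cache_;
};
```

 On Windows the callback could be
 [](int cp) { return IsValidCodePage(cp) != 0; }, so that repeated lookups
 of the same name call the API only once.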

-- 
Ticket URL: <https://svn.boost.org/trac/boost/ticket/9827>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.
