Subject: [boost] [locale] [filesystem] Windows local 8 bit encoding
From: Thiel, Bjoern (bjoern.thiel_at_[hidden])
Date: 2012-10-31 10:07:15


Hi,

When developing platform-independent code, I really like the convenience functions
conv::to_utf, conv::from_utf, and conv::utf_to_utf from locale.
Why not add something like conv::local8bit_to_utf and conv::local8bit_from_utf,
following the rationale from filesystem (path encoding conversions):

template < typename CharType >
std::basic_string< CharType > local8bit_to_utf
( std::string const & text, method_type how = default_method )
{
  char const * encoding = impl::local8bit_encoding() ;
  return to_utf< CharType >( text, encoding, how ) ;
}

template< typename CharType >
std::string local8bit_from_utf
( std::basic_string< CharType > const & text, method_type how = default_method )
{
  char const * encoding = impl::local8bit_encoding() ;
  return from_utf< CharType >( text, encoding, how ) ;
}

with

char const * local8bit_encoding()
{
#ifdef _WIN32  // predefined by compilers targeting Windows; WIN32 depends on headers or build flags
  UINT codepage = AreFileApisANSI() ? GetACP() : GetOEMCP() ;
  return windows_codepage_to_encoding( codepage ) ;
#else
  return "UTF-8" ;
#endif
}

and with (preferably implemented using a map)

char const * windows_codepage_to_encoding( int const codepage )
{
  switch (codepage)
  {
  case 874: return "windows-874" ;

  case 932: return "Shift_JIS" ; // strictly, code page 932 should be "Windows-31J"
  case 936: return "GB2312" ;
  case 949: return "KS_C_5601-1987" ;
  case 950: return "Big5" ;

  case 1250: return "windows-1250" ;
  case 1251: return "windows-1251" ;
  case 1252: return "windows-1252" ;
  case 1253: return "windows-1253" ;
  case 1254: return "windows-1254" ;
  case 1255: return "windows-1255" ;
  case 1256: return "windows-1256" ;
  case 1257: return "windows-1257" ;
  case 1258: return "windows-1258" ;

  case 20127: return "US-ASCII" ;

  case 20866: return "KOI8-R" ;
  case 20932: return "EUC-JP" ;
  case 21866: return "KOI8-U" ;

  case 28591: return "ISO-8859-1" ;
  case 28592: return "ISO-8859-2" ;
  case 28593: return "ISO-8859-3" ;
  case 28594: return "ISO-8859-4" ;
  case 28595: return "ISO-8859-5" ;
  case 28596: return "ISO-8859-6" ;
  case 28597: return "ISO-8859-7" ;
  case 28598: return "ISO-8859-8" ;
  case 28599: return "ISO-8859-9" ;
  case 28603: return "ISO-8859-13" ;
  case 28605: return "ISO-8859-15" ;

  case 50220: return "ISO-2022-JP" ;
  case 50225: return "ISO-2022-KR" ;

  case 51949: return "EUC-KR" ;
  case 54936: return "GB18030" ;

  case 65001: return "UTF-8" ;

  default:
    {
      std::ostringstream message ;
      message << "Unknown codepage " << codepage ;
      throw std::invalid_argument( message.str() ) ;
    }
  }
}
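
The map-based variant mentioned above could look roughly like the sketch below. It is only an illustration, not tested against Boost.Locale: the helper name codepage_table is made up for this example, the parameter type is changed to unsigned int to match UINT, and it assumes a C++11 compiler for the brace initialisation. The table entries are the same as in the switch.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (name invented for this sketch): returns the lookup
// table, built once on first use as a function-local static.
inline std::map< unsigned int, char const * > const & codepage_table()
{
  static std::map< unsigned int, char const * > const table = {
    { 874,   "windows-874" },
    { 932,   "Shift_JIS" },       // strictly "Windows-31J"
    { 936,   "GB2312" },
    { 949,   "KS_C_5601-1987" },
    { 950,   "Big5" },
    { 1250,  "windows-1250" },
    { 1251,  "windows-1251" },
    { 1252,  "windows-1252" },
    { 1253,  "windows-1253" },
    { 1254,  "windows-1254" },
    { 1255,  "windows-1255" },
    { 1256,  "windows-1256" },
    { 1257,  "windows-1257" },
    { 1258,  "windows-1258" },
    { 20127, "US-ASCII" },
    { 20866, "KOI8-R" },
    { 20932, "EUC-JP" },
    { 21866, "KOI8-U" },
    { 28591, "ISO-8859-1" },
    { 28592, "ISO-8859-2" },
    { 28593, "ISO-8859-3" },
    { 28594, "ISO-8859-4" },
    { 28595, "ISO-8859-5" },
    { 28596, "ISO-8859-6" },
    { 28597, "ISO-8859-7" },
    { 28598, "ISO-8859-8" },
    { 28599, "ISO-8859-9" },
    { 28603, "ISO-8859-13" },
    { 28605, "ISO-8859-15" },
    { 50220, "ISO-2022-JP" },
    { 50225, "ISO-2022-KR" },
    { 51949, "EUC-KR" },
    { 54936, "GB18030" },
    { 65001, "UTF-8" },
  } ;
  return table ;
}

// Same contract as the switch version: returns the encoding name for a
// known codepage and throws std::invalid_argument for an unknown one.
inline char const * windows_codepage_to_encoding( unsigned int const codepage )
{
  std::map< unsigned int, char const * > const & table = codepage_table() ;
  std::map< unsigned int, char const * >::const_iterator const it = table.find( codepage ) ;
  if ( it == table.end() )
  {
    std::ostringstream message ;
    message << "Unknown codepage " << codepage ;
    throw std::invalid_argument( message.str() ) ;
  }
  return it->second ;
}
```

With the data kept in one table, adding a codepage is a one-line change, and the same table could be reused for a reverse lookup if that is ever needed.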

Best regards

Bjoern.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk