Subject: [boost] [locale] [filesystem] Windows local 8 bit encoding
From: Thiel, Bjoern (bjoern.thiel_at_[hidden])
Date: 2012-10-31 10:07:15
Hi,
When developing platform-independent code, I really like the convenience functions
conv::to_utf, conv::from_utf, and conv::utf_to_utf from Boost.Locale.
Why not add something like conv::local8bit_to_utf and conv::local8bit_from_utf,
following the rationale from Boost.Filesystem (path encoding conversions):
template< typename CharType >
std::basic_string< CharType > local8bit_to_utf
    ( std::string const & text, method_type how = default_method )
{
    char const * encoding = impl::local8bit_encoding() ;
    return to_utf< CharType >( text, encoding, how ) ;
}

template< typename CharType >
std::string local8bit_from_utf
    ( std::basic_string< CharType > const & text, method_type how = default_method )
{
    char const * encoding = impl::local8bit_encoding() ;
    return from_utf< CharType >( text, encoding, how ) ;
}
with
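char const * local8bit_encoding()
{
// _WIN32 is predefined by the compiler; WIN32 depends on headers or build settings
#ifdef _WIN32
    // The file APIs use the ANSI code page unless they were switched to OEM
    UINT codepage = AreFileApisANSI() ? GetACP() : GetOEMCP() ;
    return windows_codepage_to_encoding( codepage ) ;
#else
    return "UTF-8" ;
#endif
}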
and with (a map would be better than this switch)
char const * windows_codepage_to_encoding( int const codepage )
{
    switch (codepage)
    {
        case 874: return "windows-874" ;
        case 932: return "Shift_JIS" ; // but should be "Windows-31J" ;
        case 936: return "GB2312" ;
        case 949: return "KS_C_5601-1987" ;
        case 950: return "Big5" ;
        case 1250: return "windows-1250" ;
        case 1251: return "windows-1251" ;
        case 1252: return "windows-1252" ;
        case 1253: return "windows-1253" ;
        case 1254: return "windows-1254" ;
        case 1255: return "windows-1255" ;
        case 1256: return "windows-1256" ;
        case 1257: return "windows-1257" ;
        case 1258: return "windows-1258" ;
        case 20127: return "US-ASCII" ;
        case 20866: return "KOI8-R" ;
        case 20932: return "EUC-JP" ;
        case 21866: return "KOI8-U" ;
        case 28591: return "ISO-8859-1" ;
        case 28592: return "ISO-8859-2" ;
        case 28593: return "ISO-8859-3" ;
        case 28594: return "ISO-8859-4" ;
        case 28595: return "ISO-8859-5" ;
        case 28596: return "ISO-8859-6" ;
        case 28597: return "ISO-8859-7" ;
        case 28598: return "ISO-8859-8" ;
        case 28599: return "ISO-8859-9" ;
        case 28603: return "ISO-8859-13" ;
        case 28605: return "ISO-8859-15" ;
        case 50220: return "ISO-2022-JP" ;
        case 50225: return "ISO-2022-KR" ;
        case 51949: return "EUC-KR" ;
        case 54936: return "GB18030" ;
        case 65001: return "UTF-8" ;
        default:
        {
            std::ostringstream message ;
            message << "Unknown codepage " << codepage ;
            throw std::invalid_argument( message.str() ) ;
        }
    }
}
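
The map-based variant mentioned above could look roughly like the following.
This is only a minimal sketch, assuming C++11 initializer lists are available;
the namespace sketch, the typedef codepage_map and the helper codepage_table()
are hypothetical names for illustration and not part of Boost.Locale, and the
table holds only an excerpt of the code pages listed in the switch.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

namespace sketch // hypothetical namespace, not part of Boost.Locale
{
    typedef std::map< unsigned, char const * > codepage_map ;

    // Builds the lookup table once on first use.
    // Only an excerpt of the code pages from the switch is listed here.
    inline codepage_map const & codepage_table()
    {
        static codepage_map const table =
        {
            {   874, "windows-874" },
            {   932, "Shift_JIS" },
            {  1252, "windows-1252" },
            { 20866, "KOI8-R" },
            { 28591, "ISO-8859-1" },
            { 65001, "UTF-8" },
        } ;
        return table ;
    }

    // Same contract as the switch version: returns the encoding name
    // or throws std::invalid_argument for a code page not in the table.
    inline char const * windows_codepage_to_encoding( unsigned const codepage )
    {
        codepage_map const & table = codepage_table() ;
        codepage_map::const_iterator const it = table.find( codepage ) ;
        if ( it == table.end() )
        {
            std::ostringstream message ;
            message << "Unknown codepage " << codepage ;
            throw std::invalid_argument( message.str() ) ;
        }
        return it->second ;
    }
}
```

With a table like this, supporting another code page means adding one entry
to the data, and the lookup code itself stays unchanged.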
Best regards
Bjoern.