
I have some code that does conversion between UTF16 and MBCSs on Windows only:

    template<typename FROM, typename TO>
    struct convert
    {
        basic_string<TO> operator()(const basic_string<FROM>& from)
        {
            return from;
        }
    };

    template<>
    struct convert<wchar_t, char>
    {
        string operator()(const wstring& from)
        {
            return utf16_to_mbcs(from);
        }

    private:
        string utf16_to_mbcs(const wstring& ws)
        {
            if(ws.empty())
                return string();

            const size_t BUFFER_SIZE = (ws.size() << 1) + 1;
            shared_array<char> p_mcb(new char[BUFFER_SIZE]);
            bool has_utf16le_bom = (0xFEFF == ws[0]);

            int count = ::WideCharToMultiByte(
                AreFileApisANSI() ? CP_THREAD_ACP : CP_OEMCP,
                WC_NO_BEST_FIT_CHARS,
                ( has_utf16le_bom ? ws.substr(1) : ws).c_str(),
                has_utf16le_bom ? ws.size() - 1 : ws.size(),
                p_mcb.get(),
                BUFFER_SIZE,
                0,
                0 );

            return (0 == count) ? string() : string(p_mcb.get(), count );
        }
    };

    template<>
    struct convert<char, wchar_t>
    {
        wstring operator()(const string& from)
        {
            return mbcs_to_utf16(from);
        }

    private:
        wstring mbcs_to_utf16(const string& s)
        {
            if(s.empty())
                return wstring();

            const size_t BUFFER_SIZE = (s.size() << 1) + 1;
            shared_array<wchar_t> p_ws(new wchar_t[BUFFER_SIZE]);

            int count = ::MultiByteToWideChar(
                AreFileApisANSI() ? CP_THREAD_ACP : CP_OEMCP,
                MB_PRECOMPOSED,
                s.c_str(),
                s.size(),
                p_ws.get(),
                BUFFER_SIZE );

            return (0 == count) ? wstring() : wstring(p_ws.get(), count );
        }
    };
Date: Thu, 16 Jul 2009 10:39:31 +0200
From: plarroy <plarroy@promax.es>
My approach is to use std::string everywhere, with UTF-8 as the internal encoding, and to convert to other charsets only when needed.
I use IBM's ICU library and wrote a boost::iostreams filter to convert encodings. Once that is done, it takes a lot of complexity away. I use it like this:
    // setup a conversion from charset to utf-8
    filt_streamb.push(ucnv_filter(charset.c_str(), "utf-8"));
    istream is(&filt_streamb);
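To make the "UTF-8 internally, convert only at the edges" idea concrete, here is a minimal sketch of a boundary conversion from UTF-16 to UTF-8 in plain C++11. It is not the ICU-based ucnv_filter mentioned above and does not use ICU or boost::iostreams at all; the names utf16_to_utf8 and append_utf8 are hypothetical and exist only for this illustration. Replacing unpaired surrogates with U+FFFD is an assumed policy, and a real application might prefer to report an error instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the thread): append one Unicode code point
// to `out`, encoded as 1 to 4 UTF-8 bytes.
static void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical boundary conversion: UTF-16 code units in, UTF-8 bytes out.
// Assumed policy: an unpaired surrogate becomes U+FFFD.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: needs a low surrogate right after it.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i; // consume the low surrogate as well
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; // low surrogate with no preceding high surrogate
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With something like this at the boundary, the rest of the program can pass std::string around and treat it as UTF-8, calling the conversion only where UTF-16 text enters from a GUI toolkit or an OS API.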
Perhaps there's interest in adding this charset conversion to the boost::iostreams filter examples.
Regards.
Robert Dailey wrote:
Oh, I also forgot to mention, I am also using boost::filesystem::path. I guess this means I need to use wchar_t everywhere (std::wstring, boost::filesystem::wpath, etc) and just let wxWidgets do the encoding/decoding? If I don't have to do any encoding/decoding myself, then there really is no need for a special object. But just in case I would like to have the encoding/decoding abilities.
On Sun, Jun 14, 2009 at 12:27 PM, Robert Dailey <rcdailey@gmail.com> wrote:
Hi everyone, I did a bit of googling to see if Boost 1.39 has any portable support for UTF-16 encoded strings, but I did not find any. I'm currently using wxWidgets in my application, and I need a decent string object to use. I know that wxWidgets has UTF-16 string support through wxString, however I do not want to expose this object in my interfaces. I want to remain as abstracted away from wxWidgets as possible. Having said that, if someone could tell me if there is any existing UTF-16 string support in Boost, I'd appreciate it. I did not find anything in the vault, sandbox, or trunk in Boost.
If boost has no such string object, could someone give me a head start on where to look? Thanks.
participants (1)
- Tan, Tom (Shanghai)