Boost Users :

Subject: Re: [Boost-users] UTF16
From: Tan, Tom (Shanghai) (TTan_at_[hidden])
Date: 2009-07-17 05:40:28


I have some code that converts between UTF-16 and multibyte (MBCS)
strings. It is Windows only:

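    // Headers and using-declarations assumed by the code below.
    #include <string>
    #include <windows.h>
    #include <boost/shared_array.hpp>

    using namespace std;
    using boost::shared_array;

    // Primary template: no conversion. It compiles only when FROM and TO
    // are the same character type.
    template<typename FROM, typename TO>
    struct convert
    {
        basic_string<TO> operator()(const basic_string<FROM>& from) const
        {
            return from;
        }
    };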

    // UTF-16 to the current multibyte code page.
    template<>
    struct convert<wchar_t, char>
    {
        string operator()(const wstring& from) const
        {
            return utf16_to_mbcs(from);
        }
    private:
        static string utf16_to_mbcs(const wstring& ws)
        {
            if (ws.empty()) return string();

            // Two bytes per UTF-16 code unit, which is enough for
            // single- and double-byte code pages.
            const size_t BUFFER_SIZE = (ws.size() << 1) + 1;
            shared_array<char> p_mcb(new char[BUFFER_SIZE]);

            // Skip a leading byte order mark, if there is one.
            const bool has_utf16le_bom = (0xFEFF == ws[0]);
            const size_t offset = has_utf16le_bom ? 1 : 0;

            int count = ::WideCharToMultiByte(
                AreFileApisANSI() ? CP_THREAD_ACP : CP_OEMCP,
                WC_NO_BEST_FIT_CHARS,
                ws.c_str() + offset,
                static_cast<int>(ws.size() - offset),
                p_mcb.get(),
                static_cast<int>(BUFFER_SIZE),
                0,
                0);

            // A return value of 0 means the conversion failed.
            return (0 == count)
                ? string()
                : string(p_mcb.get(), count);
        }
    };

    // Current multibyte code page to UTF-16.
    template<>
    struct convert<char, wchar_t>
    {
        wstring operator()(const string& from) const
        {
            return mbcs_to_utf16(from);
        }
    private:
        static wstring mbcs_to_utf16(const string& s)
        {
            if (s.empty()) return wstring();

            // The result never has more wide characters than the
            // input has bytes, so this buffer is large enough.
            const size_t BUFFER_SIZE = (s.size() << 1) + 1;
            shared_array<wchar_t> p_ws(new wchar_t[BUFFER_SIZE]);

            int count = ::MultiByteToWideChar(
                AreFileApisANSI() ? CP_THREAD_ACP : CP_OEMCP,
                MB_PRECOMPOSED,
                s.c_str(),
                static_cast<int>(s.size()),
                p_ws.get(),
                static_cast<int>(BUFFER_SIZE));

            // A return value of 0 means the conversion failed.
            return (0 == count)
                ? wstring()
                : wstring(p_ws.get(), count);
        }
    };
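
Since the code above is Windows only, here is a rough portable sketch of
the same kind of conversion done by hand, with UTF-8 as the target instead
of an MBCS code page. It assumes a C++11 compiler (std::u16string) and
input in host byte order. The name utf16_to_utf8 is made up for this
example and is not part of Boost or of the code above. Replacing unpaired
surrogates with U+FFFD is a choice made here, not a requirement.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-16 (host byte order) to UTF-8.
// A leading byte order mark is skipped, and unpaired surrogates are
// replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    std::size_t i = (!in.empty() && in[0] == 0xFEFF) ? 1 : 0;

    while (i < in.size()) {
        char32_t cp = in[i++];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i < in.size() && in[i] >= 0xDC00 && in[i] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; // low surrogate without a preceding high one
        }

        // Emit the code point as 1 to 4 UTF-8 bytes.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Calling utf16_to_utf8(u"abc") gives the same three bytes back, and
anything outside ASCII comes out as a multi-byte sequence. For real use a
library such as ICU is probably the safer route.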

>Date: Thu, 16 Jul 2009 10:39:31 +0200
>From: plarroy <plarroy_at_[hidden]>
>
>My approach is using std::string, etc. all the time and using UTF-8
>internally, only converting to other charsets when it's needed.

>I use IBM's ICU library and wrote a boost::iostreams filter to convert
>encodings. Once that is in place it takes a lot of complexity away. I use it like:

> // setup a conversion from charset to utf-8
> filt_streamb.push(ucnv_filter(charset.c_str(), "utf-8"));
> istream is(&filt_streamb);

>Perhaps there is interest in adding this charset conversion to the
>boost::iostreams filter examples.

>Regards.

Robert Dailey wrote:
> Oh, I also forgot to mention, I am also using boost::filesystem::path. I
> guess this means I need to use wchar_t everywhere (std::wstring,
> boost::filesystem::wpath, etc) and just let wxWidgets do the
> encoding/decoding? If I don't have to do any encoding/decoding myself, then
> there really is no need for a special object. But just in case I would like
> to have the encoding/decoding abilities.
>
> On Sun, Jun 14, 2009 at 12:27 PM, Robert Dailey <rcdailey_at_[hidden]> wrote:
>
>
>> Hi everyone,
>> I did a bit of googling to see if Boost 1.39 has any portable support for
>> UTF-16 encoded strings, but I did not find any. I'm currently using
>> wxWidgets in my application, and I need a decent string object to use. I
>> know that wxWidgets has UTF-16 string support through wxString, however I do
>> not want to expose this object in my interfaces. I want to remain as
>> abstracted away from wxWidgets as possible. Having said that, if someone
>> could tell me if there is any existing UTF-16 string support in Boost, I'd
>> appreciate it. I did not find anything in the vault, sandbox, or trunk in
>> Boost.
>>
>> If boost has no such string object, could someone give me a head start on
>> where to look? Thanks.
>>


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net