
Boost Users :

Subject: Re: [Boost-users] something about UTF8
From: John Maddock (john_at_[hidden])
Date: 2008-12-18 04:39:08


wind world wrote:
>> hi guys,
>> I want to use boost::regex on Windows XP to match Japanese kanji.
>> The kanji are encoded as UTF-8. I want to make sure: after I use the
>> function MultiByteToWideChar to convert the UTF-8 kanji string to a
>> wstring, can I directly use boost::wregex (on that wstring) to
>> match Japanese?

You would need to check the Windows API docs to make sure you're using the
API correctly (does it work with UTF-8 as source? No idea on that), but
yes, once you have the text encoded as UTF-16 then wregex will behave as you
expect.

Otherwise you could build Boost.Regex with ICU support and then match UTF-8
directly: the downside is that you then have a dependency on ICU, which is
not a small library.

HTH, John.

