Using wstrings with the regex library should work without problems, except that you cannot rely on Unicode-specific character classes. Just make sure you convert correctly between UTF-8 and wide-character strings.

Rune

On Thu, Dec 18, 2008 at 10:39 AM, John Maddock <john@johnmaddock.co.uk> wrote:
wind world wrote:
Hi guys,
I want to use boost::regex on Windows XP to match Japanese kanji.
The kanji are encoded as UTF-8. After I use the function
MultiByteToWideChar to convert the UTF-8 kanji string to a wstring,
can I directly use boost::wregex (built from that wstring) to
match Japanese?

You would need to check the Windows API docs to make sure you're using the API correctly (does it work with UTF-8 as source?  No idea on that), but yes, once you have the text encoded as UTF-16 then wregex will behave as you expect.
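For what it's worth, here is a minimal sketch of that pipeline, not a definitive implementation. MultiByteToWideChar does accept CP_UTF8 as the source code page, and the Windows branch below uses it that way. The helper names utf8_to_wide and contains_kanji are made up for illustration. The non-Windows branch is a hand-written fallback decoder that assumes a 32-bit wchar_t and exists only so the sketch builds elsewhere. The sketch uses std::wregex as a stand-in so that it compiles without Boost; the assumption is that boost::wregex and boost::regex_search can be swapped in with the same calls. The kanji test is an explicit code point range (U+4E00 to U+9FFF, the CJK Unified Ideographs block) rather than a Unicode character class.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: convert UTF-8 to a wide string
// (UTF-16 on Windows, UTF-32 where wchar_t is 32 bits wide).
std::wstring utf8_to_wide(const std::string& utf8) {
    if (utf8.empty()) return std::wstring();
#ifdef _WIN32
    const int src_len = static_cast<int>(utf8.size());
    // First call: ask how many wide characters are needed.
    const int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                        utf8.data(), src_len, nullptr, 0);
    if (len <= 0) throw std::invalid_argument("invalid UTF-8 input");
    std::wstring wide(static_cast<std::size_t>(len), L'\0');
    // Second call: perform the conversion into the buffer.
    const int written = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                            utf8.data(), src_len, &wide[0], len);
    assert(written == len);
    (void)written;
    return wide;
#else
    // Assumption: fallback for platforms without the Windows API.
    // It does not reject overlong encodings or surrogate code points.
    static_assert(sizeof(wchar_t) >= 4, "fallback assumes a 32-bit wchar_t");
    std::wstring wide;
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t extra = 0;   // number of continuation bytes expected
        unsigned long cp = 0;    // code point being assembled
        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (extra > utf8.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp > 0x10FFFF) throw std::invalid_argument("code point out of range");
        wide.push_back(static_cast<wchar_t>(cp));
        i += extra + 1;
    }
    return wide;
#endif
}

// Hypothetical helper: true if the text contains at least one character in
// U+4E00..U+9FFF. The range endpoints are written as literal characters
// (via universal character names), so no regex-level escape syntax is used.
bool contains_kanji(const std::wstring& text) {
    static const std::wregex kanji(L"[\u4E00-\u9FFF]");
    return std::regex_search(text, kanji);
}
```

Calling contains_kanji(utf8_to_wide(input)) on the UTF-8 bytes read from a file or socket is the intended use. Note that the range above only covers the Basic Multilingual Plane; kanji in the supplementary planes become surrogate pairs in UTF-16 and would need separate handling.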

Otherwise you could build regex with ICU support and then match UTF-8 directly: the downside is that you then have a dependency on ICU, which is not a small library.

HTH, John.