Subject: Re: [Boost-users] something about UTF8
From: Rune Lund Olesen (rune.olesen_at_[hidden])
Date: 2008-12-18 05:27:21
Using wstrings with the regex library should work without problems,
except that you cannot rely on Unicode-specific character classes. Just make
sure you convert correctly between UTF-8 and wide-character strings.
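
To illustrate the conversion step, here is a minimal portable sketch of a UTF-8 to wide-string decoder. The name utf8_to_wstring is hypothetical (it is not part of Boost or the Windows API), and the sketch assumes wchar_t holds UTF-16 code units when it is 2 bytes wide (as on Windows) and whole code points otherwise. On Windows, MultiByteToWideChar with the CP_UTF8 code page performs the same conversion.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper, not a Boost or Windows API function.
// Decodes UTF-8 into a std::wstring and throws std::invalid_argument
// on malformed input.
std::wstring utf8_to_wstring(const std::string& in)
{
    // Smallest code point that legitimately needs N continuation bytes;
    // anything below it is an overlong encoding.
    static const std::uint32_t min_cp[] = {0x0, 0x80, 0x800, 0x10000};

    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += extra + 1;

        // Reject overlong forms, surrogates, and values beyond U+10FFFF.
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t: emit a UTF-16 surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}
```

The resulting std::wstring can then be handed to boost::wregex, for example via boost::regex_search(wide_text, pattern) with pattern being a boost::wregex built from a wide string.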
Rune
On Thu, Dec 18, 2008 at 10:39 AM, John Maddock <john_at_[hidden]> wrote:
> wind world wrote:
>
>> hi guys,
>>> I want to use boost::regex on Windows XP to match Japanese kanji.
>>> The kanji are encoded as UTF-8. After I use the function
>>> MultiByteToWideChar to convert the UTF-8 kanji string to a wstring,
>>> can I directly use boost::wregex (built from that wstring) to
>>> match Japanese?
>>>
>>
> You would need to check the Windows API docs to make sure you're using the
> API correctly (does it work with UTF-8 as source? No idea on that), but
> yes, once you have the text encoded as UTF-16 then wregex will behave as you
> expect.
>
> Otherwise you could build regex with ICU support and then match UTF-8
> directly: the downside is that you then have a dependency on ICU, which is
> not a small library.
>
> HTH, John.