Boost Users :

From: John Maddock (john_at_[hidden])
Date: 2007-10-01 04:42:09


Anjaly wrote:
> In the regex documentation it was said that the size of the data type
> of the variable passed to make_u32regex determines the character
> encoding (utf8, utf16 or utf32).

*For construction of the regex object*.

The search algorithms operate independently on any of UTF8/16/32.
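As a rough illustration of the "width of the character type selects the encoding" rule quoted above, here is a minimal sketch. It is not Boost.Regex code: the names Utf, encoding_for_width and implied_encoding are made up for this example, and the 1/2/4-byte mapping is simply the one described in the quoted documentation.

```cpp
#include <cstddef>

// Hypothetical names for illustration only; none of this is Boost.Regex API.
enum class Utf { utf8, utf16, utf32, unknown };

// Map the size in bytes of one code unit to the encoding it implies.
constexpr Utf encoding_for_width(std::size_t bytes) {
    return bytes == 1 ? Utf::utf8
         : bytes == 2 ? Utf::utf16
         : bytes == 4 ? Utf::utf32
                      : Utf::unknown;
}

// Encoding implied by a character type, judged purely by its size.
template <class CharT>
constexpr Utf implied_encoding() {
    return encoding_for_width(sizeof(CharT));
}
```

Under this rule implied_encoding<char>() gives Utf::utf8, while implied_encoding<wchar_t>() gives Utf::utf16 or Utf::utf32 depending on the platform's wchar_t width, so a 4-byte wchar_t would never select UTF8.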

> I passed wchar_t (whose size I think is 4) so that the buffer
> encoding is considered as utf8 by u32regex_search regardless.
> Actually I am trying to do a utf8 search.

Except the data file you sent *was not valid UTF8*!

It looks like it's probably UTF16LE. In that case it's up to you to decode
the byte order mark and read the text into something that Boost.Regex can
handle (for example platform-native UTF16). ICU should have some file IO
routines for that kind of thing, for example for loading a file into a
UnicodeString type.
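
For anyone who would rather do the decoding by hand, the following is a minimal standard-library sketch of checking for a byte order mark and assembling platform-native UTF16 code units from the raw bytes. The names Bom, detect_bom and decode_utf16 are hypothetical and belong to neither Boost nor ICU. The sketch assumes the whole file has already been read into a std::string of bytes, and it does not validate surrogate pairs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper types and functions, for illustration only.
enum class Bom { none, utf8, utf16le, utf16be };

// Inspect the first bytes of a buffer for a byte order mark.
// (UTF-32 marks are deliberately not handled in this sketch.)
inline Bom detect_bom(const std::string& bytes) {
    auto b = [&](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && b(0) == 0xEF && b(1) == 0xBB && b(2) == 0xBF)
        return Bom::utf8;
    if (bytes.size() >= 2 && b(0) == 0xFF && b(1) == 0xFE)
        return Bom::utf16le;
    if (bytes.size() >= 2 && b(0) == 0xFE && b(1) == 0xFF)
        return Bom::utf16be;
    return Bom::none;
}

// Build UTF16 code units in native byte order from raw bytes,
// skipping the 2-byte mark. Returns an empty string for other encodings;
// a trailing odd byte is ignored.
inline std::u16string decode_utf16(const std::string& bytes, Bom bom) {
    std::u16string out;
    if (bom != Bom::utf16le && bom != Bom::utf16be) return out;
    for (std::size_t i = 2; i + 1 < bytes.size(); i += 2) {
        unsigned first  = static_cast<unsigned char>(bytes[i]);
        unsigned second = static_cast<unsigned char>(bytes[i + 1]);
        unsigned unit = (bom == Bom::utf16le) ? ((second << 8) | first)
                                              : ((first << 8) | second);
        out.push_back(static_cast<char16_t>(unit));
    }
    return out;
}
```

A caller would run detect_bom on the file contents first and pass the result to decode_utf16. The resulting std::u16string holds native-order UTF16 and could then be converted to whichever UTF16 type the search call accepts.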

HTH, John.


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net