Boost Users :

From: Anjaly (anjaly_at_[hidden])
Date: 2007-10-01 05:02:18


I am sorry, my last message contained a mistake. I wanted to say that I
want to do a search that treats all the data as though it is UTF-32
rather than UTF-8 (as I incorrectly wrote). I don't know whether I am
making myself clear (I am not very good at expressing myself).

What I really want to do is a Unicode search on the available data.
                

                                                Anjaly G S

On Mon, 2007-10-01 at 09:42 +0100, John Maddock wrote:
> Anjaly wrote:
> > The regex documentation says that the size of the data type of the
> > variable passed to make_u32regex determines the character
> > encoding (UTF-8, UTF-16 or UTF-32).
>
> *For construction of the regex object*.
>
> The search algorithms operate independently on any of UTF8/16/32.
>
> > I passed wchar_t (whose size I think is 4) so that the buffer
> > encoding is treated as utf8 by u32regex_search regardless.
> > Actually I am trying to do a utf8 search.
>
> Except the data file you sent *was not valid UTF8* !
>
> It looks like it's probably UTF16LE; in that case it's up to you to decode
> the byte order mark and read the text into something that Boost.Regex can
> handle (for example platform-native UTF16). ICU should have some file I/O
> routines for that kind of thing, for example for loading a file into a
> UnicodeString.
>
> HTH, John.
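
For reference, the decoding step John describes (check for the byte order
mark, then turn the UTF-16LE bytes into something the search can consume)
could look roughly like the sketch below. It is only an illustration, not
code from Boost.Regex or ICU: decode_utf16le is a hypothetical helper name,
it assumes the file's bytes have already been read into memory, and it
handles only little-endian input with an optional FF FE byte order mark.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost or ICU): decode UTF-16LE bytes,
// with an optional leading FF FE byte order mark, into UTF-32 code points.
std::u32string decode_utf16le(const std::vector<unsigned char>& bytes)
{
    std::size_t i = 0;

    // Skip the little-endian byte order mark if it is present.
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        i = 2;

    // UTF-16 code units are two bytes each, so the rest must be even.
    if ((bytes.size() - i) % 2 != 0)
        throw std::runtime_error("odd number of bytes in UTF-16 input");

    std::u32string out;
    while (i < bytes.size()) {
        // Little-endian: low byte first, then high byte.
        char32_t unit = static_cast<char32_t>(bytes[i] | (bytes[i + 1] << 8));
        i += 2;

        if (unit >= 0xD800 && unit <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            if (i >= bytes.size())
                throw std::runtime_error("truncated surrogate pair");
            char32_t low = static_cast<char32_t>(bytes[i] | (bytes[i + 1] << 8));
            if (low < 0xDC00 || low > 0xDFFF)
                throw std::runtime_error("invalid low surrogate");
            i += 2;
            unit = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }

        out.push_back(unit);
    }
    return out;
}
```

The result holds one code point per element. The Boost.Regex documentation
describes u32regex_search as choosing the encoding from the width of the
character type it is given, so a 32-bit sequence like this should be
treated as UTF-32, but check that against the Boost version in use.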
