Boost Users :

Subject: Re: [Boost-users] [regex] wildcard matching byte not character
From: Richard Clokie (richard.clokie_at_[hidden])
Date: 2010-03-01 14:50:05


Yup, spotted it about 20 minutes after posting. Sorry about that, and
thanks for the help.

Richard

John Maddock wrote:
>> I'm having trouble with the behaviour of the wildcard character when
>> using boost regex and unicode strings. I would expect a . to match a
>> character, not a byte, but that's not the behaviour I'm seeing. I would
>> have thought one wildcard would match any previous character, but for
>> multi-byte characters in UTF-8 I have to use multiple wildcards to match
>> them.
>>
>> I would appreciate it if someone could explain whether this is expected
>> behaviour or not, or if there are flags that control this.
>>
>> What I'm trying to accomplish is to match a pattern (in UTF-8) against
>> a string (in UTF-8). I'm creating icu UnicodeStrings since I'm having
>> other problems with straight UTF-8 char*s and my platform doesn't
>> support wchar_t. I can show examples of the non-UnicodeString problems
>> if desired.
>
> You're constructing invalid UnicodeStrings: the const char* constructor
> does not convert from UTF-8. If the strings are instead constructed as:
>
> UnicodeString s(buf, "UTF8");
>
> Then the output changes to
>
> Success!
> Failed
>
> Which is what you expected.
>
> HTH, John.
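
For anyone reading this thread later, the byte/character mismatch behind the original question can be seen without ICU or Boost at all. The following is only a minimal sketch, assuming well-formed UTF-8 input; the helper name count_code_points is hypothetical and is not part of either library. It counts code points by skipping UTF-8 continuation bytes, which shows why a matcher that works on raw bytes needs several wildcards for one multi-byte character.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of ICU or Boost.Regex).
// Counts UTF-8 code points by counting every byte that is NOT a
// continuation byte; continuation bytes have the bit pattern 10xxxxxx.
// Assumes the input is well-formed UTF-8; no validation is performed.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return count;
}
```

For example, the string "\xC3\xA9" (U+00E9 encoded as UTF-8) has size() == 2 but count_code_points returns 1. A regex engine that is handed those two bytes as two separate characters will need ".." to match it, whereas one that is handed a correctly converted UnicodeString sees a single character.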

