Boost Users :
From: John Maddock (john_at_[hidden])
Date: 2004-03-19 07:23:37
>I am using Gtkmm. I want to do boost regular expression searching on
>Glib::ustring.
>http://www.gtkmm.org/gtkmm2/docs/reference/html/classGlib_1_1ustring.html
>
>This class represents characters in UTF-8, so each character in the
>buffer is represented by a variable number of bytes. But it does
>have a bidirectional iterator.
>
>How would you set up boost regex to search if both the regular expression
>and the string to be searched are ustrings?
>
>Does one need to override any of the types defined in regex_traits?
I wouldn't do it that way: Boost.Regex works only with character sets in which
each code point is a single "atom", whereas UTF-8 is a multibyte encoding in
which a sequence of several chars has to be treated as one atom.
One way of handling this is to define a conversion iterator that translates
on-the-fly between UTF-8 characters and wide character atoms, then use
boost::wregex and feed it your converting iterator, rather than raw UTF-8
data.
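
A minimal sketch of such a converting iterator follows. It is only an
illustration: the names utf8_to_wide_iterator and decode_utf8 are made up
here and are not part of Boost.Regex, and the code assumes well-formed UTF-8
input and a wchar_t wide enough to hold every code point in the text (which
is not the case for characters outside the BMP on platforms with a 16-bit
wchar_t).

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical adapter: walks UTF-8 bytes and yields one wchar_t per code
// point. Assumes well-formed input; no validation is performed.
template <class BaseIt>
class utf8_to_wide_iterator {
public:
    using iterator_category = std::bidirectional_iterator_tag;
    using value_type        = wchar_t;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const wchar_t*;
    using reference         = wchar_t;  // decoded on the fly, returned by value

    utf8_to_wide_iterator() = default;
    explicit utf8_to_wide_iterator(BaseIt pos) : pos_(pos) {}

    // Position in the underlying byte sequence.
    BaseIt base() const { return pos_; }

    // Decode the code point whose lead byte is at the current position.
    wchar_t operator*() const {
        BaseIt p = pos_;
        const unsigned char lead = static_cast<unsigned char>(*p);
        const unsigned len = sequence_length(lead);
        // Strip the length-marker bits from the lead byte.
        unsigned long cp = (len == 1) ? lead : (lead & (0x7Fu >> len));
        for (unsigned i = 1; i < len; ++i) {
            ++p;
            cp = (cp << 6) | (static_cast<unsigned char>(*p) & 0x3Fu);
        }
        return static_cast<wchar_t>(cp);
    }

    // Step forward over the whole multibyte sequence.
    utf8_to_wide_iterator& operator++() {
        const unsigned len = sequence_length(static_cast<unsigned char>(*pos_));
        for (unsigned i = 0; i < len; ++i) ++pos_;
        return *this;
    }
    utf8_to_wide_iterator operator++(int) {
        utf8_to_wide_iterator old(*this);
        ++*this;
        return old;
    }

    // Step back over continuation bytes (10xxxxxx) to the previous lead byte.
    utf8_to_wide_iterator& operator--() {
        do {
            --pos_;
        } while ((static_cast<unsigned char>(*pos_) & 0xC0u) == 0x80u);
        return *this;
    }
    utf8_to_wide_iterator operator--(int) {
        utf8_to_wide_iterator old(*this);
        --*this;
        return old;
    }

    friend bool operator==(const utf8_to_wide_iterator& a,
                           const utf8_to_wide_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_to_wide_iterator& a,
                           const utf8_to_wide_iterator& b) {
        return !(a == b);
    }

private:
    // Number of bytes in the sequence introduced by this lead byte.
    static unsigned sequence_length(unsigned char lead) {
        if (lead < 0x80u) return 1;
        if ((lead >> 5) == 0x06u) return 2;
        if ((lead >> 4) == 0x0Eu) return 3;
        if ((lead >> 3) == 0x1Eu) return 4;
        return 1;  // malformed lead byte: treat as a single unit
    }

    BaseIt pos_{};
};

// Illustration only: materialise the decoded sequence as a std::wstring.
inline std::wstring decode_utf8(const std::string& bytes) {
    using It = utf8_to_wide_iterator<std::string::const_iterator>;
    std::wstring out(It(bytes.begin()), It(bytes.end()));
    assert(out.size() <= bytes.size());  // each code point uses >= 1 byte
    return out;
}
```

With an adapter along these lines (call it It, instantiated over
std::string::const_iterator), the pattern could be compiled with the
iterator-pair constructor of boost::wregex and the text searched by passing
a pair of It objects and a boost::match_results<It> to boost::regex_search.
For a Glib::ustring, the underlying bytes should be reachable through its
raw() member. The sub-match iterators then carry base() positions that map
back into the original UTF-8 buffer.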
John.