
Boost Users :

From: Ben Hutchings (ben.hutchings_at_[hidden])
Date: 2004-03-18 11:08:59


Paul Elliott <pelliott_at_[hidden]> wrote:
> I am using Gtkmm. I want to do boost regular expression searching on
> Glib::ustring.
>
> http://www.gtkmm.org/gtkmm2/docs/reference/html/classGlib_1_1ustring.html
>
> This class represents characters in UTF-8, so each character in the
> buffer is represented by a variable number of bytes. But it does
> have a bidirectional iterator.
>
> How would you set up boost regex to search if both the regular
> expression and the string to be searched are ustrings?
>
> Does one need to override any of the types defined in regex_traits?

The character type should be gunichar.

> Thank you.
>
> BTW.
> Quote from: http://www.boost.org/libs/regex/doc/regex_traits.html
>
> "Under construction.
> The current boost.regex traits class design will be migrated to
> that specified in the regular expression standardization proposal."
>
> This is not very useful to someone trying to use the boost_regex
> library now!

I am using Boost 1.30, in which the traits are documented in
<boost/libs/regex/traits_class_ref.htm>. I don't know whether that
documentation still applies to 1.31.

You don't necessarily need custom traits to use custom iterator
types. However, you do need to define custom traits to make
character classes, case folding and so on work for all Unicode
characters.
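
To make "custom iterator type" concrete, here is a minimal sketch using only the standard library. It walks UTF-8 bytes but yields 32-bit code points, which is roughly the role Glib::ustring::const_iterator plays. The names utf8_iterator and decode_utf8 are hypothetical and not part of Glib or Boost, std::uint32_t stands in for gunichar, and the input is assumed to be well-formed UTF-8 (nothing is validated).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>

// Hypothetical illustration only; not the Glib or Boost implementation.
// Assumes the underlying bytes are well-formed UTF-8.
class utf8_iterator {
public:
    using iterator_category = std::bidirectional_iterator_tag;
    using value_type        = std::uint32_t;   // stands in for gunichar
    using difference_type   = std::ptrdiff_t;
    using pointer           = const value_type*;
    using reference         = value_type;      // decoded on the fly, returned by value

    utf8_iterator() = default;
    explicit utf8_iterator(std::string::const_iterator pos) : pos_(pos) {}

    // Decode the code point that starts at the current lead byte.
    value_type operator*() const {
        const unsigned char lead = static_cast<unsigned char>(*pos_);
        assert((lead & 0xC0) != 0x80 && "must point at a lead byte");
        const int extra = extra_bytes(lead);
        if (extra == 0) return lead;            // plain ASCII
        value_type cp = lead & (0x3F >> extra); // payload bits of the lead byte
        std::string::const_iterator p = pos_;
        for (int i = 0; i < extra; ++i) {
            ++p;
            cp = (cp << 6) | (static_cast<unsigned char>(*p) & 0x3F);
        }
        return cp;
    }

    // Step forward over one whole character (1 to 4 bytes).
    utf8_iterator& operator++() {
        pos_ += 1 + extra_bytes(static_cast<unsigned char>(*pos_));
        return *this;
    }
    utf8_iterator operator++(int) { utf8_iterator t(*this); ++*this; return t; }

    // Step backward: skip continuation bytes (10xxxxxx) until a lead byte.
    utf8_iterator& operator--() {
        do { --pos_; } while ((static_cast<unsigned char>(*pos_) & 0xC0) == 0x80);
        return *this;
    }
    utf8_iterator operator--(int) { utf8_iterator t(*this); --*this; return t; }

    friend bool operator==(const utf8_iterator& a, const utf8_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_iterator& a, const utf8_iterator& b) {
        return !(a == b);
    }

private:
    // Number of continuation bytes that follow a given lead byte.
    static int extra_bytes(unsigned char lead) {
        if (lead >= 0xF0) return 3;
        if (lead >= 0xE0) return 2;
        if (lead >= 0xC0) return 1;
        return 0;
    }

    std::string::const_iterator pos_{};
};

// Collect every code point of a UTF-8 string by walking it with the iterator.
std::u32string decode_utf8(const std::string& s) {
    std::u32string out;
    for (utf8_iterator it(s.begin()), end(s.end()); it != end; ++it)
        out.push_back(static_cast<char32_t>(*it));
    return out;
}
```

The point of the sketch is that the iterator's value type is the 32-bit code point, not char, which is why the regex character type has to be gunichar rather than char. It says nothing about character classes or case folding; those still depend on the traits class.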

To create a regex from a ustring I would write something like this:

    boost::reg_expression<gunichar, unicode_regex_traits> re;
    re.assign(s.begin(), s.end(), flags);

When searching in a ustring I would write code like:

    my_predicate pred;
    boost::regex_grep(pred, s.begin(), s.end(), re, flags);

    boost::match_results<Glib::ustring::const_iterator,
                         std::allocator<gunichar> > results;
    boost::regex_search<Glib::ustring::const_iterator,
                        std::allocator<gunichar>,
                        gunichar,
                        unicode_regex_traits,
                        std::allocator<gunichar> >(
        s.begin(), s.end(), results, re, flags);

    Glib::ustring result;
    boost::regex_merge(
        std::back_inserter(result), source.begin(), source.end(),
        re,
        std::basic_string<gunichar>(repl.begin(), repl.end()),
        flags);

(The above is untested as we're not actually using Gtkmm but it's
based on what we're doing with our own UTF-8 and UTF-16 strings.)

