Boost Users :

Subject: Re: [Boost-users] regex traits for glib::ustring
From: John Maddock (john_at_[hidden])
Date: 2009-07-14 10:53:48


> I'm attempting to build an application which requires multibyte
> character support, for which I'm using the Glib::ustring, and needs
> also to use regex.
>
> From what I understand you need to tell boost regex exactly how to
> iterate over your strings.
>
> I've got as far as:
>
> typedef boost::match_results<Glib::ustring::iterator,
> std::allocator<gunichar> > umatch;

You could ditch the allocator parameter from that typedef.

> typedef boost::reg_expression<Glib::ustring,
> boost::regex_traits<Glib::ustring> > uregex;

No, the first parameter to the regex type is the *character type*, not a
*string type*. Since Glib::ustring is a sequence of bytes, you could just
use boost::regex here, but then the matching would not be Unicode or UTF-8 aware.
To get full Unicode support in boost::regex you need to use it in
conjunction with the ICU library, see:
http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/ref/non_std_strings/icu.html

But I also note that Glib has a PCRE-based regex engine of its own, which
may well provide you with what you need?

HTH, John.

