From: Eric Niebler (eric_at_[hidden])
Date: 2007-03-18 19:05:44


I have a question and a bug report regarding the format_perl flag. First
the question ...

I see that, when you specify format_perl, match_results::format()
recognizes the escape sequences \l \L \u and \U, which do uppercasing or
lowercasing. These are necessarily locale-dependent character
transformations, but match_results does not have a Traits parameter. How
should the transformations be done?

I note that the basic_regex<> class template has a traits parameter, and
that match_results<>::format() can only be called after a successful
regex match. One reasonable approach is that match_results<> holds a
(shared) pointer to the regex object's traits. It would have to be a
polymorphic base pointer, since match_results can't know the exact type
of the traits object at the time format() is called.

That doesn't exactly work because the RegexTraits concept doesn't have
toupper() and tolower() functions. I suggest adding them.
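
To make the idea concrete, here is a rough sketch of the kind of type erasure I have in mind. It is only an illustration, not the actual Boost.Regex or xpressive code: the names traits_base, traits_holder and erase_traits are made up for this example, and because the traits concept has no case-conversion members, the sketch falls back on std::tolower()/std::toupper() with the locale returned by getloc(). It uses std::regex_traits as a stand-in for any conforming traits class.

```cpp
#include <cassert>
#include <locale>
#include <memory>
#include <regex>

// Hypothetical abstract interface (not part of Boost.Regex or TR1).
// match_results could hold a shared pointer to this without knowing
// the concrete traits type.
template <typename Char>
struct traits_base {
    virtual ~traits_base() = default;
    virtual Char to_lower(Char ch) const = 0;
    virtual Char to_upper(Char ch) const = 0;
    virtual int value(Char ch, int radix) const = 0;
};

// Hypothetical wrapper that forwards to a concrete traits object.
template <typename Traits>
struct traits_holder final : traits_base<typename Traits::char_type> {
    using char_type = typename Traits::char_type;

    explicit traits_holder(Traits const& t) : traits_(t) {}

    // Assumption: case conversion goes through the traits' locale,
    // since the traits concept itself offers no tolower()/toupper().
    char_type to_lower(char_type ch) const override {
        return std::tolower(ch, traits_.getloc());
    }
    char_type to_upper(char_type ch) const override {
        return std::toupper(ch, traits_.getloc());
    }
    int value(char_type ch, int radix) const override {
        return traits_.value(ch, radix);
    }

    Traits traits_;
};

// Erases the concrete traits type behind a shared base pointer.
template <typename Traits>
std::shared_ptr<traits_base<typename Traits::char_type> const>
erase_traits(Traits const& t) {
    return std::make_shared<traits_holder<Traits>>(t);
}
```

With something like this, the regex object would call erase_traits() once on its own traits and hand the resulting pointer to each match_results it fills in. Every case conversion or digit lookup in format() then costs one virtual call.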

Strictly speaking, this isn't only a problem for format_perl.
match_results::format() also needs to know how to turn characters into
integers (e.g., to parse format strings like "$1"). That is the reason
RegexTraits::value() exists, so match_results<>::format() should use
it.
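
As a small illustration of what using value() might look like, the sketch below reads the digits of a sub-match reference such as "$12" through a traits object. The function name parse_group_number is made up for this example, and std::regex_traits is assumed as a stand-in for the regex's own traits; its value() returns -1 for a character that is not a digit in the given radix.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: reads decimal digits from fmt starting at pos,
// using the traits to convert each character to its numeric value.
// Returns the parsed number, or -1 if there is no digit at pos.
// On return, pos refers to the first character that was not consumed.
template <typename Traits>
int parse_group_number(Traits const& traits,
                       std::basic_string<typename Traits::char_type> const& fmt,
                       std::size_t& pos) {
    assert(pos <= fmt.size());
    int result = -1;
    while (pos < fmt.size()) {
        int digit = traits.value(fmt[pos], 10);  // -1 if not a decimal digit
        if (digit < 0) {
            break;
        }
        result = (result < 0 ? 0 : result * 10) + digit;
        ++pos;
    }
    return result;
}
```

Called with pos pointing just past the '$' in "$12x", this yields 12 and leaves pos at the 'x'.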

(Incidentally, I just implemented all this in xpressive, so I can
confirm that this strategy works. It incurs a virtual call for each
tolower(), toupper(), and value(), but there doesn't seem to be any
other way without changing the interface in a non-TR1 compatible way.)

Finally, a bug report. Consider the following code:

     std::string str ("fOO bAr BaZ");
     regex rx ("\\w+");

     str = regex_replace( str, rx, "\\L\\u$&", format_perl );
     std::cout << str << std::endl;

This prints:

     FOO BAr BaZ

However, the equivalent perl:

     $str= 'fOO bAr BaZ';
     $str =~ s/\w+/\L\u$&/g;
     print "$str\n";

Prints this:

     Foo Bar Baz

Looks like in boost::regex, the \u is stomping the \L rather than merely
overriding it for the next character.
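
For what it's worth, the behavior I would expect can be sketched with two pieces of state: a persistent mode set by \L, \U and cleared by \E, and a one-shot mode set by \l or \u that applies to the next output character only and then expires. The code below is a toy model under those assumptions, not the Boost.Regex implementation. The names case_mode, apply_mode and format_with_case are invented for the example, it understands only the case escapes and "$&", and it uses the C locale functions from <cctype> rather than a regex traits object.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

enum class case_mode { none, lower, upper };

// Applies one case mode to one character (C locale, narrow chars only).
inline char apply_mode(case_mode m, char ch) {
    unsigned char c = static_cast<unsigned char>(ch);
    switch (m) {
    case case_mode::lower: return static_cast<char>(std::tolower(c));
    case case_mode::upper: return static_cast<char>(std::toupper(c));
    default: return ch;
    }
}

// Toy formatter: expands "$&" to `match` and honors \L \U \E \l \u.
std::string format_with_case(std::string const& fmt, std::string const& match) {
    std::string out;
    case_mode persistent = case_mode::none;  // set by \L or \U, cleared by \E
    case_mode one_shot = case_mode::none;    // set by \l or \u, used once

    auto emit = [&](char ch) {
        // A pending one-shot mode overrides the persistent mode for
        // this character only, then expires.
        case_mode m = (one_shot != case_mode::none) ? one_shot : persistent;
        one_shot = case_mode::none;
        out += apply_mode(m, ch);
    };

    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] == '\\' && i + 1 < fmt.size()) {
            switch (fmt[++i]) {
            case 'L': persistent = case_mode::lower; break;
            case 'U': persistent = case_mode::upper; break;
            case 'E': persistent = case_mode::none; break;
            case 'l': one_shot = case_mode::lower; break;
            case 'u': one_shot = case_mode::upper; break;
            default: emit(fmt[i]); break;  // unknown escape: emit literally
            }
        } else if (fmt[i] == '$' && i + 1 < fmt.size() && fmt[i + 1] == '&') {
            ++i;
            for (char ch : match) {
                emit(ch);
            }
        } else {
            emit(fmt[i]);
        }
    }
    return out;
}
```

In this model, format_with_case("\\L\\u$&", "fOO") gives "Foo": the \u uppercases the 'f' and then expires, and the \L stays in force for the rest of the match.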

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
