From: Eric Niebler (eric_at_[hidden])
Date: 2007-03-18 19:05:44

I have a question and a bug report regarding the format_perl flag. First
the question ...

I see that, when you specify format_perl, match_results::format()
recognizes the escape sequences \l \L \u and \U, which do uppercasing or
lowercasing. These are necessarily locale-dependent character
transformations, but match_results does not have a Traits parameter. How
should the transformations be done?

I note that the basic_regex<> class template has a traits parameter, and
that match_results<>::format() can only be called after a successful
regex match. One reasonable approach is that match_results<> holds a
(shared) pointer to the regex object's traits. It would have to be a
polymorphic base pointer, since match_results can't know the exact type
of the traits object at the time format() is called.

That doesn't exactly work because the RegexTraits concept doesn't have
toupper() and tolower() functions. I suggest adding them.
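
A minimal sketch of the shared-pointer idea, assuming the traits type gains the proposed toupper() and tolower() members. Every name here (traits_base, traits_holder, locale_traits, results_sketch) is hypothetical and is not taken from Boost.Regex or xpressive; it only illustrates how a results object with no Traits parameter could still reach the regex's traits through a polymorphic base.

```cpp
#include <cassert>
#include <locale>
#include <memory>
#include <utility>

// Hypothetical polymorphic base: the operations format() would need.
template <typename Char>
struct traits_base {
    virtual ~traits_base() = default;
    virtual Char toupper(Char ch) const = 0;
    virtual Char tolower(Char ch) const = 0;
};

// Hypothetical adapter: forwards to a concrete traits type that only
// the regex object knows about.
template <typename Traits, typename Char>
struct traits_holder : traits_base<Char> {
    explicit traits_holder(Traits t) : traits_(std::move(t)) {}
    Char toupper(Char ch) const override { return traits_.toupper(ch); }
    Char tolower(Char ch) const override { return traits_.tolower(ch); }
    Traits traits_;
};

// Hypothetical traits type with the suggested members (assumption:
// case conversion is delegated to a std::locale).
struct locale_traits {
    std::locale loc;
    char toupper(char ch) const { return std::toupper(ch, loc); }
    char tolower(char ch) const { return std::tolower(ch, loc); }
};

// Stand-in for match_results: no Traits parameter, only the erased pointer.
template <typename Char>
struct results_sketch {
    std::shared_ptr<const traits_base<Char>> traits;  // set by a successful match

    Char upper(Char ch) const {
        assert(traits && "only valid after a successful match");
        return traits->toupper(ch);  // one virtual call per character
    }
};
```

In this sketch the matching code would create the traits_holder from the regex's traits and store it in the results object, so the concrete traits type never appears in the results type.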

This isn't only a problem for format_perl, strictly speaking.
match_results::format() also needs to know how to turn characters into
integers (e.g. to parse format strings like "$1"). That is the reason for
RegexTraits::value()'s existence, so match_results<>::format() should
use it.
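
As an illustration of that point, here is a rough sketch of reading the sub-match number after a '$' through a traits value(ch, radix) call rather than by assuming an ASCII digit layout. The ascii_traits type and parse_group_index function are hypothetical names invented for this example; only the value(ch, radix) shape, returning -1 for a non-digit, follows the RegexTraits concept.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a RegexTraits-like type; only value() matters here.
struct ascii_traits {
    // Returns the numeric value of ch in the given radix, or -1 if ch is
    // not a valid digit in that radix.
    int value(char ch, int radix) const {
        int v = -1;
        if (ch >= '0' && ch <= '9') v = ch - '0';
        else if (ch >= 'a' && ch <= 'z') v = ch - 'a' + 10;
        else if (ch >= 'A' && ch <= 'Z') v = ch - 'A' + 10;
        return (v >= 0 && v < radix) ? v : -1;
    }
};

// Reads a decimal sub-match index starting at pos (just past the '$').
// Advances pos past the digits consumed; returns -1 if there were none.
template <typename Traits>
int parse_group_index(const std::string& fmt, std::size_t& pos, const Traits& tr) {
    int index = -1;
    while (pos < fmt.size()) {
        int digit = tr.value(fmt[pos], 10);
        if (digit < 0) break;  // first non-digit ends the number
        index = (index < 0 ? 0 : index * 10) + digit;
        ++pos;
    }
    return index;
}
```

With a format string such as "$12x" and pos starting at 1, this returns 12 and leaves pos at the 'x'.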

(Incidentally, I just implemented all this in xpressive, so I can
confirm that this strategy works. It incurs a virtual call for each
tolower(), toupper(), and value(), but there doesn't seem to be any
other way without changing the interface in a way that breaks TR1 compatibility.)

Finally, a bug report. Consider the following code:

     std::string str ("fOO bAr BaZ");
     regex rx ("\\w+");

     str = regex_replace( str, rx, "\\L\\u$&", format_perl );
     std::cout << str << std::endl;

This prints:

     FOO BAr BaZ

However, the equivalent Perl:

     $str= 'fOO bAr BaZ';
     $str =~ s/\w+/\L\u$&/g;
     print "$str\n";

prints this:

     Foo Bar Baz

Looks like in boost::regex, the \u is stomping the \L rather than merely
overriding it for the next character.
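
The behavior Perl shows can be modeled with two pieces of state: a persistent case mode set by \L or \U (cleared by \E) and a one-shot override set by \u or \l that applies to the next character only. The sketch below is an assumption-laden illustration of those semantics, not the Boost.Regex implementation; format_case and case_mode are hypothetical names, only "$&" is handled, and case conversion uses the C locale functions for brevity.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum class case_mode { none, lower, upper };

// Expands a tiny subset of a Perl-style format string against one match.
// Handles \L \U \E \u \l and $& only.
std::string format_case(const std::string& fmt, const std::string& match) {
    std::string out;
    case_mode persistent = case_mode::none;  // set by \L or \U, cleared by \E
    case_mode one_shot = case_mode::none;    // set by \u or \l, next char only

    auto emit = [&](char ch) {
        // The one-shot override wins for this character, then expires,
        // leaving the persistent mode in force for the rest.
        case_mode mode = (one_shot != case_mode::none) ? one_shot : persistent;
        one_shot = case_mode::none;
        unsigned char uc = static_cast<unsigned char>(ch);
        if (mode == case_mode::upper) ch = static_cast<char>(std::toupper(uc));
        else if (mode == case_mode::lower) ch = static_cast<char>(std::tolower(uc));
        out.push_back(ch);
    };

    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] == '\\' && i + 1 < fmt.size()) {
            switch (fmt[++i]) {
            case 'L': persistent = case_mode::lower; break;
            case 'U': persistent = case_mode::upper; break;
            case 'E': persistent = case_mode::none; break;
            case 'u': one_shot = case_mode::upper; break;
            case 'l': one_shot = case_mode::lower; break;
            default: emit(fmt[i]); break;  // unknown escape: emit literally
            }
        } else if (fmt[i] == '$' && i + 1 < fmt.size() && fmt[i + 1] == '&') {
            ++i;
            for (char ch : match) emit(ch);  // whole match, case rules applied
        } else {
            emit(fmt[i]);
        }
    }
    return out;
}
```

Under this model, format_case("\\L\\u$&", "fOO") yields "Foo", matching the Perl output above: the \u affects only the first character and the \L remains in effect for the rest.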

Eric Niebler
Boost Consulting
