[boost] [regex] format_perl conundrum

18 Mar 2007

I have a question and a bug report regarding the format_perl flag. First, 
the question.

I see that, when you specify format_perl, match_results::format() 
recognizes the escape sequences \l, \L, \u, and \U, which do uppercasing or 
lowercasing. These are necessarily locale-dependent character 
transformations, but match_results does not have a Traits parameter. How 
should the transformations be done?

I note that the basic_regex<> class template has a traits parameter, and 
that match_results<>::format() can only be called after a successful 
regex match. One reasonable approach is that match_results<> holds a 
(shared) pointer to the regex object's traits. It would have to be a 
polymorphic base pointer, since match_results can't know the exact type 
of the traits object at the time format() is called.

That doesn't exactly work because the RegexTraits concept doesn't have 
toupper() and tolower() functions. I suggest adding them.

This isn't only a problem for format_perl, strictly speaking. 
match_results::format() also needs to know how to turn characters into 
integers (e.g. to parse format strings like "$1"). That is the reason for 
RegexTraits::value()'s existence, so match_results<>::format() should 
use it.

(Incidentally, I just implemented all this in xpressive, so I can 
confirm that this strategy works. It incurs a virtual call for each 
tolower(), toupper(), and value(), but there doesn't seem to be any 
other way without changing the interface in a non-TR1-compatible way.)

Finally, a bug report. Consider the following code:

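     #include <iostream>
     #include <string>
     #include <boost/regex.hpp>

     int main() {
         std::string str ("fOO bAr BaZ");
         boost::regex rx ("\\w+");
         str = boost::regex_replace( str, rx, "\\L\\u$&", boost::format_perl );
         std::cout << str << std::endl;
     }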

This prints:

     FOO BAr BaZ

However, the equivalent Perl:

     $str= 'fOO bAr BaZ';
     $str =~ s/\w+/\L\u$&/g;
     print "$str\n";

Prints this:

     Foo Bar Baz

It looks like in boost::regex, the \u is stomping on the \L rather than 
merely overriding it for the next character.

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com