
I have a question and a bug report regarding the format_perl flag.

First, the question. I see that, when you specify format_perl, match_results::format() recognizes the escape sequences \l, \L, \u and \U, which do uppercasing or lowercasing. These are necessarily locale-dependent character transformations, but match_results does not have a Traits parameter. How should the transformations be done?

I note that the basic_regex<> class template has a traits parameter, and that match_results<>::format() can only be called after a successful regex match. One reasonable approach is for match_results<> to hold a (shared) pointer to the regex object's traits. It would have to be a polymorphic base pointer, since match_results can't know the exact type of the traits object at the time format() is called.

That doesn't quite work as things stand, because the RegexTraits concept doesn't have toupper() and tolower() functions. I suggest adding them.

Strictly speaking, this isn't only a problem for format_perl. match_results::format() also needs to know how to turn characters into integers (e.g. to parse format strings like "$1"). That is the reason for RegexTraits::value()'s existence, so match_results<>::format() should use it.

(Incidentally, I just implemented all this in xpressive, so I can confirm that this strategy works. It incurs a virtual call for each tolower(), toupper(), and value(), but there doesn't seem to be any other way without changing the interface in a non-TR1 compatible way.)

Finally, a bug report. Consider the following code:

    std::string str ("fOO bAr BaZ");
    regex rx ("\\w+");
    str = regex_replace( str, rx, "\\L\\u$&", format_perl );
    std::cout << str << std::endl;

This prints:

    FOO BAr BaZ

However, the equivalent Perl:

    $str = 'fOO bAr BaZ';
    $str =~ s/\w+/\L\u$&/g;
    print "$str\n";

prints this:

    Foo Bar Baz

It looks like in boost::regex, the \u is stomping the \L rather than merely overriding it for the next character.

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
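
P.S. To make the expected \L\u behaviour concrete, here is a minimal sketch of the case-escape semantics I have in mind. It is only an illustration: apply_case_escapes() is a hypothetical helper, not part of Boost.Regex or xpressive, it works on a plain string of literal text plus escapes instead of a real format string with "$&", and it uses a std::locale where a real implementation would go through the regex traits. The point is that \L and \U set a persistent mode, while \l and \u override it for exactly one character.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper (an assumption for illustration only).
// Interprets \l \u \L \U \E in `in` and returns the case-adjusted text.
std::string apply_case_escapes(const std::string& in,
                               const std::locale& loc = std::locale::classic())
{
    enum class Case { none, lower, upper };
    Case mode = Case::none;  // persistent: set by \L or \U, cleared by \E
    Case next = Case::none;  // one-shot: set by \l or \u, used for one char

    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char c = in[i];
        if (c == '\\' && i + 1 < in.size()) {
            bool handled = true;
            switch (in[i + 1]) {
            case 'L': mode = Case::lower; break;
            case 'U': mode = Case::upper; break;
            case 'E': mode = Case::none;  break;
            case 'l': next = Case::lower; break;
            case 'u': next = Case::upper; break;
            default:  handled = false;    break;
            }
            if (handled) { ++i; continue; }  // consume the escape
        }
        // The one-shot override wins for this character only; the
        // persistent mode is left untouched for the characters after it.
        const Case use = (next != Case::none) ? next : mode;
        next = Case::none;
        if (use == Case::lower)      c = std::tolower(c, loc);
        else if (use == Case::upper) c = std::toupper(c, loc);
        out += c;
    }
    // Escapes are dropped and characters map one-to-one.
    assert(out.size() <= in.size());
    return out;
}
```

With this sketch, the string "\L\ufOO" (written "\\L\\ufOO" in C++ source) comes out as "Foo": the \u applies to the 'f' alone, and \L stays in effect for the rest.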