From: Eric Niebler (eric_at_[hidden])
Date: 2007-03-18 19:05:44
I have a question and a bug report regarding the format_perl flag. First
the question ...
I see that, when you specify format_perl, match_results::format()
recognizes the escape sequences \l \L \u and \U, which do uppercasing or
lowercasing. These are necessarily locale-dependent character
transformations, but match_results does not have a Traits parameter. How
should the transformations be done?
I note that the basic_regex<> class template has a traits parameter, and
that match_results<>::format() can only be called after a successful
regex match. One reasonable approach is that match_results<> holds a
(shared) pointer to the regex object's traits. It would have to be a
polymorphic base pointer, since match_results can't know the exact type
of the traits object at the time format() is called.
That doesn't exactly work because the RegexTraits concept doesn't have
toupper() and tolower() functions. I suggest adding them.
This isn't only a problem for format_perl, strictly speaking.
match_results::format() also needs to know how to turn characters into
integers (e.g. to parse format strings like "$1"). That is the reason for
RegexTraits::value()'s existence, so match_results<>::format() should
use it.
(Incidentally, I just implemented all this in xpressive, so I can
confirm that this strategy works. It incurs a virtual call for each
tolower(), toupper(), and value(), but there doesn't seem to be any
other way without changing the interface in a non-TR1 compatible way.)
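To make the shape of that strategy concrete, here is a minimal sketch of the type erasure involved. It is not the xpressive or Boost.Regex code: traits_base and traits_holder are hypothetical names, std::regex_traits stands in for the regex's traits type, and because that traits type has no toupper()/tolower() members, the sketch assumes the case conversions go through the std::ctype facet of the traits' locale.

```cpp
#include <locale>
#include <memory>
#include <regex>

// Hypothetical polymorphic interface that a match_results-like class could
// hold by shared pointer. Each call below is the virtual call mentioned above.
template <typename Char>
struct traits_base {
    virtual ~traits_base() = default;
    virtual Char tolower(Char ch) const = 0;
    virtual Char toupper(Char ch) const = 0;
    virtual int value(Char ch, int radix) const = 0;
};

// Hypothetical wrapper that erases the concrete traits type.
template <typename Traits>
struct traits_holder final : traits_base<typename Traits::char_type> {
    using char_type = typename Traits::char_type;

    explicit traits_holder(Traits const& tr) : traits_(tr) {}

    // Assumption: case conversion is delegated to the locale's ctype facet,
    // since the traits class itself offers no toupper()/tolower().
    char_type tolower(char_type ch) const override {
        return std::use_facet<std::ctype<char_type>>(traits_.getloc()).tolower(ch);
    }
    char_type toupper(char_type ch) const override {
        return std::use_facet<std::ctype<char_type>>(traits_.getloc()).toupper(ch);
    }
    // Digit value of ch in the given radix, or -1 if ch is not a digit.
    int value(char_type ch, int radix) const override {
        return traits_.value(ch, radix);
    }

private:
    Traits traits_;
};

// What the regex would hand to its match results after a successful match.
template <typename Traits>
std::shared_ptr<traits_base<typename Traits::char_type> const>
make_erased_traits(Traits const& tr) {
    return std::make_shared<traits_holder<Traits>>(tr);
}
```

A format() implementation would then call, for example, erased->value(ch, 10) while parsing "$1" and erased->toupper(ch) while handling \u, without knowing the concrete traits type.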
Finally, a bug report. Consider the following code:

    std::string str ("fOO bAr BaZ");
    regex rx ("\\w+");
    str = regex_replace( str, rx, "\\L\\u$&", format_perl );
    std::cout << str << std::endl;

This prints:

    FOO BAr BaZ

However, the equivalent perl:

    $str = 'fOO bAr BaZ';
    $str =~ s/\w+/\L\u$&/g;

produces:

    Foo Bar Baz

Looks like in boost::regex, the \u is stomping the \L rather than merely
overriding it for the next character.
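
The Perl behavior described above can be sketched as two independent pieces of state: a persistent mode set by \L or \U, and a one-shot mode set by \l or \u that is consumed by the next character and then dropped. The following is only an illustration of that rule, not the boost::regex implementation. The function apply_case_escapes is a hypothetical helper, it assumes the matched text has already been substituted into the string, and it uses the C locale functions from <cctype> rather than regex traits.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: interprets \L \U \E \l \u in a string whose
// sub-match references have already been expanded.
std::string apply_case_escapes(const std::string& in) {
    enum class mode { none, lower, upper };
    mode persistent = mode::none;  // set by \L or \U, cleared by \E
    mode one_shot = mode::none;    // set by \l or \u, used for one character

    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char c = in[i];
        if (c == '\\' && i + 1 < in.size()) {
            bool handled = true;
            switch (in[i + 1]) {
                case 'L': persistent = mode::lower; break;
                case 'U': persistent = mode::upper; break;
                case 'E': persistent = mode::none;  break;
                case 'l': one_shot = mode::lower;   break;
                case 'u': one_shot = mode::upper;   break;
                default:  handled = false;          break;
            }
            if (handled) {
                ++i;  // skip the escape letter
                continue;
            }
        }
        // The one-shot mode wins for this character only; the persistent
        // mode is left untouched and applies again to the next character.
        mode m = (one_shot != mode::none) ? one_shot : persistent;
        one_shot = mode::none;

        unsigned char uc = static_cast<unsigned char>(c);
        if (m == mode::lower) {
            out += static_cast<char>(std::tolower(uc));
        } else if (m == mode::upper) {
            out += static_cast<char>(std::toupper(uc));
        } else {
            out += c;
        }
    }
    return out;
}
```

With this rule, apply_case_escapes("\\L\\ufOO") gives "Foo": the \u uppercases only the first character, and \L still lowercases the rest.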
-- Eric Niebler Boost Consulting www.boost-consulting.com