From: John Maddock (john_at_[hidden])
Date: 2007-03-19 06:28:45
Eric Niebler wrote:
> I have a question and a bug report regarding the format_perl flag.
> First the question ...
>
> I see that, when you specify format_perl, match_results::format()
> recognizes the escape sequences \l \L \u and \U, which do uppercasing
> or lowercasing. These are necessarily locale-dependent character
> transformations, but match_results does not have a Traits parameter.
> How should the transformations be done?
>
> I note that the basic_regex<> class template has a traits parameter,
> and that match_results<>::format() can only be called after a
> successful regex match. One reasonable approach is that
> match_results<> holds a (shared) pointer to the regex object's
> traits. It would have to be a polymorphic base pointer, since
> match_results can't know the exact type of the traits object at the
> time format() is called.
>
> That doesn't exactly work because the RegexTraits concept doesn't have
> toupper() and tolower() functions. I suggest adding them.
Right, but format_perl isn't part of TR1, so this is all in the realm of
vendor-specific extensions. I added some *optional* extra members to the
traits class to deal with this: the code detects at compile time whether the
members are there, and uses them if they are, otherwise uses some sensible
defaults.
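
For readers unfamiliar with the technique, here is a minimal sketch of compile-time detection of an optional traits member. It is not the Boost.Regex implementation: the names has_toupper, to_upper, traits_with_case and minimal_traits are hypothetical, and falling back to the global std::locale is only an assumption about what a "sensible default" might be.

```cpp
#include <cassert>
#include <locale>
#include <type_traits>
#include <utility>

// Hypothetical traits class that provides the optional toupper() member.
struct traits_with_case {
    char toupper(char c) const {
        return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
    }
};

// Hypothetical traits class without any case-conversion members.
struct minimal_traits {};

// Chosen by overload resolution only if t.toupper(ch) is a valid expression.
template <class T, class Ch>
auto detect_toupper(int)
    -> decltype(std::declval<const T&>().toupper(std::declval<Ch>()),
                std::true_type());

// Fallback overload, chosen when the expression above is ill-formed.
template <class T, class Ch>
std::false_type detect_toupper(...);

// has_toupper<T, Ch>::value is true when T has a usable toupper(Ch).
template <class T, class Ch>
struct has_toupper : decltype(detect_toupper<T, Ch>(0)) {};

// Use the traits member when it exists.
template <class Traits, class Ch>
Ch do_toupper(const Traits& t, Ch c, std::true_type) {
    return t.toupper(c);
}

// Otherwise use the global locale (an assumed default for this sketch).
template <class Traits, class Ch>
Ch do_toupper(const Traits&, Ch c, std::false_type) {
    return std::toupper(c, std::locale());
}

// Entry point: dispatches at compile time on the detection result.
template <class Traits, class Ch>
Ch to_upper(const Traits& t, Ch c) {
    return do_toupper(t, c, typename has_toupper<Traits, Ch>::type());
}
```

Both to_upper(traits_with_case(), 'a') and to_upper(minimal_traits(), 'a') compile; only the first calls the traits member.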
> This isn't only a problem for format_perl, strictly speaking.
> match_results::format() also needs to know how to turn characters into
> integers (eg. to parse format strings like "$1"). That is the reason
> for RegexTraits::value()'s existence, so match_results<>::format()
> should use it.
>
> (Incidentally, I just implemented all this in xpressive, so I can
> confirm that this strategy works. It incurs a virtual call for each
> tolower(), toupper(), and value(), but there doesn't seem to be any
> other way without changing the interface in a non-TR1 compatible way.)
Yep, for regex_replace you can pass the regex object through to the code
that does the formatting, but match_results::format has no such object. I
use the default locale in this case, but your approach is probably better.
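
To make the two options concrete, here is a rough sketch of a results object that holds a shared, type-erased pointer to the regex traits and falls back to the global locale when no traits are attached. This is neither xpressive nor Boost.Regex code; traits_base, traits_holder, ascii_traits and toy_results are hypothetical names invented for the illustration.

```cpp
#include <cassert>
#include <locale>
#include <memory>
#include <string>

// Hypothetical polymorphic interface: the operations format() needs.
template <class Ch>
struct traits_base {
    virtual ~traits_base() {}
    virtual Ch toupper(Ch c) const = 0;
    virtual Ch tolower(Ch c) const = 0;
    virtual int value(Ch c, int radix) const = 0;
};

// Wraps a concrete traits object behind the interface (type erasure).
template <class Ch, class Traits>
struct traits_holder : traits_base<Ch> {
    explicit traits_holder(const Traits& t) : traits_(t) {}
    Ch toupper(Ch c) const override { return traits_.toupper(c); }
    Ch tolower(Ch c) const override { return traits_.tolower(c); }
    int value(Ch c, int radix) const override { return traits_.value(c, radix); }
    Traits traits_;
};

// Hypothetical ASCII-only traits, standing in for a real regex traits class.
struct ascii_traits {
    char toupper(char c) const {
        return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
    }
    char tolower(char c) const {
        return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
    }
    int value(char c, int radix) const {
        int v = (c >= '0' && c <= '9') ? c - '0' : -1;
        return v < radix ? v : -1;
    }
};

// Toy stand-in for match_results: it does not know the traits type,
// only the interface. Each conversion costs one virtual call.
struct toy_results {
    std::shared_ptr<const traits_base<char> > traits;  // may be empty
    std::string matched;

    std::string upper() const {
        std::string out;
        for (char c : matched) {
            // With no traits attached, fall back to the global locale.
            out.push_back(traits ? traits->toupper(c)
                                 : std::toupper(c, std::locale()));
        }
        return out;
    }
};
```

In this sketch the matcher would set toy_results::traits after a successful match, for example with std::make_shared<traits_holder<char, ascii_traits> >(ascii_traits()).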
> Finally, a bug report. Consider the following code:
>
> std::string str ("fOO bAr BaZ");
> regex rx ("\\w+");
>
> str = regex_replace( str, rx, "\\L\\u$&", format_perl );
> std::cout << str << std::endl;
>
> This prints:
>
> FOO BAr BaZ
>
> However, the equivalent perl:
>
> $str= 'fOO bAr BaZ';
> $str =~ s/\w+/\L\u$&/g;
> print "$str\n";
>
> Prints this:
>
> Foo Bar Baz
>
> Looks like in boost::regex, the \u is stomping the \L rather than
> merely overriding it for the next character.
Yep, fixed in cvs, thanks for the report.
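
For reference, the Perl behaviour described in the report can be sketched with two independent pieces of state: a sticky mode set by \L or \U, and a one-shot modifier set by \l or \u that overrides the sticky mode for the next character only. This is an illustration, not the actual fix in cvs; format_with_case is a hypothetical helper that handles only $& and the case escapes.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper, not Boost.Regex code: expands a format string that
// may contain $& and the escapes \l \u \L \U \E, given the matched text.
std::string format_with_case(const std::string& fmt, const std::string& match) {
    enum case_mode { none, lower, upper };
    case_mode sticky = none;    // set by \L or \U, cleared by \E
    case_mode one_shot = none;  // set by \l or \u, used by the next character
    const std::locale loc;      // copy of the global locale
    std::string out;

    // Emit one character. A pending one-shot modifier wins over the sticky
    // mode for this character only, then it is cleared.
    auto emit = [&](char c) {
        case_mode m = (one_shot != none) ? one_shot : sticky;
        one_shot = none;
        if (m == lower) c = std::tolower(c, loc);
        else if (m == upper) c = std::toupper(c, loc);
        out.push_back(c);
    };

    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] == '\\' && i + 1 < fmt.size()) {
            switch (fmt[++i]) {
                case 'L': sticky = lower; break;
                case 'U': sticky = upper; break;
                case 'E': sticky = none; break;
                case 'l': one_shot = lower; break;
                case 'u': one_shot = upper; break;
                default:  emit(fmt[i]); break;  // other escapes: literal
            }
        } else if (fmt[i] == '$' && i + 1 < fmt.size() && fmt[i + 1] == '&') {
            ++i;  // skip the '&'
            for (char c : match) emit(c);  // whole match, with case applied
        } else {
            emit(fmt[i]);
        }
    }
    return out;
}
```

Calling format_with_case("\\L\\u$&", word) on each of the words fOO, bAr and BaZ yields Foo, Bar and Baz, matching the Perl output quoted above.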
John.