Boost Users :
From: John Maddock (john_at_[hidden])
Date: 2007-08-02 04:19:55
Arne Babnik wrote:
> Hi all,
>
> I am trying to perform some simple (as I thought) text manipulation.
> The task was to prepend a text with some other, e.g. 1234 -> 01234. I
> tried to do this with the expression ".*" and format "0$&".
>
> I used regex_replace with the following parameters:
>
> std::basic_string<char> string("1234");
> std::basic_string<char> format("0$&");
> boost::regex exp(".*");
> std::basic_string<char> result;
>
> result = boost::regex_replace(string, exp, format);
>
>
> I was very surprised when I got the result "012340", i.e. the format
> string was applied twice. After some testing, I found out that
> regex_replace matches a second time at the end of the string, where
> the expression matched an empty string (therefore only the zero was
> appended at the end).
>
> When using the (undocumented) option match_not_initial_null, it
> basically works - but then empty input strings would not match either.
> This also applies to the expression ".+", which works correctly on the
> input but would not match empty strings either.
> The only solution I found so far was using "^.*" as expression.
>
> Is this a bug or a feature? Or what did I get wrong with this regular
> expression?
It's a feature, or at least a Perl-compatibility feature.
When a match is found, the library always looks for the next possible match
starting from the end of the previous match, even if the previous match ended
at the end of the string. So:
[[:digit:]]* against 1234 always finds two matches: "1234", and then the empty
string "" after the "4", irrespective of whether the "1234" occurs in the
middle of a text or at the end.
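
A minimal sketch of the behaviour described above, for anyone who wants to
reproduce it. It uses std::regex from the C++11 standard library as a stand-in
for boost::regex, on the assumption that both follow the same Perl-style
iteration rules for these inputs. The helper names all_matches and
prepend_zero are hypothetical and exist only for this illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every match found while iterating over
// `input`, including empty matches.
std::vector<std::string> all_matches(const std::string& input,
                                     const std::regex& re) {
    std::vector<std::string> out;
    for (std::sregex_iterator it(input.begin(), input.end(), re), end;
         it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}

// Hypothetical helper: prepend "0" to every match of `pattern` in `input`,
// using the same "0$&" format string as the original question.
std::string prepend_zero(const std::string& input,
                         const std::string& pattern) {
    return std::regex_replace(input, std::regex(pattern), "0$&");
}
```

With these helpers, all_matches("1234", std::regex("[[:digit:]]*")) yields two
entries, "1234" followed by the empty string, and prepend_zero("1234", ".*")
returns "012340", because the format is applied once per match. Using ".+"
returns "01234" but leaves an empty input untouched, while ".*" turns an empty
input into "0".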
John.