Subject: Re: [Boost-users] Boost-regex: Weird behaviour with non-greedy matching operator in regex_replace in boost 1.40?
From: John Maddock (john_at_[hidden])
Date: 2009-09-24 05:14:35


> Somehow I just don't get it.
> When I match "hallo" with "(.*?)o?" and "xhallo" with "x(.*?)o?", I
> expect that $1 will in both cases be the same. But this is not the case.
> In the former the result is "hall" while in the latter it's "hallo", which
> seems weird to me...

No, that's not what's happening. Remember that the .*? part is non-greedy
and will match as few characters as possible (zero if possible) while still
producing an overall match. The program below enumerates all the matches in
the string, which is basically what regex_replace does internally, except
that here you get to see each individual match. The output is as follows:

Enumerating all the matches of "(.*?)o?" in the text "Hallo"
$0 = "" $1 = "" Position = 0
$0 = "H" $1 = "H" Position = 0
$0 = "" $1 = "" Position = 1
$0 = "a" $1 = "a" Position = 1
$0 = "" $1 = "" Position = 2
$0 = "l" $1 = "l" Position = 2
$0 = "" $1 = "" Position = 3
$0 = "lo" $1 = "l" Position = 3
$0 = "" $1 = "" Position = 5

Enumerating all the matches of "x(.*?)o?" in the text "xHallo"
$0 = "x" $1 = "" Position = 0

So in this latter case only one match is found, and in the case of
regex_replace the unmatched part (all of "Hallo") is output unchanged.
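
To see that effect in the replaced string itself, here is a minimal sketch.
It uses std::regex from the C++11 standard library instead of Boost.Regex,
purely as an assumption so that it builds without Boost; for this pattern and
input the default ECMAScript grammar should behave the same way. The helper
name bracket_matches is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replace every match of `pattern` in `text` with
// "[$1]", so the contents of the first capture group show up in the result.
// Text that is not part of any match is copied through unchanged.
std::string bracket_matches(const std::string& pattern, const std::string& text)
{
   const std::regex re(pattern);
   return std::regex_replace(text, re, "[$1]");
}
```

Calling bracket_matches("x(.*?)o?", "xHallo") gives "[]Hallo": the single
match "x" is replaced by "[]" because $1 is empty, and "Hallo" passes through
untouched.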

Here's the example program:

#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main ()
{
   // To reproduce the first listing above, use "Hallo" and "(.*?)o?" instead.
   std::string input = "xHallo";
   boost::regex test ( "x(.*?)o?" );
   boost::sregex_iterator it ( input.begin(), input.end(), test );
   boost::sregex_iterator none;   // default-constructed: end of sequence

   std::cout << "Enumerating all the matches of \"" << test.str()
             << "\" in the text \"" << input << "\"" << std::endl;

   while ( it != none )
   {
      // $0 is the whole match, $1 is the first capture group.
      std::cout << "$0 = \"" << it->str(0) << "\" $1 = \"" << it->str(1)
                << "\" Position = " << it->position() << std::endl;
      ++it;
   }
   return 0;
}
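
For comparison, the sketch below shows how $1 changes once the rest of the
pattern forces the match to extend further. It is an illustration only: it
uses std::regex (C++11) rather than Boost.Regex so it builds without Boost,
the helper name first_capture is hypothetical, and the anchored and greedy
patterns in the note underneath are my own variations, not something from the
original question.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: search `text` for the first match of `pattern`
// and return the contents of capture group 1.
// Returns an empty string if there is no match at all.
std::string first_capture(const std::string& pattern, const std::string& text)
{
   const std::regex re(pattern);
   std::smatch m;
   if (!std::regex_search(text, m, re))
      return std::string();
   return m[1].str();
}
```

With the text "xHallo", first_capture("x(.*?)o?", ...) returns "" because
nothing after the "x" is needed for an overall match. The greedy
"x(.*)o?" returns "Hallo", and the anchored "x(.*?)o?$" returns "Hall",
since the trailing $ forces the match to reach the end of the text and the
non-greedy group stops as soon as o? can take the final "o".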

HTH, John.

