Boost Users :

Subject: [Boost-users] Boost-regex: Weird behaviour with non-greedy matching operator in regex_replace in boost 1.40?
From: Florian Schwarz (florian.schwarz_at_[hidden])
Date: 2009-09-23 04:11:55


Hi,

non-greedy matching seems to behave strangely (maybe even
non-deterministically?) when the pattern is preceded by something.
E.g. when I want to get all characters of a string except the last one,
if it is an 'o', I would use
      regex matchExpr("(.*?)o?");
So if I write
      string text("hallo");
      regex matchExpr("(.*?)o?");
      string valueExpr("$1");
      string result;
      regex_replace(back_inserter(result), text.begin(), text.end(),
                    matchExpr, valueExpr);
      cout << "Match \"" << result << "\"" << endl;
it prints the expected "hall". If I instead use
      string text(" hallo");
      regex matchExpr(" (.*?)o?");
it will print "hallo". And for
      string text("hhallo");
      regex matchExpr("h(.*?)o?");
      string valueExpr("$1");
      regex_replace(back_inserter(result), text.begin(), text.end(),
                    matchExpr, valueExpr);
      cout << "Match \"" << result << "\"" << endl;
it will print "allo", which seems even stranger to me.
I'm using Boost 1.40 on Ubuntu with g++ 4.2.4.

To give you a complete example:

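#include <boost/regex.hpp>
#include <iostream>
#include <iterator>  // back_inserter
#include <string>

using namespace std;
using namespace boost;

// If matchExpr matches the whole text, print the text with every
// match of matchExpr replaced by valueExpr.
void test(const string& text, const regex& matchExpr, const string& valueExpr){
   if (regex_match(text, matchExpr)){
      string result;
      regex_replace(back_inserter(result), text.begin(), text.end(),
                    matchExpr, valueExpr);
      cerr << "Match \"" << result << "\"" << endl;
   }
}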

int main(){
   { // Test 1
      string text("hallo");
      regex matchExpr("(.*?)o?");
      string valueExpr("$1");
      test(text, matchExpr, valueExpr);
   }
   { // Test 2
      string text(" hallo");
      regex matchExpr(" (.*?)o?");
      string valueExpr("$1");
      test(text, matchExpr, valueExpr);
   }
   { // Test 3
      string text(" hallo");
      regex matchExpr("^ (.*?)o?$");
      string valueExpr("$1");
      test(text, matchExpr, valueExpr);
   }
   { // Test 4
      string text("hhallo");
      regex matchExpr("h(.*?)o?");
      string valueExpr("$1");
      test(text, matchExpr, valueExpr);
   }
   { // Test 5
      string text("hhallo");
      regex matchExpr("^h(.*?)o?$");
      string valueExpr("$1");
      test(text, matchExpr, valueExpr);
   }
}

The program will print
Match "hall"
Match "hallo"
Match "hall"
Match "allo"
Match "hall"
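
To see what regex_replace actually substitutes, it may help to look at the first match that a plain search finds for each unanchored pattern. The following is only a sketch under assumptions: it uses std::regex from C++11 as a stand-in for Boost.Regex so that it builds without Boost, and Boost 1.40 may not behave identically. The helper name first_match is made up for illustration.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper (not part of Boost or the standard library):
// returns {whole match, capture group 1} of the first match found by
// regex_search, or two empty strings if the pattern does not match.
// Assumption: std::regex (ECMAScript grammar) stands in for boost::regex.
std::pair<std::string, std::string> first_match(const std::string& text,
                                                const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(text, m, re)) {
        return std::make_pair(m[0].str(), m[1].str());
    }
    return std::make_pair(std::string(), std::string());
}
```

With std::regex, first_match(" hallo", " (.*?)o?") gives a whole match of " " and an empty group 1, and first_match("hhallo", "h(.*?)o?") gives a whole match of "h" with an empty group 1. Whether Boost.Regex reports the same first matches would have to be checked against Boost itself.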

I have the following questions:
- Why does test 1 match the expected "hall" while test 2 matches "hallo"?
- Why does test 1 match the whole string while test 4 matches only a
part of it?
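
For comparison, the goal stated at the top (everything except a trailing 'o') can also be written as a whole-string match that reads capture group 1 directly, without going through regex_replace. This is a minimal sketch, again assuming std::regex from C++11 as a stand-in for Boost.Regex. The helper name whole_match_group1 is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns capture group 1 if the pattern matches the
// entire text (regex_match semantics), otherwise an empty string.
// Assumption: std::regex (ECMAScript grammar) stands in for boost::regex.
std::string whole_match_group1(const std::string& text,
                               const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_match(text, m, re)) {
        return m[1].str();
    }
    return std::string();
}
```

Because regex_match has to consume the whole string, the non-greedy group is forced to extend up to the optional trailing 'o'. With std::regex, all three unanchored patterns from tests 1, 2 and 4 yield "hall" for their respective inputs.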

Many thanks for your help and best regards
Florian Schwarz

