Subject: [Boost-users] Boost-regex: Weird behaviour with non-greedy matching operator in regex_replace in boost 1.40?
From: Florian Schwarz (florian.schwarz_at_[hidden])
Date: 2009-09-23 04:11:55
Hi,
non-greedy matching seems to behave strangely (maybe even
non-deterministically?) when the pattern is preceded by something.
E.g. when I want to get all characters in a string except the last one,
if it's an 'o', I would use
regex matchExpr("(.*?)o?");
So if I write
string text("hallo");
regex matchExpr("(.*?)o?");
string valueExpr("$1");
string result;
regex_replace(back_inserter(result), text.begin(), text.end(),
matchExpr, valueExpr);
cout << "Match \"" << result << "\"" << endl;
it will print the expected "hall". If I now use instead
string text(" hallo");
regex matchExpr(" (.*?)o?");
it will print "hallo". And for
string text("hhallo");
regex matchExpr("h(.*?)o?");
string valueExpr("$1");
regex_replace(back_inserter(result), text.begin(), text.end(),
matchExpr, valueExpr);
cout << "Match \"" << result << "\"" << endl;
it will print "allo", which seems even stranger to me.
I'm using Boost 1.40 on Ubuntu with g++ 4.2.4.
To give you a complete example:
#include <boost/regex.hpp>
#include <iostream>
#include <iterator>  // back_inserter
#include <string>
using namespace std;
using namespace boost;

// Prints the replacement result, but only if the whole text matches.
void test(string& text, regex& matchExpr, string valueExpr){
    if (regex_match(text, matchExpr)){
        string result;
        regex_replace(back_inserter(result), text.begin(), text.end(),
                      matchExpr, valueExpr);
        cerr << "Match \"" << result << "\"" << endl;
    }
}

int main(){
    { // Test 1: no prefix
        string text("hallo");
        regex matchExpr("(.*?)o?");
        string valueExpr("$1");
        test(text, matchExpr, valueExpr);
    }
    { // Test 2: space prefix, unanchored
        string text(" hallo");
        regex matchExpr(" (.*?)o?");
        string valueExpr("$1");
        test(text, matchExpr, valueExpr);
    }
    { // Test 3: space prefix, anchored
        string text(" hallo");
        regex matchExpr("^ (.*?)o?$");
        string valueExpr("$1");
        test(text, matchExpr, valueExpr);
    }
    { // Test 4: 'h' prefix, unanchored
        string text("hhallo");
        regex matchExpr("h(.*?)o?");
        string valueExpr("$1");
        test(text, matchExpr, valueExpr);
    }
    { // Test 5: 'h' prefix, anchored
        string text("hhallo");
        regex matchExpr("^h(.*?)o?$");
        string valueExpr("$1");
        test(text, matchExpr, valueExpr);
    }
}
The program will print
Match "hall"
Match "hallo"
Match "hall"
Match "allo"
Match "hall"
I have the following questions:
- Why does test 1 match the expected "hall" while test 2 matches "hallo"?
- Why does test 1 match the whole string while test 4 matches only a
part of it?
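One way to narrow this down might be to compare what a single search finds with what a whole-string match captures. The following is only a minimal sketch under assumptions: it uses std::regex (C++11) instead of boost::regex so that it builds without Boost, and the two libraries may differ in details. The names MatchParts, first_search, full_match_group, replace_all and check_examples are hypothetical helpers, not part of either library.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper type: the pieces of one match attempt.
struct MatchParts {
    std::string whole;   // $0, the text the pattern consumed
    std::string group1;  // $1, the lazy capture group
    std::string suffix;  // text after the match, untouched by this match
};

// First match found anywhere in the text, without requiring the
// pattern to cover the whole string.
MatchParts first_search(const std::string& text, const std::string& pattern) {
    std::smatch m;
    MatchParts parts;
    if (std::regex_search(text, m, std::regex(pattern))) {
        parts.whole = m[0].str();
        parts.group1 = m[1].str();
        parts.suffix = m.suffix().str();
    }
    return parts;
}

// $1 when the pattern has to cover the whole text, as regex_match requires.
// Returns an empty string if the text does not match as a whole.
std::string full_match_group(const std::string& text, const std::string& pattern) {
    std::smatch m;
    if (std::regex_match(text, m, std::regex(pattern))) {
        return m[1].str();
    }
    return std::string();
}

// Search-and-replace over every match; unmatched text is copied through.
std::string replace_all(const std::string& text, const std::string& pattern,
                        const std::string& format) {
    return std::regex_replace(text, std::regex(pattern), format);
}

// Checks for the inputs of tests 2 and 4 (std::regex, not Boost).
void check_examples() {
    MatchParts p = first_search(" hallo", " (.*?)o?");
    assert(p.whole == " ");       // only the leading space is consumed
    assert(p.group1.empty());     // the lazy group stays empty
    assert(p.suffix == "hallo");  // the rest is left unmatched
    assert(full_match_group(" hallo", " (.*?)o?") == "hall");
    assert(replace_all(" hallo", " (.*?)o?", "$1") == "hallo");
    assert(replace_all("hhallo", "h(.*?)o?", "$1") == "allo");
}
```

In this sketch the search stops as soon as the lazy group can match nothing, so only the prefix is replaced and the unmatched remainder is copied to the output unchanged, whereas the whole-string match forces the group to grow to "hall". Whether Boost 1.40 follows exactly the same path is an assumption that would still need to be checked against the Boost.Regex documentation.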
Many thanks for your help and best regards
Florian Schwarz