
I originally sent this to James Maddock, but realized this is probably a better place to post it.

At work we've been trying out regex (nice work, BTW) in some of our code, and we appear to have found a bug. We ran into it while parsing HTML, and I've written a small C++ test program to reproduce it. As far as I can tell, the two searches in the program below should extract the same name and value. They don't: the bad pattern returns an empty name, even though its overall match is identical to the good pattern's. I don't know whether it's some interaction with the quote character or something like that. Besides '?', we tried other quantifiers on the quote ('*', '{0,1}', and ["]?), to no avail. I'm confident this is not user error.

Wrapping the optional quote in a capturing group (as in goodPatternStr) is annoying, but it is an acceptable workaround. The strange thing is that a non-capturing group doesn't fix it.

Ideas?

--Mark Ping

--------------------------------------------------------------------------------------
output:

input: <input type="hidden" name="MfcISAPICommand" value="SellYourItem"

badOut     :<input type="hidden" name="MfcISAPICommand" value="SellYourItem
badOutName :
badOutValue:SellYourItem

goodOut     :<input type="hidden" name="MfcISAPICommand" value="SellYourItem
goodOutName :MfcISAPICommand
goodOutValue:SellYourItem

--------------------------------------------------------------------------------------
#include <iostream>
#include <string>
#include <boost/regex.hpp>

using namespace std;

int main()
{
    const char* badPatternStr  = "<input[^>]*name=\"?([^> \"]*)[^>]*value=\"?([^> \"]*)";
    const char* goodPatternStr = "<input[^>]*name=(\"?)([^> \"]*)[^>]*value=\"?([^> \"]*)";
    //                                            ^^^^^
    // Note that the only difference between bad and good is that "good"
    // has grouping around the optional " after 'name='.
    //
    // In both versions, the value group is matched correctly.
    // Only the name group has this problem.

    std::string in = "<input type=\"hidden\" name=\"MfcISAPICommand\" value=\"SellYourItem\"";
    std::string::const_iterator start = in.begin();
    std::string::const_iterator end = in.end();
    boost::match_results<std::string::const_iterator> what;
    boost::match_flag_type flags = boost::match_default;

    std::string badOut, badOutName, badOutValue;
    if (boost::regex_search(start, end, what, boost::regex(badPatternStr), flags))
    {
        badOut      = what[0].str();
        badOutName  = what[1].str();   // name
        badOutValue = what[2].str();   // value
    }

    std::string goodOut, goodOutName, goodOutValue;
    if (boost::regex_search(start, end, what, boost::regex(goodPatternStr), flags))
    {
        goodOut      = what[0].str();
        goodOutName  = what[2].str();  // name (what[1] is the extra quote group)
        goodOutValue = what[3].str();  // value
    }

    cout << "input: " << in << endl << endl;

    cout << "badOut     :" << badOut << endl;
    cout << "badOutName :" << badOutName << endl;
    cout << "badOutValue:" << badOutValue << endl;
    cout << endl;

    cout << "goodOut     :" << goodOut << endl;
    cout << "goodOutName :" << goodOutName << endl;
    cout << "goodOutValue:" << goodOutValue << endl;
    return 0;
}