Boost Users :

From: emarkp (mping_at_[hidden])
Date: 2002-08-15 17:30:44


I originally sent this to John Maddock, but realized this list is
probably a better place to post it.

At work we've been testing out regex (nice work BTW) in some of our
code, and appear to have found a bug. We ran into it parsing HTML,
and I've written a test C++ app to reproduce it.

In the program below, both searches should produce the same output as
far as I can tell, but they don't. I don't know if it's some
interaction with the quote character or something like that. In place
of '?' we also tried other quantifiers ('*', '{0,1}') and the bracketed
form ["]?, all to no avail. I'm confident this is not user error. The
extra capturing group (in "goodPatternStr") is annoying, but is an
acceptable workaround. The strange thing is that a non-capturing group
doesn't fix it.
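
As a cross-check, here is a minimal sketch of the three pattern variants described above (bare optional quote, capturing group, non-capturing group). It assumes std::regex from C++11 with its default ECMAScript grammar instead of Boost.Regex, so it only illustrates what each pattern is expected to capture and does not reproduce the reported behavior. The helper extract_name_value and the three pattern names are hypothetical and not part of the test program.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper (not from the original program): runs one pattern
// against the input and returns the captures found at the given group
// indices. Returns two empty strings when the pattern does not match.
// Assumption: std::regex with the default ECMAScript grammar, not Boost.Regex.
std::pair<std::string, std::string> extract_name_value(
    const std::string& pattern, const std::string& input,
    std::size_t name_group, std::size_t value_group)
{
    std::smatch what;
    if (!std::regex_search(input, what, std::regex(pattern)))
        return {"", ""};
    return {what[name_group].str(), what[value_group].str()};
}

// Bare optional quote after name= (the "bad" pattern): name is group 1.
const std::string bare_pattern =
    R"re(<input[^>]*name="?([^> "]*)[^>]*value="?([^> "]*))re";

// Optional quote wrapped in a capturing group (the "good" pattern):
// the extra group shifts name to group 2 and value to group 3.
const std::string capturing_pattern =
    R"re(<input[^>]*name=("?)([^> "]*)[^>]*value="?([^> "]*))re";

// Optional quote wrapped in a non-capturing group: group numbering is the
// same as in the bare pattern.
const std::string noncapturing_pattern =
    R"re(<input[^>]*name=(?:"?)([^> "]*)[^>]*value="?([^> "]*))re";
```

Calling extract_name_value(bare_pattern, in, 1, 2), extract_name_value(capturing_pattern, in, 2, 3) and extract_name_value(noncapturing_pattern, in, 1, 2) on the input string from the program should give the same name/value pair for all three variants, which is the behavior the report expects.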

Ideas?

--Mark Ping

--------------------------------------------------------------------------------------
output:
input: <input type="hidden" name="MfcISAPICommand" value="SellYourItem"

badOut :<input type="hidden" name="MfcISAPICommand" value="SellYourItem
badOutName :
badOutValue:SellYourItem

goodOut :<input type="hidden" name="MfcISAPICommand" value="SellYourItem
goodOutName :MfcISAPICommand
goodOutValue:SellYourItem
Press any key to continue
--------------------------------------------------------------------------------------
#include <iostream>
#include <string>
#include <boost/regex.hpp>

using namespace boost;
using namespace std;

int main()
{
 const char* badPatternStr = "<input[^>]*name=\"?([^> \"]*)[^>]*value=\"?([^> \"]*)";
 const char* goodPatternStr = "<input[^>]*name=(\"?)([^> \"]*)[^>]*value=\"?([^> \"]*)";
 //                                            ^^^^^
 //Note that the only difference between bad and good is that "good"
 //has a capturing group around the optional " after 'name='.
 //
 //In both versions the value group is matched.
 //Only the name group comes back empty, and only with the bad pattern.

 boost::match_results<std::string::const_iterator> what;

 std::string in = "<input type=\"hidden\" name=\"MfcISAPICommand\" value=\"SellYourItem\"";
 std::string::const_iterator start, end;
 start = in.begin();
 end = in.end();

 boost::match_flag_type flags = boost::match_default;

 std::string badOut, badOutName, badOutValue;
 if(boost::regex_search(start, end, what, boost::regex(badPatternStr), flags))
 {
   badOut = what[0];

   int numGroups = what.size();

   // name
   badOutName = what[1];
   badOutValue = what[2];
 }

 std::string goodOut, goodOutName, goodOutValue;
 if(boost::regex_search(start, end, what, boost::regex(goodPatternStr), flags))
 {
   goodOut = what[0];

   int numGroups = what.size();

   // name
   goodOutName = what[2];
   goodOutValue = what[3];
 }

 cout << "input: ";
 cout << in << endl << endl;

 cout << "badOut :" << badOut << endl;
 cout << "badOutName :" << badOutName << endl;
 cout << "badOutValue:" << badOutValue << endl;

 cout << endl;

 cout << "goodOut :" << goodOut << endl;
 cout << "goodOutName :" << goodOutName << endl;
 cout << "goodOutValue:" << goodOutValue << endl;

 return 0;
}