Boost Users :
From: emarkp (mping_at_[hidden])
Date: 2002-08-15 17:30:44
I sent this originally to John Maddock, but realized this is
probably a better place to post it.
At work we've been testing out regex (nice work BTW) in some of our
code, and appear to have found a bug. We ran into it parsing HTML,
and I've written a test C++ app to reproduce it.
In the program below, both searches should capture the same name and
value as far as I can tell, but they don't: the first pattern returns an
empty string for the name. I don't know if it's some interaction with
the quote character or something like that. We tried other forms of the
optional quote (besides '?', we tried '*', '{0,1}', and '["]?') to no
avail. I'm confident this is not user error. The extra capturing group
(in "goodPatternStr") is annoying, but is an acceptable workaround. The
strange thing is that a non-capturing group doesn't fix it.
Ideas?
--Mark Ping
--------------------------------------------------------------------------------------
output:
input: <input type="hidden" name="MfcISAPICommand" value="SellYourItem"
badOut :<input type="hidden" name="MfcISAPICommand" value="SellYourItem
badOutName :
badOutValue:SellYourItem
goodOut :<input type="hidden" name="MfcISAPICommand" value="SellYourItem
goodOutName :MfcISAPICommand
goodOutValue:SellYourItem
Press any key to continue
--------------------------------------------------------------------------------------
#include <iostream>
#include <string>
#include <boost/regex.hpp>

using namespace boost;
using namespace std;

int main()
{
    const char* badPatternStr = "<input[^>]*name=\"?([^> \"]*)[^>]*value=\"?([^> \"]*)";
    const char* goodPatternStr = "<input[^>]*name=(\"?)([^> \"]*)[^>]*value=\"?([^> \"]*)";
    //                                            ^^^^^
    // Note that the only difference between bad and good is that "good"
    // has grouping around the optional " after 'name='.
    //
    // In both versions the value group is captured correctly.
    // Only the name group comes back empty, and only in the "bad" version.
    boost::match_results<std::string::const_iterator> what;
    std::string in = "<input type=\"hidden\" name=\"MfcISAPICommand\" value=\"SellYourItem\"";
    std::string::const_iterator start, end;
    start = in.begin();
    end = in.end();
    unsigned int flags = boost::match_default;

    std::string badOut, badOutName, badOutValue;
    if(regex_search(start, end, what, regex(badPatternStr), flags))
    {
        badOut = what[0];
        // "bad" pattern: group 1 is the name, group 2 is the value
        badOutName = what[1];
        badOutValue = what[2];
    }

    std::string goodOut, goodOutName, goodOutValue;
    if(regex_search(start, end, what, regex(goodPatternStr), flags))
    {
        goodOut = what[0];
        // "good" pattern: group 1 is the optional quote, so the name and
        // value shift to groups 2 and 3
        goodOutName = what[2];
        goodOutValue = what[3];
    }

    cout << "input: ";
    cout << in << endl << endl;
    cout << "badOut :" << badOut << endl;
    cout << "badOutName :" << badOutName << endl;
    cout << "badOutValue:" << badOutValue << endl;
    cout << endl;
    cout << "goodOut :" << goodOut << endl;
    cout << "goodOutName :" << goodOutName << endl;
    cout << "goodOutValue:" << goodOutValue << endl;
    return 0;
}