[Boost-bugs] [Boost C++ Libraries] #8392: Too complex regex for boost::regex (perl, extended) isn't so complex

Subject: [Boost-bugs] [Boost C++ Libraries] #8392: Too complex regex for boost::regex (perl, extended) isn't so complex
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2013-04-04 13:41:16


#8392: Too complex regex for boost::regex (perl, extended) isn't so complex
------------------------------+---------------------------------------------
 Reporter: bkalinczuk@… | Type: Bugs
   Status: new | Milestone: To Be Determined
Component: None | Version: Boost 1.52.0
 Severity: Problem | Keywords:
------------------------------+---------------------------------------------
 Here I have regular expression:
 "[a-e]+[b-f]+[ac-f]+[abd-f]+[a-cef]+[a-df]+"

 This expression raises exception: "The complexity of matching the regular
 expression exceeded predefined bounds. Try refactoring the regular
 expression to make each choice made by the state machine unambiguous.
 This exception is thrown to prevent "eternal" matches that take an
 indefinite period time to locate."

 Obviously perl regex and grep has no problem with this.
 According to my calculations this regex should use up to 2^7 states, which
 is barely 128.
 However boost implementation states, that it exceeds maximum of ptrdiff.

 Here is code in c++ (compiled with gcc-4.7.2-11 and boost-1.53.0 and no
 additional flags (except -I-L-l)):
 #define BOOST_REGEX_MAX_STATE_COUNT
 std::ptrdiff_t(std::numeric_limits<std::ptrdiff_t>::max)

 #include <iostream>
 #include <boost/regex.hpp>
 #include <string>

 int main()
 {
     boost::regex exp("[a-e]+[b-f]+[ac-f]+[abd-f]+[a-cef]+[a-df]+",
 boost::regex::perl);
     std::string text1 =
 "bbbbbbbccccccccccccccccccccccccbbbbbbbbbbbbbbbbcccccccccccccccccccccbbbbbbbbaaaaa";
     std::string text2 =
 "bbbbbbbccccccccccccccccccccccccbbbbbbbbbbbbbbbbcccccccccccccccccccccbbbbbbbbaaaaae";
     std::cout << BOOST_REGEX_MAX_STATE_COUNT << std::endl;
     std::cout << boost::regex_match(text1, exp) << std::endl;
     std::cout << boost::regex_match(text2, exp) << std::endl;
 }

 Here analog grep match:
 echo
 bbbbbbccccccccccccccccccccccccbbbbbbbbbbbbbbbbcccccccccccccccccccccbbbbbbbbaaaaae
 | grep "[a-e]\+[b-f]\+[ac-f]\+[abd-f]\+[a-cef]\+[a-df]\+"

 And analog perl code:
 if
 ("bbbbbbbccccccccccccccccccccccccbbbbbbbbbbbbbbbbcccccccccccccccccccccbbbbbbbbaaaaa"
 =~ m/^[a-e]+[b-f]+[ac-f]+[abd-f]+[a-cef]+[a-df]+$/)
 {
     print "MATCHED\n";
 }
 else
 {
     print "NOT MATCHED\n";
 }
 if
 ("bbbbbbbccccccccccccccccccccccccbbbbbbbbbbbbbbbbcccccccccccccccccccccbbbbbbbbaaaaae"
 =~ m/^[a-e]+[b-f]+[ac-f]+[abd-f]+[a-cef]+[a-df]+$/)
 {
     print "MATCHED\n";
 }
 else
 {
     print "NOT MATCHED\n";
 }

-- 
Ticket URL: <https://svn.boost.org/trac/boost/ticket/8392>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.

This archive was generated by hypermail 2.1.7 : 2017-02-16 18:50:12 UTC