From: John Maddock (john_at_[hidden])
Date: 2007-03-22 06:07:31
Michael Goldshteyn wrote:
> ... this actually does highlight an apparent bug/deficiency. A much
> simpler example is given below:
> boost::regex test_regex("(?:.[a-z0-9]+)+");
> boost::regex_match("www.appwebsite1.mydomain.com/1/1/", test_regex);
> This throws the same exception, at least in a VC 8.0 build using the
> DLL version of the reg-ex lib (debug), in
> perl_matcher_non_recursive.hpp at line 164 (raise_error(traits_inst,
> regex_constants::error_space);), because the state_count is 107429
> and the max_state_count is 107425.
Right, the behaviour is deliberate and by design: it tries to protect you
from pathological regexes that may take "forever" to try and match.
Obviously the logic is heuristic based and therefore imperfect, but in this
case there is still a bug in your regex: I assume you meant to use
"(?:\\.[a-z0-9]+)+" to match a literal "." rather than the regex meaning of
"any character". This would have been OK.
The reason your example is pathological is because given a long string of
characters, then for each character the state machine can either take the
inner repeat or the outer repeat and the number of states visited grows
exponentially with the number of characters in the string: the expression is
effectively the same as (.+)+ which is the classic patholgical test case for
Boost list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk