Boost logo

Boost :

From: Michael Goldshteyn (mgoldshteyn_at_[hidden])
Date: 2007-03-22 08:53:45


"John Maddock" <john_at_[hidden]> wrote in message
news:014401c76c69$ef3f56e0$877c1f56_at_fuji...
> Michael Goldshteyn wrote:
>> ... this actually does highlight an apparent bug/deficiency. A much
>> simpler example is given below:
>>
>> ---
>> boost::regex test_regex("(?:.[a-z0-9]+)+");
>>
>> boost::regex_match("www.appwebsite1.mydomain.com/1/1/", test_regex);
>> ---
>>
>> This throws the same exception, at least in a VC 8.0 build using the
>> DLL version of the reg-ex lib (debug), in
>> perl_matcher_non_recursive.hpp at line 164 (raise_error(traits_inst,
>> regex_constants::error_space);), because the state_count is 107429
>> and the max_state_count is 107425.
>
> Right, the behaviour is deliberate and by design: it tries to protect you
> from pathological regexes that may take "forever" to try and match.
> Obviously the logic is heuristic based and therefore imperfect, but in
> this
> case there is still a bug in your regex: I assume you meant to use
> "(?:\\.[a-z0-9]+)+" to match a literal "." rather than the regex meaning
> of
> "any character". This would have been OK.
>
> The reason your example is pathological is because given a long string of
> characters, then for each character the state machine can either take the
> inner repeat or the outer repeat and the number of states visited grows
> exponentially with the number of characters in the string: the expression
> is
> effectively the same as (.+)+ which is the classic patholgical test case
> for
> backtracking regexes.
>
> HTH, John.

I understand the issues, now, but am wondering whether control over the
"maximum number of states" should be given to the caller or if the issue is
one of running out of memory?

Michael Goldshteyn


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk