Boost Users :

Subject: Re: [Boost-users] PB with Regex Syntax in boost::regex 1.38 cause crash with memory exhausted .
From: John Maddock (john_at_[hidden])
Date: 2009-05-13 11:59:30


>The problem is probably here: (\d*)(\S*).+Referer:(.+) because when I
>delete this part it works correctly.
>
>I tried this expression and this file with a Perl script and it works
>correctly, but with Boost it does not.
>
>Can you help me?

This is a deliberate "feature": the complexity of matching the regex has
exceeded "safe" expectations, at which point Boost.Regex stops. Perl, in
contrast, will just keep churning away trying to find a match even if it
takes "forever". In the middle are a few cases where Perl eventually finds
a match (albeit with poor performance) and Boost.Regex throws an exception
instead.

The way to fix this is to make the expression more explicit so that less
backtracking occurs. Judicious use of independent sub-expressions can help,
as can changing your repeats so that each branch in the state machine is
mutually exclusive, for example:

(\d*)(\S*).+Referer:(.+)

Could be better written as:

(\d+)(\D\S*)\s.*Referer:(.+)

which is not quite the same thing (it requires at least one digit followed
by a non-digit), or:

(\d*)(\D\S*)?\s.*Referer:(.+)

which will do the same thing, but at each branch there is only one choice
the machine can make.

HTH, John.

