Boost logo

Boost Users :

From: John Maddock (john_at_[hidden])
Date: 2007-02-06 07:31:15


Bob Kline wrote:
> We have encountered a problem in the regex++ package, which reports
> having exhausted memory after examining a very short string with a
> regular expression of only modest complexity. I realize that the
> documentation for the package does not specify how much memory usage
> is too much, but since the same combination of regular expression and
> test string works without problems with a number of other programming
> toolsets (e.g., Python, Perl, C#) I'm guessing that the maintainers of
> the package would be interested in tracking down the problem (I would
> if it were my software). Here's a repro case which boils the problem
> down to the tiniest example:

This is probably not what you want to hear, but the behaviour is by design:
Perl-regexes are in the worst case NP-Hard problems to solve, so there is a
built in state-count that tracks how many states in the machine are visited
and throws an exception if the number of states visited looks too large
compared to the size of the string. The aim is to weed out "bad" regexes
that may cause problems with certain texts and not others before they can do
too much damage. Obviously this is a heuristic that sometimes throws too
early or too late.

When this issue occurs it's almost always a problem with the expression
being ambiguous: each time a repeat or other decision has to be made then
either alternative can be taken, so if no match is found the regex state
machine has to thrash around trying ever possible combination.

Is there any reason why you can't simplify your expression to just
"^[^\\s]+( ([^\\s]+)*)?$" which does the same thing much more efficiently?

HTH, John.


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net