Boost Users :

From: John Maddock (john_at_[hidden])
Date: 2004-07-21 05:44:22


> using the latest regex patch and vc7.1 i've accidentally encountered the
> following:
>
> ...
> regex re("([^\n]*\\n+\\s+)+NEEDEDSUBITEM2:[^\\s]");
> bool matched = regex_search(text, re); // bad_expression

I think the problem is that the first repeated section:

([^\n]*\\n+\\s+)+

starts and ends with repeats, either of which can match repeated whitespace.
This is what causes the matcher to thrash trying to find a match, eventually
leading to it giving up and throwing an exception. I think you could make
your expression much more precise by using:

 regex re("([^\n]*\\n+)+\\s+NEEDEDSUBITEM2:[^\\s]");

By moving the \s+ outside of the repeat like this, the expression is now
much more deterministic: it can only do one thing for any given input
character.
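
As a rough illustration, here is a minimal sketch of the rewritten
expression in use. It assumes std::regex (ECMAScript grammar) as a stand-in
for boost::regex so that it builds without Boost; the function name and the
sample text are made up for the example and are not from the original
report.

```cpp
#include <cassert>  // for the assertion mentioned in the usage note
#include <regex>
#include <string>

// Hypothetical helper: returns true if `text` contains one or more lines,
// then whitespace, then "NEEDEDSUBITEM2:" followed by a non-space character.
bool finds_needed_subitem(const std::string& text) {
    // The \s+ sits outside the repeated group, so the group itself only
    // consumes "rest of line, then one or more newlines" on each iteration.
    static const std::regex re("([^\n]*\\n+)+\\s+NEEDEDSUBITEM2:[^\\s]");
    return std::regex_search(text, re);
}
```

With an input such as "ITEM1: a\n  SUBITEM1:b\n  NEEDEDSUBITEM2:c\n" the
call assert(finds_needed_subitem(input)) should hold, while the same text
without the NEEDEDSUBITEM2 line should give false.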

> and one question:
> having "DATA.*?ITEM1(ITEM2)?" and an input like "DATA ITEM1 ITEM1ITEM2"
> should ITEM2 be extracted?
> i think it would be good to make a note on this case in the doc.

No for Perl regexes; I'm not sure for POSIX regexes (non-greedy repeats don't
sit well with POSIX semantics in cases like this, so I'd advise using
non-greedy repeats only with Perl regexes).

John.

