From: John Maddock (john_at_[hidden])
Date: 2005-02-03 06:30:36


> Trust me, the regex works. I changed my flags to
> boost::regex_constants::normal, and it all worked just fine, however it
> does not explain why when I was using just the flags
> boost::regex_constants::char_classes | boost::regex_constants::intervals,
> it refused to match more than two words.

The problem is that you need to specify what kind of regular expression it
is (basic, extended, Perl, etc.). If you don't specify anything, it defaults to
something like POSIX-Basic semantics, which means leftmost-longest
matching is selected.

Now on to your expression. The problem here is the [[:space:]]* part: because
it can match zero times, your expression can, *in the worst case*, be reduced
to something equivalent to "([[:alnum:]]+)+", which is the classic "may take
forever to match" example.

It works when you use Perl matching semantics because the matcher stops as
soon as a match is found. If no match is found, however, it may well thrash
almost indefinitely (eventually leading to an exception). When POSIX matching
semantics are selected, the leftmost-longest rule makes the matcher thrash
looking for the "best" possible match, again leading to the exception.

So, to conclude: specify that you want Perl-style regexes (unless you really
want POSIX leftmost-longest rules), and change your expression to something like:
"^([[:alpha:]][-[:alnum:]]*(?:[[:space:]]+|$))+$"

Hope this helps,

John.

