From: John Maddock (john_at_[hidden])
Date: 2004-04-09 11:00:57


> I'm not sure I understand what you're saying. A regular expression either
> (completely) matches a given text, or it doesn't; if we ignore parentheses
> for a moment, then there's no room for different "matching strategies"
> (like, say, POSIX or not).
>
> When searching for (i.e., grepping) a regular expression, POSIX states
> "left-most longest", so it shouldn't be ambiguous to determine the left-most
> longest match among all possible matches (with parenthesis indices breaking
> ties), even if the RE contains a non-greedy repeat.
>
> Do you have a specific example at hand why you think non-greedy repeats
> are inappropriate for POSIX-style greps? Personally, I find it very
> convenient to write REs like "<b>.*?</b>", especially in POSIX mode.

That's actually quite a good example - it doesn't produce a
"leftmost-longest" match, does it?

I forgot to mention that you can mix modes if you really want to, by
compiling the expression as a Perl regex and then passing match_posix to the
matching functions (although there are a couple of Perl-specific features
that don't work in POSIX matching mode - independent sub-expressions are one
that comes to mind, and more will be added in the future).

In all seriousness though, why not use Perl regexes when you want
Perl-compatible features? This is the default if you don't specify a mode
anyway, and it's also faster than POSIX leftmost-longest mode.

John.

