
Boost Users :

From: Dean (dean_at_[hidden])
Date: 2003-04-25 15:27:54


--- In Boost-Users_at_[hidden], "Joshua B. Smith" <josh_at_n...>
wrote:
> On Fri, Apr 25, 2003 at 05:35:06PM -0000, Dean wrote:
<snip>
>
> > While I can believe that the design intention was that "\d{3}-"
> > should be found in "1234567-" (at the fifth character), it seems
> > inconsistent that it is *not* also found in "123456-" and
> > "12345678-". I'm seeing that inconsistent behavior.
>
> It is not inconsistent, because it fails to match and then keeps going.
> It's all about greediness. For example:
>
> searching for a{1}b in strings
>
> 1) ab
> 2) aab
> 3) aaab
>
> searches correctly on 1 and incorrectly on 3 but not on 2 because
>
> a{1}b ab searches (correct)
> a{1}b aab Fails because it matched the two a's and then stopped
> because the string is done
> a{1}b aaab Fails on aa then begins to scan again and finds ab which
> fits the regex a{1}b
>
> Makes sense?
<snip>

That's what I (eventually) guessed was happening. Thanks for
confirming my suspicion. However, it still seems possible that this
was not the original design intention. I suppose only John Maddock
can answer that question...

It seems to me that when the code finds more than 1 "a", it should
either:

1) skip past all subsequent "a"s before starting the scan again.
This would cause "a{1}b" to be found in "ab" but not in "aab",
"aaab", "aaaab", etc. This would be very "greedy". :-)

Or:

2) restart the scan 1 character after where the previous scan
started. This would cause "a{1}b" to be found in "ab", "aab",
"aaab", "aaaab", etc.
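
To make the two options concrete, here is a minimal sketch. It is not
Boost.Regex code: it assumes the C++11 std::regex library as a stand-in,
and the function names are made up for illustration. Option 2 is what a
plain unanchored search does there; option 1 is only emulated, by
requiring that the matched "a" is not preceded by another "a".

```cpp
#include <regex>
#include <string>

// Option 2: ordinary unanchored search. A failed attempt starting at
// position i is followed by a fresh attempt starting at position i + 1,
// so the final "ab" of "aab" or "aaab" is found.
// (Hypothetical helper name; std::regex is an assumed stand-in.)
bool found_with_restart(const std::string& text) {
    static const std::regex pattern("a{1}b");
    return std::regex_search(text, pattern);
}

// Option 1, emulated: the "a" must be at the start of the text or be
// preceded by a character other than "a". On these inputs that has the
// same effect as skipping the whole run of "a"s after a failed attempt.
// (Hypothetical helper name; the rewritten pattern is an assumption.)
bool found_with_skip(const std::string& text) {
    static const std::regex pattern("(^|[^a])a{1}b");
    return std::regex_search(text, pattern);
}
```

With these definitions, found_with_restart() returns true for "ab",
"aab", and "aaab", while found_with_skip() returns true only for "ab"
among those three.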

FWIW, I'm told that the regex searcher in the .NET Framework exhibits
behavior #1. I mention that only as a point of reference -- I
realize that different implementations can have somewhat different
correct behaviors.

Anyway, it is either a bug or a "gotcha". I've been using regexes
occasionally for over 10 years and it "got" me. :-)

Thanks again for the help!

--Dean


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net