Boost Users :
From: Dean (dean_at_[hidden])
Date: 2003-04-25 15:27:54
--- In Boost-Users_at_[hidden], "Joshua B. Smith" <josh_at_n...> wrote:
> On Fri, Apr 25, 2003 at 05:35:06PM -0000, Dean wrote:
<snip>
>
> > While I can believe that the design intention was that "\d{3}-"
> > should be found in "1234567-" (at the fifth character), it seems
> > inconsistent that it is *not* also found in "123456-" and
> > "12345678-". I'm seeing that inconsistent behavior.
>
> It is not inconsistent, because it fails to match and then keeps going.
> It's all about greediness. For example:
>
> searching for a{1}b in strings
>
> 1) ab
> 2) aab
> 3) aaab
>
> it finds a match in 1 (correctly) and in 3 (incorrectly), but not in 2,
> because:
>
> a{1}b on ab:   matches (correct)
> a{1}b on aab:  fails, because it matched the two a's and then stopped
>                because the string is done
> a{1}b on aaab: fails on aa, then begins to scan again and finds ab,
>                which fits the regex a{1}b
>
> Makes sense?
<snip>
That's what I (eventually) guessed was happening. Thanks for
confirming my suspicion. However, it still seems possible that this
was not the original design intention. I suppose only John Maddock
can answer that question...
It seems to me that when the code finds more than one "a", it should
either:

1) skip past all subsequent "a"s before starting the scan again. This
   would cause "a{1}b" to be found in "ab" but not in "aab", "aaab",
   "aaaab", etc. This would be very "greedy". :-)

Or:

2) restart the scan one character after where the previous scan
   started. This would cause "a{1}b" to be found in "ab", "aab",
   "aaab", "aaaab", etc.
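To make the difference between the two restart policies concrete, here is a minimal sketch of two hypothetical hand-written scanners specialized to the single pattern a{1}b (which can only match the literal "ab"). The function names are invented for illustration; this is not how Boost's regex library or the .NET searcher is actually implemented.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if "ab" (all that a{1}b can match) starts at i.
bool matchesAt(const std::string& s, std::size_t i) {
    return i + 1 < s.size() && s[i] == 'a' && s[i + 1] == 'b';
}

// Behavior #2: after a failed attempt at position i, retry at i + 1.
// This is the ordinary leftmost-match search loop.
bool searchRestartNextChar(const std::string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (matchesAt(s, i)) return true;
    }
    return false;
}

// Behavior #1: when a run of 'a's is found, skip past the whole run before
// rescanning, so the pattern is accepted only when the run is exactly one
// 'a' immediately followed by 'b'.
bool searchSkipWholeRun(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        if (s[i] != 'a') { ++i; continue; }
        std::size_t j = i;
        while (j < s.size() && s[j] == 'a') ++j;  // j is one past the run
        if (j - i == 1 && j < s.size() && s[j] == 'b') return true;
        i = j;  // skip the entire run of 'a's
    }
    return false;
}
```

With these sketches, searchRestartNextChar accepts "ab", "aab" and "aaab", while searchSkipWholeRun accepts only "ab", matching the two outcomes described in options 2 and 1 respectively.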
FWIW, I'm told that the regex searcher in the .NET Framework exhibits
behavior #1. I mention that only as a point of reference -- I
realize that different implementations can have somewhat different
correct behaviors.
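As another point of reference, here is a minimal sketch using std::regex from the C++11 standard library (which postdates this thread and is neither the Boost regex library under discussion nor .NET). Its regex_search reports the leftmost match, retrying one character later after each failed attempt, i.e. behavior #2. The helper name firstMatchOffset is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the 0-based offset of the leftmost match of
// `pattern` in `text`, or -1 if there is none. std::regex uses the
// ECMAScript grammar by default.
long firstMatchOffset(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, re)) return -1;
    return static_cast<long>(m.position(0));
}
```

Under this engine "\d{3}-" is found in "1234567-" at offset 4 (the fifth character), in "123456-" at offset 3 and in "12345678-" at offset 5, and "a{1}b" is found in "ab", "aab" and "aaab" at offsets 0, 1 and 2.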
Anyway, it is either a bug or a "gotcha". I've been using regexes
occasionally for over 10 years and it "got" me. :-)
Thanks again for the help!
--Dean