Boost Users :

From: Joshua B. Smith (josh_at_[hidden])
Date: 2003-04-25 16:52:17


On Fri, Apr 25, 2003 at 08:27:54PM -0000, Dean wrote:
> That's what I (eventually) guessed was happening. Thanks for
> confirming my suspicion. However, it still seems possible that this
> was not the original design intention. I suppose only John Maddock
> can answer that question...

Does he read this list?

> It seems to me that when the code finds more than 1 "a", it should
> either:
>
> 1) skip past all subsequent "a"s before starting the scan again.
> This would cause "a{1}b" to be found in "ab" but
> not "aab", "aaab", "aaaab", etc. This would be very "greedy". :-)
>
> Or:
>
> 2) restart the scan 1 character after where the previous scan
> started. This would cause "a{1}b" to be found
> in "ab", "aab", "aaab", "aaaab", etc.
>
> FWIW, I'm told that the regex searcher in the .NET Framework exhibits
> behavior #1. I mention that only as a point of reference -- I
> realize that different implementations can have somewhat different
> correct behaviors.
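
To make the two options concrete, here is a toy scanner for the single pattern a{1}b. It is only a sketch: the function name search_a1b and the restart argument are made up for illustration, and this is not how Boost, .NET, or any real engine is implemented. It just shows how the choice of restart position changes the result.

```python
def search_a1b(text, restart):
    """Toy search for the pattern a{1}b (hypothetical helper, not a real engine).

    restart="skip_run"  -> behavior #1: after a failed attempt on an "a",
                           skip the whole run of "a"s before trying again.
    restart="next_char" -> behavior #2: try again one character after the
                           position where the previous attempt started.
    Returns the index where the match starts, or None if nothing is found.
    """
    start = 0
    while start < len(text):
        # An attempt succeeds only if exactly one "a" is followed by "b" here.
        if text.startswith("ab", start):
            return start
        if restart == "skip_run" and text[start] == "a":
            # Behavior #1: jump past every remaining "a" in this run.
            while start < len(text) and text[start] == "a":
                start += 1
        else:
            # Behavior #2 (and any non-"a" character): advance by one.
            start += 1
    return None


for s in ("ab", "aab", "aaab"):
    print(s, search_a1b(s, "skip_run"), search_a1b(s, "next_char"))
```

With "skip_run" only "ab" is found; with "next_char" the match is found in all three strings, starting at the last "a".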

Perl and Python both exhibit behavior #2. I think emacs does too.
And it doesn't surprise me that .NET is the greediest. :P
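
For the Python side, a quick check with the standard re module shows behavior #2. This is just a minimal sketch using the example strings from the quoted message; the variable names are my own.

```python
import re

# The pattern under discussion: exactly one "a" followed by "b".
pattern = re.compile(r"a{1}b")

# search() retries at each successive start position, so the match is
# found at the last "a" no matter how many "a"s precede it.
spans = {s: pattern.search(s).span() for s in ("ab", "aab", "aaab", "aaaab")}

for s, span in spans.items():
    print(s, span)
```

Each string matches, and the reported span always covers the final "ab".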

I think a lot of regex engines have been converging on a Perl-like
implementation. In fact, I'd never seriously considered that there was
another way, but most of my regex work is done in Python/Perl (until
recently, at any rate; I like the Boost regex lib a lot).

> Anyway, it is either a bug or a "gotcha". I've been using regexes
> occasionally for over 10 years and it "got" me. :-)

I wouldn't say it "got" you. Regexes are still 50% voodoo, 50% trick, and 2%
butterscotch ripple.

-jbs


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net