Boost Users :

From: Joshua B. Smith (josh_at_[hidden])
Date: 2003-04-25 13:00:52


On Fri, Apr 25, 2003 at 05:35:06PM -0000, Dean wrote:
> --- In Boost-Users_at_[hidden], "Joshua B. Smith" <josh_at_n...>
> wrote:
> I'm not sure what you were trying to say above, but my understanding
> is that the 2 patterns you just mentioned are equivalent. The docs
> say "{3}" is equivalent to "{3,3}" not "{3,}".

That is what I was trying to say, just not very clearly :)

> I'm doing a search because I don't want to know whether the whole
> string matches but whether the regex is found in the string.
> Specifically, I'm doing:
>
> m_regex.Search( sampleBody, boost::match_default | boost::match_any)

OK. That's kinda what I figured.
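
To make the search/match distinction concrete, here is a minimal sketch. It assumes std::regex with the default ECMAScript grammar rather than the Boost RegEx wrapper and match_any flag you are using, and found_in and matches_whole are hypothetical helper names, so treat it as an illustration only.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the pattern occurs anywhere in the text (a search).
bool found_in(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}

// Hypothetical helper: true only if the pattern accounts for the entire text (a match).
bool matches_whole(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}
```

Calling both helpers with the pattern \d{3}- on a string such as "1234567-" shows the difference: the search looks for the pattern anywhere in the string, while the match requires the whole string to fit.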

> While I can believe that the design intention was that "\d{3}-"
> should be found in "1234567-" (at the fifth character), it seems
> inconsistent that it is *not* also found in "123456-" and
> "12345678-". I'm seeing that inconsistent behavior.

It is not inconsistent: it fails to match, then keeps going.
It's all about greediness. For example:

Searching for a{1}b in the strings:

1) ab
2) aab
3) aaab

The search succeeds correctly on 1 and incorrectly on 3, but not on 2, because:

a{1}b   ab     Succeeds (correct).
a{1}b   aab    Fails because it matched the two a's and then stopped
               because the string is done.
a{1}b   aaab   Fails on aa, then begins to scan again and finds ab,
               which fits the regex a{1}b.
      
Makes sense?
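
If you want to check the three cases on whatever engine you have at hand, here is a minimal sketch. It assumes std::regex with the default ECMAScript grammar, not the Boost RegEx wrapper and match_any flag from your code, so its results may differ from the behaviour described in this thread. first_match_offset is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: offset of the first match of pattern in text,
// or -1 if the search finds nothing.
long first_match_offset(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, re)) {
        return -1;
    }
    return static_cast<long>(m.position(0));
}
```

Calling first_match_offset on "ab", "aab" and "aaab" with the pattern a{1}b shows whether, and where, each search succeeds on that engine.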

> I realize there is more than one way to do it, and I'd be interested
> in what you'd recommend.
>
> FWIW, in our SSN-matching case, we'll probably just use
> "\b\d{3}-\d{2}-\d{4}\b".

I too would probably use boundaries. Or you can run a regex_match on the
string returned by the regex_search. Or do both; it depends on how much
I want to test the data for correctness. I tend to do a search and then a
match when I'm dealing with hairy inputs. You can also use spaces,
like:

\s*\d{3}-\d{2}-\d{4}\s*

(I tend not to use \b, for no good reason), or maybe something like this:

[\s,\.]*\d{3}-\d{2}-\d{4}[\s,\.]*
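
Here is a rough sketch of the boundary approach and of the search-then-match approach. It assumes std::regex (ECMAScript grammar) instead of the Boost interface used in this thread. The helper names find_ssn and find_ssn_strict, and the loose candidate pattern, are made up for illustration.

```cpp
#include <regex>
#include <string>

// Boundary approach: search for an SSN-shaped token delimited by \b.
// Returns the token, or an empty string if nothing is found.
std::string find_ssn(const std::string& text) {
    static const std::regex bounded(R"(\b\d{3}-\d{2}-\d{4}\b)");
    std::smatch m;
    return std::regex_search(text, m, bounded) ? m.str() : std::string();
}

// Search-then-match: search for loose candidates (digit groups joined by
// dashes), then require each whole candidate to match the exact shape.
std::string find_ssn_strict(const std::string& text) {
    static const std::regex candidate(R"(\d+(?:-\d+)*)");
    static const std::regex exact(R"(\d{3}-\d{2}-\d{4})");
    const std::sregex_iterator end;
    for (std::sregex_iterator it(text.begin(), text.end(), candidate); it != end; ++it) {
        const std::string token = it->str();
        if (std::regex_match(token, exact)) {
            return token;
        }
    }
    return std::string();
}
```

The second helper costs an extra pass, but it keeps the strict pattern free of any delimiter handling, which is the appeal when the input is messy.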

-jbs

