Boost Users :

From: John Maddock (john_at_[hidden])
Date: 2008-01-05 04:53:57


Arne Babnik wrote:
> Hello,
>
> I'm using boost::regex 1.34.0 and experience behaviour with mixed
> greedy and non-greedy operators in the same expression which I do not
> understand.
>
> I have either one of the following input strings:
> user1_at_[hidden]
> user2
>
> and want to keep only the usernames, so I use the following
> expression: (.*?)(@.*)?
> with the format string $1.
>
> I would expect that basically the regex machine would first try to
> satisfy the greedy operator "?", and then fill up the non-greedy "*?"
> with the remaining part of the input.

No, that's not how Perl regexes work: they move from left to right through
the expression, matching each part in turn, and backtrack only when a later
part can't be satisfied.

In the case of (.*?)(@.*)? the (.*?) part can successfully match zero
characters by repeating zero times, as can (@.*)?, so there are multiple
matches to the string possible, each of zero characters in length. (It's more
complicated still when there are zero-length matches, but that will do for
now!) With format_first_only, only the first of those empty matches is
replaced, which is why the input comes back unchanged.
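
To see the zero-length first match concretely, here is a minimal sketch. It
assumes std::regex (ECMAScript grammar) as a stand-in for boost::regex, which
should treat this particular pattern the same way. The helper names and the
address user1@example.com are hypothetical; the original input is obfuscated
in the archive.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Length of the first match of the original pattern (hypothetical helper).
// The lazy (.*?) takes zero characters and the optional (@.*)? is skipped,
// so the first match is empty.
std::size_t first_match_length(const std::string& input) {
    static const std::regex re("(.*?)(@.*)?");
    std::smatch m;
    if (!std::regex_search(input, m, re)) {
        return std::string::npos;
    }
    return static_cast<std::size_t>(m.length(0));
}

// Replace only the first match with $1 (hypothetical helper). Because the
// first match is empty, the replacement inserts nothing and removes nothing.
std::string replace_first_only(const std::string& input) {
    static const std::regex re("(.*?)(@.*)?");
    return std::regex_replace(input, re, "$1",
                              std::regex_constants::format_first_only);
}
```

Under these assumptions, first_match_length reports 0 for both sample inputs,
and replace_first_only hands the string back untouched.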

So instead try:

([^@]+)(?:@.*)?

which I believe will do as you want.
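
A minimal sketch of that expression in use, again assuming std::regex as a
stand-in for boost::regex (the pattern and the $1 format string are valid in
both). The function name strip_host and the address user1@example.com are
hypothetical.

```cpp
#include <regex>
#include <string>

// Keep only the user name, dropping an optional "@host" suffix.
// ([^@]+) must consume at least one non-'@' character, so the first match
// can no longer be empty; (?:@.*)? then swallows the rest without capturing.
std::string strip_host(const std::string& input) {
    static const std::regex re("([^@]+)(?:@.*)?");
    return std::regex_replace(input, re, "$1",
                              std::regex_constants::format_first_only);
}
```

Called as strip_host("user1@example.com") or strip_host("user2"), the first
match now covers the whole input, so format_first_only is enough.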

HTH, John.

> However, the following does NOT work (the hostname is not removed):
> std::string output;
> output = boost::regex_replace(std::string("user1_at_[hidden]"),
>                               boost::regex("(.*?)(@.*)?"),
>                               std::string("$1"),
>                               boost::regex_constants::format_first_only);
>
> However, if I omit the "format_first_only", it works as expected.
>
> What do I miss?
>
>
> Thanks,
>
> Arne


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net