
Boost Users :

Subject: Re: [Boost-users] [regex] why partial match / early break ?
From: U.Mutlu (for-gmane_at_[hidden])
Date: 2011-11-01 16:51:04


Anthony Foiani wrote, On 2011-11-01 19:10:
> "U.Mutlu"<for-gmane_at_[hidden]> writes:
>
>> why is this regex
>> const string sRe =
>> "((([a-zA-Z]|([a-zA-Z][a-zA-Z0-9\\-]))+[a-zA-Z0-9])\\.)+"
>> "((([a-zA-Z]|([a-zA-Z][a-zA-Z0-9\\-]))+[a-zA-Z0-9]))";
>>
>> not matching this string wholly?
>> "a1a.a2a.a3a.a4aaaa"
>>
>> It rather matches only this part:
>> "a1a.a2a.a3a.a4"
>>
>
> I think you're getting caught by "first match, not longest match".
> Put differently, the regex engine isn't backtracking where you think
> it is.

Thanks for the detailed info, it helped me a lot.
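
For anyone reading this in the archive, here is a minimal sketch of the
"first match, not longest match" effect. It rests on a few assumptions: that
the pattern was applied with an unanchored search (regex_search) rather than
regex_match, which the thread does not state; that std::regex with its default
ECMAScript grammar behaves like Boost.Regex's Perl-style matcher here (it is
used only so the sketch builds without Boost); and the helper names
first_match and matches_whole are made up for illustration.

```cpp
#include <regex>
#include <string>

// The pattern from the original post, unchanged.
const std::string kHostRe =
    "((([a-zA-Z]|([a-zA-Z][a-zA-Z0-9\\-]))+[a-zA-Z0-9])\\.)+"
    "((([a-zA-Z]|([a-zA-Z][a-zA-Z0-9\\-]))+[a-zA-Z0-9]))";

// Unanchored search: returns the first match the backtracking engine
// finds, which is not necessarily the longest one.
// (Hypothetical helper, not from the thread.)
std::string first_match(const std::string& s) {
    const std::regex re(kHostRe);
    std::smatch m;
    return std::regex_search(s, m, re) ? m.str(0) : std::string();
}

// Whole-string match: the engine must keep backtracking until the
// match reaches the end of the input, or fail.
// (Hypothetical helper, not from the thread.)
bool matches_whole(const std::string& s) {
    const std::regex re(kHostRe);
    return std::regex_match(s, re);
}
```

With Perl-style backtracking, first_match("a1a.a2a.a3a.a4aaaa") should come
back as "a1a.a2a.a3a.a4": in the last label the first alternative takes "a",
the repetition cannot continue on "4", the trailing [a-zA-Z0-9] takes "4",
and the pattern is satisfied. matches_whole on the same string should be
true, because requiring the match to reach the end of the input forces the
engine to backtrack into the second alternative.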

> I'm also curious if you're capturing those substrings for some
> purpose?

Actually, it was my first attempt at writing a "regex for hostnames",
because the regex I had found on the web had some bugs.
I know there are RFCs for hostnames, and that nowadays hostname labels
can also start with a digit, but for me the simple (i.e. the old) rule suffices.
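
A rough sketch of how the "old" rule could be written without the nested
alternation and without capturing groups follows. This is not the regex from
the thread: the function name is_simple_hostname is hypothetical, std::regex
stands in for Boost.Regex, and the rule is assumed to be "a label starts with
a letter, ends with a letter or digit, and may contain hyphens only in
between". Unlike the original pattern, it also accepts one-letter labels.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if s is at least two dot-separated labels,
// each starting with a letter and ending with a letter or digit.
bool is_simple_hostname(const std::string& s) {
    // One label: [a-zA-Z] optionally followed by any run of
    // letters/digits/hyphens that ends in a letter or digit.
    // (?:...) groups without capturing.
    static const std::regex re(
        "(?:[a-zA-Z](?:[a-zA-Z0-9\\-]*[a-zA-Z0-9])?\\.)+"
        "[a-zA-Z](?:[a-zA-Z0-9\\-]*[a-zA-Z0-9])?");
    // regex_match requires the whole string to match, so no partial
    // match can be reported.
    return std::regex_match(s, re);
}
```

Because each label has only one way to start, there is no choice between
alternatives for the engine to get "stuck" on, and regex_match rules out a
partial result.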

