From: John Maddock (john_at_[hidden])
Date: 2005-05-19 05:14:04


>> This one applies to Boost.RegEx, too, but I'll ask you: Why have
>> both regex_match() and regex_search() when the latter can behave
>> like the former by adding two anchors?
>
>
> This is true. I'm following the lead of the regex std proposal here, but
> I've never felt comfortable with regex_match, to be honest. A common
> newbie mistake is to use regex_match instead of regex_search. Perl, for
> instance, doesn't distinguish between "search" and "match" operations, and
> "search" is the default. What makes it worse is that in Perl circles, the
> semantic equivalent of regex_search is called /matching/, hence the
> disconnect. Not sure what to do. Perhaps John could comment.

If I remember correctly, the original terminology was inherited from the GNU
regex package, and later got refined as a result of user feedback. But
Eric's correct: it is a major source of confusion *for those migrating from
Perl*.

The aim was that the code should be quite explicit about what it's doing: a
programmer who sees regex_match knows that the code is looking to match all
of the text and not just some part of it.
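
To make the distinction concrete, here is a minimal sketch. It assumes
std::regex (the standardized descendant of the proposal discussed here)
rather than Boost.Regex or xpressive, and the helper names are hypothetical,
chosen only for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only if the WHOLE of `text` matches `pattern`.
bool matches_whole(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// Hypothetical helper: true if ANY part of `text` matches `pattern`.
bool matches_somewhere(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}

// regex_search made to behave like regex_match by adding two anchors.
// The non-capturing group keeps alternations in `pattern` inside the anchors.
bool matches_whole_via_search(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex("^(?:" + pattern + ")$"));
}
```

With the pattern "\\d+" and the text "abc123", matches_somewhere returns
true, while matches_whole and matches_whole_via_search both return false.
For the text "123" all three return true.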

>> Why does the regex_token_iterator<> ctor use a magic number like
>> -1 to indicate behavior rather than a named value? (I just
>> clicked through to the reference and see that it takes a
>> regex_constants::match_flag_type, but
>> http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/xpressive/examples.html#examples.split_a_string_using_a_regex_as_a_delimiter
>> shows passing -1 -- with an explanatory comment -- instead. This
>> leads to confusion.)
>
>
> Again, I'm just following the standard here, but providing a named
> constant would be a nice addition. The -1 is an optional 4th parameter,
> and the match_flag_type is an optional 5th parameter -- so there should be
> no confusion.

The -1 means "the thing before 0", and 0 is the whole of what matched, so -1
is the string before the bit that matched. Well, that's the logic anyway.
It doesn't seem to have caused any confusion in practice, but there's no
harm in adding a named constant.
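
As a sketch of what such a named constant might look like, again assuming
std::regex rather than Boost.Regex or xpressive: the name
text_between_matches and the helper tokens() are hypothetical and do not
exist in any of these libraries.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical named constant standing in for the magic number -1.
constexpr int text_between_matches = -1;

// Hypothetical helper: collect what a regex_token_iterator yields for a
// given sub-match index. 0 selects each whole match; -1 selects the text
// that lies between matches, which is what splitting on a delimiter needs.
std::vector<std::string> tokens(const std::string& text,
                                const std::regex& re,
                                int submatch) {
    std::sregex_token_iterator first(text.begin(), text.end(), re, submatch);
    std::sregex_token_iterator last;  // default-constructed end iterator
    return std::vector<std::string>(first, last);
}
```

Called with the text "a, b,c" and a delimiter regex ",\\s*", passing
text_between_matches yields "a", "b" and "c", while passing 0 yields the
delimiters themselves.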

> The regex std proposal has match flags match_not_bol and match_not_eol, so
> I'm reusing this terminology. Boost.Regex also has match_not_bob for
> "beginning of buffer". This is not proposed for standardization, and I
> don't think the term "buffer" is appropriate anyway. You like "input" but
> I prefer "sequence". I dislike "input" because it might suggest to people
> that input iterators are acceptable to the regex algorithms, whereas a
> bidirectional sequence is what is required.

Historically, those terms (or very similar) are used by GNU regex and the
BSD (Henry Spencer) packages. Renaming them would probably start a
bicycle-shed-style discussion, I guess. Good names are hard, especially if
the answer isn't immediately obvious!
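
For readers unfamiliar with these flags, a minimal sketch of what
match_not_bol does, assuming std::regex (which adopted the flag under the
same name); the helper starts_with_word() is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: does `text` begin with a word?
// When the caller knows `text` is NOT the start of a line (for example, it
// is a later chunk of a larger sequence), match_not_bol stops '^' from
// matching at the first character.
bool starts_with_word(const std::string& text, bool at_line_start) {
    static const std::regex leading_word("^\\w+");
    const std::regex_constants::match_flag_type flags =
        at_line_start ? std::regex_constants::match_default
                      : std::regex_constants::match_not_bol;
    return std::regex_search(text, leading_word, flags);
}
```

For the text "hello world", the call with at_line_start set to true
succeeds, and the call with it set to false fails because '^' is no longer
allowed to match at the first character.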

HTH,

John.

