From: Eric Niebler (eric_at_[hidden])
Date: 2005-05-18 15:00:32


Thanks for the feedback. Answers inline...

Rob Stewart wrote:
> From: "Eric Niebler" <eric_at_[hidden]>
>
>>Docs at:
>>http://boost-sandbox.sf.net/libs/xpressive
>
>
> I was reading through a portion of the docs and a few issues came
> to mind.
>
> This one applies to Boost.RegEx, too, but I'll ask you: Why have
> both regex_match() and regex_search() when the latter can behave
> like the former by adding two anchors?

This is true. I'm following the lead of the regex std proposal here, but
I've never felt comfortable with regex_match, to be honest. A common
newbie mistake is to use regex_match instead of regex_search. Perl, for
instance, doesn't distinguish between "search" and "match" operations,
and "search" is the default. What makes it worse is that in Perl
circles, the semantic equivalent of regex_search is called /matching/,
hence the disconnect. Not sure what to do. Perhaps John could comment.
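
To make the distinction concrete, here is a minimal sketch using std::regex, the standardized descendant of the proposal, rather than xpressive itself. The helper names (whole_match, search_anywhere, anchored_search) are made up for illustration, and the anchored form is only an approximation of regex_match under the default ECMAScript grammar.

```cpp
#include <regex>
#include <string>

// regex_match succeeds only if the pattern accounts for the whole input.
inline bool whole_match(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// regex_search succeeds if the pattern matches anywhere in the input.
inline bool search_anywhere(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}

// Emulate regex_match with regex_search by adding the two anchors.
// The non-capturing group keeps alternations such as "a|b" intact.
inline bool anchored_search(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex("^(?:" + pattern + ")$"));
}
```

With the input "say hello" and the pattern "hello", search_anywhere succeeds while whole_match and anchored_search both fail, which is exactly the trap described above.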

>
> Why does the regex_token_iterator<> ctor use a magic number like
> -1 to indicate behavior rather than a named value? (I just
> clicked through to the reference and see that it takes a
> regex_constants::match_flag_type, but
> http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/xpressive/examples.html#examples.split_a_string_using_a_regex_as_a_delimiter
> shows passing -1 -- with an explanatory comment -- instead. This
> leads to confusion.)

Again, I'm just following the standard here, but providing a named
constant would be a nice addition. The -1 is an optional 4th parameter,
and the match_flag_type is an optional 5th parameter -- so there should
be no confusion.
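
For illustration, a rough sketch of the split idiom with std::sregex_token_iterator, which follows the same parameter order. The constant non_matching_parts and the helper split are hypothetical names, not part of the standard or of xpressive; they only show what a named value could look like.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical named constant; neither std::regex nor xpressive defines it.
constexpr int non_matching_parts = -1;

inline std::vector<std::string> split(const std::string& text,
                                      const std::string& delimiter) {
    const std::regex re(delimiter);
    // 4th argument: which sub-match to yield; -1 selects the text between
    // matches. The optional 5th argument (match flags) keeps its default.
    std::sregex_token_iterator first(text.begin(), text.end(), re,
                                     non_matching_parts);
    std::sregex_token_iterator last;  // default-constructed end iterator
    return std::vector<std::string>(first, last);
}
```

Calling split("a, b,c", ",\\s*") yields the three pieces "a", "b" and "c".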

>
> The following items are from the "Perl syntax vs. Static
> xpressive syntax" table in
> http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/xpressive/creating_a_regex_object.html:
>
> You seem to suggest that the xpressive equivalent of Perl's
> "a|b" must be spelled "a | b" but as far as I can see, the
> whitespace is irrelevant, so calling attention to it suggests
> a difference that doesn't exist.

Naturally whitespace is irrelevant. That's how C++ works. I don't think
this should be a source of confusion for people.

>
> "bos" and "eos" are a little odd. First, it seems like
> "sequence" should be "input." Second, I usually think of
> SOF/EOF and SOL/EOL pairs rather than BOF/EOF and BOL/EOL.
> Thus, I'd have gone with "soi" and "eoi" at the least.
> Unfortunately, in an effort to keep them short, they aren't
> terribly mnemonic. How about "start" and "end" (or "beg" and
> "end" if you want to go with just three letters)?

The regex std proposal has match flags match_not_bol and match_not_eol,
so I'm reusing this terminology. Boost.Regex also has match_not_bob for
"beginning of buffer". This is not proposed for standardization, and I
don't think the term "buffer" is appropriate anyway. You like "input"
but I prefer "sequence". I dislike "input" because it might suggest to
people that input iterators are acceptable to the regex algorithms,
whereas a bidirectional sequence is what is required.
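
As a small sketch of the iterator requirement, again using std::regex as a stand-in for xpressive: std::list<char> provides bidirectional but not random-access iterators, and that is enough for regex_search. The helper contains_digits is a made-up name.

```cpp
#include <iterator>
#include <list>
#include <regex>
#include <type_traits>

// std::list offers bidirectional (not random-access) iterators.
static_assert(
    std::is_same<
        std::iterator_traits<std::list<char>::const_iterator>::iterator_category,
        std::bidirectional_iterator_tag>::value,
    "std::list iterators are bidirectional");

// Search a non-contiguous character sequence directly through its iterators.
inline bool contains_digits(const std::list<char>& sequence) {
    const std::regex digits("[0-9]+");
    return std::regex_search(sequence.begin(), sequence.end(), digits);
}
```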

>
> . appears twice in the table with two different equivalences.
> It may be that the two are effectively the same, but they
> aren't grouped and the "Meaning" doesn't point out their
> equivalence.

Yes, the docs are misleading here. In Perl, . can have two meanings,
depending on the /s modifier. xpressive's docs should be more specific.
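
The two meanings can be sketched with std::regex, whose default ECMAScript grammar has no /s switch. This is an analogy only, not xpressive syntax, and the helper names are invented: a plain '.' refuses a newline, while the class "[\s\S]" accepts any character, as Perl's '.' does under /s.

```cpp
#include <regex>
#include <string>

// ECMAScript '.' does not match a line terminator,
// like Perl's '.' without the /s modifier.
inline bool dot_matches(const std::string& one_char) {
    return std::regex_match(one_char, std::regex("."));
}

// "[\s\S]" matches any single character, newline included,
// like Perl's '.' with the /s modifier.
inline bool any_char_matches(const std::string& one_char) {
    return std::regex_match(one_char, std::regex("[\\s\\S]"));
}
```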

>
> Considering how much you compare xpressive to Perl's REs, I'm
> surprised you opted for ~_d instead of _D, for example. I'm
> not saying that would be better, but the disconnect from Perl
> didn't seem necessary in this case.

It is necessary. _D is an illegal identifier, reserved to the
implementation. All identifiers that begin with an underscore and a
capital letter are illegal in user code. Even if that were not the case,
ALL CAPS is reserved for macros by convention. That's how I ended up
with ~_d.
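
Roughly, the idea looks like the sketch below. The type digit_class and the namespace sketch are hypothetical stand-ins, not xpressive's actual implementation; the point is only that a lowercase _d inside a namespace is a legal name and that operator~ can supply the complement.

```cpp
namespace sketch {

// Hypothetical stand-in for a character-class token; not xpressive's real type.
struct digit_class {
    bool negated;
    constexpr bool matches(char c) const {
        return (c >= '0' && c <= '9') != negated;
    }
};

// Lowercase after the underscore, and declared inside a namespace,
// so the name is not reserved to the implementation.
constexpr digit_class _d{false};

// operator~ yields the complement, playing the role of Perl's \D.
constexpr digit_class operator~(digit_class c) {
    return digit_class{!c.negated};
}

} // namespace sketch
```

Here sketch::_d accepts '7' and rejects 'x', and ~sketch::_d does the opposite.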

>
> For "[abc]," you show two different xpressive equivalents, each
> in its own row of the table. Why not combine them into a
> single row? (Same for any other cases like that.)

Sure.

>
> A tool that converts a Perl-style RE to xpressive (static
> notation certainly, and dynamic if there are any differences)
> would be quite helpful (for those that know Perl's REs).
>

Total agreement. It's on my list, but reaching v1.0 is a higher priority
for me right now.

Thanks!

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
