From: Rob Stewart (stewart_at_[hidden])
Date: 2005-05-18 14:14:15


From: "Eric Niebler" <eric_at_[hidden]>
>
> Docs at:
> http://boost-sandbox.sf.net/libs/xpressive

I was reading through a portion of the docs and a few issues came
to mind.

This one applies to Boost.Regex, too, but I'll ask you: Why have
both regex_match() and regex_search() when the latter can behave
like the former by adding two anchors?
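
To make the question concrete, here is a minimal sketch of the
equivalence I have in mind. It uses std::regex as a stand-in rather
than xpressive or Boost.Regex (an assumption on my part; their
regex_match() and regex_search() are similar in spirit), and the
helper names are hypothetical. It also assumes the input has no
embedded newlines, since implementations differ on whether ^ and $
match at line boundaries.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Whole-input match, the direct way.
bool matches_whole(const std::string& s, const std::string& pattern) {
    return std::regex_match(s, std::regex(pattern));
}

// The same question asked through regex_search() with two anchors added.
// The non-capturing group keeps a top-level alternation such as "a|b"
// anchored as a whole instead of anchoring only its first and last branch.
bool matches_whole_via_search(const std::string& s, const std::string& pattern) {
    const std::regex anchored("^(?:" + pattern + ")$");
    return std::regex_search(s, anchored);
}

// Hypothetical self-check: both spellings agree on these inputs.
void check_equivalence() {
    assert(matches_whole("2005", "\\d+") == matches_whole_via_search("2005", "\\d+"));
    assert(matches_whole("2005-05", "\\d+") == matches_whole_via_search("2005-05", "\\d+"));
}
```

The sketch also shows one cost of the anchored spelling: the caller
has to remember the grouping, which a dedicated regex_match() hides.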

Why does the regex_token_iterator<> ctor use a magic number like
-1 to indicate behavior rather than a named value? (I just
clicked through to the reference and see that it takes a
regex_constants::match_flag_type, but
http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/xpressive/examples.html#examples.split_a_string_using_a_regex_as_a_delimiter
shows passing -1 -- with an explanatory comment -- instead. This
leads to confusion.)
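
As an illustration of what a named value could look like, here is a
sketch using std::regex_token_iterator as a stand-in for the
xpressive iterator (an assumption; in the standard library the
fourth constructor argument is a submatch index, and -1 selects the
text between matches). The constant text_between_matches and the
helper split() are hypothetical names, not part of either library.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical named value for the magic number: submatch index -1 asks
// the iterator for the pieces of input that lie between delimiter matches.
constexpr int text_between_matches = -1;

// Split `input` on every match of `delimiter`.
std::vector<std::string> split(const std::string& input, const std::regex& delimiter) {
    std::sregex_token_iterator first(input.begin(), input.end(),
                                     delimiter, text_between_matches);
    std::sregex_token_iterator last;  // default-constructed end-of-sequence iterator
    return std::vector<std::string>(first, last);
}
```

With a name at the call site, the explanatory comment in the example
would no longer be needed.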

The following items are from the "Perl syntax vs. Static
xpressive syntax" table in
http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/xpressive/creating_a_regex_object.html:

   You seem to suggest that the xpressive equivalent of Perl's
   "a|b" must be spelled "a | b" but as far as I can see, the
   whitespace is irrelevant, so calling attention to it suggests
   a difference that doesn't exist.

   "bos" and "eos" are a little odd. First, it seems like
   "sequence" should be "input." Second, I usually think of
   SOF/EOF and SOL/EOL pairs rather than BOF/EOF and BOL/EOL.
   Thus, I'd have gone with "soi" and "eoi" at the least.
   Unfortunately, in an effort to keep them short, they aren't
   terribly mnemonic. How about "start" and "end" (or "beg" and
   "end" if you want to go with just three letters)?

   . appears twice in the table with two different equivalences.
   It may be that the two are effectively the same, but they
   aren't grouped and the "Meaning" doesn't point out their
   equivalence.

   Considering how much you compare xpressive to Perl's REs, I'm
   surprised you opted for ~_d instead of _D, for example. I'm
   not saying that would be better, but the disconnect from Perl
   didn't seem necessary in this case. (I do recognize that
   you're using ~ to mean negation of the following subexpression
   in many other cases, so perhaps you just determined that being
   consistent in expressing negation was more important.)

   For "[abc]," you show two different xpressive equivalents, each
   in its own row of the table. Why not combine them into a
   single row? (Same for any other cases like that.)

A tool that converts a Perl-style RE to xpressive (static
notation certainly, and dynamic if there are any differences)
would be quite helpful (for those that know Perl's REs).

-- 
Rob Stewart                           stewart_at_[hidden]
Software Engineer                     http://www.sig.com
Susquehanna International Group, LLP  using std::disclaimer;
