From: Eric Niebler (eric_at_[hidden])
Date: 2003-12-02 22:23:00


Spinka, Kristofer wrote:
> It would be nice if the sub_match struct had a collection member, also
> accessible by an index operator ([]), that would normally be null,
> unless there were sub_match's. Also, the existing cast operator for the
> string could be expanded to iterate through them.
>
> Is something like this in the works? Or maybe even already available?

John can correct me if I'm wrong, but I believe that Boost.Regex behaves
as Perl 5 does in this regard: when capturing parens match more than
once, only the last match is returned. Nested regexes are not allowed
either.
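
To make the last-match-only behavior concrete, here is a minimal sketch.
It uses std::regex from the C++ standard library rather than boost.regex,
purely so that it is self-contained; that substitution, the pattern, and
the function name last_capture are all assumptions made for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns what capture group 1 holds after the
// repeated group has matched the whole input, or an empty string if the
// input does not match at all.
std::string last_capture(const std::string& input)
{
    // Group 1 sits inside a repeated non-capturing group, so it takes
    // part in the match once per comma-separated number.
    static const std::regex list_rx(R"((?:(\d+),?)+)");

    std::smatch what;
    if (!std::regex_match(input, what, list_rx))
        return std::string();

    // Only the final iteration's text survives in what[1].
    return what[1].str();
}
```

Calling last_capture("1,2,3") yields "3"; the earlier items "1" and "2"
were matched by the same group but are no longer reachable through the
match results.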

As Larry Wall said in Apocalypse 5:
"... It also breaks down when the capturing parentheses match more than
once. Perl handles this currently by returning only the last match. This
is slightly better than useless, but not by much."

Perl 6 will give users the option to assign repeated captures to a list.
I agree something like that would be useful in C++. I don't know if
there are plans to add this functionality to boost.regex.
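
Until something like that exists, one common workaround is to validate
the whole input with one pattern and then walk the repeated part with a
second one. The sketch below does this with std::regex and
std::sregex_iterator from the standard library (not boost.regex); the
patterns and the name all_captures are hypothetical, chosen only to
illustrate collecting every repetition into a list.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every number in a comma-separated list,
// or an empty vector if the input is not such a list.
std::vector<std::string> all_captures(const std::string& input)
{
    // First pass: check the overall shape of the input.
    static const std::regex list_rx(R"(\d+(?:,\d+)*)");
    // Second pass: the sub-pattern whose repetitions we want to keep.
    static const std::regex item_rx(R"(\d+)");

    std::vector<std::string> items;
    if (!std::regex_match(input, list_rx))
        return items;

    // Each dereference of the iterator is one successive match of item_rx.
    for (std::sregex_iterator it(input.begin(), input.end(), item_rx), end;
         it != end; ++it)
    {
        items.push_back(it->str());
    }
    return items;
}
```

With the input "1,2,3" this returns the three strings "1", "2" and "3".
The cost of the approach is that the item pattern is written twice and
the input is scanned twice, which is the duplication a built-in list of
repeated captures would avoid.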

There is another option, though I hesitate to recommend it because the
code is only about 1/8th baked at this point: I'm developing a new
regex-like library that handles nested expression matching. To write a
nested regex, however, you need to write it as an expression template.

For instance, imagine you wanted to extract words from a sentence:

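using namespace xpressive;
// a word is one or more alphanumeric characters
cregex rx_word = +alnum;
// zero or more "word, optional comma, space", then a final word and punctuation
cregex rx_sentence = *( rx_word >> !as_xpr(',') >> space )
                     >> rx_word >> punct;
char const str[] = "I came, I saw, I conquered!";
// sizeof(str)-1 excludes the terminating null character
char const *begin=str, *end=str+sizeof(str)-1;
cmatch what;
regex_search( rx_sentence, begin, end, what );
// visit the nested match for each invocation of rx_word
for( int n=0; what(rx_word,n); ++n )
{
    std::cout << what(rx_word,n)[0] << std::endl;
}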

In this example, the cmatch struct "what" receives the results of the
search. It behaves both like a vector of backreferences, and also like a
tree of nested cmatch structures. "what(rx_word,n)" returns the cmatch
associated with the Nth invocation of the nested rule "rx_word". And
"what(rx_word,n)[0]" returns the 0th backreference (whole match) of that
cmatch structure. The above code displays:

I
came
I
saw
I
conquered

The code is currently checked into the boost-sandbox at boost/xpressive,
available via anonymous CVS.

WARNING: using half-baked code can be hazardous to your mental health. ;-)

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
