Subject: Re: [boost] [xpressive] Support for multi-capture and balancing groups
From: Eric Niebler (eric_at_[hidden])
Date: 2010-10-02 23:12:33


On 10/2/2010 5:52 PM, Erik Rydgren wrote:
> Hi!
>
> My company has been using PCRE for a long while, but there have been
> gripes about it only returning the last matched value from a capture
> group. Because of this I have been searching for a C++ regex engine
> that can handle the same stuff as the .NET implementation, to
> no avail. During my searches I've stumbled on several forum threads
> where others have been searching for the same thing, but there doesn't
> seem to be a regular expression library in C/C++ that handles both
> named captures and multicapture.
>
> Found boost.xpressive and it had almost everything we need. It's
> open source, fast, has a flexible API and named captures. But alas,
> just like all the other C and C++ based implementations I have found, it
> lacked multiple captures.
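
For readers who have not run into the "last value only" behavior described
above, here is a minimal sketch using std::regex rather than PCRE or
xpressive (that substitution is mine, purely so the example needs nothing
beyond the standard library). The function names last_capture and
all_captures are hypothetical helpers, not part of any library.

```cpp
#include <regex>
#include <string>
#include <vector>

// Matches a comma-separated list with a quantified capture group and
// returns what group 1 holds afterwards. Each iteration of the group
// overwrites the previous value, so only the final item survives.
std::string last_capture(const std::string& input) {
    static const std::regex list_re(R"((?:(\w+),?)+)");
    std::smatch what;
    if (!std::regex_match(input, what, list_re)) {
        return std::string();
    }
    return what[1].str();
}

// Common workaround when multicapture is unavailable: make a second
// pass over the input and collect every item individually.
std::vector<std::string> all_captures(const std::string& input) {
    static const std::regex item_re(R"(\w+)");
    std::vector<std::string> items;
    for (std::sregex_iterator it(input.begin(), input.end(), item_re), end;
         it != end; ++it) {
        items.push_back(it->str());
    }
    return items;
}
```

With the input "a,b,c", last_capture yields only the final item, while
all_captures recovers all three. Multicapture would make the second pass
unnecessary.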

It sort of has them, just not in the form you happen to be looking for.
In xpressive, you can call named regexes from other regexes. When you do
that, you end up with nested match_results. If you quantify a named
regex, you end up with a sequence of match_results, kind of like
multicapture. Sadly, it's not very efficient to create a tree of
match_results, and xpressive gives you no help in navigating this tree.
It's a bit of an ugly hack.

FWIW, Boost.Regex has multicapture if you compile with
BOOST_REGEX_MATCH_EXTRA defined, IIRC.

> So, I added it.

Whoa, cool!

> On top of that I added support for balancing groups
> (http://blog.stevenlevithan.com/archives/balancing-groups). But the
> syntax for the pop capture and capture conditional is slightly
> different from the .NET version to better fit xpressive.
>
> Syntax for pop capture:
> dynamic: (?P<-name>stuff)
> static: (name -= stuff)
>
> Syntax for capture conditional:
> dynamic: (?P(name)stuff)
> static: (name &= stuff)
>
> There is no support for the (?<name-othername>stuff) construct.

I'll need to read up on what those constructs do. Can you send some
pointers?
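
As a rough orientation, and assuming the .NET semantics (the patch's own
semantics may differ): each named group keeps a stack of captures, a pop
capture removes the most recent one and fails if the stack is empty, and
the conditional tests whether any capture remains. The sketch below is a
hand-written emulation of the usual balanced-parentheses pattern, not
regex engine code; is_balanced is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Emulates, by hand, the .NET balancing-group pattern
//   ^(?:[^()]|(?<open>\()|(?<-open>\)))*(?(open)(?!))$
// The vector plays the role of the capture stack for the group "open".
bool is_balanced(const std::string& text) {
    std::vector<std::size_t> open;  // positions of unmatched '(' captures
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '(') {
            open.push_back(i);      // (?<open>\() pushes a capture
        } else if (text[i] == ')') {
            if (open.empty()) {
                return false;       // (?<-open>\)) fails: nothing to pop
            }
            open.pop_back();        // pop the most recent capture
        }
        // any other character corresponds to the [^()] alternative
    }
    return open.empty();            // (?(open)(?!)) fails if captures remain
}
```

For example, is_balanced("(a(b)c)") is true, while "(()" and ")(" are both
rejected.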

> All captures made by a group are stored in sub_match::captures, which
> is a vector of sub_match_capture objects. A sub_match_capture behaves
> like a stripped-down sub_match. It can be put in an ostream and has a
> length and a helper function for returning a string.
>
> The changes are in the vault and can be found here:
> http://tinyurl.com/3aak7mp
>
> It can be unpacked against trunk from 2010-10-02 or the 1.44.0
> release. I've run the dynamic regression tests without errors and I
> have added some tests for the new functionality. The code is only
> tested on Visual Studio 2010 since I don't have access to any other
> compiler.
>
> Please give feedback on my changes since I would love to see them in
> an official release. Thanks in advance.

This sounds really great, and I have every intention of taking this
change once I grok it and look over the code. Can you open a feature
request ticket at http://svn.boost.org so I don't forget? I'm a
little busy at the moment.

-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com
