

Subject: Re: [boost] [xpressive] Support for multi-capture and balancing groups
From: Erik Rydgren (erik_at_[hidden])
Date: 2010-10-03 04:44:19


> On 10/2/2010 5:52 PM, Erik Rydgren wrote:
>> Hi!
>>
>> My company has been using PCRE for a long while, but there have been
>> gripes about it returning only the last matched value from a capture
>> group. Because of this I have been searching for a C++ regex engine
>> that can handle the same things the .NET implementation can, to
>> no avail. During my searches I've stumbled on several forum threads
>> where others have been searching for the same thing, but there doesn't
>> seem to be a regular expression library in C/C++ that handles both
>> named captures and multicapture.
>>
>> Found Boost.Xpressive and it had almost everything we need. It's
>> open source, fast, and has a flexible API and named captures. But alas,
>> just like all the other C and C++ implementations I have found, it
>> lacked multiple captures.
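For comparison, here is a minimal sketch of the "last matched value only" behaviour, using the standard library's std::regex (ECMAScript grammar) rather than PCRE or xpressive: a quantified group reports just its final iteration. The helper names last_capture and all_captures are illustrative only, and the iterator-based workaround reproduces multicapture only when the repeated sub-pattern matches back to back across the whole input.

```cpp
#include <regex>
#include <string>
#include <vector>

// With std::regex, a quantified group such as (..)* keeps only the text of
// its final iteration, so group 1 ends up holding the last pair.
std::string last_capture(const std::string& s) {
    static const std::regex re("(..)*");
    std::smatch m;
    if (!std::regex_match(s, m, re)) return {};
    return m[1].str();  // empty if the group never participated
}

// Illustrative workaround without multicapture: run the repeated
// sub-pattern on its own and collect every non-overlapping match.
std::vector<std::string> all_captures(const std::string& s) {
    static const std::regex pair_re("..");
    std::vector<std::string> out;
    for (std::sregex_iterator it(s.begin(), s.end(), pair_re), end; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

For "aabb", last_capture returns "bb" while all_captures returns {"aa", "bb"}, the same split that the br1 and cp1_* lines of the first multicapture test expect.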
>
> It sort of has them, just not in the form you happen to be looking for.
> In xpressive, you can call named regexes from other regexes. When you do
> that, you end up with nested match_results. If you quantify a named
> regex, you end up with a sequence of match_results, kind of like
> multicapture. Sadly, it's not very efficient to create a tree of
> match_results, and xpressive gives you no help in navigating this tree.
> It's a bit of an ugly hack.
>
Yeah, I realized that, but it wasn't practical for our needs.
I also wrote a static solution that used actions to make the captures, just
to try them out.

> FWIW, Boost.Regex has multicapture if you compile with a certain flag,
> IIRC.
>
Ok, I didn't know that. Will take a second look at Boost.Regex then.

>> So, I added it.
>
> Whoa, cool!
>
That is the response I was hoping for :)

>> On top of that I added support for balancing groups
>> (http://blog.stevenlevithan.com/archives/balancing-groups). But the
>> syntax for the pop capture and capture conditional is slightly
>> different than the .NET version, to better fit xpressive.
>>
>> Syntax for pop capture:
>> dynamic: (?P<-name>stuff)
>> static: (name -= stuff)
>>
>> Syntax for capture conditional:
>> dynamic: (?P(name)stuff)
>> static: (name &= stuff)
>>
>> There is no support for the (?<name-othername>stuff) construct.
>
> I'll need to read up on what those constructs do. Can you send some
> pointers?
>
I already did; this blog post explains them without fuss:
http://blog.stevenlevithan.com/archives/balancing-groups.
But the very short version is that (?P<-tag>exp) first matches exp and then
removes the last capture from an earlier group named tag. If the tag group
hasn't captured anything yet, it fails and backtracks.
(?P(tag)exp) is a shorthand if-then-else where the else part always
matches. Pseudo code: if (tag still holds a capture) { exp must match } else { true }.
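As a minimal sketch of these semantics (a plain C++ model, not the patch's implementation), each named group can be treated as a stack of captures: a normal capture pushes, (?P<-n>...) pops and fails when the stack is empty, and (?P(n)(?!)) fails whenever n still holds a capture. CaptureStack and balanced_ab are hypothetical names. The code hand-simulates the match-count pattern ^(?P<n>a)*(?P<-n>b)*(?P(n)(?!))$ from the tests below; for that particular pattern a single greedy pass gives the same answer as backtracking would, whereas a general engine needs full backtracking.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one named group's captures as a stack.
struct CaptureStack {
    std::vector<std::string> captures;

    // A successful (?P<n>...) pushes what it matched.
    void push(const std::string& text) { captures.push_back(text); }

    // (?P<-n>...) removes the most recent capture; with nothing to remove
    // it fails (in a real engine this would trigger backtracking).
    bool pop() {
        if (captures.empty()) return false;
        captures.pop_back();
        return true;
    }

    // The test made by (?P(n)...): does n still hold a capture?
    bool has_capture() const { return !captures.empty(); }
};

// Hand-simulates ^(?P<n>a)*(?P<-n>b)*$, plus the trailing (?P(n)(?!))
// check when `check` is true. For this pattern the greedy pass agrees
// with what full backtracking would decide.
bool balanced_ab(const std::string& s, bool check) {
    CaptureStack n;
    std::size_t i = 0;
    while (i < s.size() && s[i] == 'a') {            // (?P<n>a)*
        n.push(s.substr(i, 1));
        ++i;
    }
    while (i < s.size() && s[i] == 'b' && n.pop()) {  // (?P<-n>b)*
        ++i;
    }
    if (check && n.has_capture()) return false;       // (?P(n)(?!))
    return i == s.size();                             // $
}
```

balanced_ab("aabb", true) succeeds, balanced_ab("aabbb", false) fails on the third pop, and balanced_ab("aab", true) fails on the check. balanced_ab("aab", false) succeeds, which is why the conditional is needed to count exactly.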

To demonstrate, here are the regression definitions I've made:

; multi capture
[test175]
str=aabb
pat=(..)*
br0=aabb
cp0_0=aabb
br1=bb
cp1_0=aa
cp1_1=bb
[end]

; multi capture several groups
[test176]
str=abba
pat=(.){2}(.){2}
br0=abba
cp0_0=abba
br1=b
cp1_0=a
cp1_1=b
br2=a
cp2_0=b
cp2_1=a
[end]

; multi capture, pop capture with backreference, check capture
[test177]
str=startabccbarest
pat=^(.*?)(?P<n>.)+(?P<-n>(?P=n))+(?P(n)(?!))(.*)$
br0=startabccbarest
cp0_0=startabccbarest
br1=start
cp1_0=start
br2=
br3=rest
cp3_0=rest
[end]

; match count
[test178]
str=aabb
pat=^(?P<n>a)*(?P<-n>b)*(?P(n)(?!))$
br0=aabb
br1=
[end]

; match count, fail on pop
[test179]
str=aabbb
pat=^(?P<n>a)*(?P<-n>b)*$
[end]

; match count, fail on check
[test180]
str=aab
pat=^(?P<n>a)*(?P<-n>b)*(?P(n)(?!))$
[end]

>> All captures made by a group are stored in sub_match::captures, which
>> is a vector of sub_match_capture objects. A sub_match_capture behaves
>> like a stripped-down sub_match: it can be written to an ostream and has
>> a length and a helper function for returning a string.
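To picture that data structure, here is a small sketch of a capture as nothing more than an iterator pair with length(), str() and stream output, plus a group record holding a vector of them. The names capture_range and group_captures are hypothetical; the patch's actual sub_match_capture and sub_match::captures may be declared differently.

```cpp
#include <iterator>
#include <ostream>
#include <string>
#include <vector>

// Hypothetical stand-in for sub_match_capture: a [first, second) range
// into the searched text.
template <class BidiIt>
struct capture_range {
    using char_type = typename std::iterator_traits<BidiIt>::value_type;
    using string_type = std::basic_string<char_type>;

    BidiIt first;
    BidiIt second;

    typename std::iterator_traits<BidiIt>::difference_type length() const {
        return std::distance(first, second);
    }
    string_type str() const { return string_type(first, second); }
};

// Stream output, so a capture can be written like a sub_match.
template <class Ch, class Tr, class BidiIt>
std::basic_ostream<Ch, Tr>& operator<<(std::basic_ostream<Ch, Tr>& os,
                                       const capture_range<BidiIt>& c) {
    return os << c.str();
}

// Hypothetical group record: every capture the group made, oldest first.
template <class BidiIt>
struct group_captures {
    std::vector<capture_range<BidiIt>> captures;
};
```

With text "aabb", pushing {begin, begin + 2} and {begin + 2, end} into a group_captures<std::string::const_iterator> mirrors the cp1_0=aa and cp1_1=bb entries of the first multicapture test.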
>>
>> The changes are in the vault and can be found here:
>> http://tinyurl.com/3aak7mp
>>
>> It can be unpacked against trunk from 2010-10-02 or the 1.44.0
>> release. I've run the dynamic regression tests without errors and I
>> have added some tests for the new functionality. The code has only been
>> tested on Visual Studio 2010 since I don't have access to any other
>> compiler.
>>
>> Please give feedback on my changes since I would love to see them in
>> an official release. Thanks in advance.
>
> This sounds really great and I have every intention of taking this
> change once I grok it and look over the code. Can you open a feature
> request ticket at http://svn.boost.org so I don't forget, because I'm a
> little busy at the moment.
>
Will do.

> --
> Eric Niebler
> BoostPro Computing
> http://www.boostpro.com

