From: Eric Niebler (eric_at_[hidden])
Date: 2006-01-01 23:12:48


I've been porting some test cases from Boost.Regex to Boost.Xpressive
and tracking down the discrepancies (very few, thankfully). I've turned
up what appear to be a couple of bugs in Boost.Regex.

The regex "a(b)?c\\1d" successfully matches the string "acd". It
shouldn't. A back-reference to a sub-match that didn't participate in
the match should not match. Perl, Python and xpressive all agree on this
point.
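
For illustration, here is a minimal sketch of that behavior using Python's
standard re module (one of the engines named above, not Boost.Regex). The
variable names and the second test string "abcbd" are my own choices, not
taken from the Boost.Regex test cases.

```python
import re

# Group 1 is optional; \1 refers back to whatever group 1 captured.
pattern = re.compile(r"a(b)?c\1d")

# Group 1 does not participate in the match here, so in Python's re
# the back-reference \1 fails and the whole match fails.
unset_backref = pattern.fullmatch("acd")

# Group 1 captures "b" here, so \1 must match a second "b".
set_backref = pattern.fullmatch("abcbd")

print(unset_backref)          # None
print(set_backref.group(1))   # b
```

Not every engine works this way: ECMAScript-style back-references to an
unset group match the empty string, so the result depends on the dialect.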

As discussed previously, Boost.Regex treats [a-Z] as a legal regex, but
it isn't. 'a' is 97 and 'Z' is 90, which makes this character range
ill-formed, even when icase is specified.
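
The same check can be sketched with Python's re module, which rejects a
reversed range at compile time. The helper is_valid_pattern is a
hypothetical name introduced only for this example; the "[A-z]" case is
added for contrast and is not from the original test cases.

```python
import re

def is_valid_pattern(pattern, flags=0):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern, flags)
    except re.error:
        return False
    return True

# The range endpoints are compared by code point: 'a' > 'Z'.
print(ord("a"), ord("Z"))                        # 97 90

# A range whose start is greater than its end is rejected,
# with or without case-insensitive matching.
print(is_valid_pattern("[a-Z]"))                 # False
print(is_valid_pattern("[a-Z]", re.IGNORECASE))  # False

# The opposite order, 'A' (65) to 'z' (122), is a well-formed range.
print(is_valid_pattern("[A-z]"))                 # True
```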

When matching "a(b+|((c)*))+d" against "abcd", Boost.Regex says the
third sub-match should be "c", but Perl says it should not participate
in the match. I think Perl is right here. The logic is: the quantified
group 1 will match at least 3 times: first, it eats the b, next it eats
the c, and finally, it matches an empty string. On this last iteration,
the quantified group 3 will match zero times; hence, it has not
participated in the match. (FWIW, xpressive has a bug in this area too,
which I'm working on fixing.)
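
A small sketch for inspecting what a given engine reports for each
sub-match, again using Python's re module as a stand-in. Engines differ on
whether captures are reset at each iteration of a quantified group, so
Python's answer for group 3 need not agree with Perl's; no expected value
is claimed for it here. An unset group is reported as None.

```python
import re

# The pattern and input from the case above.
m = re.fullmatch(r"a(b+|((c)*))+d", "abcd")

# The overall match succeeds; what remains in each capture group
# after the final iteration of group 1 is the point in dispute.
for index in range(m.re.groups + 1):
    print(index, repr(m.group(index)))
```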

This last bug plagues several of the test cases in test_tricky_cases.cpp.

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
