
Boost Users :

From: Dean (dean_at_[hidden])
Date: 2003-04-24 18:37:34


Hi all,

I'm using the regex library from Boost 1.28 and am seeing some
unexpected behavior. Consider the following (somewhat artificial)
regex pattern:

a{1}b

As I expected, that pattern is found in "ab" but not "aab".

To my surprise, however, the same pattern *is* found
in "aaab", "aaaaab", and any other string consisting of an odd number
of "a"s followed by a "b". It is not found in strings consisting of
an even number of "a"s followed by a "b". This seems odd (no pun
intended).
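
To make the question concrete, here is a minimal sketch of the kind of check I have in mind. It is not my actual code and it does not use Boost 1.28: it assumes the std::regex interface (ECMAScript syntax) as a stand-in engine, and the helper name first_match_offset is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the offset of the first match of `pattern`
// inside `text`, or -1 if the pattern is not found anywhere.
// Assumption: std::regex is used as a stand-in; this is NOT Boost 1.28.
long first_match_offset(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::smatch m;
    // regex_search looks for the pattern anywhere in the text,
    // unlike regex_match, which requires the whole text to match.
    if (!std::regex_search(text, m, re)) {
        return -1;
    }
    return static_cast<long>(m.position(0));
}
```

With this stand-in engine, first_match_offset("a{1}b", "aab") reports a match at offset 1 and first_match_offset("a{1}b", "aaab") at offset 2, so the number of leading "a"s makes no difference there. Whether Boost 1.28 is meant to behave the same way is exactly what I'm unsure about.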

I see the same sort of behavior with quantifiers other than "{1}" and
where the quantified expression matches other single characters.
(Oddly enough, the behavior changes when using a quantified
expression that matches multiple characters. "(ab){1}c" is found
in "abc", "ababc", "abababc", and any other string containing "abc".)

FWIW, I first observed the behavior when trying to find social
security numbers with the following pattern:

\d{3}-\d{2}-\d{4}

As expected, that pattern was found in "123-12-1234" but not in
"1234-12-1234". However it *was* found in "1234567-12-1234".

Is this behavior by design or is it a bug?

If it's a bug, has it been fixed in a subsequent Boost release? Also,
what is the correct behavior? Should "a{1}b" be found in "aab"
(albeit starting at the second character)?

FWIW, it's easy enough for me to work around the current behavior with
a pattern like this:

(^|[^a])a{1}b
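
As a rough illustration of what that guarded pattern does, here is a small sketch. Again it assumes std::regex (ECMAScript syntax) as a stand-in rather than Boost 1.28, and found_with_guard is a made-up name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` contains "ab" that is preceded either
// by the start of the text or by a character other than 'a'.
// Assumption: std::regex is used as a stand-in; this is NOT Boost 1.28.
bool found_with_guard(const std::string& text) {
    static const std::regex re("(^|[^a])a{1}b");
    return std::regex_search(text, re);
}
```

Under that assumption, found_with_guard("ab") and found_with_guard("xab") are true, while found_with_guard("aab") and found_with_guard("aaab") are false. One caveat: when the first group matches a character, that character is part of the overall match, so the reported match starts one position before the "a".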

Thanks,

--Dean


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net