Boost logo

Boost Users :

From: Eugene M. Kim (gene_at_[hidden])
Date: 2008-02-12 15:52:07


On Tue, Feb 12, 2008 at 08:57:05AM -0000, John Maddock wrote:
> It's much worse than that: Perl style regular expressions are NP-Hard to
> match in general, and the worst case complexity is probably nearer to O(N!).
>
> ...
>
> That's not supported by Boost.Regex or as far as I know Boost.Xpressive:
> Perl style regex matchers need to be able to backtrack over the input in
> order to do their stuff. In order to get O(N) behaviour you would need a
> DFA based regex engine, but would then loose:
>
> * Marked subexpressions.
> * Backreferences
> * All the Perl extensions, non-greedy repeats, and (? constructs etc.
>
> Can you not cache blocks of input rather than reading one character at a
> time (there's an example for searching large files that does that)? Of
> course even then, it is in principal possible to devise a regex that
> requires backtracking over the *whole* of the text before it can determine
> whether there is a match or not :-(
>
> Sorry for not helping much!

Not at all, because what you said is actually good to know... Since all
I need to support is just the POSIX ERE and not the Perl-style RE, I
should probably just use the pcre_dfa_exec() of the PCRE library
instead; its documentation specifically states that the algorithm does
not backtrack.

(Funny, I have to use the PCRE library in order /not/ to use the PCRE
syntax. :-p)

Thank you,
Eugene


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net