Boost logo

Boost Users :

From: John Maddock (john_at_[hidden])
Date: 2008-02-12 03:57:05


>The current "incremental search" code needs to resume search where the a
>possible match starts, and for certain expressions the worst-case time
>(i.e. input bytes come in one at a time) complexity becomes O(n^2) where
>n is the number of characters that match the expression:

It's much worse than that: Perl style regular expressions are NP-Hard to
match in general, and the worst case complexity is probably nearer to O(N!).

>I am looking for a way to feed the input bytes only once, in order to
>reduce the time complexity to approximately O(n). Of course I will lose
>the ability to recover submatches, but the ability is unnecessary for
>this particular application that I am considering. I cannot break the
>regex into two parts ("omg" and "wtf") either because it is specified as
>an input to the program.

That's not supported by Boost.Regex or as far as I know Boost.Xpressive:
Perl style regex matchers need to be able to backtrack over the input in
order to do their stuff. In order to get O(N) behaviour you would need a
DFA based regex engine, but would then loose:

* Marked subexpressions.
* Backreferences
* All the Perl extensions, non-greedy repeats, and (? constructs etc.

Can you not cache blocks of input rather than reading one character at a
time (there's an example for searching large files that does that)? Of
course even then, it is in principal possible to devise a regex that
requires backtracking over the *whole* of the text before it can determine
whether there is a match or not :-(

Sorry for not helping much!

John.


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net