Boost Users :

Date view	Thread view	Subject view	Author view

From: John Maddock (john_at_[hidden])
Date: 2008-02-12 03:57:05

Next message: Filip Konvička: "[Boost-users] [filesystem] bug in wpath with utf8 locales"
Previous message: Sebastian Weber: "[Boost-users] [graph] edge_descriptor invalidation"
In reply to: Eugene M. Kim: "[Boost-users] [Regex] "Truly" incremental search?"
Next in thread: Eugene M. Kim: "Re: [Boost-users] [Regex] "Truly" incremental search?"
Reply: Eugene M. Kim: "Re: [Boost-users] [Regex] "Truly" incremental search?"

>The current "incremental search" code needs to resume search where the a
>possible match starts, and for certain expressions the worst-case time
>(i.e. input bytes come in one at a time) complexity becomes O(n^2) where
>n is the number of characters that match the expression:

It's much worse than that: Perl style regular expressions are NP-Hard to
match in general, and the worst case complexity is probably nearer to O(N!).

>I am looking for a way to feed the input bytes only once, in order to
>reduce the time complexity to approximately O(n). Of course I will lose
>the ability to recover submatches, but the ability is unnecessary for
>this particular application that I am considering. I cannot break the
>regex into two parts ("omg" and "wtf") either because it is specified as
>an input to the program.

That's not supported by Boost.Regex or as far as I know Boost.Xpressive:
Perl style regex matchers need to be able to backtrack over the input in
order to do their stuff. In order to get O(N) behaviour you would need a
DFA based regex engine, but would then loose:

* Marked subexpressions.
* Backreferences
* All the Perl extensions, non-greedy repeats, and (? constructs etc.

Can you not cache blocks of input rather than reading one character at a
time (there's an example for searching large files that does that)? Of
course even then, it is in principal possible to devise a regex that
requires backtracking over the *whole* of the text before it can determine
whether there is a match or not :-(

Sorry for not helping much!

John.

Next message: Filip Konvička: "[Boost-users] [filesystem] bug in wpath with utf8 locales"
Previous message: Sebastian Weber: "[Boost-users] [graph] edge_descriptor invalidation"
In reply to: Eugene M. Kim: "[Boost-users] [Regex] "Truly" incremental search?"
Next in thread: Eugene M. Kim: "Re: [Boost-users] [Regex] "Truly" incremental search?"
Reply: Eugene M. Kim: "Re: [Boost-users] [Regex] "Truly" incremental search?"

Date view	Thread view	Subject view	Author view

Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net