|
Boost Users : |
From: Eugene M. Kim (gene_at_[hidden])
Date: 2008-02-11 13:55:24
Hello,
The current "incremental search" code needs to resume search where the a
possible match starts, and for certain expressions the worst-case time
(i.e. input bytes come in one at a time) complexity becomes O(n^2) where
n is the number of characters that match the expression:
Regex: omg.*wtf
Pass 1 input: o
Pass 1 input to matching: o
Pass 1 verdict: Partial, resume at offset 0.
Pass 2 input: m
Pass 2 input to matching: om
Pass 2 verdict: Partial, resume at offset 0.
Pass 3 input: g
Pass 3 input to matching: omg
Pass 3 verdict: Partial, resume at offset 0.
...
Pass n-1 input: t
Pass n-1 input to matching: omg...wt (length n-1)
Pass n-1 verdict: Partial, resume at offset 0.
Pass n input: f
Pass n input to matching: omg...wtf (length n)
Pass n verdict: Match found and starts at offset 0.
Total number of byte matches = n*(n+1)/2, i.e. O(n^2)
I am looking for a way to feed the input bytes only once, in order to
reduce the time complexity to approximately O(n). Of course I will lose
the ability to recover submatches, but the ability is unnecessary for
this particular application that I am considering. I cannot break the
regex into two parts ("omg" and "wtf") either because it is specified as
an input to the program.
Eugene
Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net