|
Boost : |
From: Vladimir Pozdyayev (ardatur_at_[hidden])
Date: 2004-10-25 03:08:39
Hello.
Are there any plans to add finite automata-based subengine to the
regex library? If there are, I can submit the algorithm/implementation
of a modified nondeterministic FA which takes care of
* greedy and lazy quantifiers,
* eagerness of the alternation operator,
* non-longest matches (`best' in the Perl-ish sense with respect to
greediness/laziness/eagerness),
* substring capturing.
All these features being included, the time cost of a match is
O(N(E+C)L), where N is the number of states, E is the average
epsilon-closure size per state, C is the number of capturing
parentheses, and L is the input length. As far as I know that's the
lowest order of time complexity in this area (corrections are
welcome)---though, of course, deterministic FA would be even faster.
I suppose there's no point in adding another regex library, but being
able to guarantee linear time cost in an existing one would be nice.
-- Best regards, Vladimir Pozdyayev.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk