Boost logo

Boost :

From: Eric Niebler (eric_at_[hidden])
Date: 2004-10-25 18:13:20


Vladimir Pozdyayev wrote:
> Hello.
>
> Are there any plans to add finite automata-based subengine to the
> regex library? If there are, I can submit the algorithm/implementation
> of a modified nondeterministic FA which takes care of
> * greedy and lazy quantifiers,
> * eagerness of the alternation operator,
> * non-longest matches (`best' in the Perl-ish sense with respect to
> greediness/laziness/eagerness),
> * substring capturing.
> All these features being included, the time cost of a match is
> O(N(E+C)L), where N is the number of states, E is the average
> epsilon-closure size per state, C is the number of capturing
> parentheses, and L is the input length. As far as I know that's the
> lowest order of time complexity in this area (corrections are
> welcome)---though, of course, deterministic FA would be even faster.
>
> I suppose there's no point in adding another regex library, but being
> able to guarantee linear time cost in an existing one would be nice.
>

Interesting. You say it supports substring capturing. Does it also
support backreferences? By which I mean, can you refer to the captured
substrings from within the pattern itself, as in the perl regex /(.)\1/?

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk