|
Boost : |
From: Reid Sweatman (drunkardswalk_at_[hidden])
Date: 2004-10-25 17:11:58
> -----Original Message-----
> From: boost-bounces_at_[hidden]
> [mailto:boost-bounces_at_[hidden]] On Behalf Of Vladimir Pozdyayev
> Sent: Monday, October 25, 2004 2:09 AM
> To: boost_at_[hidden]
> Subject: [boost] [regex] any interest in finite
> automata-based regex engine?
>
>
> Hello.
>
> Are there any plans to add finite automata-based subengine to
> the regex library? If there are, I can submit the
> algorithm/implementation of a modified nondeterministic FA
> which takes care of
> * greedy and lazy quantifiers,
> * eagerness of the alternation operator,
> * non-longest matches (`best' in the Perl-ish sense with respect to
> greediness/laziness/eagerness),
> * substring capturing.
> All these features being included, the time cost of a match
> is O(N(E+C)L), where N is the number of states, E is the
> average epsilon-closure size per state, C is the number of
> capturing parentheses, and L is the input length. As far as I
> know that's the lowest order of time complexity in this area
> (corrections are welcome)---though, of course, deterministic
> FA would be even faster.
>
> I suppose there's no point in adding another regex library,
> but being able to guarantee linear time cost in an existing
> one would be nice.
Very nice. I've banged up against that one painfully recently (not with
Boost's regex engine, be it noted <g>), and frankly, that's quite a
guarantee given the features you mention.
Reid
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk