From: John Maddock (John_Maddock_at_[hidden])
Date: 2001-07-22 05:54:04
>So far so good. How do I find out that expression2 produced the longest
>match? It may not be $2. I'll have to parse expression1 myself. :-)
The only issue is if the expressions contain marked sub-expressions
(parentheses) of their own. In your case you don't care what they matched,
so either use (?:something) throughout or, if the expressions aren't
authored by you (say you're reading them from a text file), do a small
amount of transformation on each one (replacing ( with (?: ). If you don't
want to do that, then compile each expression singly first and record how
many sub-expressions it has; then you will know which index to check for
each expression:
expression1 = $1
expression2 = $(2 + subs_in_expression1)
expression3 = $(3 + subs_in_expression1 + subs_in_expression2)
etc
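
The index bookkeeping above can be sketched roughly as follows. This is only an illustration, not the library's own code: it uses std::regex from the C++ standard library as a stand-in (its mark_count() reports the number of marked sub-expressions), and the names Combined, combine, wrapper_index and which_matched are made up for the example. It shows the index arithmetic only and says nothing about longest-match semantics.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper type: the combined expression "(p1)|(p2)|..." plus,
// for each original pattern, the index of the group that wraps it.
struct Combined {
    std::regex re;
    std::vector<std::size_t> wrapper_index;  // wrapper_index[i] wraps patterns[i]
};

Combined combine(const std::vector<std::string>& patterns) {
    Combined out;
    std::string joined;
    std::size_t next = 1;  // group 0 is always the whole match
    for (const std::string& p : patterns) {
        // Compile each expression singly to count its marked sub-expressions.
        std::size_t subs = std::regex(p).mark_count();
        if (!joined.empty()) joined += '|';
        joined += '(' + p + ')';
        out.wrapper_index.push_back(next);
        next += 1 + subs;  // skip this wrapper and everything nested inside it
    }
    out.re = std::regex(joined);
    return out;
}

// Returns the zero-based position of the pattern whose wrapper group took
// part in the match, or -1 if nothing matched.
int which_matched(const Combined& c, const std::string& text) {
    std::smatch m;
    if (!std::regex_search(text, m, c.re)) return -1;
    for (std::size_t i = 0; i < c.wrapper_index.size(); ++i) {
        if (m[c.wrapper_index[i]].matched) return static_cast<int>(i);
    }
    return -1;
}
```

For the patterns "(a)(b)", "c(d)" and "e", the wrapper groups come out as $1, $4 and $6, which agrees with the formulas above (2 + 2 and 3 + 2 + 1).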
>Hmmm. Have you thought about exposing the NFA so I can 'feed' it a
>character at a time and get a state back? This solves both problems.
It's currently a backtracking NFA (since that's the only way a
backreference can get matched), and that's not easy to expose without a
complete rewrite. If I can ever find the time (!), I want to add some
alternative non-backtracking algorithms that would kick in automatically
when the expression allows them to be used. Even so, the current algorithm
has proved to be surprisingly robust in practice (unless you more or less
deliberately feed it "pathological" expressions).
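
As a small illustration of the backreference point, here is a sketch that again uses std::regex as a stand-in rather than the library discussed here; the function name same_run_both_sides is invented for the example. What \1 has to match depends on what the group captured earlier in the same attempt, which is why the matcher must be able to remember captures and retry with different ones.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` is a run of 'a', then 'b', then
// exactly the same run of 'a' again.
bool same_run_both_sides(const std::string& text) {
    // \1 must repeat whatever (a+) captured in this particular attempt.
    static const std::regex re("(a+)b\\1");
    return std::regex_match(text, re);
}
```

With this, "aba" and "aabaa" are accepted, while "aaba" and "abaa" are rejected because the two runs differ in length.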
- John Maddock
http://ourworld.compuserve.com/homepages/john_maddock/