|
Boost : |
From: Peter Dimov (pdimov_at_[hidden])
Date: 2001-07-22 08:57:43
From: "John Maddock" <John_Maddock_at_[hidden]>
> >So far so good. How do I find out that expression2 produced the longest
> >match? It may not be $2. I'll have to parse expression1 myself. :-)
>
> The only issue is if the expressions have marked parenthesis themselves -
Exactly.
> in your case you don't care what they matched, so either use (?:something)
> throughout, or if the expressions aren't authored by you (say you're
> reading them from a text file), then you'll have to do a small amount of
> transformation on each one (replacing ( with (?: ).
\(, [(], [^(]...
> If you don't want to
> do that, then compile each expression singly first and record how many
> sub-expressions it has -
Aha, unsigned regex::mark_count() const. Not easy to find. :-)
> Its currently a backtracking nfa (since that's the only way a
backreference
> can get matched), and that's not easy to expose without a complete
rewrite.
So, if I understand correctly,
* the library requires bidirectional iterators (is this documented? I wasn't
able to find it.)
* the NFA needs the whole [first, last) range since it backtracks, so it
can't operate in 'incremental' mode, one character at a time.
OK. So the only way to make an 'incremental' match currently is:
std::string buffer;
for(;;)
{
buffer += get_next_character();
regex_match(buffer, m, e, match_partial);
// analyze m
}
This would work, but it's very inefficient, since regex_match would have to
're-match' from scratch every time. Would it be possible to save the current
state to eliminate this inefficiency?
> Hopefully if I can ever find the time (!), I want to add some alternative
> non-backtracking algorithms that would kick in automatically when the
> expression allows them to be used, even so the current algorithm has
proved
> to be surprisingly robust in practice (unless you more or less
deliberately
> feed it "pathological" expressions).
This would be overkill. :-)
-- Peter Dimov Multi Media Ltd.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk