Boost logo

Boost :

From: Peter Dimov (pdimov_at_[hidden])
Date: 2001-07-22 08:57:43


From: "John Maddock" <John_Maddock_at_[hidden]>
> >So far so good. How do I find out that expression2 produced the longest
> >match? It may not be $2. I'll have to parse expression1 myself. :-)
>
> The only issue is if the expressions have marked parenthesis themselves -

Exactly.

> in your case you don't care what they matched, so either use (?:something)
> throughout, or if the expressions aren't authored by you (say you're
> reading them from a text file), then you'll have to do a small amount of
> transformation on each one (replacing ( with (?: ).

\(, [(], [^(]...

> If you don't want to
> do that, then compile each expression singly first and record how many
> sub-expressions it has -

Aha, unsigned regex::mark_count() const. Not easy to find. :-)

> Its currently a backtracking nfa (since that's the only way a
backreference
> can get matched), and that's not easy to expose without a complete
rewrite.

So, if I understand correctly,

* the library requires bidirectional iterators (is this documented? I wasn't
able to find it.)

* the NFA needs the whole [first, last) range since it backtracks, so it
can't operate in 'incremental' mode, one character at a time.

OK. So the only way to make an 'incremental' match currently is:

std::string buffer;

for(;;)
{
  buffer += get_next_character();
  regex_match(buffer, m, e, match_partial);
// analyze m
}

This would work, but it's very inefficient, since regex_match would have to
're-match' from scratch every time. Would it be possible to save the current
state to eliminate this inefficiency?

> Hopefully if I can ever find the time (!), I want to add some alternative
> non-backtracking algorithms that would kick in automatically when the
> expression allows them to be used, even so the current algorithm has
proved
> to be surprisingly robust in practice (unless you more or less
deliberately
> feed it "pathological" expressions).

This would be overkill. :-)

--
Peter Dimov
Multi Media Ltd.

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk