Boost logo

Boost :

From: John Maddock (jz.maddock_at_[hidden])
Date: 2021-07-23 16:50:15


On 23/07/2021 16:23, Soronel Haetir via Boost wrote:
> Phil Endecott,
>
> Match time linear to the length of the string works fine if the regex
> doesn't have alternations but breaks down completely if they are
> present.
>
> Imagine you have the regex
> "auto|bool|const|continue|...|([A-Z_][A-Za-z0-9_]*)" set up to match
> any C token (the ... in this case are missing keywords, integers, etc)
> but your input is "continuation". This regex will try for each
> alternate, get almost all the way with "continue" but then have to
> backtrack and try the other alternates. It's only when you get to the
> very last that the string will match. And it would be possible to
> contrive even worse examples (regex match over a dictionary for
> example)

We're getting well off topic here, but in your example, backtracking is
NOT required since the pattern can be converted into a DFA that matches
at in linear time.

However, the full Perl syntax definitely can not be converted to a DFA
(and it's not just the backreferences either).

So... complaining that a Perl-compatible engine isn't O(N) is like
complaining that your orange doesn't taste much like an apple ;)

With regard to performance testing, it's very very hard to make a
general comparison between libraries, indeed I would tend to suggest
that it's folly to even try - there are just too many variables in
play.  However, if your problem domain is more limited, then you can
(and indeed should as Phil has done) be able to do so, and Phil's
results are exactly what *he* needs, while others may come to other
conclusions for their particular case.

In short, the orange grove is happy that Phil has found a nice apple :)

Best, John.

-- 
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk