Boost logo

Boost :

From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2021-07-30 16:19:51


Hi Peter,

Peter Dimov wrote:
> Phil Endecott wrote:
>> What I'm trying to do is to sanitise the input to an internet-
>> exposed process, to reject malicious input'); drop table users;
>> As an example I'll look at input that is supposed to be base-64
>> encoded and no more than a couple of kilobytes long.
>>
>> Typical-case performance doesn't matter much as this runs once
>> per process invocation (and hence also caching the compiled
>> regex doesn't help), but I do want to be sure that it doesn't
>> have bad worst-case complexity in the face of pathological
>> input. So my first test is a quick check with a regular
>> expression that should might trigger worst-case behaviour in
>> a non-linear implementation:
>>
>> a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaa
>
> But that's a separate case. This is a pathological regexp, not a
> "pathological" input string. If your regexes don't come from an
> external source, the performance of a pathological regex is not
> a potential security issue.

There are certain pathological regexp matching algorithms
that have certain pathological regexp inputs that have
certain pathological input strings. How do I know if a regexp
that I've written in my code is a pathological one for the
particular regexp library that I'm using?

> Are these results available? Apologies if you've already posted them.

I was going to post some numbers but then I spotted that they
got slower as the test system warmed up! Fundamentally, I'm only
interested in the complexity order, not even factors of 10X, so
I'm not trying to produce very detailed numbers.

Previously I made some positive comments about the compile-time
regular expression library CTRE. I had looked at a couple of
presentations from CppCon 2018 & 2019 which presented the
compile-time conversion of the regexp string into an NFA and
then - according to the 2019 presentation - into a DFA, with
O(N) matching complexity. But having looked at the code, it
seems that unless I'm missing something the NFA-to-DFA
conversion is not present! This is yet another backtracking
NFA implementation. Boo.

My recommendation is therefore RE2.

Regards, Phil.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk