From: Gavin Lambert (boost_at_[hidden])
Date: 2021-07-22 06:43:57


On 22/07/2021 5:56 pm, Soronel Haetir wrote:
> I would have thought that '-' would only get confused as a range
> specifier when it follows an opening atom. Here it follows a closing
> atom (the '9' in '0-9').
>
> I did not think, for example, that "a-g-z" could possibly be equivalent
> to "a-z"; rather, that it should only be able to match a, b, c, d, e, f, g,
> '-' and 'z'.

That's not unreasonable, but it's not how the specification is worded.
So you might find that it works on a particular implementation, but it's
risky.

The text of most regex specifications says that the only valid positions
for a minus character that is intended to represent itself are
immediately following the [ or immediately preceding the ]. Of those,
the former is a bit more traditional and hence safer. (Although if you
want to include ] as well, then ] must be first and so - must be last.)
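
As a rough illustration of the two safe positions, here is a minimal
sketch. It assumes std::regex with its default ECMAScript grammar rather
than Boost.Regex, purely to keep it self-contained, and the helper names
are made up for this example. Other engines and grammars may behave
differently.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from this thread): true if `text` is non-empty
// and consists only of characters matched by the bracket expression
// `char_class`.
bool all_in_class(const std::string& char_class, const std::string& text) {
    const std::regex re(char_class + "+");  // default ECMAScript grammar
    return std::regex_match(text, re);
}

// Literal '-' placed immediately after '['.
bool digits_or_minus_leading(const std::string& text) {
    return all_in_class("[-0-9]", text);
}

// Literal '-' placed immediately before ']'.
bool digits_or_minus_trailing(const std::string& text) {
    return all_in_class("[0-9-]", text);
}
```

Both helpers should accept a string such as "12-34" and reject "12a34",
since in either position the '-' cannot be read as a range operator.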

But there are lots of implementation-defined holes in regexes, so YMMV.
For example, some will accept it anywhere if you escape it with a
backslash. Others don't support backslash escapes inside character sets
at all.

https://pubs.opengroup.org/onlinepubs/7908799/xbd/re.html specifically
calls out a construct such as "a-g-z" as undefined behaviour.

