From: Zach Laine (whatwasthataddress_at_[hidden])
Date: 2024-02-27 23:48:35


On Tue, Feb 27, 2024 at 5:44 PM Zach Laine <whatwasthataddress_at_[hidden]> wrote:
>
> On Tue, Feb 27, 2024 at 4:24 PM Christian Mazakas via Boost
> <boost_at_[hidden]> wrote:
> >
> > > * My use case involves parsing identifiers that can only contain ASCII
> > > lowercase, uppercase, digits and the underscore.
> >
> > Spirit used to have helpers like this but Parser doesn't seem to have them.
> > I noticed this too but it's actually pretty easy to fill this in yourself.
> >
> > Here's a working example: https://godbolt.org/z/6P6dTbGYY
> >
> > auto const digit = p::char_('0', '9');
> > auto const lower = p::char_('a', 'z');
> > auto const upper = p::char_('A', 'Z');
> > auto const ident = digit | lower | upper | '_';
>
> Parser does have these (digit, lower, upper), but those match more
> than what is desired here. What is desired here is alnum |
> char_('_'), I think. That is, only the ASCII a-z, A-Z, 0-9, and _.
> You can spell that out yourself as above, as you've done. You could
> also just use digit | lower | upper | char_('_'). I expect it will be
> about as fast (but certainly measure if it's a perf-critical
> situation).

I should have mentioned -- I recently removed the ascii::* parsers,
which used the is*() functions (isalnum() and friends) from the C
standard library. They included ascii::alnum. I removed them because
those is*() functions are considered just plain wrong by me and lots
of other people from SG-16 (the committee's Unicode study group). They
are also technically dangerous: passing a negative char value (other
than EOF) is UB, though most standard libraries I know of patch around
that because it is so easy to fall afoul of. I don't know if you're
using one of the big three std libs though, so it seems sketchy to use
those, just for safety reasons. They also have the wrong semantics in
a Unicode context.
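
To make the UB point concrete, here is a rough sketch in plain C++17 with no Parser involved. The names is_ascii_ident_char, is_ascii_identifier, and isalnum_checked are made up for illustration and are not part of any library. It assumes an ASCII-compatible execution character set, where a-z, A-Z, and 0-9 are contiguous, and it treats "identifier" as in the original use case: any non-empty run of those characters plus underscore.

```cpp
#include <cctype>
#include <string_view>

// Hypothetical helper: true only for ASCII a-z, A-Z, 0-9, and '_'.
// Plain comparisons are well-defined for every char value, signed or
// not, and do not depend on the current C locale.
constexpr bool is_ascii_ident_char(char c) noexcept
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_';
}

// Hypothetical helper: true if s is non-empty and every character
// passes is_ascii_ident_char().
constexpr bool is_ascii_identifier(std::string_view s) noexcept
{
    if (s.empty())
        return false;
    for (char c : s) {
        if (!is_ascii_ident_char(c))
            return false;
    }
    return true;
}

// If you do call the <cctype> functions, cast to unsigned char first.
// Passing a negative char directly is undefined behavior. The result
// still depends on the current C locale, so this is not ASCII-only
// in general.
inline bool isalnum_checked(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}
```

The first two functions are constexpr, so they can be checked at compile time with static_assert. Whether something like this beats char_ ranges inside a parser is a separate question, so measure if it matters.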

Zach


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk