Boost Users :

From: John Maddock (john_maddock_at_[hidden])
Date: 2003-02-20 06:44:41


> static const boost::regex find_imgs_with_alt("
> <\\s*img Matches <, 0 or many whitespace, IMG
> \\s+src\\s* Matches 1 or more whitespace, SRC, 0 or many whitespace
> =\\s* Matches = followed by 0 or more whitespace
> \"\\s* Matches " followed by 0 or more whitespace
> [^\"]* Matches any number of chars not "
> \\s+ Matches 1 or more whitespace
> [^alt]* I would like match anything except the word ALT, but the
> regexp stuff interprets this as anything but 'a', 'l',
> or 't'
> alt\\s*= Matches ALT, 0 or whitespace, =
> \"(^\")\" Matches ", anything except a " as a group that I can
> reference, then another "
> [^>]*> Matches any number of chars not >, followed by a >
> ",
> boost::regbase::normal | boost::regbase::icase);

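You could use a negative lookahead assertion:

"(?:(?!\<alt\>).)*"

This matches a sequence of characters that contains no occurrence of the word
"alt" (\< and \> are word boundaries). The lookahead consumes nothing by
itself, so it has to be paired with a "." that advances one character at a
time, which is why this is rather slow, I admit...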
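
A minimal sketch of the lookahead idea, under some assumptions: it uses std::regex (ECMAScript grammar) rather than Boost.Regex so that it compiles on its own, and because that grammar has no \< and \> escapes, the lookahead rejects "whitespace, alt, optional whitespace, =" instead, which is how the attribute appears in the pattern from the question. The function name prefix_before_alt is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the leading part of `attrs` that comes before
// the first whitespace-separated "alt =" attribute (case-insensitive).
// Assumption: std::regex stands in for Boost.Regex here.
std::string prefix_before_alt(const std::string& attrs) {
    // (?! ... ) is a negative lookahead: it consumes nothing, so each step
    // pairs it with [\s\S], which then consumes exactly one character.
    static const std::regex not_alt(
        R"((?:(?!\salt\s*=)[\s\S])*)",
        std::regex::ECMAScript | std::regex::icase);

    std::smatch m;
    // match_continuous anchors the match at the start of the input.
    if (std::regex_search(attrs, m, not_alt,
                          std::regex_constants::match_continuous)) {
        return m.str(0);
    }
    return std::string();
}
```

The lookahead is re-evaluated at every character position, which is where the cost mentioned above comes from.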

> So what I want to do is make another regular expression which
> matches "alt", and in the part that says
>
> [^alt]*
>
> do instead something like
>
> [^@alt]*
>
> where '@' would indicate that 'alt' was the name of another
> regular expression, such as
>
> static const boost::regex alt("alt",
> boost::regbase::normal | boost::regbase::icase);
>
> I can see how to do what I want to do without this; I would
> get the whole IMG tag and do a separate regexp_search on the
> match. But it seems to make it so much easier if it were
> possible, especially leaving me with fewer lines of regular
> expression code to have bugs in.
>
> If this is possible I'd like to know. Thanks in advance, and
> I'll post the regular expressions I end up using here if
> anyone might find them of use.

You can't do that right now - the main problem is how the library would find
an expression called "alt". Interpreted languages with reflective abilities
(Perl, for example) can do this, but compiled languages can't.
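
The two-step workaround mentioned in the question (match the whole IMG tag, then search inside the match) could look roughly like the sketch below. It assumes std::regex in place of Boost.Regex, and the function name img_alt_texts and both patterns are illustrative, not taken from the original code.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the alt text of every <img> tag that has one.
// Assumption: std::regex stands in for Boost.Regex here.
std::vector<std::string> img_alt_texts(const std::string& html) {
    // Step 1: a whole <img ...> tag, case-insensitive.
    static const std::regex img_tag(
        R"(<\s*img\b[^>]*>)",
        std::regex::ECMAScript | std::regex::icase);
    // Step 2: an alt="..." attribute; group 1 captures the quoted text.
    static const std::regex alt_attr(
        R"re(\salt\s*=\s*"([^"]*)")re",
        std::regex::ECMAScript | std::regex::icase);

    std::vector<std::string> result;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(html.begin(), html.end(), img_tag);
         it != end; ++it) {
        const std::string tag = it->str();
        std::smatch m;
        // Search only inside the tag found in step 1.
        if (std::regex_search(tag, m, alt_attr)) {
            result.push_back(m.str(1));
        }
    }
    return result;
}
```

Tags without an alt attribute are skipped, so no "anything except alt" sub-pattern is needed.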

At present I'm in the middle of rewriting the regex matching code (for those
who follow these things, it's about 90% done and up to 10x faster than the
current version). Once I've got that out the door there are a couple of
extensions that I will be able to add:

1) recursive regexes (a regex that can jump to an arbitrary part in its own
state machine).
2) registered/named regexes: you would call boost::regex::register to
register a named regular expression, which can then be called from as many
other regexes as you want (basically it lets one state machine call
another). There are limitations to be figured out, but I'm actually pretty
excited about this one - and it happens to solve your problem as well, or
at least almost. I admit I hadn't thought of referring to negated regexes as
you want to do; that's actually quite tricky :-(

John Maddock
http://ourworld.compuserve.com/homepages/john_maddock/index.htm

