Boost Users :

From: Edward Diener (eddielee_at_[hidden])
Date: 2003-02-20 18:25:48


"John Maddock" <john_maddock_at_[hidden]> wrote in message
news:04b801c2d8db$52ab7140$ce7687d9_at_1016031671...
> > static const boost::regex find_imgs_with_alt("
> > <\\s*img       Matches <, 0 or more whitespace, IMG
> > \\s+src\\s*    Matches 1 or more whitespace, SRC, 0 or more whitespace
> > =\\s*          Matches = followed by 0 or more whitespace
> > \"\\s*         Matches " followed by 0 or more whitespace
> > [^\"]*         Matches any number of chars not "
> > \\s+           Matches 1 or more whitespace
> > [^alt]*        I would like to match anything except the word ALT, but the
> >                regexp stuff interprets this as anything but 'a', 'l',
> >                or 't'
> > alt\\s*=       Matches ALT, 0 or more whitespace, =
> > \"([^\"]*)\"   Matches ", anything except a " as a group that I can
> >                reference, then another "
> > [^>]*>         Matches any number of chars not >, followed by a >
> > ",
> > boost::regbase::normal | boost::regbase::icase);
>
> You could use forward lookahead assertions:
>
> "(?:(?!\<alt\>).)*"
>
> matches a sequence of chars that does not contain the word "alt",
> although this is rather slow, I admit...
>
>
> > So what I want to do is make another regular expression which
> > matches "alt", and in the part that says
> >
> > [^alt]*
> >
> > do instead something like
> >
> > [^@alt]*
> >
> > where '@' would indicate that 'alt' was the name of another
> > regular expression, such as
> >
> > static const boost::regex alt("alt",
> > boost::regbase::normal | boost::regbase::icase);
> >
> > I can see how to do what I want to do without this: I would
> > get the whole IMG tag and do a separate regex_search on the
> > match. But it would be so much easier if it were possible,
> > especially since it would leave me with fewer lines of regular
> > expression code to have bugs in.
> >
> > If this is possible I'd like to know. Thanks in advance, and
> > I'll post the regular expressions I end up using here if
> > anyone might find them of use.
>
> You can't do that right now. The main problem is: how would the library
> find an expression called "alt"? Interpreted languages with reflective
> abilities can do this (Perl, for example), but compiled languages can't.
>
> At present I'm in the middle of rewriting the regex matching code (for
> those that follow these things, it's about 90% done and up to 10x faster
> than the current version). Once I've got that out the door, there are a
> couple of extensions that I will be able to add:
>
> 1) Recursive regexes: a regex that can jump to an arbitrary part of its
> own state machine.
> 2) Registered/named regexes: you would call boost::regex::register to
> register a named regular expression, which can then be called from as many
> other regexes as you want (basically it lets one state machine call
> another). There are limitations to be figured out, but I'm actually pretty
> excited about this one, and it happens to solve your problem as well. Or
> at least almost: I admit I hadn't thought of referring to negated regexes
> as you want to do, and that's actually quite tricky :-(

How are you saving 2)? In memory or permanently in a file? If permanently
in a file, how does the end-user reuse named regexes in situations other
than the one in which he named a regular expression? Inquiring minds want
to know <g>.

Named regexes are something I have intermittently thought about for my
Regular Expression Component Library, built using Boost Regex++. The
practical difficulty is deciding how to save named regexes so that they can
be used again in other invocations of the Boost Regex++ library. However one
saves them, it seems the end-user must transport such permanent storage
around with the Regex++ implementation, or else the named regexes will be
lost.


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net