From: Eric Niebler (eric_at_[hidden])
Date: 2007-06-01 13:36:50


It's a poorly kept secret that I've been adding new features to
Xpressive on CVS HEAD for a while now. So here are the features already
implemented (documentation forthcoming) for Xpressive 2.0:

<< Semantic Actions >>

Specify code to execute when parts of a regex match, a la Spirit's
semantic actions. E.g., if you want to parse a string of name/value pairs
into a std::map, you might write:

     std::map<std::string, int> result;
     std::string str("aaa=>1 bbb=>23 ccc=>456");

     // Like "(\\w+)=>(\\d+)":
     sregex pair = ( (s1= +_w) >> "=>" >> (s2= +_d) )
                   [ ref(result)[s1] = as<int>(s2) ];
     sregex rx = pair >> *(+_s >> pair);

     if(regex_match(str, rx))
     {
         assert(result["aaa"] == 1);
         assert(result["bbb"] == 23);
         assert(result["ccc"] == 456);
     }

The actions are placed on a queue and executed in order only when the
regex match succeeds.

<< Custom Assertions >>

Use the check() function to create a boolean predicate that can
participate in the match. Here's a regex that recognizes two integers
only if the first is less than the second:

   sregex rx = ( (s1= +_d) >> ' ' >> (s2= +_d) )
               [ check( as<int>(s1) < as<int>(s2) ) ];

Unlike actions, predicates execute immediately. You can also define the
predicate out-of-line as a function object.

<< Dynamic Regex Grammars with Named Regexes >>

Using regex_compiler, you can map a name to a regex object, and then
refer to that regex from another by name. In this way, you can build
grammars from regexes at runtime.

     sregex_compiler comp;
     sregex rx = comp.compile("^bar(?$RE)baz$");
     comp.compile("(?$RE=)\\d+ \\d+");

There's an alternate syntax for associating a name with a regex that you
can use to nest a static regex in a dynamic one. E.g., the last line
above could be written as:

     comp["RE"] = +_d >> ' ' >> +_d;

With these changes, you can now nest static and dynamic regexes within
each other freely, giving you lots of flexibility to build grammars and
modify them on the fly.

<< Named Captures >>

For dynamic regular expressions, you can create a named capture with
(?P<name> ...). You can refer back to the named capture with (?P=name).
In substitution strings (for use with regex_replace()), you can refer
back to a named capture with \g<name> (spelled "\\g<name>" in a C++
string literal) when using the format_perl or format_all flags.

And more .... I'd like to give props to Dave Jenkins who has been doing
m4d things with this stuff already, and who has given me a laundry list
of other features he'd like.

The new features are only available in CVS HEAD, for now. Feedback is
welcome.

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
