Boost-Commit :
From: eric_at_[hidden]
Date: 2007-10-09 18:25:19
Author: eric_niebler
Date: 2007-10-09 18:25:18 EDT (Tue, 09 Oct 2007)
New Revision: 39867
URL: http://svn.boost.org/trac/boost/changeset/39867
Log:
more user docs for semantic actions
Text files modified:
trunk/libs/xpressive/doc/actions.qbk | 200 +++++++++++++++++++++++++++++++++++++++
1 files changed, 198 insertions(+), 2 deletions(-)
Modified: trunk/libs/xpressive/doc/actions.qbk
==============================================================================
--- trunk/libs/xpressive/doc/actions.qbk (original)
+++ trunk/libs/xpressive/doc/actions.qbk 2007-10-09 18:25:18 EDT (Tue, 09 Oct 2007)
@@ -37,7 +37,7 @@
sregex pair = ( (s1= +_w) >> "=>" >> (s2= +_d) )
[ ref(result)[s1] = as<int>(s2) ];
- // Match one or more word/iteger pairs, separated
+ // Match one or more word/integer pairs, separated
// by whitespace.
sregex rx = pair >> *(+_s >> pair);
@@ -85,9 +85,205 @@
say `ref(result)[s1]`, even though `result` doesn't have an `operator[]` that
would accept `s1`.]
+In addition to the sub-match placeholders `s1`, `s2`, etc., you can also use
+the placeholder `_` within an action to refer back to the string matched by
+the sub-expression to which the action is attached. For instance, you can use
+the following regex to match a bunch of digits, interpret them as an integer
+and assign the result to a local variable:
+
+ int i = 0;
+ // Here, _ refers back to all the
+ // characters matched by (+_d)
+ sregex rex = (+_d)[ ref(i) = as<int>(_) ];
+
[h3 Lazy Action Execution]
-TODO
+What does it mean, exactly, to attach an action to part of a regular expression
+and perform a match? When does the action execute? If the action is part of a
+repeated sub-expression, does the action execute once or many times? And if the
+sub-expression initially matches, but ultimately fails because the rest of the
+regular expression fails to match, is the action executed at all?
+
+The answers are that actions are executed /lazily/. When a sub-expression
+matches a string, its action is placed on a queue, along with the current
+values of any sub-matches to which the action refers. If the match algorithm
+must backtrack, actions are popped off the queue as necessary. Only after the
+entire regex has matched successfully are the actions actually executed. They
+are executed all at once, in the order in which they were added to the queue,
+as the last step before _regex_match_ returns.
+
+For example, consider the following regex that increments a counter whenever
+it finds a digit.
+
+ int i = 0;
+ std::string str("1!2!3?");
+ // count the exciting digits, but not the
+ // questionable ones.
+ sregex rex = +( _d [ ++ref(i) ] >> '!' );
+ regex_search(str, rex);
+ assert( i == 2 );
+
+The action `++ref(i)` is queued three times: once for each found digit. But
+it is only /executed/ twice: once for each digit that precedes a `'!'`
+character. When the `'?'` character is encountered, the match algorithm
+backtracks, removing the final action from the queue.
+
+[h3 Referring to Local Variables]
+
+As we've seen in the examples above, we can refer to local variables within
+an action using `xpressive::ref()`. Any such variables are held by reference
+by the regular expression, and care should be taken to avoid letting those
+references dangle. For instance, in the following code, the reference to `i`
+is left to dangle when `bad_voodoo()` returns:
+
+ sregex bad_voodoo()
+ {
+ int i = 0;
+ sregex rex = +( _d [ ++ref(i) ] >> '!' );
+ // ERROR! rex refers by reference to a local
+ // variable, which will dangle after bad_voodoo()
+ // returns.
+ return rex;
+ }
+
+When writing semantic actions, it is your responsibility to make sure that
+none of the references dangle. One way to do that is to make the variables
+shared pointers that the regex holds by value.
+
+ sregex good_voodoo(boost::shared_ptr<int> pi)
+ {
+ // Use val() to hold the shared_ptr by value:
+ sregex rex = +( _d [ ++*val(pi) ] >> '!' );
+ // OK, rex holds a reference count to the integer.
+ return rex;
+ }
+
+In the above code, we use `xpressive::val()` to hold the shared pointer by
+value. That is not normally necessary, because local variables appearing in
+actions are held by value by default, but in this case it is. Had we written
+the action as `++*pi`, it would have executed immediately, when the regex was
+built. That's because `++*pi` is not an expression template, but `++*val(pi)` is.
+
+It can be tedious to wrap all your variables in `ref()` and `val()` in your
+semantic actions. Xpressive provides the `reference<>` and `value<>` templates
+to make things easier. The following table shows the equivalencies:
+
+[table reference<> and value<>
+[[This ...][... is equivalent to this ...]]
+[[``int i = 0;
+
+sregex rex = +( _d [ ++ref(i) ] >> '!' );``][``int i = 0;
+reference<int> ri(i);
+sregex rex = +( _d [ ++ri ] >> '!' );``]]
+[[``boost::shared_ptr<int> pi(new int(0));
+
+sregex rex = +( _d [ ++*val(pi) ] >> '!' );``][``boost::shared_ptr<int> pi(new int(0));
+value<boost::shared_ptr<int> > vpi(pi);
+sregex rex = +( _d [ ++*vpi ] >> '!' );``]]
+]
+
+As you can see, when using `reference<>`, you need to first declare a local
+variable and then declare a `reference<>` to it. These two steps can be combined
+into one using `local<>`.
+
+[table local<> vs. reference<>
+[[This ...][... is equivalent to this ...]]
+[[``local<int> i(0);
+
+sregex rex = +( _d [ ++i ] >> '!' );``][``int i = 0;
+reference<int> ri(i);
+sregex rex = +( _d [ ++ri ] >> '!' );``]]
+]
+
+We can use `local<>` to rewrite the above example as follows:
+
+ local<int> i(0);
+ std::string str("1!2!3?");
+ // count the exciting digits, but not the
+ // questionable ones.
+ sregex rex = +( _d [ ++i ] >> '!' );
+ regex_search(str, rex);
+ assert( i.get() == 2 );
+
+Notice that we use `local<>::get()` to access the value of the local
+variable. Also, beware that `local<>` can be used to create a dangling
+reference, just as `reference<>` can.
+
+[h3 Lazy Functions]
+
+So far, we've seen how to write semantic actions consisting of variables and
+operators. But what if you want to be able to call a function from a semantic
+action? Xpressive provides a mechanism to do this.
+
+The first step is to define a function object type. Here, for instance, is a
+function object type that calls `push()` on its argument:
+
+ struct push_impl
+ {
+ // Result type, needed for tr1::result_of
+ typedef void result_type;
+
+ template<typename Sequence, typename Value>
+ void operator()(Sequence &seq, Value const &val) const
+ {
+ seq.push(val);
+ }
+ };
+
+The next step is to use xpressive's `function<>` template to define a function
+object named `push`:
+
+ // Global "push" function object.
+ function<push_impl>::type const push = {{}};
+
+The initialization looks a bit odd, but this is because `push` is being
+statically initialized. That means it doesn't need to be constructed
+at runtime. We can use `push` in semantic actions as follows:
+
+ std::stack<int> ints;
+ // Match digits, cast them to an int
+ // and push it on the stack.
+ sregex rex = (+_d)[push(ref(ints), as<int>(_))];
+
+You'll notice that doing it this way causes member function invocations
+to look like ordinary function invocations. You can choose to write your
+semantic action in a different way that makes it look a bit more like
+a member function call:
+
+ sregex rex = (+_d)[ref(ints)->*push(as<int>(_))];
+
+Xpressive recognizes the use of the `->*` operator and treats this expression
+exactly the same as the one above.
+
+When your function object must return a type that depends on its
+arguments, you can use a `result<>` member template instead of the
+`result_type` typedef. Here, for example, is a `first` function object
+that returns the `first` member of a `std::pair<>`:
+
+ // Function object that returns the
+ // first element of a pair.
+ struct first_impl
+ {
+ template<typename Sig> struct result {};
+
+ template<typename This, typename Pair>
+ struct result<This(Pair)>
+ {
+ typedef typename remove_reference<Pair>
+ ::type::first_type type;
+ };
+
+ template<typename Pair>
+ typename Pair::first_type
+ operator()(Pair const &p) const
+ {
+ return p.first;
+ }
+ };
+
+ // OK, use as first(s1) to get the begin iterator
+ // of the sub-match referred to by s1.
+ function<first_impl>::type const first = {{}};
[endsect]
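The "Lazy Action Execution" section in the diff above says that actions are queued during matching, popped off the queue on backtracking, and run in order only after the whole regex has matched. The standard-C++ sketch below models that behaviour for the `+( _d [ ++ref(i) ] >> '!' )` example. It is a hand-written toy matcher, not xpressive's implementation. The names `action_queue`, `match_exciting_digits` and `search_exciting_digits` are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model, not xpressive's implementation: actions are queued
// while matching, dropped on backtracking, and run only after a full match.
using action_queue = std::vector<std::function<void()>>;

// Greedily matches the equivalent of +( _d [ ++counter ] >> '!' ) at pos.
// Returns the end of the match, or std::string::npos if nothing matched.
std::size_t match_exciting_digits(const std::string& s, std::size_t pos,
                                  int& counter, action_queue& queue) {
    std::size_t end = std::string::npos;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
        std::size_t mark = queue.size();             // queue depth before this repetition
        queue.push_back([&counter] { ++counter; });  // queue the action; do not run it
        if (pos + 1 < s.size() && s[pos + 1] == '!') {
            pos += 2;                                // repetition matched "<digit>!"
            end = pos;
        } else {
            // This repetition failed: backtrack and pop its queued action.
            queue.erase(queue.begin() + static_cast<std::ptrdiff_t>(mark), queue.end());
            break;
        }
    }
    return end;
}

// Like regex_search: try each start position; on the first successful match,
// execute the queued actions in the order they were added.
bool search_exciting_digits(const std::string& s, int& counter) {
    for (std::size_t start = 0; start < s.size(); ++start) {
        action_queue queue;
        if (match_exciting_digits(s, start, counter, queue) != std::string::npos) {
            for (auto& action : queue) action();
            return true;
        }
    }
    return false;  // no match: any queued actions are discarded unexecuted
}
```

Suppose `int i = 0;` is passed to `search_exciting_digits("1!2!3?", i)`. Afterwards `i` is 2. Three increments are queued, but the one for `3` is popped when the `'?'` forces a backtrack, and the remaining two run only after the match succeeds.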
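The "Referring to Local Variables" section in the diff above contrasts two approaches. Holding a local by reference dangles once `bad_voodoo()` returns, while holding a `shared_ptr` by value is safe. The sketch below models the same ownership rules with plain C++ lambdas and does not use xpressive. `toy_regex` is a hypothetical stand-in for an `sregex` that owns its action, and a by-value capture of the `shared_ptr` plays the role of `val(pi)`.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for an sregex that owns one semantic action.
struct toy_regex {
    std::function<void()> action;
};

// Analogue of bad_voodoo(): the lambda captures the local i by reference,
// so the reference dangles once the function returns. Invoking the
// returned action would be undefined behaviour; it is never called here.
toy_regex bad_voodoo() {
    int i = 0;
    return toy_regex{[&i] { ++i; }};
}

// Analogue of good_voodoo(): the lambda captures the shared_ptr by value
// (the role val(pi) plays in xpressive), so the regex shares ownership of
// the counter and keeps it alive.
toy_regex good_voodoo(std::shared_ptr<int> pi) {
    return toy_regex{[pi] { ++*pi; }};
}
```

The lambda's copy of `pi` keeps the counter alive, so the action can still be called safely after the caller's own `shared_ptr` has gone out of scope.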
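The diff above explains that `++*pi` runs immediately but `++*val(pi)` does not, and attributes the difference to expression templates. The deliberately tiny sketch below shows the idea. Overloaded operators applied to a `val()` wrapper build a node object that records the operation instead of performing it, and the work happens only when the node is invoked. The types `val_t`, `deref_t` and `preinc_deref_t`, and this `val()`, are hypothetical. xpressive's real machinery is far more general.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical, minimal expression-template sketch (NOT xpressive's types).

// val(x): a terminal holding its operand by value.
template <typename T>
struct val_t { T value; };

// *val(x): records "dereference the held value"; computes nothing yet.
template <typename T>
struct deref_t { T value; };

// ++*val(x): records "increment the pointee"; the work happens in operator().
template <typename T>
struct preinc_deref_t {
    T value;
    void operator()() const { ++*value; }  // deferred: runs only when invoked
};

template <typename T>
val_t<T> val(T x) { return val_t<T>{std::move(x)}; }

// The overloaded operators build larger nodes instead of evaluating anything.
template <typename T>
deref_t<T> operator*(val_t<T> v) { return deref_t<T>{std::move(v.value)}; }

template <typename T>
preinc_deref_t<T> operator++(deref_t<T> d) { return preinc_deref_t<T>{std::move(d.value)}; }
```

Take `auto pi = std::make_shared<int>(0);` as an example. The plain statement `++*pi;` bumps the count at once. By contrast, `auto action = ++*val(pi);` leaves the count alone until `action()` is called.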
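The `first_impl` example in the diff above relies on the TR1 `result<>` protocol. For a function object without a `result_type` typedef, a caller such as `tr1::result_of` obtains the return type from `F::result<F(Args)>::type`. The sketch below repeats `first_impl` with `std::remove_reference` spelled out. It also adds a trait that performs that lookup by hand, so the computed type can be checked at compile time. The trait, `nested_result`, is hypothetical and is not a Boost or standard name.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// first_impl as in the docs above, with std::remove_reference spelled out.
struct first_impl {
    template <typename Sig> struct result {};

    // Specialization selected for a call with one argument of type Pair.
    template <typename This, typename Pair>
    struct result<This(Pair)> {
        typedef typename std::remove_reference<Pair>::type::first_type type;
    };

    template <typename Pair>
    typename Pair::first_type operator()(Pair const& p) const { return p.first; }
};

// Hypothetical helper (not a Boost or std name): performs the same lookup as a
// TR1-style result_of for a function object without a result_type typedef.
template <typename Sig> struct nested_result;

template <typename F, typename... Args>
struct nested_result<F(Args...)> {
    typedef typename F::template result<F(Args...)>::type type;
};

// Compile-time check: calling first_impl on a pair<std::string, int> yields std::string.
static_assert(std::is_same<nested_result<first_impl(std::pair<std::string, int> const&)>::type,
                           std::string>::value,
              "result<> computes the pair's first_type");
```

Because the argument type reaches `result<>` possibly reference-qualified, stripping the reference before naming `first_type` is what lets the same specialization serve both lvalue and rvalue calls.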
Boost-Commit list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk