Boost-Commit :
From: eric_at_[hidden]
Date: 2008-06-12 18:40:01
Author: eric_niebler
Date: 2008-06-12 18:40:00 EDT (Thu, 12 Jun 2008)
New Revision: 46362
URL: http://svn.boost.org/trac/boost/changeset/46362
Log:
document formatter objects and formatter expressions, nuke trailing spaces
Text files modified:
trunk/libs/xpressive/doc/acknowledgements.qbk | 6
trunk/libs/xpressive/doc/actions.qbk | 60 ++++++------
trunk/libs/xpressive/doc/dynamic_regexes.qbk | 2
trunk/libs/xpressive/doc/history.qbk | 2
trunk/libs/xpressive/doc/installation.qbk | 2
trunk/libs/xpressive/doc/introduction.qbk | 2
trunk/libs/xpressive/doc/matching.qbk | 8
trunk/libs/xpressive/doc/nyi.qbk | 2
trunk/libs/xpressive/doc/substitutions.qbk | 173 ++++++++++++++++++++++++++++++++++++++-
trunk/libs/xpressive/doc/symbols.qbk | 4
10 files changed, 211 insertions(+), 50 deletions(-)
Modified: trunk/libs/xpressive/doc/acknowledgements.qbk
==============================================================================
--- trunk/libs/xpressive/doc/acknowledgements.qbk (original)
+++ trunk/libs/xpressive/doc/acknowledgements.qbk 2008-06-12 18:40:00 EDT (Thu, 12 Jun 2008)
@@ -8,15 +8,15 @@
[section Acknowledgments]
I am indebted to [@http://boost.org/people/joel_de_guzman.htm Joel de Guzman]
-and [@http://boost.org/people/hartmut_kaiser.htm Hartmut Kaiser] for their
+and [@http://boost.org/people/hartmut_kaiser.htm Hartmut Kaiser] for their
expert advice during the early stages of xpressive's development. Much of
static xpressive's syntax owes a large debt to _spirit_, including the
-syntax for xpressive's semantic actions. I am thankful for
+syntax for xpressive's semantic actions. I am thankful for
[@http://boost.org/people/john_maddock.htm John Maddock]'s excellent work on
his proposal to add regular expressions to the standard library, and for
various ideas borrowed liberally from his regex implementation. I'd also like
to thank [@http://moderncppdesign.com/ Andrei Alexandrescu] for his input
-regarding the behavior of nested regex objects, and
+regarding the behavior of nested regex objects, and
[@http://boost.org/people/dave_abrahams.htm Dave Abrahams] for his suggestions
regarding the regex domain-specific embedded language. Noel Belcourt helped
port xpressive to the Metrowerks CodeWarrior compiler. Markus
Modified: trunk/libs/xpressive/doc/actions.qbk
==============================================================================
--- trunk/libs/xpressive/doc/actions.qbk (original)
+++ trunk/libs/xpressive/doc/actions.qbk 2008-06-12 18:40:00 EDT (Thu, 12 Jun 2008)
@@ -10,7 +10,7 @@
[h2 Overview]
Imagine you want to parse an input string and build a `std::map<>` from it. For
-something like that, matching a regular expression isn't enough. You want to
+something like that, matching a regular expression isn't enough. You want to
/do something/ when parts of your regular expression match. Xpressive lets
you attach semantic actions to parts of your static regular expressions. This
section shows you how.
@@ -31,12 +31,12 @@
{
std::map<std::string, int> result;
std::string str("aaa=>1 bbb=>23 ccc=>456");
-
+
// Match a word and an integer, separated by =>,
// and then stuff the result into a std::map<>
sregex pair = ( (s1= +_w) >> "=>" >> (s2= +_d) )
[ ref(result)[s1] = as<int>(s2) ];
-
+
// Match one or more word/integer pairs, separated
// by whitespace.
sregex rx = pair >> *(+_s >> pair);
@@ -47,7 +47,7 @@
std::cout << result["bbb"] << '\n';
std::cout << result["ccc"] << '\n';
}
-
+
return 0;
}
@@ -59,7 +59,7 @@
456
]
-The regular expression `pair` has two parts: the pattern and the action. The
+The regular expression `pair` has two parts: the pattern and the action. The
pattern says to match a word, capturing it in sub-match 1, and an integer,
capturing it in sub-match 2, separated by `"=>"`. The action is the part in
square brackets: `[ ref(result)[s1] = as<int>(s2) ]`. It says to take sub-match
@@ -73,15 +73,15 @@
between brackets is an expression template. It encodes the action and executes
it later. The expression `ref(result)` creates a lazy reference to the `result`
object. The larger expression `ref(result)[s1]` is a lazy map index operation.
-Later, when this action is getting executed, `s1` gets replaced with the
+Later, when this action is getting executed, `s1` gets replaced with the
first _sub_match_. Likewise, when `as<int>(s2)` gets executed, `s2` is replaced
-with the second _sub_match_. The `as<>` action converts its argument to the
+with the second _sub_match_. The `as<>` action converts its argument to the
requested type using Boost.Lexical_cast. The effect of the whole action is to
insert a new word/integer pair into the map.
[note There is an important difference between the function `boost::ref()` in
-`<boost/ref.hpp>` and `boost::xpressive::ref()` in
-`<boost/xpressive/regex_actions.hpp>`. The first returns a plain
+`<boost/ref.hpp>` and `boost::xpressive::ref()` in
+`<boost/xpressive/regex_actions.hpp>`. The first returns a plain
`reference_wrapper<>` which behaves in many respects like an ordinary
reference. By contrast, `boost::xpressive::ref()` returns a /lazy/ reference
that you can use in expressions that are executed lazily. That is why we can
@@ -137,7 +137,7 @@
operators. But what if you want to be able to call a function from a semantic
action? Xpressive provides a mechanism to do this.
-The first step is to define a function object type. Here, for instance, is a
+The first step is to define a function object type. Here, for instance, is a
function object type that calls `push()` on its argument:
struct push_impl
@@ -158,7 +158,7 @@
// Global "push" function object.
function<push_impl>::type const push = {{}};
-The initialization looks a bit odd, but this is because `push` is being
+The initialization looks a bit odd, but this is because `push` is being
statically initialized. That means it doesn't need to be constructed
at runtime. We can use `push` in semantic actions as follows:
@@ -182,7 +182,7 @@
`result_type` typedef. Here, for example, is a `first` function object
that returns the `first` member of a `std::pair<>` or _sub_match_:
- // Function object that returns the
+ // Function object that returns the
// first element of a pair.
struct first_impl
{
@@ -204,7 +204,7 @@
};
// OK, use as first(s1) to get the begin iterator
- // of the sub-match referred to by s1.
+ // of the sub-match referred to by s1.
function<first_impl>::type const first = {{}};
[h3 Referring to Local Variables]
@@ -238,7 +238,7 @@
}
In the above code, we use `xpressive::val()` to hold the shared pointer by
-value. That's not normally necessary because local variables appearing in
+value. That's not normally necessary because local variables appearing in
actions are held by value by default, but in this case, it is necessary. Had
we written the action as `++*pi`, it would have executed immediately. That's
because `++*pi` is not an expression template, but `++*val(pi)` is.
@@ -263,7 +263,7 @@
As you can see, when using `reference<>`, you need to first declare a local
variable and then declare a `reference<>` to it. These two steps can be combined
-into one using `local<>`.
+into one using `local<>`.
[table local<> vs. reference<>
[[This ...][... is equivalent to this ...]]
@@ -301,15 +301,15 @@
that in the semantic action instead of the map itself. Later, when we
call one of the regex algorithms, we can bind the reference to an actual
map object. The following code shows how.
-
+
// Define a placeholder for a map object:
placeholder<std::map<std::string, int> > _map;
-
+
// Match a word and an integer, separated by =>,
// and then stuff the result into a std::map<>
sregex pair = ( (s1= +_w) >> "=>" >> (s2= +_d) )
[ _map[s1] = as<int>(s2) ];
-
+
// Match one or more word/integer pairs, separated
// by whitespace.
sregex rx = pair >> *(+_s >> pair);
@@ -319,7 +319,7 @@
// Here is the actual map to fill in:
std::map<std::string, int> result;
-
+
// Bind the _map placeholder to the actual map
smatch what;
what.let( _map = result );
@@ -340,10 +340,10 @@
456
]
-We use `placeholder<>` here to define `_map`, which stands in for a
+We use `placeholder<>` here to define `_map`, which stands in for a
`std::map<>` variable. We can use the placeholder in the semantic action as if
it were a map. Then, we define a _match_results_ struct and bind an actual map
-to the placeholder with "`what.let( _map = result );`". The _regex_match_ call
+to the placeholder with "`what.let( _map = result );`". The _regex_match_ call
behaves as if the placeholder in the semantic action had been replaced with a
reference to `result`.
@@ -360,27 +360,27 @@
// Define a placeholder for a map object:
placeholder<std::map<std::string, int> > _map;
-
+
// Match a word and an integer, separated by =>,
// and then stuff the result into a std::map<>
sregex pair = ( (s1= +_w) >> "=>" >> (s2= +_d) )
[ _map[s1] = as<int>(s2) ];
-
+
// The string to parse
std::string str("aaa=>1 bbb=>23 ccc=>456");
-
+
// Here is the actual map to fill in:
std::map<std::string, int> result;
-
+
// Create a regex_iterator to find all the matches
sregex_iterator it(str.begin(), str.end(), pair, let(_map=result));
sregex_iterator end;
-
+
// step through all the matches, and fill in
// the result map
while(it != end)
++it;
-
+
std::cout << result["aaa"] << '\n';
std::cout << result["bbb"] << '\n';
std::cout << result["ccc"] << '\n';
@@ -397,7 +397,7 @@
You are probably already familiar with regular expression /assertions/. In
Perl, some examples are the [^^] and [^$] assertions, which you can use to
-match the beginning and end of a string, respectively. Xpressive lets you
+match the beginning and end of a string, respectively. Xpressive lets you
define your own assertions. A custom assertion is a condition which must be
true at a point in the match in order for the match to succeed. You can check
a custom assertion with xpressive's _check_ function.
@@ -438,7 +438,7 @@
sregex rx = (bow >> +_w >> eow)[ check(length(_)==3 || length(_)==6) ] ;
In the above, `length()` is a lazy function that calls the `length()` member
-function of its argument, and `_` is a placeholder that receives the
+function of its argument, and `_` is a placeholder that receives the
`sub_match`.
Once you get the hang of writing custom assertions inline, they can be
@@ -451,7 +451,7 @@
mark_tag month(1), day(2);
// find a valid date of the form month/day/year.
- sregex date =
+ sregex date =
(
// Month must be between 1 and 12 inclusive
(month= _d >> !_d) [ check(as<int>(_) >= 1
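For comparison with the first example in actions.qbk, the sketch below performs the same `word=>integer` parse using the standard `<regex>` header and an explicit loop in place of xpressive's semantic action `[ ref(result)[s1] = as<int>(s2) ]`. This is a swapped-in technique for illustration, not xpressive code, and the function name `parse_pairs` is hypothetical. Note that it searches for pairs anywhere in the input, so unlike the whole-string pattern `pair >> *(+_s >> pair)` it does not reject malformed text between pairs.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical helper (not part of xpressive): parses "word=>integer" pairs
// with std::regex, doing by hand what the semantic action does automatically.
std::map<std::string, int> parse_pairs(std::string const &str)
{
    std::map<std::string, int> result;

    // Sub-match 1 captures the word, sub-match 2 captures the integer.
    std::regex const pair_re(R"((\w+)=>(\d+))");

    for (std::sregex_iterator it(str.begin(), str.end(), pair_re), end; it != end; ++it)
    {
        // Equivalent of ref(result)[s1] = as<int>(s2) for one match;
        // std::stoi stands in for xpressive's as<int>().
        result[(*it)[1].str()] = std::stoi((*it)[2].str());
    }
    return result;
}
```

Calling `parse_pairs("aaa=>1 bbb=>23 ccc=>456")` should yield a map with `aaa`, `bbb`, and `ccc` mapped to 1, 23, and 456, the same values the documentation's example prints. The point of the semantic action is that this per-match bookkeeping lives inside the regex itself.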
Modified: trunk/libs/xpressive/doc/dynamic_regexes.qbk
==============================================================================
--- trunk/libs/xpressive/doc/dynamic_regexes.qbk (original)
+++ trunk/libs/xpressive/doc/dynamic_regexes.qbk 2008-06-12 18:40:00 EDT (Thu, 12 Jun 2008)
@@ -20,7 +20,7 @@
There are two ways to create a dynamic regex: with the _regex_compile_
function or with the _regex_compiler_ class template. Use _regex_compile_
if you want the default locale. Use _regex_compiler_ if you need to
-specify a different locale. In the section on
+specify a different locale. In the section on
[link boost_xpressive.user_s_guide.grammars_and_nested_matches regex grammars],
we'll see another use for _regex_compiler_.
Modified: trunk/libs/xpressive/doc/history.qbk
==============================================================================
--- trunk/libs/xpressive/doc/history.qbk (original)
+++ trunk/libs/xpressive/doc/history.qbk 2008-06-12 18:40:00 EDT (Thu, 12 Jun 2008)
@@ -66,7 +66,7 @@
[h2 Version 0.0.1, November 16, 2003]
-Announcement of xpressive:
+Announcement of xpressive:
[@http://lists.boost.org/Archives/boost/2003/11/56312.php]
[endsect]
Modified: trunk/libs/xpressive/doc/installation.qbk
==============================================================================
--- trunk/libs/xpressive/doc/installation.qbk (original)
+++ trunk/libs/xpressive/doc/installation.qbk 2008-06-12 18:40:00 EDT (Thu, 12 Jun 2008)
@@ -16,7 +16,7 @@
The second way is by downloading xpressive.zip at the
[@http://www.boost-consulting.com/vault/index.php?directory=Strings%20-%20Text%20Processing
Boost File Vault] in the ["Strings - Text Processing] directory. In addition to
-the source code and the Boost license, this archive contains a copy of this
+the source code and the Boost license, this archive contains a copy of this
documentation in PDF format. This version will always be stable and at least as
current as the version in the latest Boost release. It may be more recent. The
version in the File Vault is always guaranteed to work with the latest official
Modified: trunk/libs/xpressive/doc/introduction.qbk
==============================================================================
--- trunk/libs/xpressive/doc/introduction.qbk (original)
+++ trunk/libs/xpressive/doc/introduction.qbk 2008-06-12 18:40:00 EDT (Thu, 12 Jun 2008)
@@ -15,7 +15,7 @@
[@http://www.osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html
Expression Templates]] that are parsed at compile-time (static regexes).
Dynamic regexes have the advantage that they can be accepted from the user
-as input at runtime or read from an initialization file. Static regexes
+as input at runtime or read from an initialization file. Static regexes
have several advantages. Since they are C++ expressions instead of
strings, they can be syntax-checked at compile-time. Also, they can naturally
refer to code and data elsewhere in your program, giving you the ability to call
Modified: trunk/libs/xpressive/doc/matching.qbk
==============================================================================
--- trunk/libs/xpressive/doc/matching.qbk (original)
+++ trunk/libs/xpressive/doc/matching.qbk 2008-06-12 18:40:00 EDT (Thu, 12 Jun 2008)
@@ -23,8 +23,8 @@
algorithm.]
The input can be a bidirectional range such as `std::string`, a C-style null-terminated string or a pair of
-iterators. In all cases, the type of the iterator used to traverse the input sequence must match the iterator
-type used to declare the regex object. (You can use the table in the
+iterators. In all cases, the type of the iterator used to traverse the input sequence must match the iterator
+type used to declare the regex object. (You can use the table in the
[link boost_xpressive.user_s_guide.quick_start.know_your_iterator_type Quick Start] to find the correct regex
type for your iterator.)
@@ -78,8 +78,8 @@
In all other regards, _regex_search_ behaves like _regex_match_ ['(see above)]. In particular, it can operate
on a bidirectional range such as `std::string`, C-style null-terminated strings or iterator ranges. The same
-care must be taken to ensure that the iterator type of your regex matches the iterator type of your input
-sequence. As with _regex_match_, you can optionally provide a _match_results_ struct to receive the results
+care must be taken to ensure that the iterator type of your regex matches the iterator type of your input
+sequence. As with _regex_match_, you can optionally provide a _match_results_ struct to receive the results
of the search, and a _match_flag_type_ bitmask to control how the match is evaluated.
Click [link boost_xpressive.user_s_guide.examples.see_if_a_string_contains_a_sub_string_that_matches_a_regex here]
Modified: trunk/libs/xpressive/doc/nyi.qbk
==============================================================================
--- trunk/libs/xpressive/doc/nyi.qbk (original)
+++ trunk/libs/xpressive/doc/nyi.qbk 2008-06-12 18:40:00 EDT (Thu, 12 Jun 2008)
@@ -10,7 +10,7 @@
The following features are planned for xpressive 2.X:
* `syntax_option_type::collate`
-* Collation sequences such as [^'''[.a.]''']
+* Collation sequences such as [^'''[.a.]''']
* Equivalence classes like [^'''[=a=]''']
* Control of nested results generation with `syntax_option_type::nosubs`,
and a `nosubs()` modifier for static xpressive.
Modified: trunk/libs/xpressive/doc/substitutions.qbk
==============================================================================
--- trunk/libs/xpressive/doc/substitutions.qbk (original)
+++ trunk/libs/xpressive/doc/substitutions.qbk 2008-06-12 18:40:00 EDT (Thu, 12 Jun 2008)
@@ -14,10 +14,12 @@
[h2 regex_replace()]
Performing search-and-replace using _regex_replace_ is simple. All you need is an input sequence, a regex object,
-and a format string. There are two versions of the _regex_replace_ algorithm. The first accepts the input
-sequence as `std::basic_string<>` and returns the result in a new `std::basic_string<>`. The second accepts
-the input sequence as a pair of iterators, and writes the result into an output iterator. Below are examples of
-each.
+and a format string or a formatter object. There are several versions of the _regex_replace_ algorithm. Some accept
+the input sequence as a bidirectional container such as `std::string` and return the result in a new container
+of the same type. Others accept the input as a null-terminated string and return a `std::string`. Still others
+accept the input sequence as a pair of iterators and write the result into an output iterator. The substitution
+may be specified as a string with format sequences or as a formatter object. Below are some simple examples of
+using string-based substitutions.
std::string input("This is his face");
sregex re = as_xpr("his"); // find all occurrences of "his" ...
@@ -63,11 +65,13 @@
Boost-specific format sequences.]]
]
-These flags live in the `regex_constants` namespace.
+These flags live in the `xpressive::regex_constants` namespace. If the substitution parameter is
+a function object instead of a string, the flags `format_literal`, `format_perl`, `format_sed`, and
+`format_all` are ignored.
[h2 The ECMA-262 Format Sequences]
-When you haven't specified a substitution string dialect with one of the format flags above,
+When you haven't specified a substitution string dialect with one of the format flags above,
you get the dialect defined by ECMA-262, the standard for ECMAScript. The table below shows
the escape sequences recognized in ECMA-262 mode.
@@ -150,4 +154,161 @@
it is /false-expression/. In this mode, you can use parens [^()] for grouping. If you
want a literal paren, you must escape it as [^\\(].
+[h2 Formatter Objects]
+
+Format strings are not always expressive enough for all your text substitution
+needs. Consider the simple example of wanting to map input strings to output
+strings, as you may want to do with environment variables. Rather than a format
+/string/, for this you would use a formatter /object/. Consider the following
+code, which finds embedded environment variables of the form `"$(XYZ)"` and
+computes the substitution string by looking up the environment variable in a
+map.
+
+ #include <map>
+ #include <string>
+ #include <iostream>
+ #include <boost/xpressive/xpressive.hpp>
+ using namespace boost;
+ using namespace xpressive;
+
+ std::map<std::string, std::string> env;
+
+ std::string const &format_fun(smatch const &what)
+ {
+ return env[what[1].str()];
+ }
+
+ int main()
+ {
+ env["X"] = "this";
+ env["Y"] = "that";
+
+ std::string input("\"$(X)\" has the value \"$(Y)\"");
+
+ // replace strings like "$(XYZ)" with the result of env["XYZ"]
+ sregex envar = "$(" >> (s1 = +_w) >> ')';
+ std::string output = regex_replace(input, envar, format_fun);
+ std::cout << output << std::endl;
+ }
+
+In this case, we use a function, `format_fun()`, to compute the substitution string
+on the fly. It accepts a _match_results_ object which contains the results of the
+current match. `format_fun()` uses the first submatch as a key into the global `env`
+map. The above code displays:
+
+[pre
+"this" has the value "that"
+]
+
+The formatter need not be an ordinary function. It may be an object of class type.
+And rather than return a string, it may accept an output iterator into which it
+writes the substitution. Consider the following, which is functionally equivalent
+to the above.
+
+ #include <algorithm>
+ #include <map>
+ #include <string>
+ #include <iostream>
+ #include <boost/xpressive/xpressive.hpp>
+ using namespace boost;
+ using namespace xpressive;
+
+ struct formatter
+ {
+ typedef std::map<std::string, std::string> env_map;
+ env_map env;
+
+ template<typename Out>
+ Out operator()(smatch const &what, Out out) const
+ {
+ env_map::const_iterator where = env.find(what[1]);
+ if(where != env.end())
+ {
+ std::string const &sub = where->second;
+ out = std::copy(sub.begin(), sub.end(), out);
+ }
+ return out;
+ }
+
+ };
+
+ int main()
+ {
+ formatter fmt;
+ fmt.env["X"] = "this";
+ fmt.env["Y"] = "that";
+
+ std::string input("\"$(X)\" has the value \"$(Y)\"");
+
+ sregex envar = "$(" >> (s1 = +_w) >> ')';
+ std::string output = regex_replace(input, envar, fmt);
+ std::cout << output << std::endl;
+ }
+
+The formatter must be a callable object -- a function or a function object --
+that has one of three possible signatures, detailed in the table below. For
+the table, `fmt` is a function pointer or function object, `what` is a
+_match_results_ object, `out` is an OutputIterator, and `flags` is a value
+of `regex_constants::match_flag_type`:
+
+[table Formatter Signatures
+[
+ [Formatter Invocation]
+ [Return Type]
+ [Semantics]
+]
+[
+ [`fmt(what)`]
+ [Range of characters (e.g. `std::string`) or null-terminated string]
+ [The string matched by the regex is replaced with the string returned by
+ the formatter.]
+]
+[
+ [`fmt(what, out)`]
+ [OutputIterator]
+ [The formatter writes the replacement string into `out` and returns `out`.]
+]
+[
+ [`fmt(what, out, flags)`]
+ [OutputIterator]
+ [The formatter writes the replacement string into `out` and returns `out`.
+ The `flags` parameter is the value of the match flags passed to the
+ _regex_replace_ algorithm.]
+]
+]
+
+[h2 Formatter Expressions]
+
+In addition to format /strings/ and formatter /objects/, _regex_replace_ also
+accepts formatter /expressions/. A formatter expression is a lambda expression
+that generates a string. It uses the same syntax as that for
+[link boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions
+Semantic Actions], which are covered later. The above example, which uses
+_regex_replace_ to substitute strings for environment variables, is repeated
+here using a formatter expression.
+
+ #include <map>
+ #include <string>
+ #include <iostream>
+ #include <boost/xpressive/xpressive.hpp>
+ #include <boost/xpressive/regex_actions.hpp>
+ using namespace boost::xpressive;
+
+ int main()
+ {
+ std::map<std::string, std::string> env;
+ env["X"] = "this";
+ env["Y"] = "that";
+
+ std::string input("\"$(X)\" has the value \"$(Y)\"");
+
+ sregex envar = "$(" >> (s1 = +_w) >> ')';
+ std::string output = regex_replace(input, envar, ref(env)[s1]);
+ std::cout << output << std::endl;
+ }
+
+In the above, the formatter expression is `ref(env)[s1]`. This means to use the
+value of the first submatch, `s1`, as a key into the `env` map. The purpose of
+`xpressive::ref()` here is to make the reference to the `env` local variable /lazy/
+so that the index operation is deferred until we know what to replace `s1` with.
+
[endsect]
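To make the `fmt(what, out)` row of the Formatter Signatures table concrete, here is a minimal sketch of the kind of loop a search-and-replace routine might run when handed a formatter object. It assumes the standard `<regex>` header rather than xpressive, and the names `replace_with` and `env_formatter` are hypothetical; xpressive's actual `regex_replace` implementation is not part of this commit and may differ.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper (not xpressive's API): copies unmatched text through
// unchanged and lets the formatter write each substitution via `out`.
template<typename Formatter>
std::string replace_with(std::string const &input, std::regex const &re, Formatter fmt)
{
    std::string output;
    auto out = std::back_inserter(output);
    auto last = input.cbegin();  // end of the previous match

    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it)
    {
        std::smatch const &what = *it;
        out = std::copy(last, what[0].first, out);  // text before this match
        out = fmt(what, out);                       // formatter writes the replacement
        last = what[0].second;
    }
    std::copy(last, input.cend(), out);             // text after the last match
    return output;
}

// Formatter object modeled on the documentation's example: looks up
// sub-match 1 in a map; names not found in the map are replaced with nothing.
struct env_formatter
{
    std::map<std::string, std::string> env;

    template<typename Out>
    Out operator()(std::smatch const &what, Out out) const
    {
        auto where = env.find(what[1].str());
        if (where != env.end())
            out = std::copy(where->second.begin(), where->second.end(), out);
        return out;
    }
};
```

With `env["X"] = "this"` and `env["Y"] = "that"`, calling `replace_with` on `"$(X)" has the value "$(Y)"` with the ECMAScript pattern `\$\((\w+)\)` should produce the same `"this" has the value "that"` output shown in the documentation. Writing through the output iterator lets the formatter append straight into the result instead of returning a temporary string for every match.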
Modified: trunk/libs/xpressive/doc/symbols.qbk
==============================================================================
--- trunk/libs/xpressive/doc/symbols.qbk (original)
+++ trunk/libs/xpressive/doc/symbols.qbk 2008-06-12 18:40:00 EDT (Thu, 12 Jun 2008)
@@ -9,7 +9,7 @@
[h2 Overview]
-Symbol tables can be built into xpressive regular expressions with just a
+Symbol tables can be built into xpressive regular expressions with just a
`std::map<>`. The map keys are the strings to be matched and the map values are
the data to be returned to your semantic action. Xpressive attributes, named
`a1`, `a2`, through `a9`, hold the value corresponding to a matching key so
@@ -83,7 +83,7 @@
ninety nine million nine hundred ninety nine thousand nine hundred ninety nine"
along with some special number names like "dozen".
-Symbol table matches are case sensitive by default, but they can be made
+Symbol table matches are case sensitive by default, but they can be made
case-insensitive by enclosing the expression in `icase()`.
[h2 Attributes]
Boost-Commit list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk