Boost-Commit :
From: eric_at_[hidden]
Date: 2008-03-16 20:35:04
Author: eric_niebler
Date: 2008-03-16 20:35:04 EDT (Sun, 16 Mar 2008)
New Revision: 43663
URL: http://svn.boost.org/trac/boost/changeset/43663
Log:
document new format flags, and range-based interface for regex_match and regex_search
Text files modified:
trunk/libs/xpressive/doc/matching.qbk | 33 ++++++-----
trunk/libs/xpressive/doc/substitutions.qbk | 110 +++++++++++++++++++++++++++++++++------
2 files changed, 109 insertions(+), 34 deletions(-)
Modified: trunk/libs/xpressive/doc/matching.qbk
==============================================================================
--- trunk/libs/xpressive/doc/matching.qbk (original)
+++ trunk/libs/xpressive/doc/matching.qbk 2008-03-16 20:35:04 EDT (Sun, 16 Mar 2008)
@@ -22,10 +22,11 @@
want to search through the string looking for sub-strings that the regex matches, use the _regex_search_
algorithm.]
-The input can be a `std::string`, a C-style null-terminated string or a pair of iterators. In all cases,
-the type of the iterator used to traverse the input sequence must match the iterator type used to declare
-the regex object. (You can use the table in the [link boost_xpressive.user_s_guide.quick_start.know_your_iterator_type Quick Start] to
-find the correct regex type for your iterator.)
+The input can be a bidirectional range such as `std::string`, a C-style null-terminated string or a pair of
+iterators. In all cases, the type of the iterator used to traverse the input sequence must match the iterator
+type used to declare the regex object. (You can use the table in the
+[link boost_xpressive.user_s_guide.quick_start.know_your_iterator_type Quick Start] to find the correct regex
+type for your iterator.)
cregex cre = +_w; // this regex can match C-style strings
sregex sre = +_w; // this regex can match std::strings
@@ -65,9 +66,9 @@
// should never get here!!!
}
-Click [link boost_xpressive.user_s_guide.examples.see_if_a_whole_string_matches_a_regex here] to see a complete example program that
-shows how to use _regex_match_. And check the _regex_match_ reference to see a complete list of the available
-overloads.
+Click [link boost_xpressive.user_s_guide.examples.see_if_a_whole_string_matches_a_regex here] to see a complete
+example program that shows how to use _regex_match_. And check the _regex_match_ reference to see a complete list
+of the available overloads.
[h2 Searching for Matching Sub-Strings]
@@ -75,14 +76,14 @@
_regex_search_ will try to match the regex at the beginning of the input sequence and scan forward in the
sequence until it either finds a match or exhausts the sequence.
-In all other regards, _regex_search_ behaves like _regex_match_ ['(see above)]. In particular, it can operate on `std::string`,
-C-style null-terminated strings or iterator ranges. The same care must be taken to ensure that the iterator
-type of your regex matches the iterator type of your input sequence. As with _regex_match_, you can optionally
-provide a _match_results_ struct to receive the results of the search, and a _match_flag_type_ bitmask to
-control how the match is evaluated.
-
-Click [link boost_xpressive.user_s_guide.examples.see_if_a_string_contains_a_sub_string_that_matches_a_regex here] to see a complete
-example program that shows how to use _regex_search_. And check the _regex_search_ reference to see a complete
-list of the available overloads.
+In all other regards, _regex_search_ behaves like _regex_match_ ['(see above)]. In particular, it can operate
+on a bidirectional range such as `std::string`, C-style null-terminated strings or iterator ranges. The same
+care must be taken to ensure that the iterator type of your regex matches the iterator type of your input
+sequence. As with _regex_match_, you can optionally provide a _match_results_ struct to receive the results
+of the search, and a _match_flag_type_ bitmask to control how the match is evaluated.
+
+Click [link boost_xpressive.user_s_guide.examples.see_if_a_string_contains_a_sub_string_that_matches_a_regex here]
+to see a complete example program that shows how to use _regex_search_. And check the _regex_search_ reference to
+see a complete list of the available overloads.
[endsect]
Modified: trunk/libs/xpressive/doc/substitutions.qbk
==============================================================================
--- trunk/libs/xpressive/doc/substitutions.qbk (original)
+++ trunk/libs/xpressive/doc/substitutions.qbk 2008-03-16 20:35:04 EDT (Sun, 16 Mar 2008)
@@ -7,8 +7,8 @@
[section String Substitutions]
-Regular expressions are not only good for searching text; they're good at ['manipulating] it. And one of the most
-common text manipulation tasks is search-and-replace. xpressive provides the _regex_replace_ algorithm for
+Regular expressions are not only good for searching text; they're good at ['manipulating] it. And one of the
+most common text manipulation tasks is search-and-replace. xpressive provides the _regex_replace_ algorithm for
searching and replacing.
[h2 regex_replace()]
@@ -40,19 +40,40 @@
Notice that ['all] the occurrences of `"his"` have been replaced with `"her"`.
-Click [link boost_xpressive.user_s_guide.examples.replace_all_sub_strings_that_match_a_regex here] to see a complete
-example program that shows how to use _regex_replace_. And check the _regex_replace_ reference
+Click [link boost_xpressive.user_s_guide.examples.replace_all_sub_strings_that_match_a_regex here] to see
+a complete example program that shows how to use _regex_replace_. And check the _regex_replace_ reference
to see a complete list of the available overloads.
-[h2 The Format String]
+[h2 Replace Options]
-As with Perl, you can refer to sub-matches in the format string. The table below shows the escape sequences
-xpressive recognizes in the format string.
+The _regex_replace_ algorithm takes an optional bitmask parameter to control the formatting. The
+possible values of the bitmask are:
+
+[table Format Flags
+ [[Flag] [Meaning]]
+ [[`format_default`] [Recognize the ECMA-262 format sequences (see below).]]
+ [[`format_first_only`] [Only replace the first match, not all of them.]]
+ [[`format_no_copy`] [Don't copy the parts of the input sequence that didn't match the regex
+ to the output sequence.]]
+ [[`format_literal`] [Treat the format string as a literal; that is, don't recognize any
+ escape sequences.]]
+ [[`format_perl`] [Recognize the Perl format sequences (see below).]]
+ [[`format_sed`] [Recognize the sed format sequences (see below).]]
+ [[`format_all`] [In addition to the Perl format sequences, recognize some
+ Boost-specific format sequences.]]
+]
+
+These flags live in the `regex_constants` namespace.
+
+[h2 The ECMA-262 Format Sequences]
+
+When you haven't specified a substitution string dialect with one of the format flags above,
+you get the dialect defined by ECMA-262, the standard for ECMAScript. The table below shows
+the escape sequences recognized in ECMA-262 mode.
[table Format Escape Sequences
[[Escape Sequence] [Meaning]]
- [[[^$1]] [the first sub-match]]
- [[[^$2]] [the second sub-match (etc.)]]
+ [[[^$1], [^$2], etc.] [the corresponding sub-match]]
[[[^$&]] [the full match]]
[[[^$\`]] [the match prefix]]
[[[^$']] [the match suffix]]
@@ -62,18 +83,71 @@
Any other sequence beginning with `'$'` simply represents itself. For example, if the format string were
`"$a"` then `"$a"` would be inserted into the output sequence.
-[h2 Replace Options]
+[h2 The Sed Format Sequences]
-The _regex_replace_ algorithm takes an optional bitmask parameter to control the formatting. The
-possible values of the bitmask are:
+When you specify the `format_sed` flag to _regex_replace_, the following escape sequences
+are recognized:
-[table Format Flags
- [[Flag] [Meaning]]
- [[`format_first_only`] [Only replace the first match, not all of them.]]
- [[`format_no_copy`] [Don't copy the parts of the input sequence that didn't match the regex to the output sequence.]]
- [[`format_literal`] [Treat the format string as a literal; that is, don't recognize any escape sequences.]]
+[table Sed Format Escape Sequences
+ [[Escape Sequence] [Meaning]]
+ [[[^\\1], [^\\2], etc.] [The corresponding sub-match]]
+ [[[^&]] [the full match]]
+ [[[^\\a]] [A literal `'\a'`]]
+ [[[^\\e]] [A literal `char_type(27)`]]
+ [[[^\\f]] [A literal `'\f'`]]
+ [[[^\\n]] [A literal `'\n'`]]
+ [[[^\\r]] [A literal `'\r'`]]
+ [[[^\\t]] [A literal `'\t'`]]
+ [[[^\\v]] [A literal `'\v'`]]
+ [[[^\\xFF]] [A literal `char_type(0xFF)`, where [^['F]] is any hex digit]]
+ [[[^\\x{FFFF}]] [A literal `char_type(0xFFFF)`, where [^['F]] is any hex digit]]
+ [[[^\\cX]] [The control character [^['X]]]]
]
-These flags live in the `regex_constants` namespace.
+[h2 The Perl Format Sequences]
+
+When you specify the `format_perl` flag to _regex_replace_, the following escape sequences
+are recognized:
+
+[table Perl Format Escape Sequences
+ [[Escape Sequence] [Meaning]]
+ [[[^$1], [^$2], etc.] [the corresponding sub-match]]
+ [[[^$&]] [the full match]]
+ [[[^$\`]] [the match prefix]]
+ [[[^$']] [the match suffix]]
+ [[[^$$]] [a literal `'$'` character]]
+ [[[^\\a]] [A literal `'\a'`]]
+ [[[^\\e]] [A literal `char_type(27)`]]
+ [[[^\\f]] [A literal `'\f'`]]
+ [[[^\\n]] [A literal `'\n'`]]
+ [[[^\\r]] [A literal `'\r'`]]
+ [[[^\\t]] [A literal `'\t'`]]
+ [[[^\\v]] [A literal `'\v'`]]
+ [[[^\\xFF]] [A literal `char_type(0xFF)`, where [^['F]] is any hex digit]]
+ [[[^\\x{FFFF}]] [A literal `char_type(0xFFFF)`, where [^['F]] is any hex digit]]
+ [[[^\\cX]] [The control character [^['X]]]]
+ [[[^\\l]] [Make the next character lowercase]]
+ [[[^\\L]] [Make the rest of the substitution lowercase until the next [^\\E]]]
+ [[[^\\u]] [Make the next character uppercase]]
+ [[[^\\U]] [Make the rest of the substitution uppercase until the next [^\\E]]]
+ [[[^\\E]] [Terminate [^\\L] or [^\\U]]]
+ [[[^\\1], [^\\2], etc.] [The corresponding sub-match]]
+ [[[^\\g<name>]] [The named backref /name/]]
+]
+
+[h2 The Boost-Specific Format Sequences]
+
+When you specify the `format_all` flag to _regex_replace_, the escape sequences
+recognized are the same as those above for `format_perl`. In addition, conditional
+expressions of the following form are recognized:
+
+[pre
+?Ntrue-expression:false-expression
+]
+
+where /N/ is a decimal digit representing a sub-match. If the corresponding sub-match
+participated in the full match, then the substitution is /true-expression/. Otherwise,
+it is /false-expression/. In this mode, you can use parens [^()] for grouping. If you
+want a literal paren, you must escape it as [^\\(].
[endsect]
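
The format flags documented in this changeset can be tried out with a short sketch. It is an illustration only and rests on one assumption: it uses `std::regex` from the C++ standard library as a stand-in for xpressive so that it builds without Boost. The standard library exposes `format_default`, `format_first_only`, `format_no_copy` and `format_sed` in `std::regex_constants`, and they are expected to behave like the xpressive flags of the same name; `format_literal`, `format_perl` and `format_all` have no standard equivalent and are not covered. The helper name `replace_with` is hypothetical and does not appear in either library.

```cpp
#include <regex>
#include <string>

namespace rc = std::regex_constants;

// Hypothetical helper (not part of xpressive or the standard library):
// replaces matches of `pattern` in `input` with `fmt`, interpreting `fmt`
// and controlling copying according to `flags`.
std::string replace_with(const std::string& input,
                         const std::string& pattern,
                         const std::string& fmt,
                         rc::match_flag_type flags = rc::format_default)
{
    const std::regex rex(pattern);  // ECMAScript grammar by default
    return std::regex_replace(input, rex, fmt, flags);
}
```

With the input `"This is his face"` and the pattern `"his"`, the default flags give `"Ther is her face"` for the format string `"her"`, `format_first_only` gives `"Ther is his face"`, and `format_no_copy` gives `"herher"` because the unmatched text is dropped. The ECMA-262 sequence `$&` inserts the full match, so the format string `"[$&]"` yields `"T[his] is [his] face"`. Under `format_sed`, sub-matches are written `\1`, `\2` rather than `$1`, `$2`.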