
Boost-Commit:

From: lists.drrngrvy_at_[hidden]
Date: 2007-10-29 21:15:31


Author: drrngrvy
Date: 2007-10-29 21:15:30 EDT (Mon, 29 Oct 2007)
New Revision: 40591
URL: http://svn.boost.org/trac/boost/changeset/40591

Log:
Adding more sections from Core.
Added:
   sandbox/boost_docs/branches/spirit_qbking/doc/src/directives.qbk (contents, props changed)
   sandbox/boost_docs/branches/spirit_qbking/doc/src/epsilon.qbk (contents, props changed)
   sandbox/boost_docs/branches/spirit_qbking/doc/src/operators.qbk (contents, props changed)
   sandbox/boost_docs/branches/spirit_qbking/doc/src/primitives.qbk (contents, props changed)
Text files modified:
   sandbox/boost_docs/branches/spirit_qbking/doc/ISSUES | 6 ++++++
   sandbox/boost_docs/branches/spirit_qbking/doc/src/acknowledgements.qbk | 2 ++
   2 files changed, 8 insertions(+), 0 deletions(-)

Modified: sandbox/boost_docs/branches/spirit_qbking/doc/ISSUES
==============================================================================
--- sandbox/boost_docs/branches/spirit_qbking/doc/ISSUES (original)
+++ sandbox/boost_docs/branches/spirit_qbking/doc/ISSUES 2007-10-29 21:15:30 EDT (Mon, 29 Oct 2007)
@@ -148,3 +148,9 @@
  * Link to primitives
 
  * Int. links
+
+epsilon.qbk
+
+ * Int. links (note hard-to-see ones)
+
+ * Changed c -> p in final code snippet, to fit with commentary.

Modified: sandbox/boost_docs/branches/spirit_qbking/doc/src/acknowledgements.qbk
==============================================================================
--- sandbox/boost_docs/branches/spirit_qbking/doc/src/acknowledgements.qbk (original)
+++ sandbox/boost_docs/branches/spirit_qbking/doc/src/acknowledgements.qbk 2007-10-29 21:15:30 EDT (Mon, 29 Oct 2007)
@@ -78,6 +78,8 @@
 
 [*Peder Holt] for his porting work on Phoenix, Fusion and Spirit to VC6.
 
+[*Darren Garvey] for porting the docs over to QuickBook.
+
 To my wife [*Mariel] who did the graphics in this document.
 
 My, there's a lot in this list! And it's a continuing list. I add people to this list every time. I hope I did not forget anyone. If I missed someone you know who has helped in any way, please inform me.

Added: sandbox/boost_docs/branches/spirit_qbking/doc/src/directives.qbk
==============================================================================
--- (empty file)
+++ sandbox/boost_docs/branches/spirit_qbking/doc/src/directives.qbk 2007-10-29 21:15:30 EDT (Mon, 29 Oct 2007)
@@ -0,0 +1,150 @@
+
+[section Directives]
+
+Parser directives have the form: [*directive\[expression\]]
+
+A directive modifies the behavior of its enclosed expression, essentially ['decorating] it. The framework pre-defines a few directives. Clients of the framework are free to define their own directives as needed. Information on how this is done will be provided later. For now, we shall deal only with predefined directives.
+
+[h3 `lexeme_d`]
+
+Turns off white space skipping. At the phrase level, the parser ignores white spaces, possibly including comments. Use `lexeme_d` in situations where we want to work at the character level instead of the phrase level. Parsers can be made to work at the character level by enclosing the pertinent parts inside the `lexeme_d` directive. For example, let us complete the example presented in the [link __Introduction__]. There, we skipped the definition of the `integer` rule. Here's how it is actually defined:
+
+``
+ integer = lexeme_d[ !(ch_p('+') | '-') >> +digit ];
+``
+
+The `lexeme_d` directive instructs the parser to work on the character level. Without it, the `integer` rule would have allowed erroneous embedded white spaces in inputs such as "`1 2 345`", which would be parsed as "`12345`".
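+
+To see the effect concretely, here is a minimal sketch (assuming `<boost/spirit/core.hpp>` is included, `using namespace boost::spirit`, and `digit_p` standing in for the `digit` rule above):
+
+``
+ // with lexeme_d, the match stops at the first embedded space
+ parse("1 2 345", lexeme_d[ !(ch_p('+') | '-') >> +digit_p ], space_p).full // false
+
+ // without lexeme_d, the skipper eats the spaces: parsed as "12345"
+ parse("1 2 345", !(ch_p('+') | '-') >> +digit_p, space_p).full // true
+``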
+
+[h3 `as_lower_d`]
+
+There are times when we want to inhibit case sensitivity. The `as_lower_d` directive converts all characters from the input to lower-case.
+
+[important [*`as_lower_d` behavior]
+
+It is important to note that only the input is converted to lower case. Parsers enclosed inside the `as_lower_d` expecting upper case characters will fail to parse. Example: `as_lower_d['X']` will never succeed because it expects an upper case `'X'` that the `as_lower_d` directive will never supply.
+]
+
+For example, in Pascal, keywords and identifiers are case insensitive: the identifiers `Id`, `ID` and `id` are indistinguishable. Without the `as_lower_d` directive, it would be awkward to define a rule that recognizes this. Here's a possibility:
+
+``
+ r = str_p("id") | "Id" | "iD" | "ID";
+``
+
+Now, try doing that with the case insensitive Pascal keyword "`BEGIN`". The `as_lower_d` directive makes this simple:
+
+``
+ r = as_lower_d["begin"];
+``
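+
+A quick sketch of the directive in action (assuming `using namespace boost::spirit` and the free `parse` function):
+
+``
+ parse("begin", as_lower_d["begin"]).full // true
+ parse("BeGiN", as_lower_d["begin"]).full // true: the input is lowered first
+ parse("begin", as_lower_d["BEGIN"]).full // false: upper case is never supplied
+``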
+
+[note [*Primitive arguments]
+
+The astute reader will notice that we did not explicitly wrap "begin" inside an `str_p`. Whenever appropriate, directives should be able to allow primitive types such as `char`, `int`, `wchar_t`, `char const*`, `wchar_t const*` and so on. Examples:
+
+``
+as_lower_d["hello"] // same as as_lower_d[str_p("hello")]
+as_lower_d['x'] // same as as_lower_d[ch_p('x')]
+``
+]
+
+[h3 `no_actions_d`]
+
+There are cases where you want [link __semantic actions__] not to be triggered. By enclosing a parser in the `no_actions_d` directive, all semantic actions directly or indirectly attached to the parser will not fire.
+
+``
+ no_actions_d[expression]
+``
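+
+A minimal sketch (assuming `using namespace boost::spirit` and the `assign_a` actor from `<boost/spirit/actor.hpp>`):
+
+``
+ int n = 0;
+ parse("123", no_actions_d[ int_p[assign_a(n)] ]);
+ // the match succeeds, but assign_a never fires: n is still 0
+``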
+
+[h3 Tweaking the Scanner Type]
+
+[$../theme/note.gif] How do `lexeme_d`, `as_lower_d` and `no_actions_d` work? These directives do their magic by tweaking the scanner policies. You don't need to know what that means for now; scanner policies are discussed [link __in_depth_the_scanner__ __later__]. However, it is important to note that when the scanner policy is tweaked, the result is a different scanner. Why is this important? The [link __rule__] is tied to a particular scanner (one or more scanners, to be precise). If you wrap a rule inside a `lexeme_d`, `as_lower_d` or `no_actions_d`, the compiler will complain about a scanner mismatch unless you associate the required scanner with the rule.
+
+`lexeme_scanner`, `as_lower_scanner` and `no_actions_scanner` are your friends if the need to wrap a rule inside these directives arises. Learn about these beasts in the next chapter on [link __The Scanner and Parsing__].
+
+[h3 `longest_d`]
+
+Alternatives in the Spirit parser compiler are short-circuited (see [link __Operators__]). Sometimes, this is not what is desired. The `longest_d` directive instructs the parser not to short-circuit alternatives enclosed inside this directive, but instead makes the parser try all possible alternatives and choose the one matching the longest portion of the input stream.
+
+Consider the parsing of integers and real numbers:
+
+``
+ number = real | integer;
+``
+
+A number can be a real or an integer. This grammar is ambiguous: an input "`1234`" could potentially match both `real` and `integer`. Recall though that alternatives are short-circuited. Thus, for inputs such as the above, the `real` alternative always wins. However, if we swap the alternatives:
+
+``
+ number = integer | real;
+``
+
+we still have a problem. Now, an input "`123.456`" will be partially matched by `integer` until the decimal point. This is not what we want. The solution is either to fix the ambiguity by factoring out the common prefixes of `real` and `integer` or, if that is neither possible nor desired, to use the `longest_d` directive:
+
+``
+ number = longest_d[ integer | real ];
+``
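+
+A sketch of the difference, using the numeric primitives `int_p` and `real_p` in place of the `integer` and `real` rules:
+
+``
+ parse("123.456", int_p | real_p).full // false: int_p wins and stops at '.'
+ parse("123.456", longest_d[ int_p | real_p ]).full // true: real_p's longer match wins
+``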
+
+[h3 `shortest_d`]
+
+Opposite of the `longest_d` directive.
+
+[note [*Multiple alternatives]
+
+The `longest_d` and `shortest_d` directives can accept two or more alternatives. Examples:
+
+``
+longest_d[ a | b | c ];
+shortest_d[ a | b | c | d ];
+``
+]
+
+[h3 `limit_d`]
+
+Ensures that the result of a parser is constrained to a given min..max range (inclusive). If not, then the parser fails and returns a no-match.
+
+[*Usage:]
+
+``
+ limit_d(min, max)[expression]
+``
+
+This directive is particularly useful in conjunction with parsers that parse specific scalar ranges (for example, [link __numeric parsers__]). Here's a practical example. Although the numeric parsers can be configured to accept only a limited number of digits (say, 0..2), there is no way to limit the result to a range (say -1.0..1.0). This design is deliberate. Doing so would have undermined Spirit's design rule that ["['the client should not pay for features that she does not use]]: we would have had to store the min and max values in the numeric parser itself, used or unused. We could get by with static constants configured by a non-type template parameter, but that would accommodate only integers. What about real numbers or user-defined numbers such as big-ints?
+
+
+[*Example], parse time of the form [*HH:MM:SS]:
+
+``
+ uint_parser<int, 10, 2, 2> uint2_p;
+
+ r = lexeme_d
+ [
+ limit_d(0u, 23u)[uint2_p] >> ':' // Hours 00..23
+ >> limit_d(0u, 59u)[uint2_p] >> ':' // Minutes 00..59
+ >> limit_d(0u, 59u)[uint2_p] // Seconds 00..59
+ ];
+``
+
+[h3 `min_limit_d`]
+
+Sometimes it is useful to constrain only the minimum, allowing an interval that is unbounded in one direction. The `min_limit_d` directive ensures that the result of a parser is not less than a given minimum. If it is, the parser fails and returns a no-match.
+
+[*Usage:]
+
+``
+ min_limit_d(min)[expression]
+``
+
+[*Example], ensure that a year is not less than 1900:
+
+``
+ min_limit_d(1900u)[uint_p]
+``
+
+[h3 `max_limit_d`]
+
+Opposite of `min_limit_d`. Take note that `limit_d(min, max)[p]` is equivalent to:
+
+``
+ min_limit_d(min)[max_limit_d(max)[p]]
+``
+
+[endsect][/ directives]
+

Added: sandbox/boost_docs/branches/spirit_qbking/doc/src/epsilon.qbk
==============================================================================
--- (empty file)
+++ sandbox/boost_docs/branches/spirit_qbking/doc/src/epsilon.qbk 2007-10-29 21:15:30 EDT (Mon, 29 Oct 2007)
@@ -0,0 +1,81 @@
+
+[section Epsilon]
+
+The [*Epsilon] (`epsilon_p` and `eps_p`) is a multi-purpose parser that returns a zero length match.
+
+[h3 Simple Form]
+
+In its simplest form, `epsilon_p` matches the null string and always returns a match of zero length:
+
+``
+ epsilon_p // always returns a zero-length match
+``
+
+This form is usually used to trigger a [link __semantic action__] unconditionally. For example, it is useful in triggering error messages when a set of alternatives fail:
+
+``
+ r = A | B | C | eps_p[error]; // error if A, B and C all fail to match
+``
+
+[h3 Semantic Predicate]
+
+Semantic predicates allow you to attach a function anywhere in the grammar. In this role, the epsilon takes a 0-ary (nullary) function/functor. The run-time function/functor is typically a test that is called upon to resolve ambiguity in the grammar. A parse failure will be reported when the function/functor result evaluates to `false`. Otherwise an empty match will be reported. The general form is:
+
+``
+ eps_p(f) >> rest;
+``
+
+The nullary function `f` is called to do a semantic test (say, checking if a symbol is in the symbol table). If the test returns `true`, `rest` will be evaluated. Otherwise, the production returns early with a no-match, without ever touching `rest`.
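+
+For instance (a sketch; `symbol_defined` and the `identifier` rule are hypothetical stand-ins for a real symbol table lookup):
+
+``
+ bool symbol_defined() { return true; /* e.g. query the symbol table */ }
+
+ r = eps_p(&symbol_defined) >> identifier; // identifier is tried only if the test passes
+``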
+
+[h3 Syntactic Predicate]
+
+Similar to semantic predicates, syntactic predicates assert a certain conditional syntax to be satisfied before evaluating another production. This time, `epsilon_p` accepts a (conditional) parser. The general form is:
+
+``
+ eps_p(p) >> rest;
+``
+
+The parser `p` is called to do a syntax check. Regardless of `p`'s success, `eps_p(p)` will always return a zero-length match (i.e. the input is not consumed). If `p` matches, `rest` will be evaluated. Otherwise, the production returns early with a no-match, without ever touching `rest`.
+
+Example:
+
+``
+ eps_p('0') >> oct_p // note that '0' is actually a ch_p('0')
+``
+
+Epsilon here is used as a syntactic predicate. `oct_p` (see [link __numerics__]) is parsed only if we see a leading `'0'`. Wrapping the leading `'0'` inside an epsilon makes the parser not consume anything from the input. If a `'0'` is seen, `epsilon_p` reports a successful match with zero length.
+
+[note [*Primitive arguments]
+
+Epsilon allows primitive type arguments such as `char`, `int`, `wchar_t`, `char const*`, `wchar_t const*` and so on. Examples:
+
+``
+eps_p("hello") // same as eps_p(str_p("hello"))
+eps_p('x') // same as eps_p(ch_p('x'))
+``
+]
+
+[h3 [$../theme/alert.gif] Inhibiting Semantic Actions]
+
+In a syntactic predicate `eps_p(p)`, any semantic action directly or indirectly attached to the conditional parser `p` will not be called. However, semantic actions attached to epsilon itself will always be called. The following code snippet illustrates the behavior:
+
+``
+ eps_p(p[f]) // f not called
+ eps_p(p)[f] // f is called
+ eps_p[f] // f is called
+``
+
+Actually, the conditional parser `p` is implicitly wrapped in a `no_actions_d` directive:
+
+``
+ no_actions_d[p]
+``
+
+The conditional parser is required to be free from side-effects (semantic actions). The conditional parser's purpose is to resolve ambiguity by looking ahead in the input stream for a certain pattern. Ambiguity and semantic actions do not mix well. On an ambiguous grammar, backtracking happens. And when it happens, we cannot undo the effects of triggered semantic actions.
+
+[h3 Negation]
+
+Operator `~` is defined for parsers constructed by `epsilon_p`/`eps_p`. It performs negation by complementing the results reported. `~~eps_p(x)` is identical to `eps_p(x)`.
+
+[endsect][/ epsilon]
+

Added: sandbox/boost_docs/branches/spirit_qbking/doc/src/operators.qbk
==============================================================================
--- (empty file)
+++ sandbox/boost_docs/branches/spirit_qbking/doc/src/operators.qbk 2007-10-29 21:15:30 EDT (Mon, 29 Oct 2007)
@@ -0,0 +1,112 @@
+
+[section Operators]
+
+[*Operators] are used as a means for object composition and embedding. Simple parsers may be composed to form composites through operator overloading, crafted to approximate the syntax of an Extended Backus-Naur Form (EBNF) variant. An expression such as:
+
+``
+ a | b
+``
+
+actually yields a new parser type which is a composite of its operands, `a` and `b`. Taking this example further, if `a` and `b` were of type `chlit<>`, the result would have the composite type:
+
+``
+ alternative<chlit<>, chlit<> >
+``
+
+In general, any binary operator takes its two arguments, `parser1` and `parser2`, and creates a new composed parser of the form:
+
+``
+ op<parser1, parser2>
+``
+
+where `parser1` and `parser2` can be arbitrarily complex parsers themselves, with the only limitations being what your compiler imposes.
+
+[section Set Operators]
+
+[table Set operators
+ [[`a | b`] [Union] [Match `a` or `b`. Also referred to as alternative.]]
+ [[`a & b`] [Intersection] [Match `a` and `b`.]]
+ [[`a - b`] [Difference] [Match `a` but not `b`. If both match and `b`'s matched text is shorter than `a`'s matched text, a successful match is made.]]
+ [[`a ^ b`] [XOR] [Match `a` or `b`, but not both.]]
+]
+
+[h5 Short-circuiting]
+
+Alternative operands are tried one by one on a first come first served basis starting from the leftmost operand. After a successfully matched alternative is found, the parser concludes its search, essentially short-circuiting the search for other potentially viable candidates. This short-circuiting implicitly gives the highest priority to the leftmost alternative.
+
+Short-circuiting is done in the same manner as C or C++'s logical expressions; e.g. in `if (x < 3 || y < 2)`, if `x` evaluates to less than `3`, the `y < 2` test is not done at all. In addition to providing an implicit priority rule for alternatives (which is necessary, given the non-deterministic nature of the Spirit parser compiler), short-circuiting improves execution time. If the order of your alternatives is logically irrelevant, strive to put the (expected) most common choice first for maximum efficiency.
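+
+A short sketch makes the priority rule concrete (assuming `using namespace boost::spirit`):
+
+``
+ parse("integer", str_p("int") | "integer").full // false: "int" matches first, "eger" is left over
+ parse("integer", str_p("integer") | "int").full // true: the longer alternative comes first
+``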
+
+[info [*Intersections]
+
+Some researchers assert that the intersections (e.g. `a & b`) let us define context sensitive languages ("[link __XBNF__]" [citing Leu-Weiner, 1973]). "The theory of defining a language as the intersection of a finite number of context free languages was developed by Leu and Weiner in 1973".
+]
+
+[info [*`~` Operator]
+
+The complement operator `~` was originally considered. Further study of its value and meaning left us uncertain: the basic problem stems from the fact that `~a` would yield `U-a`, where `U` is the universal set of all strings. However, where it makes sense, some parsers can be complemented (see the primitive character parsers for examples).
+]
+
+[h3 Sequencing Operators]
+
+[table Sequencing operators
+ [[`a >> b`] [Sequence] [Match `a` and `b` in sequence.]]
+ [[`a && b`] [Sequential-and] [Same as above: match `a` and `b` in sequence.]]
+ [[`a || b`] [Sequential-or] [Match `a` or `b` in sequence.]]
+]
+
+The sequencing operator `>>` can alternatively be thought of as the sequential-and operator. The expression `a && b` reads as match `a` and `b` in sequence. Continuing this logic, we can also have a sequential-or operator where the expression `a || b` reads as match `a` or `b` and in sequence. That is, if both `a` and `b` match, it must be in sequence; this is equivalent to `a >> !b | b`.
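+
+As a sketch, a classic use of the sequential-or is a number with an optional integral and/or fractional part:
+
+``
+ parse("123.456", int_p || ('.' >> +digit_p)).full // true: both parts, in sequence
+ parse("123", int_p || ('.' >> +digit_p)).full // true: the first alone
+ parse(".456", int_p || ('.' >> +digit_p)).full // true: the second alone
+``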
+
+[h3 Optional and Loops]
+
+[table Optional and Loops
+ [[`*a`] [Kleene star] [Match `a` zero (`0`) or more times.]]
+ [[`+a`] [Positive] [Match `a` one (`1`) or more times.]]
+ [[`!a`] [Optional] [Match `a` zero (`0`) or one (`1`) time.]]
+ [[`a % b`] [List] [Match a list of one or more repetitions of `a` separated by occurrences of `b`. This is the same as `a >> *(b >> a)`. Note that `a` must not also match `b`. A short sketch follows below.]]
+]
+
+[note
+Looking more closely, note that we have placed the optional expression `!a` in the same category as the loops. This is logical, considering that the optional matches the expression following it zero (`0`) or one (`1`) time.
+]
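+
+For example, the list operator makes comma-separated values trivial (a minimal sketch, assuming `using namespace boost::spirit`):
+
+``
+ parse("1,2,3", int_p % ',').full // true: same as int_p >> *(',' >> int_p)
+ parse("1,2,", int_p % ',').full // false: the trailing ',' is left unconsumed
+``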
+
+[h3 Primitive type operands]
+
+For binary operators, one of the operands but not both may be a `char`, `wchar_t`, `char const*` or `wchar_t const*`. Where `P` is a parser object, here are some examples:
+
+``
+ P | 'x'
+ P - L"Hello World"
+ 'x' >> P
+ "bebop" >> P
+``
+
+It is important to emphasize that C++ mandates that operators may be overloaded only if at least one argument is a user-defined type. Typically, in an expression involving multiple operators, explicitly typing the leftmost operand as a parser is enough for all the operands to its right to be regarded as parsers. Examples:
+
+``
+ r = 'a' | 'b' | 'c' | 'd'; // ill formed
+ r = ch_p('a') | 'b' | 'c' | 'd'; // OK
+``
+
+The second case is parsed as follows:
+
+``
+ r ```[$../theme/arrow.gif]```(((chlit<char> | char) | char) | char)
+
+ a ```[$../theme/arrow.gif]```(chlit<char> | char)
+ r ```[$../theme/arrow.gif]```(((a) | char) | char)
+
+ b ```[$../theme/arrow.gif]```(a | char)
+ r ```[$../theme/arrow.gif]```(((b)) | char)
+
+ c ```[$../theme/arrow.gif]```(b | char)
+ r ```[$../theme/arrow.gif]```(((c)))
+``
+
+[h3 Operator precedence and grouping]
+
+Since we are defining our meta-language in C++, we follow C/C++'s operator precedence rules. Grouping expressions inside parentheses overrides this (e.g., `*(a | b)` reads: match `a` or `b` zero (`0`) or more times).
+
+[endsect][/ set_operators]
+
+[endsect][/ operators]
+

Added: sandbox/boost_docs/branches/spirit_qbking/doc/src/primitives.qbk
==============================================================================
--- (empty file)
+++ sandbox/boost_docs/branches/spirit_qbking/doc/src/primitives.qbk 2007-10-29 21:15:30 EDT (Mon, 29 Oct 2007)
@@ -0,0 +1,180 @@
+
+[section Primitives]
+
+The framework predefines some parser primitives. These are the most basic building blocks that the client uses to build more complex parsers. These primitive parsers are template classes, making them very flexible.
+
+These primitive parsers can be instantiated directly or through a templatized helper function. Generally, the helper function is far simpler to deal with as it involves less typing.
+
+We have seen the character literal parser before through the generator function `ch_p`, which is not really a parser but, rather, a parser generator. Class `chlit<CharT>` is the actual template class behind the character literal parser. To instantiate a `chlit` object, you must explicitly provide the character type, `CharT`, as a template parameter. This type typically corresponds to the input type, usually `char` or `wchar_t`. The following expression creates a temporary parser object which will recognize the single letter `'X'`.
+
+``
+ chlit<char>('X');
+``
+
+Using `chlit`'s generator function `ch_p` simplifies the usage of the `chlit<>` class (this is true of most Spirit parser classes, since most have corresponding generator functions). Calling the function is convenient because the compiler deduces the template type from the argument for us. The example above could be expressed less verbosely using the `ch_p` helper function.
+
+``
+ ch_p('X') // equivalent to chlit<char>('X') object
+``
+
+[info [*Parser generators]
+
+Whenever you see an invocation of the parser generator function, it is equivalent to the parser itself. Therefore, we often call `ch_p` a character parser, even if, technically speaking, it is a function that generates a character parser.
+]
+
+The following grammar snippet shows these forms in action:
+
+``
+ // a rule can "store" a parser object. Rules are covered
+ // later; for now, just consider a rule an opaque type
+ rule<> r1, r2, r3;
+
+ chlit<char> x('X'); // declare a parser named x
+
+ r1 = chlit<char>('X'); // explicit declaration
+ r2 = x; // using x
+ r3 = ch_p('X'); // using the generator
+``
+
+[h3 `chlit` and `ch_p`]
+
+Matches a single character literal. `chlit` has a single template type parameter which defaults to `char` (i.e. `chlit<>` is equivalent to `chlit<char>`). This type parameter is the character type that `chlit` will recognize when parsing. The function generator version deduces the template type parameters from the actual function arguments. The `chlit` class constructor accepts a single parameter: the character it will match the input against. Examples:
+
+``
+ r1 = chlit<>('X');
+ r2 = chlit<wchar_t>(L'X');
+ r3 = ch_p('X');
+``
+
+Going back to our original example:
+
+``
+ group = '(' >> expr >> ')';
+ expr1 = integer | group;
+ expr2 = expr1 >> *(('*' >> expr1) | ('/' >> expr1));
+ expr = expr2 >> *(('+' >> expr2) | ('-' >> expr2));
+``
+
+the character literals `'('`, `')'`, `'+'`, `'-'`, `'*'` and `'/'` in the grammar declaration are `chlit` objects that are implicitly created behind the scenes.
+
+[info [*char operands]
+
+The reason this works is that there are two special templatized overloads of `operator>>`: one that takes a `(char, ParserT)` pair and one that takes `(ParserT, char)`. These overloads convert the character into a `chlit` object.
+]
+
+One may prefer to declare these explicitly as:
+
+``
+ chlit<> plus('+');
+ chlit<> minus('-');
+ chlit<> times('*');
+ chlit<> divide('/');
+ chlit<> oppar('(');
+ chlit<> clpar(')');
+``
+
+[h3 `range` and `range_p`]
+
+A `range` of characters is created from a low/high character pair. Such a parser matches a single character that is in the `range`, including both endpoints. Like `chlit`, `range` has a single template type parameter which defaults to `char`. The range class constructor accepts two parameters: the character range (['from] and ['to], inclusive) it will match the input against. The function generator version is `range_p`. Examples:
+
+``
+ range<>('A','Z') // matches 'A'..'Z'
+ range_p('a','z') // matches 'a'..'z'
+``
+
+Note that the first character must be "before" the second, according to the underlying character encoding. The `range`, like `chlit`, is a single character parser.
+
+[important [*Character mapping]
+
+Character mapping is inherently platform dependent. The standard does not guarantee, for example, that `'A' < 'Z'`. However, on many occasions we are well aware of the character set we are using, such as ASCII, ISO-8859-1 or Unicode. Take care, though, when porting to another platform.
+]
+
+
+[h3 `strlit` and `str_p`]
+
+This parser matches a string literal. `strlit` has a single template type parameter: an iterator type. Internally, `strlit` holds a begin/end iterator pair pointing to a string or a container of characters. `strlit` attempts to match the current input stream against this string. The template type parameter defaults to `char const*`. `strlit` has two constructors. The first accepts a null-terminated character pointer; this constructor may be used to build `strlit`s from quoted string literals. The second takes a first/last iterator pair. The function generator version is `str_p`. Examples:
+
+``
+ strlit<>("Hello World")
+ str_p("Hello World")
+
+ std::string msg("Hello World");
+ strlit<std::string::const_iterator>(msg.begin(), msg.end());
+``
+
+[note [*Character and phrase level parsing]
+
+Typical parsers regard the processing of characters (symbols that form words or lexemes) and phrases (words that form sentences) as separate domains. Entities such as reserved words, operators, literal strings, numerical constants, etc., which constitute the terminals of a grammar are usually extracted first in a separate lexical analysis stage.
+
+At this point, as evident in the examples we have so far, it is important to note that, contrary to standard practice, the Spirit framework handles parsing tasks at both the character level as well as the phrase level. One may consider that a lexical analyzer is seamlessly integrated in the Spirit framework.
+
+Although the Spirit parser library does not need a separate lexical analyzer, there is no reason why we cannot have one. One can always have as many parser layers as needed. In theory, one may create a preprocessor, a lexical analyzer and a parser proper, all using the same framework.
+]
+
+[h3 `chseq` and `chseq_p`]
+
+Matches a character sequence. `chseq` has the same template type parameters and constructor parameters as `strlit`. The function generator version is `chseq_p`. Examples:
+
+``
+ chseq<>("ABCDEFG")
+ chseq_p("ABCDEFG")
+``
+
+`strlit` is an implicit lexeme. That is, it works solely on the character level. `chseq`, `strlit`'s twin, on the other hand, can work on both the character and phrase levels. This simply means that it can ignore white spaces between the string's characters. For example:
+
+``
+ chseq<>("ABCDEFG")
+``
+
+can parse:
+
+``
+ ABCDEFG
+ A B C D E F G
+ AB CD EFG
+``
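+
+A sketch of the difference at the phrase level (assuming the free `parse` function with a `space_p` skipper):
+
+``
+ parse("A B C D E F G", chseq_p("ABCDEFG"), space_p).full // true
+ parse("A B C D E F G", str_p("ABCDEFG"), space_p).full // false: strlit is a lexeme
+``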
+
+[h3 More character parsers]
+
+The framework also predefines the full repertoire of single character parsers:
+
+[table Single character parsers
+ [[`anychar_p`] [Matches any single character (including the null terminator: `'\0'`)]]
+ [[`alnum_p`] [Matches alpha-numeric characters.]]
+ [[`alpha_p`] [Matches alphabetic characters.]]
+ [[`blank_p`] [Matches spaces or tabs.]]
+ [[`cntrl_p`] [Matches control characters.]]
+ [[`digit_p`] [Matches numeric digits.]]
+ [[`graph_p`] [Matches non-space printing characters.]]
+ [[`lower_p`] [Matches lower case letters.]]
+ [[`print_p`] [Matches printable characters.]]
+ [[`punct_p`] [Matches punctuation symbols.]]
+ [[`space_p`] [Matches spaces, tabs, returns, and newlines.]]
+ [[`upper_p`] [Matches upper case letters.]]
+ [[`xdigit_p`] [Matches hexadecimal digits.]]
+]
+
+[h3 Negation `~`]
+
+Single character parsers such as the `chlit`, `range`, `anychar_p`, `alnum_p` etc. can be negated. For example:
+
+``
+ ~ch_p('x')
+``
+
+matches any character except `'x'`. Double negation of a character parser cancels out the negation. `~~alpha_p` is equivalent to `alpha_p`.
+
+[h3 `eol_p`]
+
+Matches the end of line (CR/LF and combinations thereof).
+
+[h3 `nothing_p`]
+
+Never matches anything and always fails.
+
+[h3 `end_p`]
+
+Matches the end of input (returns a successful match with zero length when the input is exhausted).
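+
+`end_p` is handy for insisting that a parser consume the entire input (a minimal sketch):
+
+``
+ parse("123", int_p >> end_p).hit // true: nothing follows the number
+ parse("123x", int_p >> end_p).hit // false: the trailing 'x' blocks end_p
+``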
+
+[endsect][/ primitives]
+

