|
Boost-Commit : |
Subject: [Boost-commit] svn:boost r68263 - trunk/libs/spirit/doc
From: joel_at_[hidden]
Date: 2011-01-18 22:44:17
Author: djowel
Date: 2011-01-18 22:44:14 EST (Tue, 18 Jan 2011)
New Revision: 68263
URL: http://svn.boost.org/trac/boost/changeset/68263
Log:
Added rationale section
Text files modified:
trunk/libs/spirit/doc/rationale.qbk | 137 ++++++++++++++++++++++++++++++++++++++++
trunk/libs/spirit/doc/spirit2.qbk | 11 +-
2 files changed, 143 insertions(+), 5 deletions(-)
Modified: trunk/libs/spirit/doc/rationale.qbk
==============================================================================
--- trunk/libs/spirit/doc/rationale.qbk (original)
+++ trunk/libs/spirit/doc/rationale.qbk 2011-01-18 22:44:14 EST (Tue, 18 Jan 2011)
@@ -7,4 +7,141 @@
===============================================================================/]
[section Rationale]
+
+[heading Naming]
+
+Why use the name "Spirit", "Qi" and "Karma"? Because xpressive names
+have a better spirit, brings qi to your software and will enhance your
+karma so they can heal your (con)fusion and make you wave like a phoenix
+from the ashes. (Joachim Faulhaber)
+
+[heading Type Erasure: From static to dynamic C++]
+
+Rules straddle the border between static and dynamic C++. In effect, a
+rule transforms compile-time polymorphism (using templates) into
+run-time polymorphism (using virtual functions). This is necessary due
+to C++'s inability to automatically declare a variable of a type deduced
+from an arbitrarily complex expression in the right-hand side (rhs) of
+an assignment. Basically, we want to do something like:
+
+ T rule = an_arbitrarily_complex_expression;
+
+without having to know or care about the resulting type of the
+right-hand side (rhs) of the assignment expression. This can be done
+with modern C++ 0x compilers using `auto`:
+
+ auto rule = an_arbitrarily_complex_expression;
+
+Apart from this, we also need a facility to forward declare an unknown
+type:
+
+ T rule;
+ ...
+ rule = a | b;
+
+These limitations lead us to this implementation of rules. This comes at
+the expense of the overhead of a type-erased call which is an indirect
+function call that connot be inlined, once through each invocation of a
+rule.
+
+[heading Multiple declaration]
+
+Some BNF variants allow multiple declarations of a rule. The
+declarations are taken as alternatives. Example:
+
+ r = a;
+ r = b;
+
+is equivalent to:
+
+ r = a | b;
+
+Spirit v1.3 allowed this behavior. However, the current version of
+Spirit no longer allows this because experience shows that this behavior
+leads to unwanted gotchas (for instance, it does not allow rules to be
+held in containers). In the current release of Spirit, a second
+assignment to a rule will simply redefine it. The old definition is
+destructed. This follows more closely C++ semantics and is more in line
+with what the user expects the rule to behave.
+
+[heading Sequencing Syntax]
+
+The comma operator as in `a, b` seems to be a better candidate,
+syntax-wise. But then the problem is with its precedence. It has the
+lowest precedence in C/C++, which makes it virtually useless.
+
+Bjarne Stroustrup, in his article "Generalizing Overloading for C++2000"
+talks about overloading whitespace. Such a feature would allow
+juxtapositioning of parser objects exactly as we do in (E)BNF (e.g. a b
+| c instead of a >> b | c). Unfortunately, the article was dated April
+1, 1998. Oh well.
+
+[heading Forward iterators]
+
+In general, the expected iterator is at least a standard conforming
+forward iterator. Forward iterators are needed for backtracking where
+the iterator needs to be saved and restored later. Generally speaking,
+Spirit is a backtracking parser. The implication of this is that at some
+point, the iterator position will have to be saved to allow the parser
+to backtrack to a previous point. Thus, for backtracking to work, the
+framework requires at least a forward iterator.
+
+There are remedies of course. In cases where we need to use input
+iterators, you can use the __multi_pass__ iterator to make the forward
+iterators.
+
+Some parsers might require more specialized iterators (bi-directional or
+even random access). Perhaps in the future, deterministic parsers when
+added to the framework, will perform no backtracking and will need just
+a single token lookahead, hence will require input iterators only.
+
+[heading Exhaustive backtracking and greedy RD]
+
+Spirit doesn't do exhaustive backtracking like regular expressions are
+expected to. For example:
+
+ *char_('a') >> char_('a');
+
+will always fail to match because Spirit's Kleene star does not back off
+when the rest of the rule fails to match.
+
+Actually, there's a solution to this greedy RD problem. Such a scheme is
+discussed in section 6.6.2 of Parsing Techniques: A Practical Guide. The
+trick involves passing a tail parser (in addition to the scanner) to
+each parser. The start parser will then simply be:
+
+ start >> end;
+
+(end is the start's tail).
+
+Spirit is greedy --using straight forward, naive RD. It is certainly
+possible to implement the fully backtracking scheme presented above, but
+there will be also certainly be a performance hit. The scheme will
+always try to match all possible parser paths (full parser hierarchy
+traversal) until it reaches a point of certainty, that the whole thing
+matches or fails to match.
+
+[:Backtracking and Greedy RD
+
+Spirit is quite consistent and intuitive about when it backtracks and to
+where, although it may not be obvious to those coming from different
+backgrounds. In general, any (sub)parser will, given the same input,
+always match the same portion of the input (or fail to match the input
+at all). This means that Spirit is inherently greedy. Spirit will only
+backtrack when a (sub)parser fails to match the input, and it will
+always backtrack to the next choice point upward (not backward) in the
+parser structure. In other words abb|ab will match `"ab"`, as will
+`a(bb|b)`, but `(ab|a)b` won't because the `(ab|a)` subparser will
+always match the `'b'` after the `'a'` if it is available.
+
+--Rainer Deyke]
+
+This is the very nature of __peg__.
+
+There's a strong preference on "simplicity with all the knobs when you
+need them" approach, right now. On the other hand, the flexibility of
+Spirit makes it possible to have different optional schemes available.
+It might be possible to implement an exhaustive backtracking RD scheme
+as an optional feature in the future.
+
[endsect]
Modified: trunk/libs/spirit/doc/spirit2.qbk
==============================================================================
--- trunk/libs/spirit/doc/spirit2.qbk (original)
+++ trunk/libs/spirit/doc/spirit2.qbk 2011-01-18 22:44:14 EST (Tue, 18 Jan 2011)
@@ -117,6 +117,7 @@
[/ References to classes ]
[def __utree__ [link spirit.support.utree `utree`]]
+[def __multi_pass__ [link spirit.support.multi_pass `multi_pass`]]
[def __class_token_def__ [/ link spirit.lex.reference.tokendef `token_def<>`] `lex::token_def<>`]
@@ -309,7 +310,7 @@
''']
[/ index[title type]
- insert index
+ insert index
title: section title for the index
type: type of the index
@@ -423,9 +424,9 @@
[/ Here we go ]
-This is the documentation of the newest version of __spirit__ (currently,
-V2.4.2). If you're looking for the documentation of Spirit's previous version
-(formerly Spirit V1.8), see __classic__.
+This is the documentation of the newest version of __spirit__ (currently,
+V2.4.2). If you're looking for the documentation of Spirit's previous version
+(formerly Spirit V1.8), see __classic__.
[include preface.qbk]
[include what_s_new.qbk]
@@ -439,7 +440,7 @@
[include support.qbk]
[include faq.qbk]
[include notes.qbk]
-[/include rationale.qbk]
+[include rationale.qbk]
[include repository.qbk]
[include acknowledgments.qbk]
[include references.qbk]
Boost-Commit list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk