Boost-Commit :

From: lists.drrngrvy_at_[hidden]
Date: 2007-11-09 23:24:19


Author: drrngrvy
Date: 2007-11-09 23:24:18 EST (Fri, 09 Nov 2007)
New Revision: 40979
URL: http://svn.boost.org/trac/boost/changeset/40979

Log:
Adding Grammar and Semantic Actions sections and, after a wander and a ramble, finishing the in-depth pages. That concludes all of the Spirit Core docs. All that's left now is the FAQ and the quickref (and then sorting out the issues/internal linkages/macros).
Added:
   sandbox/boost_docs/branches/spirit_qbking/doc/src/grammar.qbk (contents, props changed)
   sandbox/boost_docs/branches/spirit_qbking/doc/src/semantic_actions.qbk (contents, props changed)
Text files modified:
   sandbox/boost_docs/branches/spirit_qbking/doc/project-root.jam | 4 ++--
   sandbox/boost_docs/branches/spirit_qbking/doc/src/spirit.qbk | 8 ++++++++
   2 files changed, 10 insertions(+), 2 deletions(-)

Modified: sandbox/boost_docs/branches/spirit_qbking/doc/project-root.jam
==============================================================================
--- sandbox/boost_docs/branches/spirit_qbking/doc/project-root.jam (original)
+++ sandbox/boost_docs/branches/spirit_qbking/doc/project-root.jam 2007-11-09 23:24:18 EST (Fri, 09 Nov 2007)
@@ -1,4 +1,4 @@
 import os ;
 
-path-constant BOOST_ROOT : [ os.environ BOOST_ROOT ] ;
-path-constant BOOST_BUILD_PATH : [ os.environ BOOST_BUILD_PATH ] ;
+#path-constant BOOST_ROOT : [ os.environ BOOST_ROOT ] ;
+#path-constant BOOST_BUILD_PATH : [ os.environ BOOST_BUILD_PATH ] ;

Added: sandbox/boost_docs/branches/spirit_qbking/doc/src/grammar.qbk
==============================================================================
--- (empty file)
+++ sandbox/boost_docs/branches/spirit_qbking/doc/src/grammar.qbk 2007-11-09 23:24:18 EST (Fri, 09 Nov 2007)
@@ -0,0 +1,222 @@
+
+[section:grammar The Grammar]
+
+The *Grammar* encapsulates a set of rules. The `grammar` class is a protocol base class. It is essentially an interface contract. The `grammar` is a template class that is parameterized by its derived class, `DerivedT`, and its context, `ContextT`. The template parameter `ContextT` defaults to `parser_context`, a predefined context.
+
+You need not be concerned at all with the `ContextT` template parameter unless you wish to tweak the low-level behavior of the grammar. Detailed information on the `ContextT` template parameter is provided elsewhere. The grammar relies on the template parameter `DerivedT`, a grammar subclass, to define the actual rules.
+
+Presented below is the public API. There may actually be more template parameters after `ContextT`. Anything after the `ContextT` parameter is of no concern to the client and is strictly for internal use.
+
+``
+ template<
+ typename DerivedT,
+ typename ContextT = parser_context<> >
+ struct grammar;
+``
+
+[section:definition Grammar definition]
+
+A concrete subclass inheriting from `grammar` is expected to have a nested template class (or struct) named `definition`:
+
+* It is a nested template class with a `typename ScannerT` parameter.
+* Its constructor defines the grammar rules.
+* Its constructor is passed a reference to the actual grammar, `self`.
+* It has a member function named `start` that returns a reference to the start rule.
+
+[endsect][/ definition]
+
+[section:skeleton Grammar skeleton]
+
+``
+ struct my_grammar : public grammar<my_grammar>
+ {
+ template <typename ScannerT>
+ struct definition
+ {
+ rule<ScannerT> r;
+ definition(my_grammar const& self) { r = /*..define here..*/; }
+ rule<ScannerT> const& start() const { return r; }
+ };
+ };
+``
+
+Decoupling the scanner type from the rules that form a grammar allows the grammar to be used in different contexts, possibly using different scanners. We do not care what scanner we are dealing with. The user-defined `my_grammar` can be used with [*any] type of scanner. Unlike the rule, the grammar is not tied to a specific scanner type. See "__Scanner_Business__" to learn why this is important and to gain a better understanding of the scanner-rule coupling problem.
+
+[header Instantiating and using `my_grammar`]
+
+Our grammar above may be instantiated and put into action:
+
+``
+ my_grammar g;
+
+ if (parse(first, last, g, space_p).full)
+ cout << "parsing succeeded\n";
+ else
+ cout << "parsing failed\n";
+``
+
+`my_grammar` [*IS-A] parser and can be used anywhere a parser is expected, even referenced by another rule:
+
+``
+ rule<> r = g >> str_p("cool huh?");
+
+``
+
+[important [*Referencing grammars]
+
+Like the rule, the grammar is held by reference when it is placed on the right-hand side of an EBNF expression. It is the responsibility of the client to ensure that the referenced grammar stays in scope and is not destroyed while it is being referenced.
+]
+
+[endsect][/ skeleton]
+
+[section:example Full Grammar Example]
+
+Recalling our original calculator example, here it is now rewritten using a grammar:
+
+``
+ struct calculator : public grammar<calculator>
+ {
+ template <typename ScannerT>
+ struct definition
+ {
+ definition(calculator const& self)
+ {
+ group = '(' >> expression >> ')';
+ factor = integer | group;
+ term = factor >> *(('*' >> factor) | ('/' >> factor));
+ expression = term >> *(('+' >> term) | ('-' >> term));
+ }
+
+ rule<ScannerT> expression, term, factor, group;
+
+ rule<ScannerT> const&
+ start() const { return expression; }
+ };
+ };
+``
+
+__lens__ A fully working example with [link __semantic_actions__ semantic actions] can be [@__example__/fundamental/calc_plain.cpp viewed here]. This is part of the Spirit distribution.
+
+[info `self`
+
+You might notice that the definition of the grammar has a constructor that accepts a `const` reference to the outer grammar. In the example above, notice that `calculator::definition` takes in a `calculator const& self`. While this is unused in the example above, in many cases it is very useful. The `self` argument is the definition's window to the outside world. For example, the calculator class might have a reference to some state information that the definition can update while parsing proceeds through [link __semantic_actions__ semantic actions].
+]
+
+[endsect][/ example]
+
+[section:capsules Grammar Capsules]
+
+As a grammar becomes complicated, it is a good idea to group parts into logical modules. For instance, when writing a language, it might be wise to put expressions and statements into separate grammar capsules. The grammar takes advantage of the encapsulation properties of C++ classes. The declarative nature of classes makes them a perfect fit for the definition of grammars. Since the grammar is nothing more than a class declaration, we can conveniently publish it in header files. The idea is that once written and fully tested, a grammar can be reused in many contexts. We now have the notion of grammar libraries.
+
+[endsect][/ capsules]
+
+[section:mt Reentrancy and multithreading]
+
+An instance of a grammar may be used in different places multiple times without any problem. The implementation is tuned to allow this at the expense of some overhead. However, we can save considerable cycles and bytes if we are certain that a grammar will only have a single instance. If this is desired, simply define `BOOST_SPIRIT_SINGLE_GRAMMAR_INSTANCE` before including any Spirit header files.
+
+``
+ #define BOOST_SPIRIT_SINGLE_GRAMMAR_INSTANCE
+``
+
+On the other hand, if a grammar is intended to be used in multithreaded code, we should define `BOOST_SPIRIT_THREADSAFE` before including any Spirit header files. In this case you will also need to link against __Boost_Thread__.
+
+``
+ #define BOOST_SPIRIT_THREADSAFE
+``
+
+[header Using more than one grammar `start` rule]
+
+Sometimes it is desirable to have more than one visible entry point to a grammar (apart from the `start` rule). To allow additional start points, Spirit provides a helper template `grammar_def`, which may be used as a base class for the definition subclass of your grammar. Here's an example:
+
+``
+ // this header has to be explicitly included
+ #include <boost/spirit/utility/grammar_def.hpp>
+
+ struct calculator2 : public grammar<calculator2>
+ {
+ enum
+ {
+ expression = 0,
+ term = 1,
+ factor = 2,
+ };
+
+ template <typename ScannerT>
+ struct definition
+ : public grammar_def<rule<ScannerT>, same, same>
+ {
+ definition(calculator2 const& self)
+ {
+ group = '(' >> expression >> ')';
+ factor = integer | group;
+ term = factor >> *(('*' >> factor) | ('/' >> factor));
+ expression = term >> *(('+' >> term) | ('-' >> term));
+
+ this->start_parsers(expression, term, factor);
+ }
+
+ rule<ScannerT> expression, term, factor, group;
+ };
+ };
+``
+
+The `grammar_def` template has to be instantiated with the types of all the rules you wish to make visible from outside the grammar:
+
+``
+ grammar_def<rule<ScannerT>, same, same>
+``
+
+The shorthand notation `same` indicates that the type specified by the previous template parameter (e.g. `rule<ScannerT>`) should be used again. Consequently, `same` may not be used as the first template parameter.
+
+[tip [*`grammar_def` start types]
+
+It may not be obvious, but it is interesting to note that aside from `rule<>`s, any parser type may be specified (e.g. `chlit<>`, `strlit<>`, `int_parser<>`, etc.).
+]
+
+When using the `grammar_def` class, there is no need to provide a `start()` member function anymore. Instead, you have to insert a call to `this->start_parsers()` (a member function of the `grammar_def` template) to define the start symbols for your grammar. __note__ Note that the number and the order of the rules passed to `start_parsers()` should match the types specified in the `grammar_def` template:
+
+``
+ this->start_parsers(expression, term, factor);
+``
+
+The grammar entry point may be specified using the following syntax:
+
+``
+ g.use_parser<N>() // Where g is your grammar and N is the Nth entry.
+``
+
+This sample shows how to use the `term` rule from the `calculator2` grammar above:
+
+``
+ calculator2 g;
+
+ if (parse(
+ first, last,
+ g.use_parser<calculator2::term>(),
+ space_p
+ ).full)
+ {
+ cout << "parsing succeeded\n";
+ }
+ else {
+ cout << "parsing failed\n";
+ }
+``
+
+The template parameter for the `use_parser<>` template should be the zero-based index into the list of rules specified in the `start_parsers()` function call.
+
+[note [*`use_parser<0>`]
+
+Note that using `0` (zero) as the template parameter to `use_parser` is equivalent to using the `start` rule, exported by conventional means through the `start()` function, as shown in the first calculator sample above. So this notation may be used even for grammars that export only one rule through their `start()` function. On the other hand, calling a grammar without the `use_parser` notation will execute the rule specified as the first parameter to the `start_parsers()` function.
+]
+
+The maximum number of usable start rules is limited by the preprocessor constant:
+
+``
+ BOOST_SPIRIT_GRAMMAR_STARTRULE_TYPE_LIMIT // defaults to 3
+``
+
+[endsect][/ mt]
+
+[endsect][/ grammar]
+
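
The contract described in grammar.qbk above (a base template parameterized by `DerivedT`, plus a nested `definition<ScannerT>` whose `start()` returns the start rule) can be modelled without Spirit at all. The following is a minimal sketch under that assumption, not Spirit's implementation: `toy_grammar`, `string_scanner`, `digits_rule`, `digits_grammar` and `all_digits` are hypothetical names invented for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, simplified model of the pattern; NOT Spirit's implementation.
template <typename DerivedT>
struct toy_grammar
{
    // Instantiate the derived class's nested definition for the given
    // scanner type, then run its start rule against the scanner.
    template <typename ScannerT>
    bool parse_with(ScannerT& scan) const
    {
        DerivedT const& self = static_cast<DerivedT const&>(*this);
        typename DerivedT::template definition<ScannerT> def(self);
        return def.start()(scan);
    }
};

// A toy "scanner": a cursor over a string.
struct string_scanner
{
    std::string text;
    std::size_t pos;
    explicit string_scanner(std::string const& t) : text(t), pos(0) {}
    bool at_end() const { return pos == text.size(); }
};

// A toy "rule": matches one or more decimal digits.
template <typename ScannerT>
struct digits_rule
{
    bool operator()(ScannerT& scan) const
    {
        std::size_t const start = scan.pos;
        while (!scan.at_end()
            && scan.text[scan.pos] >= '0' && scan.text[scan.pos] <= '9')
            ++scan.pos;
        return scan.pos > start;
    }
};

// The grammar itself never names a concrete scanner type.
struct digits_grammar : toy_grammar<digits_grammar>
{
    template <typename ScannerT>
    struct definition
    {
        digits_rule<ScannerT> r;
        explicit definition(digits_grammar const& /*self*/) {}
        digits_rule<ScannerT> const& start() const { return r; }
    };
};

// Returns true when the whole input consists of one or more digits.
bool all_digits(std::string const& input)
{
    digits_grammar g;
    string_scanner scan(input);
    return g.parse_with(scan) && scan.at_end();
}
```

Because `definition` is only instantiated inside `parse_with`, the same `digits_grammar` would work with any other scanner type offering `text`, `pos` and `at_end()`, which is the decoupling the "Grammar skeleton" section describes.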

Added: sandbox/boost_docs/branches/spirit_qbking/doc/src/semantic_actions.qbk
==============================================================================
--- (empty file)
+++ sandbox/boost_docs/branches/spirit_qbking/doc/src/semantic_actions.qbk 2007-11-09 23:24:18 EST (Fri, 09 Nov 2007)
@@ -0,0 +1,243 @@
+
+[section Semantic Actions]
+
+Semantic actions have the form: [*expression\[action\]]
+
+Ultimately, after having defined our grammar and having generated a corresponding parser, we will need to produce some output and do some work besides syntax analysis; unless, of course, what we want is merely to check for the conformance of an input with our grammar, which is very seldom the case. Semantic actions may be attached to any expression at any level within the parser hierarchy. An action is a C/C++ function or function object that will be called if a match is found in the particular context where it is attached. The action function serves as a hook into the parser and may be used to, for example:
+
+* Generate output from the parser (ASTs, for example);
+* Report warnings or errors;
+* Manage symbol tables.
+
+[section:generic Generic Semantic Actions (Transduction Interface)]
+
+A generic semantic action can be any free function or function object that is compatible with the interface:
+
+``
+ void f(IteratorT first, IteratorT last);
+``
+
+where `IteratorT` is the type of iterator used, `first` points to the current input and `last` points to one after the end of the input (identical to STL iterator ranges). A function object (functor) should have a member `operator()` with the same signature as above:
+
+``
+ struct my_functor
+ {
+ void operator()(IteratorT first, IteratorT last) const;
+ };
+``
+
+Iterators pointing to the matching portion of the input are passed into the function/functor.
+
+In general, semantic actions accept the first-last iterator pair. This is the transduction interface. The action functions or functors receive the unprocessed data representing the matching production directly from the input. In many cases, this is sufficient. Examples are source to source translation, pre-processing, etc.
+
+[header:example Example:]
+
+``
+ void
+ my_action(char const* first, char const* last)
+ {
+ std::string str(first, last);
+ std::cout << str << std::endl;
+ }
+
+ rule<> myrule = (a | b | *(c >> d))[&my_action];
+``
+
+The function `my_action` will be called whenever the expression `(a | b | *(c >> d))` matches a portion of the input stream while parsing. Two iterators, `first` and `last`, are passed into the function. These iterators point to the start and end, respectively, of the portion of the input stream where the match is found.
+
+[header Const-ness:]
+
+With functors, take note that the `operator()` should be `const`. This implies that functors are immutable. One may wish to have some member variables that are modified when the action gets called. This is not a good idea. First of all, functors are preferably lightweight. Functors are passed around a lot, and heavily laden functors would incur a lot of overhead. Second, functors are passed by value. Thus, the actual functor object that finally attaches to the parser will not be the original instance supplied by the client. What this means is that changes to a functor's state will not affect the original functor that the client passed in, since they are distinct copies. If a functor needs to update some state variables, which is often the case, it is better to use references to external data. The following example shows how this can be done:
+
+``
+ struct my_functor
+ {
+ my_functor(std::string& str_)
+ : str(str_) {}
+
+ void
+ operator()(IteratorT first, IteratorT last) const
+ {
+ str.assign(first, last);
+ }
+
+ std::string& str;
+ };
+``
+
+[header:full_example Full Example:]
+
+Here now is our calculator enhanced with semantic actions:
+
+``
+ namespace
+ {
+ void do_int(char const* str, char const* end)
+ {
+ string s(str, end);
+ cout << "PUSH(" << s << ')' << endl;
+ }
+
+ void do_add(char const*, char const*) { cout << "ADD\n"; }
+ void do_subt(char const*, char const*) { cout << "SUBTRACT\n"; }
+ void do_mult(char const*, char const*) { cout << "MULTIPLY\n"; }
+ void do_div(char const*, char const*) { cout << "DIVIDE\n"; }
+ void do_neg(char const*, char const*) { cout << "NEGATE\n"; }
+ }
+``
+
+We augment our grammar with semantic actions:
+
+``
+ struct calculator : public grammar<calculator>
+ {
+ template <typename ScannerT>
+ struct definition
+ {
+ definition(calculator const& self)
+ {
+ expression
+ = term
+ >> *( ('+' >> term)[&do_add]
+ | ('-' >> term)[&do_subt]
+ )
+ ;
+
+ term =
+ factor
+ >> *( ('*' >> factor)[&do_mult]
+ | ('/' >> factor)[&do_div]
+ )
+ ;
+
+ factor
+ = lexeme_d[(+digit_p)[&do_int]]
+ | '(' >> expression >> ')'
+ | ('-' >> factor)[&do_neg]
+ | ('+' >> factor)
+ ;
+ }
+
+ rule<ScannerT> expression, term, factor;
+
+ rule<ScannerT> const&
+ start() const { return expression; }
+ };
+ };
+``
+
+Feeding in the expression `(-1 + 2) * (3 + -4)`, for example, to the rule `expression` will produce the expected output:
+
+``
+PUSH(1)
+NEGATE
+PUSH(2)
+ADD
+PUSH(3)
+PUSH(4)
+NEGATE
+ADD
+MULTIPLY
+``
+
+which, by the way, is the Reverse Polish Notation (RPN) of the given expression, reminiscent of some primitive calculators and the language Forth.
+
+__lens__ [@__examples__/fundamental/calc_plain.cpp View the complete source code here]. This is part of the Spirit distribution.
+
+[endsect][/ generic]
+
+[section:specialized Specialized Actions]
+
+In general, semantic actions accept the first-last iterator pair. There are situations though where we might want to pass data in its processed form. A concrete example is the numeric parser. It is unwise to pass unprocessed data to a semantic action attached to a numeric parser and just throw away what has been parsed by the parser. We want to pass the actual parsed number.
+
+The function and functor signature of a semantic action varies depending on the parser to which it is attached. The following sections list the parsers that accept unique signatures.
+
+[note
+Unless explicitly stated in the documentation of a specific parser type, parsers not included in the list by default expect the generic signature as explained above.
+]
+
+[section Numeric Actions]
+
+[header Applies to:]
+
+* `uint_p`
+* `int_p`
+* `ureal_p`
+* `real_p`
+
+[header Signature for functions:]
+
+``
+ void func(NumT val);
+``
+
+[header Signature for functors:]
+
+``
+ struct ftor
+ {
+ void operator()(NumT val) const;
+ };
+``
+
+Where `NumT` is any primitive numeric type such as `int`, `long`, `float`, `double`, etc., or a user defined numeric type such as `big_int`. `NumT` is the same type used as template parameter to `uint_p`, `int_p`, `ureal_p` or `real_p`. The parsed number is passed into the function/functor.
+
+[endsect][/ numeric_actions]
+
+[section Character Actions]
+
+[header Applies to:]
+
+* `chlit`, `ch_p`
+* `range`, `range_p`
+* `anychar`
+* `alnum`, `alpha`
+* `cntrl`, `digit`
+* `graph`, `lower`
+* `print`, `punct`
+* `space`, `upper`
+* `xdigit`
+
+[header Signature for functions:]
+
+``
+ void func(CharT ch);
+``
+
+[header Signature for functors:]
+
+``
+ struct ftor
+ {
+ void operator()(CharT ch) const;
+ };
+``
+
+Where `CharT` is the `value_type` of the iterator used in parsing. A `char const*` iterator for example has a `value_type` of `char`. The matching character is passed into the function/functor.
+
+[endsect][/ character_actions]
+
+[endsect][/ specialized]
+
+[section:cascading Cascading Actions]
+
+Actions can be cascaded. Cascaded actions also inherit the function/functor interface of the original. For example:
+
+``
+ uint_p[fa][fb][fc]
+``
+
+Here, the functors `fa`, `fb` and `fc` all expect the signature `void operator()(unsigned n) const`.
+
+[endsect][/ cascading]
+
+[section:d_and_r Directives and Actions]
+
+Directives inherit the function/functor interface of the subject they enclose. Example:
+
+``
+ as_lower_d[ch_p('x')][f]
+``
+
+Here, the functor `f` expects the signature `void operator()(char ch) const`, assuming that the iterator used is a `char const*`.
+
+[endsect][/ d_and_r]
+
+[endsect][/ semantic_actions]
+
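
The const-ness advice in semantic_actions.qbk above rests on two claims: actions are copied, and `operator()` is `const`. A rough sketch of the consequence follows, using a hypothetical `toy_action_holder` in place of a real Spirit parser. The holder, both functors and the two `run_*` helpers are invented for illustration and are not part of Spirit.

```cpp
#include <string>

// Hypothetical stand-in for a parser that stores its action by value,
// as the text says Spirit does. This is not Spirit's actual code.
template <typename ActionT>
struct toy_action_holder
{
    ActionT action; // a copy of whatever the client passed in
    explicit toy_action_holder(ActionT const& a) : action(a) {}

    void fire(char const* first, char const* last) const
    {
        action(first, last); // compiles only if operator() is const
    }
};

// State lives outside the functor; every copy refers to the same string.
struct capture_by_ref
{
    explicit capture_by_ref(std::string& s) : str(s) {}
    void operator()(char const* first, char const* last) const
    {
        str.assign(first, last);
    }
    std::string& str;
};

// State lives inside the functor; only the holder's private copy changes.
struct capture_by_value
{
    void operator()(char const* first, char const* last) const
    {
        str.assign(first, last);
    }
    mutable std::string str;
};

// The external string sees the matched text.
std::string run_by_ref(std::string const& input)
{
    std::string out;
    capture_by_ref f(out);
    toy_action_holder<capture_by_ref> holder(f);
    holder.fire(input.data(), input.data() + input.size());
    return out;
}

// The client's original functor is left untouched.
std::string run_by_value(std::string const& input)
{
    capture_by_value f;
    toy_action_holder<capture_by_value> holder(f);
    holder.fire(input.data(), input.data() + input.size());
    return f.str;
}
```

Calling `run_by_ref("42")` yields the matched text, while `run_by_value("42")` yields an empty string because the update landed in the holder's copy. That is the reason the section recommends references to external data.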

Modified: sandbox/boost_docs/branches/spirit_qbking/doc/src/spirit.qbk
==============================================================================
--- sandbox/boost_docs/branches/spirit_qbking/doc/src/spirit.qbk (original)
+++ sandbox/boost_docs/branches/spirit_qbking/doc/src/spirit.qbk 2007-11-09 23:24:18 EST (Fri, 09 Nov 2007)
@@ -45,6 +45,10 @@
 [def __BLL__ [@http://www.boost.org/libs/lambda BLL]]
 [def __Phoenix__ [@http://www.boost.org/libs/spirit/phoenix]]
 
+[/ Internal Spirit doc link shortcuts]
+[def __lens__ [$../theme/lens.gif]]
+[def __note__ [$../theme/note.gif]]
+
 [def __boost_ref__ [@http://www.boost.org/libs/ref boost::ref]]
 [def __phoenix_var__ [@http://www.boost.org/libs/spirit/phoenix/doc/variables.html phoenix::var]]
 
@@ -94,18 +98,22 @@
 [include scanner.qbk]
 
 The Grammar
+[include grammar.qbk]
 
 Subrules
 [include subrules.qbk]
 
 Semantic Actions
+[include semantic_actions.qbk]
 
 In-depth: The Parser
+[include in_depth/the_parser.qbk]
 
 In-depth: The Scanner
 [include in_depth/the_scanner.qbk]
 
 In-depth: The Parser Context
+[include in_depth/the_parser_context.qbk]
 
 [endsect][/ core]
 

