
Boost-Commit :

From: hartmut.kaiser_at_[hidden]
Date: 2008-07-08 14:37:32


Author: hkaiser
Date: 2008-07-08 14:37:32 EDT (Tue, 08 Jul 2008)
New Revision: 47245
URL: http://svn.boost.org/trac/boost/changeset/47245

Log:
Spirit: docs
Binary files modified:
   branches/release/libs/spirit/doc/html/images/spiritkarmaflow.png
   branches/release/libs/spirit/doc/html/images/spiritstructure.png
Text files modified:
   branches/release/libs/spirit/doc/introduction.qbk | 155 ++++++++++++++++++++++++++++++++++++---
   1 files changed, 140 insertions(+), 15 deletions(-)

Modified: branches/release/libs/spirit/doc/html/images/spiritkarmaflow.png
==============================================================================
Binary files. No diff available.

Modified: branches/release/libs/spirit/doc/html/images/spiritstructure.png
==============================================================================
Binary files. No diff available.

Modified: branches/release/libs/spirit/doc/introduction.qbk
==============================================================================
--- branches/release/libs/spirit/doc/introduction.qbk (original)
+++ branches/release/libs/spirit/doc/introduction.qbk 2008-07-08 14:37:32 EDT (Tue, 08 Jul 2008)
@@ -11,11 +11,15 @@
 Boost Spirit is an object oriented, recursive-descent parser and output generation
 library for C++. It allows to write grammars and format descriptions using a
 format very similar to EBNF (Extended Backus Naur Form, see [4]) directly in
-C++. It allows to describe the input structure and the output format
-specification in a very similar way, and based on a single syntax and semantics.
+C++. These inline grammar specifications can mix freely with other C++ code and,
+thanks to the generative power of C++ templates, are immediately executable.
+In contrast, conventional compiler-compilers or parser-generators have to
+perform an additional translation step from the source EBNF code to C or C++
+code.
+
 The syntax and semantics of the libraries API directly form domain specific
-languages (DSEL - domain specific languages). In fact, Spirit exposes 3
-different DSEL's to the user:
+embedded languages (DSEL). In fact, Spirit exposes 3 different DSELs to the
+user:
 
 * one for creating parser grammars,
 * one for the specification of the required tokens to be used for parsing,
@@ -24,7 +28,7 @@
 Since the target input grammars and output formats are written entirely in C++
 we do not need any separate tools to compile, preprocess, or integrate those
 into the build process. __spirit__ allows seamless integration of the parsing
-and output gerenation process with other C++ code. Often this allows for
+and output generation process with other C++ code. Often this allows for
 simpler and more efficient code.
 
 Both, the created parsers and generators, are fully attributed which allows to
@@ -32,9 +36,7 @@
 structures resemble the structure of the input data and can directly be used to
 generate arbitrarily formatted output.
 
-Immediately executable
-
-The [link spirit.spiritstructure picture] below depicts the overall structure
+The [link spirit.spiritstructure figure] below depicts the overall structure
 of the Boost Spirit library. The library consists out of 4 major parts:
 
 * __classic__: This is the almost unchanged code base taken from the
@@ -58,15 +60,138 @@
 with any of the other parts. Because of their similar structure and identical
 underlying technology these are usable either separately or together at the
 same time. For instance is it possible to directly feed the hierarchical data
-structures generated by __qi__ into output generators created using __karma__.
+structures generated by __qi__ into output generators created using __karma__;
+or to use the token sequence generated by __lex__ as the input for a parser
+generated by __qi__.
 
 
-The [link spirit.spiritkarmaflow picture] below shows the typical data flow of
+The [link spirit.spiritkarmaflow figure] below shows the typical data flow of
 some input being converted to some internal representation. After some
-(optional) transformation this data is converted back into some external
-representation. The picture highlights the place in this data transformation
-flow where __spirit__ can be used.
-
-[fig ./images/spiritkarmaflow.png..The place of __qi__ and __karma__ in a typical data transformation application..spirit.spiritkarmaflow]
+(optional) transformation this data is converted back into some different,
+external representation. The figure highlights Spirit's place in this data
+transformation flow.
+
+[fig ./images/spiritkarmaflow.png..The place of __qi__ and __karma__ in a data transformation flow of a typical application..spirit.spiritkarmaflow]
+
+[heading A quick overview about Parsing with __qi__]
+
+__qi__ is Spirit's sublibrary dealing with generating parsers based on a given
+target grammar (essentially a format description of the input data to read).
+
+A simple EBNF grammar snippet:
+
+ group ::= '(' expression ')'
+ factor ::= integer | group
+ term ::= factor (('*' factor) | ('/' factor))*
+ expression ::= term (('+' term) | ('-' term))*
+
+is approximated using facilities of Spirit's /Qi/ sublibrary as seen in this
+code snippet:
+
+ group = '(' >> expression >> ')';
+ factor = integer | group;
+ term = factor >> *(('*' >> factor) | ('/' >> factor));
+ expression = term >> *(('+' >> term) | ('-' >> term));
+
+Through the magic of expression templates, this is perfectly valid and
+executable C++ code. The production rule `expression` is in fact an object that
+has a member function `parse` that does the work, given source text written in
+the grammar that we have just declared. Yes, it's a calculator. We shall
+simplify for now by skipping the type declarations and the definition of the
+rule `integer` invoked by `factor`. Now, the production rule `expression` in our
+grammar specification, traditionally called the start symbol, can recognize
+inputs such as:
+
+ 12345
+ -12345
+ +12345
+ 1 + 2
+ 1 * 2
+ 1/2 + 3/4
+ 1 + 2 + 3 + 4
+ 1 * 2 * 3 * 4
+ (1 + 2) * (3 + 4)
+ (-1 + 2) * (3 + -4)
+ 1 + ((6 * 200) - 20) / 6
+ (1 + (2 + (3 + (4 + 5))))
+
+Admittedly, we have made some modifications to the original EBNF syntax. This
+is done to conform to C++ syntax rules. Most notably we see the abundance of
+shift `>>` operators. Since there are no 'empty' operators in C++, it is simply
+not possible to write something like:
+
+ a b
+
+as seen in math syntax, for example, to mean multiplication or, in our case,
+as seen in EBNF syntax to mean sequencing (b should follow a). Spirit
+uses the shift `>>` operator instead for this purpose. We take the `>>` operator,
+with arrows pointing to the right, to mean "is followed by". Thus we write:
+
+ a >> b
+
+The alternative operator `|` and the parentheses `()` remain as is. The
+assignment operator `=` is used in place of EBNF's `::=`. Last but not least,
+the Kleene star `*` which used to be a postfix operator in EBNF becomes a
+prefix. Instead of:
+
+ a* //... in EBNF syntax,
+
+we write:
+
+ *a //... in Spirit.
+
+since there are no postfix stars, `*`, in C/C++. Finally, we terminate each
+rule with the ubiquitous semi-colon, `;`.
+
+
+[heading A quick overview about Output Generation with __karma__]
+
+Spirit does not only allow describing the structure of the input. Starting with
+version 2.0 it also enables the specification of the output format for your data
+in a very similar way, and based on a single syntax and compatible semantics.
+
+Let's assume we need to generate a textual representation from a simple data
+structure such as a `std::vector<int>`. Conventional code probably would look like:
+
+ std::vector<int> v (initialize_and_fill());
+ std::vector<int>::iterator end = v.end();
+ for (std::vector<int>::iterator it = v.begin(); it != end; ++it)
+     std::cout << *it << std::endl;
+
+which is not very flexible and quite difficult to maintain when it comes to
+changing the required output format. Spirit's sublibrary /Karma/ allows to
+specify output formats for arbitrary data structures in a very flexible way.
+The following snippet is the /Karma/ format description used to create the very
+same output as the traditional code above:
+
+ *(int_ << eol)
+
+Here are some more examples of format descriptions for different output
+representations of the same `std::vector<int>`:
+
+[table Different output formats for `std::vector<int>`
+ [ [Format] [Example] [Description] ]
+ [ [`'[' << *(int_ << ',') << ']'`] [`[1,8,10,]`] [Comma separated list of integers] ]
+ [ [`*('(' << int_ << ')' << ',')`] [`(1),(8),(10),`] [Comma separated list of integers in parentheses] ]
+ [ [`*hex`] [`18a`] [A list of hexadecimal numbers] ]
+ [ [`*(double_ << ',')`] [`1.0,8.0,10.0,`] [A list of floating point numbers] ]
+]
+
+The syntax is very similar to /Qi/ with the exception that we use the `<<`
+operator for output concatenation. This should be easy to understand as it
+follows the conventions used in the Standard's I/O streams.
+
+Another important feature of /Karma/ is that it fully decouples the data
+type from the output format. You can use the same output format with different
+data types as long as these conform conceptually. The next table gives some
+related examples.
+
+[table Different data types usable with the output format `*(int_ << eol)`
+ [ [Data type] [Description] ]
+ [ [`int i[4]`] [C style arrays] ]
+ [ [`std::vector<int>`] [Standard vector] ]
+ [ [`std::list<int>`] [Standard list] ]
+ [ [`boost::array<long, 20>`] [Boost array] ]
+]
 
 [endsect]

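For readers who want to see what the calculator grammar in the diff above recognizes without Spirit's type declarations (which the introduction deliberately skips), here is a hand-written recursive-descent recognizer in plain standard C++. It is only a sketch and not Spirit code: the class `calculator_recognizer`, the free function `recognize`, the whitespace skipping, and the definition of `integer` as an optionally signed run of digits are all assumptions made for illustration. Each private member function mirrors one production rule of the EBNF snippet.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical hand-written recognizer; NOT part of Boost.Spirit.
// One member function per production rule of the EBNF grammar.
class calculator_recognizer {
public:
    explicit calculator_recognizer(const std::string& input)
        : text_(input), pos_(0) {}

    // True only if the complete input matches the start symbol `expression`.
    bool recognize() {
        pos_ = 0;
        if (!expression()) return false;
        skip_space();
        return pos_ == text_.size();
    }

private:
    void skip_space() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_])))
            ++pos_;
    }

    // Consume the literal character c, skipping leading whitespace.
    bool match(char c) {
        skip_space();
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    // integer ::= ['+' | '-'] digit+   (assumed; the source leaves it undefined)
    bool integer() {
        skip_space();
        const std::size_t start = pos_;
        if (pos_ < text_.size() && (text_[pos_] == '+' || text_[pos_] == '-'))
            ++pos_;
        const std::size_t first_digit = pos_;
        while (pos_ < text_.size() &&
               std::isdigit(static_cast<unsigned char>(text_[pos_])))
            ++pos_;
        if (pos_ == first_digit) { pos_ = start; return false; }
        return true;
    }

    // group ::= '(' expression ')'
    bool group() {
        const std::size_t start = pos_;
        if (match('(') && expression() && match(')')) return true;
        pos_ = start;  // backtrack on failure
        return false;
    }

    // factor ::= integer | group
    bool factor() { return integer() || group(); }

    // term ::= factor (('*' factor) | ('/' factor))*
    bool term() {
        if (!factor()) return false;
        for (;;) {
            const std::size_t save = pos_;
            if ((match('*') || match('/')) && factor()) continue;
            pos_ = save;  // the repetition matched zero more times
            return true;
        }
    }

    // expression ::= term (('+' term) | ('-' term))*
    bool expression() {
        if (!term()) return false;
        for (;;) {
            const std::size_t save = pos_;
            if ((match('+') || match('-')) && term()) continue;
            pos_ = save;
            return true;
        }
    }

    std::string text_;
    std::size_t pos_;
};

// Convenience wrapper (hypothetical name).
bool recognize(const std::string& input) {
    return calculator_recognizer(input).recognize();
}
```

With these assumptions, `recognize("(1 + 2) * (3 + 4)")` returns true, while an incomplete input such as `"1 +"` is rejected because the trailing operator is left unconsumed. The explicit backtracking in `term` and `expression` is roughly what the prefix Kleene star `*` expresses in the Spirit version of the grammar.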

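The point made in the diff about decoupling the data type from the output format can be illustrated without Karma itself. The following is a minimal sketch in standard C++ (C++11 or later is assumed for the range-based `for`); the function template `format_lines` is a hypothetical helper, not a Karma API. It writes each element followed by a newline, which is what the format `*(int_ << eol)` is described as producing, and it accepts any of the sequence types listed in the last table.

```cpp
#include <list>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, NOT part of Boost.Spirit.Karma.
// Emits every element of a sequence followed by a newline. The format is
// written once and is independent of the concrete container type.
template <typename Sequence>
std::string format_lines(const Sequence& seq) {
    std::ostringstream out;
    for (const auto& element : seq)  // works for C arrays and containers
        out << element << '\n';
    return out.str();
}
```

The same call then works unchanged for `int i[4]`, `std::vector<int>`, and `std::list<int>`; only the argument type differs. Karma goes further than this sketch because the format itself is a composable expression rather than a fixed loop body.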