From: rogeeff (rogeeff_at_[hidden])
Date: 2002-01-17 02:18:06

--- In boost_at_y..., "David A. Greene" <greened_at_e...> wrote:
> rogeeff wrote:
> > First of all I would like to say that IMO it is odd even to
> > an ability to use Spirit for generic command line parser. It's
> > use a cannon to kill a fly. For one, it is very expensive and heavy
> > also I should drag it all over the place.
> How do you know this? I cannot make a judgement because I haven't
> used Spirit (yet). But I know from reading the Spirit mailing list
> that Joel, et. al. have put in lots of thought on how to keep things
> lightweight.

What I meant is that adding the line #include "boost/spirit/spirit.hpp"
to your code immediately pulls in ~600k of include files (and that is
without counting standard headers). And what if I need to access CLA
in several or all files of my project? My estimate is that a CLA
parser should take about 20 times less: since CLA parsing is not
performance critical, the library should be implemented mostly
offline, i.e. out of the headers. By default the parser should be able
to handle integer, floating-point, string and boolean (flag) values,
and probably also offer some support for collections of them. I hope
you agree that I do not need Spirit to parse an integer value from a
string. The framework should also let the user define his own CLA
class with its own parsing logic. There he (the user) can use whatever
means he prefers to implement it (tokenizer, regexp, handwritten code,
Spirit). But this is not a part of the CLA parser framework - it's
user code. There are also several other points:

* I was not able to find a portability report for Spirit. Since CLA
parsing is a very basic facility, I should be able to compile it on
the majority of compilers.
* I could be wrong, but Spirit seems to be a static, compile-time
facility. I.e. I can't load a CLA scheme dynamically or read it from
a configuration file. Also, how would it handle distributed
definitions?
* Even if I do not load parser rules from an external file, I could
still be in a situation where I do not know the parsing rules at
compile time, because I am a library developer and the parsing rules
are provided by my users.

Spirit is a parser framework. Command line/configuration processing
is a different realm with different rules and priorities. I do not say
that one can't implement a concrete CLA parser using Spirit. I just
question its ability to be used for a generic framework.
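
To make the point about built-in value types concrete, here is a
minimal sketch that converts a single argument token using nothing but
the standard library. parse_value is a hypothetical name of mine, not
an existing Boost interface, and the accepted flag spellings are an
assumption.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: convert one command-line token to T using only
// the standard library. Returns false if the token is not a complete T.
template <typename T>
bool parse_value(const std::string& token, T& out)
{
    std::istringstream in(token);
    T tmp;
    if (!(in >> tmp))
        return false;
    char extra;
    if (in >> extra)          // reject trailing garbage such as "42abc"
        return false;
    out = tmp;
    return true;
}

// Strings are taken verbatim (operator>> would stop at whitespace).
inline bool parse_value(const std::string& token, std::string& out)
{
    out = token;
    return true;
}

// Flags: accept a few common spellings (an assumption of this sketch).
inline bool parse_value(const std::string& token, bool& out)
{
    if (token == "1" || token == "true" || token == "yes") {
        out = true;
        return true;
    }
    if (token == "0" || token == "false" || token == "no") {
        out = false;
        return true;
    }
    return false;
}
```

A user-defined CLA class would then only need to supply its own
overload or functor with the same shape; how it parses internally
(tokenizer, regexp, Spirit) stays its own business.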

> >>Nope. Spirit is targetted at any C++ programmer who needs to do
> >>parsing, which is most of them. Even more so now in the age of
> >>internet and all it's protocols which need to be parsed.
> >>
> > So now in any place where we were using tokenizer or regular
> > expression I should pile up Spirit.
> No, not always. Sometimes you just want a tokenizer to work with
> data or some similar thing like /etc/passwd data. There is really
> no grammar there (or rather, it is a very simple one).
> For almost anything non-trivial (read, not a delimited set of
> one usually ends up wanting a formalized grammar specification
> and a parser to go along with it, just to make things simpler in the
> long run. This is my personal experience, so take it with a
> grain of salt.

I agree with you. Every task has its own tools to solve it. Moreover,
I am also thinking about using Spirit for some of my parsing purposes.
If only I were able to compile it with my compiler ;-))

> >>If a programmer doesn't know EBNF, then they are missing an
> >>important piece of knowledge, since it's the standard for computer
> >>language definition.
> >
> > Are you so sure? How is it in reality?
> It's a standard, all right. It's also well-known. Spirit deviates
> from it in various areas. Tools like YACC have done the same thing
> in the past. But Spirit is "close enough" for people familiar with
> EBNF or YACC to make due.
> Parsing is a common and important task in computer science.
> Rudimentary familiarity with the available tools is a must. It's
> not terribly difficult to grasp.

I would assume that there are a lot of programmers who never had a
need to parse a formal grammar so complex that they would need YACC
or even simply knowledge of EBNF, though I do not question its value.

> >>Spirit is aimed to be a general parsing framework. It is not
> >>aimed to be a simple command line parser.
> >>
> >
> > That's the point.
> You missed the point. Spirit is flexible enough for many, many
> parsing tasks, including implementation of the command-line parser.
> One need not expose the Spirit interface to the programmer. But it
> makes a great deal of sense to me to use Spirit to do the actual
> parsing.

I did not get it. What will provide the interface, and where do you
see a place for Spirit? Please be specific, with an example.

> >>structured in such a way, that you only pay for what you use.
> >
> > How much line of includes it will add to use a Spirit to parse a
> > or less complex command line?
> Depends what "complex" means, I suppose. I think people get way too
> caught up arguing about overhead for command-line parsing. It's a
> one-shot job so performance shouldn't be an issue. Size is
> a concern. I don't have a feel for how Spirit scales in that sense.
> It would be nice to have some numbers. Joel, do the Spirit guys
> working on command-line parsers have any size numbers they can

There are several questions:

1. How portable is it?
2. How does it affect compilation time?
3. How does it affect code size?

Let us not forget that this framework is supposed to fit the majority
of programs, from a tiny test program to a complex, bulky process.

> >>sense (to me anyway) to use spirit internally for the command
> >>parser implementation. But, that still restricts the user to
> >>whatever option format(s) the command line parser supports.
> >
> > Why would I want to use the Spirit in the implementation? Rather
> > regexp or tokenizer? Is it that flexible that I can implement
> > arbitrary parsing with it?
> I don't know what you mean by "arbitrary parsing." Spirit is at
> least as flexible as YACC (well, except for left-recursion,
> probably :)).

How about error handling? What if I want to ignore an error and
proceed? How would the one-liner below handle it?

> regexp and tokenizer don't necessarily have enough power to do
> the job. Consider the option format we use in our software:
> --option1={--nestedOption1=value1
> --nestedOption2={--nestedNestedOption1
> --nestedNestedOption2=value2}
> --nestedOption3}
> Tokenizer ain't gonna help much with that. Regex won't either,
> because we need to be able to match a _tree_, not a linear sequence
> of characters. We also need to be able to perform various
> actions during the matching or produce some structure (i.e. a
> parse tree or AST) that allows us to post-process the matched
> input.
> Sure, it's _possible_ to parse with tokenizer and regex, in the
> same sense that it's _possible_ to write "OO" code in assembler.
> With tokenizer and regexp you'll end up writing a highly specialized
> semantic action framework -- a framework that is already available
> in Spirit.
> For simple command-line specifications tokenizer may be sufficient.
> It isn't for us. If the cost isn't too high (and that remains to
> be seen), I see no reason not to use Spirit.

As I said above, you certainly must be able to implement your
own "very complex" parsing and somehow plug it into the framework.
But it should not be part of the framework.
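
For illustration only, this is roughly what such user-side code could
look like for the nested format quoted above, written by hand as a
small recursive-descent parser. The option struct, the class name and
the exact grammar (whitespace-separated options, values without
spaces) are my assumptions, not a description of Dave's software.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical tree node for the nested option format.
struct option
{
    std::string name;
    std::string value;             // empty for flags and nested groups
    std::vector<option> children;  // filled when the value is {...}
};

class nested_option_parser
{
public:
    explicit nested_option_parser(const std::string& text)
        : s_(text), pos_(0) {}

    // Parses the whole input; returns false on a syntax error.
    bool parse(std::vector<option>& out)
    {
        return parse_list(out) && pos_ == s_.size();
    }

private:
    static bool is_space(char c)
    {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }

    void skip_space()
    {
        while (pos_ < s_.size() && is_space(s_[pos_]))
            ++pos_;
    }

    // list := { option }, ending at '}' or at the end of input
    bool parse_list(std::vector<option>& out)
    {
        for (;;) {
            skip_space();
            if (pos_ == s_.size() || s_[pos_] == '}')
                return true;
            option o;
            if (!parse_option(o))
                return false;
            out.push_back(o);
        }
    }

    // option := "--" name [ "=" ( "{" list "}" | value ) ]
    bool parse_option(option& o)
    {
        if (s_.compare(pos_, 2, "--") != 0)
            return false;
        pos_ += 2;
        o.name = take_word();
        if (o.name.empty())
            return false;
        if (pos_ == s_.size() || s_[pos_] != '=')
            return true;                       // plain flag
        ++pos_;                                // skip '='
        if (pos_ < s_.size() && s_[pos_] == '{') {
            ++pos_;
            if (!parse_list(o.children))
                return false;
            if (pos_ == s_.size() || s_[pos_] != '}')
                return false;                  // unbalanced group
            ++pos_;
            return true;
        }
        o.value = take_word();
        return !o.value.empty();
    }

    // A word runs until whitespace, '=', '{' or '}'.
    std::string take_word()
    {
        std::string::size_type start = pos_;
        while (pos_ < s_.size() && !is_space(s_[pos_]) &&
               s_[pos_] != '=' && s_[pos_] != '{' && s_[pos_] != '}')
            ++pos_;
        return s_.substr(start, pos_ - start);
    }

    std::string s_;
    std::string::size_type pos_;
};
```

Whether the body of such a rule is handwritten like this or built
with Spirit is exactly the choice I would leave to the user.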

> >>Learning how to use spirit is no different than learning a new
> >>or library.
> > I would assume that command-line parser still will have MUCH more
> > simpler interface.
> And by trading off flexibility for simplicity that parser can
> still have the same interface but be implemented with Spirit.
> If we want to get really fancy we can augment the command-line
> interface to allow more complex specifications, allowing the
> library parser to be extended. I think this is crucial for
> any command-line processor that is included in a library,
> especially one like Boost. Honestly, how many of us actually
> use getopt() regularly for anything but the most trivial
> utilities?

I did not say that I agree with any flexibility tradeoff. But the
interface should be as simple as possible: 1. plug in a parsing rule,
2. parse, 3. get the value. A couple of predefined parsing rules, e.g.
for integer, string etc., plus an ability to plug in an arbitrary
user-defined parsing rule. There could be variations and some
enhancements, but something along these lines (in reality you would
also want the framework to support several predefined kinds of
argument identification for the user to choose from).
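
The three steps above could be sketched as follows. Everything here
(cla_parser, int_rule, string_rule, the --name=value argument form) is
hypothetical and only meant to show the shape of the interface; rules
are registered at run time, and "get value" is reduced to reading the
variable a rule was bound to.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical interface: 1. plug in a rule, 2. parse, 3. get the value.
class cla_parser
{
public:
    // A rule receives the raw text of an argument and reports success.
    typedef std::function<bool(const std::string&)> rule;

    // Step 1: register a rule under an argument name (at run time).
    void add(const std::string& name, rule r) { rules_[name] = r; }

    // Step 2: parse arguments of the assumed form --name=value.
    // Returns false on the first unknown or malformed argument.
    bool parse(const std::vector<std::string>& args) const
    {
        for (const std::string& a : args) {
            if (a.compare(0, 2, "--") != 0)
                return false;
            const std::string::size_type eq = a.find('=');
            const bool has_value = eq != std::string::npos;
            const std::string name =
                has_value ? a.substr(2, eq - 2) : a.substr(2);
            const std::string value =
                has_value ? a.substr(eq + 1) : std::string();
            std::map<std::string, rule>::const_iterator it =
                rules_.find(name);
            if (it == rules_.end() || !it->second(value))
                return false;
        }
        return true;
    }

private:
    std::map<std::string, rule> rules_;
};

// Predefined rule: store an integer into a variable owned by the caller.
inline cla_parser::rule int_rule(int& target)
{
    return [&target](const std::string& text) -> bool {
        char* end = 0;
        const long v = std::strtol(text.c_str(), &end, 10);
        if (text.empty() || *end != '\0')
            return false;
        target = static_cast<int>(v);
        return true;
    };
}

// Predefined rule: store the text unchanged.
inline cla_parser::rule string_rule(std::string& target)
{
    return [&target](const std::string& text) -> bool {
        target = text;
        return true;
    };
}
```

Step 3 is then just reading the bound variables after parse()
succeeds. A user-defined rule is any callable with the same signature,
so it can wrap a tokenizer, a regexp or a Spirit grammar without the
framework knowing about it.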

> >> Spirit makes parsing incredibly easy. Say, for instance you
> >>had to write a function that would parse a complex number of the
> >>form:
> >>real, (real), or (real,imaginary) and store the real and
> >>parts in 2 doubles:
> >>With spirit, it's a one liner:
> >>
> >>return (
> >> real_p[ref(real)]
> >> | '('
> >> >> real_p[ref(real)]
> >> >> !(',' >> real_p[ref(imaginary)])
> >> >> ')'
> >> ).parse(str, str+strlen(str));
> > I would say that it will take at least 10 min for maintenance
> > programmer to grasp what is written here (and this is not
> > understanding how its working).
> I agree Spirit looks a little cryptic. In particular the assignment
> of values is rather "magical" ("ref" should probably be named
> "assign_to"). But even so, as someone who has experience with YACC
> but zero with Spirit, I can follow this and understand what it means
> (except for the bang, which I had to look up, but it makes sense
> if you consider it an "|" with an empty left operand).
> Now that I've taken 5 minutes of my time to look up two things
> I wasn't completely familiar with, I can understand almost any
> Spirit grammar.

How many programmers are familiar with YACC, and how many would use
a CLA parser?

> > I would implement the same logic in 2-3 lines using tokenizer or
> > regexp. Something like this:
> > token_iterator it( str, " \t,()" );
                                 ^- I did not notice these in your
one-liner originally

> >
> > real = lexical_cast<double>( *it++ );
> > imaginary = lexical_cast<double>( *it );
> Well, complex numbers are not a really good example to show why a
> parser is useful because they are just a CSV structure with some
> optional syntactic sugar on the ends.
> Even so, this tokenizer example is at least as hard to understand
> as the Spirit example. How are parens and the comma ignored? All
> I see in a quick glance through the tokenizer documentation is that
> by default "punctuation" is skipped. What "punctuation" means is
> not immediately obvious, nor is a definition of the default
> punctuation set easily found.
> I don't see a token_iterator(char *, char *) constructor. I don't
> even see a "token_iterator" declared anywhere. Are you sure your
> example works? Am I missing something?

Wait, wait. It is not the Boost tokenizer (it is my simple
token_iterator that I am using for an old Sun compiler that can't
handle the Boost one). Given the constructor
    token_iterator( const_string string_to_tokenize,
                    const_string delimiters)

it should be pretty easy to understand what is written there.
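
For readers without that class at hand, a minimal stand-in could look
like the sketch below. It is not my actual implementation: std::string
replaces const_string, at_end() and parse_complex() are names invented
for this example, and std::atof stands in for lexical_cast.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for the two-argument token_iterator.
class token_iterator
{
public:
    token_iterator(const std::string& text, const std::string& delimiters)
        : text_(text), delims_(delimiters), begin_(0), end_(0)
    {
        advance();
    }

    // True once all tokens are consumed; do not dereference then.
    bool at_end() const { return begin_ == std::string::npos; }

    std::string operator*() const
    {
        return text_.substr(begin_, end_ - begin_);
    }

    token_iterator& operator++() { advance(); return *this; }

    token_iterator operator++(int)
    {
        token_iterator old(*this);
        advance();
        return old;
    }

private:
    // Skip delimiters, then find where the next token ends.
    void advance()
    {
        begin_ = text_.find_first_not_of(delims_, end_);
        if (begin_ == std::string::npos)
            return;
        end_ = text_.find_first_of(delims_, begin_);
        if (end_ == std::string::npos)
            end_ = text_.size();
    }

    std::string text_;
    std::string delims_;
    std::string::size_type begin_;
    std::string::size_type end_;
};

// Reads "real", "(real)" or "(real,imaginary)"; imaginary defaults to 0.
inline bool parse_complex(const std::string& str,
                          double& real, double& imaginary)
{
    token_iterator it(str, " \t,()");
    if (it.at_end())
        return false;
    real = std::atof((*it++).c_str());
    imaginary = it.at_end() ? 0.0 : std::atof((*it).c_str());
    return true;
}
```

Note that this only splits on the delimiters and does not check that
the parentheses balance or that the tokens are well-formed numbers, so
it accepts more inputs than the Spirit grammar does.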

> Tokenizer and regexp are essentially scanners. Doing something
> moderately interesting usually requires something more to make
> things easier.
> -Dave
> --
> "Some little people have music in them, but Fats, he was all music,
> and you know how big he was." -- James P. Johnson

