From: David A. Greene (greened_at_[hidden])
Date: 2002-01-17 12:06:32


rogeeff wrote:

>>used Spirit (yet). But I know from reading the Spirit mailing list
>>that Joel, et al. have put in lots of thought on how to keep things
>>lightweight.
>
> What I meant is adding line #include "boost/spirit/spirit.hpp" in
> your code immediately produces ~600k of include files (and this

Besides compile time, which I admit could be significant depending
on what bits of Spirit are being used, what's the problem here?

> implemented. Now by default parser should be able to handle integer
> types, floating point values, strings, boolean values (flags) and
> probably also some support for collection of them. I hope you agree
> that I do not need Spirit to parse integer value from string. Also

The values are not the problem. The problem is the myriad of
command-line formats. Is there an '=' between the option name
and value, a space, a comma, nothing? Is this a single letter option?
A multi-character option? One dash or two? Is any nesting involved?
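
To make the variety of formats concrete, here is a minimal sketch that classifies a single argument token. The names `OptionForm`, `ParsedToken`, and `classify` are hypothetical and come from no existing library. Values separated from the name by a space, comma-separated lists, and nesting are deliberately left out.

```cpp
#include <string>

// Hypothetical classification of one argv token; all names are illustrative.
enum class OptionForm { Positional, ShortFlag, ShortWithValue, LongFlag, LongWithValue };

struct ParsedToken {
    OptionForm form;
    std::string name;   // option name without leading dashes
    std::string value;  // attached value, or the token itself if positional
};

ParsedToken classify(const std::string &tok) {
    // No leading dash, or a lone "-": treat as a positional argument.
    if (tok.size() < 2 || tok[0] != '-')
        return {OptionForm::Positional, "", tok};

    if (tok[1] == '-') {
        // Two dashes: --name or --name=value
        std::string body = tok.substr(2);
        std::string::size_type eq = body.find('=');
        if (eq == std::string::npos)
            return {OptionForm::LongFlag, body, ""};
        return {OptionForm::LongWithValue, body.substr(0, eq), body.substr(eq + 1)};
    }

    // One dash: -x, -xVALUE or -x=VALUE
    std::string name(1, tok[1]);
    std::string rest = tok.substr(2);
    if (rest.empty())
        return {OptionForm::ShortFlag, name, ""};
    if (rest[0] == '=')
        rest.erase(0, 1);
    return {OptionForm::ShortWithValue, name, rest};
}
```

Even this small subset needs five cases, and every further convention (space-separated values, grouped single-letter flags, nested options) adds more branches to hand-written code.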

> the framework should support an ability for user define its own CLA
> class, with its own parsing logic. And here he (the user) can use
> whatever means he prefers to implement it (tokenizer, regexp,
> handwritten code, Spirit). But this is not a part of CLA parser
> framework - it's user code. There are also several other points:

But it _is_ part of the framework. I'd imagine a user would want
his or her extensions to blend nicely with the existing tools.
Spirit allows that.

> * I was not able to find out portability report for Spirit. Since CLA
> parsing is very basic facility, I should be able to compile it on
> majority of compilers.

This _is_ an issue with Spirit. The developers are working on it.
IMHO braindead compilers should not direct the design of libraries,
though of course every effort should be made to support them if
at all possible.

> * I could be wrong, but Spirit seems to be static compile-time
> facility. I.e. I can't load CLA scheme dynamically or read it from
> configuration file. Also how would it distributed definitions?

Spirit is not static. Dynamic grammars are possible.

> * Even if I do not load parser rules from external file, I still
> could be in a situation when I do not know parsing rules at my
> compile time, cause I am a library developer and parsing rules are
> provided by my users.

Er...huh? What's the issue here?

> Spirit is a parser framework. Command line/Configuration processing
> is a different realm with different rules and priorities.

No, it isn't. A sufficiently rich command-line scheme almost
certainly requires more than a tokenizer. It's unfortunate that
most people, when they hear "parser," think "compiler." I'd guess
that 99% of parser usage is completely outside the realm of
language translation. Unfortunately, many of those parsers are
hand-coded and fragile.

> I would assume that there are a lot of programmers that never had a
> need to parse a formal grammar that complex that they would need YACC
> or even simply knowledge of EBNF, though I do not question its value.

My experience is that most programmers aren't familiar with YACC,
etc. and waste time writing custom parsers. Then when they discover
the available tools they either wish they'd had them earlier or
rewrite their software to use them.

>>You missed the point. Spirit is flexible enough for many, many
>>parsing tasks, including implementation of the command-line parser.
>>One need not expose the Spirit interface to the programmer. But it
>>makes a great deal of sense to me to use Spirit to do the actual
>>parsing.
>
> I did not get it. What will provide an interface and where do you see
> a place for Spirit? Specifically, with example.

Well, off the top of my head, I can imagine this (note: this is
probably not correct Spirit-wise since I've not yet used Spirit,
but it gives a general idea):

class CommandLine {
...
public:
    // Implemented with Spirit
    match parse(int argc, char **argv) const;

    template<class Val>
    void addOption(const std::string &name, Val &valueToSet);

    // Extend the parser in new and interesting ways
    ruleTag addRule(rule<> &newOptionConstruct);
    void removeRule(ruleTag ruleToRemove);
};

This is really off-the-cuff -- something better should be
provided. Here the only reason the programmer needs know
about Spirit is when invoking the addRule member.
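
As a rough illustration of how the `addOption` half of such an interface could bind option names to variables, here is a minimal sketch. It assumes a hypothetical class `CommandLineSketch`, and it swaps Spirit for hand-rolled matching that accepts only `--name=value`; the parsing step is the part the proposal would hand to Spirit. `addRule` and `removeRule` are omitted.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for the CommandLine interface; not a real library class.
// The matching below is hand-written and only understands "--name=value".
class CommandLineSketch {
public:
    // Bind an option name to a variable; the value text is converted with operator>>.
    template <class Val>
    void addOption(const std::string &name, Val &valueToSet) {
        setters_[name] = [&valueToSet](const std::string &text) -> bool {
            std::istringstream in(text);
            in >> valueToSet;
            return !in.fail();
        };
    }

    // True only if every argument is a known "--name=value" whose value converts.
    bool parse(int argc, const char *const *argv) const {
        for (int i = 1; i < argc; ++i) {
            std::string arg(argv[i]);
            std::string::size_type eq = arg.find('=');
            if (arg.compare(0, 2, "--") != 0 || eq == std::string::npos)
                return false;
            auto it = setters_.find(arg.substr(2, eq - 2));
            if (it == setters_.end() || !it->second(arg.substr(eq + 1)))
                return false;
        }
        return true;
    }

private:
    // Type-erased setters, one per registered option name.
    std::map<std::string, std::function<bool(const std::string &)>> setters_;
};
```

A caller would register variables with `addOption("depth", depth)` and then call `parse(argc, argv)`; the bound variables must outlive the parser object because the setters hold references to them.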

> There are several questions:
>
> 1. How portable is it?
> 2. How does it affect compilation time?
> 3. How does it affect code size?

These are certainly valid questions. The only way we're going to
answer them is to experiment.

> Let's not forget that this framework is supposed to fit for majority
> of programs from tiny test program to complex and bulk process.

I was thinking about this the other day. Tiny test programs are
generally the ones that require very simple option syntaxes. Larger
programs with their many options require something more heavyweight.
Perhaps a CLA library should provide both. To start out, though, my
leaning would be toward implementing all of it in Spirit and then
moving to a specialized "simple" CLA class if that proves necessary.

>>I don't know what you mean by "arbitrary parsing." Spirit is at
>>least as flexible as YACC (well, except for left-recursion,
>>probably :)).
>
> How about error handling? what if I want to ignore an error and
> proceed. How one-liner below would handle it?

Don't read too much into Dan's example. Of course it doesn't
have error reporting. That would be placed in the semantic
actions.

 
>>regexp and tokenizer don't necessarily have enough power to do
>>the job. Consider the option format we use in our software:

> As I said above you sure have to have an ability to implement your
> own "very complex" parsing and somehow plug it into the framework.
> But it should not be part of the framework.

Maybe. As long as it can be extended to accomplish what we
need, then I guess it's not a big deal if CLA proper doesn't
provide it. We can release our extensions as CLA++ or
something. :)

In any case, doing that sort of extension with Spirit is very,
very easy, unlike with tools such as YACC. One simply subtracts
rules, adds rules, etc.

>>>I would assume that command-line parser still will have MUCH more
>>>simpler interface.
>>
>>And by trading off flexibility for simplicity that parser can
>>still have the same interface but be implemented with Spirit.

> I did not say that I agree with any flexibility tradeoff. But the
> interface should be as simple as possible: 1. plug parsing rule, 2.
> parse, 3. get value. Couple predefined parsing rules, like for
> integer, string, etc., plus an ability to plug arbitrary
> user-defined parsing rule. There could be variations and some
> enhancements, but something around this (in reality you would also
> want the framework to support several predefined kinds of argument
> identification for user to choose from).

Sounds good to me, though I don't necessarily agree that "get value"
should be the sole interface for doing things with options. Often
I want to execute an arbitrary piece of code when I parse an option
(or process it later in an abstract syntax tree, etc.).

Validation is also important. The programmer should be able
to specify dependencies (i.e. if this option is set, this other
one is implied or needed), provide validator objects to check
specified values and so forth. Some of this stuff could be
implemented via extensions.
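
A minimal sketch of what such validation might look like, run after parsing has produced a name-to-value map. `OptionRules` and `validate` are hypothetical names, not part of any existing CLA library, and only the "needed" kind of dependency is covered; implied options would need an extra pass that inserts defaults.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical validation layer; none of these names come from an existing library.
struct OptionRules {
    // If the key option is set, every option listed for it must be set too.
    std::map<std::string, std::vector<std::string>> needs;
    // Per-option validator objects that check the supplied value.
    std::map<std::string, std::function<bool(const std::string &)>> validators;
};

// Returns human-readable problems; an empty result means the options are acceptable.
std::vector<std::string> validate(const OptionRules &rules,
                                  const std::map<std::string, std::string> &parsed) {
    std::vector<std::string> problems;
    for (const auto &opt : parsed) {
        // Dependency check: report each required companion that is missing.
        auto dep = rules.needs.find(opt.first);
        if (dep != rules.needs.end())
            for (const auto &needed : dep->second)
                if (parsed.count(needed) == 0)
                    problems.push_back(opt.first + " requires " + needed);
        // Value check: run the validator registered for this option, if any.
        auto check = rules.validators.find(opt.first);
        if (check != rules.validators.end() && !check->second(opt.second))
            problems.push_back("invalid value for " + opt.first + ": " + opt.second);
    }
    return problems;
}
```

Keeping the rules in a separate object is one way to let this be shipped as an extension rather than as part of the core parser.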

The ability to plug in arbitrary parsing rules is exactly the
argument for using Spirit. With tokenizer or regex this will
be very painful.

>>I agree Spirit looks a little cryptic. In particular the assignment
>>of values is rather "magical" ("ref" should probably be named
>>"assign_to"). But even so, as someone who has experience with YACC
>>but zero with Spirit, I can follow this and understand what it means
>>(except for the bang, which I had to look up, but it makes sense
>>if you consider it an "|" with an empty left operand).

> How many programmers are familiar with YACC and how many would use
> CLA parser?

Let me rephrase the question as, "how many programmers should be
familiar with YACC or similar tools?" The answer, of course,
is most of them -- roughly the same set that would need a CLA parser.

>>I don't see a token_iterator(char *, char *) constructor. I don't
>>even see a "token_iterator" declared anywhere. Are you sure your
>>example works? Am I missing something?
>
> Wait, wait. It is not boost tokenizer (it is my simple token_iterator I
> am using for old Sun compiler that can't handle boost one).

Ah, ok, I didn't follow that at all. That's important information.
I _was_ missing something. :)

> definition
> token_iterator( const_string string_to_tokenize,
> const_string delimeters)
>
> it should be pretty easy to understand what is written there.

Sure. It still doesn't have the power necessary to do what I
want a CLA parser to do.

                               -Dave

-- 
"Some little people have music in them, but Fats, he was all music,
  and you know how big he was."  --  James P. Johnson
