Boost logo

Boost :

From: Vladimir Prus (ghost_at_[hidden])
Date: 2003-05-26 04:44:23


Hi Gennadiy,

Gennadiy Rozental wrote:

> This is not a review of the supplied library. I am not gonna discuss
> docs
> even though they are scarce. I almost don't mention implementation/code/
> testing. I just want to express my opinion on design of the library.

Thanks for taking the time to comment!
I actually expected critical opinion from you, and while I don't think I can
change your mind, it's still a good idea to answer your concerns for the
benefit of other reviwvers.

But two preliminary points:

First, it's very sad that we both put time in different libraries for the
same task. I'm not sure we could have come to a conclusion, but IIRC,
you've decided to write your own after looking at 'cmdline' class from my
library only --- the most basic and internal one --- and not at anything
else. Had you drop an email saying what you don't like, we'd probably have
a more pleasant situation.

Second, I will not discuss what's right design and what's right programming
style per se. Such discussions tend to have no resolution. Instead, I'll
focus at relative benefits to user --- which is something more tangible.

> 1. Terminology
> Terminology chosen by author is confusing to me. In my understanding:
> term "parameter" - originated from 'formal parameter", formal
> description of the expected value
> term "option" - special case of parameter that describe
> parameters with boolean values
> term "argument" - originated from 'actual argument', actual value
> passed as expected value
> The way these terms are used in library does not seems to follow above
> definitions.

I'm not claiming the definitions are perfect, but I do claim it's not 100%
broken. For example, what "opt" in "getopt" stands for? The "Options"
library by Brad Appleton

   http://www.enteract.com/~bradapp/ftp/src/libs/C++/Options.html

and APR

   http://apr.apache.org/
   
Use the same definition of "option", which is different from yours. I would
say using "option" in different meaning can confuse a large user base.

> 2. Layered design.
> The task in hand indeed ask for the "layered design". But design
> presented in library does not come close to what I understand under this
> term. Let's see what are the layers and their purpose
> a. First layer cmdline/config_file. It allows parsing argc/argv
> producing list of pairs of string. It does not know anything about
> description correct types and so on. The question is: why would I want in
> practice to use this layer standalone, instead of program options with all
> types defaulted to string? Why not hide this class into implementation
> details?

Because you are not required to use this particual config_file class, for
example. As I've mentioned in other posts, should you want to use xml
config file, for example, you can write another config file parser, and add
it without any trouble.

> b. Second layer declared in design is class options_and_arguments.
> What
> this class adds to the cmdline functionality (other than unnecessary copy
> of huge object - not that performance is that important, but still) and
> why it could not be implementation detail of the class cmdline unclear to
> me.

Same concern here. Single options_and_argument is the class that all parsers
return. It's a single uniform interface. It that interface is moved to
cmdline, then... well... should be duplicate that interface (and
implementation) in config_file?

> c. Third level declared is program options. Looks like the only usable
> layer to me.
> d. Finally variable_map - third place where actual argument are stored
> (Number of places where actual value is stored looks redundant). Why would
> I want to use this class if I need only CLA parsing unclear to me.
> Multi-source case discussed below.

You don't *need* this class for parsing only CLA. However, this class is the
interface for getting typed options. The 'options_and_arguments' class is
simpler --- it deals only with strings.

> 3. Variety of different name/value syntaxes support
> Library struggle to support most used syntaxes for the CLA processing.
> I
> am not gonna say that any specific style is missing. Let's better see how
> this support is designed. class cmdline is responsible for all syntax
> related tasks. The only way to select non-default style is to provide some
> bitmask of desired styles in constructor (BTW, I did not see an example of
> usage custom style with variable map - is it possible?).

Hmm... custom style has nothing to do with variables map. You specify custom
style to command line parser, and varaibles map is created from results of
parsing. As such, it's completely independent.

> This has many
> problems:
> a. Limit number of supported styles die to bitmap limitation

The number of directly supported styles will always be limited. For all
others, users will be compelled to write they code --be it a function,
or a policy.

> b. Force same style for all parameters in command line (I could not
> define /h --my_long_param)

Do you need it in practice?

> c. This kind of design when one class is responsible for many different
> styles *always* leeds to macaroni style unreadable code. See on cmdline
> implementation: endless list of if, else, and switch. And with only
> minimal number of styles (IMO) supported. How it will look in a year when
> author adopt all requests from users I could only imagine. IMO this is not
> a kind of style we want to promote.

First, it would not adopt "all styles". In already supports a fair number of
them, and more elaborate styles will be supported by custom parsers.

Secondly, "macaroni", as you put it here, sounds a bit... unconstructive.
You don't even try to name the functions that you find poorly written.

Some functions are longer that they should be and can be splitted. But the
main logic is perfectly reasonable in my opinion.

> 4. Parameter definition.
>
> First of all I want to remark that there are in fact to ways to define a
> parameters in reviewed library. And both are not what I am looking.
> Program parameter definition is not repetitive task. We do it once,
> somewhere in program main most probably. As a developer or maintainer
> looking on unfamiliar code in need to see what are the parameters it
> supports, I want to read parameters descriptions like a poem, without need
> to solve a puzzle or guess implicit rules. From this rationale I believe
> that primary interface for parameter definition should be as verbose as
> possible. You may supply shortcuts for lazy programmer, but only as a
> second option. You may laugh, but even after several ours I spent with
> library I still could not figure out what second parameter in ("name", "",
> "") is. Parameter description should be clear to anybody who never read
> your docs.

So far, you're the first who raises this issue. Chuck Messenger even
proposes to use string for all arguments:

 desc.add_options()
         ...
         (param("magic|value|magic value for the program") >> magic)
         (param("val||some value with a default", defaultValue) >> val)
         ...
         ;

I think, that if you don't think maintenance programmer will ever read docs,
you can help him in a simpler way

  desc.add_options()
       // first argument is options name
       // second argument is parameter name
       // third argument is description
       ("output", "file", "output file")
       ....
       ;

This means that one comment line is added for first argument, and then you
can very convenienty specify option name.

> 5. Code organization
> Library implemented as an external library. IMO it's unacceptable for
> such basic component like CLA processing. test programs just a small part
> among all programs, but I still got numerous complains about absence of
> the inline version. Related topic is dependencies: IMO library
> implementation use too many dependencies, which became important once you
> use it inline. Though it's subtle point.

I subscribe to the view in Boost coding guidelines by David Abrahams, where
he says that inlining is either second or third (don't remember exactly)
most misused feature in C++.
If a library can be implemented externally, it should. Linking to a library
is minor problem. Inlining/templating will only

- add compilation time. E.g. profile build of a simple program which uses
BGL takes several minutes for me. Yes, program_options is not as large, but
what if you add 20 small libraries...

- add distribution size. E.g. if program_options is accepted in Boost, next
release will include it. It means that when developing applications for
Debian Linux, for example, I'll only have to add a single line to a single
file "Depends: boost-1.31.0". And my application can depend on program
options without compiling anything in.

> 6. cmdline - container or iterator?
>
> Looking on class cmdline one will be puzzled: what king of design if
> follow? Is it container of iterator? Seems like both. If I remember
> correctly this is how some ancient pre-standard, or RW containers were
> implemented. I don't believe this is kind of design we want to promote.

Alas, this comment seems non-constructive for me. I don't think that
the question is what kind of design should be promoted. What are the
problems with the current design? Can you list some interesting things that
would be possible if config_file were an iterator? What would be its
value_type? And what will operator++ do on error?

> 7. Overblown interfaces
>
> In my taste interface to many major classes unreasonably overblown, which
> is not a kind of practice we should promote. See cmdline and
> option_description interface for example.

I can only say that my taste differs. The level of cohesion of
option_description seems pretty high for me.

> 8. Separators handling
>
> Library silently assign special meaning to space symbol. The way parsing
> is designed there is no way to use parameter names with spaces inside.

Right -- in command line. Do you need it there?

> 9. Multi-pass parsing support
>
> I did not find any support for the multi-pass parsing. What I mean is that
> every parser parse CLAs and eats what it understand and leave the rest.
> It's important feature that require special support inside the library

Could you please clarify why it's important feature? The proposed library
allows to combine options_description instances, parse the command line,
and then pass the result to interested modules.

> 10 Errors notification: usage of exceptions
>
> I believe that there are 2 cases when we should prefer to use exceptions
> instead of return value to notify about the error:
> a. function already have a return value
> b. We expect that user will want to propagate the error and handle it
> in
> scope different from one above.
> In case if
> a. we could return the value
> b. we assume that user would want to catch the error immediately in
> place
> of call
> c. user in majority of cases does not bother about type of error
> I do not see any reason to use exceptions for errors notification. IOW
> why
> user would need to write code like this:
>
> try {
> foo();
> catch( std::exception const& ex ) {
> cout << ex.what();
> }
>
> I believe that with CLA parsing we are namely in later case. Library
> should be able to provide user with interface that does not throw on
> error, but instead provide error return value and access to the error
> message.

I believe that exceptions are more convenient means. The handling of options
is not one line. First you declare options, and can make error there. Then
you parse command line and can have errors there. Later, you might want to
parse config file, which can be inaccessible, or have invalid syntax, or
something else. So, in example above, you have several function invocation
in 'try' block. If you're use error codes, you'd have to check it after
calling each function.

> 11 Alternative search algorithms
> In a first statement of documentation author defines "program options" as
> name/value pairs. IMO this is way too limiting. For once we need to
> support positional parameters.

It is already requested and agreed to. However, I'm planning to introduce an
interface for positional parameters which is more usefull IMO than just
access by number. Say program accepts one required positional parameter and
one optional, and the usage message looks like:

   program [options] input_file [output_file]

then, it's possible to make program_options treat

   program foo.txt bar.txt

command line as containing *option* "input_file" with value "foo.txt". I
feel this is much simpler for user to ask: "what's the value of
'input_file'", then ask what the value of 1st positional parameter.
  
> But in general IMO it should be possible to
> support almost arbitrary parameters identification. Let say I want to pass
> list of Points,Lines and Circles as a command line arguments. Point syntax
> is (N1,N2). Line syntax is (Point1, Point2). Circle syntax is (Point, N).
> I expect CLA parser to be able to parse the input.

I've expressed this concern some time ago: I feel this example is either
artificial, or is not directly related to command line parsing. If you want
to parse arbitrary syntax, you need a parser.

> Moreover in many interfaces library assumes existence on long name,
> short
> name, which may not be the case even with name based parameter
> identification.

I don't understand what you mean here.

> 12. Usage of pointers
> The decision to use pointers instead of references for const string and
> value location passing looks incorrect to me.

Can you clarify what "const string". Why the decision is incorrect?

> 13 Config file format
> config file class supports basically only one style of parameter
> definition. Also in my opinion config file is more basic concept. It may
> be used for completely different purposes. We need more than one layer to
> correctly model config file. Accordingly presented design is unacceptable
> IMO.

The config_file class is not a masterpiece in any way. But anyone can write
a masterpiece and put it there --- without any effect on the rest of the
library.

> 14. Multiple sources support.
>
> Library declare support for multiple sources of program parameters. But
> IMO it is based on incorrect basic assumption that same parameter in
> different sources has the same name. Unfortunately I believe that it
> almost never the case. Each kind of source has different naming
> conventions and formatting rules. For example Boost.Test log level
> parameter would look like :
> -log_level - in command line
> BOOST_TEST_LOG_LEVEL - in environment
> boost/test/log/level - in registry
> boost::test::log::level - in configuration file

I observe that all those cases differ ony in decoration of the same two
words: "log" and "level". As such, it's possible to do source-specific
adjustments. I agree that such functionality should be considered,
probably at the same time when I'll be adding registry support.

> 15. External parsers support.
> Library only supports one extra external parser per parsing. That may
> be
> very limiting in case if I want to implement SAX like, callback based CLA
> parsing.

I don't understand what you mean. Could you clarify?

Thanks,
Volodya


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk