From: James Kanze (kanze_at_[hidden])
Date: 2002-02-16 10:54:49


"Brey, Edward D" <EdwardDBrey_at_[hidden]> writes:

|> > From: James Kanze [mailto:kanze_at_[hidden]]
|> > |> Since we are embarking on a new class, not touching the
|> > |> existing printf, we can safely shed any baggage that gets in
|> > |> the way, knowing that it is easy to create a parallel
|> > |> printf-friendly class to support those who need one.

|> > There's not just a problem of code compatibility; there's user
|> > compatibility. People know printf; any new format, they'd have to
|> > learn. If there are strong reasons for abandoning printf format,
|> > then it should certainly be done, but it shouldn't be done on a
|> > whim.

|> Agreed. Whether a reason is strong enough depends on the current
|> general level of familiarity weighed against the benefits of the new
|> syntax.

I'll admit my prejudice. I've been programming C/C++ for twenty years.
I've implemented a standard library for C. I've followed not just C++
standardization, but also C. I've regularly used the X/Open extension
to printf. My level of familiarity with the printf syntax is *very*
high.

That said, I don't think that the printf syntax is that bad. In some
ways, it represents a middle road between COBOL's PIC clauses, and
Fortran's formatting specifiers, with their implicit loops and
everything. Having used all three (Basic's PRINT USING, rather than
COBOL's PIC, but the philosophy is the same), I would conclude that, at
least for laying out tables, the only one which is in any way
comfortable is the PIC/PRINT USING format, in which the format specifier
is exactly the same length as the text it generates. For free format,
something like the printf format is fine. And the Fortran type format
is really only good for hackers:-). (In the late seventies, I was proud
that I had managed to output an entire chessboard, in ASCII, using +,-,
and | to delimit the squares, in a single write statement, by means
of an appropriate format. Totally unreadable, of course, but back in
the seventies, we were real programmers:-).)

|> > |> Let's look at the alternatives. A very clear option is
|> > |> function calls, e.g. .with(arg1). It also has no precedence
|> > |> problems. Unfortunately, this clarity comes with quite a bit
|> > |> of verbosity and makes arg1 jump out at you less, so the code
|> > |> is a little harder to visually parse.

|> > My opinion is exactly the opposite. The syntax clearly states
|> > that arg1 is being used by format, and IMHO, is the easiest to
|> > visually parse.

|> I am considering clarity to be a separate issue from the ease with
|> which a line can be visually parsed. Consider this syntax:

|> Take the string "from x to y" and substitute for x the contents of
|> the variable lower_bound and, furthermore, substitute for y the
|> contents of the variable upper_bound.

|> It is very clear, but not easily visually parsed. This is an
|> extreme representation of the slight problem that "with" causes: the
|> more text, the generally less easy it is to find the important text.

Point taken. I would certainly object if the name of the function was
"parameterToBeFormatted". I don't have too much problem with "with",
and I generally prefer named functions to operator overloading unless
the semantics of the operator match *very* closely. Were I designing a
new map class today, for example, I don't think it would have an
operator[].

In practice, a lot depends on what you are doing. When I first designed
my AssocArray class, 10 years ago, it didn't occur to me that it
couldn't have an operator[], or that operator[] wouldn't insert, because
that's the way the associative arrays in AWK worked. In hindsight, I
can say that this *is* the right decision for a class designed to allow
C++ to replace AWK or Perl, i.e. knocking out small but efficient
applications quickly. I can also say that for more large scale
applications, at least the ones I've worked on, having operator[]
insert, instead of treating the absence of an entry as an error, has
caused no end of problems. If I were doing it today, I'd use named
functions instead of [] (although this is a close call), and if there
were an operator[], it wouldn't insert, but would treat accessing a
nonexistent member as an error.
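
The two behaviours can be seen side by side in std::map as it exists
in later C++ standards. This is only a minimal sketch: at() was added
to std::map in C++11, after this discussion, and the helper names
below are invented for illustration.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper: looks a key up through operator[] and reports
// the map size afterwards. operator[] value-initialises and inserts
// an entry when the key is absent, so the size may grow.
std::size_t size_after_bracket_lookup(std::map<std::string, int>& m,
                                      const std::string& key)
{
    (void)m[key];   // silently inserts {key, 0} if key was missing
    return m.size();
}

// Hypothetical helper: a lookup that treats a missing key as an error.
// std::map::at throws std::out_of_range and never modifies the map,
// which is why it can be called on a const map.
bool strict_lookup_fails(const std::map<std::string, int>& m,
                         const std::string& key)
{
    try {
        (void)m.at(key);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With a map holding only "a", strict_lookup_fails(m, "b") reports the
error and leaves the map alone, whereas size_after_bracket_lookup(m,
"b") returns 2 because the lookup itself created the entry.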

|> > A possible alternative might be operator() -- the function call
|> > syntax without the function name:
|> > format( "..." )( arg1 )( arg2 ) ;
|> > I prefer it with "with", but that's just my opinion.

|> Indeed, operator() has been suggested, and has the advantage of
|> focusing attention on the arguments. Operator[] does likewise.
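
For comparison, here is a minimal sketch of how the named function and
operator() can coexist on one class. The class Format below is
hypothetical: it is neither GB_Format nor any Boost interface, and it
borrows the "[n]" placeholder style from this thread purely to keep
the parsing short.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class for illustration only.
class Format {
public:
    explicit Format(std::string spec) : spec_(std::move(spec)) {}

    // Named form: format("...").with(a).with(b)
    template <typename T>
    Format& with(const T& value) {
        std::ostringstream os;
        os << value;                 // any streamable type is accepted
        args_.push_back(os.str());
        return *this;                // returning *this allows chaining
    }

    // Unnamed form: format("...")(a)(b); simply forwards to with().
    template <typename T>
    Format& operator()(const T& value) { return with(value); }

    // Replaces each "[n]" by the n-th argument (1-based). Anything that
    // is not a valid reference to a supplied argument is copied as-is.
    std::string str() const {
        std::string out;
        std::size_t i = 0;
        while (i < spec_.size()) {
            if (spec_[i] == '[') {
                std::size_t j = i + 1;
                std::size_t n = 0;
                while (j < spec_.size() && spec_[j] >= '0' && spec_[j] <= '9') {
                    n = n * 10 + static_cast<std::size_t>(spec_[j] - '0');
                    ++j;
                }
                if (j > i + 1 && j < spec_.size() && spec_[j] == ']'
                    && n >= 1 && n <= args_.size()) {
                    out += args_[n - 1];
                    i = j + 1;
                    continue;
                }
            }
            out += spec_[i++];
        }
        return out;
    }

private:
    std::string spec_;
    std::vector<std::string> args_;
};
```

Format("from [1] to [2]").with(123).with(456).str() and
Format("from [1] to [2]")(123)(456).str() build the same string; the
only difference is what the reader sees at the call site.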

|> > |> Operator[] is similar, essentially the abbreviated version.
|> > |> It has a logical mapping into the traditional use of the
|> > |> operator, in that it maps into a virtual stream space,
|> > |> which theoretically contains the stream representation of all
|> > |> possible argument values. In this respect, the usual thoughts
|> > |> on operator[] work, such as applying one set of []s on the
|> > |> result of the former. Each occurrence of [arg] logically
|> > |> replaces its corresponding reference in the context string and
|> > |> returns the resulting context string for the next [arg]. As
|> > |> intuitive as std::map? Well, no, not quite. But a lot better
|> > |> than operator%.

|> > Almost anything would be better than operator%, at least from a
|> > visual, suggestive point of view. Let's face it, the choice of <<
|> > for ostream was made at least partially on the basis of the
|> > suggestiveness of its graphics.

|> Agreed.

|> > IMHO, this is the biggest problem with []. The graphics don't
|> > suggest anything. (This is also why I like the named function,
|> > with. With says something.) A perhaps worse problem is that all
|> > C/C++ programmers are very familiar with the operator, and
|> > associate a very specific semantics with it: the result of
|> > applying operator[] is a part of what you applied it to. (This is
|> > the most general sense I can find. For me, it is a projection,
|> > and any other use, including that in std::map, is overloading
|> > abuse. But even my clients don't agree, and my own map classes
|> > supported [], at their demand. So maybe I'm being too strict
|> > here.)

|> I think that it is generally agreed that operator[] is a projection.
|> The difference of opinion appears to lie in what can be considered a
|> projection. For an array, it is from index to reference. For map,
|> it is from key to reference. For format, it is from unsubstituted
|> string to substituted string. In all three cases, there is a finite
|> set of possible elements, of which one is projected to. Often, the
|> set is fully contained in memory. However, for interval arrays and
|> maps, and for format, the set is never physically present in its
|> entirety. This doesn't change the validity of the concept.

I understand this point of view. I'm not a mathematician, so my point
of view may not be mathematically correct, but I expect a projection to
apply to a collection (ordered or not), and to return a subset of that
collection. This is also the behavior I expect of an operator[].

My impression, from discussions with other C++ programmers where I've
worked, is that most programmers have an even narrower view of
operator[], and consider it an indexing operator. They accept its use
in std::map, for example, either because they don't really care about
the misuse of operator overloading, or because they consider it
indexing, in so far as the result is a single value. They generally
consider the fact that it automatically inserts if the value isn't there
a misuse of operator overloading; an operator[] should not change the
collection it is applied to. (They obviously didn't have my exposure to
AWK before learning C++;-).)

|> In case the preceding paragraph wasn't clear with regard to format,
|> here's an example to demonstrate the concept, given x = 123 and y =
|> 456:

|> format("from [1] to [2]") [x] [y]
|> = format("from 123 to [2]") [y]
|> = format("from 123 to 456")

|> This demonstrates successive applications of the projection
|> operator.
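
The step-by-step reduction quoted above can be sketched as a class
whose operator[] returns a new object with the next placeholder filled
in. BracketFormat is a hypothetical name, and the substitution rule
(replace every "[n]" for the current n, then move on to n+1) is an
assumption made for this sketch, not a description of any proposed
implementation.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Hypothetical class for illustration only.
class BracketFormat {
public:
    explicit BracketFormat(std::string text, int next = 1)
        : text_(std::move(text)), next_(next) {}

    // Substitutes 'value' for every "[next_]" and returns a new object
    // that is waiting for argument next_ + 1. The original object is
    // left unchanged. Note: a value that itself contains "[k]" text
    // could be picked up by a later substitution; a real design would
    // have to guard against that.
    template <typename T>
    BracketFormat operator[](const T& value) const {
        std::ostringstream os;
        os << value;
        const std::string replacement = os.str();
        const std::string placeholder = "[" + std::to_string(next_) + "]";

        std::string result = text_;
        std::string::size_type pos = 0;
        while ((pos = result.find(placeholder, pos)) != std::string::npos) {
            result.replace(pos, placeholder.size(), replacement);
            pos += replacement.size();   // skip over the inserted text
        }
        return BracketFormat(result, next_ + 1);
    }

    const std::string& str() const { return text_; }

private:
    std::string text_;
    int next_;
};
```

Given x = 123 and y = 456, BracketFormat("from [1] to [2]")[x] holds
"from 123 to [2]", and applying [y] to that result gives
"from 123 to 456", matching the three lines of the quoted example.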

As I say, I don't consider this a projection. Maybe I'm just not
mathematically inclined enough. I would consider this a series of
applications (or functions -- I'm not sure of the correct English
translation). And in the little bit of mathematics I have studied,
applications were generally named (although in mathematics, the names
were typically something like f or g).

By your reasoning, one could argue that we could use [] as a power
operator. After all, you apply it to something to get something else:

|> To understand any given application, consider:

|> format("from 123 to [2]") [y]

|> as choosing from this set of projections, as indexed by y:

|> 0: "from 123 to 0"
|> 1: "from 123 to 1"
|> 2: "from 123 to 2"

|> and so on.

I find that you're stretching it some. As I say, the same argument
would apply to defining double::operator[]( double ) as a power
function.

|> > |> Another nice feature of operator[] is that it allows
|> > |> manipulators to be compact and easily visually parsed,
|> > |> although it's only good for persistent manipulators. For
|> > |> non-persistent manipulators, you want something that binds
|> > |> tightly to the argument that it is manipulating.

|> > Could you please explain why you even want manipulators with
|> > format.

|> The desired feature is to allow another part of the program to pass
|> in a formatting style. The idea is that a function can generically
|> generate some output, with a portion of the formatting style being a
|> parameter. For example a function may know how many parameters it
|> has and where to put them, as well as the surrounding text, but it
|> may not know whether to display hex or decimal. Manipulators are
|> one way of accomplishing this, but not the only way.

OK. I've never needed to do this, but I can see that it might be useful
in certain special cases. (It can be done, of course, by using format
to generate the format string. I've done something similar in AWK a
couple of times. It works fine, but it isn't the most transparent way
to code.)
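
A minimal sketch of that two-step technique with standard snprintf:
the first call writes the format string, the second call uses it. The
function name describe and the "value = " text are invented for
illustration, and the conversion is restricted to 'u', 'x' and 'o' so
that it always matches the unsigned argument.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: the caller chooses the conversion character.
std::string describe(unsigned value, char conversion)
{
    // Accept only conversions that take an unsigned int; fall back to
    // decimal otherwise. (strchr would also match '\0', hence the test.)
    if (conversion == '\0' || std::strchr("uxo", conversion) == nullptr)
        conversion = 'u';

    // Step 1: format the format. "%%" produces a literal '%', so for
    // conversion == 'x' the buffer ends up holding "value = %x".
    char fmt[32];
    std::snprintf(fmt, sizeof fmt, "value = %%%c", conversion);

    // Step 2: use the generated string as the real format.
    char out[64];
    std::snprintf(out, sizeof out, fmt, value);
    return out;
}
```

describe(255u, 'x') yields "value = ff" and describe(255u, 'u')
yields "value = 255". The indirection is what makes this less
transparent: the format actually applied never appears in the source.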

|> > |> Even the function style, i.e.

|> > |> format("Banana price: [1]/[2] tax: [3]%")
|> > |> .apply_next(width(3)).with(cost).apply_next(width(4)).with(unit)
|> > |> .apply_next(width(4)).apply_next(precision(1)).with(tax)

|> > |> doesn't show the binding as precisely as something like:

|> > |> format("Banana price: [1]/[2] tax: [3]%")
|> > |> [manip(width(3)), cost] [manip(width(3)), unit]
|> > |> [manip(width(3)), manip(precision(1)), tax]

|> > Who said that with was verbose:-)?

|> :-) Indeed, manipulators can get pretty ugly compared with putting
|> such format info directly into the format string. I like having the
|> feature for when you need programmatic control, but I certainly
|> don't see it as a substitute for a concise specification in the
|> format string. The example was not meant as a highlight film for
|> manipulators.

|> > My existing practice would write this either:

|> > GB_Format( "Banana price: %1$3d/%2$3d tax: %3$4.1f" )
|> > .with( cost )
|> > .with( unit )
|> > .with( tax ) ;
|> > or
|> > GB_Format( "Banana price: %1$d/%2$d tax: %3$f" )
|> > .with( 3, cost )
|> > .with( 3, unit )
|> > .with( 4, 1, tax ) ;

|> > Generally, I would prefer the first unless the width and the
|> > precision were being dynamically calculated. (The presence of a
|> > '*' as a width or precision specifier in this case is optional; if
|> > a '*' occurs in the format, however, the with function MUST
|> > specify the corresponding field. If I remember right, at least.)

|> I agree with you that it is easiest to put the width right in the
|> format string, if it is fixed. The idea of using an overload on
|> with()/operator() for the width is interesting. It would be
|> interesting to see how it would play with all the other formatting
|> requirements. It might be general enough. It also restricts your
|> operator choice, of course, too. ;-)

Answering the last question first: I chose to use with(), and the
solution was obvious. I haven't considered the possibility of anything
other than width and precision, for the somewhat silly reason that I've
never needed them. I am bothered by the fact that in my implementation,
you cannot specify the precision in "with" without specifying the
width. This could be made to work by means of dummy parameters, but
seemed more effort than it was worth (and made things even more
verbose).
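
The fixed versus dynamically calculated distinction has a direct
counterpart in plain printf, where '*' takes the width or precision
from the argument list. The sketch below uses only standard snprintf
(no positional "%n$" arguments, which are an X/Open extension); the
function names are invented, and mapping with( width, value ) onto '*'
is an assumption made for illustration.

```cpp
#include <cstdio>
#include <string>

// Width and precision written directly in the format string.
std::string fixed_layout(int cost, int unit, double tax)
{
    char buf[64];
    std::snprintf(buf, sizeof buf,
                  "Banana price: %3d/%3d tax: %4.1f", cost, unit, tax);
    return buf;
}

// Width and precision supplied at run time. Each '*' consumes one int
// argument placed immediately before the value it applies to. As in
// the with() overloads described above, the precision here is given
// together with a width.
std::string dynamic_layout(int width, int tax_width, int tax_precision,
                           int cost, int unit, double tax)
{
    char buf[64];
    std::snprintf(buf, sizeof buf,
                  "Banana price: %*d/%*d tax: %*.*f",
                  width, cost, width, unit,
                  tax_width, tax_precision, tax);
    return buf;
}
```

dynamic_layout(3, 4, 1, cost, unit, tax) produces the same text as
fixed_layout(cost, unit, tax); the second form only earns its extra
arguments when the widths are not known until run time.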

If all of the manipulators derived from a common type, and there were
some way of chaining them, this should be possible. I'd have to think
about it some, though.

|> > |> Since operator[] uses a closing token other than ')', it makes
|> > |> the groupings a little easier to see than the function call
|> > |> method.

|> > This is at least partially true. On the other hand, it is the ')'
|> > which we are used to seeing delimiting parameters, and it is the ')'
|> > which most editors understand for aligning (supposing a very long
|> > expression).

|> I wouldn't say that ')' delimits parameters, exactly. It indicates
|> that a parameter is the last one. That is why it is a bit of a
|> misfit for format. The editor point is a good one, though, although
|> only occasionally significant.

Agreed. I tend to use the alignment above, which isn't directly
supported by emacs (the only editor I know except for vi and ed, neither
of which does any syntax formatting).

|> > |> Existing practice definitely helps with recognizability.
|> > |> Interestingly, it turns out that '%' and '[]' both have a lot of
|> > |> existing practice, each from different circles. For '[]', the
|> > |> existing practice is in everyday writing. The brackets are
|> > |> commonly used for footnotes in a plain text environment, which
|> > |> makes them familiar and easily recognized. [1] This usage
|> > |> actually gives brackets a leg up, since programmers and
|> > |> non-programmers (read translators) alike read everyday English
|> > |> (or other languages), whereas only experienced programmers or
|> > |> translators will have encountered '%' for substitution.

|> > That would only be true if the semantics of their use corresponded
|> > to what was expected. Since it doesn't, this is a strong counter
|> > argument against [].

|> Fortunately, we don't have to worry about the issue, since format's
|> use of operator[] fits existing practice. The only difference is
|> its application to a new domain. I don't think there is any reason
|> to be worried about an unexpected result. What else could one
|> expect from the application of operator[] to a format object?

I would expect an error from the compiler:-). Seriously, I don't know.
What else would you expect of applying [] to a double, except a power
function?

I think most programmers would expect it to give them the nth character
of the formatted string. But I'm just guessing about this.

|> > I've seen % widely used for this in a number of cases. Many of
|> > them, of course, were probably influenced by C. But IMHO, it has
|> > an advantage in that it probably doesn't occur naturally in the
|> > contexts where a format specifier would occur -- in all natural
|> > language text, it will be preceded by a number, and how often do
|> > you use a format specifier preceded by a number?

|> I'm curious where else you've seen it.

I can't really remember; it's just an impression. And I suspect that
most of them were influenced by printf. (My world has been mostly C/C++
and Unix for the past twenty years.) I think that most of the cases
were substitution in a configuration file, or something along those
lines.

|> Do any of those languages consider "%" escaped if it is preceded by
|> a number?

No.

|> How is the fact that it is commonly preceded by number relevant?

Simply for the human reader -- if it is used as a format, it will not
normally be preceded by a number; if it is used literally, it will be.

|> > |> - Escaping is minimized: Only literal text of the form "[{digit}"
|> > |> would need be escaped.

|> > Regretfully, such text is not particularly rare.

|> Hmm. My experience is that '%' shows up more often than '[{digit}'.
|> Where have you run into the latter?

I'm sorry. I was thinking of simply the [, which does show up rather
frequently (often followed by a digit), at least in my work.

|> > |> I think that Boost has an excellent opportunity to put some
|> > |> powerful new syntaxes into existing practice. It's easy to
|> > |> say "I'm not used to it" and be stuck with no improvements.
|> > |> (Witness how C++'s syntax still puts return types in front of
|> > |> function names (rather than after the parameters), just
|> > |> because it was done that way in K&R C when most return types
|> > |> were int and hence omitted.) Far better is to look for the
|> > |> technically best solution and put forth a mind to use it,
|> > |> especially when it is reminiscent of existing practice in
|> > |> regular English usage.

|> > If we really want to invent a new syntax, we should try for
|> > something really usable, along the lines of Basic's print using,
|> > or Cobol's picture clauses. Short of that, I can see no reason to
|> > abandon a known syntax for an unknown, which retains most of the
|> > defaults of the known. (It's sort of like Java: no pretensions of
|> > C compatibility, but most of the defaults of C syntax anyway.)

|> By all means. If you have an idea on how to take the best of Basic
|> and Cobol and put it to work, now is the time.

I've already got more ideas than time. None of this interests my
employer, nor my wife, who claims most of my time out of work.

|> My idea of leveraging everyday English notation is just one possible
|> asset. Putting the best of the ideas together, we should be able to
|> make a significant improvement over printf.

|> > |> Here's a summary of the format syntax rationale:

|> > |> Pros for operator%:
|> > |> - Familiar in context string
|> > |> - Familiar for separating a single argument (cf. Python).

|> > |> Pros for operator[] (and associated [] in context string):
|> > |> - Succinct (as opposed to named functions).
|> > |> - Guaranteed desired precedence.
|> > |> - Quickly parsed visually, especially when persistent
|> > |> manipulators are involved.
|> > |> - Facilitates tightly bound manipulators.
|> > |> - Logically maps to conventional use of operator[].
|> > |> - Corresponds to everyday publishing practice.
|> > |> - Preceding and following non-space text is easily differentiated.
|> > |> - Encapsulates formatting options, free of ambiguity.
|> > |> - Rarely would need escaping in real-world programs.

|> > |> [1] When you get to a footnote (if you're interested in
|> > |> reading it), you jump down to where it is defined, and
|> > |> mentally substitute it into the text where the reference
|> > |> appeared - exactly what format does.

|> > I'm somewhat confused in all this about when you are talking about
|> > the format specifier, and when you are talking about how the
|> > arguments are being passed. For example, in your pros for
|> > operator[], the first four seem to apply to the parameter passing,
|> > whereas the rest apply to the format specifier. It would be
|> > easier to discuss if you'd keep the two separate.

|> A desirable feature is to use similar syntax for both the format
|> specifier and argument delimiting. So I'm not so sure that it is
|> practical to deal with them completely separately.

Why is that a desirable feature? What does it buy us?

-- 
James Kanze                                mailto:kanze_at_[hidden]
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
