From: Brey, Edward D (EdwardDBrey_at_[hidden])
Date: 2002-02-11 10:04:37


> From: James Kanze [mailto:kanze_at_[hidden]]
> |> Since we are embarking on a new class, not
> |> touching the existing printf, we can safely shed any baggage that
> |> gets in the way, knowing that it is easy to create a parallel
> |> printf-friendly class to support those who need one.
>
> There's not just a problem of code compatibility; there's user
> compatibility. People know printf; any new format, they'd have to
> learn. If there are strong reasons for abandoning printf format, then
> it should certainly be done, but it shouldn't be done on a whim.

Agreed. Whether a reason is strong enough depends on the current general
level of familiarity weighed against the benefits of the new syntax.

> |> Let's look at the alternatives. A very clear option is function
> |> calls, e.g. .with(arg1). It also has no precedence problems.
> |> Unfortunately, this clarity comes with quite a bit of verbosity and
> |> makes arg1 jump out at you less, so the code is a little harder to
> |> visually parse.
>
> My opinion is exactly the opposite. The syntax clearly states that arg1
> is being used by format, and IMHO, is the easiest to visually parse.

I am considering clarity to be a separate issue from the ease with which a
line can be visually parsed. Consider this syntax:

  Take the string "from x to y" and substitute for x the contents of the
  variable lower_bound and, furthermore, substitute for y the contents of the
  variable upper_bound.

It is very clear, but not easily visually parsed. This is an extreme
representation of the slight problem that "with" causes: the more text there
is, the harder it generally is to find the important text.

> A possible alternative might be operator() -- the function call syntax
> without the function name:
> format( "..." )( arg1 )( arg2 ) ;
> I prefer it with "with", but that's just my opinion.

Indeed, operator() has been suggested, and has the advantage of focusing
attention on the arguments. Operator[] does likewise.

> |> Operator[] is similar, essentially the abbreviated version. It has
> |> a logical mapping into the traditional use of the operator, in that
> |> it maps into a virtual stream space, which theoretically contains
> |> the stream representation of all possible argument values. In this
> |> respect, the usual thoughts on operator[] work, such as applying one
> |> set of []s on the result of the former. Each occurrence of [arg]
> |> logically replaces its corresponding reference in the context string
> |> and returns the resulting context string for the next [arg]. As
> |> intuitive as std::map? Well, no, not quite. But a lot better than
> |> operator%.
>
> Almost anything would be better than operator%, at least from a visual,
> suggestive point of view. Let's face it, the choice of << for ostream
> was made at least partially on the basis of the suggestiveness of its
> graphics.

Agreed.

> IMHO, this is the biggest problem with []. The graphics don't suggest
> anything. (This is also why I like the named function, with. With says
> something.) A perhaps worse problem is that all C/C++ programmers are
> very familiar with the operator, and associate a very specific semantics
> with it: the result of applying operator[] is a part of what you applied
> it to. (This is the most general sense I can find. For me, it is a
> projection, and any other use, including that in std::map, is
> overloading abuse. But even my clients don't agree, and my own map
> classes supported [], at their demand. So maybe I'm being too strict
> here.)

I think that it is generally agreed that operator[] is a projection. The
difference of opinion appears to lie in what can be considered a projection.
For an array, it is from index to reference. For map, it is from key to
reference. For format, it is from unsubstituted string to substituted
string. In all three cases, there is a finite set of possible elements, of
which one is projected to. Often, the set is fully contained in memory.
However, for interval arrays and maps, and for format, the set is never
physically present in its entirety. This doesn't change the validity of the
concept.

In case the preceding paragraph wasn't clear with regard to format, here's
an example to demonstrate the concept, given x = 123 and y = 456:

  format("from [1] to [2]") [x] [y]
= format("from 123 to [2]") [y]
= format("from 123 to 456")

This demonstrates successive applications of the projection operator. To
understand any given application, consider:

format("from 123 to [2]") [y]

as choosing from this set of projections, as indexed by y:

0: "from 123 to 0"
1: "from 123 to 1"
2: "from 123 to 2"

and so on.
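
The successive projections above can be sketched in C++. This is only an illustration under stated assumptions: the class name bracket_format, its str() accessor, and the rule that each application fills the next numbered reference are hypothetical, not a proposed interface. Escaping is not handled, so a substituted value that itself contains a later reference such as "[2]" would be substituted again.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical sketch of the projection idea; not the proposed Boost class.
class bracket_format {
public:
    explicit bracket_format(std::string text)
        : text_(std::move(text)), next_(1) {}

    // Project onto the string in which every "[n]" (n = next unfilled
    // reference) is replaced by the streamed representation of value.
    // Returns a new object, so applications can be chained.
    template <typename T>
    bracket_format operator[](const T& value) const {
        std::ostringstream out;
        out << value;
        const std::string replacement = out.str();
        const std::string slot = "[" + std::to_string(next_) + "]";

        bracket_format result(*this);
        std::size_t pos = 0;
        while ((pos = result.text_.find(slot, pos)) != std::string::npos) {
            result.text_.replace(pos, slot.size(), replacement);
            pos += replacement.size();  // continue after the inserted text
        }
        ++result.next_;
        return result;
    }

    const std::string& str() const { return text_; }

private:
    std::string text_;  // context string with remaining references
    int next_;          // number of the next reference to fill
};
```

With this sketch, bracket_format("from [1] to [2]")[123] holds "from 123 to [2]", and applying [456] to that result gives "from 123 to 456", mirroring the three lines of the example.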

> |> Another nice feature of operator[] is that it allows manipulators to
> |> be compact and easily visually parsed, although it's only good for
> |> persistent manipulators. For non-persistent manipulators, you want
> |> something that binds tightly to the argument that it is
> |> manipulating.
>
> Could you please explain why you even want manipulators with format?

The desired feature is to allow another part of the program to pass in a
formatting style. The idea is that a function can generically generate some
output, with a portion of the formatting style being a parameter. For
example, a function may know how many parameters it has and where to put
them, as well as the surrounding text, but it may not know whether to
display hex or decimal. Manipulators are one way of accomplishing this, but
not the only way.
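
As a rough sketch of that use case with plain iostreams (no format class involved), the function below knows the surrounding text and where the values go, while the caller chooses the numeric base by passing a standard manipulator. The names base_manip and describe_range are made up for this illustration.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Pointer type matching the standard base-field manipulators
// (std::dec, std::hex, std::oct).
using base_manip = std::ios_base& (*)(std::ios_base&);

// Hypothetical helper: it fixes the text and the argument positions,
// but leaves the choice of hex or decimal to the caller.
std::string describe_range(int lower, int upper, base_manip base) {
    std::ostringstream out;
    out << base << "from " << lower << " to " << upper;
    return out.str();
}
```

Here describe_range(10, 255, std::hex) produces "from a to ff", while describe_range(10, 255, std::dec) produces "from 10 to 255". The base setting is persistent on the stream, which is why a single application covers both values.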

> |> Even the function style, i.e.
>
> |> format("Banana price: [1]/[2] tax: [3]%")
> |> .apply_next(width(3)).with(cost).apply_next(width(4)).with(unit)
> |> .apply_next(width(4)).apply_next(precision(1)).with(tax)
>
> |> doesn't show the binding as precisely as something like:
>
> |> format("Banana price: [1]/[2] tax: [3]%")
> |> [manip(width(3)), cost] [manip(width(3)), unit]
> |> [manip(width(3)), manip(precision(1)), tax]
>
> Who said that with was verbose:-)?

:-) Indeed, manipulators can get pretty ugly compared with putting such
format info directly into the format string. I like having the feature for
when you need programmatic control, but I certainly don't see it as a
substitute for a concise specification in the format string. The
example was not meant as a highlight film for manipulators.

> My existing practice would write this either:
>
> GB_Format( "Banana price: %1$3d/%2$3d tax: %3$4.1f" )
> .with( cost )
> .with( unit )
> .with( tax ) ;
> or
> GB_Format( "Banana price: %1$d/%2$d tax: %3$f" )
> .with( 3, cost )
> .with( 3, unit )
> .with( 4, 1, tax ) ;
>
> Generally, I would prefer the first unless the width and the precision
> were being dynamically calculated. (The presence of a '*' as a width or
> precision specifier in this case is optional; if a '*' occurs in the
> format, however, the with function MUST specify the corresponding
> field. If I remember right, at least.)

I agree with you that it is easiest to put the width right in the format
string, if it is fixed. The idea of using an overload on with()/operator()
for the width is interesting. It would be interesting to see how it would
play with all the other formatting requirements. It might be general
enough. It also restricts your operator choice, of course. ;-)
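
To make the overload idea concrete, here is a minimal sketch. The class arg_collector is hypothetical and is not GB_Format's actual interface; it only collects each argument's text and does not parse a format string or handle precision.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical collector, for illustration only.
class arg_collector {
public:
    // Value only: any width would come from the format string
    // (not modelled in this sketch).
    template <typename T>
    arg_collector& with(const T& value) {
        std::ostringstream out;
        out << value;
        args_.push_back(out.str());
        return *this;
    }

    // Width supplied programmatically alongside the value.
    template <typename T>
    arg_collector& with(int width, const T& value) {
        std::ostringstream out;
        out << std::setw(width) << value;
        args_.push_back(out.str());
        return *this;
    }

    const std::vector<std::string>& args() const { return args_; }

private:
    std::vector<std::string> args_;  // streamed text of each argument
};
```

The overloads are distinguished purely by argument count, which a named function or operator() can provide. operator[] as it stands in C++ today takes exactly one argument, so this technique is not directly available to it.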

> |> Since operator[] uses a closing token other than ')', it makes the
> |> groupings a little easier to see than the function call method.
>
> This is at least partially true. On the other hand, it is the ')' which
> we are used to seeing delimiting parameters, and it is the ')' which most
> editors understand for aligning (supposing a very long expression).

I wouldn't say that ')' delimits parameters, exactly. It indicates that a
parameter is the last one. That is why it is a bit of a misfit for format.
The editor point is a good one, though only occasionally significant.

> |> Existing practice definitely helps with recognizability. What's
> |> interesting is that '%' and '[]' both have a lot of
> |> existing practice, each from different circles. For '[]', the
> |> existing practice is in everyday writing. The brackets are commonly
> |> used for footnotes in a plain text environment, which makes them
> |> familiar and easily recognized. [1] This usage actually gives
> |> brackets a leg up, since programmers and non-programmers (read
> |> translators) alike read everyday English (or other languages),
> |> whereas only experienced programmers or translators will have
> |> encountered '%' for substitution.
>
> That would only be true if the semantics of their use corresponded to
> what was expected. Since it doesn't, this is a strong
> counter argument against [].

Fortunately, we don't have to worry about the issue, since format's use of
operator[] fits existing practice. The only difference is its application
to a new domain. I don't think there is any reason to be worried about an
unexpected result. What else could one expect from the application of
operator[] to a format object?

> I've seen % widely used for this in a number of cases. Many of them, of
> course, were probably influenced by C. But IMHO, it has an advantage in
> that it probably doesn't occur naturally in the contexts where a format
> specifier would occur -- in all natural language text, it will be
> preceded by a number, and how often do you use a format specifier
> preceded by a number?

I'm curious where else you've seen it. Do any of those languages consider
"%" escaped if it is preceded by a number? How is the fact that it is
commonly preceded by a number relevant?

> |> - Escaping is minimized: Only literal text of the form "[{digit}"
> |> would need to be escaped.
>
> Regretfully, such text is not particularly rare.

Hmm. My experience is that '%' shows up more often than '[{digit}'. Where
have you run into the latter?

> |> I think that Boost has an excellent opportunity to put some powerful
> |> new syntaxes into existing practice. It's easy to say "I'm not used
> |> to it" and be stuck with no improvements. (Witness how C++'s syntax
> |> still puts return types in front of function names (rather than
> |> after the parameters), just because it was done that way in K&R C
> |> when most return types were int and hence omitted.) Far better is
> |> to look for the technically best solution and put forth a mind to
> |> use it, especially when it is reminiscent of existing practice in
> |> regular English usage.
>
> If we really want to invent a new syntax, we should try for something
> really usable, along the lines of Basic's print using, or Cobol's
> picture clauses. Short of that, I can see no reason to abandon a known
> syntax for an unknown, which retains most of the defaults of the known.
> (It's sort of like Java: no pretensions of C compatibility, but most of
> the defaults of C syntax anyway.)

By all means. If you have an idea on how to take the best of Basic and
Cobol and put it to work, now is the time. My idea of leveraging everyday
English notation is just one possible asset. Putting the best of the ideas
together, we should be able to make a significant improvement over printf.

> |> Here's a summary of the format syntax rationale:
>
> |> Pros for operator%:
> |> - Familiar in context string
> |> - Familiar for separating a single argument (cf. Python).
>
> |> Pros for operator[] (and associated [] in context string):
> |> - Succinct (as opposed to named functions).
> |> - Guaranteed desired precedence.
> |> - Quickly parsed visually, especially when persistent manipulators
> |> are involved.
> |> - Facilitates tightly bound manipulators.
> |> - Logically maps to conventional use of operator[].
> |> - Corresponds to everyday publishing practice.
> |> - Preceding and following non-space text is easily differentiated.
> |> - Encapsulates formatting options, free of ambiguity.
> |> - Rarely would need escaping in real-world programs.
>
> |> [1] When you get to a footnote (if you're interested in reading it),
> |> you jump down to where it is defined, and mentally substitute it into
> |> the text where the reference appeared - exactly what format does.
>
> I'm somewhat confused in all this about when you are talking about the
> format specifier, and when you are talking about how the arguments are
> being passed. For example, in your pros for operator[], the first four
> seem to apply to the parameter passing, whereas the rest apply to the
> format specifier. It would be easier to discuss if you'd keep the two
> separate.

A desirable feature is to use similar syntax for both the format specifier
and argument delimiting. So I'm not so sure that it is practical to deal
with them completely separately.

