From: James Kanze (kanze_at_[hidden])
Date: 2002-02-09 09:23:28


"Brey, Edward D" <EdwardDBrey_at_[hidden]> writes:

|> I've been putting some thought into the syntax of Boost.Format, and
|> have come to see that the best syntax going forward is one that does
|> not try to retain compatibility with printf as a high priority.
|> Backward compatibility can certainly be useful, but cases where
|> compatibility is needed differ from cases of new code, where the
|> syntax can be optimal, in the sense of not needing to bend to
|> legacy decisions. Since we are embarking on a new class, not
|> touching the existing printf, we can safely shed any baggage that
|> gets in the way, knowing that it is easy to create a parallel
|> printf-friendly class to support those who need one.

There's not just a problem of code compatibility; there's user
compatibility. People know printf; any new format, they'd have to
learn. If there are strong reasons for abandoning printf format, then
it should certainly be done, but it shouldn't be done on a whim.

|> I'm going to assume for the moment that the formal argument syntax
|> (format(spec, arg1, arg2)) isn't going to be technically feasible.
|> (Even if so, it doesn't handle persistent manipulators all that
|> well.)

I'm curious. Why would you have persistent manipulators for a format
class? The issue of persistent manipulators only becomes relevant if
you emulate ostream syntax, and even then, I'm not sure how relevant.
Who's going to write:
    format( "%f" ) << fixed << setprecision( 2 ) << setw( 6 ) << x ;
if they can write:
    format( "%6.2f" ) << x ;
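
To make the comparison concrete, here is a minimal sketch in standard C++ only (neither Boost.Format nor GB_Format is involved). The function names are made up for illustration; the point is that the three manipulators and the single "%6.2f" specifier describe the same formatting.

```cpp
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>

// Formats x with iostream manipulators: fixed notation, 2 decimals, width 6.
// (Hypothetical helper name, used only for this illustration.)
std::string with_manipulators(double x) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(2) << std::setw(6) << x;
    return out.str();
}

// Formats x with the equivalent printf-style specifier "%6.2f".
// snprintf truncates safely if the value does not fit in the buffer.
std::string with_format_spec(double x) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%6.2f", x);
    return buf;
}
```

For values that fit in the buffer, both helpers should return the same string, e.g. "  3.14" for 3.14159.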

|> Taking that option off the table, let's look at our other options.
|> Consider operator% and operator<<. Operator<< is confusing in a
|> larger cout statement. operator% is not familiar, and has no
|> intuitive mapping to an established use of the operator. Both
|> have precedence problems. On the plus side, operator% is kind of
|> similar to Python, but only to an extent: as soon as you go beyond
|> one parameter, the syntax starts looking quite a bit different, as
|> there is no precedent for chained '%'s.

|> Let's look at the alternatives. A very clear option is function
|> calls, e.g. .with(arg1). It also has no precedence problems.
|> Unfortunately, this clarity comes with quite a bit of verbosity and
|> makes arg1 jump out at you less, so the code is a little harder to
|> visually parse.

My opinion is exactly the opposite. The syntax clearly states that arg1
is being used by format, and IMHO, is the easiest to visually parse.

A possible alternative might be operator() -- the function call syntax
without the function name:
    format( "..." )( arg1 )( arg2 ) ;
I prefer it with "with", but that's just my opinion.
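
The two call styles can be sketched with a toy class. Everything here is an assumption made for illustration: `mini_format` is a hypothetical name, it is not the Boost.Format or GB_Format implementation, and it only understands the simplified placeholders %1 to %9, with no widths or precisions.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy class, not Boost.Format or GB_Format: it only
// understands the placeholders %1 .. %9 and ignores widths and precisions.
class mini_format {
public:
    explicit mini_format(std::string spec) : spec_(std::move(spec)) {}

    // Named-function style: mini_format("...").with(a).with(b)
    template <typename T>
    mini_format& with(const T& value) {
        std::ostringstream out;
        out << value;
        args_.push_back(out.str());
        return *this;
    }

    // Anonymous style: mini_format("...")(a)(b); simply forwards to with().
    template <typename T>
    mini_format& operator()(const T& value) { return with(value); }

    // Builds the result, replacing %N by the N-th argument supplied.
    // A placeholder with no matching argument is silently dropped.
    std::string str() const {
        std::string result;
        for (std::size_t i = 0; i < spec_.size(); ++i) {
            const bool is_placeholder =
                spec_[i] == '%' && i + 1 < spec_.size() &&
                spec_[i + 1] >= '1' && spec_[i + 1] <= '9';
            if (is_placeholder) {
                const std::size_t index = spec_[i + 1] - '1';
                if (index < args_.size()) result += args_[index];
                ++i;  // skip the digit
            } else {
                result += spec_[i];
            }
        }
        return result;
    }

private:
    std::string spec_;
    std::vector<std::string> args_;
};
```

With this sketch, `mini_format("%1/%2").with(3).with(4).str()` and `mini_format("%1/%2")(3)(4).str()` build the same string; the only difference is whether the call site names what is happening.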

|> Operator[] is similar, essentially the abbreviated version. It has
|> a logical mapping into the traditional use of the operator, in that
|> it maps into a virtual stream space, which theoretically contains
|> the stream representation of all possible argument values. In this
|> respect, the usual thoughts on operator[] work, such as applying one
|> set of []s on the result of the former. Each occurrence of [arg]
|> logically replaces its corresponding reference in the context string
|> and returns the resulting context string for the next [arg]. As
|> intuitive as std::map? Well, no, not quite. But a lot better than
|> operator%.

Almost anything would be better than operator%, at least from a visual,
suggestive point of view. Let's face it, the choice of << for ostream
was made at least partially on the basis of the suggestiveness of its
graphics.

IMHO, this is the biggest problem with []. The graphics don't suggest
anything. (This is also why I like the named function, with. With says
something.) A perhaps worse problem is that all C/C++ programmers are
very familiar with the operator, and associate a very specific semantics
with it: the result of applying operator[] is a part of what you applied
it to. (This is the most general sense I can find. For me, it is a
projection, and any other use, including that in std::map, is
overloading abuse. But even my clients don't agree, and my own map
classes supported [], at their demand. So maybe I'm being too strict
here.)

|> Another nice feature of operator[] is that it allows manipulators to
|> be compact and easily visually parsed, although it's only good for
|> persistent manipulators. For non-persistent manipulators, you want
|> something that binds tightly to the argument that it is
|> manipulating.

Could you please explain why you even want manipulators with format?

|> Even the function style, i.e.

|> format("Banana price: [1]/[2] tax: [3]%")
|> .apply_next(width(3)).with(cost).apply_next(width(4)).with(unit)
|> .apply_next(width(4)).apply_next(precision(1)).with(tax)

|> doesn't show the binding as precisely as something like:

|> format("Banana price: [1]/[2] tax: [3]%")
|> [manip(width(3)), cost] [manip(width(3)), unit]
|> [manip(width(3)), manip(precision(1)), tax]

Who said that with was verbose:-)?

My existing practice would write this either:

    GB_Format( "Banana price: %1$3d/%2$3d tax: %3$4.1f" )
        .with( cost )
        .with( unit )
        .with( tax ) ;
or
    GB_Format( "Banana price: %1$d/%2$d tax: %3$f" )
        .with( 3, cost )
        .with( 3, unit )
        .with( 4, 1, tax ) ;

Generally, I would prefer the first unless the width and the precision
were being dynamically calculated. (The presence of a '*' as a width or
precision specifier in this case is optional; if a '*' occurs in the
format, however, the with function MUST specify the corresponding
field. If I remember right, at least.)
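
The GB_Format implementation is not shown here, so as a stand-in the same distinction can be sketched with the standard printf family, where '*' takes the width and the precision from the argument list. The helper names below are hypothetical.

```cpp
#include <cstdio>
#include <string>

// Width and precision fixed in the format string.
std::string tax_static(double tax) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "tax: %4.1f", tax);
    return buf;
}

// Width and precision supplied at run time through '*'.
std::string tax_dynamic(int width, int precision, double tax) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "tax: %*.*f", width, precision, tax);
    return buf;
}
```

Called as `tax_dynamic(4, 1, tax)`, the second helper should match `tax_static(tax)`; it only earns its extra arguments when the width or precision is computed at run time.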

|> Since operator[] uses a closing token other than ')', it makes the
|> groupings a little easier to see than the function call method.

This is at least partially true. On the other hand, it is the ')' which
we are used to seeing delimiting parameters, and it is the ')' which most
editors understand for aligning (supposing a very long expression).

|> Granted, in the syntaxes above, there is enough manipulator text
|> overhead that no syntax can really look clean - definitely an
|> argument for allowing format specifiers within the context string.

I'll admit that it didn't occur to me that anyone could think
otherwise:-).

Of course, even when the specifier does give the format, you still need
a possibility of specifying the width and the precision dynamically.

|> What of the existing practice argument for '%'? It is used in C
|> (and C++), Python, and CLISP. But existing practice in and of
|> itself isn't very important. The real question is whether it is a
|> beneficial syntax. There are two aspects to consider: How
|> recognizable/intuitive is it? And does it have beneficial
|> syntactical properties?

|> Existing practice definitely helps with recognizability. What's
|> interesting is that '%' and '[]' both have a lot of
|> existing practice, each from different circles. For '[]', the
|> existing practice is in everyday writing. The brackets are commonly
|> used for footnotes in a plain text environment, which makes them
|> familiar and easily recognized. [1] This usage actually gives
|> brackets a leg up, since programmers and non-programmers (read
|> translators) alike read everyday English (or other languages),
|> whereas only experienced programmers or translators will have
|> encountered '%' for substitution.

That would only be true if the semantics of their use corresponded to
what was expected. Since it doesn't, this is a strong counter argument
against [].

|> For the question of syntactical properties, '%' starts with a leg
|> up. After all, it was chosen for C for a good reason, right? And
|> then by subsequent languages. Well, kind of. '%' was chosen by C
|> assuming that a letter was going to follow (not necessarily
|> immediately). Given the use of letters, [u] just doesn't have the
|> same natural look as [1] does, so it makes some sense. Also, in
|> those days, squeezing out the last character was more important than
|> it is now. Why did more modern languages adopt the same style? In
|> Python's case, it's because Python likes to copy C. Many of
|> Python's libraries are thin wrappers on C, without much redesign
|> effort put into tailoring the interface to leverage the Python
|> environment (cf. the lack of positional formatting for strings). I
|> can't really speak to CLISP's usage of '%'; however, as flexible
|> a language as Lisp is, user-friendliness of syntax isn't its
|> forte, and so I don't think there is much to be gleaned from that
|> example.

I've seen % widely used for this in a number of cases. Many of them, of
course, were probably influenced by C. But IMHO, it has an advantage in
that it probably doesn't occur naturally in the contexts where a format
specifier would occur -- in all natural language text, it will be
preceded by a number, and how often do you use a format specifier
preceded by a number?

|> Therefore, as we move forward, we can choose to join the crowd that
|> follows the C way. This helps C++ be close to C. However, I think
|> it would be closer to C than is possible, in that it would be
|> impossible to leverage some syntactical advantages without moving to
|> a more modern format specification. With '[]', the following is
|> possible:

|> - Preceding and following non-space characters are easy to
|> distinguish from the substitution placeholder.

|> - Formatting options are well encapsulated, while still keeping the
|> position indicator in the front. For example [1:8.3>] means
|> parameter 1, total width of 8 characters, 3 digits of precision,
|> right justified.

|> - Escaping is minimized: Only literal text of the form "[{digit}"
|> would need to be escaped.

Regretfully, such text is not particularly rare.

|> Escaping is as simple as "[[1]".
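
A minimal sketch of the bracket rules as described, under stated assumptions: the helper name `expand_brackets` is hypothetical, "[[" is taken to mean an escaped literal '[', only the placeholders [1] to [9] are recognised, and formatting options such as "[1:8.3>]" are not handled at all.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: expands "[N]" (N = 1..9) to args[N-1] and treats "[["
// as an escaped literal '['. Formatting options such as "[1:8.3>]" are not
// handled; any other text is copied unchanged.
std::string expand_brackets(const std::string& spec,
                            const std::vector<std::string>& args) {
    std::string result;
    std::size_t i = 0;
    while (i < spec.size()) {
        if (spec[i] == '[' && i + 1 < spec.size() && spec[i + 1] == '[') {
            result += '[';  // "[[" -> "["
            i += 2;
        } else if (spec[i] == '[' && i + 2 < spec.size() &&
                   spec[i + 1] >= '1' && spec[i + 1] <= '9' &&
                   spec[i + 2] == ']') {
            const std::size_t index = spec[i + 1] - '1';
            if (index < args.size()) result += args[index];  // else dropped
            i += 3;
        } else {
            result += spec[i];
            ++i;
        }
    }
    return result;
}
```

Under these assumptions, "see [[1] and [1]" with the single argument "x" expands to "see [1] and x": the doubled bracket keeps the first "[1]" literal, while the second is substituted.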

|> I think that Boost has an excellent opportunity to put some powerful
|> new syntaxes into existing practice. It's easy to say "I'm not used
|> to it" and be stuck with no improvements. (Witness how C++'s syntax
|> still puts return types in front of function names (rather than
|> after the parameters), just because it was done that way in K&R C
|> when most return types were int and hence omitted.) Far better is
|> to look for the technically best solution and put forth a mind to
|> use it, especially when it is reminiscent of existing practice in
|> regular English usage.

If we really want to invent a new syntax, we should try for something
really usable, along the lines of Basic's print using, or Cobol's
picture clauses. Short of that, I can see no reason to abandon a known
syntax for an unknown, which retains most of the defaults of the known.
(It's sort of like Java: no pretensions of C compatibility, but most of
the defaults of C syntax anyway.)

|> Here's a summary of the format syntax rationale:

|> Pros for operator%:
|> - Familiar in context string
|> - Familiar for separating a single argument (cf. Python).

|> Pros for operator[] (and associated [] in context string):
|> - Succinct (as opposed to named functions).
|> - Guaranteed desired precedence.
|> - Quickly parsed visually, especially when persistent manipulators are
|> involved.
|> - Facilitates tightly bound manipulators.
|> - Logically maps to conventional use of operator[].
|> - Corresponds to everyday publishing practice.
|> - Preceding and following non-space text is easily differentiated.
|> - Encapsulates formatting options, free of ambiguity.
|> - Rarely would need escaping in real-world programs.

|> [1] When you get to a footnote (if you're interested in reading it), you
|> jump down to where it is defined, and mentally substitute it into the text
|> where the reference appeared - exactly what format does.

I'm somewhat confused in all this about when you are talking about the
format specifier, and when you are talking about how the arguments are
being passed. For example, in your pros for operator[], the first four
seem to apply to the parameter passing, whereas the rest apply to the
format specifier. It would be easier to discuss if you'd keep the two
separate.

-- 
James Kanze                                mailto:kanze_at_[hidden]
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
