From: Brey, Edward D (EdwardDBrey_at_[hidden])
Date: 2002-01-28 10:36:44


I've been putting some thought into the syntax of Boost.Format, and have
come to see that the best syntax going forward is one that does not try to
retain compatibility with printf as a high priority. Backward compatibility
can certainly be useful, but cases where compatibility is needed differ from
cases of new code, where the syntax can be optimal in the sense of not
needing to bend to legacy decisions. Since we are embarking on a new
class, not touching the existing printf, we can safely shed any baggage that
gets in the way, knowing that it is easy to create a parallel
printf-friendly class to support those who need one.

I'm going to assume for the moment that the formal argument syntax
(format(spec, arg1, arg2)) isn't going to be technically feasible. (Even if
it were, it wouldn't handle persistent manipulators all that well.) Taking that
option off the table, let's look at our other options. Consider operator%
and operator<<. Operator<< is confusing in a larger cout statement.
operator% is not familiar, and has no intuitive mapping to an established
use of the operator. Both have precedence problems. On the plus side,
operator% is kind of similar to Python, but only to an extent: as soon as
you go beyond one parameter, the syntax starts looking quite a bit
different, as there is no precedent for chained '%'s.
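
To make the precedence point concrete, here is a minimal sketch. Recorder is a
hypothetical stand-in invented for this illustration, not the proposed format
class; it only records the text of each argument it is fed, so the grouping
chosen by the compiler becomes visible.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a format object (an assumption for this sketch):
// it stores the stream representation of every argument passed through '%'.
struct Recorder {
    std::vector<std::string> args;

    template <typename T>
    Recorder& operator%(const T& value) {
        std::ostringstream os;
        os << value;
        args.push_back(os.str());
        return *this;
    }
};

// '%' binds more tightly than '+', so "r % a + b" would parse as
// "(r % a) + b". That does not compile here, because Recorder has no
// operator+. The sum has to be parenthesized to arrive as one argument.
inline std::vector<std::string> feed_sum(int a, int b) {
    Recorder r;
    r % (a + b);
    return r.args;
}

// '%' and '*' share one precedence level and group left to right, so
// "r % a % b * c" would parse as "((r % a) % b) * c". Parentheses are
// needed again to pass the product as a single argument.
inline std::vector<std::string> feed_product(int a, int b, int c) {
    Recorder r;
    r % a % (b * c);
    return r.args;
}

// Small self-check showing what each helper records.
inline void recorder_demo() {
    assert(feed_sum(2, 3).size() == 1);
    assert(feed_sum(2, 3)[0] == "5");
    assert(feed_product(2, 3, 4).size() == 2);
    assert(feed_product(2, 3, 4)[1] == "12");
}
```

With a bracketed or function-call syntax the argument expression is already
delimited, so no extra parentheses would be needed in either case.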

Let's look at the alternatives. A very clear option is function calls, e.g.
.with(arg1). It also has no precedence problems. Unfortunately, this
clarity comes with quite a bit of verbosity and makes arg1 jump out at you
less, so the code is a little harder to visually parse. Operator[] is
similar, essentially the abbreviated version. It has a logical mapping into
the traditional use of the operator, in that it maps into a virtual
stream space, which theoretically contains the stream representation of all
possible argument values. In this respect, the usual thoughts on operator[]
work, such as applying one set of []s on the result of the former. Each
occurrence of [arg] logically replaces its corresponding reference in the
context string and returns the resulting context string for the next [arg]. As
intuitive as std::map? Well, no, not quite. But a lot better than
operator%.
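
The chaining described above can be sketched roughly as follows. This is an
illustration under assumptions, not Boost.Format and not a proposed
implementation: the class name bracket_format, the member str(), and the
choice to number arguments by call order are all invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical sketch: each operator[] call replaces every "[n]" reference
// for the next argument number and returns a new object holding the
// resulting context string, ready for the next [arg].
class bracket_format {
public:
    explicit bracket_format(const std::string& context)
        : context_(context), next_(1) {}

    template <typename T>
    bracket_format operator[](const T& arg) const {
        std::ostringstream value;
        value << arg;                        // stream representation of arg
        std::ostringstream tag;
        tag << '[' << next_ << ']';          // e.g. "[1]" for the first call
        const std::string needle = tag.str();
        const std::string text = value.str();

        bracket_format result(context_);
        result.next_ = next_ + 1;
        std::size_t pos = 0;
        while ((pos = result.context_.find(needle, pos)) != std::string::npos) {
            result.context_.replace(pos, needle.size(), text);
            pos += text.size();              // continue after the inserted text
        }
        return result;
    }

    const std::string& str() const { return context_; }

private:
    std::string context_;
    int next_;
};

// Usage in the style of the examples in this message.
inline std::string banana_demo() {
    return bracket_format("Banana price: [1]/[2] tax: [3]%")[3]["kg"][7].str();
}

inline void bracket_format_demo() {
    assert(banana_demo() == "Banana price: 3/kg tax: 7%");
}
```

One known weakness of this naive version: if an argument's text itself
contains something like "[2]", a later call will substitute into it. A real
implementation would parse the context string once rather than rewrite it.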

Another nice feature of operator[] is that it allows manipulators to be
compact and easily visually parsed, although it's only good for persistent
manipulators. For non-persistent manipulators, you want something that
binds tightly to the argument that it is manipulating. Even the function
style, i.e.

format("Banana price: [1]/[2] tax: [3]%")
  .apply_next(width(3)).with(cost).apply_next(width(4)).with(unit)
  .apply_next(width(4)).apply_next(precision(1)).with(tax)

doesn't show the binding as precisely as something like:

format("Banana price: [1]/[2] tax: [3]%")
  [manip(width(3)), cost] [manip(width(3)), unit]
  [manip(width(3)), manip(precision(1)), tax]

Since operator[] uses a closing token other than ')', it makes the groupings
a little easier to see than the function call method. Granted, in the
syntaxes above, there is enough manipulator text overhead that no syntax can
really look clean - definitely an argument for allowing format specifiers
within the context string.

What of the existing practice argument for '%'? It is used in C (and C++),
Python, and CLISP. But existing practice in and of itself isn't very
important. The real question is whether it is a beneficial syntax. There
are two aspects to consider: How recognizable/intuitive is it? And does it
have beneficial syntactical properties?

Existing practice definitely helps with recognizability. Interestingly, it
turns out that '%' and '[]' both have a lot of existing practice, each from
different circles. For '[]', the existing practice is in everyday writing.
Brackets are commonly used for footnotes in a plain text environment, which
makes them familiar and easily recognized. [1] This usage actually gives
brackets a leg up, since programmers and non-programmers (read: translators)
alike read everyday English (or other languages), whereas only experienced
programmers or translators will have encountered '%' for substitution.

For the question of syntactical properties, '%' starts with a leg up. After
all, it was chosen for C for a good reason, right? And then by subsequent
languages. Well, kind of. '%' was chosen by C assuming that a letter was
going to follow (not necessarily immediately). Given the use of letters,
[u] just doesn't have the same natural look as [1] does, so it makes some
sense. Also, in those days, squeezing out the last character was more
important than it is now. Why did more modern languages adopt the same
style? In Python's case, it's because Python likes to copy C. Many of
Python's libraries are thin wrappers on C, without much redesign effort put
into tailoring the interface to leverage the Python environment (cf. the
lack of positional formatting for strings). I can't really speak to CLISP's
usage of '%'; however, as flexible a language as Lisp is,
user-friendliness of syntax isn't its forte, and so I don't think there is
much to be gleaned from that example.

Therefore, as we move forward, we can choose to join the crowd that follows
the C way. This helps C++ be close to C. However, I think it would be
closer to C than is beneficial, in that it would be impossible to leverage
some syntactical advantages without moving to a more modern format
specification. With '[]', the following is possible:

- Preceding and following non-space characters are easy to distinguish from
the substitution placeholder.

- Formatting options are well encapsulated, while still keeping the position
indicator in the front. For example [1:8.3>] means parameter 1, total width
of 8 characters, 3 digits of precision, right justified.

- Escaping is minimized: Only literal text of the form "[{digit}" would need to
be escaped. Escaping is as simple as "[[1]".
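
The points above can be sketched as a small parser. The grammar is an
assumption extrapolated from the single example [1:8.3>], namely
'[' position [ ':' [width] [ '.' precision ] [ '>' ] ] ']', and every name
below (placeholder, read_number, parse_placeholder, count_placeholders) is
hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

struct placeholder {
    int position;   // 1-based parameter number
    int width;      // total width, -1 if absent
    int precision;  // digits of precision, -1 if absent
    bool right;     // true if the spec ends with '>'
};

// Reads a run of decimal digits starting at s[i]; returns -1 if there is none.
inline int read_number(const std::string& s, std::size_t& i) {
    if (i >= s.size() || !std::isdigit(static_cast<unsigned char>(s[i])))
        return -1;
    int n = 0;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
        n = n * 10 + (s[i++] - '0');
    return n;
}

// Tries to parse "[n]" or "[n:width.precision>]" at s[i]. On success, stores
// the result in out, advances i past the closing ']' and returns true.
// On failure, i and out are left unchanged.
inline bool parse_placeholder(const std::string& s, std::size_t& i,
                              placeholder& out) {
    std::size_t j = i;
    if (j >= s.size() || s[j] != '[') return false;
    ++j;
    placeholder p = { read_number(s, j), -1, -1, false };
    if (p.position < 0) return false;       // '[' not followed by a digit
    if (j < s.size() && s[j] == ':') {
        ++j;
        p.width = read_number(s, j);
        if (j < s.size() && s[j] == '.') { ++j; p.precision = read_number(s, j); }
        if (j < s.size() && s[j] == '>') { ++j; p.right = true; }
    }
    if (j >= s.size() || s[j] != ']') return false;
    out = p;
    i = j + 1;
    return true;
}

// Counts placeholders in a context string, treating "[[" as an escaped '['.
inline int count_placeholders(const std::string& s) {
    int count = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        placeholder p;
        if (s.compare(i, 2, "[[") == 0) i += 2;       // escaped bracket
        else if (parse_placeholder(s, i, p)) ++count; // real placeholder
        else ++i;                                     // ordinary character
    }
    return count;
}

inline void placeholder_demo() {
    placeholder p;
    std::size_t i = 0;
    assert(parse_placeholder("[1:8.3>]", i, p));
    assert(p.position == 1 && p.width == 8 && p.precision == 3 && p.right);
    assert(count_placeholders("Total [1:8.3>] of [2], not [[3]") == 2);
}
```

Because the closing ']' delimits the whole specification, text that follows
immediately, such as the '%' in "[3]%", needs no special handling.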

I think that Boost has an excellent opportunity to put some powerful new
syntaxes into existing practice. It's easy to say "I'm not used to it" and
be stuck with no improvements. (Witness how C++'s syntax still puts return
types in front of function names (rather than after the parameters), just
because it was done that way in K&R C when most return types were int and
hence omitted.) Far better is to look for the technically best solution and
put forth a mind to use it, especially when it is reminiscent of existing
practice in regular English usage.

Here's a summary of the format syntax rationale:

Pros for operator%:
- Familiar in context string
- Familiar for separating a single argument (cf. Python).

Pros for operator[] (and associated [] in context string):
- Succinct (as opposed to named functions).
- Guaranteed desired precedence.
- Quickly parsed visually, especially when persistent manipulators are
involved.
- Facilitates tightly bound manipulators.
- Logically maps to conventional use of operator[].
- Corresponds to everyday publishing practice.
- Preceding and following non-space text is easily differentiated.
- Encapsulates formatting options, free of ambiguity.
- Rarely would need escaping in real-world programs.

Best regards,
Ed

[1] When you get to a footnote (if you're interested in reading it), you
jump down to where it is defined, and mentally substitute it into the text
where the reference appeared - exactly what format does.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk