
Boost :

From: James Kanze (kanze_at_[hidden])
Date: 2002-02-09 10:17:18


Samuel Krempp <krempp_at_[hidden]> writes:

|> On Wed, 2002-01-30 at 23:56, James Kanze wrote:
|> > The problem is that

|> > format( "..." ) % x * y % z ;

|> > is parsed wrong. Generally, this should result in a compiler
|> > error, but I would be very surprised if we can get away with not
|> > having format implicitly convert to std::string, which means that:

|> > format( "..." ) % x + y ;

|> > will NOT be a compile time error if x and y are strings.

|> Yes, it will; I've carefully ensured it would. There is no implicit
|> conversion to string. (While wondering what kinds of trouble this
|> implicit conversion could bring, I detected the problem with '+'.)

As I said, can we get away with it? I know what problems implicit
conversions can create, and I generally avoid them as much as possible.
My initial GB_Format had a toString function, and that was it.

My customers insisted otherwise:-).

|> > IMHO, the best solution is just to forget about the operator (and
|> > the implicit conversions), but try selling that.

|> maybe I'm giving too much importance to appearance, but I'm not
|> pleased by lines like :

|> cout << format( fstr ).with( x1 ).with( x2 ).with( x3 );

|> when I can have, at the price of user precedence errors caught at
|> compile time:

|> cout << format( fstr ) % x1 % x2 % x3;

|> the repeated 'with' makes the statement very long, full of words,
|> and the eye can't catch what's happening, nor detect an output in
|> the middle of other lines of code.

And I find the first more visually appealing, and more readable.
Basically, only in the first is it 100% clear that the arguments are
applied to the format object; that it isn't just an arbitrary
expression.

|> In my view, that's (very generally) the benefit of operators : the
|> lines of code gain in readability.

Only if the overloading is intuitive. Or established: + works for
strings, and << and >> for iostream, because they've become
established. But at least the latter is a flagrant case of operator
overloading abuse; the standard gets away with it only because the
operator is very rare (partially true for % as well), and is graphically
very suggestive. And of course, it has an acceptable (although not
perfect) priority.

|> Also, very generally, the problem with operators is that they can
|> interact with other operators, following rules that the user might not
|> grasp, or might not be able to apply safely 100% of the time.

|> But I thought it was OK as long as there is no implicit conversion
|> kicking in. Because, as long as we obey this constraint, precedence
|> mix-ups result in compile time errors.

Well, implicit conversions do tend to make everything more difficult.
I'll support you in banning them, but I have a feeling that when it
comes to the committee, you may have a fight on your hands. (On the
other hand, we avoided operator char const*() in std::string, so perhaps
it isn't as hopeless as that.)

|> format objects have no implicit conversions defined, and I believe
|> the precedence mix-ups due to operator% would all be detected at
|> compilation.

|> > It's true that the printf uses the format specifier for two
|> > things, the type, and how to format it. And obviously, a C++
|> > implementation doesn't need the first. But is there an easier way
|> > than %f and %e to say that you want fixed or exponential format?

|> No, %e and %f do their job of specifying a float format nicely.
|> But what about when you don't have a specific idea on this, and just want
|> 'default' formatting?

That's what %g is for:-).

Seriously, have you ever actually used %g in a real program?

|> What comes to mind is to use '%s' by default, e.g.:
|> string path="010010";
|> double p = 0.0043;
|> format("Result : f(%s) == %s ") % path % p;

That's more or less the strategy I adopted. When in doubt, %s.

For user defined specializations, I make the actual specifier available.
Thus, a formatter for complex could use %c and %p, for cartesian or
polar coordinates. At this point, you want to be able to group
specifiers, because complex probably also wants to know whether to use
%e or %f. My GB_Format allows this. By accident, I must admit; I
simply thought that there would be cases where the user wanted a
specifier of more than one letter, and adopted an extension to printf
that I had seen described somewhere, in which <...> is treated as a
specifier character.

Someone else suggested that there should be some grouping, and I like
the idea. IMHO, it should be optional; there are an awful lot of people
out there who know printf, and having to write %{d} when %d would do
probably wouldn't go down well with them. (Although honestly, why not?) But
like most people, I'm already used to using optional grouping when
specifying shell variables. (E.g. PATH=$PATH:newElement doesn't work.)
In that sense, we could extend the printf format by allowing grouping
(using {}, [], or <>, I really don't think it matters), with the
following rules:

  - The opening character immediately follows the %.

  - If grouping is present, it, and only it, determines the length of
    the specifier. In particular, if grouping is present, the type
    specifier can be absent, or can be more than one character (but
    starting with a letter). In the case of the default formatters,
    only the first letter will be considered, and no letter will behave
    as %s.

  - For the rest, grouping does not affect the interpretation.

|> Problem is all the type-characters imply some actions,
|> even %s.
|> e.g., once you want to set precision, you might use :
|> format("Result : f(%.4s) == %.4s ") % path % p;
|> But no, crazy you ! precision, with %s, means truncation.. So '%s'
|> is a very special type-char after all.

The definition of the precision of a string is the maximum number of
characters it can provide. That has nothing to do with %s; it has to do
with the way the C committee defined precision of a string.

The same thing is true for precision of integral types. Precision means
at least that many digits (padded with zeros, not spaces), independently
of the type specifier given.

|> '%d' has less effect : it simply sets decimal base.
|> but... if the argument is passed with a 'hex' manipulator, we've
|> enforced decimal base but did not mean to.

Format and manipulators don't mix. Don't try to.

|> So we need a new type-character, defined as doing nothing, just
|> closing the directive.
|> say, 'a' :
|> format("Result : f(%.4a) == %.4a ") % path % p;

|> At this stage, I realised I simply preferred importing printf
|> syntax inside brackets, making the final type-character optional,
|> rather than using printf syntax directly and requiring a type-char to
|> end each directive.

Sounds OK. But what does it mean?

|> format("Result : f(%{.4}) == %{.4} ") % path % p;

And if path is a string, how does the .4 affect it? Or if it is a user
defined type, without a specialization?

In my implementation, I ignore any letter not defined in the C standard.
(Plus %a, because it didn't enter the C standard until after I'd
implemented my code.) But ignoring a letter comes out pretty close to
treating it like %s : I set the precision in the ostream which I pass to
the template function. There's a specialization for std::string and for
char const*, which interprets the precision correctly. (The standard
ostream operator<< ignores the precision on a string.) User defined
types without a specialization output to an ostream with the precision
set. What that means, in turn, depends on the type. User defined types
with a specialization do whatever they want with it.

|> it looks more natural, each directive is visually grouped (just like
|> the other proposed syntax, '[stuff]' ),

|> and we can omit the type-character if we don't need one. We can
|> also support "%1" at the same time.

|> and we still benefit from the conciseness of the printf syntax.

|> the "{stuff}" mechanism is completely intuitive, so no trouble to
|> get used to.

Agreed.

|> I'm writing documentation now, introducing clearly the choice
|> between pure printf and new syntax, and it seems to me this
|> modified, encapsulated printf is easy to grasp.

It sounds good. I'd argue for maintaining compatibility, but one
doesn't exclude the other. The only real question is if you want to
support %<n>, where n is a numeric value; this conflicts with printf,
since a printf specifier can start with a numeric value. Of course,
that numeric value can only be followed by a letter, a point, or, if you
are supporting the X/Open printf, a $, so most of the time, you could
distinguish. IMHO, however, it isn't really necessary.

-- 
James Kanze                                mailto:kanze_at_[hidden]
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk