Boost logo

Boost :

From: Ed Brey (EdwardDBrey_at_[hidden])
Date: 2002-01-14 18:48:10


I think that a format library would be an excellent addition to Boost. This implementation seems to do a good job of implementing the string formatting concept; however, there are some issues with it that should be resolved before it is accepted.

One problem is portability. It doesn't compile on either Cygwin gcc 2.95.3 or VC6. gcc doesn't like char_traits, and I'm not sure if much can be done about it. VC could probably be made to work fairly easily, just by dropping the separate .ihh file and merging the member function definitions together with the declarations.

The other problem is some issues with the interface (see "Interface comments" below). Details of the interface are a very important part of a library like this, since its application is so broad. If at all possible a consensus should be reached on the best syntax to use. In any case, the library should contain rationale describing why any reasonable-appearing interfaces are not used. Based on the current rationale document, it looks like some of the possibilities haven't been thought all the way through yet.

Of a lesser concern, the documentation could use some reorganization and refactoring, although many localized parts of it are quite good.

My recommendation is to hash out the best answers to the portability and interface issues and have a mini-review before acceptance.

Interface comments:

- I like the object model; i.e. as opposed to, say format(cout, "%1", 123). It is quite nice how a single name, "format", enables either streaming and string creation (except that str is needed - more on that later), including reuse of the format specification.

- While operator% does a nice job of binding the parameters to the format object, I believe it is possible to do even better. Consider operator[], which has these advantages:
- - Tighter binding. No chance of accidentally expecting a different binding.
- - Has a correlation to the built-in operator[]: Consider the 'format("%1") [num]'. Think of "%1" as representing a virtual array {"0", "1", "2", ... }, which num indexes into.
- - Opens the door for a natural syntax which can eliminate the "glue" function. This is useful since the name "glue" does little to help the intuition. The syntax would be to use operator() for manipulators, which works nicely since it looks similar to what surrounds the specification string. The specification string and manipulators share the common trait of describing the output layout, which a syntactical similarity reinforces.

I don't believe that operator[] has been proposed yet, and indeed I only came across the idea in writing up this review, or I would have suggested it sooner. However, operator() has been proposed, which has similar advantages, and I don't recall seeing any counterarguments to the proposal on the mailing list. Likewise, the idea isn't addressed in choices.html. The choice between operator()/[]/% is important and should be well thought through and documented, just as was done for operator%/<<.

- str() is one of those names that I've often heard on this list should not be abbreviated. No wonder - Does it stand for string or stream? :-) Clearly, it was abbreviated just to avoid a name conflict, which always calls the design into question. I realize that the standard library does this, and I question it, too. Is there any harm in just providing implicit string conversion instead?

- Instead of having formatter automatically raise an exception, consider having it store its error state and set the failbit on the stream if there is an exception. I'm not sure this is a good idea - just something to think about. The consideration is that there may be some applications that would prefer to silently but gracefully degrade in the face of this situation than to have to handle an exception.

- I didn't look deeply into the non-legacy formatting codes, since the documentation wasn't clear about what is legacy printf compatibility and what is the "latest and greatest" that new code would use. I'd like to have another look following some documentation refactoring (see below).

Documentation comments:

The ordering of sections needs work. At the very beginning should be a description, synopsis of features (preferably formatted, not in a <pre> block), and some examples. It should be sufficient so that most users can stop and make use of the class with just two or three minutes of learning curve.

Next some format tweaking details would be in order. These seem to be non-existent in the docs. There are a lot of options in the printf section, but what if I'm not interested in printf compatibility, but want to center text? Those should be in the format tweaking section.

Next a legacy section would be in order. Here the printf interface and all its idiosyncrasies would appear. This is kept towards the end because few people will read it. Those writing new code will just use the native interface, and those using the printf interface already have format strings that they don't want to change, so they're going to just let pformat rip with whatever's there.

Finally, details like rationale and history should be at the end.

This library doesn't have a true implementation document, which is fine. It is important, however, not to throw arbitrary pieces of implementation-specific information into the interface document. Specifically, the references to the ".ihh" file should not exist in the html doc or the sample documentation.

It would be much nicer to have samples snippets incorporated directly into the HTML instead of having a separate sample program. Currently there is a hybrid sample/mini-test-harness that serves neither purpose very well.

The docs could use some grammar clean-up. Here's are items that I found, organized by section:

How it works:
- Bullet 1:
 "directives in it, and prepare" ->
 "directives in it and prepares"
- The last paragraph: Hardly one word. An implementation detail like using stringstream should not be documented here, since you won't want your users to rely on that fact (not that I see how they could).

Syntax:
- "Mixing positional-directives with non-positional" ->
 "Mixing positional with non-positional directives"

- "formatting option specified" ->
 "formatting options specified"

Printf directives:
- "(but common flags have ... anybody)" ->
 "(Common flags have ... anybody.)"

Credits:
- "Karl Nelson formatting" ->
 "Karl Nelson's formatting" ->

choices.html:

'glue' rational:
- "the small inconvenient is that" ->
 "the small inconvenience is that"

sample_formats.cc:

- "Too much arguments" ->
 "Too many arguments"

- "throwed" ->
 "thrown"

Directory organization:

The "bench" folder does not have an intuitive name. I had to open the file inside it to understand its purpose. Better would be to move the folder under the test folder and rename it to "benchmark" or "performance".

Code comments:

Commented out code needs to be changed or needs an explanation. E.g "//inline" and "//#include".

The current mixture of spaces and tabs in the code makes it quite hard to read. Please pick one or the other.

If format_item::pad were unsigned, it would help make it clear that it is a bit array (had me faked out at first).

str2int(): Why use the long if/else block rather than n += s[i] - '0'?

format.hpp:
- '"arguments passed incompatible with format string "' ->
 '"arguments passed are incompatible with format string"'

format_and_args_mismatch: Consider something less verbose, such as "specification_mismatch".


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk