From: Sebastian Redl (sebastian.redl_at_[hidden])
Date: 2007-06-30 12:19:11
Phil Endecott wrote:
> ** Formatting of user-defined types often broken in practice.
>
> The ability to write overloaded functions to format user-defined types
> for text I/O is attractive in theory, but in practice it always lets me
> down somewhere. My main complaint is that neither of these work:
>
> typedef std::set<thing> things_t;
> operator<<(things_t things) { .... } // doesn't work because things_t is a typedef
>
I see no specific reason why that would fail, as long as there isn't an
operator << for std::set<thing> somewhere already. It's even legal, I
think, because std::set<thing> depends on a type not in namespace std.
(You can't overload for std::set<int>, for example, by the rules of the
standard.)
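As a minimal sketch of the case under discussion: the typedef is only an alias, so the overload below is really an overload for std::set<thing>. The namespace app, the contents of thing, and the helper to_text are assumptions made for illustration; they are not from the original code. The operator is declared in the same namespace as thing so that argument-dependent lookup can find it.

```cpp
#include <ostream>
#include <set>
#include <sstream>
#include <string>

namespace app {  // hypothetical user namespace

struct thing {
    int id;
};

// Ordering so that thing can be stored in a std::set.
inline bool operator<(const thing& a, const thing& b) { return a.id < b.id; }

inline std::ostream& operator<<(std::ostream& os, const thing& t) {
    return os << "thing#" << t.id;
}

typedef std::set<thing> things_t;

// Overload for the underlying type std::set<thing>; things_t is just an alias.
// It lives in thing's namespace so argument-dependent lookup can find it.
inline std::ostream& operator<<(std::ostream& os, const things_t& things) {
    os << '{';
    const char* sep = "";
    for (things_t::const_iterator it = things.begin(); it != things.end(); ++it) {
        os << sep << *it;
        sep = ", ";
    }
    return os << '}';
}

// Hypothetical helper: formats a set into a std::string via the operator above.
inline std::string to_text(const things_t& things) {
    std::ostringstream out;
    out << things;
    return out.str();
}

}  // namespace app
```

Note that the parameter is taken by const reference rather than by value, which avoids copying the whole set on every call.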
> uint8_t i;
> cout << i; // doesn't work because uint8_t is actually a char
>
Yes, that's annoying. In my opinion, it's a defect in the standard that
unsigned and signed char are treated as characters instead of small
integers. Characters are what char is for.
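A small sketch of the behaviour and a common workaround, assuming the usual case where uint8_t is a typedef for unsigned char. The function names as_number and as_streamed are made up for this example.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Streams the byte as-is. Where uint8_t is unsigned char, this selects the
// character overload of operator<< and emits a single character.
inline std::string as_streamed(std::uint8_t v) {
    std::ostringstream out;
    out << v;
    return out.str();
}

// Promotes the byte to int first (unary plus), so the integer overload of
// operator<< is selected and the value is printed as a number.
inline std::string as_number(std::uint8_t v) {
    std::ostringstream out;
    out << +v;
    return out.str();
}
```

A static_cast<unsigned>(v) has the same effect as the unary plus and may read more clearly.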
> When I do have a class, I often find that there is more than one way in
> which I'd like to format it, but there is only one operator<< to
> overload. And often I want to put the result of the formatting into a
> string, not a stream.
>
I have an idea for a formatting system that should address all these
issues. Basically, a format string would be able to specify, in an
extensible and type-safe way, how to format an object. The format string
would be used to look up a formatter in some sort of registry.
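The post does not describe the design any further, so the following is only one possible reading of "look up a formatter in a registry", not the system being proposed. Everything here (formatter_registry, the spec names "tuple" and "named", the point type) is hypothetical. The registry is per type, so a spec registered for one type cannot be applied to a value of another type.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical per-type registry: maps a format spec to a formatter for T.
template <typename T>
class formatter_registry {
public:
    typedef std::function<std::string(const T&)> formatter;

    // Registers (or replaces) the formatter used for the given spec.
    static void add(const std::string& spec, formatter f) { table()[spec] = f; }

    // Looks up the formatter for spec and applies it; throws if none exists.
    static std::string format(const std::string& spec, const T& value) {
        typename std::map<std::string, formatter>::const_iterator it =
            table().find(spec);
        if (it == table().end())
            throw std::invalid_argument("unknown format spec: " + spec);
        return it->second(value);
    }

private:
    static std::map<std::string, formatter>& table() {
        static std::map<std::string, formatter> t;  // one table per type T
        return t;
    }
};

struct point {  // example user-defined type with two formats
    int x, y;
};

inline void register_point_formats() {
    formatter_registry<point>::add("tuple", [](const point& p) {
        return "(" + std::to_string(p.x) + ", " + std::to_string(p.y) + ")";
    });
    formatter_registry<point>::add("named", [](const point& p) {
        return "x=" + std::to_string(p.x) + " y=" + std::to_string(p.y);
    });
}
```

This sketch returns strings for brevity; a formatter could equally take an output stream, which would fit the stream-based approach argued for elsewhere in this thread.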
> ** lexical_cast<> uses streams, should be reversed.
>
> Currently we implement formatters that output to streams. We implement
> lexical_cast using stringstreams. Surely it would be preferable to
> implement formatters as specialisations of lexical_cast to a string (or
> character sequence / output iterator / whatever) and to implement
> formatted output to streams on top of that. I suppose you could argue
> that the stream model is better for very large amounts of output since
> you don't accumulate it all in a temporary string, but I've never
> encountered a case where that would matter.
>
I have written in another post why I think the stream interface is
better. Efficiency is one part of the issue. Another is that the code is
simpler that way for the library implementer, and the difference is
transparent for the library user. Also, it means that it's easier to
switch the string type used (something that is not uncommon).
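To illustrate the last point, here is a rough sketch in which the formatter is written once against a stream and string conversion is layered on top, for whichever character type the caller wants. The version type and the to_basic_string helper are invented for this example.

```cpp
#include <ostream>
#include <sstream>
#include <string>

struct version {  // hypothetical user-defined type
    int major_no, minor_no;
};

// The formatter is written once, against any character type of stream.
template <typename Ch, typename Tr>
std::basic_ostream<Ch, Tr>& operator<<(std::basic_ostream<Ch, Tr>& os,
                                       const version& v) {
    return os << v.major_no << os.widen('.') << v.minor_no;
}

// String conversion built on top of the stream formatter. The caller picks
// the character type; the formatter itself does not change.
template <typename Ch, typename T>
std::basic_string<Ch> to_basic_string(const T& value) {
    std::basic_ostringstream<Ch> out;
    out << value;
    return out.str();
}
```

With these definitions, to_basic_string<char>(v) and to_basic_string<wchar_t>(v) both reuse the same operator<<.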
> ** Formatting state has the wrong scope
> void f() {
> scoped_fmt_state(cout,hex);
> cout << ....;
> if (...) throw;
> cout << .....;
> }
>
> Hmm, I think that's too much work. I'd be happy with NO formatting
> state in the stream, and to use explicit formatting when I want it:
>
> cout << hex(x);
> OR cout << format("%08x",x);
> OR printf(stdout,"%08x",x);
>
I absolutely agree. Stateful formatting is generally not good. The only
state that formatting should carry is the locale in use.
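Something like the hex(x) form quoted above can be approximated on top of today's streams. The sketch below assumes a hypothetical wrapper named as_hex (named that way to avoid confusion with std::hex); it changes the stream's flags only for the duration of one insertion and restores them afterwards.

```cpp
#include <iomanip>
#include <ios>
#include <ostream>
#include <sstream>

// Hypothetical wrapper carrying a value and a minimum field width.
struct hex_value {
    unsigned long value;
    int width;
};

inline hex_value as_hex(unsigned long value, int width = 0) {
    hex_value h = {value, width};
    return h;
}

inline std::ostream& operator<<(std::ostream& os, const hex_value& h) {
    // Remember the caller's formatting state.
    const std::ios_base::fmtflags saved_flags = os.flags();
    const char saved_fill = os.fill();

    os << std::hex << std::setfill('0') << std::setw(h.width) << h.value;

    // Put the state back so later insertions are unaffected.
    // (Not exception-safe as written; a RAII guard would be more robust.)
    os.flags(saved_flags);
    os.fill(saved_fill);
    return os;
}
```

After `out << as_hex(255, 8) << ' ' << 255;` the second value is still printed in decimal, because the hex flag never outlives the wrapper's own insertion.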
> And it _is_
> type safe if you are using a compiler that treats it as special.)
>
... _and_ if you use a string literal as the formatting string. Far from
guaranteed, especially when localizing.
> ** Too much disconnect between POSIX file descriptors and std::streams
>
I can't bring myself to see this specific issue as a defect. Closing the
gap would mean coupling the streams to a platform.
> I have quite a lot of code that uses sockets and serial ports, does
> ioctls on file descriptors, and things like that. So I have a
> FileDescriptor class that wraps a file descriptor with methods that
> implement simple error-trapping wrappers around the POSIX function calls.
>
Is there any specific reason you cannot implement a stream buffer (a
class derived from std::streambuf) that acts on a file descriptor? A
stream buffer, despite its name, doesn't have to buffer data.
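A minimal sketch of such an unbuffered, output-only stream buffer, assuming a POSIX system. The class name fd_outbuf is made up, and error handling (EINTR, partial failures) is deliberately left out. Because no put area is set up, every character goes straight to write().

```cpp
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <unistd.h>  // POSIX write()

// Hypothetical unbuffered output stream buffer over a file descriptor.
// It does not own the descriptor and never closes it.
class fd_outbuf : public std::streambuf {
public:
    explicit fd_outbuf(int fd) : fd_(fd) {}

protected:
    // Called for each single character, since there is no put area.
    int_type overflow(int_type c) override {
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);
        const char ch = traits_type::to_char_type(c);
        return ::write(fd_, &ch, 1) == 1 ? c : traits_type::eof();
    }

    // Called for character sequences; passes whole blocks to write().
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        std::streamsize done = 0;
        while (done < n) {
            const ssize_t r =
                ::write(fd_, s + done, static_cast<std::size_t>(n - done));
            if (r <= 0) break;  // real code would inspect errno here
            done += r;
        }
        return done;
    }

private:
    int fd_;
};
```

Formatted output then works by constructing a std::ostream on top of an fd_outbuf, for example `fd_outbuf buf(fd); std::ostream os(&buf); os << 42;`.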
> Currently, there's a strong separation between what I can do to a
> FileDescriptor (i.e. reads and writes) and what I can do to a stream.
> There is no reason why this has to be the case. It should be possible
> to add buffering to a FileDescriptor *and only add buffering*, and it
> should be possible to do formatted I/O on a non-buffered FileDescriptor.
>
Yes. It is possible now. It should be easier with my system.
> ** Character sets need support
>
> This is a hugely complex area which native English speakers are
> uniquely unqualified to talk about.
>
Luckily, I'm not a native English speaker. I have some experience with
the issues involved, although it is limited to German umlauts. I have
felt the pain of unexpected encodings in web applications. This is why
I am convinced that all C++ types involved in text handling need to be
tagged with the encoding used.
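As a rough illustration of what tagging with the encoding could mean at compile time, here is a sketch with hypothetical names (utf8, latin1, tagged_string, to_utf8). It is not the library outlined in this thread. The tag is part of the type, so mixing encodings requires an explicit conversion.

```cpp
#include <string>

// Hypothetical encoding tags.
struct utf8 {};
struct latin1 {};

// A byte string whose encoding is part of its type.
template <typename Encoding>
class tagged_string {
public:
    explicit tagged_string(const std::string& bytes) : bytes_(bytes) {}
    const std::string& bytes() const { return bytes_; }

private:
    std::string bytes_;
};

// Explicit conversion from Latin-1 to UTF-8. Every Latin-1 byte is the code
// point U+0000..U+00FF, so each one needs at most two UTF-8 bytes.
inline tagged_string<utf8> to_utf8(const tagged_string<latin1>& in) {
    std::string out;
    const std::string& src = in.bytes();
    for (std::string::size_type i = 0; i < src.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII: unchanged
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return tagged_string<utf8>(out);
}
```

Passing a tagged_string<latin1> where a tagged_string<utf8> is expected fails to compile, which is the kind of mistake the umlaut problems mentioned above come from.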
> I think that a starting point would be for someone to write a Boost
> interface to iconv (I have an example that makes functors for iconv
> conversions), and to write a tagged-string class that knows its
> encoding (either a compile-time type tag or a run-time enumeration tag
> or both). Ideally we'd spend a couple of years getting used to using
> that, and then consider how it can best integrate with IO.
>
I don't want to wait that long ;)
I have in fact considered this issue and have drawn up the outline of
such a character handling and conversion library. A subset of it is
absolutely needed for the text layer of my I/O plans.
Sebastian Redl