From: Jonathan Turkanis (technews_at_[hidden])
Date: 2004-09-02 23:26:44


"Dietmar Kuehl" <dietmar_kuehl_at_[hidden]> wrote:
> Hi,

Hi. Thanks for the comments.

>
> let's start with the obvious formal requirement for review, i.e. the
> answer to the question: Should this library be included into Boost? I
> don't know. I have no strong feelings either way.

Well, I'm glad to hear you don't have a strong inclination to reject.

> Maybe I'm too used to writing stream buffers without help from any
> library but to me it is unclear whether this really makes things
> simpler. The number of different concepts is, at least to me, quite
> confusing. The library does, however, avoid a small fraction of
> boilerplate code (mainly for buffer maintenance). Unfortunately it
> does not at all address the really problematic area in stream buffers,
> namely seeking: the interface of stream buffers is retained. This
> interface is, in my opinion, really hard to use. Typically, I don't
> bother implementing it unless some specification requires me to do so.
> On the other hand, I'm not using seeking with streams anyway.

These are all important points.

I can certainly understand why someone who has written as many stream buffers as
you have may not feel that additional library support is needed. For the benefit
of others, let me try to explain the benefits of writing a Source or Sink and
then turning it into a stream buffer using streambuf_facade instead of writing a
stream buffer from scratch.

(BTW, I say 'for the benefit of others', but I'd really like your feedback on
several points :-)

----
Writing a Source or Sink lets you express just the core functionality of a
component, and nothing else. Typically, all you need to do is inherit from a
convenience base class and implement one or two of the functions read, write,
seek and close, which have intuitive names and specifications. (Close can be
messy, but only in unusual cases.)

The details which you don't need to worry about include:
A) Buffer manipulation. You mention this as eliminating only a small fraction of
boilerplate code. I think in a typical buffered implementation of a derived
basic_streambuf class, a substantial portion of the code is devoted to buffer
manipulation. More important than the amount of code, however, is the obscurity
of the buffer-manipulation interface for those who do not routinely use it. In
addition to buffer allocation, which is entirely the programmer's
responsibility, there are ten buffer manipulation functions:
     eback, egptr, epptr, gbump, gptr, pbase, pbump, pptr, setg, setp
I'd guess that other than hard-core iostreams junkies few can say exactly what
all these functions do without consulting documentation. Most people -- even
some iostreams junkies -- should be pleased not to have to use these functions
at all.
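
As a rough illustration of point A, here is a minimal sketch of a buffered output stream buffer written from scratch, using only the standard library. The class name string_sink_buf and its members are invented for this example and are not part of the library under review. Even in this small case, five of the ten functions listed above (setp, pbase, pptr, epptr via setp, pbump) have to be used correctly.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>   // only needed for the usage shown after the block
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical example: a buffered stream buffer that appends to a std::string.
class string_sink_buf : public std::streambuf {
public:
    string_sink_buf(std::string& target, std::size_t buffer_size)
        : target_(target), buffer_(buffer_size)
    {
        assert(buffer_size > 0);
        // The put area is [pbase(), epptr()); pptr() is the next free slot.
        setp(buffer_.data(), buffer_.data() + buffer_.size());
    }

    ~string_sink_buf() override { flush_buffer(); }

protected:
    // Called when a character is written and the put area is full.
    int_type overflow(int_type c) override
    {
        flush_buffer();
        if (!traits_type::eq_int_type(c, traits_type::eof())) {
            *pptr() = traits_type::to_char_type(c);
            pbump(1);
        }
        return traits_type::not_eof(c);
    }

    // Called by std::ostream::flush().
    int sync() override
    {
        flush_buffer();
        return 0;
    }

private:
    // Hand the buffered characters to the target and reset the put area.
    void flush_buffer()
    {
        target_.append(pbase(), static_cast<std::size_t>(pptr() - pbase()));
        setp(buffer_.data(), buffer_.data() + buffer_.size());
    }

    std::string& target_;
    std::vector<char> buffer_;
};
```

Attaching it to a std::ostream (std::ostream os(&buf); os << "text"; os.flush();) delivers the text to the target string. The only application-specific line is the append call in flush_buffer; the rest is buffer bookkeeping.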
B) Character traits. Writing a stream buffer from scratch means dealing with
some or all of the following member functions of the character traits type:
     eof, not_eof, to_char_type, to_int_type, eq_int_type
and some or all of the member types:
     int_type, off_type, pos_type
As above, the functions and types are unfamiliar to many. They also make code
difficult to read, esp. when prefixed with 'traits_type::'.
By contrast, when writing Filters and Resources, there is no traits type. The
type streamoff is used everywhere instead of traits_type::off_type. When writing
Sources and Sinks the above member functions are not needed at all.  When
writing filters, they are used only in the case of unbuffered input; even here,
only std::char_traits is used. (For a way to avoid using traits even in this
case, see 'Future Directions' at http://tinyurl.com/6r8p2.)
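
To show how much traits vocabulary even a trivial from-scratch stream buffer needs, here is a sketch of an input stream buffer that upper-cases the characters it reads from another stream buffer. It uses only the standard library; the name upper_case_buf is made up for this example, and the code says nothing about how the library under review implements its filters.

```cpp
#include <cassert>
#include <cctype>
#include <istream>   // only needed for the usage shown after the block
#include <sstream>   // only needed for the usage shown after the block
#include <streambuf>
#include <string>

// Hypothetical example: upper-cases characters pulled from another streambuf.
class upper_case_buf : public std::streambuf {
public:
    explicit upper_case_buf(std::streambuf* source) : source_(source)
    {
        assert(source != nullptr);
        // Start with an empty one-character get area so the first read
        // triggers underflow().
        setg(&ch_, &ch_ + 1, &ch_ + 1);
    }

protected:
    int_type underflow() override
    {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());

        const int_type c = source_->sbumpc();
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::eof();

        const unsigned char raw =
            static_cast<unsigned char>(traits_type::to_char_type(c));
        ch_ = static_cast<char>(std::toupper(raw));
        setg(&ch_, &ch_, &ch_ + 1);  // one character is now available
        return traits_type::to_int_type(ch_);
    }

private:
    std::streambuf* source_;
    char ch_ = 0;
};
```

Wrapped in a std::istream, this reads "abc" from a std::stringbuf as "ABC". The actual transformation is the single toupper call; eof, eq_int_type, to_char_type, to_int_type and int_type are all there just to satisfy the interface.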
C) Protected virtual stream buffer functions. The protected virtual interface of
basic_streambuf is, IMO, quite strange. The functions have weird names:
underflow, uflow, pbackfail, overflow, showmanyc, xsputn, xsgetn, seekoff,
etc -- the functions read, write, and seek are much more intuitive. The
specifications of the standard functions are tricky, too. For example, overflow
(one of the better-named functions), is specified roughly like this:
    virtual int_type overflow(int_type c = traits_type::eof());
    "If c is not eof, attempts to insert into the output sequence
    the result of converting c to a character. If this can't be done,
    returns eof or throws an exception. Otherwise returns any value
    other than eof."
Contrast this with
    void write(const char_type* s, std::streamsize n);
    "Writes the sequence of n characters starting at s to the
    output sequence, throwing an exception in case of error."
(On the void return type, see http://tinyurl.com/4xl65.)
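
The division of labour can be sketched with the standard library alone. Below, string_sink is a hypothetical component exposing only the write signature quoted above, and sink_streambuf is a hypothetical, unbuffered stand-in for the kind of adapter that streambuf_facade is meant to provide. Neither is the library's actual code; the point is only that the overflow/xsputn details are written once, in the adapter, rather than in every component.

```cpp
#include <cassert>
#include <ios>
#include <ostream>   // only needed for the usage shown after the block
#include <streambuf>
#include <string>

// Hypothetical Sink: core functionality only, no buffers and no traits.
struct string_sink {
    std::string data;

    void write(const char* s, std::streamsize n)
    {
        assert(n >= 0);
        data.append(s, static_cast<std::string::size_type>(n));
    }
};

// Hypothetical adapter: turns any type with write() into a stream buffer.
template <typename Sink>
class sink_streambuf : public std::streambuf {
public:
    explicit sink_streambuf(Sink& sink) : sink_(sink) {}

protected:
    // Single-character output: forward the character unless it is eof.
    int_type overflow(int_type c) override
    {
        if (!traits_type::eq_int_type(c, traits_type::eof())) {
            const char ch = traits_type::to_char_type(c);
            sink_.write(&ch, 1);
        }
        return traits_type::not_eof(c);
    }

    // Bulk output: forward the whole sequence in one call.
    std::streamsize xsputn(const char* s, std::streamsize n) override
    {
        sink_.write(s, n);
        return n;
    }

private:
    Sink& sink_;
};
```

With sink_streambuf<string_sink> buf(sink); std::ostream os(&buf);, anything inserted into os ends up in sink.data, and the author of string_sink never sees int_type or eof.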
----
Next, you say that seeking is the most problematic part of the stream buffer
interface. I'm a bit surprised by this, and I'm interested to know what part of
the interface you find troublesome. I'm certainly open to simplifications.

Still, the library does simplify the seeking interface a bit. First, seeking
based on saved stream positions is not supported, because I think it would be
rarely used. (See http://tinyurl.com/5saj6.) That reduces the number of
random-access functions from two to one. Second, 'seek' is a more intuitive name
(just barely) than 'seekoff'. Third, except in the relatively rare case that a
stream has two separate repositionable pointers, seek has the signature:
    streamoff seek(streamoff off, ios::seekdir way);
which is at least a slight improvement over
    pos_type seekoff( off_type off,
                      ios::seekdir way,
                      ios::openmode which =
                          ios::in | ios::out);
And of course, as with writing stream buffers from scratch, if you don't need
random-access then you don't need to worry about implementing seek.
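
For a component with a single position, implementing the two-argument seek can be quite short. The following is a sketch under assumptions: string_source is a hypothetical in-memory component, and its error handling (throwing std::out_of_range) is chosen for the example rather than taken from the library's specification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ios>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical seekable source over a string, with a single read position.
class string_source {
public:
    explicit string_source(std::string data)
        : data_(std::move(data)), pos_(0) {}

    // Copies up to n characters into s and returns the number copied.
    std::streamsize read(char* s, std::streamsize n)
    {
        assert(n >= 0);
        const std::streamoff available =
            static_cast<std::streamoff>(data_.size()) - pos_;
        const std::streamoff count = std::min<std::streamoff>(n, available);
        std::memcpy(s, data_.data() + pos_, static_cast<std::size_t>(count));
        pos_ += count;
        return static_cast<std::streamsize>(count);
    }

    // Moves the position relative to the beginning, the current position,
    // or the end, and returns the new absolute position.
    std::streamoff seek(std::streamoff off, std::ios_base::seekdir way)
    {
        const std::streamoff size = static_cast<std::streamoff>(data_.size());
        std::streamoff base = 0;  // std::ios_base::beg
        if (way == std::ios_base::cur)
            base = pos_;
        else if (way == std::ios_base::end)
            base = size;

        const std::streamoff target = base + off;
        if (target < 0 || target > size)
            throw std::out_of_range("seek outside the sequence");
        pos_ = target;
        return pos_;
    }

private:
    std::string data_;
    std::streamoff pos_;
};
```

There is no openmode argument to inspect and no pos_type to construct, which is the simplification being claimed for the single-pointer case.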
----
Finally, I am sympathetic to the complaint that there are too many concepts.
Really, the problem is that there are so many i/o modes. (See
http://tinyurl.com/5fgu2 for a discussion.)
To summarize:

1. For each of eight i/o modes there is a resource concept and a filter concept.
2. There are also the concepts Filter and Resource at the base of the two
hierarchies above.
3. There are several other concepts representing optional behavior, such as
localizability.

To me, (3) is not much of an issue. Here are some changes that could be made:
- Get rid of all the concepts in (1). As explained here http://tinyurl.com/3waf8
and here http://tinyurl.com/3pxog, the concepts Filter and Resource are
sufficient by themselves to support the library. The problem with this is that
most users of the library will be writing just Sources, Sinks, InputFilters and
OutputFilters (the 'Big Four'). Making these users read the specification of
Filter and Resource forces them to look at the interface for random-access,
which is complex in its full generality and totally irrelevant to these users.
- Get rid of all the i/o modes except input and output. This is attractive for
its simplicity, and would leave the library no weaker than many other i/o
libraries, but I reject it for reasons discussed here: http://tinyurl.com/5fgu2
(same section as cited above)
- Get rid of the separate documentation for the concepts InoutResource,
SeekableResource, InoutFilter and SeekableFilter. This would leave the library
intact, but would eliminate all the documented Filter and Resource refinements
other than the Big Four.
- Keep all the current concepts and documentation, but emphasize prominently and
repeatedly that the Big Four are the really crucial ones. The other concepts
could even be documented in an 'Advanced Topics' section.
I think I tend to favor the last option.

> The documentation should make clear that the filters are not usable as
> temporary add-ons: I have some uses where I have an underlying stream
> buffer which is used directly until a portion of the stream needs
> special processing. At this point, a filter is used to process the
> portion and normal processing is continued afterwards. For example,
> when parsing a mail, a mime-decode filter is used to handle embedded
> binary data. This assumes, however, that the filter synchronizes with
> the underlying stream buffer.

Right. I ran out of time to include a discussion of these issues, and was hoping
someone would bring it up during the review.
One would like to be able to do the following, for example:
- perform some i/o using a filtering_stream
- pop the terminal resource, without closing it
- push a new filter on the chain
- add the same resource, and continue i/o
This can't be done in general, because filters, resources and the stream buffers
which glue them together can all perform internal buffering, and in some cases
it is simply not possible to force a component to synchronize its buffers
without sacrificing performance or data-integrity.
It would be possible to allow components to advertise that they can be fully
synchronized at any time during i/o. Then, if a chain consisted entirely of such
components, filters could be used as temporary add-ons, as you describe.
I didn't implement this because of the complexity it would add to the library. I
figured that if a lot of people requested it, I would add it.

> At the ACCU 2003 conference JC (I can figure out the full name if
> necessary...) presented the idea for creating filters which I really
> liked. It looked something like this:
>
> filter_stream out(tee(std::cout) | encode | gzip | file("some file"));
>
> This would create a stream which writes everything to 'std::cout'
> and also encodes the stuff before sending it on to compress it and
> finally write it to the file. I like this notation...

Very cool! At one point I considered operator+ for this purpose, but | is
obviously much better. I'll have to think about it, but I think a macro
    BOOST_IO_MAKE_PIPEABLE(concept, arity)
would be sufficient to enable this.
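
The mechanics of the | notation can be sketched independently of any macro. In the toy example below, stage and pipeline are hypothetical types invented for illustration (a stage is just a character-to-character function), and nothing here reflects how BOOST_IO_MAKE_PIPEABLE or the library's filters would actually be implemented. It only shows that two overloads of operator| are enough to collect components, left to right, into a chain.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical filter stand-in: a character-to-character transformation.
struct stage {
    std::function<char(char)> fn;
};

// Hypothetical chain: stages are applied in the order they were piped.
struct pipeline {
    std::vector<stage> stages;

    std::string run(const std::string& input) const
    {
        assert(!stages.empty());
        std::string output = input;
        for (char& c : output)
            for (const stage& s : stages)
                c = s.fn(c);
        return output;
    }
};

// stage | stage starts a new chain.
inline pipeline operator|(stage lhs, stage rhs)
{
    pipeline p;
    p.stages.push_back(std::move(lhs));
    p.stages.push_back(std::move(rhs));
    return p;
}

// pipeline | stage extends an existing chain.
inline pipeline operator|(pipeline p, stage s)
{
    p.stages.push_back(std::move(s));
    return p;
}
```

Given two stages a and b, the expression a | b | a yields a pipeline of three stages, because | associates to the left: the first overload builds the chain and the second extends it.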
> Finally, here are some issues I stumbled over in the documentation:
>
> - The description for "boost::io::read()" seems wrong to me: "returning
> a value less than n indicates end-of-sequence". At least on POSIX
> systems, returning less than the number of requested bytes indicates
> that *currently* no more characters are available. The next call may
> return more characters. This behavior could be useful e.g. when
> filtering: get a buffer from the underlying stream and pass on what
> is left from this buffer after filtering.

To me, this is the biggest open issue.
I'm aware of the POSIX specification. Originally, I chose the current
specification for simplicity, but am having second thoughts, as explained in
Rationale-->Planned Changes-->Item 4 (http://tinyurl.com/6rtkz).
The hardest problem is how to allow unbuffered input filters to indicate that no
character is available even though EOF has not been reached. I present three
alternatives here, http://tinyurl.com/6r8p2, but none is entirely satisfactory.
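
For reference, the POSIX-like convention from the quoted comment looks like this from the caller's side. This is a sketch of that convention only, not of the library's current specification (under which a short count means end-of-sequence), and not of any of the three alternatives mentioned above. chunked_source and read_fully are hypothetical names. The sketch covers the blocking case, where a return value of 0 can unambiguously mean end-of-sequence; it does not address the harder case of signalling "nothing available yet".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ios>
#include <string>
#include <utility>

// Hypothetical source that hands out at most `chunk` characters per call,
// much as a pipe or socket might.
class chunked_source {
public:
    chunked_source(std::string data, std::streamsize chunk)
        : data_(std::move(data)), pos_(0), chunk_(chunk)
    {
        assert(chunk > 0);
    }

    // Returns the number of characters read. A short count does NOT mean
    // end-of-sequence; only a return value of 0 (for n > 0) does.
    std::streamsize read(char* s, std::streamsize n)
    {
        const std::streamsize remaining =
            static_cast<std::streamsize>(data_.size() - pos_);
        const std::streamsize count = std::min(n, std::min(chunk_, remaining));
        std::copy_n(data_.data() + pos_, count, s);
        pos_ += static_cast<std::size_t>(count);
        return count;
    }

private:
    std::string data_;
    std::size_t pos_;
    std::streamsize chunk_;
};

// Keeps calling read() until n characters have arrived or the source
// reports end-of-sequence; returns the total number of characters read.
template <typename Source>
std::streamsize read_fully(Source& src, char* s, std::streamsize n)
{
    std::streamsize total = 0;
    while (total < n) {
        const std::streamsize got = src.read(s + total, n - total);
        if (got == 0)
            break;  // end-of-sequence
        total += got;
    }
    return total;
}
```

Under this convention, callers that want the simpler "less than n means the end" behaviour get it by wrapping the source in read_fully, while a filter can pass on whatever a single read delivered.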
> - On the "Filter" concept page, "OutputFilter" is not found (but it is
> reachable from the "InoutFilter" page).

Thanks.

> - On the "SeekableFilter" concept page, I think the preconditions for
> "f.put()" and "f.write()" should be "output" rather than "input". At
> least, I would consider it confusing if "input" were right...

Actually, all the preconditions are messed up here.

Thanks again!

Best Regards,
Jonathan
