From: Jonathan Turkanis (technews_at_[hidden])
Date: 2004-09-05 19:58:47


Daryle, I think this discussion is getting overheated. (See, e.g., the long code
excerpts containing 'jon_xxx'). If I was a bit harsh in my first comments on
your library, I'm sorry. I did vote to include a large percentage of it.

> On 8/30/04 12:01 PM, "Jonathan Turkanis" <technews_at_[hidden]> wrote:

> > "Daryle Walker" <darylew_at_[hidden]> wrote:

> >> 1. Aren't memory-mapped files and file descriptors highly platform
> >> specific?
> >
> > Yes, just like threads, sockets and directory iteration.
> >
> >> Code that works with them would have to be non-portable, so I
> >> don't think they're appropriate for Boost.
> >
> > It achieves portability the same way boost.thread and boost.filesystem do:
> > by having separate implementations for different systems. See
> > http://www.boost.org/more/imp_vars.htm ("Implementation variations").
>
> But for the thread and file-system libraries, we can define default
> behavior.

We can do this for memory mapped files as well: either including the appropriate
header could trigger a static assertion, or construction of mapped file
resources could fail at runtime. Right now I've followed the example of
Boost.Filesystem and assumed that every system is either Windows or Posix. This
can easily be changed to produce more informative errors. Good point.
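For instance, the header might dispatch like this (a sketch only; the config
macros shown are illustrative assumptions, not the library's actual
configuration):

    #if defined(BOOST_WINDOWS)
    // ... Win32 mapping implementation ...
    #elif defined(BOOST_HAS_UNISTD_H) // treat as Posix
    // ... Posix mmap implementation ...
    #else
    # error memory-mapped files are not supported on this platform
    #endif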

> Thread-less environments act as if no spare threads can be
> allocated.

That's not the approach of Boost.Thread, IIRC. If thread support is unavailable,
you get a preprocessor error (at least on Windows).

> All file-systems can simulate a tree/container calculus, so a
> portable interface can be defined.

Again, Boost.Filesystem doesn't do this.

> But memory-mapped files and file
> descriptors are totally meaningless on some environments; what would the
> code map to in those cases?

See above.

>
> >> 2. This library does what a lot of other text-I/O libraries do, try to
> >> fit in "kewl" compression schemes. The problem is that the types of
> >> compression here are binary oriented; they convert between sets of byte
> >> streams. However, characters are not bytes (although characters, like
> >> other types, are stored as bytes).
> >
> > Are you saying there are problems with the implementation of the compression
> > filters, e.g., that they make unwarranted assumptions about 'char'? If so,
> > please let me know. I'm sure it can be fixed.
>
> I'm complaining that binary I/O should _not_ be treated as a variant of text
> I/O (which your library assumes).

All I/O is treated as streams of characters. When a stream requires special
'textual' interpretation, you can use a newline_filter, for line-ending
conversion, or a converter, for code conversion.
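For example (a sketch of the chain idiom; the newline_filter flag name is an
assumption):

    filtering_istream in;
    in.push(newline_filter(newline::posix)); // normalize line endings to '\n'
    in.push(file_source("notes.txt"));
    // read from in.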

> Binary I/O only concerns itself with
> bytes, which is too low-level for text I/O. There can and should be
> bridging code, but the concepts of text sources/sinks should be distinct
> from binary sources/sinks.

This just doubles the number of concepts, for little gain.

> > I don't see the iostream framework as relating to text streams only:
> > streams can handle text and binary. In some cases, you want text and
> > binary to work together. E.g., suppose you have a compressed text file
> > ("essay.z") and you want to read a 'family-friendly' version of it. You
> > can do so as follows:
> >
> > filtering_istream in;
> > in.push(regex_filter(regex("damn"), "darn"));
> > in.push(zlib_decompressor());
> > in.push(file_source("essay.z"));
> > // read from in.
> >
> > Isn't this perfectly natural and convenient? What's wrong with using the
> > decompressor and the regex filter in the same chain?
>
> By itself, nothing. But these compression schemes only work with bytes, so
> you have hidden at least one text <-> binary converter in your code.

(BTW, the file_source above should have been opened in binary mode.)
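For concreteness, the corrected chain would read (a sketch; the openmode
overload of file_source is assumed):

    filtering_istream in;
    in.push(regex_filter(regex("damn"), "darn"));
    in.push(zlib_decompressor());
    in.push(file_source("essay.z", std::ios::binary)); // binary mode
    // read from in.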

All that's assumed in this example is that the characters in the essay file can
be mapped directly to chars. If they can't, one would have to add a layer of
code conversion (using converter) after the decompression, and use a
wide-character filtering stream and wide-character regex_filter.

If the above example were disallowed, then in the common case that output is
stored in a form which can be directly mapped to the internal character set
without code conversion, the user would be forced to insert a do-nothing
adapter.

The current library trusts users to know when they are dealing with data which
must be converted to a wide character type before it can be processed by
text-oriented filters.

> > Can I rephrase this as follows: InputFilters and OutputFilters are a useful
> > addition to the standard library, but Sources and Sinks just duplicate
> > functionality already present? If this is not your point, please correct
> > me.
>
> Yes, that's my point. I looked through your code, and thought "this is just
> a rearrangement of what's already in streams and stream-buffers". I got
> really convinced of this once I saw that you added member functions for
> locale control.

I found I had to add this, rather late in development, to implement converting
streams and stream buffers (which still aren't finished). What's wrong with
locales? You say it like it's a dirty word.

> I've recently noticed that even your documentation for the
> Resource and Filter concepts admits that they're just like certain C++ or C
> I/O functions.

You mean when I say, for example,

   "Filters are class types which define one or more member
   functions get, put, read, write and seek having interfaces
   resembling the functions fgetc, fputc, fread, fwrite and fseek
   from <stdio.h>"

?

The functions boost::io::read, boost::io::write, etc., are indeed generic
versions of these familiar functions. I mention the familiar functions as a way
to introduce readers to the generic versions. The benefits of generic
programming are well known, I hope.
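For example (a sketch; the signatures are assumptions modeled on fread and
fwrite, with src and snk being any Source and Sink):

    char buf[1024];
    std::streamsize n = boost::io::read(src, buf, 1024); // like fread
    boost::io::write(snk, buf, n);                       // like fwrite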

> > There are two main reasons to write Sources and Sinks instead of stream
> > buffers:
> >
> > 1. Sources and Sinks express just the core functionality of a component.
> > Usually you have to implement just one or two functions with very natural
> > interfaces. You don't have to worry about buffering or about putting back
> > characters. I would have thought it would be obvious that it's easier to
> > write:
> >
> > template<typename Ch>
> > struct null_buf {
> >     typedef Ch char_type;
> >     typedef sink_tag category;
> >     void write(const Ch*, std::streamsize) { }
> > };
> >
> > than to write your null_buf, which is 79 lines long.
>
> That's really misleading. The null-sink I have does a lot more. I keep track
> of how many characters passed through (i.e. a value-added function), and I
> optimize for single vs. multiple character output.

Okay,

    template<typename Ch>
    class null_buf {
    public:
        typedef Ch char_type;
        typedef sink_tag category;
        null_buf() : count_(0) { }
        void write(const Ch*, std::streamsize n) { count_ += n; }
        int count() const { return count_; }
    private:
        int count_;
    };

This will lead to a stream buffer which keeps track of how many characters pass
through, is optimized for single vs. multiple character output, *and* is
buffered by default.
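For instance, hooking it up might look like this (a sketch; stream_facade and
its operator-> access to the underlying Sink are described later in this
message, and default construction of the Sink is assumed):

    stream_facade< null_buf<char> > null_out;
    null_out << "hello, world" << std::flush;
    int n = null_out->count(); // characters that passed through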

> Also, I'm verbose in my
> writing style. If I wanted to be compact I could just do:
>
> //========================================================================
> template < typename Ch, class Tr = std::char_traits<Ch> >
> class basic_nullbuf
>     : public std::basic_streambuf<Ch, Tr>
> {
>     typedef Tr traits_type;                    // expose the dependent names
>     typedef typename Tr::int_type int_type;    // used in overflow below
> protected:
>     // Overridden virtual functions
>     virtual int_type overflow( int_type c = traits_type::eof() )
>         { return traits_type::not_eof( c ); }
> };

But that doesn't do what my version, listed above, does.

> And for those of you who think that "traits_type" is scary: get over it!
> Using the obvious substitutes of "==", "<", "(int)", etc. is just sloppy and
> WRONG. The whole point of the traits class is so that a character type
> isn't forced to define those operators. Worse, those operators could exist
> but be inappropriate. For example, Josuttis' STL book has a string type
> that implements case-insensitive comparisons with a custom traits type.
> Using operator== directly would have missed that. Ignoring the policies of
> the traits type's creator could betray his/her vision of usage.

In early versions of my library, filters and resources had traits types as well
as character types. Prompted by remarks of Gennadiy Rozental, I made a careful
study and found that traits could be eliminated from the public interface of the
filter/resource module of the library without sacrificing generality or
correctness, except in the case of the return type of get, which is still

    std::char_traits<char_type>::int_type.

Even this could be eliminated by having get return optional<char>. For a more
ambitious proposal along these lines, see http://tinyurl.com/6r8p2.
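The optional-based alternative might look like this (a sketch; the tag name
source_tag parallels the sink_tag used above and is an assumption):

    template<typename Ch>
    struct repeat_source {
        typedef Ch char_type;
        typedef source_tag category;
        repeat_source(Ch c, int n) : c_(c), n_(n) { }
        boost::optional<Ch> get()
        {
            if (n_ == 0) return boost::none; // end of sequence; no int_type
            --n_;
            return c_;
        }
        Ch c_;
        int n_;
    };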

Of course, filter and resource authors may need to use char_traits to implement
the member functions read, write, etc. But I'm not sure I see where this
discussion is going.

> > 2. Sources and sinks can be reused in cases where standard streams and
> > stream buffers are either unnecessary or are not the appropriate
> > abstraction. For example, suppose you want to write the concatenation of
> > three files to a string. You can do so like this:
> >
> > string s;
> > boost::io::copy(
> >     concatenate(
> >         file_source("file1"),
> >         file_source("file2"),
> >         file_source("file3")
> >     ),
> >     back_insert_resource(s)
> > );

> A straw-man? Wouldn't an iterator-based solution have been better? (There
> are stream(-buffer) iterators, and (string) insert iterators. If the Boost
> iterator library provides a chaining iterator type, then the standard
> copying procedure could be used.)

It's tempting to try to do everything using iterators. In fact, Robert Ramey's
original suggestion to expand the library to handle filtering suggested that it
be based on iterator adapters.
(http://lists.boost.org/MailArchives/boost/msg48300.php)

The problem with this approach is that it misses the opportunity for many
important optimizations that can be made when one is presented with a contiguous
buffer full of characters, instead of one character at a time.
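A toy illustration of the difference (not library code):

    #include <iostream>
    #include <iterator>

    // One call and one branch per character:
    void copy_by_char(std::istreambuf_iterator<char> first,
                      std::istreambuf_iterator<char> last,
                      std::ostream& dest)
    {
        while (first != last)
            dest.put(*first++);
    }

    // One bulk operation over a contiguous buffer, which the sink can
    // forward with memcpy or hand directly to the OS:
    void copy_by_block(const char* buf, std::streamsize n, std::ostream& dest)
    {
        dest.write(buf, n);
    }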

> >> The whole framework seems like "I/O done 'right'", a "better"
> >> implementation of the ideas/concepts shown in the standard I/O framework.
> >
> > I'd say thanks here if 'right' and 'better' weren't in quotes ;-)
>
> It looked like you changed the interface just to change the interface, not
> out of any actual need. What about the following (untested) code:

I'm going to ignore the code, which seems sarcastic. (Don't name stuff after me
until I'm dead.)

Instead, let me quote part of my response to Dietmar Kuehl:

Jonathan Wrote:
> ... The protected virtual interface of
> basic_streambuf is, IMO, quite strange. The functions have weird names:
> underflow, uflow, pbackfail, overflow, showmanyc, xsputn, xsgetn, seekoff,
> etc. -- the functions read, write, and seek are much more intuitive. The
> specifications of the standard functions are tricky, too. For example,
> overflow (one of the better-named functions) is specified roughly like this:
>
> virtual int_type overflow(int_type c = traits_type::eof());
>
> "If c is not eof, attempts to insert into the output sequence
> the result of converting c to a character. If this can't be done,
> returns eof or throws an exception. Otherwise returns any value
> other than eof."
>
> Contrast this with
>
> void write(const char_type* s, std::streamsize n);
>
> "Writes the sequence of n characters starting at s to the
> output sequence, throwing an exception in case of error."

What I've tried to do with the library is to factor out the essential
functionality necessary to define a stream buffer. I've found that in most cases
writing a stream buffer can be reduced to implementing one or two functions with
simple names and specifications. It seems like an obvious win to me.

<snip lots of code>

> >> The price is a
> >> code size many times larger than the conventional system,
> >
> > Are you talking about the size of the library or the size of the generated
> > code?
>
> The size of the library.

1. The library is big partly because it contains a lot of special purpose
components, such as compression filters. You don't pay for them if you don't use
them.

2. The support for the generic read and write operations is quite lightweight.

3. If you use the library just to define new stream buffer types, then in
addition to (2) the main code comes from

    <boost/io/detail/streambufs/indirect_streambuf.hpp>,

which is the generic streambuf implementation, and from

   <boost/io/detail/adapters/resource_adapter.hpp>
   <boost/io/detail/adapters/filter_adapter.hpp>

which are lightweight wrappers that allow indirect_streambuf to interact with
filters and resources using a single interface.

4. If you want to chain filters, then in addition to (2) and (3), the main code
comes from

   <boost/io/detail/chain.hpp>

which at 16k is a small price to pay for a flexible filtering framework.

> >> and a large chunk
> >> of it is a "poor man's" reflection system.
> >
> > Do you mean the i/o categories? This follows the example of the standard
> > library and the boost iterator library. It's better than reflection, since
> > you can't get accidental conformance.
>
> No, I'm talking about the code you used to get the existing standard I/O
> framework to inter-operate with your framework.

Specifically?

> >> ... The sample stream-buffer in
> >> More-I/O generally had added-value member functions attached, that perform
> >> inspection or (limited) reconfiguration. Those member functions also have
> >> to be manually carried over to the final derived stream class. ...
> >> The Iostreams
> >> framework seems to totally ignore the issue! ...

> > With a streambuf_facade or stream_facade you can access the underlying
> > resource directly using operators * and ->. E.g.,
> >
> > stream_facade<tcp_resource> tcp("www.microsoft.com", 80);
> > ...
> > if (tcp->input_closed()) {
> >     ...
> > }
> >
> > Maybe I should stress this more in the documentation. (I imagine some
> > people won't like the use of operators * and -> here, but these can be
> > replaced by a member function such as resource().)
>
> I didn't like the iterator "look" those operations have.

Noted.

> Also, is a stream-façade an actual stream?

Yes.

> 1. Are there really any important sources/sinks that can't be put through
> the existing Standard I/O framework?

The standard library handles non-blocking, asynchronous and multiplexed i/o
awkwardly at best. In contrast, for a generic i/o framework, adding such support
should be fairly straightforward. We just need to introduce the right concepts.

> 2. An existing source/sink, if it wants to work with Standard C++, would
> work with the standard framework already.

To summarize: an existing source/sink, if it wants to work with the standard
framework, already works with the standard framework?

> You have a potential problem:
> standard C++ I/O is "too hard"
> But you got the wrong solution:
> throw away the standard I/O's legacy and start over from scratch
> (but include transition code)

I hope it's possible to improve some of the standard library I/O framework in
the future. Perhaps experience with the current library will help form the basis
for a proposal. But that's not the point of the current library. The point is to
make easy what is currently not-so-easy, and to reduce the difficulty of what is
currently very difficult.

> This is independent of the decisions on memory-mapped files, file
> descriptors, binary I/O, and filters. Couldn't all of those have been
> implemented around the standard framework?

Of course -- with massive code duplication.

Jonathan

