Boost logo

Boost :

From: Daryle Walker (darylew_at_[hidden])
Date: 2004-09-05 18:00:58


On 8/30/04 12:01 PM, "Jonathan Turkanis" <technews_at_[hidden]> wrote:

> "Daryle Walker" <darylew_at_[hidden]> wrote:
>
>> On 8/28/04 8:09 PM, "Jeff Garland" <jeff_at_[hidden]> wrote:
>>
>>> Today (August 28th, 2004) is the start of the formal review of the Iostreams
>>> library by Jonathan Turkanis. I will be serving as review manager. Note
>>> that this is a somewhat unusual situation in that we have several libraries
>>> that overlap in the same area, so comments related to the MoreIo overlap are
>>> needed.
>>>
>>>
>>> http://home.comcast.net/~jturkanis/iostreams/
>
> Thanks for the review!
>
>> Some concerns over what qualifies:
>>
>> 1. Aren't memory-mapped files and file descriptors highly platform
>> specific?
>
> Yes, just like threads, sockets and directory iteration.
>
>> Code that works with them would have to be non-portable, so I
>> don't think they're appropriate for Boost.
>
> It achieves portability the same way boost.thread and boost.filesystem do: by
> having separate implementations for different systems. See
> http://www.boost.org/more/imp_vars.htm ("Implementation variations").

But for the thread and file-system libraries, we can define default
behavior. Thread-less environments act as if no spare threads can be
allocated. All file-systems can simulate a tree/container calculus, so a
portable interface can be defined. But memory-mapped files and file
descriptors are totally meaningless on some environments; what would the
code map to in those cases?

>> 2. This library does what a lot of other text-I/O libraries do, try to fit
>> in "kewl" compression schemes. The problem is that the types of compression
>> here are binary oriented; they convert between sets of byte streams.
>> However, characters are not bytes (although characters, like other types,
>> are stored as bytes).
>
> Are you saying there are problems with the implementation of the compression
> filters, e.g., that they make unwarranted assumptions about 'char'? If so,
> please let me know. I'm sure it can be fixed.

I'm complaining that binary I/O should _not_ be treated as a variant of text
I/O (which your library assumes). Binary I/O only concerns itself with
bytes, which is too low-level for text I/O. There can and should be
bridging code, but the concepts of text sources/sinks should be distinct
from binary sources/sinks.

>> There are issues of character-to-byte-sequence
>> conversion. These issues, and binary I/O itself, should _not_ be snuck in
>> through a text-I/O component. Worse, we have an upcoming library
>> (Serialization) that also has to deal with binary I/O stuff, so we should
>> have a binary I/O strategy. That means that the compression stuff should be
>> skipped for now.
>
>> (If someone says that this library is supposed to be text and/or binary
>> I/O, that's even worse! The two types should be distinct and not just
>> glommed together.)
>
> I don't see the iostream framework as relating to text streams only: streams
> can handle text and binary. In some cases, you want text and binary to work
> together. E.g., suppose you have a compressed text file ("essay.z") and you
> want
> to read a 'family-friendly' version of it. You can do so as follows:
>
> filtering_istream in;
> in.push(regex_filter(regex("damn"), "darn"));
> in.push(zlib_decompressor());
> in.push(file_source("essay.z"));
> // read from in.
>
> Isn't this perfectly natural and convenient? What's wrong with using the
> decompressor and the regex filter in the same chain?

By itself, nothing. But these compression schemes only work with bytes, so
you have hidden at least one text <-> binary converter in your code. The
issue of text vs. binary is too big to be trivially dismissed! If someone
introduces a binary-I/O scheme to Boost, we would have to through all this
code again (since Zlib, Bzip, and the like would be better to be redone as
binary I/O filters).

>> Now to the review itself:
>
> I thought it had already started ;-)
>
>> 1. Actual code using this library is very slick and easy to set up. This
>> ease of use/set-up also applies to the plug-in filters and/or resources.
>
> Thanks.
>
>> The end-user experience is so awesome that we could just say "ship it"
>> and go home. However, the experience is the only good point. That ease
>> comes with a nasty little price tag, or more accurately, a nasty BIG price
>> tag.
>>
>> The core question on keeping this library is:
>>
>> Besides filtering/chaining, is there any part of the interface that
>> isn't already covered by the standard stream (buffer) system?
>
> Can I rephrase this as follows: InputFilters and OutputFilters are a useful
> addition to the standard library, but Sources and Sinks just duplicate
> functionality alread present? If this is not your point please correct me

Yes, that's my point. I looked through your code, and thought "this is just
a rearrangement of what's already in streams and stream-buffers". I got
really convinced of this once I saw that you added member functions for
locale control. I've recently noticed that even your documentation for the
Resource and Filter concepts admit that they're just like certain C++ or C
I/O functions.

> There are two main resons to write Sources and Sinks instead of stream
> buffers:
>
> 1. Sources and Sinks and sinks express just the core functionality of a
> component. Usually you have to implement just one or two functions with very
> natural interfaces. You don't have to worry about buffering or about putting
> back characters. I would have thought it would be obvious that it's easier to
> write:
>
> template<typename Ch>
> struct null_buf {
> typedef Ch char_type;
> typedef sink_tag category;
> void write(const Ch*, std::streamsize) { }
> };
>
> than to write your null_buf, which is 79 lines long.

That really misleading. The null-sink I have does a lot more. I keep track
of how many characters passed through (i.e. a value-added function), and I
optimize for single vs. multiple character output. Also, I'm verbose in my
writing style. If I wanted to be compact I could just do:

//========================================================================
template < typename Ch, class Tr = std::char_traits<Ch> >
class basic_nullbuf
    : public std::basic_streambuf<Ch, Tr>
{
protected:
    // Overriden virtual functions
    virtual int_type overflow( int_type c = traits_type::eof() )
    { return traits_type::not_eof( c ); }
};
//========================================================================

That's a lot less than 79 lines.

And for those of you who think that "traits_type" is scary: get over it!
Using the obvious substitutes of "==", "<", "(int)", etc. is just sloppy and
WRONG. The whole point of the traits class is so that a character type
isn't forced to define those operators. Worse, those operators could exist
but be inappropriate. For example, Josuttis' STL book has a string type
that implements case-insensitive comparisons with a custom traits type.
Using operator== directly would have missed that. Ignoring the policies of
the traits type's creator could betray his/her vision of usage.

> 2. Sources and sinks can be reused in cases where standard streams and stream
> buffers are either unnecessary or are not the appropriate abstraction. For
> example, suppose you want to write the concatenation of three files to a
> string.
> You can do so like this:
>
> string s;
> boost::io::copy(
> concatenate(
> file_source("file1"),
> file_source("file2"),
> file_source("file3")
> ),
> back_insert_resource(s)
> );
>
> This is IMO clearer and possibly much more efficient than:
>
> string s;
> ofstream file1("file1");
> ofstream file2("file2");
> ofstream file3("file3");
> ostringstream out;
> out << file1.rdbuf();
> out << file2.rdbuf();
> out << file3.rdbuf();
> s = out.str();
>
> (Note: concatenate is unimplemented, pending resolution of some open issues
> such as the return type of 'write').

A straw-man? Wouldn't an iterator-based solution have been better? (There
are stream(-buffer) iterators, and (string) insert iterators. If the Boost
iterator library provides a chaining iterator type, then the standard
copying procedure could be used.)

> Another example, for the future, might be resources for asynchronous and
> multiplexed i/o. (See 'Future Directions', http://tinyurl.com/6r8p2.)

We can deal with that later. For now, what's the advantage of the library
over synchronous single-plexed I/O?

>> The whole framework seems like "I/O done 'right'", a "better" implementation
>> of the ideas/concepts shown in the standard I/O framework.
>
> I'd say thanks here if 'right' and 'better' weren't in quotes ;-)

It looked like you changed the interface just to change the interface, not
out of any actual need. What about the following (untested) code:

//========================================================================
template < typename Ch = char >
class source_streambuf
    : public virtual std::basic_streambuf<Ch>
{
protected:
    // Lifetime management
    source_streambuf() : use_buf_( false ), any_more_( true ) {}

    // Overriden virtual functions
    virtual std::streamsize xsgetn( char_type *s, std::streamsize n )
    {
        if ( this->any_more_ )
        {
            std::streamsize const nn = this->jon_read( s, n );

            this->any_more_ = ( nn >= n );
            this->use_buf_ = ( nn >= 1 );

            if ( this->use_buf_ )
            {
                traits_type::assign( this->buf_, s[nn - 1] );
            }

            return nn;
        }
        else
        {
            return 0;
        }
    }

    virtual int_type underflow()
    {
        return this->use_buf_
         ? traits_type::to_int_type( this->buf_ )
         : this->uflow();
    }

    virtual int_type uflow()
    {
        char_type c;

        return ( 1 == this->xsgetn(&c, 1) )
         ? traits_type::to_int_type( c ) : traits_type::eof();
    }

private:
    // Override this instead of creating a "read" in Jon's library
    virtual std::streamsize jon_read( Ch *s, std::streamsize n ) = 0;

    // Member data
    char_type buf_;
    bool use_buf_, any_more_;

};

template < typename Ch = char >
class sink_streambuf
    : public virtual std::basic_streambuf<Ch>
{
protected:
    // Overriden virtual functions
    virtual std::streamsize xsputn( char_type const *s,
     std::streamsize n )
    {
        this->jon_write(s, n);
        return n;
    }

    virtual int_type overflow( int_type c = traits_type::eof() )
    {
        if ( !traits_type::eq_int_type(c, traits_type::eof()) )
        {
            char_type const cc = traits_type::to_char_type( c );
            this->xsputn( &cc, 1 );
        }

        return traits_type::not_eof( c );
    }

private:
    // Override this instead of creating a "write" in Jon's library
    virtual void jon_write( Ch const *s, std::streamsize n ) = 0;

};

template < typename Ch = char >
class inoutresource_streambuf
    : public source_streambuf<Ch>
    , public sink_streambuf<Ch>
{
    typedef source_streambuf<Ch> i_base_type;
    typedef sink_streambuf<Ch> o_base_type;

public:
    // Redefine the standard typedef's here due to dual inheritance
    typedef Ch char_type;

    typedef std::char_traits<Ch> traits_type;

    typedef typename traits_type::int_type int_type;
    typedef typename traits_type::pos_type pos_type;
    typedef typename traits_type::off_type off_type;

protected:
    // Overriden virtual functions
    using i_base_type::xsgetn;
    using i_base_type::underflow;
    using i_base_type::uflow;
    using o_base_type::xsputn;
    using o_base_type::overflow;

};

template < typename Ch = char >
class seekableresource_streambuf
    : public inoutresource_streambuf<Ch>
{
protected:
    // Overriden virtual functions
    virtual pos_type seekoff( off_type off, std::ios_base::seekdir way,
     std::ios_base::openmode which = std::ios_base::in
     | std::ios_base::out )
    {
        off_type result( -1 );

        if ( (std::ios_base::in | std::ios_base::out) == which )
        {
            result = static_cast<off_type>( this->jon_seek(
             static_cast<std::streamoff>( off ), way) );
        }

        return static_cast<pos_type>( result );
    }

    virtual pos_type seekpos( pos_type sp, std::ios_base::openmode
     which = std::ios_base::in | std::ios_base::out )
    {
        return this->seekoff( off_type(sp), std::ios_base::beg, which );
    }

private:
    // Override this instead of creating a "seek" in Jon's library
    virtual std::streamoff jon_seek( std::streamoff off,
     std::ios_base::seekdir way ) = 0;

};
//========================================================================

Notes:
    * The source/sink variants that use a pointer pair could just
      use something like the Pointer-based streams I provided.
    * People who feel that the standard interface is "too hard" would
      simply override the "jon_*" member functions and leave the other
      re-implementations alone.
    * You can even ignore the traits type in your implementations of
      the "jon_*" methods (although I think that could be a bad idea).

What disadvantage would these adapter classes like these have over Jon's
re-imagining over the stream interface? Jon's higher level code could have
been built over something like this (if he really didn't want to optimize an
implementation with direct use of stream-buffer methods).

>> The price is a
>> code size many times larger than the conventional system,
>
> Are you talking about the size of the libray or the size of the generated
> code?

The size of the library.

> Most of the filter and resource infrastructure, as found in headers such as
> <boost/io/io_traits.hpp>, <boost/io/operations.hpp> and
> <boost/io/detail/resource_adapter.hpp>, should compile down to nothing with
> optimizations enabled. A typical instantation of streambuf_facade should be
> only slightly larger, if at all, than a typical hand-written stream buffer.
> The main difference is that more protected virtual function may be implemented
> than are actually needed. Even this can be fixed, if necessary; at this point
> it seems like a premature optimization, unless you have some data.
>
>> and a large chunk
>> of it is a "poor man's" reflection system.
>
> Do you mean the i/o categories? This follows the example of the standard
> library and the boost iterator library. It's better than reflection, since you
> can't get accidental conformance.

No, I'm talking about the code you used to get the existing standard I/O
framework to inter-operate with your framework.

>> 1a. An issue that appeared during the previous review (my More-I/O library)
>> was the necessity of the stream base classes that could wrap custom
>> stream-buffer classes. The complaint was that the wrappers couldn't be a
>> 100% solution because they can't forward constructors, so a final derived
>> class has to be made the manually carries over the constructors. Well, that
>> is true, mainly because C++ generally doesn't provide any member forwarding
>> (besides in a limited way with inheritance). The sample stream-buffer in
>> More-I/O generally had added-value member functions attached, that perform
>> inspection or (limited) reconfiguration. Those member functions also have
>> to be manually carried over to the final derived stream class. The point
>> (after all this exposition) is that the More-I/O framework at least
>> acknowledges support for value-added member functions. The Iostreams
>> framework seems to totally ignore the issue! (So we could have a 90%
>> solution for a quarter of the work, but triple the #included code length!)
>
> With a streambuf_facade or stream_facade you can access the underlying
> resource
> directly using operators * and ->. E.g.,
>
> stream_facade<tcp_resource> tcp("www.microsoft.com", 80);
> ...
> if (tcp->input_closed()) {
> ...
> }
>
> Maybe I should stress this more in the documentation. (I imagine some people
> won't like the use of operators * and -> here, but these can be replaced by a
> member functions such as resource().)

I didn't like the iterator "look" those operations have.

Also, is a stream-façade an actual stream? Does it inherit from
std::basic_istream and/or std::basic_ostream? If not, I think it should
change.

>> 2. Another issue that appeared during the More-I/O review was that plug-ins
>> for Iostreams could be used for other ultimate sources/sinks unrelated to
>> standard streams. What would those be? If a system in question supports
>> Standard C++, it probably uses the Standard I/O framework for (text) I/O
>> If the system doesn't/can't use Standard I/O, it has to use a custom I/O
>> framework. To use it with Iostreams framework plug-ins, an adapter would
>> have to be made for the custom I/O routines. In that case, an adapter could
>> be made for the Standard I/O framework instead. (As I said in the prelude,
>> I'm skipping over binary-I/O concerns.)
>
> I don't follow this.

1. Are there really any important sources/sinks that can't be put through
the existing Standard I/O framework?
2. An existing source/sink, if it wants to work with Standard C++, would
work with the standard framework already.
3. If a source/sink doesn't work with either framework, so there would be a
choice between making an adapter for the standard framework or making an
adapter for your framework, why shouldn't the adapter be made to the
standard framework?

>> To sum it up:
>>
>> I would REJECT the library (at least for now). Maybe chain-filtering
>> stream-buffer and stream base classes could be made instead?
>>
>
> Sorry to hear that -- I hope I can change your mind.

You have a potential problem:
     standard C++ I/O is "too hard"
But you got the wrong solution:
     throw away the standard I/O's legacy and start over from scratch
     (but include transition code)

This is independent of the decisions on memory-mapped files, file
descriptors, binary I/O, and filters. Couldn't all of those been
implemented around the standard framework? (Possibly starting from
something like the simplified abstract base buffers I provided.)

I'm burned out from writing this much, so I'll talk about making a version
of filters that use the standard framework later.

-- 
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk