From: Jonathan Turkanis (technews_at_[hidden])
Date: 2004-09-08 11:47:03

Next message: Chau Johnthan: "[boost] Re: [ANN] xpressive 0.9"
Previous message: Jeff Garland: "Re: [boost] Re: [admin] Overlapping reviews -- should this be allowed?"
In reply to: Daryle Walker: "Re: [boost] IOStreams formal review start"
Next in thread: Carlo Wood: "Re: [boost] IOStreams formal review start"

"Daryle Walker" <darylew_at_[hidden]> wrote:
> On 9/5/04 8:58 PM, "Jonathan Turkanis" <technews_at_[hidden]> wrote:
> > On 8/30/04 12:01 PM, "Jonathan Turkanis" <technews_at_[hidden]> wrote:
> >>> "Daryle Walker" <darylew_at_[hidden]> wrote:

> >>>> 1. Aren't memory-mapped files and file descriptors highly platform
> >>>> specific?

> >> But for the thread and file-system libraries, we can define default
> >> behavior.
> >
> > We can do this for memory mapped files as well. Either including the
> > appropriate header could cause a static assertion, or construction of mapped
> > file resources could fail at runtime. Right now I've followed the example of
> > Boost.Filesystem and assumed that every system is either Windows or Posix.
> > This can easily be changed to produce more informative errors. Good point.
>
> An object that can never be configured to work (for those deficient
> platforms) isn't very useful.

On those platforms, yes. On supported platforms, it can be very useful.

> I know that thread (and rarely file-system)
> classes have the same potential drawback, but I feel that threads and file
> systems are more general "computer science concepts" than memory mapped
> files, and so allowances could be made for the latter class ideas.

Threads and filesystem support are good additions to boost (and would be to the
standard) because they are useful, not because they are general "computer
science concepts".

> >> Thread-less environments act as if no spare threads can be
> >> allocated.
> >
> > That's not the approach of Boost.Thread, IIRC. If thread support is
> > unavailable, you get a preprocessor error (at least on Windows).
>
> Maybe that should be considered a bug.

It's useful in contexts where thread support can be turned on or off with a
command-line switch. It's probably a bad approach on systems which don't support
threads at all.

> >> Binary I/O only concerns itself with bytes, which is too low-level for text
> >> I/O. There can and should be bridging code, but the concepts of text
> >> sources/sinks should be distinct from binary sources/sinks.
> >
> > This just doubles the number of concepts, for little gain.
>
> Not separating concepts that have notable distinctions is not a service.
> (That's why I separated regular pointer-based streams from the ones for
> pointers-to-const in my library. The "savings" in making only one set of
> class code wasn't worth mixing the semantics of the two stream types.)

What's wrong with this analogy:

Saying that a sequence of characters represents 'text' is like saying that a
sequence of characters represents a 'picture' (i.e., that it conforms to some
image file format specification, such as jpeg, png, etc.)

In order to interpret the data properly, the user must know something about its
internal structure, and must in general apply an additional layer of software
for the content to be usable.

In the case of a sequence of characters representing Chinese text, the user must
apply code conversion to produce a wide character representation. In the case of
a sequence of characters representing a jpeg image, the user must apply a jpeg
interpreter to produce an object representing the image size, pixel data, etc.

In the first case, it would be naive to expect that sending the raw character
sequence to std::cout will print Chinese characters to the console. In the
second case, it would be naive to expect that sending the raw character sequence
to std::cout will display a jpeg image on the console.

So, do we need another family of resource concepts for 'pictures'?

> If you're going to start over from scratch with I/O, why not go all the way
> and finally split-off binary I/O? Stop it from being treated as "text I/O
> with funny settings".

I'm not starting from scratch. I'm trying to make it easier to use the existing
framework. (In the future, the library may be extended beyond the existing
framework.)

> >>> filtering_istream in;
> >>> in.push(regex_filter(regex("damn"), "darn"));
> >>> in.push(zlib_decompressor());
> >>> in.push(file_source("essay.z"));
> >>> // read from in.

> > All that's assumed in this example is that the characters in the essay file
> > can be mapped directly to chars. If they can't, one would have to add a layer
> > of code conversion (using converter) after the decompression, and use a
> > wide-character filtering stream and wide-character regex_filter.

> That's a major implicit assumption.

It's not fundamentally different from the assumption that a sequence of
characters contains a gif image.

   filtering_istream in;
   in.push(gif_to_jpeg());
   in.push(file_source("pony.gif"));
   // read jpeg data from in.

Trust the programmer.

> >>> Can I rephrase this as follows: InputFilters and OutputFilters are a useful
> >>> addition to the standard library, but Sources and Sinks just duplicate
> >>> functionality already present? If this is not your point please correct me.
> >>
> >> Yes, that's my point. I looked through your code, and thought "this is just
> >> a rearrangement of what's already in streams and stream-buffers". I got
> >> really convinced of this once I saw that you added member functions for
> >> locale control.
> >
> > I found I had to add this, rather late in development, to implement converting
> > streams and stream buffers (which still aren't finished). What's wrong with
> > locales? You say it like it's a dirty word.
>
> I have no problems with locales. I was noting that the more features you
> added to the base classes, the more they looked like the rearrangements of
> the standard I/O base classes.

Localizability is an optional behavior. Most filters and resources won't
implement it. Filters and resources *do not* have to derive from the convenience
base classes source, sink, input_filter, etc. Since localizability was so easy
to add as a no-op, I gave these base classes no-op implementations of imbue and
i/o categories refining localizable_tag.

Programmers will rarely use this feature, but it imposes no runtime overhead and
very little compile-time overhead, so I don't see any problem.
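
A rough sketch of why a no-op imbue in a convenience base class costs nothing at
run time. All names below (example_sink_base, plain_sink, localized_sink,
imbue_device) are hypothetical and are not the library's actual classes; the
point is only that the default is an inline, non-virtual empty function that a
locale-aware component can replace with its own.

```cpp
#include <cassert>
#include <ios>
#include <locale>

// Hypothetical convenience base (not the library's actual class):
// imbue is an inline, non-virtual no-op, so it adds no run-time cost.
struct example_sink_base {
    void imbue(const std::locale&) { }
};

// A sink that does not care about locales simply inherits the no-op.
struct plain_sink : example_sink_base {
    void write(const char*, std::streamsize) { }
};

// A sink that does care hides the base version with its own imbue.
struct localized_sink : example_sink_base {
    localized_sink() : imbued(false) { }
    void imbue(const std::locale& loc) { loc_ = loc; imbued = true; }
    void write(const char*, std::streamsize) { }
    std::locale loc_;
    bool imbued;
};

// Generic code can call imbue on anything that provides it.
template<typename T>
void imbue_device(T& t, const std::locale& loc) { t.imbue(loc); }

inline void imbue_example()
{
    plain_sink p;
    localized_sink l;
    imbue_device(p, std::locale::classic());   // resolves to the empty base function
    imbue_device(l, std::locale::classic());   // stores the locale
    assert(l.imbued);
}
```

Because the call is resolved at compile time, an optimizing compiler can drop
the empty call for plain_sink entirely.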

> >> I've recently noticed that even your documentation for the
> >> Resource and Filter concepts admits that they're just like certain C++ or C
> >> I/O functions.
> >
> > You mean when I say, for example,
> >
> > "Filters are class types which define one or more member
> > functions get, put, read, write and seek having interfaces
> > resembling the functions fgetc, fputc, fread, fwrite and fseek
> > from <stdio.h>"
> >
> > ?
>
> Yes. But I was thinking more of the equivalent paragraph you gave in the
> documentation about Resources.

I think I need to change this part of the documentation. Unlike fread, etc, the
basic_streambuf member functions can't be assumed to be familiar to most
programmers. I should probably use istream::read, ostream::write, etc. The
reason I didn't is that these functions don't have the right return types, which
is not a good reason since neither does streambuf::sputn.
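
For reference, a small sketch using only the standard library that shows what
these functions actually return. The helper names (read_via_stream,
read_via_buffer, write_via_buffer) are made up for illustration: istream::read
returns the stream itself, so the character count has to be fetched with
gcount(), while streambuf::sgetn and streambuf::sputn return the count directly.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Reads up to n characters through the stream interface.
// istream::read returns the stream, so the count comes from gcount().
inline std::streamsize read_via_stream(std::istream& in, char* buf,
                                       std::streamsize n)
{
    assert(n >= 0);
    in.read(buf, n);
    return in.gcount();
}

// Reads up to n characters through the stream buffer interface.
// streambuf::sgetn returns the number of characters read directly.
inline std::streamsize read_via_buffer(std::streambuf& sb, char* buf,
                                       std::streamsize n)
{
    assert(n >= 0);
    return sb.sgetn(buf, n);
}

// Writes a string through the stream buffer; sputn also returns a count.
inline std::streamsize write_via_buffer(std::streambuf& sb,
                                        const std::string& s)
{
    return sb.sputn(s.data(), static_cast<std::streamsize>(s.size()));
}
```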

> > template<typename Ch>
> > class null_buf {
> > public:
> >     typedef Ch char_type;
> >     typedef sink_tag category;
> >     null_buf() : count_(0) { }
> >     void write(const Ch*, std::streamsize n) { count_ += n; }
> >     int count() const { return count_; }
> > private:
> >     int count_;
> > };
> >
> > This will lead to a stream buffer which keeps track of how many characters
> > pass through, is optimized for single vs. multiple character output, *and* is
> > buffered by default.
>
> I don't see any buffering. (I guess it'll be in whatever class you hook
> this up to, like "streambuf_facade".)

Right.

> Which version, the first or second?

The second.

> (Hopefully the first, since I wrote my
> code above after the first version, and you wrote the second as a response.)
> If it's the first, then what is my version missing? (If it's the second,
> then look at the version of the code under my review before comparing.)

I did. That's how I knew it was 79 lines long. It doesn't provide buffering, as
far as I can tell.

> The traits type carries the policies for comparing and copying (and EOF
> issues). Does the user have the option for overriding policies so they're
> not based on "std::char_traits<Ch>"?

As I said, the only place character traits are used in the public interface of
filters and resources is in the return type of get. For this purpose,
std::char_traits<Ch>::int_type should always be sufficient. At any rate, I'm
considering changing it either to optional<char> or to a class type that can
store a char, an eof indicator, or a 'no input available -- try back later'
indicator. Then there would be absolutely no use of character traits.
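
One possible shape for such a class type, as a sketch only. The name get_result
and its members are hypothetical; nothing here is taken from the library. It
holds either a character, an end-of-file marker, or a "try back later" marker,
and does not mention character traits anywhere.

```cpp
#include <cassert>

// Hypothetical return type for a filter's get(): holds a character,
// an end-of-file marker, or a "no input available yet" marker.
template<typename Ch>
class get_result {
public:
    enum status_type { has_char, eof, would_block };

    // Named constructors for the three possible states.
    static get_result character(Ch c) { return get_result(has_char, c); }
    static get_result end_of_file()   { return get_result(eof, Ch()); }
    static get_result try_later()     { return get_result(would_block, Ch()); }

    status_type status() const { return status_; }

    // True only when a character is stored.
    bool good() const { return status_ == has_char; }

    // Precondition: good().
    Ch value() const { assert(good()); return value_; }

private:
    get_result(status_type s, Ch c) : status_(s), value_(c) { }

    status_type status_;
    Ch value_;
};
```

A caller would test good() first and only then read value(); the other two
states tell it whether to stop or to retry later.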

If you want to define a stream_facade with a custom char_traits type, you can do
so using the second template parameter.

     template< typename T,
               typename Tr = ...,
               typename Alloc = ...,
               typename Mode = ... >
     class streambuf_facade;

> > What I've tried to do with the library is to factor out the essential
> > functionality necessary to define a stream buffer. I've found that in most
> > cases writing a stream buffer can be reduced to implementing one or two
> > functions with simple names and specifications. It seems like an obvious win
> > to me.
>
> But is it always worth the extra layer of indirection you introduce (when
> you need to interface with standard-looking I/O)?

The indirection, mostly contained in <boost/io/operations.hpp>, is fairly
lightweight. Users never need to look at it. I'm not sure why you're so
concerned about it.

> [SNIP concerns about total code size (in terms of header text length)]
> >>>> and a large chunk of it is a "poor man's" reflection system.
> >>>
> >>> Do you mean the i/o categories? This follows the example of the standard
> >>> library and the boost iterator library. It's better than reflection, since
> >>> you can't get accidental conformance.
> >>
> >> No, I'm talking about the code you used to get the existing standard I/O
> >> framework to inter-operate with your framework.
> >
> > Specifically?
>
> Just the large amount of "detail"-level headers.

Fairly typical for boost, I'm afraid.

> [SNIP about forwarding to the base-stream's value-added functions and on the
> nature of the stream facades.]
> >> 1. Are there really any important sources/sinks that can't be put through
> >> the existing Standard I/O framework?
> >
> > The standard library handles non-blocking, asynchronous and multiplexed i/o
> > awkwardly at best. In contrast, for a generic i/o framework, adding such
> > support should be fairly straightforward. We just need to introduce the right
> > concepts.
>
> Whoa.
>
> I just had my "a-ha" moment.
>
> I thought you re-did the interface for streaming concepts just to be
> arbitrary. But you actually did it because you have issues about the
> architectural philosophy used by the standard I/O framework, right?! You
> want to fix the problems with current streaming by re-imagining the
> architecture (i.e. starting from scratch), and you decided to re-do the
> interface to match.

As I said above, I don't think I'm redoing it from scratch -- I'm just
generalizing a little. Later, I might generalize even more.

> I guess one issue is that you're extending functionality through templates,
> while the standard framework uses virtual member functions.

I don't think virtual functions are an issue. Virtual function calls are only
slightly more expensive than ordinary (non-inlined) function calls, and one
can't expect all function calls to be inlined when you have a chain of
non-trivial filters. One must rely on buffering to mitigate the function call
overhead.

Since the static types of the filtering streams and stream buffers do not depend
on the static types of the filters and resources in the underlying chain, some
type of runtime indirection, such as virtual functions, is required. I'm
actually taking advantage of the streambuf virtual functions as a feature -- not
a liability. If I didn't have basic_streambuf to serve as the 'glue' for filter
chains, I'd have to write my own version, probably using virtual functions.
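
To make the point about runtime indirection concrete, here is a much simplified
sketch of the kind of hand-written glue that would be needed without
basic_streambuf. Everything in it is hypothetical (filter_chain, holder,
to_upper_filter, replace_filter), and it filters one character at a time, which
is not how the library works. It only shows that the chain has one static type
no matter which filter types are pushed, and that a virtual call is what makes
this possible.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical chain whose static type does not depend on its filters.
class filter_chain {
public:
    // Accepts any type with a member `char filter(char) const`.
    template<typename Filter>
    void push(Filter f)
    {
        holders_.push_back(
            std::unique_ptr<holder_base>(new holder<Filter>(std::move(f))));
    }

    // Passes each character through the filters in the order pushed.
    std::string apply(const std::string& in) const
    {
        std::string out(in);
        for (std::size_t i = 0; i < out.size(); ++i)
            for (std::size_t j = 0; j < holders_.size(); ++j)
                out[i] = holders_[j]->filter(out[i]);   // virtual call
        return out;
    }

private:
    // The virtual interface that erases the concrete filter type.
    struct holder_base {
        virtual ~holder_base() { }
        virtual char filter(char c) const = 0;
    };

    template<typename Filter>
    struct holder : holder_base {
        explicit holder(Filter f) : f_(std::move(f)) { }
        char filter(char c) const override { return f_.filter(c); }
        Filter f_;
    };

    std::vector<std::unique_ptr<holder_base> > holders_;
};

// Two unrelated filter types, neither derived from a common base.
struct to_upper_filter {
    char filter(char c) const
    { return static_cast<char>(std::toupper(static_cast<unsigned char>(c))); }
};

struct replace_filter {
    replace_filter(char from, char to) : from_(from), to_(to) { }
    char filter(char c) const { return c == from_ ? to_ : c; }
    char from_, to_;
};

inline void filter_chain_example()
{
    filter_chain chain;                 // same type whatever is pushed
    chain.push(to_upper_filter());
    chain.push(replace_filter('A', '4'));
    assert(chain.apply("banana") == "B4N4N4");
}
```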

> >> 2. An existing source/sink, if it wants to work with Standard C++, would
> >> work with the standard framework already.
> >
> > To summarize: an existing source/sink, if it wants to work with the standard
> > framework, already works with the standard framework?
>
> I meant that existing libraries would have already chosen to base their I/O
> around the standard framework, if they had no need to customize the I/O
> experience.

If the library is accepted -- and becomes widely used -- I expect that
developers will want to write sources and sinks instead of stream buffers.
Existing stream buffers can be rewritten as sources or sinks fairly easily in
many cases.

> >> You have a potential problem:
> >> standard C++ I/O is "too hard"
> >> But you got the wrong solution:
> >> throw away the standard I/O's legacy and start over from scratch
> >> (but include transition code)
> >
> > I hope it's possible to improve some of the standard library I/O framework in
> > the future. Perhaps experience with the current library will help form the
> > basis for a proposal. But that's not the point of the current library. The
> > point is to make easy what is currently not-so-easy, and to reduce the
> > difficulty of what is currently very difficult.
>
> I gave an example (the code you snipped) of how the simplified core
> interface could be integrated with the standard framework. What are the
> other difficulties?

I don't understand what's wrong with the way I've done it.

> >> This is independent of the decisions on memory-mapped files, file
> >> descriptors, binary I/O, and filters. Couldn't all of those have been
> >> implemented around the standard framework?
> >
> > Of course -- with massive code duplication.
>
> Duplication where? (My question above assumed that your new architecture
> never existed and you built your other stuff around the standard framework.)

Right. A lot of typical stream buffer implementation is boilerplate, esp. if
buffering is used.
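
As an illustration of that boilerplate, here is roughly what a buffered,
character-counting null sink looks like when written directly against
std::streambuf. This is a sketch with made-up names (counting_nullbuf,
counting_example), not code from either library under review; compare it with
the single write() function of the null_buf example quoted earlier.

```cpp
#include <ios>
#include <ostream>
#include <streambuf>

// Hand-written buffered sink that discards characters and counts them.
class counting_nullbuf : public std::streambuf {
public:
    counting_nullbuf() : count_(0) { reset_put_area(); }

    // Flushed characters plus those still pending in the buffer.
    std::streamsize count() const { return count_ + (pptr() - pbase()); }

protected:
    // Called when the put area is full.
    int_type overflow(int_type c) override
    {
        if (!traits_type::eq_int_type(c, traits_type::eof())) {
            *pptr() = traits_type::to_char_type(c);   // uses the spare slot
            pbump(1);
        }
        flush_buffer();
        return traits_type::not_eof(c);
    }

    // Called by ostream::flush.
    int sync() override
    {
        flush_buffer();
        return 0;
    }

private:
    enum { buffer_size = 64 };

    // Leave the last slot free so overflow() can store one extra character.
    void reset_put_area() { setp(buffer_, buffer_ + buffer_size - 1); }

    // "Writes" the pending characters by counting and dropping them.
    void flush_buffer()
    {
        count_ += pptr() - pbase();
        reset_put_area();
    }

    char buffer_[buffer_size];
    std::streamsize count_;
};

inline std::streamsize counting_example()
{
    counting_nullbuf buf;
    std::ostream out(&buf);
    out << "hello, " << 42;   // 7 + 2 characters
    return buf.count();
}
```

Nearly all of this code is put-area management that would be the same for any
buffered output stream buffer; only flush_buffer() is specific to the sink.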

> About the Overlap Between Our Contributions
>
> A bunch of people during my I/O review wanted to defer decisions to see your
> I/O review. I'm not sure that there's a need to pick one-or-the-other due
> to how they work.

The review managers will sort this out.

> I had no intention of redoing the concepts of I/O, so all my sources and
> sinks extend the standard framework.
>
> You built a whole new framework, hopefully to address problems with the
> standard framework.

Again, I just wanted to make the standard framework easier to use.

> You built your sources and sinks to work with your
> framework. And you added adaptors so the new-I/O classes can work with
> std-I/O classes.

It's really the other way around. And the adapters are so thin you could crush
them just by leaning against them ;-)

> There are no problems with efficiency if new-I/O is used through-out the
> user's code, since you use a lot of template goodness. However, if the user
> needs to interface with std-I/O, at the user end or the final destination
> end, they will have to take a performance hit since std-I/O will call
> virtual functions which you can't remove. (The guy who writes the
> "xpressive" library seems to have techniques around the problem, but I'm not
> sure they can be applied here. [I don't know what the techniques are.] The
> std-I/O virtual call dispatch takes place in the standard stream classes, so
> the "xpressive" technique can't work if code changes are needed.) In these
> mixed cases, using the new framework can be a win if the applied task takes
> more time in the new framework than in the adaptor code. If the task at
> hand has a std-I/O interface, doesn't touch the issues that new-I/O was
> meant to solve, and can be succinctly expressed with std-I/O, then there is
> no advantage to making and/or using a new-I/O version, since the layer of
> indirection given by the adaptor class is the bigger bottleneck. (The
> pointer-based streams are an example of this.)

I think there's a basic misunderstanding here. The adapters generally have no
virtual functions and function calls through the adapters are optimized away
entirely. (I've confirmed this on several compilers. It should be true for any
decent optimizing compiler.) There is currently an inefficiency when you add a
standard stream or stream buffer to the end of a filtering stream, as I describe
in the message "IOStreams Formal review -- Guide for Reviewers". This will be
eliminated entirely if the library is accepted.

> The point is that one set of class doesn't preclude the usage of the other.
> Each one has situations where it's the better solution.

As far as I can tell, the two valid points you have made, w.r.t. our two
contributions, are:

1. Using my library to define a null_buf, pointerbuf or value_buf causes more
code to be included. This is a legitimate criticism, but I don't think you've
made the case that the amount of code included is so enormous that there should
be two versions of the same components in boost.

2. The object code will be slightly larger when using a streambuf_facade
(actually, I'm not sure you made that point, but I think it's correct.) This can
be mitigated somewhat if it turns out to be a problem, but I don't think you
have shown yet that it is.

Best Regards,
Jonathan

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk