From: Daryle Walker (darylew_at_[hidden])
Date: 2004-09-08 03:34:04


On 9/5/04 8:58 PM, "Jonathan Turkanis" <technews_at_[hidden]> wrote:

> On 8/30/04 12:01 PM, "Jonathan Turkanis" <technews_at_[hidden]> wrote:
>
>>> "Daryle Walker" <darylew_at_[hidden]> wrote:
>
>>>> 1. Aren't memory-mapped files and file descriptors highly platform
>>>> specific?
>>>
>>> Yes, just like threads, sockets and directory iteration.
>>>
>>>> Code that works with them would have to be non-portable, so I
>>>> don't think they're appropriate for Boost.
>>>
>>> It achieves portability the same way boost.thread and boost.filesystem do:
>>> by having separate implementations for different systems. See
>>> http://www.boost.org/more/imp_vars.htm ("Implementation variations").
>>
>> But for the thread and file-system libraries, we can define default
>> behavior.
>
> We can do this for memory mapped files as well. Either including the
> appropriate header could cause a static assertion, or construction of mapped
> file resources could fail at runtime. Right now I've followed the example of
> Boost.Filesystem and assumed that every system is either Windows or Posix.
> This can easily be changed to produce more informative errors. Good point.
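
For concreteness, the "static assertion" option mentioned above might look
something like this; the macro names are hypothetical, not taken from the
library:

    // In the memory-mapped-file header, on a platform that is neither
    // Windows nor Posix:
    #if !defined(BOOST_IO_WINDOWS) && !defined(BOOST_IO_POSIX)
    #  error "memory-mapped files are not supported on this platform"
    #endif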

An object that can never be configured to work (on those deficient
platforms) isn't very useful. I know that thread (and, rarely, file-system)
classes have the same potential drawback, but threads and file systems are
more general "computer science concepts" than memory-mapped files, so
allowances can be made for those classes that memory-mapped files don't
merit.

>> Thread-less environments act as if no spare threads can be
>> allocated.
>
> That's not the approach of Boost.Thread, IIRC. If thread support is
> unavailable, you get a preprocessor error (at least on Windows).

Maybe that should be considered a bug.

>> All file-systems can simulate a tree/container calculus, so a
>> portable interface can be defined.
>
> Again, Boost.Filesystem doesn't do this.

Considering the discussions of issues that Boost.Filesystem brings up,
maybe it should do what I suggested. (I'll give more information if a
Boost.Filesystem person asks.)

>> But memory-mapped files and file
>> descriptors are totally meaningless on some environments; what would the
>> code map to in those cases?
>
> See above.
>
>>
>>>> 2. This library does what a lot of other text-I/O libraries do: try to fit
>>>> in "kewl" compression schemes. The problem is that the types of
>>>> compression here are binary oriented; they convert between sets of byte
>>>> streams. However, characters are not bytes (although characters, like other
>>>> types, are stored as bytes).
>>>
>>> Are you saying there are problems with the implementation of the compression
>>> filters, e.g., that they make unwarranted assumptions about 'char'? If so,
>>> please let me know. I'm sure it can be fixed.
>>
>> I'm complaining that binary I/O should _not_ be treated as a variant of text
>> I/O (which your library assumes).
>
> All I/O is treated as streams of characters. When these streams of characters
> require special 'textual' interpretation, you can use a newline_filter, for
> line-ending conversion, or a converter, for code conversion.
>
>> Binary I/O only concerns itself with bytes, which is too low-level for text
>> I/O. There can and should be bridging code, but the concepts of text
>> sources/sinks should be distinct from binary sources/sinks.
>
> This just doubles the number of concepts, for little gain.

Not separating concepts that have notable distinctions is a disservice.
(That's why I separated the regular pointer-based streams from the ones for
pointers-to-const in my library. The "savings" of writing only one set of
class code wasn't worth mixing the semantics of the two stream types.)

>>> I don't see the iostream framework as relating to text streams only: streams
>>> can handle text and binary. In some cases, you want text and binary to work
>>> together.

This is why I'm concerned about the text vs. binary issues:

In (old) C, the "char" type was used to represent character data. It also
was used to represent individual bytes. The problem is that C meshed the
two concepts together, which I disagree with. Due to this equivalence, some
of the text I/O functions were given a "binary mode" that suppresses any
text/binary translation. (To muddy the waters further, that translation was
a no-op on C's first environment, UNIX.) Later on, C got more power in the
character processing department with "wchar_t" and a locale system, but it
never ungrouped binary I/O as a "subset" of text I/O.

C++ encapsulated I/O in a class, but followed a path similar to C. It was
"char" only, then developed "wchar_t" and locale support. Further, the
character type was generalized with templates, which also added support for
changing the operation policies with a traits class.

The I/O features that C and C++ accumulated over time were text-oriented.
Binary I/O stayed a switch away from text I/O because that was "good
enough," even though binary I/O doesn't need extended character types,
traits types, or locales. (Translating objects to/from byte sequences would
take place in a higher layer.)
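
A sketch of that "higher layer" (mine, not from either library): the object
is reduced to bytes explicitly, so the raw binary stream underneath never
needs traits or locales.

    #include <cstdio>

    // Serialize an integer as four little-endian bytes; 'f' is assumed to
    // have been opened in binary mode ("wb").
    void write_u32_le(std::FILE* f, unsigned long v)
    {
        unsigned char b[4];
        b[0] = static_cast<unsigned char>( v         & 0xFF);
        b[1] = static_cast<unsigned char>((v >> 8)   & 0xFF);
        b[2] = static_cast<unsigned char>((v >> 16)  & 0xFF);
        b[3] = static_cast<unsigned char>((v >> 24)  & 0xFF);
        std::fwrite(b, 1, 4, f);
    }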

If you're going to start over from scratch with I/O, why not go all the way
and finally split off binary I/O? Stop treating it as "text I/O with funny
settings".
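
To make that concrete, here is an entirely hypothetical sketch (not from
either library) of keeping the two concepts distinct at the category-tag
level, with an explicit byte type on the binary side:

    #include <ios>  // std::streamsize

    struct binary_sink_tag { };   // sinks that consume raw bytes
    struct text_sink_tag   { };   // sinks that consume characters

    struct raw_sink {
        typedef unsigned char   byte_type;   // bytes, not characters
        typedef binary_sink_tag category;
        void write(const byte_type*, std::streamsize) { /* consume bytes */ }
    };

    template<typename Ch>
    struct text_sink {
        typedef Ch            char_type;     // characters, with traits/locales
        typedef text_sink_tag category;
        void write(const Ch*, std::streamsize) { /* consume characters */ }
    };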

>>> E.g., suppose you have a compressed text file ("essay.z") and you
>>> want to read a 'family-friendly' version of it. You can do so as follows:
>>>
>>> filtering_istream in;
>>> in.push(regex_filter(regex("damn"), "darn"));
>>> in.push(zlib_decompressor());
>>> in.push(file_source("essay.z"));
>>> // read from in.
>>>
>>> Isn't this perfectly natural and convenient? What's wrong with using the
>>> decompressor and the regex filter in the same chain?
>>
>> By itself, nothing. But these compression schemes only work with bytes, so
>> you have hidden at least one text <-> binary converter in your code.
>
> (BTW, the file_source above should have been opened in binary mode.)

OK.

> All that's assumed in this example is that the characters in the essay file
> can be mapped directly to chars. If they can't, one would have to add a layer
> of code conversion (using converter) after the decompression, and use a
> wide-character filtering stream and wide-character regex_filter.
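
Spelled out, that layered chain might look something like this. (The
wide-character names here are extrapolated from the description above; the
exact spellings and composition in the library may differ.)

    filtering_wistream win;                            // wide-character stream
    win.push(wregex_filter(wregex(L"damn"), L"darn")); // text filter, wchar_t
    win.push(converter());                             // code conversion
    win.push(zlib_decompressor());                     // binary: bytes -> bytes
    win.push(file_source("essay.z", std::ios::binary));
    // read wide characters from win.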

That's a major implicit assumption.

> If the above example were disallowed, then in the common case that output is
> stored in a form which can be directly mapped to the internal character set
> without code conversion, the user would be forced to insert a do-nothing
> adapter.

So you're trying to optimize code that takes advantage of the "char" vs.
byte "equivalence".

> The current library trusts users to know when they are dealing with data which
> must be converted to a wide character type before it can be processed by
> text-oriented filters.
>
>>> Can I rephrase this as follows: InputFilters and OutputFilters are a useful
>>> addition to the standard library, but Sources and Sinks just duplicate
>>> functionality already present? If this is not your point, please correct me.
>>
>> Yes, that's my point. I looked through your code, and thought "this is just
>> a rearrangement of what's already in streams and stream-buffers". I became
>> really convinced of this once I saw that you added member functions for
>> locale control.
>
> I found I had to add this, rather late in development, to implement converting
> streams and stream buffers (which still aren't finished). What's wrong with
> locales? You say it like it's a dirty word.

I have no problems with locales. I was noting that the more features you
added to the base classes, the more they looked like rearrangements of the
standard I/O base classes.

>> I've recently noticed that even your documentation for the
>> Resource and Filter concepts admit that they're just like certain C++ or C
>> I/O functions.
>
> You mean when I say, for example,
>
> "Filters are class types which define one or more member
> functions get, put, read, write and seek having interfaces
> resembling the functions fgetc, fputc, fread, fwrite and fseek
> from <stdio.h>"
>
> ?

Yes. But I was thinking more of the equivalent paragraph you gave in the
documentation about Resources.

> The functions boost::io::read, boost::io::write, etc., are indeed generic
> versions of these familiar functions. I mention the familiar functions as a
> way to introduce readers to the generic versions. The benefits of generic
> programming are well known, I hope.
>
>>> There are two main reasons to write Sources and Sinks instead of stream
>>> buffers:
>>>
>>> 1. Sources and Sinks express just the core functionality of a
>>> component. Usually you have to implement just one or two functions with very
>>> natural interfaces. You don't have to worry about buffering or about putting
>>> back characters. I would have thought it would be obvious that it's easier
>>> to write:
>>>
>>> template<typename Ch>
>>> struct null_buf {
>>>     typedef Ch char_type;
>>>     typedef sink_tag category;
>>>     void write(const Ch*, std::streamsize) { }
>>> };
>>>
>>> than to write your null_buf, which is 79 lines long.
>>
>> That's really misleading. The null-sink I have does a lot more. I keep track
>> of how many characters passed through (i.e. a value-added function), and I
>> optimize for single vs. multiple character output.
>
> Okay,
>
> template<typename Ch>
> class null_buf {
> public:
>     typedef Ch char_type;
>     typedef sink_tag category;
>     null_buf() : count_(0) { }
>     void write(const Ch*, std::streamsize n) { count_ += n; }
>     int count() const { return count_; }
> private:
>     int count_;
> };
>
> This will lead to a stream buffer which keeps track of how many characters
> pass through, is optimized for single vs. multiple character output, *and* is
> buffered by default.

I don't see any buffering. (I guess it'll be in whatever class you hook
this up to, like "streambuf_facade".)
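
Presumably the hookup looks something like this; I'm guessing at the
facade's exact interface, so take it as a sketch only:

    #include <ostream>

    // The facade template wraps the Sink and supplies the buffering and
    // the basic_streambuf machinery around it.
    streambuf_facade< null_buf<char> > buf;
    std::ostream out(&buf);
    out << "hello";   // characters pass through (and are counted by) null_buf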

>> Also, I'm verbose in my
>> writing style. If I wanted to be compact I could just do:
>>
>> //========================================================================
>> template < typename Ch, class Tr = std::char_traits<Ch> >
>> class basic_nullbuf
>>     : public std::basic_streambuf<Ch, Tr>
>> {
>> protected:
>>     // Overridden virtual functions
>>     virtual int_type overflow( int_type c = traits_type::eof() )
>>         { return traits_type::not_eof( c ); }
>> };
>
> But that doesn't do what my version, listed above, does.

Which version, the first or second? (Hopefully the first, since I wrote my
code above after the first version, and you wrote the second as a response.)
If it's the first, then what is my version missing? (If it's the second,
then look at the version of the code under my review before comparing.)

>> And for those of you who think that "traits_type" is scary: get over it!
>> Using the obvious substitutes of "==", "<", "(int)", etc. is just sloppy and
>> WRONG. The whole point of the traits class is so that a character type
>> isn't forced to define those operators. Worse, those operators could exist
>> but be inappropriate. For example, Josuttis' STL book has a string type
>> that implements case-insensitive comparisons with a custom traits type.
>> Using operator== directly would have missed that. Ignoring the policies of
>> the traits type's creator could betray his/her vision of usage.
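
(For reference, the Josuttis example is along these lines: a traits class
whose comparison functions ignore case. This is my paraphrase, not his
exact code.)

    #include <cctype>
    #include <cstddef>
    #include <string>

    struct ci_char_traits : std::char_traits<char> {
        static bool eq(char a, char b)
            { return std::toupper((unsigned char) a)
                  == std::toupper((unsigned char) b); }
        static bool lt(char a, char b)
            { return std::toupper((unsigned char) a)
                   < std::toupper((unsigned char) b); }
        static int compare(const char* a, const char* b, std::size_t n)
        {
            for (std::size_t i = 0; i < n; ++i) {
                if (lt(a[i], b[i])) return -1;
                if (lt(b[i], a[i])) return  1;
            }
            return 0;
        }
    };

    // Comparisons on ci_string go through the traits, not raw operator==.
    typedef std::basic_string<char, ci_char_traits> ci_string;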
>
> In early versions of my library, filters and resources had traits types as
> well as charatcer types. Prompted by remarks of Gennadiy Rozental, I made a
> careful study and found that traits could be eliminated from the public
> interface of the filter/resource module of the library without sacrificing
> generality or correctness, except in the case of the return type of get, which
> is still
>
> std::char_traits<char_type>::int_type.
>
> Even this could be eliminated by having get return optional<char>. For a more
> ambitious proposal along these lines, see http://tinyurl.com/6r8p2.
>
> Of course, filter and resource authors may need to use char_traits to
> implement member functions read, write, etc. But I'm not sure I see where
> this discussion is going.

The traits type carries the policies for comparing and copying (and EOF
issues). Does the user have the option of overriding those policies, so
they're not based on "std::char_traits<Ch>"?

>>> 2. Sources and sinks can be reused in cases where standard streams and
>>> stream buffers are either unnecessary or are not the appropriate
>>> abstraction. For example, suppose you want to write the concatenation of
>>> three files to a string. You can do so like this:
>>>
>>> string s;
>>> boost::io::copy(
>>>     concatenate(
>>>         file_source("file1"),
>>>         file_source("file2"),
>>>         file_source("file3")
>>>     ),
>>>     back_insert_resource(s)
>>> );
>
>> A straw-man? Wouldn't an iterator-based solution have been better? (There
>> are stream(-buffer) iterators, and (string) insert iterators. If the Boost
>> iterator library provides a chaining iterator type, then the standard
>> copying procedure could be used.)
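
For the record, even without a chaining iterator, the concatenation example
needs nothing outside the standard library if a plain loop is acceptable:

    #include <fstream>
    #include <iterator>
    #include <string>

    std::string s;
    const char* names[] = { "file1", "file2", "file3" };
    for (int i = 0; i < 3; ++i) {
        std::ifstream file(names[i], std::ios::in | std::ios::binary);
        s.append(std::istreambuf_iterator<char>(file),
                 std::istreambuf_iterator<char>());
    }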
>
> It's tempting to try to do everything using iterators. In fact, Robert Ramey's
> original suggestion to expand the library to handle filtering suggested that
> it be based on iterator adapters.
> (http://lists.boost.org/MailArchives/boost/msg48300.php)

Interesting.

> The problem with this approach is that it misses the opportunity for many
> important optimizations that can be made when one is presented with a
> contiguous buffer full of characters, instead of one character at a time.

OK.

>>>> The whole framework seems like "I/O done 'right'", a "better"
>>>> implementation of the ideas/concepts shown in the standard I/O framework.
>>>
>>> I'd say thanks here if 'right' and 'better' weren't in quotes ;-)
>>
>> It looked like you changed the interface just to change the interface, not
>> out of any actual need. What about the following (untested) code:
[SNIPped class templates derived from std::basic_streambuf<> that contain
pure virtual member functions from Jon's idea of the simplified interface.
The current stream-buffer member functions that handle the same issue just
forward to the new member function.]

> Instead, let me quote part of my response to Dietmar Kuehl:
>
> Jonathan Wrote:
>> ... The protected virtual interface of basic_streambuf is, IMO, quite
>> strange. The functions have weird names: underflow, uflow, pbackfail,
>> overflow, showmanyc, xsputn, xsgetn, seekoff, etc. -- the functions read,
>> write, and seek are much more intuitive. The specifications of the standard
>> functions are tricky, too. For example, overflow (one of the better-named
>> functions), is specified roughly like this:
>>
>> virtual int_type overflow(int_type c = traits_type::eof());
>>
>> "If c is not eof, attempts to insert into the output sequence
>> the result of converting c to a character. If this can't be done,
>> returns eof or throws an exception. Otherwise returns any value
>> other than eof."

(BTW, notice that the public members of "basic_streambuf" that may call
"overflow" can't call it with EOF. I'm guessing that using EOF means that
"overflow" should do the output-specific flushing. That code should not be
written directly in the "sync" member function [as it usually is, 99% of
the time]; "sync" should instead call "overflow(EOF)" and also do any
input-specific flushing.)
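
In code, the arrangement I'm describing would be something like this
(a sketch only):

    #include <streambuf>
    #include <string>  // std::char_traits

    template<typename Ch, class Tr = std::char_traits<Ch> >
    class flushing_buf
        : public std::basic_streambuf<Ch, Tr>
    {
    protected:
        typedef typename Tr::int_type int_type;

        virtual int_type overflow( int_type c = Tr::eof() )
        {
            if ( Tr::eq_int_type(c, Tr::eof()) )
            {
                // ...output-specific flushing goes here...
                return Tr::not_eof( c );
            }
            // ...otherwise write the character c...
            return c;
        }

        virtual int sync()
        {
            // Forward the output-side work to overflow(EOF), then do any
            // input-specific flushing, instead of duplicating the code.
            return Tr::eq_int_type( this->overflow( Tr::eof() ), Tr::eof() )
                ? -1 : 0;
        }
    };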

>> Contrast this with
>>
>> void write(const char_type* s, std::streamsize n);
>>
>> "Writes the sequence of n characters starting at s to the
>> output sequence, throwing an exception in case of error."
>
> What I've tried to do with the library is to factor out the essential
> functionality necessary to define a stream buffer. I've found that in most
> cases writing a stream buffer can be reduced to implementing one or two
> functions with simple names and specifications. It seems like an obvious win
> to me.

But is it always worth the extra layer of indirection you introduce (when
you need to interface with standard-looking I/O)?

[SNIP concerns about total code size (in terms of header text length)]
>>>> and a large chunk of it is a "poor man's" reflection system.
>>>
>>> Do you mean the i/o categories? This follows the example of the standard
>>> library and the boost iterator library. It's better than reflection, since
>>> you can't get accidental conformance.
>>
>> No, I'm talking about the code you used to get the existing standard I/O
>> framework to inter-operate with your framework.
>
> Specifically?

Just the large number of "detail"-level headers.

[SNIP about forwarding to the base-stream's value-added functions and on the
nature of the stream facades.]
>> 1. Are there really any important sources/sinks that can't be put through
>> the existing Standard I/O framework?
>
> The standard library handles non-blocking, asynchronous and multiplexed i/o
> awkwardly at best. In contrast, for a generic i/o framework, adding such
> support should be fairly straightforward. We just need to introduce the right
> concepts.

Whoa.

I just had my "a-ha" moment.

I thought you re-did the interface for streaming concepts just to be
arbitrary. But you actually did it because you have issues with the
architectural philosophy of the standard I/O framework, right?! You want to
fix the problems with current streaming by re-imagining the architecture
(i.e. starting from scratch), and you decided to re-do the interface to
match.

I guess one issue is that you're extending functionality through templates,
while the standard framework uses virtual member functions.
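
That difference in one picture (sink_tag stands in for the library's
category tag from your earlier examples; my_buf and my_sink are made-up
names):

    #include <streambuf>

    struct sink_tag { };  // stand-in for the library's category tag

    // Standard framework: extension through virtual overrides, bound at
    // run time.
    class my_buf : public std::streambuf {
    protected:
        virtual std::streamsize xsputn(const char* s, std::streamsize n)
            { /* consume s[0..n) */ return n; }
    };

    // New framework: extension through a template parameter, bound at
    // compile time by the facade.
    struct my_sink {
        typedef char     char_type;
        typedef sink_tag category;
        void write(const char* s, std::streamsize n) { /* consume s[0..n) */ }
    };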

>> 2. An existing source/sink, if it wants to work with Standard C++, would
>> work with the standard framework already.
>
> To summarize: an existing source/sink, if it wants to work with the standard
> framework, already works with the standard framework?

I meant that existing libraries would have already chosen to base their I/O
around the standard framework, if they had no need to customize the I/O
experience.

>> You have a potential problem:
>> standard C++ I/O is "too hard"
>> But you got the wrong solution:
>> throw away the standard I/O's legacy and start over from scratch
>> (but include transition code)
>
> I hope it's possible to improve some of the standard library I/O framework in
> the future. Perhaps experience with the current library will help form the
> basis for a proposal. But that's not the point of the current library. The
> point is to make easy what is currently not-so-easy, and to reduce the
> difficulty of what is currently very difficult.

I gave an example (the code you snipped) of how the simplified core
interface could be integrated with the standard framework. What are the
other difficulties?

>> This is independent of the decisions on memory-mapped files, file
>> descriptors, binary I/O, and filters. Couldn't all of those have been
>> implemented around the standard framework?
>
> Of course -- with massive code duplication.

Duplication where? (My question above assumed that your new architecture
never existed and you built your other stuff around the standard framework.)

****************

About the Overlap Between Our Contributions

A bunch of people during my I/O review wanted to defer decisions until they
had seen your I/O review. I'm not sure that there's a need to pick one or
the other, given how the two libraries work.

I had no intention of redoing the concepts of I/O, so all my sources and
sinks extend the standard framework.

You built a whole new framework, hopefully to address problems with the
standard framework. You built your sources and sinks to work with your
framework. And you added adaptors so the new-I/O classes can work with
std-I/O classes.

There are no problems with efficiency if new-I/O is used throughout the
user's code, since you use a lot of template goodness. However, if the user
needs to interface with std-I/O, at the user end or the final destination
end, they will take a performance hit, since std-I/O will make virtual
calls which you can't remove. (The guy who writes the "xpressive" library
seems to have techniques around the problem, but I'm not sure they can be
applied here. [I don't know what the techniques are.] The std-I/O virtual
call dispatch takes place in the standard stream classes, so the
"xpressive" technique can't work if code changes are needed.) In these
mixed cases, using the new framework can be a win if the work done inside
the new framework outweighs the time spent in the adaptor code. If the task
at hand has a std-I/O interface, doesn't touch the issues that new-I/O was
meant to solve, and can be succinctly expressed with std-I/O, then there is
no advantage to making and/or using a new-I/O version, since the layer of
indirection added by the adaptor class becomes the bigger bottleneck. (The
pointer-based streams are an example of this.)

The point is that one set of classes doesn't preclude the use of the other.
Each one has situations where it's the better solution.

-- 
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com
