From: Jonathan Turkanis (technews_at_[hidden])
Date: 2004-09-15 14:34:47


"Rob Stewart" <stewart_at_[hidden]> wrote in message:
> > "Rob Stewart" <stewart_at_[hidden]> wrote in message:
> > > From: "Jonathan Turkanis" <technews_at_[hidden]>:
> > > > "Rob Stewart" <stewart_at_[hidden]> wrote in message:
> >
> > > If both remain, then each needs more
> > > information and rationale so users understand which to choose for
> > > a given use case.
> >
> > This should be the explanation:
> >
> > "If you have an existing streambuf implementation and you can't or don't
> > want to reimplement it as a Resource, use streambuf_wrapping.hpp;
> > otherwise, you should probably reimplement it as a Resource and use
> > stream_facade."
>
> I'm not sure Daryle would agree with that. Anyway, my point is
> that if Boost accepts both libraries, then the two of you need to
> determine the synergies and differences between your libraries
> and ensure users understand the value of each approach.

I see your point now -- someone wants (no apostrophe ;-) to write a
streambuf/stream pair.

    Daryle: Write the streambuf from scratch, then use streambuf_wrapping.hpp
    Jonathan: Write a Resource, then use streambuf_facade and stream_facade

I agree with Jonathan :-)

I've already given most of the reasons in my reply to Dietmar Kuehl, so if you
don't mind, I'll quote myself (sorry for the length):

"Jonathan Turkanis" <technews_at_[hidden]> wrote:
> Writing a Source or Sink lets you express just the core functionality of a
> component, and nothing else. Typically, all you need to do is inherit from a
> convenience base class and implement one or two of the functions read, write,
> seek and close, which have intuitive names and specifications. (Close can be
> messy, but only in unusual cases.)
>
> The details which you don't need to worry about include:
>
> A) Buffer manipulation. You mention this as eliminating only a small fraction
> of boilerplate code. I think in a typical buffered implementation of a derived
> basic_streambuf class, a substantial portion of the code is devoted to buffer
> manipulation. More important than the amount of code, however, is the
> obscurity of the buffer-manipulation interface for those who do not routinely
> use it. In
> addition to buffer allocation, which is entirely the programmer's
> responsibility, there are ten buffer manipulation functions:
>
> eback, egptr, epptr, gbump, gptr, pbase, pbump, pptr, setg, setp
>
> I'd guess that other than hard-core iostreams junkies few can say exactly what
> all these functions do without consulting documentation. Most people -- even
> some iostreams junkies -- should be pleased not to have to use these functions
> at all.
>
> B) Character traits. Writing a stream buffer from scratch means dealing with
> some or all of the following member functions of the character traits type:
>
> eof, not_eof, to_char_type, to_int_type, eq_int_type
>
> and some or all of the member types:
>
> int_type, off_type, pos_type
>
> As above, the functions and types are unfamiliar to many. They also make code
> difficult to read, esp. when prefixed with 'traits_type::'.
>
> By contrast, when writing Filters and Resources, there is no traits type. The
> type streamoff is used everywhere instead of traits_type::off_type. When
> writing Sources and Sinks the above member functions are not needed at all.
> When writing filters, they are used only in the case of unbuffered input; even
> here, only std::char_traits is used. (For a way to avoid using traits even in
> this case, see 'Future Directions' at http://tinyurl.com/6r8p2.)
>
> C) Protected virtual stream buffer functions. The protected virtual interface
> of basic_streambuf is, IMO, quite strange. The functions have weird names:
> underflow, uflow, pbackfail, overflow, showmanyc, xsputn, xsgetn, seekoff,
> etc -- the functions read, write, and seek are much more intuitive. The
> specifications of the standard functions are tricky, too. For example,
> overflow (one of the better-named functions), is specified roughly like this:
>
> virtual int_type overflow(int_type c = traits_type::eof());
>
> "If c is not eof, attempts to insert into the output sequence
> the result of converting c to a character. If this can't be done,
> returns eof or throws an exception. Otherwise returns any value
> other than eof."
>
> Contrast this with
>
> void write(const char_type* s, std::streamsize n);
>
> "Writes the sequence of n characters starting at s to the
> output sequence, throwing an exception in case of error."
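
To make the contrast concrete, here is a minimal sketch of the two approaches
for a component that appends characters to a std::string. The names
string_streambuf and string_sink are made up for illustration, and string_sink
only mimics the shape of a Sink; it is not the library's actual interface. The
generic adapter that would turn the sink into a stream buffer is not shown.

```cpp
#include <cassert>
#include <ios>
#include <ostream>
#include <streambuf>
#include <string>

// Approach 1: a stream buffer written from scratch (unbuffered).
// Every character passes through overflow(), and the traits functions
// are needed to decode the int_type argument and encode the result.
class string_streambuf : public std::streambuf {
public:
    explicit string_streambuf(std::string& target) : target_(target) {}

protected:
    int_type overflow(int_type c = traits_type::eof()) override {
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);  // nothing to write; report success
        target_.push_back(traits_type::to_char_type(c));
        return c;                            // any value other than eof
    }

private:
    std::string& target_;
};

// Approach 2: a hypothetical Sink-like component. Only write() is needed;
// a generic adapter (not shown) would supply the stream buffer around it.
struct string_sink {
    std::string& target;

    void write(const char* s, std::streamsize n) {
        target.append(s, static_cast<std::string::size_type>(n));
    }
};
```

Both end up appending to the string, but only the first has to deal with
int_type, eof and the character traits.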

I should add: even if someone reads one of the several available books on the
standard iostreams library and decides to write a stream buffer from scratch,
there's a good chance the implementation will suffer one of the following
problems:

(i) buffering will be omitted, since it's hard to do correctly.
(ii) buffering will be provided, but mistakes in pointer arithmetic will cause
     subtle errors.
(iii) sub-optimal algorithms will be used.

Note that two of Daryle's stream buffers suffer from defect (i).
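
For reference, this is roughly what a hand-written buffered output stream
buffer involves. It is only a sketch: the class name buffered_string_buf and
the tiny buffer size are made up for illustration, and it is not taken from
either library. Note how much of the code is put-area bookkeeping with setp,
pbase, pptr and pbump rather than the actual job of appending to a string.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// A buffered output stream buffer that appends to a std::string.
// The put area is a fixed internal array managed with setp/pbase/pptr/pbump.
class buffered_string_buf : public std::streambuf {
public:
    explicit buffered_string_buf(std::string& target) : target_(target) {
        setp(buffer_, buffer_ + sizeof(buffer_));    // start with an empty put area
    }
    ~buffered_string_buf() override { flush_put_area(); }

protected:
    // Called when the put area is full (or when explicitly asked to flush).
    int_type overflow(int_type c = traits_type::eof()) override {
        flush_put_area();                            // make room
        if (!traits_type::eq_int_type(c, traits_type::eof())) {
            *pptr() = traits_type::to_char_type(c);  // store the pending character
            pbump(1);
        }
        return traits_type::not_eof(c);
    }

    int sync() override {
        flush_put_area();
        return 0;
    }

private:
    // Moves everything between pbase() and pptr() into the target string.
    void flush_put_area() {
        target_.append(pbase(), static_cast<std::size_t>(pptr() - pbase()));
        setp(buffer_, buffer_ + sizeof(buffer_));    // reset pptr() to pbase()
    }

    char buffer_[8];         // deliberately tiny so overflow() is exercised
    std::string& target_;
};
```

The target string must outlive the stream buffer, since the destructor
performs a final flush.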

> > > > > "Peekable" does not imply being able to put back a character.

> > > Many applications have only one level of undo and don't allow
> > > everything to be undone. Consequently, I don't think this is
> > > much of a problem. How about "revertable?"
> >
> > This still sounds too general. Maybe 'PutbackResource'?
>
> That, of course, doesn't follow the "able" convention you've
> established. Otherwise, it does get right to the point clearly.

I know -- that's because 'Putbackable' is ugly, even when joined to 'Resource.'

Here are some other ideas (based on your suggestions and a thesaurus):
RevertableSource, RestorableSource, UndoableSource, ReinsertableSource.

> > > > In addition, allowing filters to be pushed after a resource would give
> > > > many new users the impression that they can add filters *after* i/o is
> > > > in progress. As has been discussed during the review, this is not
> > > > currently supported; support can be added in limited circumstances, but
> > > > not generally.
> > > >
> > > > Consider:
> > > >
> > > > filtering_ostream out;
> > > > out.push(file_sink("log"));
> > > > out.push(base_64_encoder());
> > > > out << "hello world!\n"; // stream is implicitly 'open'
> > > > out.push(zlib_compressor()); // error!
> > >
> > > This won't be a problem with complete() or add_resource().
> >
> > If you mean that the above should be rewritten
> >
> > filtering_ostream out;
> > out.push(file_sink("log"));
> > out.complete(base_64_encoder());
> > out << "hello world!\n";
> > out.push(zlib_compressor()); // error!
> >
> > you may be right that users would be less likely to make this mistake. I
> > don't
>
> Yes.
>
> > see how add_resource would help at all.
>
> Because "add_resource" was offered as a synonym for "complete."

But here, the component being added with add_resource (the base_64_encoder) is
not a resource at all!

> > I believe the current stack-like interface is elegant and intuitive.
> > Reversing the order will also be confusing if I adopt JC van Winkel's pipe
> > notation, which I plan to do. If I adopt both changes, the following would
> > be equivalent:
> >
> > filtering_ostream out;
> > out.push(file_sink("log"));
> > out.push(base_64_encoder());
> > out.complete(newline_filter(newline::windows));
> >
> > ---
> >
> > filtering_ostream out(
> > newline_filter(newline::windows) |
> > base_64_encoder() |
> > file_sink("log") );
>
> The first example is using the proposed, new syntax, so I'd
> prefer to see it written like this:
>
> filtering_ostream out;
> out.push(base_64_encoder());
> out.push(file_sink("log"));
> out.complete(newline_filter(newline::windows));
>
> Then, the second, which is confusing as written, should be:
>
> filtering_ostream out(
> base_64_encoder() |
> file_sink("log") |
> newline_filter(newline::windows));
>
> Then, the two are quite similar.

This seems totally screwy to me. ;-) There are two reasonable conventions:

    I. Push the resource first, then push the filters, in order, starting with
       the one furthest from the user.
    II. Push the filters first, in order (the reverse of I), starting with the
        one closest to the user, then push the resource.

II is the convention I adopted, for reasons already explained. In the above
example, there are two possibilities:

    I. file_sink <-- base64_encoder <-- newline_filter
    II. newline_filter --> base64_encoder --> file_sink

(The arrows indicate the flow of data.)

I can't see any justification for putting the resource in the middle, as you
have done.
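
Convention II can be illustrated with a toy model. This is only a sketch under
simplifying assumptions: toy_chain is a made-up class, filters are
whole-string transformations, and the "resource" is just a std::string. It
says nothing about how filtering_ostream is implemented; it only shows the
order of the push calls matching the flow of data.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified chain: filters transform whole strings and the
// "resource" is a std::string that receives the final result.
class toy_chain {
public:
    using filter = std::function<std::string(const std::string&)>;

    // Convention II: filters are pushed first, closest to the user first.
    void push(filter f) { filters_.push_back(std::move(f)); }

    // The resource is pushed last and completes the chain.
    void push(std::string& sink) { sink_ = &sink; }

    // Data flows through the filters in the order they were pushed,
    // then into the resource.
    void write(std::string data) {
        for (const filter& f : filters_)
            data = f(data);
        if (sink_)
            sink_->append(data);
    }

private:
    std::vector<filter> filters_;
    std::string* sink_ = nullptr;
};
```

If the first filter pushed appends "!" and the second doubles its input,
writing "hi" leaves "hi!hi!" in the sink: the data passes through the
components in exactly the order in which they were pushed.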

> > > > > _______________________________
> > > > > basic_newline_filter

> > Under your proposal, would a typical construction of a newline_filter look
> > like this:
> >
> > newline_filter(write_CR, accept_LF | accept_CR | accept_CRLF )
> >
> > instead of
> >
> > newline_filter(write_CR | accept_LF | accept_CR | accept_CRLF )
>
> Yes.

That sounds like a good idea. Then there would be two constructors, used as
follows:

    newline_filter(write_LF | accept_LF | accept_CR | accept_CRLF);
    newline_filter(posix);

I guess this is what you already said.
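
Here is a rough sketch of how such flags might combine. The flag values, the
definition of posix as "accept anything, write LF", and the free function
convert_newlines are all assumptions made for illustration; the real
newline_filter is a filter, not a string function, and its constants may be
defined differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical flag values; the names follow the ones used in this thread.
enum newline_flags {
    accept_LF   = 1 << 0,
    accept_CR   = 1 << 1,
    accept_CRLF = 1 << 2,
    write_LF    = 1 << 3,
    write_CR    = 1 << 4,
    write_CRLF  = 1 << 5,
    // Assumed convenience constant: accept any line ending, write "\n".
    posix = write_LF | accept_LF | accept_CR | accept_CRLF
};

// Rewrites each accepted line ending in 'in' as the ending selected by the
// write_* flag (LF if none is given). Line endings that are not accepted
// are copied unchanged.
std::string convert_newlines(const std::string& in, int flags) {
    std::string target = "\n";
    if (flags & write_CRLF)
        target = "\r\n";
    else if (flags & write_CR)
        target = "\r";

    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const bool crlf =
            in[i] == '\r' && i + 1 < in.size() && in[i + 1] == '\n';
        if (crlf) {
            out += (flags & accept_CRLF) ? target : std::string("\r\n");
            ++i;                                  // skip the '\n' of the pair
        } else if (in[i] == '\r' && (flags & accept_CR)) {
            out += target;
        } else if (in[i] == '\n' && (flags & accept_LF)) {
            out += target;
        } else {
            out += in[i];
        }
    }
    return out;
}
```

With these definitions, convert_newlines(text, posix) and
convert_newlines(text, write_LF | accept_LF | accept_CR | accept_CRLF)
behave identically, which is the point of having the shorthand.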

> > > > > _________________________________________________________________
> > > > > boost::io::reverse

> > > > > That is, why isn't a filter's interface based upon these
> > > > > semantics:
> > > > >
> > > > > char_type filter(char_type ch);
> > > > > streamsize filter(char const * input, streamsize n,
> > > > > char const * output);

> > Neither of your suggested interfaces is sufficient. The first one allows
> > only character-for-character substitutions. The second, depending on the
>
> I presume, then, that this would work:
>
> boost::optional<char_type>
> filter(char_type ch);

Right. Or a basic_character<char_type>, to deal with the 'no input currently
available -- try back later' case.
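
A minimal sketch of the optional-returning form, assuming C++17 so that
std::optional can stand in for boost::optional. The filter strip_cr and the
driver apply_filter are made-up names. The empty optional lets the filter
consume a character without emitting one, which the plain
char_type filter(char_type) signature cannot express.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical one-character-at-a-time filter: returns the character to
// emit, or an empty optional when the input character is dropped.
// std::optional (C++17) stands in for boost::optional.
std::optional<char> strip_cr(char ch) {
    if (ch == '\r')
        return std::nullopt;   // consume the character, emit nothing
    return ch;
}

// Helper that drives the filter over a whole string.
std::string apply_filter(const std::string& input) {
    std::string output;
    for (char ch : input) {
        if (std::optional<char> result = strip_cr(ch))
            output.push_back(*result);
    }
    return output;
}
```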

> > interpretation of the return value, needs to be augmented to indicate how
> > many characters of the input sequence or the output sequence were consumed.
> > It's
>
> Easily solved.
>
> > sometimes necessary, e.g., to achieve a good compression ratio, to allow
> > symmetric filters to output fewer characters than possible. In that case, one
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> I'd like to see that!

This is the case with zlib. The longer it can store up input, the better the
compression ratio it achieves. See

     http://www.gzip.org/zlib/manual.html#deflate

Of course, if I merge the InputFilter and OutputFilter concepts, I can have both
Filter and FlushableFilter concepts without increasing the number of concepts.
(Except I want to use Flushable for something else.)

> I don't quite understand your point, but that's immaterial. It
> sounds like something like this would work:
>
> std::pair<streamsize, streamsize>
> filter(char const * input, streamsize n,
> char const * output);
> Provided those interfaces are close, wouldn't this make writing
> symmetric filters easier?

There are still two problems:

   1. the output buffer is const, which seems wrong.
   2. the filter has no way of knowing the size of one of the two provided
      buffers, depending on the interpretation of the streamsize parameter.

So putting aside the issue of flushing, your suggested interface should be

    std::pair<streamsize, streamsize>
    filter( char const* input, streamsize input_size,
             char* output, streamsize output_size );

I consider this interface pretty much equivalent to mine. In fact, I considered
having SymmetricFilters return std::pair<streamsize, streamsize> -- I can't
remember why I chose the present interface. At any rate, I consider them
equivalent and don't see how your version makes things easier.
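
As an illustration of why both counts and both buffer sizes are needed, here
is a sketch of a filter with that signature which writes every input character
twice. The name duplicate_filter is hypothetical, and this is not the
SymmetricFilter interface; it only shows the pair-returning variant under
discussion. Because the output grows faster than the input is consumed, the
caller cannot infer one count from the other.

```cpp
#include <cassert>
#include <ios>
#include <string>
#include <utility>

// Hypothetical filter that writes every input character twice.
// Returns {characters consumed from input, characters written to output}.
std::pair<std::streamsize, std::streamsize>
duplicate_filter(const char* input, std::streamsize input_size,
                 char* output, std::streamsize output_size) {
    std::streamsize consumed = 0;
    std::streamsize produced = 0;
    // Consume a character only if both of its copies fit in the output.
    while (consumed < input_size && produced + 2 <= output_size) {
        output[produced++] = input[consumed];
        output[produced++] = input[consumed];
        ++consumed;
    }
    return {consumed, produced};
}
```

With a five-character output buffer and the input "abc", the call consumes two
characters and produces four; the caller then has to call again with the
remaining input and a fresh (or drained) output buffer.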

There's another problem with throwing out the current InputFilter and
OutputFilter concepts. A filter which performs both input and output with two
separate character sequences -- currently called InoutFilter but soon to be
renamed BidirectionalFilter -- needs some way to know whether it's being asked
to perform input or output. So the full interface becomes:

     boost::optional<char_type>
     filter(char_type ch, ios::openmode);

for one-character-at-a-time filtering, and

    std::pair<streamsize, streamsize>
    filter( char const* input, streamsize input_size,
             char* output, streamsize output_size, ios::openmode );

for multi-character filtering.

I'm much more comfortable sticking with the simple InputFilter and OutputFilter
concepts, and providing SymmetricFilters for advanced applications. I should
probably put a note in the tutorial that in theory a filtering algorithm is
independent of whether input or output is being performed, and provide links to
the documentation for reverse and for SymmetricFilter.

> > > > There are several choices for this type of passage:
> > > >
> > > > 1. Use the passive voice everywhere.
> > > > 2. Use 'we' -- this sounds natural to me because it's used in
> > > > mathematical papers.
> > > > 3. Use 'you'
> > > > 4. Use 'the user'
> > > >
> > What about in the ordinary case (not comments, not tutorials)?

> What is an "ordinary" case? Personal correspondence? Scientific
> report? Essay on the current geopolitical state of the world?

Reference documentation.

> > How would you phrase this stuff?
>
> I suggested "location" as a way around "repositionable
> positions." Nevertheless:
>
> Modes can be categorized in several ways...reading or writing
> stream positions are Seekable and, if so, whether there....
>
> and:
>
> Seekable: a single sequence of characters for input and
> output, with a common read/write stream position that can be
> moved to different parts of the sequence.

I like it.

> > > You need a supercategory of Filter and Resource. "Component?"
> >
> > The trouble is I want the concept names to be unique not just within the
> > library but in a wider context, including the standard library and the rest
> > of Boost. So the concept name should have IO in it somewhere.
>
> Neither "Filter" nor "Resource" contain "IO." I think you're
> alluding to the generality of "Component" but isn't that a
> problem for "Resource," too?
>
> You can add an "IO" prefix, if you like, but offhand, I can't
> think of anything better.

I'm leaning toward Device, which carries i/o connotations, instead of Resource.
Your thoughts?

Best Regards,
Jonathan

