Boost logo

Boost :

From: Jonathan Turkanis (technews_at_[hidden])
Date: 2004-09-14 15:20:43


"Rob Stewart" <stewart_at_[hidden]> wrote in message:
> From: "Jonathan Turkanis" <technews_at_[hidden]>:
> > "Rob Stewart" <stewart_at_[hidden]> wrote in message:

> > streambuf_wrapping doesn't really overlap with the present library either. I
> > voted to reject because I thought it didn't simplify writing streams enough
to
> > justify including it. If it is accepted, however, I don't see any real
conflict.
>
> I see a conflict because of the dramatically different approaches
> to "wrapping" a streambuf.

My library doesn't allow users to wrap a streambuf. stream_facade wraps a
streambuf internally, but that's an implementation detail.

> If both remain, then each needs more
> information and rationale so users understand which to choose for
> a given use case.

This should be the explanation:

    "If you have an existing streambuf implementation and you can't or don't
    want to reimplement it as a Resource, use streambuf_wrapping.hpp;
    otherwise, you should probably reimplement is as a Resource and and use
    stream_facade."

> > So the real question is: if one or more of null_buf, pointerbuf array_buf
and
> > value_buf (I may be spelling these wrong) is accepted, would it be
advantageous
> > to reimplement it using the present library.
> >
> > There are two issues:
> >
> > 1. Performance
> >
> > Reimplementing pointerbuf and array_buf using the present library should
produce
> > exactly the same runtime performce, since resources which present their
> > character sequences as in-memory arrays are specially optimized (see
> > direct_streambuf.fppp at http://tinyurl.com/3qrsx).
>
> Great.
>
> > Reimplementing null_buf and value_buf should improve runtime performance,
since
> > the implementations from More IO are not buffered.
>
> The question, then, is whether MoreIO should add buffering to
> remove that difference.
>
> > The reimplemented versions of null_buf and value_buf will have somewhat
larger
> > generated code size. Perhaps this would also be true for pointerbuf and
> > array_buf.
>
> With the addition of buffering, this nebulous difference may be
> moot. Real numbers will make it possible to judge the value of
> both approaches, of course.

>
> > 2. Interface.
> >
> > Streams and stream buffers generated using the present library have a simple
> > open/is_open/close interface similar to std::fstream. I think including the
> > streams and stream buffers from More IO as is, rather than giving them a
> > compatible interface, will be confusing to users.
>
> This is one of the reasons why I'm concerned about keeping both.
> I like the simplicity of your approach, especially since the code
> size and performance is nearly the same. Nevertheless, without
> hard numbers on these differences, it's hard to judge
> objectively. There's also the issue of whether MoreIO would be
> of more benefit to those doing embedded C++, for example.

There are some memory-use optimizations which I didn't apply, to avoid further
obfuscating the code. For instance, output-only streambuf_facades have a
pback_size_ member, and streambuf_facade's representing Resources have a next_
pointer which is always null. This can be eliminated with metaprogramming.

> > > _______________________________
> > > Design Evaluation
> > >
> > > The library is well conceived and executed. The design seems
> > > sound. There are still questions regarding the overlap and
> > > differences between this library and MoreIO. There are
> > > fundamental, philosophical differences between the libraries.
> >
> > Could you elaborate?
>
> Daryle thinks the Standard IOStreams I/F is acceptable as is and
> worked within it. You hide the complexities of that I/F behind a
> minimal, but sufficient -- at least until async is dealt with
> fully -- I/F. His approach is documented in the standard and
> many C++ books.

I provide a framework to make it easier to define new stream buffers. Daryle
provides implementations of a few specific stream buffers. Since I don't claim
that all stream buffers should be constructed using my framework, and since
users of Daryle's stream buffers will generally never need to know how they are
implemented, I don't see any conflict.

> > > _________________________________________________________________
> > > Design Comments
> > > _______________________________
> > > Miscellaneous
> > >
> > > Wouldn't the InoutResource concept be better named
> > > "BidirectionalResource," though that is longer? (Same for
> > > InoutFilter.)
> >
> > Bidirectional was the name I first gave it (http://tinyurl.com/4gskg). I
changed
> > it because the filters and resources in question manage two separate
sequences,
> > whereas bidirectional iterators represent a single sequence. I chose 'inout'
> > because such components act like an input component glued to an output
> > component.
> >
> > I don't like the name much, and would happily revert to bidirectional, I
people
> > don't feel that it's misleading.
>
> Generally, "bidirectional" doesn't convey any information
> regarding a common or separate buffer, so I think it works fine.

Good -- if there are no objections, I'll switch.

> > > "Peekable" does not imply being able to put back a character.
> >
> > I know. The natural choice would have been Putbackable, which sounds
terrible. I
> > choice Peekable because if you can put back a character than you can peek
ahead
> > in the stream by reading a character and then putting it back.
> >
> > > Wouldn't "Undoable" be a better name for this optional behavior?
> >
> > It sounds a bit to general. For instance, an Undoable Sink might allow
> > characters already written to be cancelled.
>
> Many applications have only one level of undo and don't allow
> everything to be undone. Consequently, I don't think this is
> much of a problem. How about "revertable?"

This still sounds too general. Maybe 'PutbackResource'?

> > > The normal template parameter abbreviation for a character type
> > > is "Char." I don't think abbreviating it to "Ch" is helpful.
> >
> > The standard uses charT; so do Langer and Kreft. The Dinkumware docs use
Elem. I
> > don't recall seeing Char, but I'm not against it.
>
> You're not using the "T" suffix, so "Char" is the same as "charT"
> in my estimation.

Neither the standard nor Langer and Kreft use the convention that template
parameters should be lowercase identifiers with 'T' appended. So I don't see why
"Char" is the same as "charT". But Char is slightly more explicit, so I think I
may switch.

> > > There's no information provided on the performance of this
> > > library. How much overhead does it add compared to custom
> > > solutions?
> >
> > Performance comparable to hand-written components is listed as one of the
main
> > design goals. (Rationale-->Design Goals-->Item 3, at
http://tinyurl.com/3z9ou).
> > The footnote to that item (http://tinyurl.com/4th3v) mentions some
performace
> > comparisons I performed.
>
> The information, as I recall, wasn't rigorous, or at least not
> reported that way.

It was fairly rigorous, but not reported that way. I didn't finish the project
and publich the results because

- It was taking a long time :-)
- I couldn't claim that my tests represented typical use cases
- Initial results satisfied me that at most some fine-tuning would be required;
no major overhaul was needed.

> > To summarize, I found that for output, stream_facade<file_descriptor_sink>
was
> > slightly faster than the Dinkumware std::ofstream (but it does no
> > synchonization.) Since the overhead of filesystem access might be expected
to
> > dominate in that comparison, I also tested a custom ostringstream against
the
> > Dinkumware component, and found identical performance.
>
> Yes, I recall that now. It does suggest that your library does
> about as well as can be done. This should be discussed and
> highlighted better, I think. Also, a wider range of tests on
> many compilers would lend greater credibility to the conclusions.

Once I get some tests which I believe represent typical use cases, I can publish
results for a wide range of platforms.

> > > For example, how much memory and runtime overhead is
> > > there wrt MoreIO's mechanisms?
> >
> > There should be slightly more memory overhead with the reimplementation of
> > null_buf and value_buf using my library, since it provides buffering by
default.
>
> The only fair comparison is, I think with both buffered.
> Conversely, what about a version from your library that isn't
> buffered?

Depending on whether a streambuf_facade represents a single sequence or two
separate sequences,
streambuf_facade has one or two members representing buffers. Since buffering
has to be turned off at runtime, there's no way to eliminate these members for
unbuffered streambuf_facades. (This will change if I add a buffering policy as a
template parameter.) If the buffers are never allocated, the overhead of each is
unused member variables of type char* and a std::streamsize.

> > > The filter chain ordering for input versus output is awkward. It
> > > should always be from the beginning to end; let the library
> > > reverse the order.
> >
> > I'm surprised nobody mentioned this earlier.
> >
> > Having the resource at the end dramatically simplifies the interface. It
allows
> > an instance filtering_stream of filtering streambuf to be considered 'open'
as
> > soon as a resource is pushed, and 'closed' as soon as it is popped. This
allows
> > a simple stack-like interface. The alternative would be
>
> I understand.
>
> > (i) to add open/close functions (which are part of the interface of
> > streambuf_facade, but would currently be redundant for filtering streams, OR
>
> I don't think this would be a problem, but I don't see -- and I'm
> not going to go look right now, sorry -- why it is redundant for
> filtering streams. Also, couldn't the function be called
> "complete" or "add_resource?"

It's redundant because adding a resource is currently the equivalent of 'open',
and popping a resource is the equivalent of 'close'.

> > In addition, allowing filters to be pushed after a resource would give many
new
> > users the impression that they can add filters *after* i/o is in progress.
As
> > has been discussed during the review, this is not currently supported;
support
> > can be added in limited circumstances, but not generally.
> >
> > Consider:
> >
> > filtering_ostream out;
> > out.push(file_sink("log"));
> > out.push(base_64_encoder());
> > out << "hello world!\n"; // stream is implicity 'open'
> > out.push(zlib_compressor()); // error!
>
> This won't be a problem with complete() or add_resource().

If you mean that the above should be rewritten

   filtering_ostream out;
   out.push(file_sink("log"));
   out.complete(base_64_encoder());
   out << "hello world!\n";
   out.push(zlib_compressor()); // error!

 you may be right that users would be less likely to make this mistake. I don't
see how add_resource would help at all.

I believe the current stack-like interface is elegant and intuitive. Reversing
the order will also be confusing if I adopt JC van Winkel's pipe notation, which
I plan to do. If I adopt both changes, the following would be equivalent:

   filtering_ostream out;
   out.push(file_sink("log"));
   out.push(base_64_encoder());
   out.complete(newline_filter(newline::windows));

   ---

   filtering_ostream out(
      newline_filter(newline::windows) |
      base_64_encoder() |
      file_sink("log") );

> > > _______________________________
> > > boost::io::converter

> > template< typename Resource,
> > typename Codecvt = fetch_from_global_locale >
> >
> > If Codecvt::extern_type does not match the character type of Resource, but
> > Codecvt::inter_type does, the codecvt will be automatically 'reversed'.
> ^^^^^^^^^^
> int_type, I presume

It shoudl be intern_type. In other words, the codecvt would be used *as is* if
its external character type is the same as the character type of Resource. It
would then be a normal 'widening' codevt. If, OTOH, its internal character type
matches the character type of Resource, the codevt would be 'reversed' so that
it would be a 'narrowing' codecvt. (As I mentioned elsewhere, the terms
'widening' and 'narrowing' are somewhat misleading.)

E.g.,

     converter< file_descriptor_source,
                      utf8_codecvt_facet >

would allow one to read a sequence of wide characters from a file containing
multibyte characters. OTOH

     converter< array_source<wchar_t>,
                      utf8_codecvt_facet >

would allow one to read a sequence of multibyte characters from an array
containing wide characters.

The first use would be by far the most common. If the proposed formulation is
considered too error prone, a boolean template parameter could be added:

      template< typename Resource,
                      typename Codecvt = fetch_from_global_locale,
                      bool Reverse = false >
      class converter;

> > > _______________________________
> > > basic_newline_filter
> > >
> > > I realize that this is only an example, but there is no apparent
> > > protection from misconfiguring the constructor flags. Grouping
> > > the related options into enumerated types and taking separate
> > > parameters for each group would make it safer to use. (You can
> > > overload the bitwise OR operator to permit combining them
> > > conveniently.) That is, since print_CR, print_LF, and print_CRLF
> > > are mutually exclusive, they should be part of an enumerated type
> > > that does not permit bitwise OR'ing. Since the posix, mac, and
> > > windows values are meant to supplant all of the other values,
> > > they should be part of their own enumerated type and should be
> > > used in a separate constructor. The remaining options can be
> > > part of a third enumerated type, with bitwise OR support, that
> > > constitutes the second argument to the existing constructor.
> >
> > I was trying to keep things simple. Specifying more than one of print_CR,
> > print_LF and print_CRLF yeilds a runtime error. Even if print_xxx had it's
own
> > enumeration type, it would still be possible to specify illegal values.
>
> If no operators are defined for the enumerated type, then one
> must go out of one's way to provide illegal values.

Under your proposal, would a typical construction of a newline_filter look like
this:

     newline_filter(write_CR, accept_LF | accept_CR | accept_CRLF )

instead of

     newline_filter(write_CR | accept_LF | accept_CR | accept_CRLF )

?

> > > _________________________________________________________________
> > > boost::io::reverse

> > > That is, why isn't a filter's interface based upon these
> > > semantics:
> > >
> > > char_type filter(char_type ch);
> > > streamsize filter(char const * input, streamsize n,
> > > char const * output);

> > The problem with making symmetric filters the default (or exclsuive) filter
> > concept is that it's unexpectedly difficult to implement them correctly.

> I don't recall reading information on this rationale. It should
> be part of the docs, maybe a FAQ.

Okay.

> However, I'd like more
> information on why filters are difficult to write
> bidirectionally. The I/F I mention above seems quite simple.

Neither of your suggested interfaces is sufficient. The first one allows only
character-for-character substitutions. The second, depending on the
interpretation of the return value, needs to be augmented to indicate how many
characters of the input sequence or the output sequence were consumed. It's
somtimes necessary, e.g., to achieve a good compression ratio, to allow
symmetric filters to output fewer characters than possible. In that case, one
needs a boolean parameter to instruct the filter to flush it's buffers.

As an example of the difficulty of writing symmetric filters, look at
http://tinyurl.com/6bu23. It took my two hours to get the
toupper_symmetric_filter to work properly. (Note that I added the internal
buffer just to simulate a realistic use case.)

If you're not convinced, try implementing the example filters as Symmetric
Filters. Not impossible by any means, but substantially less intuitive. Also,
even when the basic idea of the implementation is clear, it's easy to make
mistakes with pointer arithmetic.

> > > _________________________________________________________________
> > > Documentation Comments

> > > _________________________________________________________________
> > > tutorial.html
> >
> > > " Both streambuf_facade and stream_facade have constructors and
> > > overloads of open which forward their arguments to filter or
> > > ^^^^^^^^^
> > > to the filter
> > >
> > > resource constructor. Therefore, the first example could have
> > > ^^^^^
> > > previous
> >
> > I see a grammatical problem here -- "forward their arguments to filter or
> > resource constructor" -- but I don't understand what changes you are
suggesting.
>
> "...forward their arguments to the filter or resource
> constructor. Therefore, the previous example could...."

Got it.

> > There are several choices for this type of passage:
> >
> > 1. Use the passive voice everywhere.
> > 2. Use 'we' -- this sounds natural to me because it's used in mathematical
> > papers.
> > 3. Use 'you'
> > 4. Use 'the user'
> >
> > Which should one prefer? Settling on one of the above and using it
consistently
> > should resolve many of your (snipped) suggestions/objections below.
>
> I use we in comments (design notes, and such), and you in
> tutorial type text. After all, I'm teach "you," the reader in
> such text. I think you should write the docs as if you are
> writing to a single reader. That makes it easiest to get things
> consistent.

What about in the ordinary case (not comments, not tutorials)?

> > > _______________________________
> > > Regex OutputFilter
> > >
> > > "This example uses a regex_filter to remove C and C++-style
> > > comments from a code excerpt. Although somewhat more
> > > ^^^^^^^^^^^^^^^^^^^
> > > omit
> > >
> > > sophisticated than the Uncommenting InputFilter, above, the
> >
> > What's the problem here?
>
> It sounds as though the filter is only good for a hard coded
> string or something. Omitting "from a code excerpt" makes it
> sound more general.

Okay.

> > > _______________________________
> > > Overview
> > >
> > > "...an InputFilter, which filters input read from a Source, and
> > > an OutputFilter, which filters output written to a Sink."
> > >
> > > I suggest a little rephrasing to make this read better and fill
> > > in a couple of gaps:
> > >
> > > an InputFilter, which filters input read from a Source or
> > > another InputFilter, and an OutputFilter, which filters output
> > > read from a Source or another OutputFilter, before it is
> > > written to a Sink.
> >
> > But that would be false. In order to chain filters, the library has to wrap
> > filters with adapters so that they become resources.
>
> Suffice it to say that I wrote it that way because I thought that
> was accurate. Therefore, it wasn't sufficiently clear to prevent
> my jumping to the wrong conclusion.

I guess when I'm describing how chains work, I'll mention that filters need to
be adapted before being fed to the upstream filter. I think the concept
documentation and the tutorial are pretty clear that the downstream component
must be a resource.

> > > "SeekableResource" refers to a read/write "head," but few, if
> > > any, Resources for which this library will be used are physical
> > > devices with read/write heads. I suggest using "location,"
> > > "position," or "offset" in lieu of "head" here and in other parts
> > > of the documentation.
> >
> > How about 'stream position indicator'?
>
> Why "indicator?"

Substituting 'stream position' for 'reading head', etc., yields some funy stuff
like:

  "Modes can be categorized in several ways ... 3. Whether the reading or
  writing stream positions are repositionable, and if so, whether there are
  separate stream positions for reading and writing or a single read/write
  stream positions."

  "Seekable: a single sequence of characters, for input and output, with
   a combined repositionable read/write stream position."

How would you phrase this stuff?

> > > In "Direct," you refer to a socket-like interface, but not
> > > everyone will understand what you mean. I suggest using the
> > > phrase, "function-based interface."
> >
> > I see your point, but Direct Resources also have a function based interface.
>
> Well, you need something more specific and plain than
> "socket-like interface."

How about:

  "A resource is Direct if it provides access to its controlled sequences
  as in-memory arrays rather incrementally using functions such as read
  or write."

> > > _________________________________________________________________
> > > modes.html
> >
> > > "Readers new to the Iostreams Library should feel free to
> > > ^^^^^^^ ^^^^^^^^^^^^
> > > Users omit
> > > concentrate primarly on these two modes."
> >
> > How would you phrase the whole sentence?
>
> "Users new to the IOStreams Library should concentrate primarily
> on these two modes."

Okay.

> > > _______________________________
> > > Definitions of the Modes
> > >
> > > "This is useful to help reduce...different of filter types."
> > > ^^
> > > omit
> >
> > I don't follow.
>
> Change to, "This is useful to help reduce...different filter
> types."

I see -- the extraneous 'of'

> > > _______________________________
> > > The Metafunction mode
> > >
> > > You need to include an example and the signature of mode so we
> > > know how to use it.
> >
> > This section contains a link to the reference docs for mode:
> > http://tinyurl.com/5oqsy. Wouldn't it suffice to provide an example there?
>
> I did see mode.html later, but I don't think I noticed the link
> when I was looking at this page. An example on that other page
> is the right place, but maybe you can call attention to the link?

See "The Class Template Mode" for example usage.

?

> > > ______
> > > Synopsis
> > >
> > > Using "T" as the name of a policy class template parameter is
> > > confusing. "T" is, of course, commonly used for a data type
> > > stored in a container and for similar purposes. The closest
> > > thing to "T" in streambuf_facade is the "Ch" nested type in the
> > > policy class. Change "T" to a more meaningful name.
> >
> > I adopted this convention from "C++ Templates", p. 10. FilterOrResource
would be
> > more decriptive, but it's too long. What would you suggest?
>
> You need a supercategory of Filter and Resource. "Component?"

The trouble is I want the concept names to be unique not just within the library
but in a wider context, including the standard library and the rest of Boost. So
the concept name should have IO in it somewhere.

Best Regards,
Jonathan


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk