|
Boost : |
From: Jonathan Turkanis (technews_at_[hidden])
Date: 2004-09-13 14:22:01
"Rob Stewart" <stewart_at_[hidden]> wrote in message
news:200409131343.i8DDhOa15411_at_lawrencewelk.systems.susq.com...
> From: "Jeff Garland" <jeff_at_[hidden]>
> >
> > We are extending the IOStreams review for one week until 9/11/04. While we
> > have had many good reviews, the library is quite large and we would like to
> > see addtional discussion and review.
>
> I apologize for not getting my review in by 9/11, but I simply
> couldn't finish it by then. Even now, I cut things short.
> Nevertheless, here is my review.
>
> There are many comments below, but I'll provide the executive
> summary first.
Thanks for your very detailed comments! I really appreciate the enormous amount
of work you have obviously done examining the library interface and
documentation.
Responding to your comments is also an enormous amount of work ;-) Here is my
reply to the points you make in about the first half of your post. I'll try to
reply to the rest tomorrow.
> _________________________________________________________________
> Should the library be accepted into Boost?
>
> Absent Daryle Walker's strong comments against this library, I'd
> have said, "Absolutely." Given those comments, I must be more
> guarded.
>
> The MoreIO library did less to simplify creating and managing
> streams and the cognitive complexity thereof, and it does not
> provide the filtering that the Iostreams library offers. Still,
> MoreIO offers some facilities and an approach that leads to
> questions regarding whether these two are competing or
> complementary libraries. While more investigation regarding the
> strengths and synergies of the two libraries seems warranted, it
> may be best to accept both and sort out the matter later.
There is no overlap between the manipulators from More IO and the present
library. I voted to accept all the manipulators (partly as a result of your
comments) so I don't see any problems there.
streambuf_wrapping doesn't really overlap with the present library either. I
voted to reject because I thought it didn't simplify writing streams enough to
justify including it. If it is accepted, however, I don't see any real conflict.
So the real question is: if one or more of null_buf, pointerbuf array_buf and
value_buf (I may be spelling these wrong) is accepted, would it be advantageous
to reimplement it using the present library.
There are two issues:
1. Performance
Reimplementing pointerbuf and array_buf using the present library should produce
exactly the same runtime performce, since resources which present their
character sequences as in-memory arrays are specially optimized (see
direct_streambuf.fppp at http://tinyurl.com/3qrsx).
Reimplementing null_buf and value_buf should improve runtime performance, since
the implementations from More IO are not buffered.
The reimplemented versions of null_buf and value_buf will have somewhat larger
generated code size. Perhaps this would also be true for pointerbuf and
array_buf.
Using the reimplemented versions will also cause more source code to be included
during compilation. This is because Daryle's implementations are self contained
while and mine depend on parts of type traits and mpl as well as a certain
amount of internal infrastructure. Daryle clearly feels this is an important
issue; I think the amount of code included is in keeping with other boost
libraries.
2. Interface.
Streams and stream buffers generated using the present library have a simple
open/is_open/close interface similar to std::fstream. I think including the
streams and stream buffers from More IO as is, rather than giving them a
compatible interface, will be confusing to users.
> _______________________________
> Design Evaluation
>
> The library is well conceived and executed. The design seems
> sound. There are still questions regarding the overlap and
> differences between this library and MoreIO. There are
> fundamental, philosophical differences between the libraries.
Could you elaborate?
> Use cases should be conceived and compared to see how the
> libraries can handle each. Only then will we know whether they
> are complementary or competing and, if the latter is the case,
> which proves superior according to various criteria including
> ease of use and performance.
See the above discussion on performance.
> It may well be that the stream[buf]
> parts of MoreIO should simply be added to, even integrated with,
> this library.
I prefer integration.
> _______________________________
> Documentation Evaluation
>
> The documentation is astoundingly thorough and quite clear
> overall.
Thanks!
> Many have already made suggestions for improvement, but
> those can only be seen as making good better.
I'm glad to hear you say that. There were so many comments on the inadequacey of
the docs that I was starting to think they were simply awful.
> I have included
> lengthy comments on the documentation, but I could not even visit
> every page!
> _________________________________________________________________
> Library Name
>
> I agree with the sentiment of other reviewers in that the library
> is misnamed. Since this library provides filtering, complete
> with chaining, as well as to make it easy to create streambufs
> and streams, the current name fails to capture all of these
> features.
I agree, but couldn't think of anything better.
> Since the Standard Library provides streambufs,
> streams, manipulators, etc., and yet it is called the IOStreams
> Library, perhaps there's nothing wrong with simply calling this
> library Boost.IOStreams. (I do think this library should use the
> same capitalization: "IOStreams" not "Iostreams.")
Steve Teal's "C++ IOStreams Handbook" and Langer and Kreft's "Standard C++
IOStreams and Locales" use the spelling 'IOStreams'. The C++ standard,
however.uses 'Iostreams'. (See 27.1 [lib.iostreams.requirements]). I think
'IOStreams' looks a bit funny, but I'm happy to use it if that's what people
like.
> _________________________________________________________________
> Design Comments
> _______________________________
> Miscellaneous
>
> Wouldn't the InoutResource concept be better named
> "BidirectionalResource," though that is longer? (Same for
> InoutFilter.)
Bidirectional was the name I first gave it (http://tinyurl.com/4gskg). I changed
it because the filters and resources in question manage two separate sequences,
whereas bidirectional iterators represent a single sequence. I chose 'inout'
because such components act like an input component glued to an output
component.
I don't like the name much, and would happily revert to bidirectional, I people
don't feel that it's misleading.
> "Peekable" does not imply being able to put back a character.
I know. The natural choice would have been Putbackable, which sounds terrible. I
choice Peekable because if you can put back a character than you can peek ahead
in the stream by reading a character and then putting it back.
> Wouldn't "Undoable" be a better name for this optional behavior?
It sounds a bit to general. For instance, an Undoable Sink might allow
characters already written to be cancelled.
> The normal template parameter abbreviation for a character type
> is "Char." I don't think abbreviating it to "Ch" is helpful.
The standard uses charT; so do Langer and Kreft. The Dinkumware docs use Elem. I
don't recall seeing Char, but I'm not against it.
> There's no information provided on the performance of this
> library. How much overhead does it add compared to custom
> solutions?
Performance comparable to hand-written components is listed as one of the main
design goals. (Rationale-->Design Goals-->Item 3, at http://tinyurl.com/3z9ou).
The footnote to that item (http://tinyurl.com/4th3v) mentions some performace
comparisons I performed.
To summarize, I found that for output, stream_facade<file_descriptor_sink> was
slightly faster than the Dinkumware std::ofstream (but it does no
synchonization.) Since the overhead of filesystem access might be expected to
dominate in that comparison, I also tested a custom ostringstream against the
Dinkumware component, and found identical performance.
It would have been natural to test input and random access as well. I didn't do
this because I wasn't convinced that the tests I was running (from
<libs/io/test/detail/verification.hpp>) had any relation to typical use cases.
(BTW, I expect random-access performance won't be so good right now, since I
haven't optimized streambuf_facade::seekoff for short, frequent seeks. This can
be done if necessary.)
I didn't test filtering performance because at the time I didn't have anything
to compare it with except for James Kanze's implementation, which uses
pre-standard streams. Now that I have JC van Winkel's filtering implementation
and Robert Ramey's Dataflow Iterators, I can run some head-to-head tests. I have
spent a lot of effort on optimization, and expect my implementation to compare
favorably.
> For example, how much memory and runtime overhead is
> there wrt MoreIO's mechanisms?
There should be slightly more memory overhead with the reimplementation of
null_buf and value_buf using my library, since it provides buffering by default.
> How much overhead is there wrt a
> custom filtering solution? These are answers users of the
> library need to judge whether it is sufficient or must they
> choose another approach.
See above.
> The filter chain ordering for input versus output is awkward. It
> should always be from the beginning to end; let the library
> reverse the order.
I'm surprised nobody mentioned this earlier.
Having the resource at the end dramatically simplifies the interface. It allows
an instance filtering_stream of filtering streambuf to be considered 'open' as
soon as a resource is pushed, and 'closed' as soon as it is popped. This allows
a simple stack-like interface. The alternative would be
(i) to add open/close functions (which are part of the interface of
streambuf_facade, but would currently be redundant for filtering streams, OR
(ii) to have a first-time switch which automatically 'opens' the filtering
stream as soon as the first i/o is performed.
In addition, allowing filters to be pushed after a resource would give many new
users the impression that they can add filters *after* i/o is in progress. As
has been discussed during the review, this is not currently supported; support
can be added in limited circumstances, but not generally.
Consider:
filtering_ostream out;
out.push(file_sink("log"));
out.push(base_64_encoder());
out << "hello world!\n"; // stream is implicity 'open'
out.push(zlib_compressor()); // error!
> _______________________________
> boost::io::converter
>
> "converter" isn't well-named.
How about code_converter?
> It only widens characters for the
> resources it converts, so it should be called something else.
converter is basically an easy to use wrapper for codecvt, which is the standard
library's component for code conversion. So 'code_coverter' (or converter, for
short) seems appropriate. George Garner suggested that reverse code conversion
be supported too, and I am inclined to agree, though it is much less useful.
> So
> that begs the new name. I thought of "widener," but while it's
> correct English, it's also odd sounding, at least to my ear.
> "character_widener," would be a little more explicit.
This is a bad choice, since 'widening' is performed by a ctype, not a codecvt.
> Perhaps
> "expander" or "character_expander" would work.
Same as widener -- to me it implies a character-by-character transformation.
> Can converter be used with a wide character resource? Imagine a
> generic "widen" which ensures that the resulting streambuf or
> stream uses wide characters.
In theory, yes. If you supply a locale containing a codecvt<wchar_t, wchar_t>,
then you could use
converter<wfile_source>
to "widen" a wide-character stream. However, you can't depend on an
implementation providing specializations codecvt<wchar_t, wchar_t> or even
codecvt<char, wchar_t>.
I'm inclined to parameterize converter by a codevt type, as suggested by George
Garner:
template< typename Resource,
typename Codecvt = fetch_from_global_locale >
If Codecvt::extern_type does not match the character type of Resource, but
Codecvt::inter_type does, the codecvt will be automatically 'reversed'.
> convert's "Int" template parameter is poorly named. "Int" is, of
> course, a common abbreviation for "integer" in C++. I suggest
> using "Intern" or even "Internal." The name of "IntTr" should
> follow suit.
True. I think I was trying to fit within the 80 character line limit, without
too many line breaks.
> _______________________________
> basic_newline_filter
>
> I realize that this is only an example, but there is no apparent
> protection from misconfiguring the constructor flags. Grouping
> the related options into enumerated types and taking separate
> parameters for each group would make it safer to use. (You can
> overload the bitwise OR operator to permit combining them
> conveniently.) That is, since print_CR, print_LF, and print_CRLF
> are mutually exclusive, they should be part of an enumerated type
> that does not permit bitwise OR'ing. Since the posix, mac, and
> windows values are meant to supplant all of the other values,
> they should be part of their own enumerated type and should be
> used in a separate constructor. The remaining options can be
> part of a third enumerated type, with bitwise OR support, that
> constitutes the second argument to the existing constructor.
I was trying to keep things simple. Specifying more than one of print_CR,
print_LF and print_CRLF yeilds a runtime error. Even if print_xxx had it's own
enumeration type, it would still be possible to specify illegal values.
> The ignore_CR and ignore_LF values are poorly named. I suggest
> ignore_extra_CRs and ignore_extra_LFs.
Sounds reasonable.
> The print_CR, print_CRLF,
> and print_LF values would be better if named write_CR,
> write_CRLF, and write_LF, since no printing is occuring, at least
> in general.
How about output_xxx instead of print_xxx?
> _________________________________________________________________
> boost::io::reverse
>
> Isn't the direction of a filter independent of the filtering?
Yes, in theory, put in practice some filters are much easier to express as an
output filter than an input filter, or vice versa.
> That is, why isn't a filter's interface based upon these
> semantics:
>
> char_type filter(char_type ch);
> streamsize filter(char const * input, streamsize n,
> char const * output);
> Then, the source and destination of the data is under the control
> of the filter() function and means that whether the filtering is
> used on input or output is an orthogonal decision. This
> interface should be every bit as efficient as what you've
> specified, but makes filters more straightforward and flexible.
This is what symmetric filter does (http://tinyurl.com/3lz82). I'm planning to
optimize the case where a symmetric filter is added to a filter chain, so that
symmetric filters should be ruthlessly efficient.
The problem with making symmetric filters the default (or exclsuive) filter
concept is that it's unexpectedly difficult to implement them correctly.
> _________________________________________________________________
> Documentation Comments
Since I agree with most of your suggestions, I'll only comment on those where I
disagree or where further explanation is needed.
> In case it makes a difference, I was reading the documentation at
> http://home.comcast.net/~jturkanis/iostreams/libs/io/doc/ on 4
> SEP 2004 rather than downloaded documentation.
That's fine.
> _______________________________
> General
>
> "boiler plate" is one word; I'm guessing it will appear in
> various places throughout the documentation, so I'm calling it
> out at the top.
Thanks for pointing that out. (I'd guess it went through the typical
transformation: boiler plate --> boiler-plate --> boilerplate.)
> When formatting source code may I suggest not indenting twice for
> braces? Doing so pushes code too far right and reduces
> readability due to long lines. (This occurs on tutorial.html and
> read.html, for example.)
Do you mean:
struct tolower_filter : public buffered_input_filter {
template<typename Source>
streamsize read(Source& src, char* s, streamsize n)
{
streamsize result = boost::io::read(src, s, n);
for (streamsize z = 0; z < result; ++z)
s[z] = tolower(s[z]);
return result;
}
};
Should be
struct tolower_filter : public buffered_input_filter {
template<typename Source>
streamsize read(Source& src, char* s, streamsize n)
{
streamsize result = boost::io::read(src, s, n);
for (streamsize z = 0; z < result; ++z)
s[z] = tolower(s[z]);
return result;
}
};
?
Sounds like an improvement.
> When referring to concepts, always capitalize them. Thus,
> "filter" becomes "Filter" in most uses, for example.
I realize I don't follow a consistent convention, but I've tried to distinguish
beetwen concepts, models of concepts, and instances of models of concepts.
Obviously I haven't done a very good job, but I'm not sure using the capitalized
version everywhere is the right solution. (Maybe it is)
> In many places you mention that something works with a Source or
> a "standard input stream or a standard stream buffer" or
> similar. Wouldn't it be better to clearly define once that
> std::basic_istream models Source, std::basic_ostream models Sink,
> std::basic_iostream and std::basic_streambuf model both?
This is just because I didn't update the docs well enough after the most recent
rewrite. FWIW,
1. Originally, standard streams and stream buffers were not models of Resource.
2. Then I went out of my way -- by making the concepts 'external' in Thorsten
Ottosen's sense -- to make streams and stream buffers full-fledged models of
Resource.
3. Then, for exception safety reasons, I decided that Resources must be
CopyConstructible, so that streams and stream buffers would simply be 'treated
as' Resources.
4. Finally, I dropped the CopyConstructible requirement but neglected to update
the docs everywhere.
> _________________________________________________________________
> menu.html
> _______________________________
> Contents
>
> This is confusing. Apparently, the leading "+" means a different
> page, whereas entries without the "+" are on the current page,
> but there is no legend explaining that.
'+' means that the entry is expandible by clicking. '-' means the entry is
collapsable. See http://tinyurl.com/65234.
> Why do some pages, like tutorial.html have a mini-TOC at the top,
> whereas others, like home.html, which may be the only unique one,
> get their sections listed in the Contents frame?
When I added the tree control, I forgot to get rid of all the local TOC's.
> There's nothing wrong with what you've done, but why not use the
> phrase, "standard streambuf" instead of "standard stream buffer?"
> It's shorter and in keeping with what anyone with some
> familiarity of Standard IOStreams would expect.
'Stream buffer' is used by the standard and by Langer and Kreft, presumably
because streambuf is the specialization of basic_streambuf for the type char.
> _______________________________
> Footnotes
>
> I don't see any text that references the footnotes at the bottom
> of the page, so their presence is confusing.
I forgot to remove them when I moved Tutorial to its own page.
> (Further reading
> reveals that these footnotes appear at the bottom of every page,
I hope not!
> regardless of whether they are referenced.
> _________________________________________________________________
> tutorial.html
> " Both streambuf_facade and stream_facade have constructors and
> overloads of open which forward their arguments to filter or
> ^^^^^^^^^
> to the filter
>
> resource constructor. Therefore, the first example could have
> ^^^^^
> previous
I see a grammatical problem here -- "forward their arguments to filter or
resource constructor" -- but I don't understand what changes you are suggesting.
> "The function boost::io::get reads a character from an arbitrary
> Source; it is provided by the Iostreams Library to allow all
> Sources to be accessed with a single interface."
>
> That's worded awkwardly. I suggest rephrasing like this:
>
> The function boost::io::get reads a character from an
> arbitrary Source. This is possible because the Iostreams
> Library provides the same interface for all Sources.
I see your point, but it might be better just to use your first sentence. The
second sentence provides more information than needed here.
> "The last item added to the chain could be any Source."
> In the following example, reproduced here, the order seems
> backward:
>
> filtering_streambuf<input> in;
> in.push(alphabetic_input_filter());
> in.push(random_source());
>
> If that's the correct order, then you need to explain, even here
> in the tutorial, why the order is as shown.
I mentioned the rationale, but obviously I need to point out the ordering issues
in the tutorial.
> "We could also add any number of InputFilters to the chain before
> adding the Source."
> "They could also use the following convenience typedefs:"
I think the first sentence was changed without updating the second.
There are several choices for this type of passage:
1. Use the passive voice everywhere.
2. Use 'we' -- this sounds natural to me because it's used in mathematical
papers.
3. Use 'you'
4. Use 'the user'
Which should one prefer? Settling on one of the above and using it consistently
should resolve many of your (snipped) suggestions/objections below.
> _______________________________
> Efficient Filtering: Buffered Filters
> You've changed from "we" to "one" and it sounds odd. Try this:
Agreed.
>
> In such cases, reading a character from an ordinary
> InputFilter means calling its get member function, thus
> incurring the overhead of a (virtual?) function call.
It's not virtual.
> (It isn't clear from the tutorial whether read supplants get or
> augments it. I assumed the former in writing the above.)
You assumed correctly.
> _________________________________________________________________
> examples.html
> Presidential OutputFilter
>
> The political humor of this example is unwarranted in Boost
> documentation.
I'm not sure I agree. But noted.
> Perhaps you could rename it to "Texas
> OutputFilter" or "Southern U.S. OutputFilter."
I don't think this would help. Texans don't typically say 'subliminable,' for
instance.
> There is no description of what the filter actually does in this
> section.
I'm thinking each example should have its own page, with a detailed description
of what the filter does, a guided tour of the source code, and an explanation of
the example output.
> _______________________________
> Regex OutputFilter
>
> "This example uses a regex_filter to remove C and C++-style
> comments from a code excerpt. Although somewhat more
> ^^^^^^^^^^^^^^^^^^^
> omit
>
> sophisticated than the Uncommenting InputFilter, above, the
> ^^^
What's the problem here?
> _______________________________
> Overview
>
> "...an InputFilter, which filters input read from a Source, and
> an OutputFilter, which filters output written to a Sink."
>
> I suggest a little rephrasing to make this read better and fill
> in a couple of gaps:
>
> an InputFilter, which filters input read from a Source or
> another InputFilter, and an OutputFilter, which filters output
> read from a Source or another OutputFilter, before it is
> written to a Sink.
But that would be false. In order to chain filters, the library has to wrap
filters with adapters so that they become resources.
> "SeekableResource" refers to a read/write "head," but few, if
> any, Resources for which this library will be used are physical
> devices with read/write heads. I suggest using "location,"
> "position," or "offset" in lieu of "head" here and in other parts
> of the documentation.
How about 'stream position indicator'?
> _______________________________
> Filter Concepts
>
> InputFilter, OutputFilter, InoutFilter, and SeekableFilter refer
> to a "stream buffer." Shouldn't they refer to Sources and
> Filters instead?
Right. Until the latest revision, filters had to communicate with stream
buffers.
> "Notifications" is unexpected and you refer to a stream. Try
> this:
>
> Closable: A Filter or Resource which is told to close
> immediately before a Source or Sink is closed.
How about:
Closable: A Filter or Resource with a member function close which is called
when the end of a character sequence is reached.
> In "Direct," you refer to a socket-like interface, but not
> everyone will understand what you mean. I suggest using the
> phrase, "function-based interface."
I see your point, but Direct Resources also have a function based interface.
> _________________________________________________________________
> modes.html
> "Readers new to the Iostreams Library should feel free to
> ^^^^^^^ ^^^^^^^^^^^^
> Users omit
> concentrate primarly on these two modes."
How would you phrase the whole sentence?
> _______________________________
> Definitions of the Modes
>
> "* Whether a filter or resource is
> Closable."
>
> This should be a single line.
Seems to be a mozilla bug. Even doesn't help.
> "This is useful to help reduce...different of filter types."
> ^^
> omit
I don't follow.
> _______________________________
> The Metafunction mode
>
> You need to include an example and the signature of mode so we
> know how to use it.
This section contains a link to the reference docs for mode:
http://tinyurl.com/5oqsy. Wouldn't it suffice to provide an example there?
> _______________________________
> Reference
> "Policy-based stream buffer template with an interface...."
> ^^^^^^^^
> class template
"Policy-based class template deriving from a specialization of basic_streambuf
with an interface...." ?
> ______
> Synopsis
>
> Using "T" as the name of a policy class template parameter is
> confusing. "T" is, of course, commonly used for a data type
> stored in a container and for similar purposes. The closest
> thing to "T" in streambuf_facade is the "Ch" nested type in the
> policy class. Change "T" to a more meaningful name.
I adopted this convention from "C++ Templates", p. 10. FilterOrResource would be
more decriptive, but it's too long. What would you suggest?
> ______
> Template Parameters
>
> "T - A model of the filter or resource concepts. Specializations
> of streambuf_facade with filter types are used internally by the
> Iostreams Library to construct chains of filters and
> resources. Users of the library never need to specialize
> streambuf_facade with a filter type."
>
> Since only the library uses filter types for "T", why not name
> "T" "Resource" and let the library worry about using filters for
> the "Resource" parameter? Also, given that only the library does
> this, omit the last two sentences or move them to an Advanced
> Topics section.
Okay, except that I've been toying with exposing some of the undocumented stuff.
Thanks again -- see you tomorrow ;-)
Jonathan
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk