Boost :
From: Jonathan Turkanis (technews_at_[hidden])
Date: 2004-09-07 20:21:19
"Jonathan Graehl" <jonathan_at_[hidden]> wrote in message
news:413E1101.40404_at_graehl.org...
> First, let me apologize for not being able to review the actual code
> (yet). The interface and correctness/performance of implementation are
> all I really care about for now :)
No apology necessary.
> >There are some special cases where the copying is wasteful, though. For
> >instance:
> 1. If filter1 is just a passive observer, simply counting the number of
> occurrences of '\n', or copying the data to a logging stream, then the end user
> should really be writing directly to buf2. Filter1 could process the data using
> the same buffer as filter2.
>
> 2. Same as above, but filter1 also modifies the data in-place, making
> character-by-character modifications. (E.g., a toupper filter). This can be
> handled the same way.
>
> 3. If 'resource' is a stream or stream buffer, it could be assumed to do its own
> buffering. In that case, filter2 should be writing directly to resource, instead
> of to streambuf3:
> I'd certainly feel a bit more proud of the library if it handled these
> cases (1 and 3 seem most important). It seems well worth a few days' delay.
I plan to add this functionality, if the library is accepted.
(3) is easy. I'm going to change the name of the 'Buffered' concept to
'MultiCharacter', and use 'Buffered' to indicate that a component has its own
buffer. Streams and stream buffers will be models of Buffered by default. I
believe this will involve just a few lines of code.
Even if (1) is more important than (2), I think (2) subsumes (1) and involves
about the same amount of work. Let's call such filters 'in-place' filters. If
in-place filters are added to a chain one at a time, their static type is lost,
so to make them work will require a certain (small) amount of runtime
indirection. Using the van Winkel/van Krieken pipe notation mentioned by Dietmar
Kuehl:
filtering_ostream out(tee(cout) | line_counter() | to_upper() |
                      file("log.txt"));
the in-place filters tee, line_counter and to_upper can be fused together at
compile-time. (Another proof that this notation is not just syntactic sugar.)
> >There is another class of cases in which the current setup is wasteful, but I
> >think it is rather domain-specific:
> >
> >4. Most filters in the chain modify only small parts of character sequences,
> >leaving big chunks unchanged.
> Well, basic_newline_filter would do this - at least when replacing CRLF
> with a single '\n' character.
I think the real optimizations are possible only when a filter can tell that it
doesn't need to modify a block just by reading the header information. A
newline_filter has to scan the whole text to determine if changes need to be
made.
> >To optimize case 4 would require a major library extension. My feeling is that
> >it is not necessary at this point, but I'd like to know what others think.
> It should certainly wait if you don't have a design already in mind.
> The library is good enough without this.
Thanks.
> >Part III: Interface questions:
> >
> >1. How to handle read and write requests which return fewer characters than
> >requested, though there has been no error, and EOF has not been reached. I think
> >some answer to this question is necessary to allow the library to be extended
> >later to handle models other than ordinary blocking i/o. I mention three
> >possibilities here, http://tinyurl.com/6r8p2, but only two are realistic. I'm
> >interested to know how important people think this issue is, and what is the
> >best way to resolve it.
> I think #2 would be most in line with what people are used to under
> POSIX (-1/EAGAIN). Blocking (option 1) actually doesn't make any sense
> at all except at the ends (source/sink), unless you put each filter in a
> chain into its own thread and use something like semaphores.
> I suppose the idea is, if you're in the middle of a filter chain, and
> somebody gives you some input which would overflow your buffer, you
> attempt to empty out your buffer to the next guy, but if he can't take
> enough of it to allow you to accept the whole input (or even, none of
> it), you have to tell the guy who sent it to you you can't take it, and
> he has to hold onto it in his buffer. That seems reasonable.
Okay. BTW, I noticed an error in proposal #2: having both an implicit conversion
to char and a safe-bool conversion to test for eof and unavail is unworkable,
since a test such as if (c) would then be ambiguous. To test whether a returned
character is valid, I'll probably have to add an ordinary member function.
Perhaps:
template<typename Ch>
struct basic_character {
    // ...
    operator Ch() const;
    bool good() const;
    bool eof() const;
    bool fail() const;
};
(Here I'm using 'fail' instead of 'unavail' or 'EAGAIN', but the main point is
the addition of the member 'good()').
Now, looking at the alphabetic_input_filter from the tutorial, instead of
struct alphabetic_input_filter : public input_filter {
    template<typename Source>
    int get(Source& src)
    {
        int c;
        while ((c = boost::io::get(src)) != EOF && !isalpha(c))
            ;
        return c;
    }
};
you'd write:
struct alphabetic_input_filter : public input_filter {
    template<typename Source>
    character get(Source& src)
    {
        character c;
        while ((c = boost::io::get(src)).good() && !isalpha(c))
            ;
        return c;
    }
};
Here, eof and fail values are passed on to the caller unchanged. If you want to
send an eof or fail notification explicitly, you'd write return eof() or return
fail().
Now the big question: is the above formulation too convoluted to teach to an
average user who is interested only in plain, blocking i/o?
> I might in fact prefer that my source or sink resource act like #1
> (block until at least one character or EOF), and I assume this would be
> the default behavior if I open it in the default, blocking mode, but it
My idea was that the initial filter concepts need to be designed so that they
can be used unchanged when the library is extended to handle other i/o models.
For resources, it's easy enough simply to introduce new concepts Non-Blocking
Sink, Asynchronous Sink, etc.
> isn't possible to have filters act that way; they need to pass the
> "can't take your data" feedback all the way back through the stack to
> the end user, who then needs to hold onto it and select/spin/whatever on
> the underlying sink resource ...
Right. For now, I'm not worrying about what the proper abstraction will be to
represent a chain of filters with, say, an Asynchronous Source at the end. It
might be an 'async_istream', an ordinary filtering_istream which hides the
asynchronous nature of the source, or some entirely new abstraction not related
to the current standard i/o library.
All I want to ensure is that filters written today will be usable in the future.
> >2. The stack interface. Is the interface to the underlying filter chains rich
> >enough? Originally it was similar to std::list, so that you could disconnect
> >chains at arbitrary points, store them, and reattach them later. I decided there
> >wasn't much use for this, so I simplified the interface.
> I'm sure someone, somewhere, sometime will want to perform
> splices/appends on filter chains, but I can't imagine why, either. At
> least, keep the simple interface and put the more complicated, flexible
> one in the appendix.
In that case, it's better to add it when someone actually needs it. It won't be
hard since filter chains are still implemented as std::lists.
> >3. Exceptions. James Kanze has argued repeatedly that protected stream buffer
> >functions should not throw exceptions (http://tinyurl.com/5o34x). I try to make
> >the case for exceptions here: http://tinyurl.com/6r8p2. What do people think?
> I sympathize with both arguments; either way seems fine to me. There is
> no real performance penalty for an exception that is thrown at most once
> per stream (EOF),
Just to clarify, I agree with JK that exceptions should not be used to signal
EOF. You need a return value to tell you how many characters were successfully
read. (See http://tinyurl.com/3waf8 'Exceptions')
> but he's right that the existing interface (which end
> users never see) seems to specify a return value in that case. But, as
> you say, if you want to support async IO, it's moot - you do have to
> return the number of characters successfully read/written, so you need
> the std::streamsize return type, so ... you may as well return EOF
> instead of throwing it. That is, I see no reason to throw an EOF value
> if you support async IO (which I think would be lovely).
> >So the question is: Should an open() function be added to the closable
> >interface, to eliminate the need for first-time switches? Alternatively, should
> >there be a separate Openable concept?
> Without a doubt, an Openable concept. If you add open() to Closeable
> you'd really want to change the concept name ;)
Of course ;-) But I can't think of a good one.
> For example, if I
> implement a first-time flag (no real hardship), I'll have to remember to
> add the first-time test not only when data is processed, but also when
> the stream is closed (need to handle empty input properly). I suspect
> people will forget this at least once.
Good point.
> There's also a minor performance
> gain:
Very minor. The filter/resource members are typically called by streambuf
virtual functions.
> the interface would be called when the stream is initialized, I
> assume, and not require a first-time flag check with each use.
> Admittedly, inconsequential.
> -Jonathan Graehl
Thanks.
Best Regards,
Jonathan
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk