From: Jonathan Graehl (jonathan_at_[hidden])
Date: 2004-09-07 14:50:25


First, let me apologize for not being able to review the actual code
(yet). The interface and correctness/performance of implementation are
all I really care about for now :)

>There are some special cases where the copying is wasteful, though. For
>instance:
>
>1. If filter1 is just a passive observer, simply counting the number of
>occurrences of '\n', or copying the data to a logging stream, then the end user
>should really be writing directly to buf2. Filter1 could process the data using
>the same buffer as filter2.
>
>2. Same as above, but filter1 also modifies the data in-place, making
>character-by-character modifications. (E.g., a toupper filter). This can be
>handled the same way.
>
>3. If 'resource' is a stream or stream buffer, it could be assumed to do its own
>buffering. In that case, filter2 should be writing directly to resource, instead
>of to streambuf3:
>
> filter2.write(resource, buf2, buf2 + n2).
>
>These three cases can be handled easily by modifying the existing framework. I
>didn't add special treatment because it occurred to me rather late in
>development.
>
>
>

I'd certainly feel a bit more proud of the library if it handled these
cases (1 and 3 seem most important). It seems well worth a few days' delay.
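
To make cases 1 and 2 concrete, here is a rough sketch of two filters sharing one buffer instead of copying between stages. The names newline_counter, toupper_filter, and process_in_place are made up for illustration and are not taken from the library under review; the only assumption is that an in-place filter never changes the length of the sequence.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Hypothetical passive observer (case 1): counts '\n' without touching data.
struct newline_counter {
    std::size_t count = 0;
    void process(char* first, char* last) {
        assert(first <= last);
        for (; first != last; ++first)
            if (*first == '\n') ++count;
    }
};

// Hypothetical in-place modifier (case 2): upper-cases each character.
struct toupper_filter {
    void process(char* first, char* last) {
        assert(first <= last);
        for (; first != last; ++first)
            *first = static_cast<char>(
                std::toupper(static_cast<unsigned char>(*first)));
    }
};

// Runs both filters over the same buffer; no intermediate copy is made.
template <typename F1, typename F2>
void process_in_place(F1& f1, F2& f2, char* first, char* last) {
    f1.process(first, last);
    f2.process(first, last);
}
```

A framework could pick this path whenever every filter in a run of the chain declares itself length-preserving, and fall back to per-filter buffers otherwise.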

>There is another class of cases in which the current setup is wasteful, but I
>think it is rather domain-specific:
>
>4. Most filters in the chain modify only small parts of character sequences,
>leaving big chunks unchanged.
>
>
>
Well, basic_newline_filter would do this - at least when replacing CRLF
with a single '\n' character.

>To optimize case 4 would require a major library extension. My feeling is that
>it is not necessary at this point, but I'd like to know what others think.
>
>
>
It should certainly wait if you don't have a design already in mind.
The library is good enough without this.

>-----------------------------------------------------
>
>Part III: Interface questions:
>
>1. How to handle read and write requests which return fewer characters than
>requested, though there has been no error, and EOF has not been reached. I think
>some answer to this question is necessary to allow the library to be extended
>later to handle models other than ordinary blocking i/o. I mention three
>possibilities here, http://tinyurl.com/6r8p2, but only two are realistic. I'm
>interested to know how important people think this issue is, and what is the
>best way to resolve it.
>
>
>
I think #2 would be most in line with what people are used to under
Posix (-1/EAGAIN). Blocking (option 1) actually doesn't make any sense
at all except at the ends (source/sink), unless you put each filter in a
chain into its own thread and use something like semaphores.

I suppose the idea is, if you're in the middle of a filter chain, and
somebody gives you some input which would overflow your buffer, you
attempt to empty out your buffer to the next guy, but if he can't take
enough of it to allow you to accept the whole input (or even, none of
it), you have to tell the guy who sent it to you that you can't take it, and
he has to hold onto it in his buffer. That seems reasonable.

I might in fact prefer that my source or sink resource act like #1
(block until at least one character or EOF), and I assume this would be
the default behavior if I open it in the default, blocking mode, but it
isn't possible to have filters act that way; they need to pass the
"can't take your data" feedback all the way back through the stack to
the end user, who then needs to hold onto it and select/spin/whatever on
the underlying sink resource ...
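
A minimal sketch of the feedback I have in mind, assuming every write() returns the number of characters it actually accepted. limited_sink and buffering_filter are hypothetical names, not part of the library; the sink simulates a resource that would block after a fixed number of characters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sink that can take only `room` more characters before it
// would block; write() returns how many characters it accepted.
struct limited_sink {
    std::string data;
    std::size_t room;
    std::size_t write(const char* s, std::size_t n) {
        std::size_t k = std::min(n, room);
        data.append(s, k);
        room -= k;
        return k;
    }
};

// Hypothetical pass-through filter with a bounded internal buffer.
template <typename Next>
struct buffering_filter {
    Next& next;
    std::size_t capacity;
    std::string buf;

    // Push as much buffered data downstream as the next stage will take.
    void flush_some() {
        std::size_t k = next.write(buf.data(), buf.size());
        buf.erase(0, k);
    }

    // Accept as much as fits; the caller holds on to whatever was refused.
    std::size_t write(const char* s, std::size_t n) {
        if (buf.size() + n > capacity) flush_some();
        assert(buf.size() <= capacity);
        std::size_t k = std::min(n, capacity - buf.size());
        buf.append(s, k);
        return k;
    }
};
```

Once the sink stops accepting data, the filter's buffer fills up and its write() starts returning short counts, eventually 0, which is exactly the signal that has to travel back to the end user.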

>2. The stack interface. Is the interface to the underlying filter chains rich
>enough? Originally it was similar to std::list, so that you could disconnect
>chains at arbitrary points, store them, and reattach them later. I decided there
>wasn't much use for this, so I simplified the interface.
>
>
I'm sure someone, somewhere, sometime will want to perform
splices/appends on filter chains, but I can't imagine why, either. At
least, keep the simple interface and put the more complicated, flexible
one in the appendix.

>3. Exceptions. James Kanze has argued repeatedly that protected stream buffer
>functions should not throw exceptions (http://tinyurl.com/5o34x). I try to make
>the case for exceptions here: http://tinyurl.com/6r8p2. What do people think?
>
>
>
>
I sympathize with both arguments; either way seems fine to me. There is
no real performance penalty for an exception that is thrown at most once
per stream (EOF), but he's right that the existing interface (which end
users never see) seems to specify a return value in that case. But, as
you say, if you want to support async IO, it's moot - you do have to
return the number of characters successfully read/written, so you need
the std::streamsize return type, so ... you may as well return EOF
instead of throwing it. That is, I see no reason to throw an EOF value
if you support async IO (which I think would be lovely).
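
For illustration, a sketch of a source whose read() reports EOF through its return value rather than by throwing. string_source is a made-up name, and using -1 as the EOF value is my assumption here, not something the library specifies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ios>
#include <string>
#include <utility>

// Hypothetical source over an in-memory string. read() returns the number
// of characters copied, or -1 once the input is exhausted (no exception).
struct string_source {
    std::string data;
    std::size_t pos;

    explicit string_source(std::string d) : data(std::move(d)), pos(0) {}

    std::streamsize read(char* s, std::streamsize n) {
        assert(n >= 0);
        if (pos == data.size()) return -1;  // EOF as a return value
        std::size_t k =
            std::min(static_cast<std::size_t>(n), data.size() - pos);
        std::memcpy(s, data.data() + pos, k);
        pos += k;
        return static_cast<std::streamsize>(k);
    }
};
```

A short count (fewer characters than requested) and EOF are then distinguishable by the caller without any exception machinery.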

>So the question is: Should an open() function be added to the closable
>interface, to eliminate the need for first-time switches? Alternatively, should
>there be a separate Openable concept?
>
>
>
Without a doubt, an Openable concept. If you add open() to Closeable
you'd really want to change the concept name ;) For example, if I
implement a first-time flag (no real hardship), I'll have to remember to
add the first-time test not only when data is processed, but also when
the stream is closed (need to handle empty input properly). I suspect
people will forget this at least once. There's also a minor performance
gain: open() would be called once when the stream is initialized, I
assume, so no first-time flag check is needed on each use.
Admittedly, inconsequential.
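
To show the difference, here is a sketch of the same header/footer filter written both ways. flagged_filter, openable_filter, and the run_* drivers are hypothetical; the drivers stand in for whatever the framework would do, under the assumption that it calls open() exactly once before the first write().

```cpp
#include <cassert>
#include <string>
#include <vector>

// First-time-flag version: every entry point must remember the check.
struct flagged_filter {
    std::string out;
    bool started = false;
    void ensure_started() {
        if (!started) { out += "<begin>"; started = true; }
    }
    void write(const std::string& s) { ensure_started(); out += s; }
    void close() {
        ensure_started();  // easy to forget; needed for empty input
        assert(started);
        out += "<end>";
    }
};

// Openable version: no per-call flag test is needed.
struct openable_filter {
    std::string out;
    void open() { out += "<begin>"; }
    void write(const std::string& s) { out += s; }
    void close() { out += "<end>"; }
};

// Stand-in for the framework driving a flag-based filter.
inline std::string run_flagged(const std::vector<std::string>& chunks) {
    flagged_filter f;
    for (const std::string& c : chunks) f.write(c);
    f.close();
    return f.out;
}

// Stand-in for the framework driving an Openable filter.
inline std::string run_openable(const std::vector<std::string>& chunks) {
    openable_filter f;
    f.open();
    for (const std::string& c : chunks) f.write(c);
    f.close();
    return f.out;
}
```

Both versions give the same output, including for empty input, but the Openable one gets there without the check in close() that is so easy to leave out.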

-Jonathan Graehl

