Boost logo

Boost :

From: Jonathan Turkanis (technews_at_[hidden])
Date: 2005-03-05 22:17:29


Rob Stewart wrote:
> From: "Jonathan Turkanis" <technews_at_[hidden]>

>> If the library is well-received, however, I expect eventually there
>> will be a large body of user-defined filters, and I wouldn't want
>> them to have to be rewritten.
>
> Given the filters written with sync I/O in mind are quite
> different from those written for async I/O, is that really an
> issue?

Do you mean that different filtering operations will be desired, or that the
implementation will be different?

My hope was to make the implementation of non-blocking filters easy enough that
I could require all filters to be non-blocking. Maybe my original message was
unclear on this point.

>> (This is another topic that didn't get much attention during review.
>> There are some, notably James Kanze, who insist that well-designed
>> stream buffers should not throw exceptions. My defense of exceptions
>> is here: http://tinyurl.com/59xv6. At the time of the review, I was
>> prepared to switch to error codes, but it's a bit late now.)
>
> I apparently missed that during the review. On what basis does
> James Kanze make his assertion?

See this message by James Kanze, my reply, his reply and my further reply:
http://tinyurl.com/4tkj3. I don't find his argument very convincing on this
point. But he is very knowledgable, so I'm reluctant to dismiss his view.

> So long as EOF and EAGAIN do not
> result in exceptions,

right

> given how commonly they occur, I have no
> problems with exceptions. However, it sounds like you might be
> throwing an exception to indicate EOF. If so, I don't like it.

How did I give you that impression?

> It is not entirely uncommon to read to EOF, do something to a
> file, and continue reading. For example, we have certain files
> that we tail throughout the day in our production apps. EOF is a
> common occurence because we quickly process the new data appended
> to the files and reach EOF again. That is, the sink is faster
> than the source. This behavior would be complicated by EOF
> throwing an exception. An error indication of EOF, plus the
> ability to clear that condition and try again, fits this usage
> better. (Granted, that use of a file is not widespread, but
> since you offer no choice, such usage is not possible.)
> You mentioned in the cited documentation section that you didn't
> want to complicate things such that users would wonder which of
> the various ways a given function indicates errors. I
> sympathize, but isn't that what you're introducing with your
> basic_character idea?

I'm not treating EOF or EAGAIN as errors. What I'm trying to avoid is having a
single error (or any other type of condition, for that matter) represented in
more than one way.

> How about putting a state variable in filters, sources, and sinks
> to indicate EAGAIN, EOF, and GOOD. Then, functions that only
> read or write a single character can return a bool, and functions
> that read or write multiple characters can return a count. If
> the bool is false or the count is zero, the code can query the
> state to determine whether EAGAIN or EOF occurred. That keeps
> the error indication in the return value simple and even leaves
> room for adding additional error states should that prove
> necessary. Hard errors can still be signaled via exception.

This was one of the options I seriously considered. I believe the very first
version of the library I posted had this feature. I tend to think it would make
life harder for people writing filters and devices to force them to implement an
additional function. As things stand, you can often get by with a implementing a
single function, and it works pretty well. Also, it makes it harder to fit
standard streams and stream bufferss into the framework: stream buffers don't
have a state indicator at all; streams do, but it doesn't map well to
good/eof/eagain.

I'm willing to be convinced otherwise, however.

>>>> V. Problems ----------------------------
>>>>
>>>> 1. Harder to learn. Currently the function get and the concept
>>>> InputFilter are very easy to explain. I'm afraid having to
>>>> understand the basic_character template before learning these
>>>> functions will discourage people from using the library.
>>>
>>> The class template is hardly complicated. I can't imagine it
>>> would be a show stopper, though it does add some complexity.
>>
>> I'm thinking of decribing it as a replacement for
>> traits_type::int_type. Compared to using int_type and its family of
>> helper functions (eq_int_type, eof., not_eof, ...) basic_character
>> is a snap. ;-)
>
> That's a good way to make it palatable to those already familiar
> with writing streambufs, but it hardly applies to the majority of
> your library audience, does it.

The current interface has get() return an instance of int_type, so if I keep the
interface how it is, I have to say something in the docs about char_traits. But
you have a good point.

>>>> 2. Harder to use. Having to check for eof and fail make writing
>>>> simple filters, like the above, slightly harder. I'm worried that
>>>> the effect on more complex filters may be even worse. This applies
>>>> not just to get, but to the other functions as well, since their
>>>> returns values will require more careful examination.
>>>
>>> That's a real issue.
>>
>> It's become even more clear in this thread, given my botched
>> uncommenting_filter implementation.
>
> Keep running notes on the things one can forget or do wrong so
> your documentation can warn about them.

Good idea. I'll definitely do this; the question is whether it will be in the
advanced section or will apply to all filters and devices.

>>> (The member functions good(), eof(), and fail() on
>>> basic_character could be made non-member functions and could be
>>> implemented for char and wchar_t.

I forgot to say that I like the idea of making them non-members. All I am unsure
about is whether they should be implemented for char and wchar_t (actually for
int and std::char_traits<wchar_t>::int_type)

>>>That would help code deal with
>>> synchronous code in the asynchronous style.)
>>
>> I guess I could implement
>>
>> bool eof(int n) { return n == EOF; }
>> bool good(int n) { return n != EOF; }
>>
>> bool weof(std::char_traits<wchar_t>::int_type n)
>> { return n == WEOF; }
>> bool wgood(std::char_traits<wchar_t>::int_type n)
>> { return n != WEOF; }
>>
>> I tend to think that n == EOF and n == WEOF are more readable,
>> however.
>
> Yes, they are more readable, but if you don't document the
> implementation of those functions, library users will expect to
> use them in all situations. Thus, switching to async from sync
> I/O will be less of a stylistic change. (That also leaves room
> for you to change the implementation details if need be.)

Let me make sure we're on the same page: We're assuming I decide to provide
separate concepts for non-blocking i/o, and that some of the non-blocking
concepts use basic_character. Are you saying that someone who is used to writing
non-blocking filters and tries to write a blocking filter may get confused
because eof() doesn't work with int?

Maybe I could have get() return a basic_character even in the blocking case, and
explain that there's no need to test for eagain in blocking i/o. This would
eliminate one of the last traces of char_traits in the public interface of the
library, which would be good. The other reasonable way to eliminate char_traits
would be to have blocking get() return an optional<char> (see my reply to
Thorsten); but if I'm using basic_character for non-blocking i/o, it may make
sense to use it in both places.

> Of course, if you adopt my state idea, then the return value
> would simply indicate that things went well or something less
> than ideal happened. In the latter case, one would query the
> state of the filter/device to learn what happened. That would
> obviate your basic_character class and these functions.

> One problem with code that returns status information is that
> folks can forget to check the status.

This is a real problem with eagain, since testing may not reveal the error
unless eagain happens to occur at the right place in a character sequence.

> You could return a special
> type in lieu of bool, and basic_character in lieu of char_type,
> which require inspecting whether they indicate an error
> condition. If that query is not done, the destructor can
> complain.

Could you elaborate? It sounds like it could lead to very poor performance.

>OTOH, you could just say, "don't do that!"

Best Regards,
Jonathan


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk