From: Jonathan Turkanis (technews_at_[hidden])
Date: 2005-03-02 23:03:43
Hi All,
Several important extensions of the Iostreams library have been on hold for over
a month while I have tried to resolve the issue described in this message.
Unfortunately it didn't get much attention during the review, so I'm hoping I
can generate some discussion now.
I'm sorry about the length of this message; I've been stuck on this for a long
time and would really appreciate some help.
I. The Problem ---------------------------
Standard iostreams do not work well with non-blocking or asynchronous i/o. I
would eventually like to extend the library to provide support for non-blocking
and async i/o, and when I do so I expect I will have to introduce some new
Device concepts. However, I would like to modify the *current* filter concepts
so that they will work unchanged when non-blocking and asynchronous devices are
introduced.
There are several reasons for this:
1. Proper isolation of concepts. A filter represents a rule for transforming
character sequences; ideally, how the sequence is accessed should not be
relevant. For example, it would be silly to require separate versions of a
toupper_filter for blocking and non-blocking i/o, since they would both
represent the same simple rule.
2. Maximal code reuse. While it would just be silly to require several versions
of a toupper_filter, it would be extremely wasteful to require several versions
of more complex components like compression or encryption filters.
3. Reduced complexity of the library. The library already has a large number of
concepts; I don't want to double or triple the number of filter concepts when
non-blocking and async i/o are introduced.
II. The Solution (the easy part) ---------------------------
I believe it will suffice to:
- Provide the functions put() and write() (both filter member functions and the
free functions with the same names) with a way to indicate that fewer than the
requested number of characters have been written to the underlying data sink,
even though no error has occurred.
- Provide the functions get() and read() (both filter member functions and the
free functions with the same names) with a way to indicate that fewer than the
requested number of characters have been read from the underlying data source,
even though no error has occurred and EOF has not been reached.
This is easily achieved for put() and write(), and almost as easily for read():
- Instead of returning void, put() can return a bool indicating whether the
given character was successfully written.
- Instead of returning void, write() can return an integer indicating the number
of characters written.
- Currently, when read returns fewer characters than the requested amount it is
treated as an EOF indication. Instead, we can allow read to return the actual
number of characters read, and reserve -1 to indicate EOF, since it is not
needed as an error indication.
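As a rough illustration of these return conventions, here is a minimal sketch. The names limited_sink, string_source and toupper_write are hypothetical and exist only for this example; they are not part of the library. The sketch also assumes that a read() result of 0 means "nothing available right now", which is one reading of the proposal rather than something it states outright.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

namespace sketch {

// Hypothetical sink that accepts at most `capacity` more characters,
// standing in for a non-blocking device whose buffer may be full.
struct limited_sink {
    std::string  data;
    std::size_t  capacity;

    // Proposed convention: report whether the character was accepted.
    bool put(char c)
    {
        if (capacity == 0)
            return false;          // not an error; the caller may retry later
        data.push_back(c);
        --capacity;
        return true;
    }
};

// Hypothetical source over a string; only `available` characters
// can be delivered right now.
struct string_source {
    std::string  data;
    std::size_t  pos;
    std::size_t  available;

    // Proposed convention: number of characters read, or -1 at EOF.
    // Assumption: 0 means no characters are available at the moment.
    std::ptrdiff_t read(char* s, std::ptrdiff_t n)
    {
        if (pos == data.size())
            return -1;
        std::ptrdiff_t count = 0;
        while (count < n && available > 0 && pos < data.size()) {
            s[count++] = data[pos++];
            --available;
        }
        return count;
    }
};

// Filter-style write: returns how many characters were actually consumed.
template<typename Sink>
std::ptrdiff_t toupper_write(Sink& snk, const char* s, std::ptrdiff_t n)
{
    assert(n >= 0);
    std::ptrdiff_t written = 0;
    while (written < n &&
           snk.put((char) std::toupper((unsigned char) s[written])))
        ++written;
    return written;
}

} // namespace sketch
```

A caller that receives a short count from toupper_write would keep the unwritten tail (s + written) and offer it again once the sink can accept more.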
III. The Solution (the ugly part) ---------------------------
The function get presents more of a challenge. Currently it looks like this (for
char_type == char):
    struct my_input_filter : input_filter {
        template<typename Source>
        int get(Source& src);
    };
The return type already serves a dual purpose: it can store a character or an
EOF indication. Unfortunately, with non-blocking or async i/o there are now
three possible results of a call to get:
1. A character is successfully retrieved.
2. The end of the stream has been reached.
3. No characters are currently available, but more may be available later.
My preferred solution is to have get() return an instance of a specialization of
a class template basic_character which can hold a character, an EOF indication
or a temporary failure indication:
    template<typename Ch>
    class basic_character {
    public:
        basic_character(Ch c);
        operator Ch () const;
        bool good() const;
        bool eof() const;
        bool fail() const;
    };
    typedef basic_character<char> character;
    typedef basic_character<wchar_t> wcharacter;
    character eof(); // returns an EOF indication
    character fail(); // returns a temporary failure indication.
    wcharacter weof();
    wcharacter wfail();
    [Omitted: templated versions of eof and fail]
Alternatively, the member functions good, eof and fail could be made non-member
functions taking a basic_character.
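One way the interface above might be implemented is sketched below. This is only an illustration under stated assumptions: the private state enum, the private constructor and the static helpers make_eof() and make_fail() are hypothetical details invented here, and everything is placed in a hypothetical namespace sketch to keep the short names eof and fail out of the global namespace.

```cpp
#include <cassert>

namespace sketch {

template<typename Ch>
class basic_character {
public:
    // A successfully retrieved character.
    basic_character(Ch c) : c_(c), state_(is_good) { }

    // Implicit conversion so existing comparisons such as c == '\n' still work.
    operator Ch () const { return c_; }

    bool good() const { return state_ == is_good; }
    bool eof()  const { return state_ == is_eof;  }
    bool fail() const { return state_ == is_fail; }

    // Hypothetical helpers used by the free functions below.
    static basic_character make_eof()  { return basic_character(is_eof);  }
    static basic_character make_fail() { return basic_character(is_fail); }

private:
    enum state { is_good, is_eof, is_fail };

    // Builds a value that carries no character, only a status.
    explicit basic_character(state s) : c_(), state_(s) { }

    Ch     c_;
    state  state_;
};

typedef basic_character<char>     character;
typedef basic_character<wchar_t>  wcharacter;

inline character  eof()   { return character::make_eof();   }  // EOF indication
inline character  fail()  { return character::make_fail();  }  // temporary failure
inline wcharacter weof()  { return wcharacter::make_eof();  }
inline wcharacter wfail() { return wcharacter::make_fail(); }

} // namespace sketch
```

Keeping the status in a separate member, rather than reserving special values of Ch, means every value of the character type remains usable as data; the cost is a slightly larger return value.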
IV. Examples (feel free to skip) ---------------------------
With these changes, the uncommenting_input_filter (http://tinyurl.com/3ue9r)
could be rewritten as follows:
    class uncommenting_input_filter : public input_filter {
    public:
        explicit uncommenting_input_filter(char comment_char = '#')
            : comment_char_(comment_char) { }
        template<typename Source>
        character get(Source& src)
        {
            character c = boost::io::get(src);
            if (c.good() && c == comment_char_)
                while (c.good() && c != '\n')
                    c = boost::io::get(src);
            return c;
        }
    private:
        char comment_char_;
    };
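If the source can report a temporary failure in the middle of a comment, a filter of this kind presumably has to remember that it is still skipping when get() is called again. The sketch below shows one possible variation with a hypothetical skipping_ flag. To stay self-contained it uses a cut-down stand-in for character, calls src.get() directly instead of boost::io::get(src), and reads from a hypothetical scripted_source in which '~' marks a point where no character is available yet; the drain() helper is likewise invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {

// Cut-down stand-in for the proposed basic_character<char>.
class character {
public:
    character(char c) : c_(c), state_(is_good) { }
    operator char () const { return c_; }
    bool good() const { return state_ == is_good; }
    bool eof()  const { return state_ == is_eof;  }
    bool fail() const { return state_ == is_fail; }
    static character make_eof()  { return character(is_eof);  }
    static character make_fail() { return character(is_fail); }
private:
    enum state { is_good, is_eof, is_fail };
    explicit character(state s) : c_(0), state_(s) { }
    char   c_;
    state  state_;
};

// Hypothetical test source: '~' in the script yields one temporary failure.
struct scripted_source {
    std::string  script;
    std::size_t  pos;
    character get()
    {
        if (pos == script.size())
            return character::make_eof();
        char c = script[pos++];
        return c == '~' ? character::make_fail() : character(c);
    }
};

class uncommenting_filter {
public:
    explicit uncommenting_filter(char comment_char = '#')
        : comment_char_(comment_char), skipping_(false) { }

    template<typename Source>
    character get(Source& src)
    {
        character c = src.get();
        if (c.good() && c == comment_char_)
            skipping_ = true;
        // Discard characters until the newline that ends the comment.
        // On EOF or temporary failure, skipping_ stays set for the next call.
        while (skipping_ && c.good()) {
            if (c == '\n')
                skipping_ = false;   // the newline itself is passed through
            else
                c = src.get();
        }
        return c;
    }
private:
    char  comment_char_;
    bool  skipping_;
};

// Hypothetical driver: collects output and counts temporary failures.
template<typename Filter, typename Source>
std::string drain(Filter& f, Source& src, int& stalls)
{
    std::string out;
    for (;;) {
        character c = f.get(src);
        if (c.eof())
            break;
        if (c.fail())
            ++stalls;                // a real caller would wait and retry
        else
            out.push_back(c);
    }
    return out;
}

} // namespace sketch
```

The driver simply retries after a temporary failure; in a real non-blocking setting the caller would return to its event loop and call get() again when the device becomes readable.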
Similarly, usenet_filter::get (http://tinyurl.com/6xqvk) could be rewritten:
    template<typename Source>
    character get(Source& src)
    {
        // Handle unfinished business.
        if (eof_)
            return eof();
        if (off_ < current_word_.size())
            return current_word_[off_++];
        // Compute current word.
        current_word_.clear();
        while (true) {
            character c = boost::io::get(src);
            if (!c.good()) {
                if (c.eof())
                    eof_ = true;
                if (current_word_.empty())
                    return c;
                else
                    break;
            } else if (isalpha((unsigned char) c)) {
                current_word_.push_back(c);
            } else {
                // Look up current word in dictionary.
                map_type::iterator it =
                    dictionary_.find(current_word_);
                if (it != dictionary_.end())
                    current_word_ = it->second;
                current_word_.push_back(c);
                off_ = 0;
                break;
            }
        }
        return this->get(src); // Note: current_word_ is not empty.
    }
V. Problems ----------------------------
1. Harder to learn. Currently the function get and the concept InputFilter are
very easy to explain. I'm afraid having to understand the basic_character
template before learning these functions will discourage people from using the
library.
2. Harder to use. Having to check for eof and fail makes writing simple filters,
like the above, slightly harder. I'm worried that the effect on more complex
filters may be even worse. This applies not just to get, but to the other
functions as well, since their return values will require more careful
examination.
3. Performance. It's possible that the change will have a negative effect on
performance. I was planning to implement it and then perform careful
measurements, but I have run out of time for this. I think the effect will be
slight.
VI. Benefits ------------------------
A positive side-effect of this change would be that I can rename the filter
concepts
InputFilter --> PullFilter
OutputFilter --> PushFilter
and allow both types of filter to be added either to input or to output streams.
Filter writers could then choose the filter concept which best expressed the
filtering algorithm without worrying whether it will be used for input or
output.
VII. Alternatives ---------------------------
1. Adopt the convention that read() always blocks until at least one character
is available, and that get() always blocks. This would give up much of the
advantage of non-blocking and async i/o.
2. Add new non-blocking filter concepts, but hide them in the "advanced" section
of the library. All the library-provided filters would be non-blocking, and
users would be encouraged, but not required, to write non-blocking filters.
If you've made it this far THANK YOU!!!
Please let me know your opinion.
Jonathan