From: Pedro Lamarão (pedro.lamarao_at_[hidden])
Date: 2005-06-15 12:45:43
Scott Woods wrote:
> 1. The difference (in terms of CPU time) in maintaining a counter
> and inspecting a "current byte" and testing it for "end of message"
> seems minimal. This is stated relatively, i.e. it is far more significant
> that the bytes sent across the network are being scanned at the
> receiver more than once. Even maintaining the body counter is
> a (very low cost...) scan.
> 2. An approach using lex+parse techniques accepts raw byte
> blocks as input (convenient) and notifies the user through some
> kind of accept/reduce return code, that the message is complete
> and already "broken apart", i.e. no further scanning required
> by higher layers.
> 3. Lex+parse techniques do not care about block lengths. An
> accept state or parser reduction can occur anywhere. All the
> "unget" contortions recently mentioned are not needed. Partial
> messages are retained in the parser stack and only finally
> assembled on accept/reduce. This property is something
> much easier to live with than any kind of "fixed-size" approach
> that I have dealt with so far.
This is the kind of application of a network library I'm most intrigued by.
I've experimented with an approximation of this approach: I replaced a
sinister buffering scheme in a C# application with apparently inefficient
calls to the equivalents of send and receive, fetching only one byte at a
time, and implemented a simple lexer on top. I expected terrible losses but
saw very few. Later, reapplying a buffering layer at only two particular
points made the difference very difficult to measure.
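A minimal sketch of the block-agnostic lexing discussed above, assuming a made-up protocol in which a newline ends each message. The names line_lexer and lex_in_blocks are hypothetical and do not come from any library in this thread. The lexer keeps partial messages between calls, so block boundaries do not matter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical incremental lexer; assumes '\n' terminates a message.
class line_lexer
{
public:
    // Feed a raw block of any length. Completed messages are appended to
    // out; the return value is how many messages this block completed.
    std::size_t feed (char const* data, std::size_t size,
                      std::vector<std::string>& out)
    {
        std::size_t completed = 0;
        for (std::size_t i = 0; i != size; ++i)
        {
            if (data[i] == '\n')
            {
                out.push_back(partial_);  // "accept": the message is complete
                partial_.clear();
                ++completed;
            }
            else
                partial_ += data[i];      // retain the partial message
        }
        return completed;
    }

private:
    std::string partial_;
};

// Feeds input to a lexer in blocks of block_size bytes, as a receive
// loop would, and returns the completed messages.
inline std::vector<std::string>
lex_in_blocks (std::string const& input, std::size_t block_size)
{
    assert(block_size > 0);
    line_lexer lexer;
    std::vector<std::string> messages;
    for (std::size_t pos = 0; pos < input.size(); pos += block_size)
    {
        std::size_t n = std::min(block_size, input.size() - pos);
        lexer.feed(input.data() + pos, n, messages);
    }
    return messages;
}
```

Calling lex_in_blocks with a block size of 1 or of 5 on the same input yields the same messages; a trailing fragment without a terminator stays inside the lexer and is not reported.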
>>First. We, unfortunately, can't pass std::vector to the operating
>>system, so, at some point, we are allocating fixed sized buffers, and
>>passing it to our IO primitives. There is no escape.
>
>
> Errrrr. Not quite following that. Are you saying that
>
> send( socket_descriptor, &vector_buffer[ 0 ], vector_buffer.size() )
>
> is bad?
No. What I meant was that the operating system won't resize std::vector
for you. It expects a fixed-size block of memory.
Because of this, every "dynamically safe" buffering scheme must be a layer
over a "fixed-size, error-prone" buffer managed somewhere. That is a
constraint of our primitives.
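A sketch of that layering, under stated assumptions: fake_receive is a hypothetical stand-in for a receive primitive (it reads from a string, not a socket), and receive_all is an illustrative name. The primitive only ever sees a fixed-size region; the caller does all the growing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a receive primitive: copies at most size bytes
// of source, starting at offset, into out. Returns the number of bytes
// copied, which is 0 once the source is exhausted.
inline std::size_t fake_receive (std::string const& source, std::size_t& offset,
                                 char* out, std::size_t size)
{
    std::size_t n = std::min(size, source.size() - offset);
    std::memcpy(out, source.data() + offset, n);
    offset += n;
    return n;
}

// The "dynamically safe" layer: grows a vector in fixed-size steps.
inline std::vector<char> receive_all (std::string const& source, std::size_t chunk)
{
    assert(chunk > 0);
    std::vector<char> buffer;
    std::size_t offset = 0;
    for (;;)
    {
        std::size_t old_size = buffer.size();
        buffer.resize(old_size + chunk);   // we grow; the primitive never does
        std::size_t n = fake_receive(source, offset, &buffer[old_size], chunk);
        buffer.resize(old_size + n);       // drop the unused tail
        if (n == 0)
            break;
    }
    return buffer;
}
```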
The intention of a streambuf implementation is precisely to conceal such
fixed-size buffering, offering the most generic interface to what now
becomes a concealed "sequence" (as the documentation I have at hand
would call it).
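To illustrate, here is a minimal sketch of a read-only streambuf that hides a four-byte buffer. It assumes a std::string as the "device" in place of a socket; fixed_buffer_streambuf and read_line_through_streambuf are hypothetical names, not part of the package under discussion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical streambuf; the "device" is a std::string, not a socket.
class fixed_buffer_streambuf : public std::streambuf
{
public:
    explicit fixed_buffer_streambuf (std::string const& source)
        : source_(source), offset_(0)
    {
        setg(buffer_, buffer_, buffer_);  // start with an empty get area
    }

protected:
    // Called by the stream when the get area is exhausted: refill the
    // fixed-size buffer from the device.
    int_type underflow ()
    {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());
        assert(offset_ <= source_.size());
        std::size_t n = std::min(sizeof buffer_, source_.size() - offset_);
        if (n == 0)
            return traits_type::eof();
        std::copy(source_.data() + offset_, source_.data() + offset_ + n,
                  buffer_);
        offset_ += n;
        setg(buffer_, buffer_, buffer_ + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    std::string source_;
    std::size_t offset_;
    char buffer_[4];  // deliberately tiny, to force several refills
};

// Reads one line through an istream layered over the streambuf.
inline std::string read_line_through_streambuf (std::string const& source)
{
    fixed_buffer_streambuf buf(source);
    std::istream in(&buf);
    std::string line;
    std::getline(in, line);
    return line;
}
```

The caller of std::getline never sees the four-byte limit; a line longer than the buffer simply triggers more calls to underflow.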
> Yes you make some very good points. The product I am currently working
> on is a vipers' nest of the protocols you talk about and more. There have
> been some unpleasant suggested uses for protocols such as IMAP4. Trying
> to build a generic network messaging library that facilitates clear concise
> application protocols *and* can cope with the likes of IMAP4 is, I believe,
> unrealistic.
The skeleton of a "protocol message", as I've been writing it, is more or less:
//----------
class protocol_message
{
public:
    void clear () { /* Clear data members. */ }
    template <typename IteratorT>
    boost::spirit::parse_info<IteratorT> parse (IteratorT begin, IteratorT end);
    // Defined later.
};
template <typename CharT, typename TraitsT>
std::basic_ostream<CharT, TraitsT>&
operator<< (std::basic_ostream<CharT, TraitsT>& o,
            protocol_message const& m);
// Writes the message however it is represented on the wire...
template <typename CharT, typename TraitsT>
std::basic_istream<CharT, TraitsT>&
operator>> (std::basic_istream<CharT, TraitsT>& i, protocol_message& m)
{
    using namespace boost::spirit;
    // Here we use the Magic Glue: multi_pass lets the parser backtrack
    // over a single-pass input iterator.
    typedef std::istreambuf_iterator<CharT, TraitsT> input_t;
    typedef multi_pass<input_t> iterator_t;
    iterator_t begin = make_multi_pass(input_t(i));
    iterator_t end = make_multi_pass(input_t());
    parse_info<iterator_t> info = m.parse(begin, end);
    if (!info.hit)
        i.setstate(std::ios_base::failbit);
    return i;
}
namespace detail
{
    class grammar : public boost::spirit::grammar<grammar>
    {
    public:
        grammar (protocol_message& m) : _M_m(m) {}
        template <typename ScannerT>
        class definition;
        // We'll write through _M_m even though the definition's constructor
        // takes the grammar by const reference; a reference member allows
        // that without "mutable".
    private:
        protocol_message& _M_m;
    };
}
template <typename IteratorT>
boost::spirit::parse_info<IteratorT>
protocol_message::parse (IteratorT begin, IteratorT end)
{
    using namespace boost::spirit;
    this->clear();
    detail::grammar g(*this);
    return boost::spirit::parse(begin, end, g);
}
//---------
Note how operator>> sets failbit in case of an unsuccessful parse: it
allows us to write:
iostream stream;
protocol_message message;
while (stream >> message)
{
    // Work.
}
// Parsing failed or other error; try to recover?
No exception is thrown by default. An exception could be thrown, though:
calling exceptions(ios_base::failbit) on the stream makes it throw an
ios_base::failure whenever failbit is set.
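Both behaviours can be tried without Spirit or a network. The sketch below assumes a hypothetical toy_message of the form "COMMAND argument" on a single line, parsed by hand; only the failbit and exceptions() handling mirrors the skeleton above.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical message: one line of the form "COMMAND argument".
struct toy_message
{
    std::string command;
    std::string argument;
};

inline std::istream& operator>> (std::istream& in, toy_message& m)
{
    std::string line;
    if (!std::getline(in, line))
        return in;                            // end of input or earlier error
    std::string::size_type space = line.find(' ');
    if (space == std::string::npos || space == 0)
    {
        in.setstate(std::ios_base::failbit);  // unsuccessful parse
        return in;
    }
    m.command = line.substr(0, space);
    m.argument = line.substr(space + 1);
    return in;
}

// Counts the messages read before the first failure; nothing is thrown.
inline std::size_t count_messages (std::string const& text)
{
    std::istringstream in(text);
    toy_message m;
    std::size_t count = 0;
    while (in >> m)
        ++count;
    return count;
}

// Same loop with exceptions enabled. Returns true if the loop was left
// through an ios_base::failure.
inline bool stops_by_exception (std::string const& text)
{
    std::istringstream in(text);
    in.exceptions(std::ios_base::failbit);
    toy_message m;
    try
    {
        while (in >> m) {}
    }
    catch (std::ios_base::failure const&)
    {
        return true;
    }
    return false;
}
```

One caveat with this particular sketch: std::getline also sets failbit when it reaches the end of input, so with exceptions enabled even a well-formed stream leaves the loop through an exception.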
The current implementation of the irc_client example distributed in the
package I uploaded to the Sandbox is at this URI:
https://mndfck.org/svn/socketstream/branches/boost/libs/network/example/irc_client/message.hpp
This version has a Spirit grammar for a (modified) version of the IRC
grammar as defined in RFC 2812. It's still rough around the edges, but
much better than it used to be.
IRC is a very uninteresting application, but it's an interesting
protocol to experiment with, as there is no guarantee about when a message
will arrive or where it will come from. "Synchronized" protocols like SMTP
are much easier: client sends, server responds, and that's pretty much it.
I'm very interested in these kinds of applications of a "netbuf" and the
implementation of reusable "protocol message" classes for common
protocols; I'm probably going after HTTP next, to try to write a
simplified wget.
There was also a concern earlier in this thread about excessive
buffering in streambufs with "fixed-size message" protocols, which I'd
like to address with an example.
--
Pedro Lamarão
Desenvolvimento
Intersix Technologies S.A.
SP: (55 11 3803-9300)
RJ: (55 21 3852-3240)
www.intersix.com.br
Your Security is our Business