From: Scott Woods (scottw_at_[hidden])
Date: 2005-06-15 17:43:19


Hi Pedro,

Still trying to get my Outlook to indent (>) and failing. I have all
the proper options set but they are ignored. Go figure. Re-install time.

I've marked my comments with a leading *

----- Original Message -----
From: "Pedro Lamarão" <pedro.lamarao_at_[hidden]>

> 3. Lex+parse techniques do not care about block lengths. An
> accept state or parser reduction can occur anywhere. All the
> "unget" contortions recently mentioned are not needed. Partial
> messages are retained in the parser stack and only finally
> assembled on accept/reduce. This property is something
> much easier to live with than any kind of "fixed-size" approach
> that I have dealt with so far.

This is the kind of application of a network library I'm most intrigued by.

I've experimented with an approximation of this approach by modifying a
sinister buffering scheme in a C# application to make apparently inefficient
calls to the equivalents of send and receive, fetching only one byte at a
time, and implementing a simple lexer on top. I expected terrible losses but
saw very few. Later, reapplying a buffering layer at only two particular
points made the difference very difficult to measure.

* Ah yes. I don't know about the sinister buffering or C#, but I think
* I follow enough from context. And your observations are consistent
* with what I have seen.
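
The point about partial messages can be sketched as follows. This is a
minimal illustration, not code from either poster: line_lexer and
lex_in_blocks are hypothetical names, and CRLF-terminated lines are assumed
as the message format. The unfinished message is carried across calls, so
the result does not depend on where the block boundaries fall.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical incremental lexer: accepts input in blocks of any size and
// emits a message each time the accept state (CRLF seen) is reached.
class line_lexer
{
public:
    // Feed one network block; block boundaries may fall anywhere.
    void feed(const char *data, std::size_t size)
    {
        for (std::size_t i = 0; i < size; ++i)
        {
            const char c = data[i];
            if (c == '\n' && !partial_.empty() && partial_.back() == '\r')
            {
                partial_.pop_back();                      // drop the '\r'
                complete_.push_back(std::move(partial_)); // accept
                partial_.clear();                         // reset moved-from string
            }
            else
            {
                partial_.push_back(c);
            }
        }
    }

    // Hand over every message completed so far.
    std::vector<std::string> take()
    {
        std::vector<std::string> out;
        out.swap(complete_);
        return out;
    }

private:
    std::string partial_;               // unfinished message kept across blocks
    std::vector<std::string> complete_; // accepted but not yet collected
};

// Feed `input` to a fresh lexer in blocks of `block_size` bytes.
std::vector<std::string> lex_in_blocks(const std::string &input,
                                       std::size_t block_size)
{
    if (block_size == 0)
        block_size = 1; // avoid a loop that never advances
    line_lexer lexer;
    for (std::size_t pos = 0; pos < input.size(); pos += block_size)
    {
        const std::size_t n = std::min(block_size, input.size() - pos);
        lexer.feed(input.data() + pos, n);
    }
    return lexer.take();
}
```

Calling lex_in_blocks with a block size of 1 corresponds to the
one-byte-at-a-time experiment; a larger block size yields the same messages.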

[snip code]

iostream stream;
protocol_message message;

while (stream >> message)
{
  // Work.
}

* Very nice.

No exception is thrown by default. But an exception could be thrown; an
iostream can be configured, through its exceptions() mask, to throw an
ios_base::failure.
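
A minimal sketch of that configuration, using only the standard library. The
protocol_message here is a hypothetical one-line stand-in, not the class from
the socketstream package, and count_messages is an illustrative name.

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for protocol_message: one line of text.
struct protocol_message
{
    std::string line;
};

std::istream &operator>>(std::istream &in, protocol_message &m)
{
    return std::getline(in, m.line);
}

// Counts the messages in a stream that has been told to throw on failure.
std::size_t count_messages(std::istream &stream)
{
    std::size_t count = 0;
    try
    {
        // exceptions() can itself throw if the stream has already failed.
        stream.exceptions(std::ios_base::failbit | std::ios_base::badbit);
        protocol_message message;
        while (stream >> message)
            ++count; // Work.
    }
    catch (const std::ios_base::failure &)
    {
        // Reached when an extraction fails, which includes running out of
        // input: the loop now ends by exception, not by its condition.
    }
    return count;
}
```

With failbit in the mask, even an ordinary end of input surfaces as an
exception, which is one reason to leave the default in place for a simple
read loop.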

The current implementation of the irc_client example distributed in the
package I uploaded to the Sandbox is in this URI:

https://mndfck.org/svn/socketstream/branches/boost/libs/network/example/irc_client/message.hpp

* I did try to decompress your package with Windows utilities. These failed
* with messages about "not bzip2"; can you indicate a specific utility?

This version has a Spirit grammar for a (modified) version of the IRC
grammar as defined in RFC 2812. It's still rough around the edges, but much
better than it used to be.

IRC is a very uninteresting application, but it's an interesting
protocol to experiment with, as there is no guarantee of when a message
will arrive or from where. "Synchronized" protocols like SMTP are much
easier; client sends, server responds, and that's pretty much it.

I'm very interested in these kinds of applications of a "netbuf" and the
implementation of reusable "protocol message" classes for common
protocols; I'm probably going after HTTP next, and will try to write a
simplified wget.

There was also a concern earlier in this thread about excessive
buffering in streambufs with "fixed-sized message" protocols, which I'd
like to address with an example.

**************************************
Nice use of boost. Did you mention this in the "who's using boost" thread?

From your code snippets I can see the layering of activity and how ultimately
it is flexible enough to cope with the likes of IRC and (possibly :-) IMAP4.
My concern about multi-pass is probably superseded by that exact ability
to cope with ugly protocols (in some cases the ugliness is more correctly
described as part of the encoding).

In previous threads addressing similar issues the suggestion was to use an
"envelope" approach; that delivered the same benefits as your low-level
header+body.

It is a little bit tragic for me to concede this point, as I have invested
quite heavily in a technology that parses straight from the network block to
a variant. The variant is capable of holding a vector of variants as a
"value" (yes, a recursive definition). Operator>> is overloaded in such a way
that you can code in this manner:

struct routable_message
{
    unsigned long to_address;
    unsigned long from_address;
    net_variant payload;
};

routable_message &
operator>>( net_variant &v, routable_message &m )
{
    vector<net_variant> &a = net_array<3>( v ); // Access the expected tuple
    a[ 0 ] >> m.to_address;
    a[ 1 ] >> m.from_address;
    a[ 2 ] >> m.payload;
    return m;
}

At the point where a variant is completed (e.g. part way through a network
block), it is presented to a receiver, e.g.

void
message_router::operator()( net_variant &v )
{
    routable_message m; // named object: a temporary cannot bind to routable_message &
    operator()( v >> m );
}

void
message_router::operator()( routable_message &m )
{
    iterator f = find( m.to_address );
    if(f == end()) return;
    (* f->second )( m.payload, m.from_address );
}

Hopefully this is enough to show how elegant the code becomes even when
dealing with multiple layers of software, i.e. the message_router has no
idea what type conversions are performed by the receiver of the payload.

All operator>> implementations are required to use "move" semantics, so any
data "new'd" by the variant parser is exactly the data that is finally moved
into the application type.
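
A self-contained sketch of the pattern, for readers who want to compile it.
The real net_variant is not shown in this post, so the definition below is a
hypothetical stand-in (a number, a string, or a vector of further variants),
and net_array<N> is assumed to check the tuple size. The "move" requirement
is expressed here with C++11 std::move, which is an assumption about how one
would write it today rather than how the original library does it.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for net_variant; `items` is the recursive case.
struct net_variant
{
    unsigned long number = 0;
    std::string text;
    std::vector<net_variant> items;
};

// Assumed behaviour: view the variant as an N-element tuple or fail.
template <std::size_t N>
std::vector<net_variant> &net_array(net_variant &v)
{
    if (v.items.size() != N)
        throw std::runtime_error("unexpected tuple size");
    return v.items;
}

void operator>>(net_variant &v, unsigned long &n) { n = v.number; }

// Move, do not copy: storage built by the parser becomes the
// application's storage.
void operator>>(net_variant &v, std::string &s) { s = std::move(v.text); }
void operator>>(net_variant &v, net_variant &out) { out = std::move(v); }

struct routable_message
{
    unsigned long to_address = 0;
    unsigned long from_address = 0;
    net_variant payload;
};

routable_message &operator>>(net_variant &v, routable_message &m)
{
    std::vector<net_variant> &a = net_array<3>(v); // Access the expected tuple
    a[0] >> m.to_address;
    a[1] >> m.from_address;
    a[2] >> m.payload; // payload stays opaque at this layer
    return m;
}
```

The payload is moved through untouched, so a router built on this needs no
knowledge of the conversions its receivers apply later.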

To summarize: I have been resisting the header+body (or "envelope")
technique, but it would appear to be more extensible. The separation of
"message completion" and "content parsing" allows for more protocol-specific
handling that I cannot do, as my "parser" runs over the entire message.
Again, the protocol specifics that I allude to are often better described as
encoding-specific, as most of the TCP application suite binds an encoding
inextricably to each protocol. Dealing with continuations and embedded
objects (different encoder states) may still exhaust the extensibility of
the envelope approach.
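
The separation of "message completion" from "content parsing" can be
sketched like this. The envelope layout (a 4-byte big-endian body length
followed by the body) and the name envelope_reader are assumptions made for
illustration; nothing in this thread fixes a header format.

```cpp
#include <cstddef>
#include <string>

// Hypothetical envelope reader: decides only when a message is complete.
// What the body means is left to whichever parser receives it.
class envelope_reader
{
public:
    // Append one network block; boundaries may fall anywhere.
    void feed(const char *data, std::size_t size)
    {
        buffer_.append(data, size);
    }

    // Returns true and fills `body` once a whole message has arrived.
    bool next(std::string &body)
    {
        const std::size_t header = 4;
        if (buffer_.size() < header)
            return false; // header incomplete

        std::size_t length = 0;
        for (std::size_t i = 0; i < header; ++i)
            length = (length << 8) | static_cast<unsigned char>(buffer_[i]);

        if (buffer_.size() - header < length)
            return false; // body incomplete

        body.assign(buffer_, header, length);
        buffer_.erase(0, header + length);
        return true;
    }

private:
    std::string buffer_; // bytes received but not yet delivered
};
```

Each completed body can then be handed to a protocol-specific parser, which
is the extensibility the envelope buys.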

There is nothing in the IMAP4 protocol that cannot be represented within
something such as my net_variant, i.e. it does not need a protocol-specific
encoding. The same goes for SMTP, HTTP, and so on. How much simpler it could
have been!

gracias,
Scott

