
From: Scott Woods (scottw_at_[hidden])
Date: 2005-06-14 20:58:45


Hi Pedro,

Apologies for any sloppy formatting; mail client woes.

----- Original Message -----
From: <pedro.lamarao_at_[hidden]>
To: <boost_at_[hidden]>
Sent: Wednesday, June 15, 2005 12:08 AM
Subject: Re: [boost] [Ann] socketstream library 0.7

> This "buffering problem" is the problem that leads people to design
> protocols with fixed sizes everywhere.

Yes - exactly. A header (including a length field) followed by a body is
a perfectly functional response to a real need. But is it the best we can do?
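
For concreteness, here is a minimal sketch of that header+body style
over a POSIX socket; the 4-byte big-endian length prefix and the helper
names are my assumptions, not from any particular protocol:

    // Sketch of header(length)+body framing. Read a fixed-size header,
    // then exactly that many body bytes; two reads, no rescanning.
    #include <vector>
    #include <stdexcept>
    #include <cstring>            // std::memcpy
    #include <boost/cstdint.hpp>  // boost::uint32_t
    #include <sys/socket.h>       // recv
    #include <arpa/inet.h>        // ntohl

    static void read_exact(int fd, char* buf, std::size_t n)
    {
        while (n > 0) {
            ssize_t got = recv(fd, buf, n, 0);
            if (got <= 0)
                throw std::runtime_error("connection closed or recv error");
            buf += got;
            n -= static_cast<std::size_t>(got);
        }
    }

    std::vector<char> read_message(int fd)
    {
        char header[4];
        read_exact(fd, header, sizeof header);   // fixed-size header first
        boost::uint32_t length;
        std::memcpy(&length, header, sizeof length);
        length = ntohl(length);                  // big-endian on the wire
        std::vector<char> body(length);
        if (length > 0)
            read_exact(fd, &body[0], length);    // then exactly 'length' bytes
        return body;
    }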

>> To get to the point; I am currently reading blocks off network
>> connections and presenting them to byte-by-byte lexer/parser routines.
>> These form the structured network messages directly, i.e. fields are
>> already plucked out.
>>
>> So which is better? Direct byte-by-byte conversion to structured network
>> message or multi-pass?
>
> If I understood you correctly, I might rephrase that to myself like: Do we
> read the whole message before parsing, or are we parsing directly from
> the data source?

Yes. That's a reasonable paraphrasing.

> If we parse directly from the data source, we must analyze byte by byte,
> and so obtain byte by byte. If we want this, we will want a buffering
> layer to keep the amount of system calls to a reasonable level.
>
> streambufs provide such a buffering level, with IO operations proper for
> lexical analysis at such a level: sgetc, snextc, sbumpc.

Yes, that's true, as are many of the points you make about streambufs
(I didn't realize they were quite that flexible).
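
As a small illustration of what that byte-level interface looks like in
practice (names are mine; assume sb wraps a connected socket, e.g. a
socketstream's rdbuf()):

    // Scan a CRLF-terminated line byte-by-byte through a streambuf,
    // using the sgetc/sbumpc primitives mentioned above. Sketch only.
    #include <streambuf>
    #include <string>

    bool read_line(std::streambuf& sb, std::string& line)
    {
        line.clear();
        for (int c = sb.sbumpc();
             c != std::char_traits<char>::eof();
             c = sb.sbumpc()) {
            if (c == '\r') {
                if (sb.sgetc() == '\n')   // peek without consuming...
                    sb.sbumpc();          // ...then consume the LF too
                return true;              // one complete line recognized
            }
            line.push_back(static_cast<char>(c));
        }
        return false;                     // EOF before end of line
    }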

>
> If you remember streambuf_iterators exist, and imagine a multi_pass
> iterator (hint, hint), many other interesting things come to mind.
>
> If we read the message completely beforehand, we must know how much we
> have to read, or we must inspect the data source in some way to watch
> for "end of message".

[snip]
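
(An aside on that hint: std::istreambuf_iterator already presents a
streambuf as a single-pass character sequence, and a multi_pass adaptor
such as the one in Boost.Spirit layers re-readability on top. A minimal
illustration, names mine:)

    // Collect bytes up to (not including) a delimiter, iterating the
    // streambuf one byte at a time. Single pass; the delimiter is left
    // unconsumed in the buffer.
    #include <streambuf>
    #include <iterator>
    #include <string>

    std::string take_until(std::streambuf& sb, char delim)
    {
        std::istreambuf_iterator<char> it(&sb), end;
        std::string token;
        for (; it != end && *it != delim; ++it)
            token += *it;
        return token;
    }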

> At this point, we have read the same amount of bytes from the data
> source, in whatever order. But the amount of calls made to the IO system
> service is not the same, and the fixed size approach is more efficient
> in this regard.
>
> Also, the fixed size approach solves the "buffering problem" since we
> make no resizings along the way. C++ people, blessed with std::vector,
> already have a mechanismo to do away with such weirdness; think about
> how you do it in C.

Sorry, but there is such a gulf between our approaches that I'm not sure
I can say anything to clarify it. As a last response, the best I can do
is say this:

1. The difference (in terms of CPU time) between maintaining a counter
and inspecting a "current byte", testing it for "end of message",
seems minimal. Relatively speaking, it is far more significant that
the bytes sent across the network are being scanned at the receiver
more than once. Even maintaining the body counter is a (very low
cost...) scan.
2. An approach using lex+parse techniques accepts raw byte blocks as
input (convenient) and notifies the user, through some kind of
accept/reduce return code, that the message is complete and already
"broken apart", i.e. no further scanning is required by higher
layers (see the sketch after this list).
3. Lex+parse techniques do not care about block lengths. An accept
state or parser reduction can occur anywhere, so none of the "unget"
contortions recently mentioned are needed. Partial messages are
retained on the parser stack and only assembled on accept/reduce.
This property is much easier to live with than any kind of
"fixed-size" approach I have dealt with so far.
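
To make points 2 and 3 concrete, here is a rough sketch of that
push-style interface; the toy grammar (tab-separated fields, newline
ends a message) and all names are illustrative only:

    // The caller feeds raw blocks as they arrive and is told via a
    // return code when a complete message has been recognized and
    // already broken apart into fields.
    #include <vector>
    #include <string>
    #include <cstddef>

    enum parse_status { incomplete, accepted };

    class message_parser
    {
    public:
        // Feed one raw block; internal state carries partial messages
        // across calls, so block boundaries can fall anywhere.
        parse_status consume(const char* block, std::size_t n)
        {
            for (std::size_t i = 0; i < n; ++i) {
                const char c = block[i];
                if (c == '\n')
                    return accepted;   // fields_ now holds the message
                if (c == '\t' || fields_.empty())
                    fields_.push_back(std::string());
                if (c != '\t')
                    fields_.back() += c;
            }
            return incomplete;         // keep state; wait for next block
        }

        const std::vector<std::string>& fields() const { return fields_; }

    private:
        std::vector<std::string> fields_;   // fields already "broken apart"
    };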

>
> First. We, unfortunately, can't pass std::vector to the operating
>system, so, at some point, we are allocating fixed sized buffers, and
>passing it to our IO primitives. There is no escape.

Errrrr. Not quite following that. Are you saying that

send( socket_descriptor, &vector_buffer[ 0 ], vector_buffer.size(), 0 )

is bad?
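
(It does need the usual partial-write loop around it, of course, since
send() may transmit fewer bytes than asked. A minimal sketch, names
matching the snippet above:)

    // Loop until the whole vector has gone out; error handling
    // reduced to a bool for brevity.
    #include <vector>
    #include <cstddef>
    #include <sys/socket.h>

    bool send_all(int socket_descriptor, const std::vector<char>& vector_buffer)
    {
        std::size_t sent = 0;
        while (sent < vector_buffer.size()) {
            ssize_t n = send(socket_descriptor,
                             &vector_buffer[sent],
                             vector_buffer.size() - sent, 0);
            if (n <= 0)
                return false;          // error or connection closed
            sent += static_cast<std::size_t>(n);
        }
        return true;
    }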

>
>If you are initializing std::vector with the correct size and giving
>&*begin() to these primitives, well... Why not allocate with new? If you
>are allocating it with whatever default size and resizing it later, you
>are losing part of the proposed benefit.

Hmmm. If you are saying this to strengthen your case for streambufs,
then I understand.

>
>When we're about to throw a message to the network, how do we know what
>size it is? If our message is composed of, say, a string, another string
>and an int, are we going to call string::size() twice for every message
>dumped? Is the int representation fixed in size? Is this size enough for
>MAX_INT?

[snip large section]

> If you are on the Internet, you have very little guarantees. It's hell
> out here, sir.

Yes, you make some very good points. The product I am currently working
on is a vipers' nest of the protocols you mention and more. There have
been some unpleasant suggested uses for protocols such as IMAP4. Trying
to build a generic network messaging library that facilitates clear,
concise application protocols *and* can cope with the likes of IMAP4 is,
I believe, unrealistic.

I didn't know I had a mechanismo until today. Feels great! :-)

Cheers,
Scott

