Boost :
From: Carlo Wood (carlo_at_[hidden])
Date: 2004-08-31 16:55:46
On Tue, Aug 31, 2004 at 12:45:30PM -0600, Jonathan Turkanis wrote:
> pubsync is called, the contents of the buffer will be sent to the first filter
> in the chain, like so (assuming it's a 'buffered filter'):
>
> filter_.write(buf, buf + n);
[...]
> Yes, for the time being. If your ideas can eliminate copying further, I'd be
> glad to try to incorporate them. (But I haven't looked at your library yet.)
My idea, then, would involve the introduction of a 'message' object:
something that abstracts a contiguous piece of data of finite
size that can be processed as a unit. For example, one line of
text in the case of text filters, one packet of data when
processing a UDP stream, or one binary packet that starts with
an envelope/header followed by a payload.
Then, instead of passing (buf, buf + n), this more abstract 'message'
object would be passed. The message object would contain the
'buf' pointer and the size 'n', not the complete data, of course.
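Purely for exposition:
#include <cstddef>  // std::size_t

struct Message {
  char* buf;       // start of the data, which stays where it is
  std::size_t n;   // number of bytes in the message
};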
A filter should then be allowed to do the following things with this
object:
1) Tell it that the data can be freed.
   If the data is still in the original streambuf, then the
   message object would take care of telling the streambuf
   that the part it was holding is free again.
2) Process it in place: the filter would not write outside the
   buffer, but only examine it and perhaps change it such that
   the result still fits in the same buffer.
3) Copy the data to a newly allocated memory block (which can now
   be larger than the original), filtering it while copying
   if needed. In this case the message object tells the
   streambuf that the data is now freed. Subsequently
   freeing the message would then delete the allocated
   memory block and not that of the streambuf.
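The three operations can be sketched in C++ to show who frees what. This is a minimal sketch under stated assumptions, not the dbstreambuf code: `BufferOwner` stands in for the streambuf, and `release()`, `free()`, `copy_out()` and `borrowed()` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical stand-in for the streambuf: all it needs here is to be told
// when a region it handed out is no longer referenced by a message.
struct BufferOwner {
  virtual void release(const char* start, std::size_t n) = 0;
  virtual ~BufferOwner() = default;
};

// A message that either borrows its bytes from a BufferOwner or, after
// copy_out(), owns a heap block of its own.
class Message {
 public:
  Message(BufferOwner* owner, char* buf, std::size_t n)
      : owner_(owner), buf_(buf), n_(n), capacity_(n) {}
  Message(const Message&) = delete;
  Message& operator=(const Message&) = delete;
  ~Message() { free(); }

  char* start() const { return buf_; }
  std::size_t size() const { return n_; }
  std::size_t capacity() const { return capacity_; }
  bool borrowed() const { return owner_ != nullptr; }

  // 1) The data can be freed: hand the region back to the streambuf if it
  //    is still borrowed, otherwise delete the private heap block.
  void free() {
    if (owner_ != nullptr) {
      owner_->release(buf_, n_);
      owner_ = nullptr;
    }
    heap_.reset();
    buf_ = nullptr;
    n_ = capacity_ = 0;
  }

  // 2) In-place processing needs no extra support: a filter edits the
  //    bytes through start() and never writes past capacity().

  // 3) Copy into a new (possibly larger) block, then tell the streambuf
  //    that its region is free. A later free() deletes the heap block.
  void copy_out(std::size_t new_capacity) {
    assert(new_capacity >= n_);
    std::unique_ptr<char[]> block(new char[new_capacity]);
    if (n_ != 0) std::memcpy(block.get(), buf_, n_);
    if (owner_ != nullptr) {
      owner_->release(buf_, n_);
      owner_ = nullptr;
    }
    heap_ = std::move(block);  // frees any earlier private block
    buf_ = heap_.get();
    capacity_ = new_capacity;
  }

 private:
  BufferOwner* owner_;            // non-null while the bytes are borrowed
  std::unique_ptr<char[]> heap_;  // set once the message owns its bytes
  char* buf_;
  std::size_t n_;
  std::size_t capacity_;
};
```

Making `free()` safe to call twice lets the destructor call it unconditionally, so a message that is simply dropped still returns its borrowed region to the streambuf exactly once.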
To the user of the 'message', only this interface would be
visible (for example):
Message::start() const    : Get the start of the message.
Message::size() const     : Get the size of the message.
Message::reserved() const : Size of the allocated buffer.
Message::reserve(size)    : Increase the buffer size (possibly causing a copy).
Message::set_size(size)   : Set a new message size.
Message::~Message()       : Free the underlying data and destroy the message object.
Message::Message(size)    : Create a new Message object with an uninitialized
                            buffer of size 'size'.
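A minimal C++ sketch of that interface for a message that owns its buffer. Borrowing from a streambuf is left out, and the member bodies are assumptions rather than existing code; the point is that `reserve()` copies only when the requested size exceeds `reserved()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

class Message {
 public:
  // Message(size): a new message with an uninitialized buffer of n bytes.
  explicit Message(std::size_t n)
      : buf_(new char[n]), size_(n), reserved_(n) {}
  Message(const Message&) = delete;
  Message& operator=(const Message&) = delete;
  // ~Message(): the unique_ptr frees the underlying buffer.

  char* start() const { return buf_.get(); }
  std::size_t size() const { return size_; }
  std::size_t reserved() const { return reserved_; }

  // Grow the buffer to at least n bytes, copying the message only when
  // the current buffer is too small.
  void reserve(std::size_t n) {
    if (n <= reserved_) return;
    std::unique_ptr<char[]> bigger(new char[n]);
    std::memcpy(bigger.get(), buf_.get(), size_);
    buf_ = std::move(bigger);
    reserved_ = n;
  }

  // Change the message length; it must stay within the reserved buffer.
  void set_size(std::size_t n) {
    assert(n <= reserved_);
    size_ = n;
  }

 private:
  std::unique_ptr<char[]> buf_;
  std::size_t size_;
  std::size_t reserved_;
};
```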
The call to a filter would then become:
filter_.write(message); // Passing a Message
The reason that this is not a trivial change is mostly that
the streambuf must be aware of the existence of these Message objects.
If you would seriously consider going for this approach, then I am
willing to donate my dbstreambuf code.
Filters that can be implemented without increasing
the message size can then always work 'in place', without
any copying.
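For illustration, a hypothetical filter that removes carriage returns, written against the exposition-only `Message` struct above. Because its result is never longer than its input, it can work entirely inside the original buffer. The `write(Message&)` signature is an assumption modeled on `filter_.write(message)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// The exposition-only Message: a pointer/size view of bytes that stay
// where they are.
struct Message {
  char* buf;
  std::size_t n;
};

// Hypothetical filter that deletes '\r' characters. It rewrites the bytes
// in place and only shrinks n: no allocation, no second buffer.
struct strip_carriage_return {
  void write(Message& msg) const {
    assert(msg.buf != nullptr || msg.n == 0);
    char* end = std::remove(msg.buf, msg.buf + msg.n, '\r');
    msg.n = static_cast<std::size_t>(end - msg.buf);
  }
};
```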
Filters that need to enlarge a buffer also do not always have
to copy the data; when the message buffer is already large enough,
no copying is needed. For example, to transform a compressed
UNIX text file into a compressed Windows text file:
file >> expand_msg(2000) >> decompress >> add_cariage_return >> compress >> file
Only the first filter would copy the data: it would call new char[2000]
and copy only the real message, which can be much smaller, leaving
the rest of the buffer uninitialized. decompress then would not have to
allocate new space, and neither would 'add_cariage_return', etc.
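A sketch of how such a pipeline avoids repeated allocation, assuming a compact owning `Message` with `reserve()`. The hypothetical `expand_msg()` makes the one copy into a block with headroom, and `add_cariage_return()` turns each `\n` into `\r\n`, working back to front inside that same block. The `allocations` counter exists only to make the behaviour observable; decompress and compress are left out.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Compact owning Message (hypothetical): 'reserved' bytes are allocated,
// the first 'size' of them hold the message.
struct Message {
  std::unique_ptr<char[]> buf;
  std::size_t size = 0;
  std::size_t reserved = 0;
  int allocations = 0;  // heap blocks allocated so far

  void reserve(std::size_t n) {
    if (n <= reserved) return;  // already large enough: no copy
    std::unique_ptr<char[]> bigger(new char[n]);
    if (size != 0) std::memcpy(bigger.get(), buf.get(), size);
    buf = std::move(bigger);
    reserved = n;
    ++allocations;
  }
};

// Hypothetical expand_msg(capacity): the one filter that copies, into a
// block with room for the filters that follow.
Message expand_msg(const char* data, std::size_t n, std::size_t capacity) {
  Message msg;
  msg.reserve(capacity < n ? n : capacity);
  if (n != 0) std::memcpy(msg.buf.get(), data, n);
  msg.size = n;
  return msg;
}

// "\n" -> "\r\n" inside the same buffer; reserve() allocates only if the
// headroom is too small.
void add_cariage_return(Message& msg) {
  std::size_t newlines = 0;
  for (std::size_t i = 0; i < msg.size; ++i)
    if (msg.buf[i] == '\n') ++newlines;
  const std::size_t new_size = msg.size + newlines;
  msg.reserve(new_size);
  // Walk backwards so each byte is moved before its slot is overwritten.
  std::size_t dst = new_size;
  for (std::size_t src = msg.size; src > 0; --src) {
    const char c = msg.buf[src - 1];
    msg.buf[--dst] = c;
    if (c == '\n') msg.buf[--dst] = '\r';
  }
  msg.size = new_size;
}
```

With 2000 bytes reserved, an 8-byte message containing two newlines grows to 10 bytes in the same block; with no headroom, `add_cariage_return()` has to allocate a second time.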
[However, this still isn't satisfactory, because a decompress filter will
ALWAYS have to copy the data. It would be better to be able to pass a
size to the decompress filter:
file >> decompress(2000) >> add_cariage_return >> compress >> file
or just tell the decompress filter that it should try to give
the resulting message a buffer that is at least 1 character
larger than the size of the resulting message:
file >> decompress(1) >> add_cariage_return >> compress >> file
Then really only a single copy is needed. On the other hand, the
first is also already advantageous in that only a single allocation
is needed: malloc is slow too *).]
--
Carlo Wood <carlo_at_[hidden]>

*) Which seems to indicate that the Message object should have an
Allocator template parameter (i.e., to implement memory pools).
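Following the footnote, a sketch of the Message with an allocator template parameter. It goes through `std::allocator_traits`, so any standard-conforming allocator (for example a pool allocator) could be supplied. `BasicMessage` is a hypothetical name, and only `char` allocators are accepted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>

template <class Alloc = std::allocator<char>>
class BasicMessage {
  using traits = std::allocator_traits<Alloc>;
  static_assert(std::is_same<typename traits::value_type, char>::value,
                "BasicMessage needs an allocator of char");

 public:
  explicit BasicMessage(std::size_t n, const Alloc& alloc = Alloc())
      : alloc_(alloc), buf_(traits::allocate(alloc_, n)),
        size_(n), reserved_(n) {}
  BasicMessage(const BasicMessage&) = delete;
  BasicMessage& operator=(const BasicMessage&) = delete;
  // Memory goes back to the same allocator (e.g. a pool) it came from.
  ~BasicMessage() { traits::deallocate(alloc_, buf_, reserved_); }

  char* start() const { return buf_; }
  std::size_t size() const { return size_; }
  std::size_t reserved() const { return reserved_; }

  // Allocates from the pool only when the current block is too small.
  void reserve(std::size_t n) {
    if (n <= reserved_) return;
    char* bigger = traits::allocate(alloc_, n);
    std::memcpy(bigger, buf_, size_);
    traits::deallocate(alloc_, buf_, reserved_);
    buf_ = bigger;
    reserved_ = n;
  }

  void set_size(std::size_t n) {
    assert(n <= reserved_);
    size_ = n;
  }

 private:
  Alloc alloc_;
  char* buf_;
  std::size_t size_;
  std::size_t reserved_;
};
```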
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk