Boost logo

Boost :

From: Jonathan Turkanis (technews_at_[hidden])
Date: 2004-08-31 13:45:30


"Carlo Wood" <carlo_at_[hidden]> wrote in message
news:20040831181013.GA16216_at_alinoe.com...
> On Tue, Aug 31, 2004 at 09:55:19AM -0600, Jonathan Turkanis wrote:
> > > file > bzip2 (binary) > text > (process) > bzip2 (binary) > file
> >
> > Line-ending conversions can be done by sticking a newline filter in between
the
> > binary and text filters. When converting_stream is up and running, code
> > conversion will be inserted at the appropriate place in a filter chain
> > consisting of mixed narrow- and wide-character components.
>
> Hiya, me again :)

Howdy!

> I suppose I should really just download the code and start playing
> with it in order to be able to review it - but I have no time for
> that. So, don't take my questions as formal critics please; I am
> just being interested as a possible (future) user (and possibly
> even contributor).
>
> My concern, when I see those filter chains, remains performance
> as a result of unnecessary copying of data.

Yes, that's a major concern.

> Can you tell me; would the average filter copy the data
> (from one place in memory to another)? I'd suppose so, because
> it is unlikely that an arbitrary filter can be trusted to use
> the streambuf as a 'work space'.

I was planning to write a section on efficiency in the documentation, but I ran
out of time. Here's a typical scenario, for output with a chain of several
filters.

The user (possibly an ostream) of the initial streambuf in the chain writes
characters using sputc and sputn. These function simply copy characters to the
output buffer of the initial streambuf. When the buffer becomes full, or when
pubsync is called, the contents of the buffer will be sent to the first filter
in the chain, like so (assuming it's a 'buffered filter'):

          filter_.write(buf, buf + n);

The filter examines the sequence of charaters it has been provided with, and
writes a possibly modified sequence to the second streambuf in the chain, using
(indirectly) sputc and sputn. This process continues until a sink is reached.

So, yes, characters are copied at each juncture. In general, it is not possible
to eliminate this copying, as far as I know. In special cases, it can be
eliminated, as I mentioned in a previous message, which I repeat here for
convenience:

> There are several optimizations which I have held in reserve which would also
> minimize copying:
> a) Allowing resources to advertise that they are streambuf-based, so that
> i/o is performed directly to the underlying streambuf, with no additional
layer
> of buffering
> b) Giving special treatment to symmetric filters (used to implement
> compression/decompression) to allow them to have direct access to the buffers
of
> adjacent streambufs in a chain.
> c) allowing for a category of 'transparent filters' which simply observe
> character sequences, forwarding them unchanged. This would allow many useful
> filters (such as the offsetbuf suggested by David Abrahams) to have
essentially
> zero overhead.

The category of transparent filters could be generalized any filter which can be
repesented as a function

       char f(char);

so that the length of a sequence of characters is never changed by the filters.

Requiring that all filters be implemented as symmetric filters would also reduce
copying, I beleive, but symmetric filters are surprisingly difficult to write
correctly.

> And if so, does this mean that if I put ten filters in a chain
> that then the data is copied ten times? Please enlighten me :).

Yes, for the time being. If your ideas can eliminate copying further, I'd be
glad to try to incorporate them. (But I haven't looked at your library yet.)

> Thanks for your time in advance!

No trouble.

Jonathan


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk