From: Robert Ramey (ramey_at_[hidden])
Date: 2004-09-09 10:48:21


>I decided not to use iterators because I concluded that it would miss the
>opportunity for many important optimizations that can be made when one is
>presented with a contiguous buffer full of characters instead of one
>character at a time. For example, I don't think zlib is very efficient if
>you pass it one character at a time.

>Still ...

>I wish I had looked at the Dataflow Iterators in your library when I first
>read your message! I put it on my mental 'to do' list then promptly forgot.
>It looks like Dataflow Iterators and input/output filters are intended to
>solve exactly the same sort of problems in many cases. So there are these
>obvious questions:

I didn't want to make a big issue of this because it amounts to comparing
what is currently the sketch of an idea (my dataflow iterators) to a fully
developed and implemented idea (your iostreams). It's an unfair (to you)
and often misleading type of comparison. So I see your library getting
accepted more or less as it is because it works, is well documented and
addresses a real and pressing need. Dataflow iterators really became
practical with Dave's new iterator façade/adaptor library - with its very
formal definition. But, in spite of the fact that they have been used
successfully in the serialization package for doing some of the things your
library does, Dataflow iterators are a little too fragile for general use.
Someday you or someone might want to re-implement some of your library with
dataflow iterators - or not. At that point it would become an implementation
issue.

I'm very pleased that someone has seen the appeal of this idea. I sort of
feel that I'm out there by myself with my ideas. I think it's because I'm
getting old. Anyway.

>1. Where the functionality overlaps, which approach is faster?

One of the prime motivations for the dataflow iterators is to gain absolutely
maximum speed. This should occur by collapsing of composed inline functions
by C++ optimizing compilers. My limited investigation into this convinces
me that this occurs as I would hope.

>2. What can your approach do that mine can't?
>3. What can my approach do that yours can't?

Dataflow iterators are basically a compile-time composition concept while
your method is basically a run-time composition concept. Therein lies the
difference.

Compile time composition
a) compilers can collapse inline code for maximum speed
b) each composition of filters is recompiled - this can mean replicating
almost the same code. Code bloat is possible
c) new chains can't be made at runtime under program control - it's too late
to recompile.

Runtime composition
a) filter chain can be built as needed under program control.
b) each filter is compiled only once and reused. No code bloat
c) no opportunity for "inter-filter" optimization - the code is compiled
before the filters are composed.
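
A minimal sketch of the two lists above, in modern C++ rather than the C++ of
2004. The filter names (ToUpper, Rot13), Compose, apply_static and
apply_dynamic are hypothetical and belong to neither library; they only
illustrate a chain fixed in the type versus a chain built as data.

```cpp
#include <cctype>
#include <functional>
#include <string>
#include <vector>

// Two toy per-character filters (hypothetical, for illustration only).
struct ToUpper {
    char operator()(char c) const {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
};
struct Rot13 {
    char operator()(char c) const {
        if (c >= 'a' && c <= 'z') return static_cast<char>('a' + (c - 'a' + 13) % 26);
        if (c >= 'A' && c <= 'Z') return static_cast<char>('A' + (c - 'A' + 13) % 26);
        return c;
    }
};

// Compile-time composition: the chain is part of the type, so the compiler
// sees both calls and is free to inline them into a single loop body.
// Every distinct chain instantiates apply_static again (possible bloat).
template <class First, class Second>
struct Compose {
    First first;
    Second second;
    char operator()(char c) const { return second(first(c)); }
};

template <class Filter>
std::string apply_static(const std::string& in, Filter f) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) out.push_back(f(c));
    return out;
}

// Run-time composition: the chain is data, built under program control.
// apply_dynamic is compiled once, but each filter is an indirect call.
using DynamicFilter = std::function<char(char)>;

std::string apply_dynamic(const std::string& in,
                          const std::vector<DynamicFilter>& chain) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        for (const auto& f : chain) c = f(c);
        out.push_back(c);
    }
    return out;
}
```

Here apply_static("abc", Compose<Rot13, ToUpper>{}) and apply_dynamic("abc",
chain) with chain holding Rot13 then ToUpper give the same text; only the
second chain can be chosen while the program runs.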

>4. If your approach is better all around, or in a broad class of cases, can
>I adapt my library to use it? (I think so -- the pipe notation, which you
>don't like, can give the compiler a clue which filters should be fused at
>compile time.)

My main complaint with the pipe notation is that it doesn't permit the
following idea to be expressed without destroying its original elegance:

Encrypt(
        Cat(
                Stream 1,
                Stream 2
        )
)

That is, you have to break it (the pipe idea) in order to save it.
Also I believe that implementing this at compile time by overloading
operators would be a lot of work for mere syntactic sugar. In fact this
hides the operation of your chaining scheme rather than keeping it
transparent. This is the same reason I like to see the streambuf/stream
clarified rather than hidden. Hiding is often a good idea - unix command
line pipes - but I don't think it’s a great idea for programmer libraries.
I want sharp tools whose application and usage are obvious so that I can
compose them into what I need. I don't want a wrapped up solution. So I
like a smaller library (e.g. streambuf) with examples (for cut and paste) on
how to use it. I also think it's easier to document.
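
A toy sketch of the function-call notation above, assuming "streams" are
plain strings. Cat and Encrypt here are hypothetical stand-ins (the XOR
"cipher" is not real encryption); the point is only that nested calls express
a node with two inputs directly, which a linear a | b chain cannot.

```cpp
#include <string>

// Hypothetical stand-in: concatenate two "streams" (plain strings here).
std::string Cat(const std::string& a, const std::string& b) {
    return a + b;
}

// Hypothetical stand-in: toy XOR transform, NOT real encryption.
// Applying it twice with the same key restores the input.
std::string Encrypt(std::string data, char key = 0x2A) {
    for (char& c : data) c = static_cast<char>(c ^ key);
    return data;
}
```

With these, Encrypt(Cat(stream1, stream2)) reads exactly like the expression
in the text.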

I don't think you should mess with your library in a major way. Separating out
streams/buffers is useful conceptually but that's really mostly a matter of
documentation. Basically you should finish what you started, let people
start using it, then at your leisure, if you have the inclination, look into
this idea.

>5. If my approach is better, can your library use it? (BTW, How do you use
>the Dataflow Iterators in your library?)

My use case for this functionality is the following

I want a zlib compression filter and an encryption filter. I want to be able
to use them in a stream either as compress | encrypt or encrypt | compress.
Your library can do that. So the problem is solved in the short term.

It's interesting how my thinking has evolved on this. First I thought what I
needed was stream adaptors to adapt a stream using filters. After I became
more familiar with streams and streambufs I realized that this
functionality didn't belong in streams but in streambufs - exactly where you
put it. I was much encouraged by your efforts and approach - and Jeff
Garland's previous accomplishment of adding zlib compression to a streambuf.

Alas, I've now come to understand where codecvt facet fits in here. My
current thinking - it's just speculation at this point - is composition of
codecvt facets using dataflow iterators. This would result in:

a) a codecvt_facade - template for making a codecvt class given a dataflow
iterator chain at compile time.

b) dataflow iterators for encrypt, compress, base64, wchar_t to mbchar,
mbchar to wchar_t, wchar_t to utf-8 etc.

c) a method - codecvt_adaptor ? - for composition of arbitrary codecvt
facets. This would permit us to use other existing pre-compiled facets like
translating to a Chinese multi-byte character set.

So given the above, we could make a custom codecvt facet which would take
the output, compress it, encrypt it, convert to base64, convert to Chinese
character set with one compile time statement and use that facet with any
stream. Except for the Chinese part which used a pre-compiled element, the
compiler would optimize away all redundant copying.

Naturally this codecvt facet on steroids could be used with any streambuf
without alteration. (or even recompilation).

So the "final" division of effort would be:

Streams - rendering types like int, float, etc as char or wchar strings.
Streambuf - buffering
Codecvt_facet - filtering/transformation
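
A minimal sketch of "codecvt facet as filter", assuming a char-to-char facet
whose conversion is a toy ROT13 transform. Rot13Codecvt and filter_through
are hypothetical names; this is not the codecvt_facade proposed above, and a
real compress or encrypt facet would also have to handle state and output
that changes length.

```cpp
#include <cstddef>
#include <cwchar>   // std::mbstate_t
#include <locale>
#include <string>

// Hypothetical facet: the "conversion" step is a ROT13 filter.
class Rot13Codecvt : public std::codecvt<char, char, std::mbstate_t> {
    static char rot13(char c) {
        if (c >= 'a' && c <= 'z') return static_cast<char>('a' + (c - 'a' + 13) % 26);
        if (c >= 'A' && c <= 'Z') return static_cast<char>('A' + (c - 'A' + 13) % 26);
        return c;
    }
    // Shared by do_out and do_in because ROT13 is its own inverse.
    static result convert(const char* from, const char* from_end,
                          const char*& from_next,
                          char* to, char* to_end, char*& to_next) {
        from_next = from;
        to_next = to;
        while (from_next != from_end && to_next != to_end)
            *to_next++ = rot13(*from_next++);
        return from_next == from_end ? ok : partial;
    }

protected:
    result do_out(state_type&, const char* from, const char* from_end,
                  const char*& from_next,
                  char* to, char* to_end, char*& to_next) const override {
        return convert(from, from_end, from_next, to, to_end, to_next);
    }
    result do_in(state_type&, const char* from, const char* from_end,
                 const char*& from_next,
                 char* to, char* to_end, char*& to_next) const override {
        return convert(from, from_end, from_next, to, to_end, to_next);
    }
    // Tell callers that a conversion really is needed.
    bool do_always_noconv() const noexcept override { return false; }
};

// Hypothetical helper: push a whole string through a facet's out() step.
std::string filter_through(const std::codecvt<char, char, std::mbstate_t>& cvt,
                           const std::string& in) {
    std::string out(in.size(), '\0');
    std::mbstate_t state = std::mbstate_t();
    const char* from_next = nullptr;
    char* to_next = nullptr;
    cvt.out(state, in.data(), in.data() + in.size(), from_next,
            &out[0], &out[0] + out.size(), to_next);
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

Such a facet would normally be installed with
std::locale(std::locale(), new Rot13Codecvt) and imbue(); in the standard
library it is the file stream buffer that consults the codecvt facet.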

Bear in mind that this glosses over details like the fact that there's a lot
of annoying little differences in standard libraries in codecvt_facet
interface, function prototypes, etc. GCC used 4 by wchar_t, window uses 2
byte wchar_t, endien issues, on and on and on. I've had problems with
several libraries that I don't believe handle codecvt facets properly. Just
using the utf-8 facet written by Ron Garcia - (nice job - no errors) has
been a trial due to these issues.

>I'm going to try to re-implement some of your iterators as Filters, and see
>what happens.

Feel free to experiment at your leisure. But don't get distracted.
Starting something is easy - finishing something is very hard. You're in a
position to get something of substance actually finished and in the hands of
real users. Don't lose focus on that.

In the serialization library - composition of dataflow iterators is used to
implement the following:

Escaping/unescaping html text
Binary <-> base64 translation
Wstring <-> string translation
Maybe others - I don't remember
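
A minimal sketch of composing iterator adaptors at compile time, loosely in
the spirit of the string -> wstring item above. It does not reproduce the
serialization library's actual iterators; mapped_iterator, Upper, Widen and
widen_upper are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>

// Hypothetical adaptor: an input iterator that applies Fn to each element
// pulled from the underlying iterator Base.
template <class Base, class Fn>
class mapped_iterator {
    Base base_;
    Fn fn_;

public:
    using value_type =
        decltype(std::declval<const Fn&>()(*std::declval<const Base&>()));
    using iterator_category = std::input_iterator_tag;
    using difference_type = std::ptrdiff_t;
    using pointer = const value_type*;
    using reference = value_type;  // elements are produced by value

    explicit mapped_iterator(Base base, Fn fn = Fn()) : base_(base), fn_(fn) {}

    value_type operator*() const { return fn_(*base_); }
    mapped_iterator& operator++() { ++base_; return *this; }
    mapped_iterator operator++(int) { mapped_iterator t(*this); ++base_; return t; }
    bool operator==(const mapped_iterator& o) const { return base_ == o.base_; }
    bool operator!=(const mapped_iterator& o) const { return base_ != o.base_; }
};

// Toy stages (hypothetical).
struct Upper {
    char operator()(char c) const {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
};
struct Widen {
    wchar_t operator()(char c) const {
        return static_cast<wchar_t>(static_cast<unsigned char>(c));
    }
};

// The whole chain is spelled out as a type: char source -> Upper -> Widen.
using upper_it = mapped_iterator<const char*, Upper>;
using wide_upper_it = mapped_iterator<upper_it, Widen>;

std::wstring widen_upper(const std::string& s) {
    const char* b = s.data();
    const char* e = b + s.size();
    return std::wstring(wide_upper_it(upper_it(b)), wide_upper_it(upper_it(e)));
}
```

Because each stage is a template parameter of the next, the compiler can
inline the whole chain into the loop that fills the destination string.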

The dataflow concept was once an object of research interest as a
fundamental programming paradigm. It's somewhat related to functional
programming in this regard. My dataflow iterator implementation is somewhat
similar to the FP package reviewed earlier this year in that it implements a
different programming paradigm using the facilities of C++ templates.
However, it's much less ambitious and much more focused on immediate C++
problems rather than producing a complete implementation of another programming
paradigm.

One thing is clear to me. We're going to see lots of new things in the
future of programming. Much more than people think.

Robert Ramey

