Boost logo

Boost :

From: Jonathan Turkanis (technews_at_[hidden])
Date: 2004-09-09 15:20:34


> "Robert Ramey" <ramey_at_[hidden]> wrote in message

> >I wish I had looked at the Dataflow Iterators in your library when I first
> >read your message! I put it on my mental 'to do' list then promptly forgot.
> >It looks like Dataflow Iterators and input/output filters are intended to
> >solve exactly the same sort of problems in many cases. So there are these
> >obvious questions:
>
> I didn't want to make a big issue of this because it amounts to comparing
> what currently the sketch of an idea (my dataflow iterators) to a fully
> developed and implemented idea (your iostreams). It's an unfair (to you)
> and often misleading type of comparison. So I see your library getting
> accepted more or less as it is because it works, well documented and
> addresses a real and pressing need. Dataflow iterators really became
> practical with Dave's new iterator façade/adaptor library - with its very
> formal definition. But, in spite of the fact that they have been used
> successfully in the serialization package for doing some of the things your
> library does, Dataflow iterators are a little too fragile for general use.
> Someday you or someone might want to re-implement some of your library with
> dataflow iterators - or not. At that point it would become an implementation
> issue.

I didn't mean to suggested that I was considering a redesign of my library to
use iterators. There are several reasons I chose not to do this, some of which I
mentioned already:

1. Many operations are more efficient when they can act on an array of
characters at a time
2. I didn't want user defined components to have to manage the lifetime the
immediately downstream component, since there are some tricky issues in gerenal.
I prefer to pass the downstream component as an argument each time it is needed:

           filter.write(resource, s, n)

Obviously this doesn't work for iterators.
3. When the filter and resource types are composed at compile time, it's
difficult to decide what the constructor arguments should be. I see that with
some of your iterators, you solve this problem using additional template
parameters; this won't work for me, since constructors sometimes have to take
strings, random number generators, etc.

((3) is solved easily using the pipe syntax:

   filtering_stream out( newline_inserter(7) |
                            gzip_compressor(gzip::best_speed) |
                            file("output") ); )

The reason I'm excited about your Dataflow Iterators is that they give me a
chance to test my intuitions about the performance implications of composing
filters at compile time. Until now, I conjectured that compile-time composition
wouldn't make much difference. If I implement some of your filtering algorithms
in my framework, I can run a head-to-head test. BTW, you say the comparison is
unfair to me, but it might be the other way around since I've already spent a
considerable amount of time on optimizations. If your filters are nonetheless
faster, that shows me I need to consider compile-time composition.

> I'm very pleased that someone has seen the appeal of this idea. I sort of
> feel that I'm out there by myself with my ideas. I think its because I'm
> getting old. Anyway.
>
> >1. Where the functionality overlaps, which approach is faster?
>
> One of the prime motivations for the dataflow iterators is gain absolutely
> maximum speed. This should occur by collapsing of composed inline functions
> by C++ optimizing compilers. My limited investigation into this convinces
> me that this occurs as I would hope.

What other approaches did you try?

> >2. What can your approach do that mine can't?
> >3. What can my approach do that yours can't?
>
> Dataflow iterators is basically a compile-time composition concept while
> your method is basically a run-time composition concept. There in lies the
> difference

<snip runtime/compile-time comparsion>

I was thinking iterators might allow additional flexibility since the value
types don't have to match exactly.

> >4. If your approach is better all around, or in a broad class of cases, can
> >I adapt my library to use it? (I think so -- the pipe notation, which you
> >don't like, can give the compiler a clue which filters should be fused at
> >compile time.)
>
> My main complaint with the pipe notation is that, is doesn't permit the
> following idea to be expressed without destroying its original elegance:
>
> Encrypt(
> Cat(
> Stream 1,
> Stream 2
> )
> )

This would be:

   filtering_ostream out(encryptor() | concatenate(out1, out2));

Looks fine to me. (Currently, in my framework, concatenation only works for
input.)

> That is, that you have to break it (the pipe idea) in order to save it.
> Also I believe that implementing this at compile time by overloading
> operators would be a lot of work for mere syntactic sugar.

I've discovered that it's not just sugar, though. Rob Stewart pointed out that
specifying a sequence of filters and resources all at once allows some
compile-time errors which would otherwise show up only at runtime.

Also, I may implement the idea of an in-place filter, which uses the buffer of
the next filter in the chain rather than its own buffer. The two filters can be
fused at complie time if they are pushed in one statement, using the pipe
notation, but not if they are pushed separately.

> clarified rather than hidden. Hiding is often a good idea - unix command
> line pipes - but I don't think it's a great idea for programmer libraries.
> I want sharp tools whose application and usage are obvious so that I can
> compose them into what I need.

Here's one way the 'hiding' manifests itself, which just occured to me:

   filtering_ostream out(tee(log_stream) | encryptor() | concatenate(out1,
out2));
   std::cout << "chain length = " << out.size() << "\n";

This could print 1, 2 or 3 depending on which filters the compiler decides to
fuse.

> I don't think you mess with your library in a major way. Separating out
> streams/buffers is useful conceptually but that's really mostly a matter of
> documentation. Basically you should finish what you started, let people
> start using it then at your leisure if you have the inclination look in this
> idea.

I'm not planning any major time investement in the library, if it's accepted,
over and above improvements decided during the review. However, adding the pipe
functionality took about half an hour (and it actually worked the first time I
tried to use it). I'm interested to see if I can add some compile-time fusion of
adjacent filters without spending more than a few hours or days.

> Alas, I've now come to understand where codecvt facet fits in here. My
> current thinking - its just speculation at this point - is composition of
> codecvt facets using dataflow iterators. This would result in:

I don't think this is a good idea: codevts use virtual functions heavily, and
they are very difficult to write properly. (more below)

> a) a codecvt_facade - template for making a codecvt class given a dataflow
> iterator chain at compile time.

I've had the idea of a codecvt_facade twice (the second time I forgot that I had
tried it before). Both times I ran into the following problem: codevct's
basically have to store all their state in the state type that is passed as an
argument to each member function, and in many cases users are allowed to assume
that the state type is mbstate_t. This makes it hard to do something like this:

    struct Filter {
       // typedefs
        result_type in( const char* from, const char* from_end,
                        const char*& from_next,
                        char* to, char* to_limit, char*& to_next );
        // other members
    };

    typedef codevt_facade<Filter> my_codevt;

> b) dataflow iterators for encrypt, compress, base64, wchar_t to mbchar,
> mbchar to wchar_t, wchar_t to utf-8 etc.
>
> c) a method - codecvt_adaptor ? - for composition of arbitray codecvt
> facets. This would permit us to use other existing pre-compiled facets like
> translating to Chinese mult-byte characterset.

My library offers something very similar. It's called a SymmetricFilter
(http://tinyurl.com/3lz82.) Essentially, it represents a generic version of
codecvt. SymmetricFilters are used to implement wrappers for zlib and bzip2.
Right now, they involve more copying than necessary, but with a planned
optimization they will become one of the most efficient types of filters.

The problem is that they are very difficult to write correctly. The simple test
class toupper_symmetric_filter ( http://tinyurl.com/6bu23), took me two hours to
get working properly. (The buffering is not necessary -- I added it to produce a
better test case).

> So given the above, we could make a custom codecvt facet which would take
> the output, compress it, encrypt it, convert to base64, convert to Chinese
> character set with one compile time statement and use that facet with any
> stream. Except for the Chinese part which used a pre-compiled element, the
> compiler would optimize away all redundant copying.
>
> Naturally this codecvt facet on steroids could be used with any streambuf
> without alteration. (or even recompilation).
>
> So the "final division of effort would be"
>
> Streams - rendering types like int, float, etc as char or wchar strings.
> Streambuf - buffering
> Codecvt_facet - filtering/transformation

I disagree, as explained above. Ultimately, I'd like the division to be:

  stream_facade - formatting

  template< typename FilterOrResource, <---filtering
                  typename Buffering = ..., <----buffering
>
 struct streambuf_facade;

> >I'm going to try to re-implement some of your iterators as Filters, and see
> >what happens.
>
> Feel free to experiment at your leisure. But don't get distracted.
> Starting something is easy - finishing something is very hard. You're in a
> position to get something of substance actually finished and in the hands of
> real users. Don't lose focus on that.

I've been planning to write many of these filters for quite a while, but have
forced myself to wait until I know whether the library is accepted. However, if
it looks like I can steal a good part of the implementation from serialization,
I may no longer be able to resist the temptation. ;-)

> In the serialization library - composition of dataflow iterators is used to
> implement the following:
>
> Escaping/unescaping html text
> Binary <-> base64 translation
> Wstring <-> string translation
> Maybe others - I don't remember
>
> The dataflow concept was once and object of research interest as a
> fundamental programming paradigm.

> One thing is clear to me. We're going to see lots of new things in the
> future of programming. Much more than people think.

I'm sure.

Jonathan


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk