Boost logo

Boost :

From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2008-07-10 06:25:11


Sorry this took a while again... I promise I'll be more prompt with
responses next time an interesting discussion like this is going on.
;-)

On Mon, Jul 7, 2008 at 10:48 PM, Giovanni Piero Deretta
<gpderetta_at_[hidden]> wrote:
> On Mon, Jul 7, 2008 at 12:19 PM, Dean Michael Berris
> <mikhailberis_at_[hidden]> wrote:
<snip>
>>
>> Removing the need to actually create temporary objects.
>
> You would be substituting one light weight temporary (a lazy range)
> for another one, the expression template node.
>

Right, but the expression template "node" will just contain references
-- and actual iterators will only be built once with the exact type,
instead of encapsulating and creating multiple range objects passed
from function to function.

>>
>>
>> True, but these lazy ranges aren't lazy enough -- precisely because
>> they rely on temporaries returned from the previously chained
>> expressions.
>>
>
> Why would that make them not lazy enough? I could say:
> "an expression template node is not lazy enough because it relies on
> temporaries returned from the previously chained expressions"
>

I'm actually thinking of making it lazier than just using lazy ranges
returned from functions in a chain:

  reduce( map( filter( source, f), m, r )

In this case when you have:

  template <class R, class F>
  result_of<F(R::iterator::value_type)>::type
  reduce(R const & r, F const & f) {
    return f(begin(r), end(r), I::value_type());
  };

  template <class R, class F>
  mapped_range<R, F>
  map(R const & r, F const & f) {
    return mapped_range<R, F>(r, f);
  };

  template <class R, class F>
  filtered_range<R, F>
  filter(R const & r, F const & f) {
    return filtered_range<R, F>(r, f);
  };

Consider the number of temporaries you actually create when you chain
them together, compared to something like this:

  source | filter(f) | map(m) | reduce(r) -> T

In essence, what I was thinking is:

  template <class T>
  T create(T) {
    return T();
  };

  create(source | filter(f) | map(m) | reduce(r))(make_tuple(source, f, m, r))

where

  create(T) -> T()

  T()(sequence) -> actual range type

So what should happen is, create(T (*)(_, _)) will create a single
temporary object T (because the compiler would just want the type T
generated from the chained pipe expression), which is polymorphic on
the function operator overload that lazily creates the correct range
type given the sequence argument. This is similar to the technique
used in lambda's, but this time you just want the return type of the
function invocation built by operator|(...) so that you can create the
correct range generator type.

I might be dreaming if this can be done, since I haven't had time to
actually try it out. ;-)

>>
>> When you have a chain of at least three operations, you then need to
>> have at least two returned temporary objects -- I don't know about you
>> but inside a tight loop that's not good especially if the compiler
>> doesn't remove the temporaries for you (which I don't think there's a
>> standard optimization for anyway, without move semantics or something
>> similar that will eliminate the temporaries).
>
> Those temporaries are a bunch of bytes, no dynamic memory allocation,
> at most a couple of iterators. Did you see the example I attached in
> my other post? Which temporaries would you like to see removed and how
> would you remove them?
>
> Using rewrite rules (which is exactly like traversing an expression
> template to optimize it), you can simplify expressions containing more
> than one range, but after some tests I didn't actually measure any
> difference between a plain for loop, non-optimized range expressions
> and optimized range expressions. I.e. the compiler is smart enough to
> simplify it for you.
>
> May be I'm misunderstanding you; could you provide an example and show
> exactly where is the difference between expression templates and lazy
> ranges (IMHO they are exactly the same thing)?
>

I tried above, but generally I think there's a difference between
creating the type first at compile time then reducing the number of
steps actually done at runtime (constructing and destroying objects or
even a bunch of bytes). The idea brought about by the expression
templates allows someone to implement considerably complex operations
using syntactic sugar (or expressive notation) while at the same time
getting the benefit of the efficiency afforded us by the compiler.

So in essence, you'd want expression templates because if:

  reduce(map(filter(source, f), m), r)

Is less efficient than:

  source | filter(f) | map(m) | reduce(r)

Then that's a good reason to either: make the range generation
functions as efficient as the expression template approach, or remove
the range generation functions in favor of the more concise (and
efficient) pipe syntax.

>>
>> The complication introduced should be offset by the efficiency of not
>> having to instantiate and potentially destroy temporary objects when
>> chaining using the compile-time built types. The temporary object
>> construction, copy, and destruction is the killer that when you do it
>> in a tight enough loop (or even enough times in the case of
>> asynchronous massively parallel high performance applications)
>
> eh? These temporaries are constructed of objects having trivial
> destructors: iterators (if at all), function objects and references.
> Any compiler written in the last 15 years can completely omit a call
> to such a destructor.
>

Right, and I'd think 0 bytes of overhead is always better than some
bytes of overhead. ;-) I'm nitpicking, but the idea really is if you
can actually remove all the temporaries or minimize it to one (or none
at all?), then we should think about that approach and see the merits.

>>-- the only recourse is to create *by hand* the exact range type you want so
>> that you don't rely on the temporary objects returned by chaining
>> operations; which actually defeats the purpose of introducing the
>> chaining using operator| in the first place (for ease of use).
>
> I can't still measure any abstraction overhead with a modern compiler
> and at least filter_iterator and mapped_iterator, without any extra
> expression template.
>
> BTW, letting operator| do all magic, means that you get no benefit if
> you actually not use it. I would expect that the only difference
> between:
>
> v = reduce(map(filter(x, f), m), r)
>
> and
>
> v = x | map(_, m) | filter(_, f) | reduce(_, r)
>
> were only the syntactic sugar, not any performance difference (i.e.
> the translation between the two is trivial)
>

Right. Unless you can also optimize the range generation functions to
work as fast as or as efficiently as the magical operator|
implementation. ;-)

>>
>> True, but only if you knew the type beforehand and invoke the rewrite
>> explicitly.
>
> There is no need to invoke the rule explicitly. It is magic :)
>

Oh, right!

> Really, I think that we are talking basically about the same thing. We
> are just discussing about details. In my view, chains of lazy ranges
> *are* a form of expression templates, and I cannot see what a smart
> operator| would add to that: lazy ranges encode the same information
> that you would encode in an expression template, with the same (or
> lack of) penality. Maybe you could show a proof of concept
> implementation...
>

I think using operator|(...) to just create the type and not actually
create the range(s) (like the above) would be something worth looking
into at least. I think that's the way a lazy range generator or lazy
range type should work -- much like how everything else in the lazy
world works.

About the proof of concept, I'll see what I can do. :-)

Thanks, and I hope this helps!

-- 
Dean Michael C. Berris
Software Engineer, Friendster, Inc.

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk