Boost :
From: Giovanni Piero Deretta (gpderetta_at_[hidden])
Date: 2008-07-07 10:48:42
On Mon, Jul 7, 2008 at 12:19 PM, Dean Michael Berris
<mikhailberis_at_[hidden]> wrote:
> On Mon, Jul 7, 2008 at 5:43 PM, Giovanni Piero Deretta
> <gpderetta_at_[hidden]> wrote:
>> On Mon, Jul 7, 2008 at 6:29 AM, Dean Michael Berris
>> <mikhailberis_at_[hidden]> wrote:
>>>
>>> The advantage would be that if operator| constructed the type by
>>> itself compared to having 'f' construct the type at runtime, then the
>>> compiler can (and usually would) perform optimizations on the code it
>>> generates for the type created by operator|.
>>
>> Which optimizations cannot be performed by an 'f' that returns a lazy range?
>>
>
> Removing the need to actually create temporary objects.
You would be substituting one lightweight temporary (a lazy range)
for another (the expression template node).
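To make the comparison concrete, the two kinds of temporary might look
roughly like this (a sketch with made-up names, not taken from any
actual library):

// Hypothetical expression-template node for a "filter" step: it holds
// a reference to the operand expression plus the predicate.
template <class Expr, class Pred>
struct filter_expr {
    Expr const& operand;
    Pred pred;
};

// Hypothetical lazy range for the same step: it holds the two
// iterators delimiting the filtered view (e.g. boost::filter_iterator
// over the underlying sequence).
template <class Iter>
struct filter_range {
    Iter first, last;
};

Either way the temporary is a handful of bytes with a trivial
destructor; neither owns nor copies any elements.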
>
>>> This is the same reason
>>> why expression templates work very well for Blitz++ and the ublas
>>> library IIRC.
>>
>> <snip blitz example>
>>
>>>
>>> If you use expression templates in this situation, you would do away
>>> with the temporary object(s) created in the intermediate
>>> multiplication -- and have the compiler optimize the code it creates
>>> for you because at compile time you're dealing first with types rather
>>> than values deferred at runtime. The final assignment in the above
>>> line would perform the necessary traversal(s) and assignments with the
>>> benefit of the compiler having optimized already everything it would
>>> have had optimized.
>>>
>>> In the case of implementing operator| in this way, we are then able to
>>> compose/combine the chained lazy range implementation by building the
>>> type at compile time
>>
>> This is not a property of operator|(). Any lazy range library I have
>> seen does it.
>>
>
> True, but these lazy ranges aren't lazy enough -- precisely because
> they rely on temporaries returned from the previously chained
> expressions.
>
Why would that make them not lazy enough? I could say:
"an expression template node is not lazy enough because it relies on
temporaries returned from the previously chained expressions"
>>> -- where at runtime, the code would have
>>> benefitted from compiler optimizations.
>>>
>>> More concretely, if 'f' created the lazy range, then you'd be passing
>>> the instance of the temporary range to the next 'f' in the chain, then
>>> the next, then the next... Then the temporaries you'd be creating are
>>> wasteful which you would avoid when using expression templates.
>>
>> The lazy range temporaries returned by the various 'f' are the
>> equivalent expression templates. Lazy ranges are meant to be very
>> lightweight (little more than two iterators). I see zero advantage in
>> adding another level of complication.
>>
>
> When you have a chain of at least three operations, you then need to
> have at least two returned temporary objects -- I don't know about you
> but inside a tight loop that's not good especially if the compiler
> doesn't remove the temporaries for you (which I don't think there's a
> standard optimization for anyway, without move semantics or something
> similar that will eliminate the temporaries).
Those temporaries are a handful of bytes: no dynamic memory allocation,
at most a couple of iterators. Did you see the example I attached in
my other post? Which temporaries would you like to see removed, and how
would you remove them?
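To illustrate, the temporary produced by chaining a filter and a map
needs to hold nothing more than two composed iterators. A sketch,
assuming the Boost.Iterator adaptors (the wrapper name itself is made
up):

#include <boost/iterator/filter_iterator.hpp>
#include <boost/iterator/transform_iterator.hpp>

// The whole state of the chained temporary: two composed iterators,
// a few words of automatic storage, no heap allocation anywhere.
template <class Iter, class Pred, class Func>
struct filtered_mapped_range {
    typedef boost::transform_iterator<
        Func, boost::filter_iterator<Pred, Iter> > iterator;

    iterator first, last;

    iterator begin() const { return first; }
    iterator end() const   { return last; }
};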
Using rewrite rules (which is exactly like traversing an expression
template to optimize it), you can simplify expressions containing more
than one range. But after some tests I didn't actually measure any
difference between a plain for loop, non-optimized range expressions,
and optimized range expressions; i.e. the compiler is smart enough to
simplify them for you.
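For example, one such rule could fuse two adjacent maps into a single
pass over the elements. A rough sketch (the names are made up; the
result type is assumed to equal the element type just to keep it
short):

// Hypothetical rewrite rule: map(map(r, f), g) ==> map(r, composed(f, g)),
// i.e. apply g(f(x)) element by element instead of stacking two views.
template <class F, class G>
struct composed {
    F f; G g;
    composed(F f_, G g_) : f(f_), g(g_) {}

    template <class T>
    T operator()(T const& x) const { return g(f(x)); }
};

With the lazy-range representation such a rule can fire through plain
overload resolution (e.g. an overload of map() that takes an already
mapped range), so nobody has to invoke it by hand.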
Maybe I'm misunderstanding you; could you provide an example and show
exactly where the difference between expression templates and lazy
ranges is (IMHO they are exactly the same thing)?
>
> The complication introduced should be offset by the efficiency of not
> having to instantiate and potentially destroy temporary objects when
> chaining using the compile-time built types. The temporary object
> construction, copy, and destruction is the killer: when you do it
> in a tight enough loop (or even enough times in the case of
> asynchronous massively parallel high performance applications)
Eh? These temporaries are composed of objects with trivial
destructors: iterators (if at all), function objects, and references.
Any compiler written in the last 15 years can completely omit the call
to such a destructor.
> -- the only recourse is to create *by hand* the exact range type you want so
> that you don't rely on the temporary objects returned by chaining
> operations; which actually defeats the purpose of introducing the
> chaining using operator| in the first place (for ease of use).
I still can't measure any abstraction overhead with a modern compiler,
at least with filter_iterator and mapped_iterator, without any extra
expression templates.
BTW, letting operator| do all the magic means that you get no benefit
if you don't actually use it. I would expect the only difference
between

  v = reduce(map(filter(x, f), m), r)

and

  v = x | filter(_, f) | map(_, m) | reduce(_, r)

to be syntactic sugar, with no performance difference (i.e. the
translation between the two is trivial).
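To show what I mean by trivial, here is a sketch for the filter step
alone; everything except boost::filter_iterator / make_filter_iterator
is a made-up name, and a real library would of course put the
placeholder in a namespace:

#include <boost/iterator/filter_iterator.hpp>

// A minimal lazy filtered view: just two filter_iterators.
template <class Range, class Pred>
struct filtered_range {
    typedef boost::filter_iterator<
        Pred, typename Range::const_iterator> iterator;

    iterator first, last;

    iterator begin() const { return first; }
    iterator end() const   { return last; }
};

// Direct-call spelling: filter(x, pred).
template <class Range, class Pred>
filtered_range<Range, Pred> filter(Range const& x, Pred pred) {
    filtered_range<Range, Pred> r = {
        boost::make_filter_iterator(pred, x.begin(), x.end()),
        boost::make_filter_iterator(pred, x.end(), x.end()) };
    return r;
}

// Piped spelling: filter(_, pred) just remembers the predicate...
struct placeholder {};
placeholder const _ = {};

template <class Pred>
struct filter_holder { Pred pred; };

template <class Pred>
filter_holder<Pred> filter(placeholder, Pred pred) {
    filter_holder<Pred> h = { pred };
    return h;
}

// ...and operator| forwards to the direct call, so
// x | filter(_, pred) builds exactly the same object as filter(x, pred).
template <class Range, class Pred>
filtered_range<Range, Pred>
operator|(Range const& x, filter_holder<Pred> const& h) {
    return filter(x, h.pred);
}

The pipe can therefore only ever add syntax, not performance.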
<snip>
>
>>> Having
>>> a 'lazy_range' allows you to apply your own re-write rules (like what
>>> you mention below) when the nested 'type' class is accessed to yield a
>>> real instance of the range.
>>
>> You can do rewrite rules any way.
>>
>
> True, but only if you knew the type beforehand and invoke the rewrite
> explicitly.
There is no need to invoke the rule explicitly. It is magic :)
Really, I think that we are basically talking about the same thing; we
are just discussing details. In my view, chains of lazy ranges *are* a
form of expression templates, and I cannot see what a smart operator|
would add to that: lazy ranges encode the same information that you
would encode in an expression template, with the same penalty (or lack
thereof). Maybe you could show a proof-of-concept implementation...
-- gpd
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk