Boost logo

Boost :

From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2008-07-07 00:29:46


Sorry this took a while.

On Thu, Jul 3, 2008 at 9:20 PM, Giovanni Piero Deretta
<gpderetta_at_[hidden]> wrote:
> On Thu, Jul 3, 2008 at 2:32 PM, Dean Michael Berris
> <mikhailberis_at_[hidden]> wrote:
>>
>> operator|(r, f) -> lazy_range<typeof(r), apply<typeof(f)> >
>
> IMHO operator| shouldn't be restricted to ranges. Logically it should
> just apply its lhs to its rhs, regardless of the type of both lhs and
> rhs [1]. It is not the job of operator| to create lazy ranges, but of
> 'f'.
>
> [1] in fact, lacking concepts, rhs or lhs should provide a way to find
> operator| via adl, for example inheriting from a pipeable empty class.
> Putting operator| in the global namespace is not an option.
>

Right.

>>
>> Now that allows you to chain operator|(r, f) infinitely, just making
>> sure that along the way you create an instance of a specific type
>> (lazy_range) which when applied a begin(...) or end(...) function will
>> yield the appropriately constructed iterator.
>
> what is the advantage with respect to having 'f' construct the lazy
> range instead? It just complicates the implementation of operator|.
>

The advantage would be that if operator| constructed the type by
itself compared to having 'f' construct the type at runtime, then the
compiler can (and usually would) perform optimizations on the code it
generates for the type created by operator|. This is the same reason
why expression templates work very well for Blitz++ and the ublas
library IIRC. The argument goes like:

 matrix A, B, C;
 C = A * B;

This would be fine if you just had an operator* overload that worked
for two matrices -- then your problem becomes the temporary object
created by the following:

 matrix A, B, C, D;
 D = A * B * C;

If you use expression templates in this situation, you would do away
with the temporary object(s) created in the intermediate
multiplication -- and have the compiler optimize the code it creates
for you because at compile time you're dealing first with types rather
than values deferred at runtime. The final assignment in the above
line would perform the necessary traversal(s) and assignments with the
benefit of the compiler having optimized already everything it would
have had optimized.

In the case of implementing operator| in this way, we are then able to
compose/combine the chained lazy range implementation by building the
type at compile time -- where at runtime, the code would have
benefitted from compiler opimizations.

More concretely, if 'f' created the lazy range, then you'd be passing
the instance of the temporary range to the next 'f' in the chain, then
the next, then the next... Then the temporaries you'd be creating are
wasteful which you would avoid when using expression templates. Having
a 'lazy_range' allows you to apply your own re-write rules (like what
you mention below) when the nested 'type' class is accessed to yield a
real instance of the range.

>>
>> Note that in the examples we've been giving:
>>
>> source | take(10) | filter(filter_function()) | map(mapper())
>>
>> We'd have a lazy_range<...> which has a type:
>>
>> lazy_range<
>> lazy_range<
>> lazy_range<
>> range_type,
>> taken_range_generator
>> >,
>> filer_function
>> >,
>> mapper
>>>
>
> how does lazy_range distinguish from a filter range and a map range?
> lazy_range<A, B> has to somehow delegate the decision to the B
> parameter. At that point you might remove all the smartness from
> operator| and move it to B.
>

You use tag dispatching at compile time.

> Also, how is the above different from
> mapped_range<filtered_range<taken_range<range_type>, Filter>,Mapper>
> which is still lazy?
>

Not much except the actual type yielded by 'lazy_range<T>::type' can
be what mapped_range<filtered_range<taken_range<range_type>,
Filter>,Mapper>.

>>
>> An instance of this is returned by the whole expression template (the
>> argument to the construction of the instance would be the original
>> source range), which we can call begin(...) or end(...) against, and
>> get the correct iterator type. If we have the above typedefed as
>> 'some_lazy_range', doing:
>>
>> begin(some_lazy_range(source)) -> the beginning of the lazy range
>> end(some_lazy_range()) -> the 'end' iterator given the lazy range
>>
>
> How this is an unique property of lazy_range? This would work with all
> the range_ex proposals I have seen so far.
>

Without expression templates, you'll need extra temporaries (which is
wasteful) to achieve things with merely lamda's.

>>
>> Right. Maybe that's a bad example. Considering that we can make a
>> reducer an external function and adapted using
>> 'make_function_output_iterator', we can use it in std::copy (of course
>> something like boost::copy(range, function) ):
>>
>> reducer reducer_instance;
>> copy ( source | take(10) | map(mapper()) ,
>> make_function_output_iterator(reducer_instance)
>> );
>
> A Reduce is a binary function: it takes the accumulated value and the
> i'th element to generate the new accumulated value (think
> std::accumulate). So, the above cannot work: an output iterator is
> logically unary, not binary, unless the reducer stores the accumulator
> value 'inside' (then you should be catching the result of copy to
> retrieve the accumulated result). In any case, you need to know the
> type of the accumulated value, and you are no better than in my
> example.
>

Okay, I confused the reducer as merely the function which took in
values and performed the reduction or accumulation in a stateful
manner.

>>
>> In this case I really don't need to know the exact type of the
>> range(s) created by the pipe syntax, because type deduction will do
>> the trick for me.
>>
>
> You do not need to know the exact type of the range in my example
> either. Just the value of the type returned by the final reducer. A
> reducer doesn't usually (but not always) return a range.
>

Right.

>>>>
>>>> Shouldn't the deforestation already have happened in the zipping and
>>>> application of the function to yield another tuple?
>>>>
>>>
>>> let say you have the rewrite rule:
>>> map<map<Range,A>,B> -> map<Range, compose<A,B> >
>>>
>>> If zip_with returns a map<zip<R1, R2>, A>, you can simplify the
>>> expression map(zip_with(r1, r2, a), b) as:
>>>
>>> map<map<zip<R1, R2>, A>, B> -> map<zip<R1, R2>, compose<A,B> >
>>>
>>> without any other special rule. If zip_with were to return a special
>>> zip<R1, R2, A>, you
>>> need a specific rule to handle this case. I.e. zip_with shouldn't be a
>>> primitive, but should be constructed on top of map and zip.
>>>
>>
>> True, but zip_with can return a range: map<map<zip<transform<R1, F1>,
>> transform<R2, F2> > >, A>, B> which can still be re-written.
>>
>
> eh? What is transform there? What are F1 and F2?
>

Given that zipped_with can result a zipped transform iterator, and F1
and F2 are the types of the functions used/provided with zip_with.

>> BTW, are the rewrite rules already there, or will this have to be
>> something that has to still be written using Boost.MPL?
>
> Of course not, for now this is just 'wouldn't it be great if...' :)
>

Which is precisely what can be implemented with lazy_range<...>. :D

HTH

-- 
Dean Michael C. Berris
Software Engineer, Friendster, Inc.

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk