Boost logo

Boost :

Subject: Re: [boost] lifetime of ranges vs. iterators
From: Arno Schödl (aschoedl_at_[hidden])
Date: 2008-09-04 04:55:25


> It ought to be possible to propagate a bool up the stack
> indicating whether the increment just reached the end.

> In principle, if the compile inlines all the functions, then
> it ought to be able to avoid any extra boolean test.

I thought about what can be done in principle, without restricting myself to iterators, to optimize the kind of nested functions on sequences we are discussing. I want to start with the end() check problem in increment. Let's consider a toy example:
 
... F2( T2 ( F1( T1( R ) ) ) )

Handcrafted code would look like this. I wrote it with gotos to avoid pre-judgement:

increment() {
next:
   if( base.empty() ) return;
   base.increment();

   T1::reference t1=T1( *base );
   if( !F1( t1 ) ) goto next;

   T2::reference t2=T2( t1 );
   if( !F2( t2 ) ) goto next;

   ....
};

Since we will use function templates to generate this code, we must now organize the code into the blocks we want to create. We basially have two choices, first the obvious one:

increment() {
   do {
      do {
         {
            if( base.empty() ) return; // one path out
            base.increment();
            // dropping out the bottom is other path out
         }

         T1::reference t1=T1( *base );
      } while( !F1( t1 ) );

      T2::reference t2=T2( t1 );
   } while( !F2( t2 ) );
};

With templates, we can only create classes and their functions, not arbitrary code blocks. With functions in C++, without exceptions, each function can only have a single return path. It cannot "tell" the calling function anything but its return value and out parameters. To act on that, the caller must check them, which is the extra cost.

If we don't allow exceptions, this nesting is out, because we cannot implement the "return".

It is pretty obvious that there is only one substantial alternative, head recursion instead of tail recursion:

increment() {
   while( !base.empty() ) {
      base.increment();

      {
         T1::reference t1=T1( *base );
         if( F1( t1 ) ) {
            T2::reference t2=T2( t1 );
            if( F2( t2 ) ) {
               // same problem, two exits out of this block
               return;
            }
         }
      }
   }
};

It first looks like we have the same problem, and strictly speaking, we have. But since there is no more code in the outer blocks after the return, we can cut down the overhead to a single check for the whole chain:

increment() {
   bool bOk;
   do {
      if( base.empty() ) return; // one path out
      base.increment();

      {
         T1::reference t1=T1( *base );
         if( bOk=F1( t1 ) ) {
            T2::reference t2=T2( t1 );
            bOk=F2( t2 ); // we save a check here, but if any of the blocks further out skip, we pay a single check
         }
      }
   } while( !bOk ); // check is overhead
};

For our special case

... F2( T2 ( F1( T1( R ) ) ) )

we can now reduce the overhead to a single test. I think this is what you, Steven, had in mind.

What kind of function chains could we apply this optimization to? You can see that the basic structure is to generate an element and then have a function that checks only on the basis of this element and internal state (which in this picture would be static variables in the respective block) whether to let this element pass. In other words, we can handle a chain of functions that transforms each input element of one type into zero or one output elements of another type.

transform, filter and strided are like that. difference and union are not, because they have two inputs. unique is almost like that, but needs to have access to the last input element that it let pass. It may be worthwhile to store this element internally, but let's not get hung up on details.

An expression template to generate this code would at the leaves of the function tree separate chains of the type of functions we can support and then generate head recursion for them. It would only work starting at leaves. For everything downstream (to the root of the tree) from a non-compliant function like difference_range the optimization could not be applied, and we would need to generate normal tail recursion like we now do when stacking iterators.

Is it worthwhile? I am asking for your opinion.

The alternative, IMO, would be to stick to the standard tail recursion, and make it as easy as possible for the compiler to optimize the extra checks away. Instead of an increment that throws, it would return whether it reached the end. The base implementation would look like this:

class associated_range {
   ...

   bool try_increment() {
      if( empty() ) return true;
      increment();
      return false;
   }
};

We would call it with

#define increment_or_exit( range ) { if( range.try_increment() ) return true; }

Hopefully, if the compiler encounters this in each function in the chain, it would drop the check.

filter_range would then look like this:

class< Base >
class filter_range {
   void try_increment() {
      for(;;) {
         increment_or_exit( base );
         Base::reference t=base.dereference();
         if( pred( t ) ) return false;
      }
   }

   void increment() {
      try_increment();
   }

   reference dereference() const {
      return base.dereference();
   }
};

If we ever get super-fast exceptions, this implementation can easily be turned into one that takes advantage of them.

Arno

--
Dr. Arno Schoedl · aschoedl_at_[hidden] 
Technical Director 
 
think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany 
http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091
Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk