Boost logo

Boost :

Subject: Re: [boost] non-branching min()
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2009-09-10 13:41:02


DE wrote:
> i'd like to present some kind of micro optimization towards pipelined
> cpus
>
> traditionaly min() has an implementation like
>
> template<typename type> inline
> type min(type a, type b) { return a<b ? a : b; }
>
> branch free version
>
> template<typename type> inline
> type min(type a, type b)
> {
> const type c[2] = {a, b};
> return c[b<a];
> }

Hmm, I brought this exact idea to boost about three years ago. I had min, abs and median of three.

Here is my median of three implmentation

template <class T>
inline const T& median(const T& a, const T& b, const T& c) {
  const bool alb = a < b;
  const bool blc = b < c;
  const bool alc = a < c;
  const T* input[3] = {&a, &b, &c};
  unsigned int index = 0; //default is to return a
  index += (alb & blc) | (!alb & !blc); //return b
  index += (unsigned int)((alc & !blc) | (!alc & blc)) << 1;
  return *(input[index]);
}

No branches required.

You can improve the efficiency for larger types by using arrays of pointer to argument instead of arrays of argument value.

In my own testing the speedup for prescott was 25% for min and 45% for median of three. I would expect less speedup than that in core2 processors because the pipeline is shorter. Also there is a new instruction in core2 that allows the min to compile to a branchless instruction stream. You need to use the proper flags with the compiler to enable the use of new instructions. If you are compiling in max compatibility mode you are dropping performance on the floor. In our testing branch free min (as you propose) was no faster than a < b ? a:b; when compiling with the new insturctions enabled for a core2 machine. Median of three was still 25% faster though.

The response from boost three years ago was that such low level architecture specific optimizations have no place in portable libraries.

Hope that helps,
Luke


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk