Hi Boost Users,

I use Boost.Multiprecision's wrappers for MPFR, particularly the boost::multiprecision::mpfr_float.  I've had good success with this library and its types so far, enjoying performance gains because of expression templates, and easy access to mpfr and gmp for floats, ints, and rationals.  However, in my complex number type, I am struggling with named real temporaries in arithmetic operators.  I am looking for help with multiplication, and I can generalize to all other operations needing a real temporary later.  Here's my current code for multiplication, implemented in terms of *=.

/**
Complex multiplication.  uses a single temporary variable
1 temporary, 4 multiplications
*/
complex& operator*=(const complex & rhs)
{
mpfr_float re_cache(real_*rhs.real_ - imag_*rhs.imag_); // cache the real part of the result
imag_ = real_*rhs.imag_ + imag_*rhs.real_;
real_ = re_cache;
return *this;
}

I am using the naive evaluation formula, for better or worse.   But the particular method of evaluation doesn't matter to me right now, it's the temporary variable re_cache used to cache the real part is killing me.  Profiling using Allinea, gprof, and Callgrind reveals that a significant percentage of time in the program is consumed creating and destroying re_cache, as well as other temporaries scattered throughout my code.

In an effort to eliminate heap allocations, I first tried replacing my using mpfr_float = boost::multiprecision::number<boost::multiprecision::mpfr_float_backend<0,boost::multiprecision::allocate_dynamic>, boost::multiprecision::et_off>;  with allocate_stack, just for these named temporary variables.

However, the documentation states that allocate_stack only works in fixed precision, so the 0 digits indicating variable precision should cause problems?  I expected 0 digits to fail to compile, but compile it does.  The conversion from the type with allocate_stack to allocate_dynamic is not provided by the = operator, so I can write re_cache.convert_to<mpfr_float_dynamic>().  But using variable precision with stack allocation appears to cause all sorts of problems.  Everything depending on complex multiplication in my program breaks, indicated by massive failures in my test suite.  Thus, I currently regard allocate_stack as a non-solution. 

My question fundamentally regards how to deal with re_cache and the *= operator for complex multiplication.  Anyone have ideas on how to get rid of constant heap de/allocation of re_cache without inducing a ton more arithmetic?  Using a static for it is not a solution, both because I anticipate multithreading in this application in the future, and the fact that precision will vary though the run, so I'd have to check the precision of re_cache on every evaluation.  The C version of the program I am re-implementing used OpenMP for threads, and used a thread-id-indexed global for re_cache.  That solution then forces OpenMP onto anyone wanting to use the complex class in multiple threads.  Hence, I view this as a non-solution, too.

Again, my initial thought was 'heap allocation is the problem here, so I'll use the stack allocated backend'.  But variable precision doesn't appear to work with allocate_stack.  And statics are no good.  

Any thoughts?

Thanks for your time,
Daniel Brake
Postdoc
University of Notre Dame
Applied and Computational Math and Stats