
Subject: Re: [Boost-users] mpfr_float stack allocation
From: John Maddock (jz.maddock_at_[hidden])
Date: 2016-05-10 13:02:01

> In an effort to eliminate heap allocations, I first tried replacing my
>
>     using mpfr_float = boost::multiprecision::number<
>         boost::multiprecision::mpfr_float_backend<0, boost::multiprecision::allocate_dynamic>,
>         boost::multiprecision::et_off>;
>
> with allocate_stack, just for these named temporary variables.

Shouldn't you be using et_on based on your comments above?

In any case, you've found a bug: using
mpfr_float_backend<0, boost::multiprecision::allocate_stack> creates a
one-bit float, which I suspect is not what you wanted ;) I'll fix that
so that it causes a compiler error instead.
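To illustrate why 0 digits and stack allocation don't mix, here is a toy sketch. All names (limbs_for_digits10, toy_float_backend, alloc) are hypothetical, and this is not Boost.Multiprecision's actual implementation. A stack-allocated value needs a buffer whose size is a compile-time function of the digit count. Because 0 means "precision chosen at run time", it cannot size such a buffer, and a static_assert can turn that case into a compiler error.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: 64-bit limbs needed for `digits10` decimal digits.
// log2(10) ~= 3.3219, so d digits need about d * 3.322 bits (+1 to round up).
constexpr std::size_t limbs_for_digits10(unsigned digits10) {
    return (digits10 * 3322u / 1000u + 1u + 63u) / 64u;
}

// Hypothetical toy backend, NOT Boost.Multiprecision's real code.
// Digits10 == 0 means "precision chosen at run time".
enum class alloc { dynamic, stack };

template <unsigned Digits10, alloc A>
struct toy_float_backend;  // only the stack case is sketched here

template <unsigned Digits10>
struct toy_float_backend<Digits10, alloc::stack> {
    // Without this guard, Digits10 == 0 would quietly produce a
    // 1-bit, single-limb buffer instead of failing to compile.
    static_assert(Digits10 != 0,
                  "stack allocation needs a compile-time precision; "
                  "0 (variable precision) cannot size a fixed buffer");
    std::array<std::uint64_t, limbs_for_digits10(Digits10)> limbs{};
};
```

For example, toy_float_backend<50, alloc::stack> holds three limbs inline with no heap allocation, while toy_float_backend<0, alloc::stack> is rejected at compile time. This loosely mirrors the one-bit-float behaviour reported above.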

> However, the documentation states that allocate_stack only works in
> fixed precision, so the 0
> digits indicating variable precision should cause problems? I
> expected 0 digits to fail to compile, but compile it does. The
> conversion from the type with allocate_stack to allocate_dynamic is
> not provided by the = operator, so I can write
> re_cache.convert_to<mpfr_float_dynamic>(). But using variable
> precision with stack allocation appears to cause all sorts of
> problems. Everything depending on complex multiplication in my
> program breaks, indicated by massive failures in my test suite. Thus,
> I currently regard allocate_stack as a non-solution.
> My question fundamentally regards how to deal with re_cache and the *=
> operator for complex multiplication. Anyone have ideas on how to get
> rid of constant heap de/allocation of re_cache without inducing a ton
> more arithmetic? Using a static for it is not a solution, both
> because I anticipate multithreading in this application in the future,
> and the fact that precision will vary throughout the run, so I'd have to
> check the precision of re_cache on every evaluation. The C version of
> the program I am re-implementing used OpenMP for threads, and used a
> thread-id-indexed global for re_cache. That solution then forces
> OpenMP onto anyone wanting to use the complex class in multiple
> threads. Hence, I view this as a non-solution, too.
> Again, my initial thought was 'heap allocation is the problem here, so
> I'll use the stack allocated backend'. But variable precision doesn't
> appear to work with allocate_stack. And statics are no good.
> Any thoughts?

Yes, but none you may like.

* You could use Boost.Thread for thread-local statics (boost::thread_specific_ptr).
* In C++11 you can use thread_local storage, see
* For pre-C++11 you could use __thread or __declspec(thread) in a
non-portable compiler-specific way.
* If this were C99, you could use a variable-length array and initialize a
temporary mpfr_t in it yourself.
* Ditto, but using alloca (this does work in some C++ implementations
but not all).
* You could use a temporary buffer big enough for the largest precision
you ever use, and use that to initialize an mpfr_t yourself. Of course
this may run you out of stack space ;)
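Here is a minimal C++11 sketch of the thread_local option applied to the complex *= case. The complex_num type is hypothetical, and T stands in for mpfr_float (it is exercised with double here). The sketch assumes that assigning the product to re_cache gives it the precision the computation needs. For variable-precision mpfr_float that depends on the library's precision rules, so check it against your Boost version. Each thread gets its own re_cache, so after the first call its storage is reused rather than allocated and freed on every multiplication.

```cpp
#include <utility>

// Hypothetical minimal complex type; T stands in for mpfr_float.
template <class T>
struct complex_num {
    T re;
    T im;

    // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    complex_num& operator*=(const complex_num& rhs) {
        // One scratch value per thread (and per T). It is constructed on
        // first use in each thread and reused by every later call.
        thread_local T re_cache;
        re_cache = re * rhs.re - im * rhs.im;  // new real part
        im = re * rhs.im + im * rhs.re;        // still uses the old re
        using std::swap;
        swap(re, re_cache);                    // re takes the new real part
        return *this;
    }
};
```

Swapping instead of copying lets re take over re_cache's value while re_cache keeps the old storage for the next call. For a type with a non-trivial constructor, such as mpfr_float, each thread's copy is destroyed when that thread exits.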

And finally... you may not see as much speedup as you expect unless the
precision is low: for any significant number of digits, the cost of the
arithmetic typically far outweighs everything else.

HTH, John.
