Boost-Commit :

Date view	Thread view	Subject view	Author view

Subject: [Boost-commit] svn:boost r75324 - in sandbox/big_number: boost/multiprecision libs/multiprecision/doc libs/multiprecision/performance
From: john_at_[hidden]
Date: 2011-11-05 14:22:20

Author: johnmaddock
Date: 2011-11-05 14:22:18 EDT (Sat, 05 Nov 2011)
New Revision: 75324
URL: http://svn.boost.org/trac/boost/changeset/75324

Log:
Add performance test comparison.
Update docs some more.
Remove dead code in mp_number.hpp.
Added:
   sandbox/big_number/libs/multiprecision/performance/
   sandbox/big_number/libs/multiprecision/performance/sf_performance.cpp (contents, props changed)
Text files modified:
   sandbox/big_number/boost/multiprecision/mp_number.hpp | 9 --
   sandbox/big_number/libs/multiprecision/doc/multiprecision.qbk | 156 +++++++++++++++++++++++++++++++++++++++
   2 files changed, 155 insertions(+), 10 deletions(-)

Modified: sandbox/big_number/boost/multiprecision/mp_number.hpp
==============================================================================
--- sandbox/big_number/boost/multiprecision/mp_number.hpp (original)
+++ sandbox/big_number/boost/multiprecision/mp_number.hpp 2011-11-05 14:22:18 EDT (Sat, 05 Nov 2011)
@@ -545,15 +545,6 @@
       using default_ops::subtract;
       subtract(m_backend, canonical_value(e.left().value()), canonical_value(e.right().value()));
    }
- /*
- template <class Exp>
- void do_assign(const Exp& e, const detail::subtract_and_negate_immediates&)
- {
- using default_ops::subtract;
- subtract(m_backend, canonical_value(e.left().value()), canonical_value(e.right().value()));
- m_backend.negate();
- }
- */
    template <class Exp>
    void do_assign(const Exp& e, const detail::multiply_immediates&)
    {

Modified: sandbox/big_number/libs/multiprecision/doc/multiprecision.qbk
==============================================================================
--- sandbox/big_number/libs/multiprecision/doc/multiprecision.qbk (original)
+++ sandbox/big_number/libs/multiprecision/doc/multiprecision.qbk 2011-11-05 14:22:18 EDT (Sat, 05 Nov 2011)
@@ -24,11 +24,165 @@

[section:intro Introduction]

-The Multiprecision library comes in two distinct parts: an expression template enabled front end `mp_number`
+The Multiprecision library comes in two distinct parts: an expression-template-enabled front end `mp_number`
that handles all the operator overloading, expression evaluation optimization, and code reduction, and
a selection of backends that implement the actual arithmetic operations, and need conform only to the
reduced interface requirements of the front end.

+The library is often used by using one of the predfined typedefs: for example if you wanted an arbitrary precision
+integer type using GMP as the underlying implementation then you could use:
+
+ #include <boost/multiprecision/gmp.hpp> // Defines the wrappers around the GMP library's types
+
+ boost::multiprecision::mpz_int myint; // Arbitrary precision integer type.
+
+Alternatively one can compose your own multiprecision type, by combining `mp_number` with one of the
+predefined backend types. For example, suppose you wanted a 300 decimal digit floating point type
+based on the MPFR library, in this case there's no predefined typedef with that level of precision,
+so instead we compose our own:
+
+ #include <boost/multiprecision/mpfr.hpp> // Defines the Backend type that wraps MPFR
+
+ namespace mp = boost::multiprecision; // Reduce the typing a bit later...
+
+ typedef mp::mp_number<mp::mpfr_float_backend<300> > my_float;
+
+ my_float a, b, c; // These variables have 300 decimal digits precision
+
+[h4 Expression Templates]
+
+Class `mp_number` is expression-template-enabled: that means that rather than having a multiplication
+operator that looks like this:
+
+ template <class Backend>
+ mp_number<Backend> operator * (const mp_number<Backend>& a, const mp_number<Backend>& b)
+ {
+ mp_number<Backend> result(a);
+ result *= b;
+ return result;
+ }
+
+Instead the operator looks more like this:
+
+ template <class Backend>
+ ``['unmentionable-type]`` operator * (const mp_number<Backend>& a, const mp_number<Backend>& b);
+
+Where the "unmentionable" return type is an implementation detail that, rather than containing the result
+of the multiplication, contains instructions on how to compute the result. In effect it's just a pair
+of references to the arguments of the function, plus some compile-time information that stores what the operation
+is.
+
+The great advantage of this method is the ['elimination of temporaries]: for example the "naive" implementation
+of `operator*` above, requires one temporary for computing the result, and at least another one to return it. It's true
+that sometimes this overhead can be reduced by using move-semantics, but it can't be eliminated completely. For example,
+lets suppose we're evaluating a polynomial via Horners method, something like this:
+
+ T a[7] = { /* some values */ };
+ //....
+ y = (((((a[6] * x + a[5]) * x + a[4]) * x + a[3]) * x + a[2]) * x + a[1]) * x + a[0];
+
+If type `T` is an `mp_number`, then this expression is evaluated ['without creating a single temporary value], in contrast
+if we were using the C++ wrapper that ships with GMP - `mpf_class` - then this expression would result in no less than 11
+temporaries (this is true even though mpf_class does use expression templates to reduce the number of temporaries somewhat). Had
+we used an even simpler wrapper around GMP or MPFR like `mpclass` things would have been even worse and no less that 24 temporaries
+are created for this simple expression (note - we actually measure the number of memory allocations performed rather than
+the number of temporaries directly).
+
+This library also extends expression template support to standard library functions like `abs` or `sin` with `mp_number`
+arguments. This means that an expression such as:
+
+ y = abs(x);
+
+can be evaluated without a single temporary being calculated. Even expressions like:
+
+ y = sin(x);
+
+get this treatment, so that variable 'y' is used as "working storage" within the implementation of `sin`,
+thus reducing the number of temporaries used by one. Of course, should you write:
+
+ x = sin(x);
+
+Then we clearly can't use `x` as working storage during the calculation, so then a temporary variable
+is created in this case.
+
+Given the comments above, you might be forgiven for thinking that expression-templates are some kind of universal-panacea:
+sadly though, all tricks like this have their downsides. For one thing, expression template libraries
+like this one, tend to be slower to compile than their simpler cousins, they're also harder to debug
+(should you actually want to step through our code!), and rely on compiler optimizations being turned
+on to give really good performance. Also since the return type from expressions involving `mp_number`'s
+is an "unmentionable implementation detail", you have to be careful to cast the result of an expression
+to the actual number type when passing an expression to a template function. For example given:
+
+ template <class T>
+ void my_proc(const T&);
+
+Then calling:
+
+ my_proc(a+b);
+
+Will very likely result in obscure error messages inside the body of `my_proc` - since we've passed it
+an expression template type, and not a number type. Instead we probably need:
+
+ my_proc(my_mp_number_type(a+b));
+
+Having said that, these situations don't occur that often - or indeed not at all for non-template functions.
+In addition all the functions in the Boost.Math library will automatically convert expression-template arguments
+to the underlying number type without you having to do anything, so:
+
+ mpfr_float_100 a(20), delta(0.125);
+ boost::math::gamma_p(a, a + delta);
+
+Will work just fine, with the `a + delta` expression template argument getting converted to an `mpfr_float_100`
+internally by the Boost.Math library.
+
+One other potential pitfall that's only possible in C++11: you should never store an expression template using:
+
+ auto my_expression = a + b - c;
+
+Unless you're absolutely sure that the lifetimes of `a`, `b` and `c` will outlive that of `my_expression`.
+
+And finally.... the performance improvements from an expression template library like this are often not as
+dramatic as the reduction in number of temporaries would suggest. For example if we compare this library with
+`mpfr_class` and `mpreal`, with all three using the underlying MPFR library at 50 decimal digits precision then
+we see the following typical results for polynomial execution:
+
+[table Evaluation of Order 6 Polynomial.
+[[Library][Relative Time][Relative number of memory allocations]]
+[[mp_number][1.0 (0.00793s)][1.0 (2996 total)]]
+[[mpfr_class][1.2 (0.00931s)][4.3 (12976 total)]]
+[[mpreal][1.9 (0.0148s)][9.3 (27947 total)]]
+]
+
+As you can see the execution time increases a lot more slowly than the number of memory allocations. There are
+a number of reasons for this:
+
+* The cost of extended-precision multiplication and division is so great, that the times taken for these tend to
+swamp everything else.
+* The cost of an in-place multiplication (using `operator*=`) tends to be more than an out-of-place
+`operator*` (typically `operator *=` has to create a temporary workspace to carry out the multiplication, where
+as `operator*` can use the target variable as workspace). Since the expression templates carry out their
+magic by converting out-of-place operators to in-place ones, we necessarily take this hit. Even so the
+transformation is more efficient than creating the extra temporary variable, just not by as much as
+one would hope.
+
+We'll conclude this section by providing some more performance comparisons between these three libraries,
+again, all are using MPFR to carry out the underlying arithmetic, and all are operating at the same precision
+(50 decimal places):
+
+[table Evaluation of Boost.Math's Bessel function test data
+[[Library][Relative Time][Relative Number of Memory Allocations]]
+[[mp_number][1.0 (6.21s)][1.0 (2685469)]]
+[[mpfr_class][1.04 (6.45s)][1.47 (3946007)]]
+[[mpreal][1.53 (9.52s)][4.92 (13222940)]]
+]
+
+[table Evaluation of Boost.Math's Non-Central T distribution test data
+[[Library][Relative Time][Relative Number of Memory Allocations]]
+[[mp_number][1.0 (269s)][1.0 (139082551)]]
+[[mpfr_class][1.04 (278s)][1.81 (252400791)]]
+[[mpreal][1.49 (401s)][3.22 (447009280)]]
+]
+
[endsect]

[section:tut Tutorial]

Added: sandbox/big_number/libs/multiprecision/performance/sf_performance.cpp
==============================================================================
--- (empty file)
+++ sandbox/big_number/libs/multiprecision/performance/sf_performance.cpp 2011-11-05 14:22:18 EDT (Sat, 05 Nov 2011)
@@ -0,0 +1,321 @@
+///////////////////////////////////////////////////////////////
+// Copyright 2011 John Maddock. Distributed under the Boost
+// Software License, Version 1.0. (See accompanying file
+// LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_
+
+#define BOOST_CHRONO_HEADER_ONLY
+#define BOOST_MATH_MAX_ROOT_ITERATION_POLICY 500
+
+#include <boost/math/bindings/mpfr.hpp>
+#include <boost/math/bindings/mpreal.hpp>
+#include <boost/multiprecision/mpfr.hpp>
+#include <boost/math/special_functions/bessel.hpp>
+#include <boost/math/tools/rational.hpp>
+#include <boost/math/distributions/non_central_t.hpp>
+#include <boost/chrono.hpp>
+#include <boost/array.hpp>
+
+template <class Real>
+Real test_bessel();
+
+template <class Clock>
+struct stopwatch
+{
+ typedef typename Clock::duration duration;
+ stopwatch()
+ {
+ m_start = Clock::now();
+ }
+ duration elapsed()
+ {
+ return Clock::now() - m_start;
+ }
+ void reset()
+ {
+ m_start = Clock::now();
+ }
+
+private:
+ typename Clock::time_point m_start;
+};
+
+template <class Real>
+Real test_bessel()
+{
+# define T double
+# include "libs/math/test/bessel_i_int_data.ipp"
+# include "libs/math/test/bessel_i_data.ipp"
+
+ Real r;
+
+ for(unsigned i = 0; i < bessel_i_int_data.size(); ++i)
+ {
+ r += boost::math::cyl_bessel_i(Real(bessel_i_int_data[i][0]), Real(bessel_i_int_data[i][1]));
+ }
+ for(unsigned i = 0; i < bessel_i_data.size(); ++i)
+ {
+ r += boost::math::cyl_bessel_i(Real(bessel_i_data[i][0]), Real(bessel_i_data[i][1]));
+ }
+
+#include "libs/math/test/bessel_j_int_data.ipp"
+ for(unsigned i = 0; i < bessel_j_int_data.size(); ++i)
+ {
+ r += boost::math::cyl_bessel_j(Real(bessel_j_int_data[i][0]), Real(bessel_j_int_data[i][1]));
+ }
+
+#include "libs/math/test/bessel_j_data.ipp"
+ for(unsigned i = 0; i < bessel_j_data.size(); ++i)
+ {
+ r += boost::math::cyl_bessel_j(Real(bessel_j_data[i][0]), Real(bessel_j_data[i][1]));
+ }
+
+#include "libs/math/test/bessel_j_large_data.ipp"
+ for(unsigned i = 0; i < bessel_j_large_data.size(); ++i)
+ {
+ r += boost::math::cyl_bessel_j(Real(bessel_j_large_data[i][0]), Real(bessel_j_large_data[i][1]));
+ }
+
+#include "libs/math/test/sph_bessel_data.ipp"
+ for(unsigned i = 0; i < sph_bessel_data.size(); ++i)
+ {
+ r += boost::math::sph_bessel(static_cast<unsigned>(sph_bessel_data[i][0]), Real(sph_bessel_data[i][1]));
+ }
+
+
+ return r;
+}
+
+template <class Real>
+Real test_polynomial()
+{
+ static const unsigned t[] = {
+ 2, 3, 4, 5, 6, 7, 8 };
+ Real result = 0;
+ for(Real k = 2; k < 1000; ++k)
+ result += boost::math::tools::evaluate_polynomial(t, k);
+
+ return result;
+}
+
+template <class Real>
+Real test_nct()
+{
+#define T double
+#include "libs/math/test/nct.ipp"
+
+ Real result = 0;
+ for(unsigned i = 0; i < nct.size(); ++i)
+ {
+ try{
+ result += quantile(boost::math::non_central_t_distribution<Real>(nct[i][0], nct[i][1]), nct[i][3]);
+ }
+ catch(const std::exception&)
+ {}
+ result += cdf(boost::math::non_central_t_distribution<Real>(nct[i][0], nct[i][1]), nct[i][2]);
+ }
+ return result;
+}
+
+unsigned allocation_count = 0;
+
+void *(*alloc_func_ptr) (size_t);
+void *(*realloc_func_ptr) (void *, size_t, size_t);
+void (*free_func_ptr) (void *, size_t);
+
+void *alloc_func(size_t n)
+{
+ ++allocation_count;
+ return (*alloc_func_ptr)(n);
+}
+
+void free_func(void * p, size_t n)
+{
+ (*free_func_ptr)(p, n);
+}
+
+void * realloc_func(void * p, size_t old, size_t n)
+{
+ ++allocation_count;
+ return (*realloc_func_ptr)(p, old, n);
+}
+
+int main()
+{
+ mp_get_memory_functions(&alloc_func_ptr, &realloc_func_ptr, &free_func_ptr);
+ mp_set_memory_functions(&alloc_func, &realloc_func, &free_func);
+
+ static const unsigned a[] = {
+ 2, 3, 4, 5, 6, 7, 8 };
+ boost::multiprecision::mpfr_float_50 result, x(2);
+ allocation_count = 0;
+ result = (((((a[6] * x + a[5]) * x + a[4]) * x + a[3]) * x + a[2]) * x + a[1]) * x + a[0];
+ std::cout << allocation_count << std::endl;
+ mpfr_class r, x2(2);
+ allocation_count = 0;
+ r = (((((a[6] * x2 + a[5]) * x2 + a[4]) * x2 + a[3]) * x2 + a[2]) * x2 + a[1]) * x2 + a[0];
+ std::cout << allocation_count << std::endl;
+ mpfr::mpreal r2, x3(2);
+ allocation_count = 0;
+ r2 = (((((a[6] * x3 + a[5]) * x3 + a[4]) * x3 + a[3]) * x3 + a[2]) * x3 + a[1]) * x3 + a[0];
+ std::cout << allocation_count << std::endl;
+
+ allocation_count = 0;
+ result = boost::math::tools::evaluate_polynomial(a, x);
+ std::cout << allocation_count << std::endl;
+ allocation_count = 0;
+ r = boost::math::tools::evaluate_polynomial(a, x2);
+ std::cout << allocation_count << std::endl;
+ allocation_count = 0;
+ r2 = boost::math::tools::evaluate_polynomial(a, x3);
+ std::cout << allocation_count << std::endl;
+
+
+ boost::chrono::duration<double> time;
+ stopwatch<boost::chrono::high_resolution_clock> c;
+ //
+ // 50 digits first:
+ //
+ std::cout << "Testing Bessel Functions....." << std::endl;
+ mpfr_set_default_prec(50 * 1000L / 301L);
+ mpfr::mpreal::set_default_prec(50 * 1000L / 301L);
+ c.reset();
+ test_bessel<boost::multiprecision::mpfr_float_50>();
+ time = c.elapsed();
+ std::cout << "Time for mpfr_float_50 = " << time << std::endl;
+ std::cout << "Total allocations for mpfr_float_50 = " << allocation_count << std::endl;
+ allocation_count = 0;
+ c.reset();
+ test_bessel<mpfr_class>();
+ time = c.elapsed();
+ std::cout << "Time for mpfr_class (50 digits) = " << time << std::endl;
+ std::cout << "Total allocations for mpfr_class (50 digits) = " << allocation_count << std::endl;
+ allocation_count = 0;
+ c.reset();
+ test_bessel<mpfr::mpreal>();
+ time = c.elapsed();
+ std::cout << "Time for mpreal (50 digits) = " << time << std::endl;
+ std::cout << "Total allocations for mpreal (50 digits) = " << allocation_count << std::endl;
+ allocation_count = 0;
+ //
+ // Then 100 digits:
+ //
+ mpfr_set_default_prec(100 * 1000L / 301L);
+ mpfr::mpreal::set_default_prec(100 * 1000L / 301L);
+ c.reset();
+ test_bessel<boost::multiprecision::mpfr_float_100>();
+ time = c.elapsed();
+ std::cout << "Time for mpfr_float_100 = " << time << std::endl;
+ std::cout << "Total allocations for mpfr_float_50 = " << allocation_count << std::endl;
+ allocation_count = 0;
+ c.reset();
+ test_bessel<mpfr_class>();
+ time = c.elapsed();
+ std::cout << "Time for mpfr_class (100 digits) = " << time << std::endl;
+ std::cout << "Total allocations for mpfr_class (100 digits) = " << allocation_count << std::endl;
+ allocation_count = 0;
+ c.reset();
+ test_bessel<mpfr::mpreal>();
+ time = c.elapsed();
+ std::cout << "Time for mpreal (100 digits) = " << time << std::endl;
+ std::cout << "Total allocations for mpreal (100 digits) = " << allocation_count << std::endl;
+ allocation_count = 0;
+
+ //
+ // 50 digits first:
+ //
+ c.reset();
+ std::cout << "Testing Polynomial Evaluation....." << std::endl;
+ mpfr_set_default_prec(50 * 1000L / 301L);
+ mpfr::mpreal::set_default_prec(50 * 1000L / 301L);
+ c.reset();
+ test_polynomial<boost::multiprecision::mpfr_float_50>();
+ time = c.elapsed();
+ std::cout << "Time for mpfr_float_50 = " << time << std::endl;
+ std::cout << "Total allocations for mpfr_float_50 = " << allocation_count << std::endl;
+ allocation_count = 0;
+ c.reset();
+ test_polynomial<mpfr_class>();
+ time = c.elapsed();
+ std::cout << "Time for mpfr_class (50 digits) = " << time << std::endl;
+ std::cout << "Total allocations for mpfr_class (50 digits) = " << allocation_count << std::endl;
+ allocation_count = 0;
+ c.reset();
+ test_polynomial<mpfr::mpreal>();
+ time = c.elapsed();
+ std::cout << "Time for mpreal (50 digits) = " << time << std::endl;
+ std::cout << "Total allocations for mpreal (50 digits = " << allocation_count << std::endl;
+ allocation_count = 0;
+ //
+ // Then 100 digits:
+ //
+ mpfr_set_default_prec(100 * 1000L / 301L);
+ mpfr::mpreal::set_default_prec(100 * 1000L / 301L);
+ c.reset();
+ test_polynomial<boost::multiprecision::mpfr_float_100>();
+ time = c.elapsed();
+ std::cout << "Time for mpfr_float_100 = " << time << std::endl;
+ std::cout << "Total allocations for mpfr_float_100 = " << allocation_count << std::endl;
+ allocation_count = 0;
+ c.reset();
+ test_polynomial<mpfr_class>();
+ time = c.elapsed();
+ std::cout << "Time for mpfr_class (100 digits) = " << time << std::endl;
+ std::cout << "Total allocations for mpfr_class (100 digits) = " << allocation_count << std::endl;
+ allocation_count = 0;
+ c.reset();
+ test_polynomial<mpfr::mpreal>();
+ time = c.elapsed();
+ std::cout << "Time for mpreal (100 digits) = " << time << std::endl;
+ std::cout << "Total allocations for mpreal (100 digits) = " << allocation_count << std::endl;
+ allocation_count = 0;
+
+ //
+ // 50 digits first:
+ //
+ c.reset();
+ std::cout << "Testing Non-Central T....." << std::endl;
+ mpfr_set_default_prec(50 * 1000L / 301L);
+ mpfr::mpreal::set_default_prec(50 * 1000L / 301L);
+ c.reset();
+ test_nct<boost::multiprecision::mpfr_float_50>();
+ time = c.elapsed();
+ std::cout << "Time for mpfr_float_50 = " << time << std::endl;
+ std::cout << "Total allocations for mpfr_float_50 = " << allocation_count << std::endl;
+ allocation_count = 0;
+ c.reset();
+ test_nct<mpfr_class>();
+ time = c.elapsed();
+ std::cout << "Time for mpfr_class (50 digits) = " << time << std::endl;
+ std::cout << "Total allocations for mpfr_class (50 digits) = " << allocation_count << std::endl;
+ allocation_count = 0;
+ c.reset();
+ test_nct<mpfr::mpreal>();
+ time = c.elapsed();
+ std::cout << "Time for mpreal (50 digits) = " << time << std::endl;
+ std::cout << "Total allocations for mpreal (50 digits) = " << allocation_count << std::endl;
+ allocation_count = 0;
+ //
+ // Then 100 digits:
+ //
+ mpfr_set_default_prec(100 * 1000L / 301L);
+ mpfr::mpreal::set_default_prec(100 * 1000L / 301L);
+ c.reset();
+ test_nct<boost::multiprecision::mpfr_float_100>();
+ time = c.elapsed();
+ std::cout << "Time for mpfr_float_100 = " << time << std::endl;
+ std::cout << "Total allocations for mpfr_float_100 = " << allocation_count << std::endl;
+ allocation_count = 0;
+ c.reset();
+ test_nct<mpfr_class>();
+ time = c.elapsed();
+ std::cout << "Time for mpfr_class (100 digits) = " << time << std::endl;
+ std::cout << "Total allocations for mpfr_class (100 digits) = " << allocation_count << std::endl;
+ allocation_count = 0;
+ c.reset();
+ test_nct<mpfr::mpreal>();
+ time = c.elapsed();
+ std::cout << "Time for mpreal (100 digits) = " << time << std::endl;
+ std::cout << "Total allocations for mpreal (100 digits) = " << allocation_count << std::endl;
+ allocation_count = 0;
+}
+

Date view	Thread view	Subject view	Author view

Boost-Commit list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk