Hi,

I was trying to understand the algorithm inside the main divide routine. I struggled with this bit of code:

 

      if ((prem[r_order] <= py[y_order]) && (r_order > 0))

      {

         double_limb_type a = (static_cast<double_limb_type>(prem[r_order]) << CppInt1::limb_bits) | prem[r_order - 1];

         double_limb_type b = py[y_order];

         double_limb_type v = a / b;

         if (v <= CppInt1::max_limb_value)

         {

            guess = static_cast<limb_type>(v);

            --r_order;

         }

      }

I was trying to understand why we essential set guess to 1 in the case that the division is too big.

I finally realized that the only way we can have v > CppInt1::max_limb_value is if we have prem[r_order] == py[y_order]. Some simple inequalities with floor can convince you of this.

So in the case of equality we do an expensive divide and test when we could move the test outside like this:

 

      limb_type guess = 1;

      if (r_order == 0) { ß------ do this first to simplify

          guess = prem[0] / py[y_order];

      } else if (prem[r_order] == py[y_order]) { ß--------- handle it here to avoid the divide

          guess = 1;

This leads to another question. Why is this line:

guess = prem[0] / py[y_order];

not this line:

guess = prem[0] / py[0];

 

Logically for division it must be the second line. In fact this loop has the invariant that r_order >= y_order. That makes sense for division. So the divisor must be py[0]. That suggests a divisor with one element and we handle that case via a special divide. So that code is not needed I believe.  We end up with this:

 

      BOOST_ASSERT(r_order >= y_order && y_order >= 1);

      if (prem[r_order] == py[y_order]) {

          guess = 1;

      } else if (prem[r_order] < py[y_order]) {

         double_limb_type a = (static_cast<double_limb_type>(prem[r_order]) << CppInt1::limb_bits) | prem[r_order - 1];

         double_limb_type b = py[y_order];

         double_limb_type v = a / b;

         BOOST_ASSERT(v <= CppInt1::max_limb_value);

         guess = static_cast<limb_type>(v);

         --r_order;

      } else {

         double_limb_type a = (static_cast<double_limb_type>(prem[r_order]) << CppInt1::limb_bits) | prem[r_order - 1];

         double_limb_type b = (y_order > 0) ? (static_cast<double_limb_type>(py[y_order]) << CppInt1::limb_bits) | py[y_order - 1] : (static_cast<double_limb_type>(py[y_order]) << CppInt1::limb_bits);

         BOOST_ASSERT(b);

         double_limb_type v = a / b;

         guess              = static_cast<limb_type>(v);

      }

      BOOST_ASSERT(guess); // If the guess ever gets to zero we go on forever....

 

 

Some simple random divide tests seem to show a little over 2% improvement from doing this. Of course this could just be code layout changes.

Just thinking that the 64 bit values are random would make me think this case is rare but I guess we subtract out multiples of the divisor and cause this to happen more for reasons that are hard to understand (for me anyway).

I see the path hit a lot.

Neill.