* Reference:
    [Budd88]
      _An Apl Compiler_
      Timothy Budd
      http://web.engr.oregonstate.edu/~budd/Books/aplc/

** Section: 5.1 Expansion Vectors

  Paragraph 1 of this section in [Budd88] contains:
  
    Consider an expression, A, of rank n.  Let the vector:
      (s_1,s_2,...,s_n)
    represent the shape of A.  The vector:
      (e_1,e_2,...,e_n)
    defined by the recursive equations:
  
      e_n = 1
      e_i = s_(i+1)*e_(i+1)                               (5.1)
      
    is called the 'expansion vector' for A.  The value e_i is equal to
    the distance between adjacent elements along dimension i in the
    ravel ordering of A.
    
  Here, 'ravel ordering of A' means, in essense, the actual ordering
  in memory. 

** Section: 5.7 Inner Product and Decode

  Paragraph 1 of this section contains:
  
    let L_1,L_2,...,L_n represent the size of the left argument, and
    R_1,R_2,...,R_m represent the size of the right argument.  It must
    be true that L_n == R_1.  The shape of the result is given by...
    L_1,L_2,...,L_n-1,R_2,...,R_m.
    
  On p. 79 of [Budd88], there's this pseudo-code:
  
    _Shape Phase_
    
      L = shape of right argument in first position.
      e_1 = expansion vector of right argument in first position.
        
    _Value Phase_
    
      offset'_L = (offset div e_1)*L+(L-1)
      offset'_R = (offset mod e_1)+(L-1)*e_1
      result = identity for first function(+, for example)
      _for_ counter=L-1 _to_ 0 _by_ -1 _do_ _begin_
        { compute value_L of left subexpresssion}
        { compute value_R of right subexpression}
        result = value_L*value_R+result
        offset'_L = offset'_L -1
        offset'_R = offset'_R -e_1
        _end_
   
* Translation of [Budd88] inner_product pseudo to code for any storage
  order. 

  The 'expansion vector' of section 5.1 of [Budd88] corresponds to
  what's called a.strides() in boost.multi_array:
  
    http://www.boost.org/doc/libs/1_46_1/libs/multi_array/doc/reference.html#id836010
   
  the difference being a.strides() need not be monotonically dereasing
  in size as the index increases, as is true with the expansion
  vectors described by the equation (5.1) of [Budd88] as described
  above. 
    
  The value, e_1 is described in the 2nd paragraph of section 5.7 as:
  
    the size of the right subexpression 
    without the first dimension
    
  The size of the right argument is:
  
    e_0 = R_1*R_2*R_3*...*R_m
    
  and, according to equation(5.1) e_1 would be:
  
    e_1 =     R_2*R_3*...*R_m
    
  and, in this case, e_1 would be "the size of the right subexpression
  without the first dimension".  However, this is only true if e_1 is
  calculated last, and, for example, in the case of
  fortrans_storage_order, e-1 is calculated 1st and equal to 1.
  
  To account for storage orders different from the c_storage_order, a
  new value must be substittued for e_1.  That value being:
  
    space_1_R
      =a_right.num_elements()/length_common
      
  However, that value is substituted for e_1 only where the 'div' and
  'mod' operations occur:
  
      offset'_L = (offset div space_1_R)*L+(L-1)
      offset'_R = (offset mod space_1_R)+(L-1)*e_1
  
  Elsewhere, it remains OK since elsewhere
  it's used to calculate the change in offset'_R and there, it has the
  right meaning, as descrition in section 5.1:
  
    the distance between adjacent elements along dimension i
    
  The offset'_L calculation needs to be modified also, because, unlike
  with c_storage_order, where e_n==1(see equation (5.1)), the actual
  stride for the n-th dimension may be >1.  Thus, let:
  
    stride_n_L = stride for the n-th dimension of left argument.
    
  the the calculations for offset'_L need to be changed as follows:
  
      offset'_L = (offset div space_1_R)*L+(L-1)*stride_n_L
        offset'_L = offset'_L -stride_n_L
        
  To be consistent, rename e_1 to stride_1_R.  Then, with these
  changes, the 'Value Phase' code becomes:
      
    _Value Phase_any_storage_order_
    
      offset'_L = (offset div space_1_R)*L+(L-1)*stride_n_L
      offset'_R = (offset mod space_1_R)  +(L-1)*stride_1_R
      result = identity for first function(+, for example)
      _for_ counter=L-1 _to_ 0 _by_ -1 _do_ _begin_
        { compute value_L of left subexpresssion}
        { compute value_R of right subexpression}
        result = value_L*value_R+result
        offset'_L = offset'_L -stride_n_L
        offset'_R = offset'_R -stride_1_R
        _end_