Results for low dimensions. More data would exceed the mailing list limits:

#  m axpy_prod: t1  MFLOPS     prod: t1    MFLOPS   Diff nrm3 mixed:   t2    MFLOPS   Diff nrm4  gemm:   t2    MFLOPS   Diff nrm5
  1   2.1263e-07   9.40601  1.32802e-07     15.06           0 1.36006e-07   14.7052           0 6.31318e-07   3.16798           0
  2  2.28189e-07   70.1173  1.37767e-07   116.138           0 1.59801e-07   100.125           0   6.653e-07   24.0493           0
  3    2.649e-07   203.851   1.5541e-07   347.468           0 1.54267e-07   350.042           0 6.98766e-07   77.2791           0
  4  3.35269e-07   381.783   2.4183e-07   529.297           0 2.12891e-07   601.247           0 6.65688e-07   192.282           0
  5  3.53868e-07   706.478  2.30977e-07   1082.36           0 2.53215e-07   987.303           0 7.07933e-07   353.141           0
  6   4.2987e-07   1004.95  2.59713e-07   1663.37           0 2.54867e-07      1695           0 8.17448e-07   528.474           0
  7  5.39621e-07   1271.26  4.51043e-07   1520.92 7.98975e-09 5.12948e-07   1337.37 7.98975e-09 8.76363e-07   782.781           0
  8  6.18993e-07    1654.3  6.38988e-07   1602.53 4.17673e-09 6.37931e-07   1605.19 4.17673e-09 8.92556e-07   1147.27           0
  9  7.73683e-07   1884.49  7.26336e-07   2007.34 3.30697e-09 8.00656e-07   1821.01 3.30697e-09 1.09762e-06   1328.33           0
 10  9.27569e-07   2156.17  8.31827e-07   2404.35 1.94317e-09 8.72131e-07   2293.23 1.94317e-09 1.16572e-06   1715.68           0
 11  1.13882e-06    2337.5  1.03275e-06   2577.58 1.27501e-09 1.08775e-06   2447.25 1.27501e-09 1.16439e-06   2286.17           0
 12  1.26427e-06   2733.59  1.40013e-06   2468.34 8.50076e-10 1.39562e-06   2476.32 8.50076e-10 1.01202e-06   3414.97           0
 13   1.5751e-06   2789.66  1.64811e-06   2666.09 5.39864e-10 1.66862e-06   2633.32 5.39864e-10 1.61517e-06   2720.45           0
 14  1.79595e-06   3055.77  1.89937e-06   2889.37 4.08632e-10  1.6485e-06   3329.09           0 1.65016e-06   3325.73           0
 15  2.14056e-06   3153.37  2.24248e-06   3010.06 2.73316e-10  1.6875e-06   3999.99           0 1.80164e-06   3746.59           0
 16  2.38996e-06   3427.67  2.63386e-06   3110.27 2.30152e-10 1.74627e-06   4691.14           0 1.91648e-06   4274.49           0
 17  2.93315e-06   3349.98  3.08031e-06   3189.94 1.85538e-10 2.17697e-06   4513.62           0 2.13505e-06   4602.23           0
 18   3.3771e-06   3453.85  3.23863e-06   3601.52 1.20251e-10 2.23225e-06   5225.23           0 2.36877e-06   4924.07           0
 19  4.19699e-06   3268.53  4.02621e-06   3407.17 1.07796e-10 2.29651e-06    5973.4           0 2.44714e-06   5605.72           0
 20  4.27777e-06   3740.27  4.86115e-06    3291.4 8.37665e-11 2.26798e-06   7054.74           0 2.44016e-06   6556.96           0
 21  5.58038e-06   3319.13  5.51606e-06   3357.83 5.93714e-11 2.61705e-06   7077.43           0 2.90197e-06   6382.56           0
 22  5.46208e-06   3898.88  5.50258e-06   3870.19 5.76987e-11 2.85448e-06   7460.56           0 3.09923e-06   6871.39           0
 23  7.26813e-06   3348.04  6.48407e-06   3752.89 4.47169e-11 3.03986e-06   8004.98           0 3.16566e-06   7686.86           0
 24  6.56421e-06   4211.93  7.20581e-06    3836.9 3.61275e-11 2.84288e-06   9725.35           0 2.81577e-06   9818.99           0
 25  7.97135e-06   3920.29  7.80654e-06   4003.06 3.02957e-11 4.04575e-06   7724.16           0 4.15001e-06   7530.11           0
 26  8.59272e-06    4090.9  8.46934e-06    4150.5 2.53217e-11  4.1795e-06   8410.58           0 4.36958e-06   8044.71           0
 27  1.05527e-05   3730.41  9.66865e-06   4071.51 1.97479e-11 4.24268e-06   9278.57           0 4.64476e-06   8475.37           0
 28  9.77679e-06   4490.63   1.0918e-05   4021.26 1.71505e-11 4.41728e-06   9939.14           0 4.55165e-06   9645.73           0
 29  1.23574e-05   3947.28  1.15308e-05   4230.22 1.54399e-11 4.96383e-06   9826.69           0 5.27042e-06   9255.05           0
 30  1.25312e-05   4309.24  1.23192e-05    4383.4 1.38837e-11 5.36616e-06   10063.1           0 5.57707e-06   9682.51           0
 31  1.41019e-05   4225.11  1.41554e-05   4209.15 1.12822e-11 5.56749e-06   10701.8           0 5.87983e-06   10133.3           0
 32  1.44935e-05   4521.76  1.74419e-05   3757.38  9.5502e-12 5.91291e-06   11083.5           0 6.07622e-06   10785.7           0
 33  1.68922e-05   4254.86  1.62224e-05   4430.55 8.00562e-12 6.51645e-06   11029.6           0 6.62821e-06   10843.7           0
 34  1.73001e-05    4543.8  1.68924e-05   4653.46 7.54927e-12 6.83433e-06   11501.9           0 6.95343e-06   11304.9           0
 35  2.07166e-05    4139.2  2.15962e-05   3970.61 6.52939e-12 7.06462e-06   12137.9           0 7.53811e-06   11375.5           0
 36  1.98326e-05   4704.97  2.13473e-05   4371.14 5.68874e-12   6.703e-06   13920.9           0 6.99365e-06   13342.4           0
 37   2.3838e-05   4249.78  2.23655e-05   4529.56 5.11318e-12 8.87253e-06   11417.9           0 9.13862e-06   11085.5           0
 38  2.35903e-05   4652.09  2.48122e-05   4422.99 4.71306e-12 9.24238e-06     11874           0 9.27922e-06   11826.9           0
 39  2.79913e-05   4238.39  2.64576e-05   4484.09 4.20714e-12 9.68511e-06   12249.5           0 9.95689e-06   11915.2           0
 40  2.60131e-05    4920.6   2.9098e-05   4398.93 3.42002e-12 9.80308e-06   13057.1           0 1.04198e-05   12284.3           0
 41  3.13419e-05   4398.01  3.03942e-05   4535.14 3.13757e-12 1.07587e-05   12812.2           0 1.10016e-05   12529.3           0
 42  3.10015e-05   4779.64  3.20343e-05   4625.54 2.91245e-12 1.09989e-05   13471.9           0 1.16031e-05   12770.4           0
 43   3.6527e-05   4353.33  3.49908e-05   4544.46 2.71446e-12 1.13164e-05   14051.7           0 1.20516e-05   13194.4           0
 44  3.36654e-05   5060.62  3.86435e-05   4408.71 2.49076e-12 1.16151e-05   14667.8           0 1.21377e-05   14036.2           0
 45  3.95282e-05   4610.63  3.98562e-05   4572.69 2.12037e-12 1.26784e-05   14374.8           0 1.32723e-05   13731.6           0
 46  3.96351e-05    4911.6  4.17105e-05   4667.22 1.96734e-12 1.27302e-05   15292.2           0 1.34346e-05   14490.3           0
 47  4.63424e-05   4480.69  4.50811e-05   4606.05 1.77515e-12 1.33133e-05   15596.9           0 1.39354e-05   14900.7           0
 48  4.31748e-05   5122.99   5.0325e-05   4395.11 1.75073e-12 1.32491e-05   16694.3           0  1.3501e-05   16382.8           0
 49  4.93001e-05   4772.77  5.11402e-05   4601.03 1.48788e-12  1.6222e-05   14504.9           0 1.72531e-05     13638           0


The first group is the legacy axpy_prod(), the second group is the legacy prod(), the third group is the legacy prod() for low dimensions and gemm() for high dimensions, and the fourth group is gemm().
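
For illustration, the size-based switch behind the third group could look roughly like the sketch below. Everything in it is an assumption made for the example: the names small_product, blocked_product, mixed_product and kSmallCutoff are hypothetical, the two kernels are plain stand-ins rather than the actual uBLAS prod() or gemm() code, and the cutoff of 14 is only a guess based on where the Diff column of the third group drops to zero in the table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major m x m matrix stored in a flat vector (illustrative type only).
using Matrix = std::vector<double>;

// Hypothetical cutoff: below it the simple kernel is used, otherwise the blocked one.
constexpr std::size_t kSmallCutoff = 14;

enum class Kernel { Small, Blocked };

// Stand-in for the legacy product: naive triple loop, no setup cost.
void small_product(const Matrix& a, const Matrix& b, Matrix& c, std::size_t m) {
    assert(a.size() == m * m && b.size() == m * m);
    c.assign(m * m, 0.0);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < m; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < m; ++k)
                sum += a[i * m + k] * b[k * m + j];
            c[i * m + j] = sum;
        }
}

// Stand-in for a gemm-style kernel: the same product, computed block by block.
void blocked_product(const Matrix& a, const Matrix& b, Matrix& c, std::size_t m) {
    assert(a.size() == m * m && b.size() == m * m);
    constexpr std::size_t bs = 8;  // block edge, chosen arbitrarily for the sketch
    c.assign(m * m, 0.0);
    for (std::size_t ii = 0; ii < m; ii += bs)
        for (std::size_t kk = 0; kk < m; kk += bs)
            for (std::size_t jj = 0; jj < m; jj += bs)
                for (std::size_t i = ii; i < std::min(ii + bs, m); ++i)
                    for (std::size_t k = kk; k < std::min(kk + bs, m); ++k) {
                        const double aik = a[i * m + k];
                        for (std::size_t j = jj; j < std::min(jj + bs, m); ++j)
                            c[i * m + j] += aik * b[k * m + j];
                    }
}

// Runtime switch: one size comparison per call decides which kernel runs.
Kernel mixed_product(const Matrix& a, const Matrix& b, Matrix& c, std::size_t m) {
    if (m < kSmallCutoff) {
        small_product(a, b, c, m);
        return Kernel::Small;
    }
    blocked_product(a, b, c, m);
    return Kernel::Blocked;
}
```

The extra cost of the switch is a single comparison per call, which is the overhead that matters most for the smallest matrices.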

Since the legacy version is based on expression templates, it may offer further advantages when operations are chained.

I put some defines in place that make it possible to force the legacy version as the default, as opposed to the runtime-switched version.
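
As a rough sketch of how such a define might interact with the runtime switch: the macro name EXAMPLE_FORCE_LEGACY_PROD, the Path enum, choose_path() and the cutoff value are all hypothetical and chosen only for this example; the actual defines in the patch may be named and structured differently.

```cpp
#include <cstddef>

// Hypothetical macro: uncomment (or pass -DEXAMPLE_FORCE_LEGACY_PROD) to
// always take the legacy path and skip the runtime size check.
// #define EXAMPLE_FORCE_LEGACY_PROD

enum class Path { Legacy, Gemm };

// Hypothetical size cutoff used by the runtime-switched version.
constexpr std::size_t kCutoff = 14;

// Decides which product implementation a call of dimension m would use.
inline Path choose_path(std::size_t m) {
#if defined(EXAMPLE_FORCE_LEGACY_PROD)
    (void)m;              // size is irrelevant when the legacy path is forced
    return Path::Legacy;
#else
    return m < kCutoff ? Path::Legacy : Path::Gemm;
#endif
}
```

With the macro defined, the decision is made at compile time, so the branch disappears entirely from the generated code.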

Imre


On Friday, 11 March 2016, 14:21, Nasos Iliopoulos <nasos_i@hotmail.com> wrote:


Regardless, these are great figures.

Can you please run them comparing the simple uBlas implementation with the gemm-based one on a single thread, for matrices from 2 to 100? I wonder when the control statement starts to play a role.

What do you think should be the plan to switch from multi-core to single-threaded, so as to not get all the communication hit for smaller matrices?


- Nasos