Results for low dimensions (m = 1 to 49). Including more data would exceed the mailing list size limits:
 m | 1: axpy_prod         | 2: prod                          | 3: mixed                         | 4: gemm
   | t1 [s]      MFLOPS   | t1 [s]      MFLOPS   Diff nrm3   | t2 [s]      MFLOPS   Diff nrm4   | t2 [s]      MFLOPS   Diff nrm5
---+----------------------+----------------------------------+----------------------------------+----------------------
 1 | 2.1263e-07  9.40601  | 1.32802e-07 15.06    0           | 1.36006e-07 14.7052  0           | 6.31318e-07 3.16798  0
 2 | 2.28189e-07 70.1173  | 1.37767e-07 116.138  0           | 1.59801e-07 100.125  0           | 6.653e-07   24.0493  0
 3 | 2.649e-07   203.851  | 1.5541e-07  347.468  0           | 1.54267e-07 350.042  0           | 6.98766e-07 77.2791  0
 4 | 3.35269e-07 381.783  | 2.4183e-07  529.297  0           | 2.12891e-07 601.247  0           | 6.65688e-07 192.282  0
 5 | 3.53868e-07 706.478  | 2.30977e-07 1082.36  0           | 2.53215e-07 987.303  0           | 7.07933e-07 353.141  0
 6 | 4.2987e-07  1004.95  | 2.59713e-07 1663.37  0           | 2.54867e-07 1695     0           | 8.17448e-07 528.474  0
 7 | 5.39621e-07 1271.26  | 4.51043e-07 1520.92  7.98975e-09 | 5.12948e-07 1337.37  7.98975e-09 | 8.76363e-07 782.781  0
 8 | 6.18993e-07 1654.3   | 6.38988e-07 1602.53  4.17673e-09 | 6.37931e-07 1605.19  4.17673e-09 | 8.92556e-07 1147.27  0
 9 | 7.73683e-07 1884.49  | 7.26336e-07 2007.34  3.30697e-09 | 8.00656e-07 1821.01  3.30697e-09 | 1.09762e-06 1328.33  0
10 | 9.27569e-07 2156.17  | 8.31827e-07 2404.35  1.94317e-09 | 8.72131e-07 2293.23  1.94317e-09 | 1.16572e-06 1715.68  0
11 | 1.13882e-06 2337.5   | 1.03275e-06 2577.58  1.27501e-09 | 1.08775e-06 2447.25  1.27501e-09 | 1.16439e-06 2286.17  0
12 | 1.26427e-06 2733.59  | 1.40013e-06 2468.34  8.50076e-10 | 1.39562e-06 2476.32  8.50076e-10 | 1.01202e-06 3414.97  0
13 | 1.5751e-06  2789.66  | 1.64811e-06 2666.09  5.39864e-10 | 1.66862e-06 2633.32  5.39864e-10 | 1.61517e-06 2720.45  0
14 | 1.79595e-06 3055.77  | 1.89937e-06 2889.37  4.08632e-10 | 1.6485e-06  3329.09  0           | 1.65016e-06 3325.73  0
15 | 2.14056e-06 3153.37  | 2.24248e-06 3010.06  2.73316e-10 | 1.6875e-06  3999.99  0           | 1.80164e-06 3746.59  0
16 | 2.38996e-06 3427.67  | 2.63386e-06 3110.27  2.30152e-10 | 1.74627e-06 4691.14  0           | 1.91648e-06 4274.49  0
17 | 2.93315e-06 3349.98  | 3.08031e-06 3189.94  1.85538e-10 | 2.17697e-06 4513.62  0           | 2.13505e-06 4602.23  0
18 | 3.3771e-06  3453.85  | 3.23863e-06 3601.52  1.20251e-10 | 2.23225e-06 5225.23  0           | 2.36877e-06 4924.07  0
19 | 4.19699e-06 3268.53  | 4.02621e-06 3407.17  1.07796e-10 | 2.29651e-06 5973.4   0           | 2.44714e-06 5605.72  0
20 | 4.27777e-06 3740.27  | 4.86115e-06 3291.4   8.37665e-11 | 2.26798e-06 7054.74  0           | 2.44016e-06 6556.96  0
21 | 5.58038e-06 3319.13  | 5.51606e-06 3357.83  5.93714e-11 | 2.61705e-06 7077.43  0           | 2.90197e-06 6382.56  0
22 | 5.46208e-06 3898.88  | 5.50258e-06 3870.19  5.76987e-11 | 2.85448e-06 7460.56  0           | 3.09923e-06 6871.39  0
23 | 7.26813e-06 3348.04  | 6.48407e-06 3752.89  4.47169e-11 | 3.03986e-06 8004.98  0           | 3.16566e-06 7686.86  0
24 | 6.56421e-06 4211.93  | 7.20581e-06 3836.9   3.61275e-11 | 2.84288e-06 9725.35  0           | 2.81577e-06 9818.99  0
25 | 7.97135e-06 3920.29  | 7.80654e-06 4003.06  3.02957e-11 | 4.04575e-06 7724.16  0           | 4.15001e-06 7530.11  0
26 | 8.59272e-06 4090.9   | 8.46934e-06 4150.5   2.53217e-11 | 4.1795e-06  8410.58  0           | 4.36958e-06 8044.71  0
27 | 1.05527e-05 3730.41  | 9.66865e-06 4071.51  1.97479e-11 | 4.24268e-06 9278.57  0           | 4.64476e-06 8475.37  0
28 | 9.77679e-06 4490.63  | 1.0918e-05  4021.26  1.71505e-11 | 4.41728e-06 9939.14  0           | 4.55165e-06 9645.73  0
29 | 1.23574e-05 3947.28  | 1.15308e-05 4230.22  1.54399e-11 | 4.96383e-06 9826.69  0           | 5.27042e-06 9255.05  0
30 | 1.25312e-05 4309.24  | 1.23192e-05 4383.4   1.38837e-11 | 5.36616e-06 10063.1  0           | 5.57707e-06 9682.51  0
31 | 1.41019e-05 4225.11  | 1.41554e-05 4209.15  1.12822e-11 | 5.56749e-06 10701.8  0           | 5.87983e-06 10133.3  0
32 | 1.44935e-05 4521.76  | 1.74419e-05 3757.38  9.5502e-12  | 5.91291e-06 11083.5  0           | 6.07622e-06 10785.7  0
33 | 1.68922e-05 4254.86  | 1.62224e-05 4430.55  8.00562e-12 | 6.51645e-06 11029.6  0           | 6.62821e-06 10843.7  0
34 | 1.73001e-05 4543.8   | 1.68924e-05 4653.46  7.54927e-12 | 6.83433e-06 11501.9  0           | 6.95343e-06 11304.9  0
35 | 2.07166e-05 4139.2   | 2.15962e-05 3970.61  6.52939e-12 | 7.06462e-06 12137.9  0           | 7.53811e-06 11375.5  0
36 | 1.98326e-05 4704.97  | 2.13473e-05 4371.14  5.68874e-12 | 6.703e-06   13920.9  0           | 6.99365e-06 13342.4  0
37 | 2.3838e-05  4249.78  | 2.23655e-05 4529.56  5.11318e-12 | 8.87253e-06 11417.9  0           | 9.13862e-06 11085.5  0
38 | 2.35903e-05 4652.09  | 2.48122e-05 4422.99  4.71306e-12 | 9.24238e-06 11874    0           | 9.27922e-06 11826.9  0
39 | 2.79913e-05 4238.39  | 2.64576e-05 4484.09  4.20714e-12 | 9.68511e-06 12249.5  0           | 9.95689e-06 11915.2  0
40 | 2.60131e-05 4920.6   | 2.9098e-05  4398.93  3.42002e-12 | 9.80308e-06 13057.1  0           | 1.04198e-05 12284.3  0
41 | 3.13419e-05 4398.01  | 3.03942e-05 4535.14  3.13757e-12 | 1.07587e-05 12812.2  0           | 1.10016e-05 12529.3  0
42 | 3.10015e-05 4779.64  | 3.20343e-05 4625.54  2.91245e-12 | 1.09989e-05 13471.9  0           | 1.16031e-05 12770.4  0
43 | 3.6527e-05  4353.33  | 3.49908e-05 4544.46  2.71446e-12 | 1.13164e-05 14051.7  0           | 1.20516e-05 13194.4  0
44 | 3.36654e-05 5060.62  | 3.86435e-05 4408.71  2.49076e-12 | 1.16151e-05 14667.8  0           | 1.21377e-05 14036.2  0
45 | 3.95282e-05 4610.63  | 3.98562e-05 4572.69  2.12037e-12 | 1.26784e-05 14374.8  0           | 1.32723e-05 13731.6  0
46 | 3.96351e-05 4911.6   | 4.17105e-05 4667.22  1.96734e-12 | 1.27302e-05 15292.2  0           | 1.34346e-05 14490.3  0
47 | 4.63424e-05 4480.69  | 4.50811e-05 4606.05  1.77515e-12 | 1.33133e-05 15596.9  0           | 1.39354e-05 14900.7  0
48 | 4.31748e-05 5122.99  | 5.0325e-05  4395.11  1.75073e-12 | 1.32491e-05 16694.3  0           | 1.3501e-05  16382.8  0
49 | 4.93001e-05 4772.77  | 5.11402e-05 4601.03  1.48788e-12 | 1.6222e-05  14504.9  0           | 1.72531e-05 13638    0
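The MFLOPS columns can be reproduced from the times with the usual flop count for a square m x m product, 2*m^3 operations. For example, m = 2 with t1 = 2.28189e-07 s gives 16 / 2.28189e-07 / 10^6, about 70.12 MFLOPS, which matches the first group's entry. A small C++ sketch of that conversion (product_flops() and mflops() are illustrative helper names, not uBLAS functions):

```cpp
#include <cassert>

// Usual flop count for C = A * B with square m x m matrices: m*m output
// entries, each counted as m multiplies plus m adds, i.e. 2*m^3 in total.
double product_flops(int m) {
    assert(m > 0);
    const double md = static_cast<double>(m);
    return 2.0 * md * md * md;
}

// Convert a measured time per product (in seconds) to MFLOPS (10^6 flop/s).
double mflops(int m, double seconds) {
    assert(seconds > 0.0);
    return product_flops(m) / seconds / 1.0e6;
}
```

Applying mflops() to any time in the table should give the MFLOPS value printed next to it, up to rounding.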
The first group is the legacy axpy_prod(), the second group is the legacy prod(), the third group ("mixed") uses the legacy prod() for low dimensions and gemm() for high dimensions, and the fourth group is gemm().
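With the timings split into groups like this, the size at which gemm() starts to pay off can be read off mechanically. A minimal sketch, assuming the samples are sorted by m and there is one time per kernel per size (the Sample type and crossover() function are hypothetical helpers, not uBLAS API). It reports the smallest m from which gemm() is faster both at that size and at every larger measured size.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

struct Sample {
    std::size_t m;    // matrix dimension
    double t_legacy;  // seconds per product, legacy kernel
    double t_gemm;    // seconds per product, gemm kernel
};

// Smallest m from which gemm is faster for that m and for every larger
// measured m; std::nullopt if gemm is not ahead at the largest size.
// Samples are assumed to be sorted by increasing m.
std::optional<std::size_t> crossover(const std::vector<Sample>& samples) {
    std::optional<std::size_t> result;
    for (auto it = samples.rbegin(); it != samples.rend(); ++it) {
        if (it->t_gemm < it->t_legacy) {
            result = it->m;  // gemm still ahead, extend the range downward
        } else {
            break;           // legacy wins here, so stop scanning
        }
    }
    return result;
}
```

Fed with the prod() and gemm() times for m = 10 to 13 from the table, it returns 12; fed with the axpy_prod() and gemm() times for m = 12 to 14, it returns 14, because axpy_prod() is still faster at m = 13. Requiring gemm() to stay ahead, rather than taking its first win, keeps a single noisy measurement from setting the threshold.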
Because the legacy version is based on expression templates, it may provide further advantages when operations are chained.
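To illustrate why expression templates help with chained operations, here is a minimal, generic sketch (not the uBLAS implementation, which is far more general). Writing `a + b + c` only builds a lightweight expression object, and the assignment evaluates every element in a single loop without allocating a temporary vector for `a + b`. Vec and SumExpr are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lazy sum node: stores references to its operands and computes an element
// only when operator[] is called.
template <typename L, typename R>
struct SumExpr {
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Evaluate any expression in one pass: no temporary Vec is created
    // for intermediate results such as a + b.
    template <typename E>
    Vec& operator=(const E& expr) {
        assert(expr.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data[i] = expr[i];
        return *this;
    }
};

// These operators only build the expression tree; nothing is computed yet.
inline SumExpr<Vec, Vec> operator+(const Vec& a, const Vec& b) { return {a, b}; }

template <typename L, typename R>
SumExpr<SumExpr<L, R>, Vec> operator+(const SumExpr<L, R>& a, const Vec& b) {
    return {a, b};
}
```

The expression holds references to its operands, so it has to be consumed within the same statement; storing a chained expression such as `a + b + c` with `auto` would leave a dangling reference to the intermediate `a + b` node, a known pitfall of this technique.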
I have put some defines in place that make it possible to force the legacy version as the default, instead of the runtime-switched version.
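A sketch of how a runtime size switch and a compile-time override could fit together, assuming row-major std::vector storage. The two kernels are naive stand-ins rather than uBLAS's prod() and gemm(); the threshold of 14 is an assumption (in the table, the mixed group's Diff column follows prod() up to m = 13 and drops to 0 from m = 14, which suggests a switch around there); and EXAMPLE_FORCE_LEGACY_PROD is a hypothetical macro name, not one of the actual defines.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // row-major m x m storage

// Hypothetical switch point between the two kernels.
constexpr std::size_t kGemmThreshold = 14;

// Stand-in for the legacy path (not the uBLAS code): i-j-k triple loop.
void legacy_prod(const Matrix& a, const Matrix& b, Matrix& c, std::size_t m) {
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < m; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < m; ++k) sum += a[i * m + k] * b[k * m + j];
            c[i * m + j] = sum;
        }
}

// Stand-in for the gemm path (not the uBLAS code): i-k-j order, so the
// innermost loop walks rows of b and c contiguously.
void gemm_prod(const Matrix& a, const Matrix& b, Matrix& c, std::size_t m) {
    std::fill(c.begin(), c.end(), 0.0);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t k = 0; k < m; ++k) {
            const double aik = a[i * m + k];
            for (std::size_t j = 0; j < m; ++j) c[i * m + j] += aik * b[k * m + j];
        }
}

// Defining the hypothetical macro EXAMPLE_FORCE_LEGACY_PROD at compile time
// removes the runtime check and always takes the legacy path.
void mixed_prod(const Matrix& a, const Matrix& b, Matrix& c, std::size_t m) {
    assert(a.size() == m * m && b.size() == m * m && c.size() == m * m);
#ifdef EXAMPLE_FORCE_LEGACY_PROD
    legacy_prod(a, b, c, m);
#else
    if (m < kGemmThreshold) {
        legacy_prod(a, b, c, m);  // one size check per call
    } else {
        gemm_prod(a, b, c, m);
    }
#endif
}
```

Compiling with -DEXAMPLE_FORCE_LEGACY_PROD removes the size check entirely, so comparing the two builds is one way to measure what the control statement itself costs.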
On Friday, 11 March 2016, 14:21, Nasos Iliopoulos <nasos_i@hotmail.com> wrote:
Regardless, these are great figures.
Can you please run them comparing the simple uBLAS implementation against the gemm-based one with a single thread, for matrices from 2 to 100? I wonder at what size the control statement starts to play a role.
What do you think the plan should be for switching from multi-core to single-threaded execution, so that smaller matrices do not take the full communication hit?
- Nasos
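On the question of switching from multi-core to single-threaded execution: one common approach is to derive the thread count from the amount of work, so that small products run on a single thread and skip the synchronization cost. A minimal sketch, where kMinFlopsPerThread is a hypothetical placeholder that would have to be measured on the target machine and choose_threads() is an illustrative helper, not uBLAS API:

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>

// Hypothetical tuning constant: the minimum floating-point work per thread
// that makes starting and synchronizing a thread worthwhile. Placeholder only.
constexpr double kMinFlopsPerThread = 1.0e6;

// Pick a thread count for an m x m by m x m product: 1 thread for small
// problems, more as the work grows, capped at the hardware thread count.
unsigned choose_threads(std::size_t m,
                        unsigned hw = std::thread::hardware_concurrency()) {
    if (hw == 0) hw = 1;  // hardware_concurrency() may return 0 if unknown
    const double md = static_cast<double>(m);
    const double flops = 2.0 * md * md * md;  // usual 2*m^3 count
    const double wanted = flops / kMinFlopsPerThread;
    if (wanted < 2.0) return 1;  // not enough work to split
    return static_cast<unsigned>(std::min<double>(wanted, hw));
}
```

With 8 hardware threads and this placeholder constant, m = 10 stays single-threaded, m = 100 gets 2 threads, and m = 1000 uses all 8.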