Subject: Re: [ublas] [PATCH 3/3] boost::ublas increasing the range of BLAS level 3 benchmarks
From: palik imre (imre_palik_at_[hidden])
Date: 2016-03-13 14:26:19
A bit of confusion here.
I created a fork of the feature branch you sent, as I didn't have the rights to push there. Then I sent a pull request for that.
Should I fork the master instead?
Thanks,
Imre
On Sunday, 13 March 2016, 19:03, palik imre <imre_palik_at_[hidden]> wrote:
Results for low dimensions. More data would exceed the mailing list limits:
# m  original: t1  MFLOPS  original: t1  MFLOPS  Diff nrm3  gemm: t2  MFLOPS  Diff nrm4  mixed: t2  MFLOPS  Diff nrm5
 1  2.1263e-07  9.40601 1.32802e-07    15.06          0 1.36006e-07  14.7052          0 6.31318e-07  3.16798          0
 2 2.28189e-07  70.1173 1.37767e-07  116.138          0 1.59801e-07  100.125          0  6.653e-07  24.0493          0
 3   2.649e-07  203.851  1.5541e-07  347.468          0 1.54267e-07  350.042          0 6.98766e-07  77.2791          0
 4 3.35269e-07  381.783  2.4183e-07  529.297          0 2.12891e-07  601.247          0 6.65688e-07  192.282          0
 5 3.53868e-07  706.478 2.30977e-07  1082.36          0 2.53215e-07  987.303          0 7.07933e-07  353.141          0
 6  4.2987e-07  1004.95 2.59713e-07  1663.37          0 2.54867e-07     1695          0 8.17448e-07  528.474          0
 7 5.39621e-07  1271.26 4.51043e-07  1520.92 7.98975e-09 5.12948e-07  1337.37 7.98975e-09 8.76363e-07  782.781          0
 8 6.18993e-07   1654.3 6.38988e-07  1602.53 4.17673e-09 6.37931e-07  1605.19 4.17673e-09 8.92556e-07  1147.27          0
 9 7.73683e-07  1884.49 7.26336e-07  2007.34 3.30697e-09 8.00656e-07  1821.01 3.30697e-09 1.09762e-06  1328.33          0
 10 9.27569e-07  2156.17 8.31827e-07  2404.35 1.94317e-09 8.72131e-07  2293.23 1.94317e-09 1.16572e-06  1715.68          0
 11 1.13882e-06   2337.5 1.03275e-06  2577.58 1.27501e-09 1.08775e-06  2447.25 1.27501e-09 1.16439e-06  2286.17          0
 12 1.26427e-06  2733.59 1.40013e-06  2468.34 8.50076e-10 1.39562e-06  2476.32 8.50076e-10 1.01202e-06  3414.97          0
 13  1.5751e-06  2789.66 1.64811e-06  2666.09 5.39864e-10 1.66862e-06  2633.32 5.39864e-10 1.61517e-06  2720.45          0
 14 1.79595e-06  3055.77 1.89937e-06  2889.37 4.08632e-10 1.6485e-06  3329.09          0 1.65016e-06  3325.73          0
 15 2.14056e-06  3153.37 2.24248e-06  3010.06 2.73316e-10 1.6875e-06  3999.99          0 1.80164e-06  3746.59          0
 16 2.38996e-06  3427.67 2.63386e-06  3110.27 2.30152e-10 1.74627e-06  4691.14          0 1.91648e-06  4274.49          0
 17 2.93315e-06  3349.98 3.08031e-06  3189.94 1.85538e-10 2.17697e-06  4513.62          0 2.13505e-06  4602.23          0
 18  3.3771e-06  3453.85 3.23863e-06  3601.52 1.20251e-10 2.23225e-06  5225.23          0 2.36877e-06  4924.07          0
 19 4.19699e-06  3268.53 4.02621e-06  3407.17 1.07796e-10 2.29651e-06   5973.4          0 2.44714e-06  5605.72          0
 20 4.27777e-06  3740.27 4.86115e-06   3291.4 8.37665e-11 2.26798e-06  7054.74          0 2.44016e-06  6556.96          0
 21 5.58038e-06  3319.13 5.51606e-06  3357.83 5.93714e-11 2.61705e-06  7077.43          0 2.90197e-06  6382.56          0
 22 5.46208e-06  3898.88 5.50258e-06  3870.19 5.76987e-11 2.85448e-06  7460.56          0 3.09923e-06  6871.39          0
 23 7.26813e-06  3348.04 6.48407e-06  3752.89 4.47169e-11 3.03986e-06  8004.98          0 3.16566e-06  7686.86          0
 24 6.56421e-06  4211.93 7.20581e-06   3836.9 3.61275e-11 2.84288e-06  9725.35          0 2.81577e-06  9818.99          0
 25 7.97135e-06  3920.29 7.80654e-06  4003.06 3.02957e-11 4.04575e-06  7724.16          0 4.15001e-06  7530.11          0
 26 8.59272e-06   4090.9 8.46934e-06   4150.5 2.53217e-11 4.1795e-06  8410.58          0 4.36958e-06  8044.71          0
 27 1.05527e-05  3730.41 9.66865e-06  4071.51 1.97479e-11 4.24268e-06  9278.57          0 4.64476e-06  8475.37          0
 28 9.77679e-06  4490.63  1.0918e-05  4021.26 1.71505e-11 4.41728e-06  9939.14          0 4.55165e-06  9645.73          0
 29 1.23574e-05  3947.28 1.15308e-05  4230.22 1.54399e-11 4.96383e-06  9826.69          0 5.27042e-06  9255.05          0
 30 1.25312e-05  4309.24 1.23192e-05   4383.4 1.38837e-11 5.36616e-06  10063.1          0 5.57707e-06  9682.51          0
 31 1.41019e-05  4225.11 1.41554e-05  4209.15 1.12822e-11 5.56749e-06  10701.8          0 5.87983e-06  10133.3          0
 32 1.44935e-05  4521.76 1.74419e-05  3757.38 9.5502e-12 5.91291e-06  11083.5          0 6.07622e-06  10785.7          0
 33 1.68922e-05  4254.86 1.62224e-05  4430.55 8.00562e-12 6.51645e-06  11029.6          0 6.62821e-06  10843.7          0
 34 1.73001e-05   4543.8 1.68924e-05  4653.46 7.54927e-12 6.83433e-06  11501.9          0 6.95343e-06  11304.9          0
 35 2.07166e-05   4139.2 2.15962e-05  3970.61 6.52939e-12 7.06462e-06  12137.9          0 7.53811e-06  11375.5          0
 36 1.98326e-05  4704.97 2.13473e-05  4371.14 5.68874e-12  6.703e-06  13920.9          0 6.99365e-06  13342.4          0
 37  2.3838e-05  4249.78 2.23655e-05  4529.56 5.11318e-12 8.87253e-06  11417.9          0 9.13862e-06  11085.5          0
 38 2.35903e-05  4652.09 2.48122e-05  4422.99 4.71306e-12 9.24238e-06    11874          0 9.27922e-06  11826.9          0
 39 2.79913e-05  4238.39 2.64576e-05  4484.09 4.20714e-12 9.68511e-06  12249.5          0 9.95689e-06  11915.2          0
 40 2.60131e-05   4920.6  2.9098e-05  4398.93 3.42002e-12 9.80308e-06  13057.1          0 1.04198e-05  12284.3          0
 41 3.13419e-05  4398.01 3.03942e-05  4535.14 3.13757e-12 1.07587e-05  12812.2          0 1.10016e-05  12529.3          0
 42 3.10015e-05  4779.64 3.20343e-05  4625.54 2.91245e-12 1.09989e-05  13471.9          0 1.16031e-05  12770.4          0
 43  3.6527e-05  4353.33 3.49908e-05  4544.46 2.71446e-12 1.13164e-05  14051.7          0 1.20516e-05  13194.4          0
 44 3.36654e-05  5060.62 3.86435e-05  4408.71 2.49076e-12 1.16151e-05  14667.8          0 1.21377e-05  14036.2          0
 45 3.95282e-05  4610.63 3.98562e-05  4572.69 2.12037e-12 1.26784e-05  14374.8          0 1.32723e-05  13731.6          0
 46 3.96351e-05   4911.6 4.17105e-05  4667.22 1.96734e-12 1.27302e-05  15292.2          0 1.34346e-05  14490.3          0
 47 4.63424e-05  4480.69 4.50811e-05  4606.05 1.77515e-12 1.33133e-05  15596.9          0 1.39354e-05  14900.7          0
 48 4.31748e-05  5122.99  5.0325e-05  4395.11 1.75073e-12 1.32491e-05  16694.3          0 1.3501e-05  16382.8          0
 49 4.93001e-05  4772.77 5.11402e-05  4601.03 1.48788e-12 1.6222e-05  14504.9          0 1.72531e-05    13638          0
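For reference, the MFLOPS columns above appear consistent with counting 2*m^3 floating-point operations per product and dividing by the measured time. A minimal sketch of that conversion, assuming square m x m operands and times in seconds (the helper name mflops is hypothetical, not part of the benchmark code):

```cpp
#include <cstddef>

// Hypothetical helper: converts a measured time into MFLOPS.
// Assumption: a square m x m matrix product costs 2*m^3 floating-point
// operations (m^3 multiplications plus m^3 additions).
inline double mflops(std::size_t m, double seconds) {
    const double dim = static_cast<double>(m);
    const double flops = 2.0 * dim * dim * dim;
    return flops / seconds / 1.0e6;
}
```

With the first row of the table, mflops(1, 2.1263e-07) gives roughly 9.406, which matches the value printed there.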
The first group is the legacy axpy_prod(), the second group is the legacy prod(), the third group is the legacy prod() for low dimensions and gemm() for high dimensions, and the fourth group is gemm().
As the legacy version is expression-template based, it may provide further advantages when operations are chained.
I put some defines in place that make it possible to force the legacy version as the default, as opposed to the runtime-switched version.
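To illustrate the idea of a runtime switch with a compile-time override, here is a rough sketch. It is not the code from the patch: the Matrix type, the function names, the macro SKETCH_FORCE_LEGACY_PROD and the threshold value are all hypothetical, and the two kernels are simple stand-ins rather than the real uBLAS prod() and gemm().

```cpp
#include <cstddef>
#include <vector>

// Hypothetical row-major square matrix; not a uBLAS type.
struct Matrix {
    std::size_t n;
    std::vector<double> a;
    explicit Matrix(std::size_t n_) : n(n_), a(n_ * n_, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return a[i * n + j]; }
    double operator()(std::size_t i, std::size_t j) const { return a[i * n + j]; }
};

enum class Kernel { Legacy, Gemm };

// Stand-in for the legacy product: plain i-j-k triple loop.
inline void legacy_prod(const Matrix& A, const Matrix& B, Matrix& C) {
    for (std::size_t i = 0; i < A.n; ++i)
        for (std::size_t j = 0; j < A.n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < A.n; ++k) sum += A(i, k) * B(k, j);
            C(i, j) = sum;
        }
}

// Stand-in for gemm: i-k-j loop order only, not a real blocked kernel.
inline void gemm_like(const Matrix& A, const Matrix& B, Matrix& C) {
    for (double& x : C.a) x = 0.0;
    for (std::size_t i = 0; i < A.n; ++i)
        for (std::size_t k = 0; k < A.n; ++k) {
            const double aik = A(i, k);
            for (std::size_t j = 0; j < A.n; ++j) C(i, j) += aik * B(k, j);
        }
}

// Illustrative threshold only; the real crossover has to be measured.
constexpr std::size_t kSwitchDim = 14;

// Picks the kernel at run time unless the (hypothetical) macro forces legacy.
inline Kernel choose_kernel(std::size_t n) {
#ifdef SKETCH_FORCE_LEGACY_PROD
    (void)n;
    return Kernel::Legacy;
#else
    return n < kSwitchDim ? Kernel::Legacy : Kernel::Gemm;
#endif
}

// Computes C = A * B and reports which kernel was used.
inline Kernel product(const Matrix& A, const Matrix& B, Matrix& C) {
    const Kernel k = choose_kernel(A.n);
    if (k == Kernel::Legacy) legacy_prod(A, B, C);
    else gemm_like(A, B, C);
    return k;
}
```

Compiling with -DSKETCH_FORCE_LEGACY_PROD would make product() always take the legacy path, which removes the size check from the decision but keeps the call site unchanged.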
Imre
On Friday, 11 March 2016, 14:21, Nasos Iliopoulos <nasos_i_at_[hidden]> wrote:
Regardless, these are great figures.
Can you please run them comparing the simple uBLAS implementation for matrices from 2 to 100 with the gemm-based one with a single thread? I wonder when the control statement starts to play a role.
What do you think the plan should be for switching from multi-core to single-threaded, so as not to take the communication hit for smaller matrices?
- Nasos