Ublas :

Subject: Re: [ublas] Matrix multiplication performance
From: Riccardo Rossi (rrossi_at_[hidden])
Date: 2016-01-28 15:15:12


I am impressed. 6x on a quad-core!!

Do you also do sparse linear algebra by chance?

cheers
Riccardo

On Thu, Jan 28, 2016 at 7:49 PM, Michael Lehn <michael.lehn_at_[hidden]>
wrote:

> In the meantime, here are some results from my Haswell machine. It has
> 4 quad cores, but there are other jobs running so I went up to 8
> threads. Anyway, the parallelisation is simple; for the maximal matrix
> dimension N=M=K=4000 it reaches
>
> 1) 32.9 GFLOPS with 1 thread
> 2) 63 GFLOPS with 2 threads
> 3) 104.6 GFLOPS with 4 threads
> 4) 180.5 GFLOPS with 8 threads
>
> That is OK for a simple implementation but can be done better. Most of
> all, it takes much too long (or much too big problem sizes) to scale
> well. But for the moment we should focus on a good single-threaded
> implementation and do the parallel stuff the right way later, as this
> will require more than just a single
> #pragma omp parallel for
>
>
>
> [lehn_at_node042 session4]$ g++ -Ofast -Wall -std=c++11 -DNDEBUG -DHAVE_FMA -I ../boost_1_60_0/ -fopenmp matprod.cc
> [lehn_at_node042 session4]$ export OMP_NUM_THREADS=1; ./a.out
> # m n k uBLAS: t1 MFLOPS Blocked: t2 MFLOPS Diff nrm1
> 100 100 100 0.00119632 1671.79 0.00089036 2246.28 3.90562e-14
> 200 200 200 0.00322943 4954.44 0.00082579 19375.4 1.50135e-15
> 300 300 300 0.0108177 4991.81 0.00221283 24403.1 2.18434e-16
> 400 400 400 0.0247278 5176.35 0.00429661 29790.9 5.58593e-17
> 500 500 500 0.053677 4657.49 0.00822185 30406.8 1.94899e-17
> 600 600 600 0.0820133 5267.44 0.0136631 31617.9 1.16524e-17
> 700 700 700 0.129231 5308.34 0.0208619 32882.9 6.82385e-18
> 800 800 800 0.19206 5331.67 0.0309358 33100.8 4.08617e-18
> 900 900 900 0.272354 5353.34 0.0430091 33899.8 2.54117e-18
> 1000 1000 1000 0.372831 5364.36 0.0582482 34335.8 1.64011e-18
> 1100 1100 1100 0.494906 5378.8 0.0796676 33413.8 1.08587e-18
> 1200 1200 1200 0.642926 5375.43 0.098814 34974.8 7.43828e-19
> 1300 1300 1300 0.815164 5390.32 0.125541 35000.5 5.26152e-19
> 1400 1400 1400 1.04147 5269.48 0.154808 35450.4 3.81507e-19
> 1500 1500 1500 1.24516 5420.99 0.187327 36033.2 2.82388e-19
> 1600 1600 1600 1.5581 5257.68 0.236257 34674 2.12031e-19
> 1700 1700 1700 2.57574 3814.82 0.273446 35933.9 1.61384e-19
> 1800 1800 1800 3.24948 3589.5 0.319974 36453 1.25033e-19
> 1900 1900 1900 4.01719 3414.82 0.378235 36268.4 9.86666e-20
> 2000 2000 2000 4.82997 3312.65 0.438886 36456 7.86863e-20
> 2100 2100 2100 5.88206 3148.89 0.517821 35769.1 6.31726e-20
> 2200 2200 2200 6.87358 3098.24 0.590235 36080.6 5.1152e-20
> 2300 2300 2300 8.08021 3011.55 0.659934 36873.4 4.19219e-20
> 2400 2400 2400 9.31063 2969.51 0.748285 36948.5 3.46865e-20
> 2500 2500 2500 10.5343 2966.51 0.84448 37005 2.88942e-20
> 2600 2600 2600 11.8768 2959.71 0.984227 35715.3 2.42294e-20
> 2700 2700 2700 13.3378 2951.45 1.06838 36846.5 2.04036e-20
> 2800 2800 2800 14.9304 2940.57 1.18762 36968.2 1.73201e-20
> 2900 2900 2900 16.8965 2886.87 1.33445 36552.8 1.47904e-20
> 3000 3000 3000 18.7376 2881.9 1.49449 36132.7 1.27205e-20
> 3100 3100 3100 20.8439 2858.48 1.66163 35857.5 1.09759e-20
> 3200 3200 3200 22.9032 2861.44 1.82771 35856.9 9.49415e-21
> 3300 3300 3300 28.2407 2545.05 2.08438 34482.2 8.25868e-21
> 3400 3400 3400 27.5374 2854.6 2.18449 35984.7 7.22064e-21
> 3500 3500 3500 29.925 2865.5 2.34372 36587.1 6.34137e-21
> 3600 3600 3600 32.6588 2857.17 2.56586 36366.7 5.5874e-21
> 3700 3700 3700 34.5032 2936.14 2.77154 36552.2 4.92873e-21
> 3800 3800 3800 36.9099 2973.29 2.97732 36860.1 4.36811e-21
> 3900 3900 3900 44.6497 2657.09 3.24271 36586.1 3.88313e-21
> 4000 4000 4000 56.9767 2246.53 3.88046 32985.8 3.46672e-21
> [lehn_at_node042 session4]$ export OMP_NUM_THREADS=2; ./a.out
> # m n k uBLAS: t1 MFLOPS Blocked: t2 MFLOPS Diff nrm1
> 100 100 100 0.00120386 1661.33 0.000876976 2280.56 3.95867e-14
> 200 200 200 0.00323702 4942.82 0.00099518 16077.5 1.50256e-15
> 300 300 300 0.0106352 5077.5 0.00286667 18837.2 2.19644e-16
> 400 400 400 0.0247765 5166.19 0.00610925 20951.8 5.61969e-17
> 500 500 500 0.0478359 5226.2 0.00707235 35348.9 1.94268e-17
> 600 600 600 0.082058 5264.57 0.0108406 39850.3 1.16982e-17
> 700 700 700 0.129637 5291.71 0.0170924 40134.8 6.8281e-18
> 800 800 800 0.1925 5319.48 0.0214161 47814.4 4.09348e-18
> 900 900 900 0.273022 5340.22 0.0298684 48814.2 2.54562e-18
> 1000 1000 1000 0.373113 5360.3 0.0417747 47875.9 1.64027e-18
> 1100 1100 1100 0.499034 5334.3 0.0527302 50483.4 1.08356e-18
> 1200 1200 1200 0.64351 5370.55 0.0624654 55326.6 7.44302e-19
> 1300 1300 1300 0.829601 5296.52 0.0793488 55375.8 5.25547e-19
> 1400 1400 1400 1.13615 4830.35 0.0937135 58561.5 3.8117e-19
> 1500 1500 1500 1.38215 4883.71 0.11078 60931.4 2.82628e-19
> 1600 1600 1600 2.34569 3492.37 0.148535 55152.1 2.11636e-19
> 1700 1700 1700 2.80764 3499.73 0.166754 58925.2 1.61617e-19
> 1800 1800 1800 3.65597 3190.4 0.183227 63658.6 1.25225e-19
> 1900 1900 1900 6.04791 2268.22 0.229272 59832.8 9.8624e-20
> 2000 2000 2000 5.41562 2954.41 0.244907 65331 7.8709e-20
> 2100 2100 2100 5.79329 3197.15 0.320638 57766.1 6.31124e-20
> 2200 2200 2200 10.1105 2106.32 0.348126 61173.2 5.11424e-20
> 2300 2300 2300 11.746 2071.68 0.385373 63144 4.18844e-20
> 2400 2400 2400 13.4099 2061.77 0.438608 63035.8 3.46829e-20
> 2500 2500 2500 14.8645 2102.32 0.491434 63589.4 2.88839e-20
> 2600 2600 2600 17.1602 2048.46 0.550163 63893.8 2.42378e-20
> 2700 2700 2700 19.24 2046.05 0.616314 63873.3 2.03993e-20
> 2800 2800 2800 14.8633 2953.85 0.675975 64949.2 1.73082e-20
> 2900 2900 2900 18.533 2631.96 0.72636 67154.1 1.47984e-20
> 3000 3000 3000 18.2701 2955.64 0.804625 67112 1.27211e-20
> 3100 3100 3100 20.2371 2944.19 0.938507 63485.9 1.09831e-20
> 3200 3200 3200 22.6838 2889.11 1.07581 60918.1 9.49232e-21
> 3300 3300 3300 25.0228 2872.33 1.06473 67504.6 8.25942e-21
> 3400 3400 3400 27.3561 2873.51 1.16247 67621.6 7.21511e-21
> 3500 3500 3500 29.7889 2878.59 1.32098 64913.8 6.3385e-21
> 3600 3600 3600 34.8098 2680.62 1.37908 67662.7 5.58738e-21
> 3700 3700 3700 37.6151 2693.23 1.52253 66538.1 4.92976e-21
> 3800 3800 3800 38.99 2814.67 1.63282 67211.5 4.36537e-21
> 3900 3900 3900 57.5765 2060.53 1.75221 67707.4 3.88246e-21
> 4000 4000 4000 51.1335 2503.25 2.03062 63035.1 3.46549e-21
> [lehn_at_node042 session4]$ export OMP_NUM_THREADS=4; ./a.out
> # m n k uBLAS: t1 MFLOPS Blocked: t2 MFLOPS Diff nrm1
> 100 100 100 0.00119733 1670.39 0.00124331 1608.61 3.84618e-14
> 200 200 200 0.00427996 3738.35 0.000965206 16576.8 1.47604e-15
> 300 300 300 0.0146617 3683.06 0.00235442 22935.6 2.18643e-16
> 400 400 400 0.0301558 4244.62 0.00431311 29677 5.57089e-17
> 500 500 500 0.0509763 4904.24 0.00541684 46152.4 1.94817e-17
> 600 600 600 0.0823676 5244.78 0.00815973 52943 1.16851e-17
> 700 700 700 0.131064 5234.07 0.0133055 51557.7 6.81692e-18
> 800 800 800 0.198438 5160.3 0.0208701 49065.4 4.09087e-18
> 900 900 900 0.273346 5333.91 0.0244156 59716 2.53963e-18
> 1000 1000 1000 0.374021 5347.3 0.0252625 79168.7 1.64654e-18
> 1100 1100 1100 0.502426 5298.29 0.05022 53006.7 1.08395e-18
> 1200 1200 1200 0.865696 3992.16 0.0443738 77883.9 7.44661e-19
> 1300 1300 1300 1.00063 4391.23 0.0544683 80670.8 5.25559e-19
> 1400 1400 1400 1.26828 4327.13 0.0599685 91514.7 3.80933e-19
> 1500 1500 1500 1.3623 4954.86 0.0826977 81622.6 2.8281e-19
> 1600 1600 1600 2.14419 3820.56 0.0940622 87091.3 2.11718e-19
> 1700 1700 1700 2.98106 3296.14 0.104828 93734.3 1.61252e-19
> 1800 1800 1800 4.10679 2840.17 0.125856 92677.2 1.25247e-19
> 1900 1900 1900 7.25737 1890.22 0.137977 99422.2 9.85647e-20
> 2000 2000 2000 9.0378 1770.34 0.195959 81649.8 7.86877e-20
> 2100 2100 2100 7.43091 2492.56 0.205205 90261 6.31814e-20
> 2200 2200 2200 8.01552 2656.84 0.229878 92640.5 5.11206e-20
> 2300 2300 2300 11.3209 2149.47 0.242479 100355 4.19281e-20
> 2400 2400 2400 11.7655 2349.91 0.267819 103234 3.4696e-20
> 2500 2500 2500 14.75 2118.65 0.318302 98177.1 2.89065e-20
> 2600 2600 2600 16.1598 2175.27 0.349963 100445 2.42432e-20
> 2700 2700 2700 19.6465 2003.72 0.384713 102326 2.04284e-20
> 2800 2800 2800 18.5487 2366.95 0.422473 103922 1.73051e-20
> 2900 2900 2900 18.4844 2638.87 0.431616 113012 1.48037e-20
> 3000 3000 3000 18.3601 2941.16 0.487947 110668 1.27205e-20
> 3100 3100 3100 20.1449 2957.67 0.555138 107328 1.09745e-20
> 3200 3200 3200 22.2403 2946.72 0.597566 109672 9.49034e-21
> 3300 3300 3300 24.3526 2951.39 0.635492 113100 8.25459e-21
> 3400 3400 3400 26.5834 2957.04 0.693353 113374 7.22134e-21
> 3500 3500 3500 28.9996 2956.93 0.753307 113831 6.33808e-21
> 3600 3600 3600 31.4492 2967.07 0.793409 117609 5.58761e-21
> 3700 3700 3700 34.9533 2898.33 0.959263 105608 4.93129e-21
> 3800 3800 3800 38.2463 2869.4 1.01686 107924 4.36735e-21
> 3900 3900 3900 42.3957 2798.35 1.08582 109262 3.88282e-21
> 4000 4000 4000 44.7076 2863.05 1.22383 104590 3.469e-21
> [lehn_at_node042 session4]$ export OMP_NUM_THREADS=8; ./a.out
> # m n k uBLAS: t1 MFLOPS Blocked: t2 MFLOPS Diff nrm1
> 100 100 100 0.00120762 1656.15 0.001279 1563.72 3.8463e-14
> 200 200 200 0.0036143 4426.86 0.000631185 25349.1 1.48858e-15
> 300 300 300 0.0108139 4993.56 0.00204664 26384.7 2.20015e-16
> 400 400 400 0.0251417 5091.13 0.00316074 40496.9 5.58204e-17
> 500 500 500 0.0482996 5176.03 0.00479854 52099.2 1.9429e-17
> 600 600 600 0.0830052 5204.49 0.0074349 58104.3 1.16567e-17
> 700 700 700 0.13281 5165.28 0.0134778 50898.6 6.82167e-18
> 800 800 800 0.19639 5214.12 0.0143988 71117.2 4.08235e-18
> 900 900 900 0.279542 5215.68 0.0186552 78155 2.54218e-18
> 1000 1000 1000 0.381906 5236.89 0.020541 97366.2 1.63963e-18
> 1100 1100 1100 0.509376 5226 0.0338259 78697.1 1.08399e-18
> 1200 1200 1200 0.760565 4543.99 0.0317094 108990 7.44215e-19
> 1300 1300 1300 1.04442 4207.14 0.0419104 104843 5.25101e-19
> 1400 1400 1400 1.47537 3719.75 0.0450985 121689 3.81236e-19
> 1500 1500 1500 1.90994 3534.15 0.0514728 131137 2.82394e-19
> 1600 1600 1600 1.56705 5227.67 0.0599189 136718 2.11847e-19
> 1700 1700 1700 2.62892 3737.66 0.0756787 129838 1.61316e-19
> 1800 1800 1800 3.29831 3536.35 0.0827417 140969 1.25087e-19
> 1900 1900 1900 4.03473 3399.98 0.0915113 149905 9.85857e-20
> 2000 2000 2000 4.87315 3283.3 0.105251 152017 7.86417e-20
> 2100 2100 2100 5.87975 3150.13 0.123634 149813 6.31281e-20
> 2200 2200 2200 7.06021 3016.34 0.134536 158293 5.11845e-20
> 2300 2300 2300 10.6045 2294.69 0.162671 149590 4.19035e-20
> 2400 2400 2400 9.31785 2967.21 0.160164 172623 3.46453e-20
> 2500 2500 2500 10.4852 2980.38 0.181067 172588 2.89024e-20
> 2600 2600 2600 11.8263 2972.35 0.208792 168359 2.42313e-20
> 2700 2700 2700 13.2755 2965.32 0.226646 173690 2.04063e-20
> 2800 2800 2800 14.8042 2965.65 0.24966 175855 1.73142e-20
> 2900 2900 2900 16.9983 2869.58 0.287892 169432 1.47875e-20
> 3000 3000 3000 19.7129 2739.32 0.330801 163240 1.27204e-20
> 3100 3100 3100 21.4476 2778.02 0.382773 155659 1.09704e-20
> 3200 3200 3200 22.7482 2880.93 0.440904 148640 9.49823e-21
> 3300 3300 3300 25.1449 2858.39 0.416183 172698 8.25712e-21
> 3400 3400 3400 27.4412 2864.6 0.497616 157969 7.2164e-21
> 3500 3500 3500 30.5976 2802.51 0.49974 171589 6.33781e-21
> 3600 3600 3600 32.5102 2870.24 0.558002 167225 5.59021e-21
> 3700 3700 3700 35.3566 2865.26 0.571003 177418 4.93007e-21
> 3800 3800 3800 37.321 2940.55 0.578064 189847 4.36563e-21
> 3900 3900 3900 40.1645 2953.8 0.623894 190157 3.8876e-21
> 4000 4000 4000 43.1753 2964.66 0.709328 180452 3.46575e-21
>
>
> On 28 Jan 2016, at 18:41, Michael Lehn <michael.lehn_at_[hidden]> wrote:
>
> > Also the parallelisation with OpenMP is done pretty cheaply and simply
> > at the moment. So you also might want to check how it scales by
> >
> > export OMP_NUM_THREADS=2; ./a.out
> > export OMP_NUM_THREADS=4; ./a.out
> > export OMP_NUM_THREADS=6; ./a.out
> > ...
>
>

-- 
*Riccardo Rossi*
PhD, Civil Engineer
member of the Kratos Team: www.cimne.com/kratos
lecturer at Universitat Politècnica de Catalunya, BarcelonaTech (UPC)
Research fellow at International Center for Numerical Methods in
Engineering (CIMNE)
C/ Gran Capità, s/n, Campus Nord UPC, Ed. C1, Despatx C9
08034 – Barcelona – Spain – www.cimne.com  -
T.(+34) 93 401 56 96 skype: *rougered4*