<div dir="ltr"><div><div><div>i am impressed. 6x on a quad-core!! <br><br></div>do you also do sparse linear algebra by chance?<br><br></div>cheers<br></div>Riccardo<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jan 28, 2016 at 7:49 PM, Michael Lehn <span dir="ltr"><<a href="mailto:michael.lehn@uni-ulm.de" target="_blank">michael.lehn@uni-ulm.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">In the meantime some results from my Haswell machine. It has 4 quad cores. But there are<br> other jobs running so I went up to 8 threads. But anyway, the parallelisation is simple. For<br> the maximal matrix dimension N=M=K=4000 it reaches<br> <br> 1) 32.9 GFLOPS with 1 thread<br> 2) 63 GFLOPS with 2 threads<br> 3) 104.6 GFLOPS with 4 threads<br> 4) 180.5 GFLOPS with 8 threads<br> <br> That is ok for a simple implementation but can be done better. Most of all it takes much too long (or<br> much too big problem sizes) to scale well. But for the moment we should focus on a good single-threaded<br> implementation and do the parallel stuff the right way later. 
As this will require more than just a single<br> #pragma omp parallel for<br> <br> <br> <br> [lehn@node042 session4]$ g++ -Ofast -Wall -std=c++11 -DNDEBUG -DHAVE_FMA -I ../boost_1_60_0/ -fopenmp matprod.cc<br> [lehn@node042 session4]$ export OMP_NUM_THREADS=1; ./a.out<br> <span class=""># m n k uBLAS: t1 MFLOPS Blocked: t2 MFLOPS Diff nrm1<br> </span> 100 100 100 0.00119632 1671.79 0.00089036 2246.28 3.90562e-14<br> 200 200 200 0.00322943 4954.44 0.00082579 19375.4 1.50135e-15<br> 300 300 300 0.0108177 4991.81 0.00221283 24403.1 2.18434e-16<br> 400 400 400 0.0247278 5176.35 0.00429661 29790.9 5.58593e-17<br> 500 500 500 0.053677 4657.49 0.00822185 30406.8 1.94899e-17<br> 600 600 600 0.0820133 5267.44 0.0136631 31617.9 1.16524e-17<br> 700 700 700 0.129231 5308.34 0.0208619 32882.9 6.82385e-18<br> 800 800 800 0.19206 5331.67 0.0309358 33100.8 4.08617e-18<br> 900 900 900 0.272354 5353.34 0.0430091 33899.8 2.54117e-18<br> 1000 1000 1000 0.372831 5364.36 0.0582482 34335.8 1.64011e-18<br> 1100 1100 1100 0.494906 5378.8 0.0796676 33413.8 1.08587e-18<br> 1200 1200 1200 0.642926 5375.43 0.098814 34974.8 7.43828e-19<br> 1300 1300 1300 0.815164 5390.32 0.125541 35000.5 5.26152e-19<br> 1400 1400 1400 1.04147 5269.48 0.154808 35450.4 3.81507e-19<br> 1500 1500 1500 1.24516 5420.99 0.187327 36033.2 2.82388e-19<br> 1600 1600 1600 1.5581 5257.68 0.236257 34674 2.12031e-19<br> 1700 1700 1700 2.57574 3814.82 0.273446 35933.9 1.61384e-19<br> 1800 1800 1800 3.24948 3589.5 0.319974 36453 1.25033e-19<br> 1900 1900 1900 4.01719 3414.82 0.378235 36268.4 9.86666e-20<br> 2000 2000 2000 4.82997 3312.65 0.438886 36456 7.86863e-20<br> 2100 2100 2100 5.88206 3148.89 0.517821 35769.1 6.31726e-20<br> 2200 2200 2200 6.87358 3098.24 0.590235 36080.6 5.1152e-20<br> 2300 2300 2300 8.08021 3011.55 0.659934 36873.4 4.19219e-20<br> 2400 2400 2400 9.31063 2969.51 0.748285 36948.5 3.46865e-20<br> 2500 2500 2500 10.5343 2966.51 0.84448 37005 2.88942e-20<br> 2600 2600 2600 11.8768 2959.71 0.984227 35715.3 
2.42294e-20<br> 2700 2700 2700 13.3378 2951.45 1.06838 36846.5 2.04036e-20<br> 2800 2800 2800 14.9304 2940.57 1.18762 36968.2 1.73201e-20<br> 2900 2900 2900 16.8965 2886.87 1.33445 36552.8 1.47904e-20<br> 3000 3000 3000 18.7376 2881.9 1.49449 36132.7 1.27205e-20<br> 3100 3100 3100 20.8439 2858.48 1.66163 35857.5 1.09759e-20<br> 3200 3200 3200 22.9032 2861.44 1.82771 35856.9 9.49415e-21<br> 3300 3300 3300 28.2407 2545.05 2.08438 34482.2 8.25868e-21<br> 3400 3400 3400 27.5374 2854.6 2.18449 35984.7 7.22064e-21<br> 3500 3500 3500 29.925 2865.5 2.34372 36587.1 6.34137e-21<br> 3600 3600 3600 32.6588 2857.17 2.56586 36366.7 5.5874e-21<br> 3700 3700 3700 34.5032 2936.14 2.77154 36552.2 4.92873e-21<br> 3800 3800 3800 36.9099 2973.29 2.97732 36860.1 4.36811e-21<br> 3900 3900 3900 44.6497 2657.09 3.24271 36586.1 3.88313e-21<br> 4000 4000 4000 56.9767 2246.53 3.88046 32985.8 3.46672e-21<br> [lehn@node042 session4]$ export OMP_NUM_THREADS=2; ./a.out<br> <span class=""># m n k uBLAS: t1 MFLOPS Blocked: t2 MFLOPS Diff nrm1<br> </span> 100 100 100 0.00120386 1661.33 0.000876976 2280.56 3.95867e-14<br> 200 200 200 0.00323702 4942.82 0.00099518 16077.5 1.50256e-15<br> 300 300 300 0.0106352 5077.5 0.00286667 18837.2 2.19644e-16<br> 400 400 400 0.0247765 5166.19 0.00610925 20951.8 5.61969e-17<br> 500 500 500 0.0478359 5226.2 0.00707235 35348.9 1.94268e-17<br> 600 600 600 0.082058 5264.57 0.0108406 39850.3 1.16982e-17<br> 700 700 700 0.129637 5291.71 0.0170924 40134.8 6.8281e-18<br> 800 800 800 0.1925 5319.48 0.0214161 47814.4 4.09348e-18<br> 900 900 900 0.273022 5340.22 0.0298684 48814.2 2.54562e-18<br> 1000 1000 1000 0.373113 5360.3 0.0417747 47875.9 1.64027e-18<br> 1100 1100 1100 0.499034 5334.3 0.0527302 50483.4 1.08356e-18<br> 1200 1200 1200 0.64351 5370.55 0.0624654 55326.6 7.44302e-19<br> 1300 1300 1300 0.829601 5296.52 0.0793488 55375.8 5.25547e-19<br> 1400 1400 1400 1.13615 4830.35 0.0937135 58561.5 3.8117e-19<br> 1500 1500 1500 1.38215 4883.71 0.11078 60931.4 2.82628e-19<br> 
1600 1600 1600 2.34569 3492.37 0.148535 55152.1 2.11636e-19<br> 1700 1700 1700 2.80764 3499.73 0.166754 58925.2 1.61617e-19<br> 1800 1800 1800 3.65597 3190.4 0.183227 63658.6 1.25225e-19<br> 1900 1900 1900 6.04791 2268.22 0.229272 59832.8 9.8624e-20<br> 2000 2000 2000 5.41562 2954.41 0.244907 65331 7.8709e-20<br> 2100 2100 2100 5.79329 3197.15 0.320638 57766.1 6.31124e-20<br> 2200 2200 2200 10.1105 2106.32 0.348126 61173.2 5.11424e-20<br> 2300 2300 2300 11.746 2071.68 0.385373 63144 4.18844e-20<br> 2400 2400 2400 13.4099 2061.77 0.438608 63035.8 3.46829e-20<br> 2500 2500 2500 14.8645 2102.32 0.491434 63589.4 2.88839e-20<br> 2600 2600 2600 17.1602 2048.46 0.550163 63893.8 2.42378e-20<br> 2700 2700 2700 19.24 2046.05 0.616314 63873.3 2.03993e-20<br> 2800 2800 2800 14.8633 2953.85 0.675975 64949.2 1.73082e-20<br> 2900 2900 2900 18.533 2631.96 0.72636 67154.1 1.47984e-20<br> 3000 3000 3000 18.2701 2955.64 0.804625 67112 1.27211e-20<br> 3100 3100 3100 20.2371 2944.19 0.938507 63485.9 1.09831e-20<br> 3200 3200 3200 22.6838 2889.11 1.07581 60918.1 9.49232e-21<br> 3300 3300 3300 25.0228 2872.33 1.06473 67504.6 8.25942e-21<br> 3400 3400 3400 27.3561 2873.51 1.16247 67621.6 7.21511e-21<br> 3500 3500 3500 29.7889 2878.59 1.32098 64913.8 6.3385e-21<br> 3600 3600 3600 34.8098 2680.62 1.37908 67662.7 5.58738e-21<br> 3700 3700 3700 37.6151 2693.23 1.52253 66538.1 4.92976e-21<br> 3800 3800 3800 38.99 2814.67 1.63282 67211.5 4.36537e-21<br> 3900 3900 3900 57.5765 2060.53 1.75221 67707.4 3.88246e-21<br> 4000 4000 4000 51.1335 2503.25 2.03062 63035.1 3.46549e-21<br> [lehn@node042 session4]$ export OMP_NUM_THREADS=4; ./a.out<br> <span class=""># m n k uBLAS: t1 MFLOPS Blocked: t2 MFLOPS Diff nrm1<br> </span> 100 100 100 0.00119733 1670.39 0.00124331 1608.61 3.84618e-14<br> 200 200 200 0.00427996 3738.35 0.000965206 16576.8 1.47604e-15<br> 300 300 300 0.0146617 3683.06 0.00235442 22935.6 2.18643e-16<br> 400 400 400 0.0301558 4244.62 0.00431311 29677 5.57089e-17<br> 500 500 500 
0.0509763 4904.24 0.00541684 46152.4 1.94817e-17<br> 600 600 600 0.0823676 5244.78 0.00815973 52943 1.16851e-17<br> 700 700 700 0.131064 5234.07 0.0133055 51557.7 6.81692e-18<br> 800 800 800 0.198438 5160.3 0.0208701 49065.4 4.09087e-18<br> 900 900 900 0.273346 5333.91 0.0244156 59716 2.53963e-18<br> 1000 1000 1000 0.374021 5347.3 0.0252625 79168.7 1.64654e-18<br> 1100 1100 1100 0.502426 5298.29 0.05022 53006.7 1.08395e-18<br> 1200 1200 1200 0.865696 3992.16 0.0443738 77883.9 7.44661e-19<br> 1300 1300 1300 1.00063 4391.23 0.0544683 80670.8 5.25559e-19<br> 1400 1400 1400 1.26828 4327.13 0.0599685 91514.7 3.80933e-19<br> 1500 1500 1500 1.3623 4954.86 0.0826977 81622.6 2.8281e-19<br> 1600 1600 1600 2.14419 3820.56 0.0940622 87091.3 2.11718e-19<br> 1700 1700 1700 2.98106 3296.14 0.104828 93734.3 1.61252e-19<br> 1800 1800 1800 4.10679 2840.17 0.125856 92677.2 1.25247e-19<br> 1900 1900 1900 7.25737 1890.22 0.137977 99422.2 9.85647e-20<br> 2000 2000 2000 9.0378 1770.34 0.195959 81649.8 7.86877e-20<br> 2100 2100 2100 7.43091 2492.56 0.205205 90261 6.31814e-20<br> 2200 2200 2200 8.01552 2656.84 0.229878 92640.5 5.11206e-20<br> 2300 2300 2300 11.3209 2149.47 0.242479 100355 4.19281e-20<br> 2400 2400 2400 11.7655 2349.91 0.267819 103234 3.4696e-20<br> 2500 2500 2500 14.75 2118.65 0.318302 98177.1 2.89065e-20<br> 2600 2600 2600 16.1598 2175.27 0.349963 100445 2.42432e-20<br> 2700 2700 2700 19.6465 2003.72 0.384713 102326 2.04284e-20<br> 2800 2800 2800 18.5487 2366.95 0.422473 103922 1.73051e-20<br> 2900 2900 2900 18.4844 2638.87 0.431616 113012 1.48037e-20<br> 3000 3000 3000 18.3601 2941.16 0.487947 110668 1.27205e-20<br> 3100 3100 3100 20.1449 2957.67 0.555138 107328 1.09745e-20<br> 3200 3200 3200 22.2403 2946.72 0.597566 109672 9.49034e-21<br> 3300 3300 3300 24.3526 2951.39 0.635492 113100 8.25459e-21<br> 3400 3400 3400 26.5834 2957.04 0.693353 113374 7.22134e-21<br> 3500 3500 3500 28.9996 2956.93 0.753307 113831 6.33808e-21<br> 3600 3600 3600 31.4492 2967.07 0.793409 117609 
5.58761e-21<br> 3700 3700 3700 34.9533 2898.33 0.959263 105608 4.93129e-21<br> 3800 3800 3800 38.2463 2869.4 1.01686 107924 4.36735e-21<br> 3900 3900 3900 42.3957 2798.35 1.08582 109262 3.88282e-21<br> 4000 4000 4000 44.7076 2863.05 1.22383 104590 3.469e-21<br> [lehn@node042 session4]$ export OMP_NUM_THREADS=8; ./a.out<br> <span class=""># m n k uBLAS: t1 MFLOPS Blocked: t2 MFLOPS Diff nrm1<br> </span> 100 100 100 0.00120762 1656.15 0.001279 1563.72 3.8463e-14<br> 200 200 200 0.0036143 4426.86 0.000631185 25349.1 1.48858e-15<br> 300 300 300 0.0108139 4993.56 0.00204664 26384.7 2.20015e-16<br> 400 400 400 0.0251417 5091.13 0.00316074 40496.9 5.58204e-17<br> 500 500 500 0.0482996 5176.03 0.00479854 52099.2 1.9429e-17<br> 600 600 600 0.0830052 5204.49 0.0074349 58104.3 1.16567e-17<br> 700 700 700 0.13281 5165.28 0.0134778 50898.6 6.82167e-18<br> 800 800 800 0.19639 5214.12 0.0143988 71117.2 4.08235e-18<br> 900 900 900 0.279542 5215.68 0.0186552 78155 2.54218e-18<br> 1000 1000 1000 0.381906 5236.89 0.020541 97366.2 1.63963e-18<br> 1100 1100 1100 0.509376 5226 0.0338259 78697.1 1.08399e-18<br> 1200 1200 1200 0.760565 4543.99 0.0317094 108990 7.44215e-19<br> 1300 1300 1300 1.04442 4207.14 0.0419104 104843 5.25101e-19<br> 1400 1400 1400 1.47537 3719.75 0.0450985 121689 3.81236e-19<br> 1500 1500 1500 1.90994 3534.15 0.0514728 131137 2.82394e-19<br> 1600 1600 1600 1.56705 5227.67 0.0599189 136718 2.11847e-19<br> 1700 1700 1700 2.62892 3737.66 0.0756787 129838 1.61316e-19<br> 1800 1800 1800 3.29831 3536.35 0.0827417 140969 1.25087e-19<br> 1900 1900 1900 4.03473 3399.98 0.0915113 149905 9.85857e-20<br> 2000 2000 2000 4.87315 3283.3 0.105251 152017 7.86417e-20<br> 2100 2100 2100 5.87975 3150.13 0.123634 149813 6.31281e-20<br> 2200 2200 2200 7.06021 3016.34 0.134536 158293 5.11845e-20<br> 2300 2300 2300 10.6045 2294.69 0.162671 149590 4.19035e-20<br> 2400 2400 2400 9.31785 2967.21 0.160164 172623 3.46453e-20<br> 2500 2500 2500 10.4852 2980.38 0.181067 172588 2.89024e-20<br> 
2600 2600 2600 11.8263 2972.35 0.208792 168359 2.42313e-20<br> 2700 2700 2700 13.2755 2965.32 0.226646 173690 2.04063e-20<br> 2800 2800 2800 14.8042 2965.65 0.24966 175855 1.73142e-20<br> 2900 2900 2900 16.9983 2869.58 0.287892 169432 1.47875e-20<br> 3000 3000 3000 19.7129 2739.32 0.330801 163240 1.27204e-20<br> 3100 3100 3100 21.4476 2778.02 0.382773 155659 1.09704e-20<br> 3200 3200 3200 22.7482 2880.93 0.440904 148640 9.49823e-21<br> 3300 3300 3300 25.1449 2858.39 0.416183 172698 8.25712e-21<br> 3400 3400 3400 27.4412 2864.6 0.497616 157969 7.2164e-21<br> 3500 3500 3500 30.5976 2802.51 0.49974 171589 6.33781e-21<br> 3600 3600 3600 32.5102 2870.24 0.558002 167225 5.59021e-21<br> 3700 3700 3700 35.3566 2865.26 0.571003 177418 4.93007e-21<br> 3800 3800 3800 37.321 2940.55 0.578064 189847 4.36563e-21<br> 3900 3900 3900 40.1645 2953.8 0.623894 190157 3.8876e-21<br> 4000 4000 4000 43.1753 2964.66 0.709328 180452 3.46575e-21<br> <span class=""><br> <br> On 28 Jan 2016, at 18:41, Michael Lehn <<a href="mailto:michael.lehn@uni-ulm.de">michael.lehn@uni-ulm.de</a>> wrote:<br> <br> > Also the parallelisation with openmp is done pretty cheap and simple at the moment. 
So you also<br> > might want to check how it scales by<br> ><br> > export OMP_NUM_THREADS=2; ./a.out<br> > export OMP_NUM_THREADS=4; ./a.out<br> > export OMP_NUM_THREADS=6; ./a.out<br> > ...<br> <br> </span><span class="">_______________________________________________<br> ublas mailing list<br> <a href="mailto:ublas@lists.boost.org">ublas@lists.boost.org</a><br> <a href="http://lists.boost.org/mailman/listinfo.cgi/ublas" rel="noreferrer" target="_blank">http://lists.boost.org/mailman/listinfo.cgi/ublas</a><br> </span>Sent to: <a href="mailto:rrossi@cimne.upc.edu">rrossi@cimne.upc.edu</a><br> </blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature"><div dir="ltr"><p></p><div><div dir="ltr"><p style="padding:0px;margin:0px;font-family:Arial,Helvetica,sans-serif;color:rgb(0,73,150);line-height:18px"><b>Riccardo Rossi<br></b></p><p style="padding:0px;margin:0px;font-family:Arial,Helvetica,sans-serif;color:rgb(120,120,120);font-size:12px;line-height:18px"> PhD, Civil Engineer</p><p style="padding:0px;margin:0px;font-family:Arial,Helvetica,sans-serif;color:rgb(120,120,120);font-size:12px;line-height:18px">member of the Kratos Team: <a href="http://www.cimne.com/kratos" target="_blank">www.cimne.com/kratos</a><br></p><p style="padding:0px;margin:0px"><span style="color:rgb(120,120,120);font-family:Arial,Helvetica,sans-serif;font-size:12px;line-height:18px">lecturer at Universitat Politècnica de Catalunya, BarcelonaTech (UPC)</span><br> </p><p style="padding:0px;margin:0px;font-family:Arial,Helvetica,sans-serif;color:rgb(120,120,120);font-size:12px;line-height:18px">Research fellow at International Center for Numerical Methods in Engineering (CIMNE)</p><p style="padding:0px;margin:0px;font-family:Arial,Helvetica,sans-serif;color:rgb(120,120,120);font-size:12px;line-height:18px">C/ Gran Capità, s/n, Campus Nord UPC, Ed. C1, Despatx C9</p><p style="padding:0px;margin:0px;font-family:Arial,Helvetica,sans-serif;color:rgb(120,120,120);font-size:12px;line-height:18px"> 08034 – Barcelona – Spain – <a href="http://www.cimne.com" target="_blank">www.cimne.com</a></p><p style="padding:0px;margin:0px;font-family:Arial,Helvetica,sans-serif;color:rgb(120,120,120);font-size:12px;line-height:18px"> <span>T</span>.<a value="+34934010794">(+34) 93 401 56 96</a> skype: <b>rougered4</b><br></p></div></div></div></div> </div>
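For anyone who wants to check the scaling figures quoted in this thread, here is a minimal sketch of the arithmetic. It assumes the MFLOPS columns count 2*m*n*k floating-point operations per product, which matches the printed timings (for example 4000x4000x4000 in 3.88046 s gives about 32985.8 MFLOPS). The function names `gemm_mflops`, `speedup` and `efficiency` are made up for illustration and are not part of matprod.cc or uBLAS.

```cpp
#include <cassert>
#include <cstddef>

// MFLOPS for one m x n x k matrix product that took `seconds`,
// assuming 2*m*n*k floating-point operations (one multiply and one add
// per inner-loop step). Hypothetical helper, not taken from matprod.cc.
double gemm_mflops(std::size_t m, std::size_t n, std::size_t k, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * static_cast<double>(m) * static_cast<double>(n)
               * static_cast<double>(k) / seconds / 1e6;
}

// Speedup of a multi-threaded run relative to the single-threaded rate.
// Both rates must use the same unit (e.g. GFLOPS).
double speedup(double rate_1_thread, double rate_p_threads)
{
    assert(rate_1_thread > 0.0);
    return rate_p_threads / rate_1_thread;
}

// Parallel efficiency: speedup divided by the number of threads used.
// 1.0 would be perfect linear scaling.
double efficiency(double rate_1_thread, double rate_p_threads, int threads)
{
    assert(threads > 0);
    return speedup(rate_1_thread, rate_p_threads) / threads;
}
```

With the N=M=K=4000 numbers quoted above (32.9, 63, 104.6 and 180.5 GFLOPS), this gives speedups of roughly 1.91, 3.18 and 5.49 for 2, 4 and 8 threads, i.e. efficiencies of about 0.96, 0.79 and 0.69.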