Boost Users :

From: Mike Marchywka (marchywka_at_[hidden])
Date: 2007-11-29 14:28:00


Sorry for top-posting without quotes, but I think someone said something about
the Intel compiler that is applicable to this. When I said earlier that the
Intel compiler was as good as my hand-coded assembler for wavelets, I think I
was talking about something I wrote as naive hand-coded for-loops and, IIRC,
it was an in-place 2D transform. I'm not sure what that compiler can do with
extra copies, and I'm pretty sure it won't volunteer to overwrite your
operands :) This can be a big deal on big data sets when considering cache
misses. And, if you are doing multiple passes over the same data, blocking
(doing many levels on small chunks) can help too.
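
To make "in-place" concrete, here is a minimal sketch of the kind of naive
for-loop I mean. It is not the code from this thread: it assumes a 1D,
unnormalized Haar step done by lifting, and the name haar_lift_in_place is
made up for illustration. The point is only that the operands are overwritten
and no temporary arrays are allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One level of an unnormalized Haar transform written as two lifting steps.
// It works directly on the interleaved samples, so no temporaries are needed.
// Afterwards, even slots hold pair averages and odd slots hold differences.
void haar_lift_in_place(std::vector<double>& x)
{
    assert(x.size() % 2 == 0);  // sketch only: even-length input assumed
    for (std::size_t i = 0; i + 1 < x.size(); i += 2) {
        x[i + 1] -= x[i];            // predict: detail  = odd - even
        x[i]     += 0.5 * x[i + 1];  // update:  average = even + detail / 2
    }
}
```

For a 2D transform one would presumably apply the same step along rows and
then columns; the version that copies into separate even/odd arrays touches
far more memory per level.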

Also, I think JM said something about gcc not inlining very well.
I had to check as this was always the one thing I assumed a compiler could
do right. I did a quick test of my own code

 g++ -Wall -O0 -ggdb -S -o junk00g string_test.cpp

gcc version 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)

Compared against -O3, it appears, on a quick look, that most of the "calls"
went away in the code I care about (although I haven't used gcc much in the
past and haven't looked at assembler in a while).

> From: maikbeckmann_at_[hidden]
> To: boost-users_at_[hidden]
> Date: Thu, 29 Nov 2007 19:54:35 +0100
> Subject: Re: [Boost-users] C++ Performance
>
> On Thursday, 29 November 2007 10:34:04, nisha kannookadan wrote:
>> Ok, I optimized my program (now it uses pass by reference and the resize
>> stuff is out):
>>
>>
>> void Wavelet::ttrans(matrix& At, int level)
>> {
>> matrix cfe1, cfe2, cfo, cfe, c, d;
>> int N,s2;
>>
>> N = (At.size1()+1)/2;
>> s2 = At.size2();
>> scalar_matrix zer(N,s2);
>>
>> for (int ii = 1; ii <= level; ii++)
>> {
>>
>> cfo = subslice(At, 0,2,N, 0,1,s2);
>> cfe = subslice(At, 1,2,N-1, 0,1,s2);
> A subslice is a lightweight handle to a possibly heavyweight matrix. This line
> cfo = subslice(At, 0,2,N, 0,1,s2);
> eliminates the performance gain, since cfo is a full-fledged matrix and the
> assignment copies every selected element into it.
>
> However, I don't know if it's allowed to apply a subrange to a subslice. Can
> you provide a full working example plus data? It's very hard to give tips on
> template libraries without the hints I get from my compiler :)
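
The handle-versus-copy distinction above can be sketched without uBLAS. This
is only a stdlib analogy under my own assumptions: StridedView and
materialize are hypothetical names, not uBLAS types or functions. The view
stores a pointer and a stride, while materializing it is the analogue of
assigning a subslice to a full matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a slice handle: it stores a pointer, a start and
// a stride, and never owns or copies elements.
struct StridedView {
    double*     data;
    std::size_t start, stride, count;

    double& operator[](std::size_t i) const {
        assert(i < count);
        return data[start + i * stride];
    }
};

// Analogue of assigning a slice to a full matrix: allocate new storage and
// copy every selected element out of the underlying data.
std::vector<double> materialize(const StridedView& v)
{
    std::vector<double> out(v.count);
    for (std::size_t i = 0; i < v.count; ++i) out[i] = v[i];
    return out;
}
```

Writes through the view land in the original storage; writes to the
materialized copy do not, and the copy costs an allocation plus a pass over
the data on every loop iteration.
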
>
>> c = (cfe + (subrange(cfo, 0,N-1, 0,s2)+subrange(cfo, 1,N, 0,s2))*0.5);
>>
>> zer.resize(N,s2,true);
>> cfe1 = zer;
>> cfe2 = zer;
>>
>> (subrange(cfe1, 0,N-1, 0,s2)).assign(cfe);
>> (subrange(cfe2, 1,N, 0,s2)).assign(cfe);
>> d = cfo-(cfe1+cfe2)*0.5;
>>
>> (subrange(At, 0,N-1, 0,At.size2())).assign(c);
>> (subrange(At, N-1,2*N-1, 0,At.size2())).assign(d);
>>
>> N = N/2;
>> }
>>
>> cfe1.clear();
>> cfe2.clear();
>> cfo.clear();
>> cfe.clear();
>> c.clear();
>> d.clear();
>>
>> }
>>
>> But I guessed it's still not good enough, and wanted to work with pointers
>> to avoid copying. The result was the next piece of code, which compiles
>> but terminates when I run it:
>>
>> void Wavelet::ttrans(matrix& At, int level)
>> {
>> matrix cfe1, cfe2, *cfo, *cfe, *c, *d;
>
> PLEASE don't use ublas matrices as pointers! They are not made for this (no
> virtual destructors, for performance reasons). If you want to avoid copying,
> always use references.
>
>
> BTW: There's a ublas mailing list
> - http://lists.boost.org/mailman/listinfo.cgi/ublas
> which is read by all ublas devs and power users. If anyone knows how to get
> the most performance out of your code, they do. And again, you will get the
> most (useful) feedback if you provide a working example that can be hacked on.
>
> Best,
> -- Maik
>


