
From: Rene Rivera (grafik666_at_[hidden])
Date: 2002-07-31 11:00:00


[2002-07-31] Anatoliy Kuznetsov wrote:

>
>>
>> As Anatoliy pointed out it's only faster when
>> extended to more bits in
>> parallel. But I've found that it's faster in the 32
>> bit case and up.
>
>
>Hmm, I need to reevaluate my test. Thanks for pointing
>that out.
>
>
>> The speedup is mostly due to keeping the computation
>> in the ALU, as opposed
>> to doing memory access.
>
>
>I got the impression that in my case the whole table
>fits in the CPU cache, which makes memory latency not
>a big problem.

True, if you are using small tables, and if you don't need that same cache
space for other things. But I usually end up trying to free up as much cache
space as possible to make room for algorithms where using tables gives you a
100x speedup. I'd rather keep the 100x speedup and worsen the speedups of
10x or less.
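
For reference, a rough sketch of the two approaches being compared (the
function names are illustrative, not from any existing Boost code, and a
32-bit unsigned int is assumed): the first does one table lookup per byte,
the second stays entirely in the ALU with shifts, masks and adds.

    // Table-driven count: one memory lookup per byte, plus 256 bytes of
    // cache occupied while the table is hot. Filled lazily on first use.
    inline unsigned count_bits_table(unsigned w)
    {
        static unsigned char table[256];
        static bool filled = false;
        if (!filled)
        {
            for (unsigned i = 0; i < 256; ++i)
            {
                unsigned n = 0;
                for (unsigned v = i; v; v >>= 1) n += v & 1u;
                table[i] = (unsigned char)n;
            }
            filled = true;
        }
        return table[ w        & 0xFFu]
             + table[(w >>  8) & 0xFFu]
             + table[(w >> 16) & 0xFFu]
             + table[(w >> 24) & 0xFFu];
    }

    // Parallel count: a handful of shifts, masks and adds, no memory
    // access at all.
    inline unsigned count_bits_parallel(unsigned w)
    {
        w = w - ((w >> 1) & 0x55555555u);                   // 2-bit sums
        w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);   // 4-bit sums
        w = (w + (w >> 4)) & 0x0F0F0F0Fu;                   // 8-bit sums
        return (w * 0x01010101u) >> 24;                     // total in top byte
    }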

>> But if you are forced to do it a byte at a time, the
>> table would be faster.
>
>
>Also there is the problem of misalignment. If a bitset
>is character-based it is not always safe to address it
>as int*. On non-Intel architectures it can cause a BUS
>error.
>
>I got a nice crash on Sparc Solaris when I tried to
>access a stack-allocated array of ints as an array of
>64-bit longs. The same trick always worked on SGI.
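
A rough sketch of the problem being described (the names are illustrative):
reading a word out of a possibly misaligned character buffer. The direct
cast can raise a bus error on SPARC and other strict-alignment CPUs, while
copying through memcpy into an aligned local is well-defined for any
alignment.

    #include <cstring>

    unsigned long read_word_unsafe(const unsigned char* p)
    {
        // Undefined behaviour if p is not suitably aligned; traps with a
        // bus error on strict-alignment hardware such as SPARC.
        return *reinterpret_cast<const unsigned long*>(p);
    }

    unsigned long read_word_safe(const unsigned char* p)
    {
        // memcpy into a properly aligned local is valid for any p.
        unsigned long w;
        std::memcpy(&w, p, sizeof w);
        return w;
    }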

One big problem with using byte access is that on some CPUs it causes
pipeline stalls. For example, on PPC byte access is optimized to be
equivalent to regular word access, but on x86 byte access is much slower
than word access. I ran into this while writing an alpha blending routine;
I ended up writing a switch statement with more cases and more code (6x the
code) that read a word at a time. Just the reading alone was 16x faster this
way. If you look at implementations of memcpy you'll see the same thing
done... doing aligned whole-word operations as much as possible.
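
To make the word-at-a-time idea concrete, here is a rough sketch of the
shape such code takes (only an illustration, not the alpha blending routine
mentioned above): handle the unaligned head and the tail a byte at a time,
and run the aligned middle of the buffer one whole word per iteration.

    #include <cstddef>
    #include <cstring>

    void fill_bytes(unsigned char* dst, std::size_t n, unsigned char value)
    {
        typedef unsigned long word;

        // Head: advance byte by byte until dst is word aligned.
        // (Assumes std::size_t can hold a pointer value, as on common
        // platforms.)
        while (n && reinterpret_cast<std::size_t>(dst) % sizeof(word) != 0)
        {
            *dst++ = value;
            --n;
        }

        // Middle: store whole aligned words.
        word pattern;
        std::memset(&pattern, value, sizeof pattern);
        word* wdst = reinterpret_cast<word*>(dst);
        for (; n >= sizeof(word); n -= sizeof(word))
            *wdst++ = pattern;

        // Tail: the remaining few bytes.
        dst = reinterpret_cast<unsigned char*>(wdst);
        while (n--)
            *dst++ = value;
    }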

-- grafik - Don't Assume Anything
-- rrivera_at_[hidden] - grafik_at_[hidden]
-- 102708583_at_icq - Grafik666_at_AIM - Grafik_at_[hidden]

