From: jsiek_at_[hidden]
Date: 2000-01-29 02:44:14

Greg Colvin writes:
> This looks much better, and may indeed be much faster. I'll leave it
> to Mark to tweak his code to get this expansion.
> I still count 6 loads/stores versus 12 loads/stores.

But those 12 loads/stores are to the stack instead of the heap, and
in many situations they will go away via register allocation. This is
what happens in Mark's example code. Also, alias analysis handles
stack-based objects very well, while with most compilers stuff on the
heap is a complete mystery. This makes it difficult for the compiler
to optimize the instructions around the loads and stores. For instance,
with Mark's example, the SGI compiler doesn't attempt any loop
optimizations in the shared ptr case, while for the linked ptr it does
some (though the presence of a possible exception prevents it from
unrolling). The end result is that on an Origin2000 a tight loop that
just copies pointers (Mark's example) runs 6X faster with linked over

Here's the summary of the loop for linked_ptr.

 #<loop> Loop body line 33, nesting depth: 1, estimated iterations: 100
 #<loop> Not unrolled: in exception region or handler
 #<sched> Loop schedule length: 45 cycles (ignoring nested loops)
 #<sched> 31 mem refs ( 68% of peak)
 #<sched> 20 integer ops ( 22% of peak)
 #<sched> 49 instructions ( 27% of peak)
 #<freq> BB:13 frequency = 92.79869 (heuristic)
 #<freq> BB:13 => BB:14 probability = 0.81116
 #<freq> BB:13 => BB:15 probability = 0.18884

For shared pointer the compiler didn't print a loop summary because it
didn't treat the loop as a "loop", and didn't schedule the
instructions accordingly. If you look at the assembly code
there's a huge difference.
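
To make the stack-versus-heap point concrete, here is a minimal sketch
of the two designs. It is not Mark's code or the Boost shared_ptr; the
names counted_ptr_sketch, linked_ptr_sketch, probe and copy_in_loop are
made up for illustration, and assignment is left out. The counted
version touches a heap-allocated count on every copy, while the linked
version only touches pointers stored inside the smart pointer objects
themselves, which in a loop like this live on the stack.

```cpp
#include <cassert>

// Hypothetical reference-counted pointer: the count lives on the heap.
template <class T>
class counted_ptr_sketch {
public:
    // A real implementation would guard against `new long` throwing.
    explicit counted_ptr_sketch(T* p) : px_(p), pn_(new long(1)) {}
    counted_ptr_sketch(const counted_ptr_sketch& r) : px_(r.px_), pn_(r.pn_) {
        ++*pn_;                       // load and store through a heap pointer
    }
    ~counted_ptr_sketch() {
        if (--*pn_ == 0) { delete px_; delete pn_; }
    }
    long use_count() const { return *pn_; }
    T* get() const { return px_; }
private:
    counted_ptr_sketch& operator=(const counted_ptr_sketch&); // omitted
    T* px_;
    long* pn_;
};

// Hypothetical linked pointer: owners form a circular doubly linked
// list whose nodes are the smart pointer objects themselves.
template <class T>
class linked_ptr_sketch {
public:
    explicit linked_ptr_sketch(T* p) : px_(p), prev_(this), next_(this) {}
    linked_ptr_sketch(const linked_ptr_sketch& r)
        : px_(r.px_), prev_(&r), next_(r.next_) {
        r.next_->prev_ = this;        // splice this owner in right after r
        r.next_ = this;
    }
    ~linked_ptr_sketch() {
        if (next_ == this) {
            delete px_;               // last owner frees the object
        } else {
            prev_->next_ = next_;     // unlink this owner from the ring
            next_->prev_ = prev_;
        }
    }
    bool unique() const { return next_ == this; }
    T* get() const { return px_; }
private:
    linked_ptr_sketch& operator=(const linked_ptr_sketch&);   // omitted
    T* px_;
    mutable const linked_ptr_sketch* prev_;
    mutable const linked_ptr_sketch* next_;
};

// Counts live objects so ownership can be checked.
struct probe {
    static int live;
    probe() { ++live; }
    ~probe() { --live; }
};
int probe::live = 0;

// A tight loop that only copies smart pointers; returns how many
// copies saw a non-null pointee.
template <class Ptr>
int copy_in_loop(const Ptr& src, int n) {
    int seen = 0;
    for (int i = 0; i < n; ++i) {
        Ptr tmp(src);                 // copy, then destroy at end of scope
        if (tmp.get()) ++seen;
    }
    return seen;
}
```

In copy_in_loop the linked version's tmp and its neighbour src are both
ordinary objects the compiler can see, whereas the counted version
always goes through pn_ to memory the compiler knows little about.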



 Jeremy Siek
 Ph.D. Candidate email: jsiek_at_[hidden]
 Univ. of Notre Dame work phone: (650) 933-8724
 and cell phone: (415) 377-5814
 C++ Library & Compiler Group fax: (650) 932-0127
 SGI www:
