From: Philippe A. Bouchard (philippeb_at_[hidden])
Date: 2003-11-19 16:57:51


Hurd, Matthew wrote:

[...]

> Yep, in the past Intel has seemed to have advantage there.
>
> W.r.t. interlocked increment, I don't know why it is faster on AMD, but
> I suspect it has something to do with flushing the pipeline perhaps and
> the P4 has a long one. FWIW, I've profiled the P3 to beat a P4 on
> interlocked increment at around a third the clock ;-)

This is funny: I knew the P4 compensated for its slow pipeline execution &
"flexibility" by overclocking its bus, but I was not aware of those
bottlenecks. It is always interesting.

> At least it is something to take into account when you see a benchmark.
> You might see a 200% slow down between relative speeds on different
> architectures just due to this kind of factor.

Exactly, benchmarks are pretty much the bottom line.
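
To make that kind of comparison concrete, here is a minimal sketch of how
one might time the increment on a given machine. It assumes a modern
compiler with std::atomic (not what was available when this thread was
written), and time_atomic_increments is a hypothetical name, not part of
any library. It only measures the uncontended, single-threaded case.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// Hypothetical helper: performs `iterations` atomic increments on one thread,
// stores the final counter value in `out`, and returns elapsed nanoseconds.
long long time_atomic_increments(long iterations, long& out) {
    assert(iterations >= 0);
    std::atomic<long> counter{0};

    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        // Sequentially consistent read-modify-write; on x86 this is
        // typically compiled to a lock-prefixed instruction.
        counter.fetch_add(1, std::memory_order_seq_cst);
    }
    auto stop = std::chrono::steady_clock::now();

    out = counter.load();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}
```

Dividing the returned time by the iteration count gives a rough per-increment
cost that can be compared across CPUs; the numbers will of course vary with
compiler, flags and hardware.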

>> The interesting part is to speed up existing code with minimal
>> dependencies, knowing it cannot be faster. I am looking forward to
>> eventual benchmarks against Hans Boehm's garbage collectors ;) To be
>> honest with you, if I were working on this full-time, I would implement
>> my "real-time" collector to get rid instantaneously of cyclic
>> references and add atomic reference count operations immediately.
>
> Yep, a reference without the need for a lock would be a big win. Not
> sure how you would make this safe in the garbage collector though as a
> mark sweep could cause a pause. Perhaps there is an optimistic
> technique, such as a timestamp/cyclestamp, that can roll back on
> conflict, perhaps over two phases. I'll have to think some more about
> such optimism, perhaps you could use such an approach for updating the
> reference count with roll-back-and-try-again semantics... Hmmm.
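
For the reference count alone, the "try again on conflict" idea quoted above
can be sketched with a compare-and-swap loop. This is only an illustration
under the assumption that std::atomic is available; try_add_ref is a
hypothetical helper, and it says nothing about how a collector would handle
cycles.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: takes a new reference only while the object is still
// alive (count > 0). Returns false if the count has already dropped to 0.
bool try_add_ref(std::atomic<long>& count) {
    long seen = count.load(std::memory_order_relaxed);
    while (seen != 0) {
        assert(seen > 0);  // a negative count would indicate a bug elsewhere
        // Commit seen + 1 only if nobody changed the count since we read it.
        // On failure, `seen` is refreshed with the current value and the
        // loop tries again.
        if (count.compare_exchange_weak(seen, seen + 1,
                                        std::memory_order_acq_rel,
                                        std::memory_order_relaxed)) {
            return true;
        }
    }
    return false;
}
```

No lock is held at any point: a conflicting update just costs one more trip
around the loop.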

Again, there are many solutions, but I personally believe destruction of
cyclic objects should be done instantaneously at the cost of some immediate
speed (as compared to a garbage collector). What I was thinking of implied
1 more pointer in the smart pointer, increasing its size by
sizeof(void *)... (in units of sizeof(void *): sizeof(new_shared_ptr) would
be == 3; sizeof(new_shifted_ptr) == 2 == sizeof(actual_shared_ptr)). 2
counters would therefore be necessary, but the overall benefits are more
important. If there is interest I could figure out something, because it
needs portable low-level solutions...
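
To illustrate only the size arithmetic, here is a rough sketch. All the
names (counters, two_word_ptr, three_word_ptr, size_in_words) are
hypothetical placeholders, the role of the extra pointer and of each counter
is deliberately left open, and the sizes hold on mainstream platforms where
all object pointers have the same size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical control block holding the two counters mentioned above;
// what each counter tracks is left unspecified in this sketch.
struct counters {
    long first;
    long second;
};

// Two-pointer layout: object pointer plus pointer to the counters.
struct two_word_ptr {
    void*     object;
    counters* counts;
};

// Same layout with one additional pointer; `extra` is a placeholder only.
struct three_word_ptr {
    void*     object;
    counters* counts;
    void*     extra;
};

// Size of T expressed in units of sizeof(void *).
template <typename T>
std::size_t size_in_words() {
    assert(sizeof(T) % sizeof(void*) == 0);
    return sizeof(T) / sizeof(void*);
}
```

With these definitions, size_in_words<two_word_ptr>() and
size_in_words<three_word_ptr>() give the 2 and 3 quoted above.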

Emotional turbulence is sometimes useful, as in those discussions; not
everybody is working full-time on advanced C++... to respond to what Matt
was saying... ;)

Regards,

Philippe


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk