|
Boost : |
From: Gregory Colvin (gregory.colvin_at_[hidden])
Date: 2003-05-08 10:32:42
My experience tuning our java VM is similar, and it runs on a lot of
different CPUs. Still, there is reason to be suspicious of very small
changes, which might be repeatable for our benchmark set, yet have no
real meaning for normal use. And there is reason to be careful not to
waste time pursuing 3% tweaks instead of going for 100% breakthroughs.
On Thursday, May 8, 2003, at 09:11 America/Denver, Darin Adler wrote:
> On Thursday, May 8, 2003, at 07:04 AM, Beman Dawes wrote:
>
>> A 2-3% timing difference probably isn't reliably repeatable in real
>> code.
>>
>> How code and data happens to land in hardware caches can easily swamp
>> out such a small difference. The version-to-version or step-to-step
>> differences in CPU's, memory, compilers, or operating systems can
>> cause that much difference in a given program. Differences need to
>> get up into the 20-30% range before they are likely to be reliably
>> repeatable across different systems.
>>
>> At least that's been my experience.
>
> That has not been my recent experience. While working on my current
> project (the Safari web browser), we have routinely made 1% speedups
> that are measurable and have an effect across multiple machines and
> compilers (same basic CPU type and operating system), and we have also
> detected 1% slowdowns when we inadvertently introduced them.
>
> They add up. Ten 1% speedups result in a 9.5% speedup.
>
> It's true that differences in CPUs, memory, compilers, and operating
> systems can cause huge differences, but that does not mean that
> changes that make such small increases in performance are therefore
> not worthwhile.
>
> In our project, a 3% speed increase is considered a cause for
> celebration.
>
> I'm not sure, though, if this negates your point, Beman. Something
> that gives a 2-3% speedup for one Boost user might not be worth any
> level of obfuscation unless we can prove it provides a similar speedup
> for other Boost uses.
>
> -- Darin
>
> PS: On the occasions where you can fix an algorithm in a way that
> gives a 10x speed increase, or a 25% one, that's even more exciting.
> To give you an idea of what I'm talking about, here's a log from a
> week I spent increasing the speed of our JavaScript library:
>
> Monday, November 18, 2002
> - sped up JavaScript iBench by 70% by using a better sort algorithm
> and reducing the number of UString allocations
> - sped up JavaScript iBench by 6% by turning ExecState into a simple
> object instead of a two level abstraction
> - sped up JavaScript iBench by 7% by turning the property map into a
> hash table and improving String instance handling
> - sped up JavaScript iBench by 6% by hoisting the property map into
> ObjectImp and doing less ref/unref
> - sped up JavaScript iBench by 1.5% by converting integers into
> strings with custom code rather than sprintf
>
> Tuesday, November 19, 2002
> - sped up JavaScript iBench by 2% by using masking instead of modulus
> in the property map hash table code
> - sped up JavaScript iBench by 3% by improving the implementation of
> the "perfect hashing" hash tables used for static properties
> - sped up JavaScript iBench by 1.5% by storing computed hash values in
> the UString representation so we don't recompute them
> - sped up JavaScript iBench by 6.5% by atomizing property identifiers
> - sped up JavaScript iBench by 1.5% by not clearing and rebuilding the
> list each time during sorting
>
> Wednesday, November 20, 2002
> - sped up JavaScript iBench by 5% by decreasing the amount of
> ref/deref done by changing interfaces so they can deal directly with
> ValueImp
> - sped up JavaScript iBench by 1% by making lists ref/unref less
> - sped up JavaScript iBench by 7.5% by creating argument list objects
> only on demand rather than for each function call
>
> Thursday, November 21, 2002
> - sped up JavaScript iBench by 3% by allocating the ActivationImp
> objects on the stack rather than in the garbage-collected heap
> - sped up JavaScript iBench by 11% by turning the scope chain into a
> singly-linked list that shares tails rather than a non-sharing
> doubly-linked list with subtly different semantics
>
> Friday, November 22, 2002
> - sped up JavaScript iBench by 10% by changing the List class from a
> linked list to a vector, and using a pool of instances
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk