Subject: Re: [boost] Accelerating algorithms with SIMD - Segmented iterators and alternatives
From: Daniel Herring (dherring_at_[hidden])
Date: 2010-10-12 17:34:32
On Tue, 12 Oct 2010, joel falcou wrote:
> On 12/10/10 20:32, Manjunath Kudlur wrote:
>>> More on that, I'm eager to know how dynamically generating code (in a string?
>>> in a file? as some bytecode array?) THEN
>>> running a compiler, then executing the resulting binary, can beat statically
>>> compiled code dealing with vector registers.
>>> Explain this to me and I'll convert myself.
>> I looked at Intel's Array Building Blocks and tried to understand how
>> they were doing JIT. More here:
>> Basically, they use a mix of operator overloading and some cleverly
>> named macros to make C++ statements "generate" abstract syntax trees
>> at run time, then JIT them to SSE code running on Threading Building Blocks.
> I can't see how it differs from expression templates (ET), or what the benefit is of
> doing static code generation at runtime...
> I must be really dense, but I can't see this winning any race...
Presuming a sufficiently advanced JIT architecture:
- static code generation
  - may not have access to custom opcodes or hardware on the user's computer
  - may not be able to produce all optimal variants
  - cannot change after delivery
  - can only guess the actual runtime code paths
- a JIT on the user's machine can support
  - local benchmarking and profiling
  - user-side optimization (including profile hints)
  - use of an upgraded compiler
  - saving compiled results
Basically, think of it as moving the compiler from the developer's box to
the user's executable. Both are "the same compiler", but the user's copy
presumably knows more about the actual input data and hardware available.
Knowing when and how to invoke the JIT is an art (Java uses it too much,
C/C++ too little), but there are certain patterns which are a clear win.
For example, a JIT can inline several functions in a user-selected
computation pipeline (e.g., DSP blocks, or high/medium/low special-effects
options); this eliminates branches, temporaries, etc. (hence its
use in current GPU shader languages).
The LLVM project supports turning the full compiler into a JIT (hint:
there is no essential need for on-disk object files). With the Clang
frontend, it should even be possible to compile Boost code into a running
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk