Subject: Re: [boost] Accelerating algorithms with SIMD - Segmented iterators and alternatives
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2010-10-12 19:03:55
Daniel Herring wrote:
> On Tue, 12 Oct 2010, joel falcou wrote:
>> On 12/10/10 20:32, Manjunath Kudlur wrote:
>>>> More on that, I'm eager to know how dynamically generating code
>>>> (in a string ? in a file ? as some bytecode array ?) THEN
>>>> running a compiler then executing the resulting binary can beat
>>>> statically compiled code dealing with vector registers.
>>>> Explain this to me and I convert myself.
>>> I looked at Intel's array building blocks and tried to understand
>>> how they were doing JIT. More here :
>>> Basically, they use a mix of operator overloading and some cleverly
>>> named macros to make C++ statements "generate" abstract syntax trees
>>> at run time, then JIT it to SSE supported by thread building blocks.
>> I can't see how different it is from ET and what the benefits are of
>> doing static code generation at runtime...
>> I must be really dense but I can't see this winning any race ...
> Presuming a sufficiently advanced jit architecture,
> - static code generation
>   - may not have access to custom opcodes or hardware on the user's
>     machine
>   - may not be able to produce all optimal variants
>   - cannot change after delivery
>   - can only guess the actual runtime code paths
> - a jit on the user's machine can support
>   - local benchmarking and profiling
>   - user-side optimization (including profile hints)
>   - use an upgraded compiler
>   - save compiled results
>   - ...
> Basically, think of it as moving the compiler from the developer's
> box to the user's executable. Both are "the same compiler", but the
> user's copy presumably knows more about the actual input data and
> hardware.
> Knowing when and how to invoke the JIT is an art (Java uses it too
> much, C/C++ too little), but there are certain patterns which are a
> clear win.
> For example, a jit can inline several functions in a user-selected
> computation pipeline (examples: DSP blocks or high/medium/low special
> effects options); this eliminates branches, temporaries, etc. (hence
> use in current GPU shader languages).
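To make the pipeline point concrete, here is a minimal C++ sketch. It is my own illustration, not code from anyone in this thread; the stage names gain, offset and clamp01 and the helpers run_dynamic and run_static are hypothetical. It contrasts a pipeline chosen at run time through function pointers with the same pipeline fixed at compile time. The compile-time version is the form a static compiler can inline and fuse. A JIT could in principle produce the equivalent for whatever stage list the user picks at run time, without the vendor enumerating every combination up front.

```cpp
#include <vector>

// Hypothetical DSP-style stages; the names are illustrative only.
inline float gain(float x)    { return x * 2.0f; }
inline float offset(float x)  { return x + 1.0f; }
inline float clamp01(float x) { return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x); }

typedef float (*Stage)(float);

// Pipeline selected at run time: one indirect call per stage per element.
// The stage list is only known when the program runs, so a static
// compiler generally cannot inline across the stages.
void run_dynamic(const std::vector<Stage>& stages, std::vector<float>& data) {
    for (float& x : data)
        for (Stage s : stages)
            x = s(x);
}

// Pipeline fixed at compile time: the stages are template arguments, so
// the optimizer is free to inline them into a single loop body.
template <Stage... Stages>
void run_static(std::vector<float>& data) {
    for (float& x : data) {
        // Braced-init-list expansion applies the stages left to right.
        int expand[] = {0, ((x = Stages(x)), 0)...};
        (void)expand;
    }
}
```

For example, run_dynamic({gain, offset}, data) and run_static<gain, offset>(data) compute the same values; only the second gives the optimizer the whole pipeline to work with, and it has to be instantiated for each combination ahead of time.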
I want to underscore the similarity between shaders in graphics and vector kernels in HPC applications. As we offload more to the GPU, the distinction gets even fuzzier. When your game loads, it tells you it's compiling the shaders. That's a pretty crummy user experience in and of itself, but we don't quibble because we are happy with the user experience that comes immediately after. In theory you could pre-compile shaders for every variant of graphics hardware out there, ship them with the game, and then patch it as new hardware comes out. In practice we are happier to wait for the shaders to compile when we launch the game or get to the next level. It isn't realistic to expect every user to compile the application for their platform, nor is it reasonable to expect every app vendor to provide their source code to the customer to recompile.
> The LLVM project supports turning the full compiler into a JIT (hint:
> there is no essential need for on-disk object files). With the Clang
> frontend, it should even be possible to compile Boost code into a
> running application...
I know a lot of people who are excited about LLVM + Clang, and JIT comes up a lot since LLVM is lightweight and fast. It would be nice to have an open source (and free) C++ interpreter. We could use LLVM for things like auto-vectorization JIT in a snap. People get it.