Boost Users
From: Mike Marchywka (marchywka_at_[hidden])
Date: 2008-08-25 07:45:11
> Date: Sun, 24 Aug 2008 21:56:49 -0400
> From: me22.ca+boost_at_[hidden]
> To: boost-users_at_[hidden]
> Subject: Re: [Boost-users] [Thread] Beginner question regarding thread groups
>
> On Sun, Aug 24, 2008 at 18:58, Michel Lestrade
> wrote:
>>
>> I am considering rewriting part of our ray tracing code to use the Boost
>> thread library. As many of you might know, ray tracing is a task that lends
>> itself well to a parallel approach since each ray is independent from the
>> other. However, the code right now is in Fortran which doesn't support task
>> level parallelism easily ... The thread_group class seems ideally suited to
>> my needs (one thread == one ray) but there is at least one problem I
>> envision that I would like to solve before embarking on this fairly
>> time-consuming rewrite.
>>
>
> It's worth pointing out that, as I recall, you don't really want more
> continually-active threads than, say, twice your number of cores. It
> would be nice to run one thread per ray, but the switching and OS
> management overhead will kill you if you attempt to run thousands of
> threads at once, which I assume you'd need with 1:1. (Erlang, for
> example, which is conceptually based on many, many communicating
> processes, uses its own implementation, rather than OS threads or
> processes.)
>
> I think you'd be much better off creating a "ray queue" of some sort,
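
A minimal sketch of the "ray queue" idea. It assumes C++11 `std::thread` in place of Boost.Thread so that the example is self-contained. A fixed pool of workers, sized from `std::thread::hardware_concurrency()`, claims ray indices from a shared atomic counter instead of starting one thread per ray. `trace_ray`, `trace_all` and `default_pool_size` are hypothetical names, and `trace_ray` only stands in for the real per-ray work.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical placeholder for tracing a single ray; a real tracer would
// intersect the ray with the scene and return a colour or radiance value.
double trace_ray(std::size_t ray_index) {
    return static_cast<double>(ray_index % 7);
}

// Trace num_rays rays with a fixed number of worker threads instead of
// one thread per ray. Workers claim the next unprocessed ray index from a
// shared atomic counter, which acts as a minimal "ray queue".
std::vector<double> trace_all(std::size_t num_rays, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<double> results(num_rays);
    std::atomic<std::size_t> next{0};

    auto worker = [&]() {
        for (std::size_t i = next.fetch_add(1); i < num_rays; i = next.fetch_add(1)) {
            results[i] = trace_ray(i);  // each index is claimed by exactly one worker
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < num_threads; ++t) {
        pool.emplace_back(worker);
    }
    for (std::thread& th : pool) {
        th.join();  // results are only read after every worker has finished
    }
    return results;
}

// Pick a pool size close to the core count. hardware_concurrency() may
// return 0 when the value is unknown, so fall back to a small default.
unsigned default_pool_size() {
    unsigned cores = std::thread::hardware_concurrency();
    return cores == 0 ? 2 : cores;
}
```

With Boost.Thread, the same pool could be built by calling `boost::thread_group::create_thread(worker)` in the loop and `join_all()` at the end. An atomic counter keeps the sketch short. Handing out bundles of rays or image tiles instead of single rays would reduce contention on the counter.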

If you want some ideas for reference, I did a quick check on the Intel
site, and they have some papers and apparently even an OpenMP Fortran
compiler, if that helps you.
[Obviously their comments are specific to their products, but they are quite
generally useful, especially if you are looking for ideas.]
http://cache-www.intel.com/cd/00/00/21/92/219292_hyperthreading_extract.pdf
( http://www.google.com/search?hl=en&q=site%3Aintel.com+openmp+performance+optimization )
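
Since OpenMP came up, here is a minimal sketch of loop-level parallelism over rays, written in C++ rather than Fortran (the Fortran counterpart is the `!$omp parallel do` directive). It assumes an OpenMP-enabled compiler flag such as `-fopenmp` for GCC. Without it the pragma is ignored and the loop runs serially with the same result. `shade` and `trace_all_omp` are hypothetical placeholders, and the chunk size of 64 is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-ray work; stands in for the real tracing routine.
static double shade(long i) {
    return static_cast<double>(i % 5) * 0.5;
}

// Loop-level parallelism with OpenMP. When compiled without OpenMP support
// the pragma is ignored and the loop simply runs on one thread.
std::vector<double> trace_all_omp(long num_rays) {
    assert(num_rays >= 0);
    std::vector<double> image(static_cast<std::size_t>(num_rays));
    // schedule(dynamic, 64): threads take chunks of 64 rays at a time, which
    // helps balance the load when some rays cost much more than others.
    #pragma omp parallel for schedule(dynamic, 64)
    for (long i = 0; i < num_rays; ++i) {
        image[static_cast<std::size_t>(i)] = shade(i);  // distinct index per iteration
    }
    return image;
}
```

The appeal of this style for existing loop-based code is that the thread pool and work distribution are handled by the OpenMP runtime. The loop body only needs to be free of shared writes.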
I haven't looked at ray tracing at all since going to SIGGRAPH circa 1983,
but I have seen several posts recently on threading as a solution to
everything, and I would like to point out that there may be other approaches
that yield greater improvements. Have you tried to "think locally, act
globally"? That is, have you considered ways of organizing your approach to
increase the various kinds of locality that minimize cache thrashing?
While you say that rays are independent, if you do classical physical optics,
nearby rays tend to have similar trajectories. Rather than letting an ignorant
but fair thread scheduler decide which piece of memory to access next, you
could, if you are cache-aware, even consider sorting the rays to get the best
locality and deliberately treating them as dependent, using a transform
scheme that recognizes that nearby rays are similar.
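
One way to sketch the "sort the rays for locality" idea is to sort rays by a Z-order (Morton) key of their quantized direction. Rays with similar directions then end up adjacent in memory and are traced one after another. This is one possible scheme and not taken from the papers linked here. The `Ray` struct and all function names are hypothetical, and a real tracer might key on origin as well as direction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical ray record; only the direction is used for the sort key.
struct Ray {
    float ox, oy, oz;  // origin
    float dx, dy, dz;  // unit direction
};

// Spread the low 10 bits of v so that two zero bits separate consecutive bits
// (standard bit trick for building a 30-bit, 3D Morton code). Requires v < 1024.
static std::uint32_t expand_bits(std::uint32_t v) {
    assert(v < 1024u);
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

// Map a direction component from [-1, 1] to an integer in [0, 1023].
static std::uint32_t quantize(float c) {
    float t = (c + 1.0f) * 0.5f;
    t = std::min(std::max(t, 0.0f), 1.0f);
    return static_cast<std::uint32_t>(t * 1023.0f);
}

// Interleave the quantized x, y, z direction bits into one Morton key.
std::uint32_t direction_key(const Ray& r) {
    return (expand_bits(quantize(r.dx)) << 2) |
           (expand_bits(quantize(r.dy)) << 1) |
            expand_bits(quantize(r.dz));
}

// Reorder rays so that rays with similar directions are stored contiguously.
void sort_for_coherence(std::vector<Ray>& rays) {
    std::sort(rays.begin(), rays.end(), [](const Ray& a, const Ray& b) {
        return direction_key(a) < direction_key(b);
    });
}
```

Computing the key inside the comparator keeps the sketch short. For large ray sets it would be cheaper to compute each key once and sort (key, index) pairs.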
Random access memory is random access, but if you look at the overall
architecture, you can take a big performance hit for not keeping memory
accesses sequential.
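
Here is a small, hedged experiment for the sequential-versus-random point. It sums the same array once in sequential order and once through a shuffled index permutation. Both passes touch every element exactly once and read the index array sequentially, so any timing difference comes from the data access pattern. The function names are hypothetical. The size of the gap depends on the array size relative to the caches and on the hardware, so no particular numbers are implied.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Sum data[] visiting elements in the order given by idx.
std::uint64_t sum_in_order(const std::vector<std::uint32_t>& data,
                           const std::vector<std::size_t>& idx) {
    assert(idx.size() == data.size());  // each order should cover every element
    std::uint64_t s = 0;
    for (std::size_t i : idx) {
        s += data[i];
    }
    return s;
}

// Visiting order 0, 1, 2, ..., n-1.
std::vector<std::size_t> sequential_order(std::size_t n) {
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    return idx;
}

// The same indices in a pseudo-random order (fixed seed for repeatability).
std::vector<std::size_t> shuffled_order(std::size_t n, unsigned seed) {
    std::vector<std::size_t> idx = sequential_order(n);
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);
    return idx;
}

// Time one traversal in milliseconds and return the sum through out_sum.
double time_ms(const std::vector<std::uint32_t>& data,
               const std::vector<std::size_t>& idx, std::uint64_t& out_sum) {
    auto t0 = std::chrono::steady_clock::now();
    out_sum = sum_in_order(data, idx);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

To see an effect, build with optimization and use an array much larger than the last-level cache. With small arrays that fit in cache, the two orders may time about the same.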
Again, I'm not sure any of this helps you here, and if you have a lot of
processors then threading may be the easiest solution for you, but I still
don't see much discussion of memory access optimization, which becomes a big
limitation in many cases.
I got a few hits here, FWIW:
http://citeseerx.ist.psu.edu/search?q=%22ray+tracing%22+AND++locality+AND+cache&sort=rel
Mike Marchywka
Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net