From: Peter Schmitteckert (peter_at_[hidden])
Date: 2007-10-07 14:48:33
Riccardo Rossi wrote:
> Hi peter, could you expand on the NUMA stuff?
Well, on a NUMA machine you have (fast) local memory and (slower) remote memory,
i.e. memory whose accesses have to go through some extra chips. On an Opteron this
usually means that the remote memory is connected to the memory controller of
another processor.
When you allocate memory via malloc/new then, at least on a decent OS,
not much will happen. The actual allocation of physical memory is deferred until
the first access to it.
So, if you allocate memory in the main code and access it there, it will be placed in
the memory local to the processor the main thread is running on. If you just allocate
it in the main thread and first use it in a worker thread, it will be allocated in the
local memory of the processor running the worker thread.
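
A minimal sketch of this first-touch behaviour, assuming a C++11 compiler with
std::thread and an OS that places pages on first access. The function name
allocate_first_touch and its parameters are hypothetical, chosen only for this
illustration. Threads are not pinned here, so the scheduler may still migrate
them; pinning is platform specific and left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Allocate n doubles without writing to them, then let each of nthreads
// workers perform the first write to its own contiguous chunk.
std::unique_ptr<double[]> allocate_first_touch(std::size_t n, unsigned nthreads,
                                               double value)
{
    assert(nthreads > 0);
    // new double[n] default-initializes, i.e. it does not write to the memory.
    // (std::vector<double>(n) would zero-fill it here, in the calling thread.)
    std::unique_ptr<double[]> data(new double[n]);
    double* p = data.get();

    const std::size_t chunk = (n + nthreads - 1) / nthreads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        // The first access to p[begin..end) happens inside the worker thread.
        workers.emplace_back([p, begin, end, value] {
            for (std::size_t i = begin; i < end; ++i) p[i] = value;
        });
    }
    for (auto& w : workers) w.join();
    return data;
}
```

Whether the allocator really hands out untouched pages depends on the
allocator and on the size of the request, so treat this as an illustration of
the idea rather than a guarantee.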
> i have been doing tests with a compressed matrix in parallel and it DOES
> make a huge difference the matrix vector prod on NUMA systems (opteron
> sunfire) depending on how you initialize the matrix.
Concerning memory allocation, SunOS/Solaris qualifies as a decent OS.
> do you have any proposal on how to cleanly initialize things in parallel?
Defer the first access to the thread that is using the memory, i.e. initialize each
part of the matrix in the thread that will later work on it.
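
One way this could look for a compressed-row matrix, as a sketch only: the same
fixed row partition is used for filling the matrix and for the matrix-vector
product, so each thread first touches the rows it later multiplies. The names
for_each_row_block, TridiagCsr and multiply are hypothetical, the tridiagonal
(-1, 2, -1) pattern is just a stand-in for a real matrix, and C++11 std::thread
is assumed. Threads are not pinned to processors here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <thread>
#include <vector>

using Range = std::function<void(std::size_t, std::size_t)>;

// Run body(begin, end) on up to nthreads threads; for fixed n and nthreads,
// thread t always gets the same contiguous block of rows.
void for_each_row_block(std::size_t n, unsigned nthreads, const Range& body)
{
    assert(n > 0 && nthreads > 0);
    const std::size_t chunk = (n + nthreads - 1) / nthreads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin < end)
            workers.emplace_back([&body, begin, end] { body(begin, end); });
    }
    for (auto& w : workers) w.join();
}

// Hypothetical compressed-row matrix with the stencil (-1, 2, -1); needs n >= 1.
struct TridiagCsr {
    std::size_t n;
    std::unique_ptr<std::size_t[]> row_ptr, col;
    std::unique_ptr<double[]> val;

    // Index of the first entry of row i; offset(n) is the number of nonzeros.
    std::size_t offset(std::size_t i) const
    {
        return i == 0 ? 0 : (i == n ? 3 * n - 2 : 3 * i - 1);
    }

    TridiagCsr(std::size_t n_, unsigned nthreads)
        : n(n_),
          row_ptr(new std::size_t[n_ + 1]),  // allocated, not yet written
          col(new std::size_t[3 * n_ - 2]),
          val(new double[3 * n_ - 2])
    {
        // The first access happens in the threads that later use these rows.
        for_each_row_block(n, nthreads, [this](std::size_t begin, std::size_t end) {
            for (std::size_t i = begin; i < end; ++i) {
                std::size_t k = offset(i);
                row_ptr[i] = k;
                if (i > 0)     { col[k] = i - 1; val[k] = -1.0; ++k; }
                col[k] = i; val[k] = 2.0; ++k;
                if (i + 1 < n) { col[k] = i + 1; val[k] = -1.0; ++k; }
            }
            if (end == n) row_ptr[n] = offset(n);  // written by the last block only
        });
    }
};

// y = A * x, using the same row blocks as the initialization.
void multiply(const TridiagCsr& a, const double* x, double* y, unsigned nthreads)
{
    for_each_row_block(a.n, nthreads, [&a, x, y](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            double sum = 0.0;
            for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k)
                sum += a.val[k] * x[a.col[k]];
            y[i] = sum;
        }
    });
}
```

The vectors x and y would ideally be first touched with the same partition as
well; the number of threads has to be the same for construction and for
multiply, otherwise the row blocks no longer match.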