-----Original Message-----
From: "Brian Budge" [brian.budge@gmail.com]
Date: 18/11/2010 03:20 PM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] Hybrid parallelism, no more + mpi+serialization, many questions

>> The large calculation that I currently do serially and that I intend to
>> parallelize is the maximum of the return values of a large number of
>> evaluations of a given "function" in the mathematical sense.
>> The number of arguments of the function is only known at runtime.
>> Let's say it is determined at runtime that the number of arguments is 10, i.e.
>> we have 10 arguments x0, ..., x9
>> Each argument can take a different number of values, e.g. x0 can be
>> x0_0, x0_1, ..., x0_n0,
>> x1 can be x1_0, x1_1, ..., x1_n1, and so on; n0, n1, ... are typically only
>> known at runtime and differ from one another.
>>
>> so serially, I run
>> f(x0_0, x1_0, ..., x9_0)
>> f(x0_0, x1_0, ..., x9_1)
>> ...
>> f(x0_0, x1_0, ..., x9_n9)
>> then likewise with all the x8, then all the x7, ..., then all the x0.
>> There are n0*n1*...*n9 runs.
>> Then I get the maximum of the return values.
>>
>>
>> Imagining I have N mpi processes, ideally each process would run
>> n0*n1*...*n9/N function evaluations.
>> How do I split?
>>
>> In terms of current implementation, each of the x is a boost::variant over 4
>> types:
>> a double, a <min, max> pair, a <min, max, increment> triplet, or a
>> vector<double>.
>> A visitor is applied recursively to the variants in order to traverse the
>> whole parameter space.
>> apply_visitor on x0 => say if x0 is a triplet, then
>> for (x0 from min to max with increment)
>>   apply_visitor on x1
>> and so on until x9, then we actually call the f function with all the
>> arguments collected so far.
>>
>> How can one parallelize such a beast?
>>
>> rds,
>>
>>
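The recursive traversal described above can be sketched roughly like this (a stand-alone approximation, not the original code: std::variant/std::visit stand in for boost::variant/apply_visitor, the type names are made up, and the <min, max> pair is assumed to enumerate just its two endpoints):

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the four alternatives described in the post.
struct Range { double min, max, inc; };  // the <min, max, increment> triplet
using Param = std::variant<double,
                           std::pair<double, double>,  // the <min, max> pair
                           Range,
                           std::vector<double>>;

// Expand one parameter into the list of values it can take.
std::vector<double> values_of(const Param& p) {
    struct Visitor {
        std::vector<double> operator()(double d) const { return {d}; }
        std::vector<double> operator()(const std::pair<double, double>& mm) const {
            return {mm.first, mm.second};  // assumption: a pair yields its endpoints
        }
        std::vector<double> operator()(const Range& r) const {
            std::vector<double> v;
            // naive floating-point stepping; fine for a sketch
            for (double x = r.min; x <= r.max; x += r.inc) v.push_back(x);
            return v;
        }
        std::vector<double> operator()(const std::vector<double>& v) const { return v; }
    };
    return std::visit(Visitor{}, p);
}

// Recurse over arguments x0..x(n-1); at each leaf, call f on the tuple
// collected so far -- the "apply_visitor on x0, then x1, ..." loop above.
void traverse(const std::vector<Param>& params, std::size_t i,
              std::vector<double>& args,
              const std::function<void(const std::vector<double>&)>& f) {
    if (i == params.size()) { f(args); return; }
    for (double v : values_of(params[i])) {
        args.push_back(v);
        traverse(params, i + 1, args, f);
        args.pop_back();
    }
}
```

Keeping a running maximum inside the f callback reproduces the serial behaviour; the parallel question is then only how to carve up this iteration.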
>This partly depends on how many processors/machines you have
>available. You need to find a way of partitioning your state space
>into tasks, and then doling those tasks out to processes/threads. How
>expensive is f()? How much memory is used?
>Brian
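One common way to do the partitioning Brian suggests (an assumption on my part, not something from the original code) is to flatten the nested loops into a single linear index in [0, n0*n1*...*n9), hand each of the N ranks a contiguous slice of that range, and decode each index back into per-argument positions with div/mod:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Decode linear index k into mixed-radix digits (i0, ..., i9), where
// sizes[j] = nj and digit j picks the value of argument xj. The last
// argument varies fastest, matching the loop order in the post.
std::vector<std::size_t> decode(std::uint64_t k,
                                const std::vector<std::size_t>& sizes) {
    std::vector<std::size_t> idx(sizes.size());
    for (std::size_t j = sizes.size(); j-- > 0; ) {
        idx[j] = static_cast<std::size_t>(k % sizes[j]);
        k /= sizes[j];
    }
    return idx;
}

// Contiguous slice [begin, end) of the flattened space for one rank; the
// slices differ in size by at most one. (total * rank stays within 64 bits
// for a few trillion indices and thousands of ranks.)
std::pair<std::uint64_t, std::uint64_t>
slice(std::uint64_t total, std::uint64_t nranks, std::uint64_t rank) {
    return { total * rank / nranks, total * (rank + 1) / nranks };
}
```

Each rank then loops over its slice, decodes k, evaluates f, and keeps a local maximum. A contiguous slice also fits the tree-shaped generation: whole subtrees outside the rank's index range can simply be skipped.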

f() takes about 1 ms. For now I have a couple of billion n-tuples in the parameter space, but I expect that to grow easily to a couple of trillion.
In serial mode, only about 20 MB of RAM is used. The parameter space is not generated ahead of time; it looks like a tree.
The tree is generated on the fly: when I reach a leaf, I evaluate f() and compare the return value to the current maximum.
So I'm not storing much.
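Whichever way the split is done, this low-memory, running-max property carries over to the parallel version: each rank holds just one double. A sketch of the per-rank scan (f_at is a toy stand-in for "decode index k and evaluate f on the resulting tuple"):

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Toy stand-in; the real code would decode k and call f() on the tuple.
double f_at(std::uint64_t k) {
    return -static_cast<double>(k % 7);  // arbitrary test function
}

// Scan one rank's slice [begin, end), keeping a single running maximum --
// the same small footprint as the serial run, nothing stored per tuple.
double local_max(std::uint64_t begin, std::uint64_t end) {
    double best = -std::numeric_limits<double>::infinity();
    for (std::uint64_t k = begin; k < end; ++k)
        best = std::max(best, f_at(k));
    return best;
}
```

The final combine is then one collective over N doubles: with plain MPI, MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD); with Boost.MPI, all_reduce(world, local, global, boost::mpi::maximum<double>()). Only one value per rank crosses the network.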

rds,