Boost logo

Boost :

From: Anthony Williams (anthony_w.geo_at_[hidden])
Date: 2008-05-12 12:05:25


Johan Torp <johan.torp_at_[hidden]> writes:

> Anthony Williams-3 wrote:
> I misunderstood - I thought set_wait_callback was Gaskill's proposed
> callback when a future was ready.

No problem.

> If I understand you correctly this use case is as follows;
> Each future correlates to an unfinished task.
> A. When a worker thread calls wait(), instead of/prior to blocking it might
> perform another task
> B. By detecting when client/non-worker threads are waiting we can use that
> information to serve them faster.

Yes.

> A seems very dangerous. The worker thread has a stack associated with the
> task it is carrying out. If the thread crashes or an exception is thrown,
> only the current task is affected. If it should carry out another task
> [task2] on top of the first task [task1], task2 crashing would destroy task1
> too. Also, we could get a problem with too deep stack nesting if the same
> thread starts working on more and more tasks.

If the thread "crashes", you've got a serious bug: all bets are off. It
doesn't matter whether that's the same thread that's performing another task
or not.

If the task throws an exception, I expect this to be swallowed by the
task-launching code, and get stored in the future corresponding to that
task. This therefore will not affect the original waiting task.

Stack overflow is potentially a real problem. If a thread pool implementation
decides to do such task nesting, it needs to ensure the stack is sufficiently
large to handle it. For example, if you're going to allow N nested tasks, the
stack for that pool thread ought to be N times larger than the stack for a
single task.

> B might be useful. It can't detect waiting by periodic is_ready-polling -
> which with todays interface is needed to wait for more than one future.

I would use timed_wait() calls when waiting for more than one future: doing a
busy-wait with is_ready just consumes CPU time which would be better spent
actually doing the work that will set the futures to ready, and timed_wait is
more expressive than sleep:

void wait_for_either(jss::unique_future<int>& a,jss::unique_future<int>& b)
{
    if(a.is_ready() || b.is_ready())
    {
        return true;
    }
    while(!a.timed_wait(boost::posix_time::milliseconds(1)) &&
          !b.timed_wait(boost::posix_time::milliseconds(1)));
}

This will trigger the callback.

> Rather than implicitly trying to guess client threads' needs wouldn't it be
> better to either:
> - Let the thread-pool be a predictable FIFO queue. Trust client code to do
> the scheduling and not submit too many tasks at the same time.

That's not appropriate for situations where a task on the pool can submit more
tasks to the same pool, as in my quicksort example.

> - Open up the pool interface and let users control some kind of
> prioritization or scheduling

Allowing callbacks doesn't preclude this option.

> I'm probably misunderstanding something here.

There are many ways of scheduling threads in a thread pool, not all of which
are suitable for all circumstances. Providing set_wait_callback gives the
writer of the thread pool flexibility to choose the most appropriate model for
the circumstances he is trying to handle.

> Anthony Williams-3 wrote:
>>
>> Waiting for one of a number of tasks is an important use case. I'm not
>> sure
>> how best to handle it. I've seen people talk about "future_or" and "f1 ||
>> f2",
>> but I'm not sure if that's definitely the way to go.
>>
>
> I think we should allow waiting on a dynamically changing number of futures.
> Operators are poorly suited for this because they require function
> recursion. Maybe some kind of future container with wait_all() and
> wait_any() functions. My biggest question is how this will map to
> condition_variables. Also, could this facility be some layer below futures
> which can be reused?

My wait_for_either above could easily be extended to a dynamic set, and to do
wait_for_both instead.

Condition variables are more complicated, since they don't inherently support
wait-for-any or wait-for-all operations, and every wake from a timed_wait call
removes the thread from the waitset for that cv. To make it work, you would
have to supply all the predicates associated with each cv, and all the
associated mutexes. I think I'd rather just use futures.

> Dependency deduction and lazy return value composition functionality - like
> operators - should probably be built on top of the dynamic waiting facility.

It would be relatively simple to build an operator on top of wait_for_either().

> Anthony Williams-3 wrote:
>>
>> However, even though distinct overloads will be called, this is not
>> necessarily desirable, as the semantics are distinct.
>>
>
> In what way are the semantics different?

timed_wait(duration) waits for a specified amount of time to
elapse. timed_wait(absolute_time) waits until the clock reads the specified
time. This can be important if you're using a cv in a loop:

boost::mutex m;
boost::condition_variable cv;
bool done=false;

template<typename Duration>
bool wait_with_background_processing(Duration d)
{
    boost::system_time timeout=boost::get_system_time()+d;
    boost::unique_lock<boost::mutex> lk(m);
    while(!done)
    {
        if(!cv.timed_wait(d)) // oops, meant timeout
        {
            return false;
        }
        lk.unlock();
        do_background_processing();
        lk.lock();
    }
}

This code will compile and run, but will wait up to the specified duration
every time round the loop rather than waiting only the specified duration in
total. By giving the duration and absolute-time overloads different names, you
can avoid this bug.

> Anthony Williams-3 wrote:
>>
>> The members of the LWG are discussing renaming
>> condition_variable::timed_wait to have distinct names
>> for the duration and absolute time overloads in order to ensure that the
>> user
>> has absolute clarity of intent: wait_for(absolute_time) or
>> wait_until(duration) won't compile.
>>
>
> To be extra clear, I'll repeat myself: This will complicate writing parallel
> algorithms which works generically with any time type. For both library
> vendors and users.

Generic code needs to know whether it's got a duration or an absolute time, so
this is not an issue, IMHO.

> My 5 cents is still that 2 x time_limited_wait is clear and readable enough
> but it's no strong opinion. For good or bad you are forcing users to supply
> their intent twice - by both argument type and method name. Is this a
> general strategy for the standard library?

This is an important strategy with condition variables, and it is probably
sensible to do the same elsewhere in the standard library for consistency.

Anthony

-- 
Anthony Williams            | Just Software Solutions Ltd
Custom Software Development | http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk