
From: William Kempf (williamkempf_at_[hidden])
Date: 2001-09-18 09:18:14

From: Jens Maurer <Jens.Maurer_at_[hidden]>
>William Kempf wrote:
>> >While developing a work-around, I wondered why call_once() doesn't
>> >take a function object? (This is more a request for adding rationale
>> >than for changing the implementation.)
>>Because you'd have a race condition. You have to construct a function
>>object, and if you could do that in a thread safe manner there'd be no
>>need for call_once() ;).
>I may be able to construct the function object more than once from
>different threads, but still I may want the target function to be called
>only once.

You know, I didn't think about that. This makes it a little less safe: the
user would have to fully understand that the function object is subject to
race conditions if it or any of its data is shared between threads. But it
would add utility when used properly. However, this does not solve the
issue at hand, and it adds significant complexity to the implementation.
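
To make the idea concrete, here is a minimal sketch of a "once" routine that accepts a function object. It assumes a C++11 compiler, which postdates this message, and the names once_flag_sketch and call_once_sketch are hypothetical, not part of Boost.Threads. Each caller passes its own copy of the function object, so the object may be constructed many times, but only the first caller's copy is invoked.

```cpp
#include <mutex>

// Hypothetical flag type for illustration; not the Boost.Threads once_flag.
struct once_flag_sketch {
    std::mutex guard;   // serializes callers
    bool done = false;  // only read or written while guard is held
};

// F is taken by value: every caller constructs (or copies) its own function
// object, but only the first caller to acquire the lock gets to invoke it.
template <typename F>
void call_once_sketch(once_flag_sketch& flag, F f) {
    std::lock_guard<std::mutex> lock(flag.guard);
    if (!flag.done) {
        f();
        flag.done = true;  // not reached if f() throws, so a later call retries
    }
}
```

Any state the function object shares with other threads is still the caller's responsibility to protect, which is the safety concern described above.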

>Anyway, that "once" business seems to be mostly superfluous anyway, because
>you can consider your "once" function a monitor, set a flag upon first entry,
>and skip the function on subsequent entries -> one mutex per "once"
>function and a "flag" and you're done.

That's one possible implementation, but it's also the least efficient
one. Locking a mutex is a relatively expensive operation, and you
can usually do away with the need to lock the mutex once the routine has
been run, a la the DCL (double-checked locking) pattern. This pattern isn't
safe for general use because of memory visibility issues, but an
implementation can take advantage of a platform that has no such issues, as
I've done for Win32, or can use memory barriers to ensure proper memory
visibility in a platform-specific manner. Combine this with the plumbing
required to implement this simple operation and the existence of call_once()
is no longer "mostly superfluous". There's a reason it exists in POSIX, and
the same reason applies to Boost.Threads.
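
A rough sketch of the difference between the two approaches, assuming a compiler with C++11 atomics (not available when this was written). The names once_state, once_monitor and once_dcl are hypothetical, and this is not the Boost.Threads implementation. The acquire/release pair stands in for the platform-specific memory barriers mentioned above.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical state for one "once" routine: one mutex plus one flag.
struct once_state {
    std::mutex guard;
    std::atomic<bool> done{false};
};

// Monitor approach: every call pays for the lock, even after fn has run.
inline void once_monitor(once_state& s, void (*fn)()) {
    std::lock_guard<std::mutex> lock(s.guard);
    if (!s.done.load(std::memory_order_relaxed)) {
        fn();
        s.done.store(true, std::memory_order_relaxed);
    }
}

// Double-checked variant: once fn has run, callers return on the first
// check without touching the mutex.
inline void once_dcl(once_state& s, void (*fn)()) {
    // Acquire load: if we observe true, fn's writes are visible to us.
    if (s.done.load(std::memory_order_acquire)) return;
    std::lock_guard<std::mutex> lock(s.guard);
    // Second check under the lock: another thread may have won the race.
    if (!s.done.load(std::memory_order_relaxed)) {
        fn();
        // Release store: publishes fn's writes before the flag becomes true.
        s.done.store(true, std::memory_order_release);
    }
}
```

With a plain bool in place of the atomic flag, the first check in once_dcl would be a data race, which is the memory visibility problem that makes the pattern unsafe for general use.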

>>An interesting solution. I can clean this up some. There's no need to use
>>pthread_once to init the mutex since POSIX defines static initialization
>While static initialization shows up in my Linux "man" page, I'm not
>sure whether this is a universal POSIX feature. But I assume you
>have the specs available and checked this.

The book "Pthreads Programming" indicates that Draft 4 of the POSIX standard
did not contain the static mutex initialization concept but that the final
standard does. There may be some systems that don't have
fully compliant implementations, but I think you should assume they all do
until proven otherwise. However, I can't use this implementation because it
uses the DCL, which can't be coded portably. I could use the monitor
approach instead, but that would be inefficient. I've come up with an
alternative design that, at least in theory, should be as optimal as I can
make it without resorting to non-portable code, but timings may show that
in reality it's not more efficient. This is an issue that may need to be
revisited several times in the future before we determine an optimal
solution, but for now I'll settle for a correct solution.
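
For reference, a minimal sketch of the portable monitor approach with a statically initialized POSIX mutex. This is not the alternative design mentioned above, which isn't described here. The names once_mutex, once_done and call_once_portable are hypothetical.

```cpp
#include <pthread.h>

// Static initialization: the mutex is usable without any run-time init
// call, so no pthread_once() is needed just to set it up.
static pthread_mutex_t once_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool once_done = false;  // only read or written while once_mutex is held

// Correct but not optimal: every call locks the mutex.
// Simplification: if fn throws, the mutex stays locked.
static void call_once_portable(void (*fn)()) {
    pthread_mutex_lock(&once_mutex);
    if (!once_done) {
        fn();
        once_done = true;
    }
    pthread_mutex_unlock(&once_mutex);
}
```

Because the flag is never read outside the lock, this version has no memory visibility problem. The price is one lock and unlock on every call.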

>>Probably, but I can't see a clean method of doing so. Probably just
>>"writer's block" but everything I can come up with uses dynamic memory and
>>requires a lot more overhead for all operations. If I have to go that
>>route I guess I have to, but I was hoping there was a solution I wasn't
>>thinking of.
>I don't see an obvious solution without dynamic memory allocation,
>either: You need to call the proper C++ cleanup handler for each
>type T, and this must be routed through an "extern C" wrapper.
>Either you have lots of wrappers (one for each type T), which you
>cannot find distinct names for (because they're all "extern C"), or you
>only have one wrapper and then need to demultiplex into the proper
>C++ cleanup handler. Which means you have to store its address
>somewhere. However, that address isn't thread-specific, it's
>actually global (it's a function address), so it's not really
>appropriate to store it in the TSS area.
>Could you use a separate, global <map> to map each data pointer
>(each of the pointers stored via "set") to its cleanup
>function pointer? This incurs only overhead on "set" (and
>cleanup), but not on "get", unlike other approaches. I'd
>expect "set" to be rare compared to "get".
>(Of course, access to the <map> needs to be synchronized, or
>needs to be in thread-local storage, similar to the "Windows"

If we follow the Win32 solution here then everything is solved. That's
actually the solution I referred to when I talked about the overhead,
though. This implementation would have significantly more overhead than the
native TSS solutions and is likely to be criticized by POSIX users. It will
be unfortunate if we have to use such a heavy-handed solution just because
of linkage compatibility issues, but if no one has another solution this is
likely to be the one I have to use.
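
A minimal sketch of the global-map idea quoted above, assuming a C++11 compiler. All names here (registry, register_cleanup, tss_cleanup_trampoline) are hypothetical, and this is not the Boost.Threads implementation. A single extern "C" function, the kind that could be passed as the destructor argument of pthread_key_create(), looks up the type-specific C++ cleanup handler for the pointer it is given.

```cpp
#include <map>
#include <mutex>

namespace {
typedef void (*cleanup_fn)(void*);

std::mutex registry_mutex;               // the map is shared by all threads
std::map<void*, cleanup_fn> registry;    // data pointer -> C++ cleanup handler

// One instantiation per type T; these keep C++ linkage.
template <typename T>
void destroy(void* p) { delete static_cast<T*>(p); }

// Would be called from a hypothetical tss<T>::set(). Together with cleanup,
// this is the only operation that pays for the map; "get" never touches it.
template <typename T>
void register_cleanup(T* p) {
    std::lock_guard<std::mutex> lock(registry_mutex);
    registry[p] = &destroy<T>;
}
}  // namespace

// The single wrapper with C linkage: demultiplexes to the right handler.
extern "C" void tss_cleanup_trampoline(void* p) {
    cleanup_fn fn = 0;
    {
        std::lock_guard<std::mutex> lock(registry_mutex);
        std::map<void*, cleanup_fn>::iterator it = registry.find(p);
        if (it == registry.end()) return;  // unknown pointer: nothing to do
        fn = it->second;
        registry.erase(it);
    }
    fn(p);  // run the destructor outside the lock
}
```

The map node allocated on every "set" and the lock taken on "set" and on cleanup are the dynamic memory and extra overhead that the native TSS facilities avoid.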

>(Btw, please mark the "tss" constructor "explicit". Also, you don't
>need the default argument "=0" of the tss constructor, it appears.)

Thanks for pointing these out.

Bill Kempf
