From: Roland (roland.schwarz_at_[hidden])
Date: 2004-08-01 23:30:14
On Sun, 01 Aug 2004 21:28:30 -0500 "Aaron W. LaFramboise" <aaronrabiddog51_at_[hidden]> wrote:
> The trouble is that we actually want the callback field to point to
> ___xl_a + 4, not at ___xl_a itself, which is zero. The tlssup.obj that
> is part of MSVC6's runtime libraries gets this wrong, and so the TLS
> callback list pointed to by the TLS directory looks like this:
> [null pointer][user-specified callback][null pointer]
> In other words, whatever bit of the PE loader responsible for calling
> the TLS callbacks hits that first null, thinks (correctly) that it is
> the end of the list, and never calls any of the callbacks.
Unfortunately this does not seem to be enough. Even when the first entry
is not zero (I tried setting it to a dummy stub), my callback placed in
.CRT$XLB does not get called; there are still a lot of zeroes in between.
It seems as if the linker has a minimum size when emitting data segments.
> In any case, the runtime fixup you mention appears to fix this, although
> it might be doing more work than it needs to (you just need to replace
> that first zero with something valid). I must admit I am slightly
> concerned about modifying a PE image at runtime to make it correct, for
> the same reason I am concerned with hooking in production code.
Hmm. Do I really modify the PE image? I am just modifying data that lies in the
data segment. Isn't this OK? What are your concerns?
And then: even Microsoft relies on modifying read-only segments in memory.
I learned this some time ago when I was looking into those "first-chance exceptions"
that you sometimes see in the debugger but that never get through to your program.
They are used to create a writable copy of the read-only memory on the fly.
And yes, this has been seen in production code.
But again, I don't think we are doing anything similarly ugly here.
What I would be concerned about is that someone else has already
taken a reference to an item I am moving away. This, however,
could easily be solved by providing a non-null nop callback in
place of the zeroes.
But thread shutdown would take longer, of course.
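To make the nop-callback idea concrete, here is a minimal sketch. It is not the real tlssup data or the real loader; callback_t, fill_holes, run_until_null and the table are names made up for illustration. Every zero inside the range is replaced by a do-nothing function, so no entry has to move and a null-terminated walk reaches the real callback.

```cpp
#include <cassert>
#include <cstddef>

typedef void (*callback_t)();      // hypothetical callback type, for illustration

int g_real_calls = 0;              // counts calls to the real callback
void real_callback() { ++g_real_calls; }
void nop_callback() {}             // does nothing; only keeps the list contiguous

// Replace every null in [begin, end) with the no-op callback.
// Existing entries stay where they are, so references to them remain valid.
// Returns the number of slots that were patched.
std::size_t fill_holes(callback_t* begin, callback_t* end) {
    std::size_t patched = 0;
    for (callback_t* p = begin; p < end; ++p) {
        if (*p == 0) { *p = nop_callback; ++patched; }
    }
    return patched;
}

// Walk a null-terminated list the way a loader would: stop at the first null.
// Returns the number of entries that were called.
std::size_t run_until_null(callback_t const* p) {
    std::size_t called = 0;
    for (; *p != 0; ++p) { (*p)(); ++called; }
    return called;
}

// Assumed layout with holes; the last slot is kept as the terminator.
callback_t table[] = { 0, 0, real_callback, 0, 0 };
```

Before fill_holes(table, table + 4) the walk calls nothing; afterwards it calls four entries, three of which are no-ops. That is the extra work on every thread startup and shutdown mentioned above.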
> It seems a little hackish, and it seems like it might cause surprising
> behavior. The alternative is to provide an implementation of tlssup.obj
> that isn't broken, but this is also slightly hackish (although it does
> at least produce an image that is correct with no runtime fixups needed).
> I was hoping there might be some sort of way to tweak something or other
> to make the real MSVC6 tlssup.obj behave correctly, but there does not
> seem to be any way other than doing some sort of runtime fixup, or
> flat-out replacing the whole object.
Reading your post again, I am no longer sure whether the linker is really
doing the expected thing here. As it stands, holes of zeroes are
normal, but you say the PE TLS directory requires a contiguous list?
The .CRT$XIC startup code handles the very same problem in
the following way:
    while ( pfa < pfz ) {
        if ( *pfa != 0 ) (**pfa)();   /* null entries are skipped */
        ++pfa;
    }
Doing the same in the tlssup code replacement would do a lot of
unnecessary looping on every thread startup/shutdown. I cannot
believe that this is intended (or desirable).
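The difference between the two ways of walking such a table can be shown with a small sketch. This is neither the actual PE loader nor the actual CRT source; the function names and the three-entry array are assumptions chosen to mirror the layout described above.

```cpp
#include <cassert>
#include <cstddef>

typedef void (*callback_t)();      // hypothetical callback type, for illustration

int g_calls = 0;                   // counts invocations of the user callback
void user_callback() { ++g_calls; }

// Loader-style walk: the list is null-terminated, so the first null ends it.
std::size_t walk_null_terminated(callback_t const* p) {
    std::size_t called = 0;
    for (; *p != 0; ++p) { (*p)(); ++called; }
    return called;
}

// Startup-code-style walk: the range [begin, end) is known, nulls are skipped.
std::size_t walk_range_skipping_nulls(callback_t const* begin,
                                      callback_t const* end) {
    std::size_t called = 0;
    for (; begin < end; ++begin) {
        if (*begin != 0) { (*begin)(); ++called; }
    }
    return called;
}

// The layout reported for MSVC6: [null pointer][user callback][null pointer]
callback_t const broken_list[] = { 0, user_callback, 0 };
```

With this table the null-terminated walk calls nothing, while the range walk finds the one callback. The range walk tolerates holes, but it has to visit every slot each time it runs.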
> In any case, no sort of runtime fixup should be done on anything other
> than MSVC6, since later versions seem to get it right.
Agreed. This callback was never used in MSVC6, so it is a hack however
you view it. Obviously we are the first ever to use it.
> Also, on an unrelated point, is there any reason to use the .CRT$XC
> section directly rather than use a global class? They're really the
> same thing, but the entire .CRT section is undocumented, and not very
> well known. It seems unnecessary to depend upon that interface if there
> is no particular gain from using it over the well-defined interface.
Yes. We need to run after the last global ctor has finished, to be sure that
all our thread_specific_ptr ctors have been called. This is because I rely on
the well-documented behaviour of the atexit function, which I use at that point
to schedule the main-thread exit. This in turn ensures that it runs before
any of the global dtors (e.g. of thread_specific_ptr) get to run. This solves
the dtor ordering problem.
If you know another well-documented way to achieve this, I would of course
prefer it. BTW: the global ctor execution order across translation units is
(to my knowledge) not specified by the C++ standard.
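The atexit ordering I rely on can be sketched as follows. FakeTsp is a made-up stand-in for a global thread_specific_ptr, and schedule_main_thread_exit is a hypothetical name, not the library's real code. A function registered with atexit after a global's ctor has completed is called before that global's dtor.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

bool g_tsp_alive = false;   // plain flag, initialized before any ctor runs

// Hypothetical stand-in for a global thread_specific_ptr.
struct FakeTsp {
    FakeTsp()  { g_tsp_alive = true; }
    ~FakeTsp() { g_tsp_alive = false; std::puts("global dtor"); }
};

FakeTsp g_tsp;              // its ctor completes before main() starts

// Stand-in for the main-thread exit work; it still needs the global alive.
void main_thread_exit() {
    assert(g_tsp_alive);    // holds because we registered after the ctor ran
    std::puts("main-thread exit handler");
}

// Must be called only after the last global ctor has finished.
// Returns 0 on success, as std::atexit does.
int schedule_main_thread_exit() {
    return std::atexit(main_thread_exit);
}
```

In this sketch a call to schedule_main_thread_exit() at the start of main() plays the role of the late .CRT$XC entry. At process exit the handler runs first and the dtor of g_tsp afterwards.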
And then I think the method used is in no way less "documented" and "reliable"
than the TLS callback. (Did you check out my link to codeguru?)
Also, the CRT relies on it to such an extent that it is unlikely to change (without
being replaced by a more capable mechanism).
To summarize: I think you have found the second least hackish solution of all by using
the TLS callback. My original proposal (using the piggy-back DLL) uses _only_
documented APIs for its implementation. But it looks ugly.
I would vote for using the TLS callback, but disabling the fixup code when
building with MSVC7.
And then: should there be any unforeseen problems in the future, we can always
revert to the piggy-back DLL solution without (noticeable) effect for the
user of the library.
What do you think?