22 October, 2012

How to get the x64 dynamic function table critical section?

The standard way to walk the stack on Windows x64 is the RtlLookupFunctionEntry() and RtlVirtualUnwind() API calls. Both rely on additional unwind information: static and dynamic function tables. Static tables are generated by compilers and stored inside DLLs. Dynamic ones are registered at run time by the application (the CLR is, in effect, part of the application) for dynamically created methods.

As you may already guess, there is one catch when your application has to walk a stack asynchronously, that is, look at the stack of another thread. You have to suspend that thread before you start walking, but it may be suspended in the middle of its own stack walk, which also uses RtlLookupFunctionEntry() and RtlVirtualUnwind(). Both API calls take a critical section to protect the dynamic tables, so if the suspended thread holds it, your walker blocks forever. You might say this is an improbable situation. However, it becomes reality when you enable frequent ETW CLR events with StackKeyword. I got a deadlock with 99% probability on my sample.

How to avoid the deadlock? In fact it's easy: just check whether the critical section can be entered. Note that Windows 8 uses a slim reader/writer lock instead of an ordinary critical section.

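// Returns nonzero if the dynamic function table critical section can be
// entered right now; the lock is released again before returning.
BOOL CanUseStackWalkerCS(LPCRITICAL_SECTION const pDFTCS)
{
 BOOL const res = TryEnterCriticalSection(pDFTCS);
 if (res)
  LeaveCriticalSection(pDFTCS);
 return res;
}

// The same check for Windows 8, where the lock is a slim reader/writer lock.
BOOL CanUseStackWalkerSRWL(PSRWLOCK const pDFTSRWL)
{
 BOOL const res = TryAcquireSRWLockShared(pDFTSRWL);
 if (res)
  ReleaseSRWLockShared(pDFTSRWL);
 return res;
}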
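
The check only helps if it runs while the target thread is already suspended: if the lock can be taken at that moment, the suspended thread cannot be the one holding it. Below is a minimal sketch of that suspend/probe/resume loop. It takes three caller-supplied callables so that it stays self-contained; on Windows they would presumably wrap SuspendThread(), CanUseStackWalkerCS() or CanUseStackWalkerSRWL(), and ResumeThread(). The name SuspendForStackWalk and the maxAttempts parameter are assumptions made for this illustration, not part of the original code.

```cpp
#include <functional>
#include <thread>

// Hypothetical helper: suspends the target thread and keeps it suspended
// only if the dynamic function table lock is free at that moment.
// Returns true when the caller may walk the stack (target left suspended),
// false when suspension failed or every attempt found the lock busy
// (target left running).
bool SuspendForStackWalk(const std::function<bool()>& suspend,
                         const std::function<bool()>& canUseStackWalker,
                         const std::function<void()>& resume,
                         int maxAttempts)
{
    for (int attempt = 0; attempt < maxAttempts; ++attempt)
    {
        if (!suspend())
            return false;           // could not suspend the target at all
        if (canUseStackWalker())
            return true;            // lock is free: safe to walk now
        resume();                   // target may own the lock: let it run
        std::this_thread::yield();  // give it a chance to release the lock
    }
    return false;
}
```

If the function returns false, the sample for that thread is simply skipped or retried later, which is cheaper than risking the deadlock.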

But how do you get the dynamic function table critical section?

First, you need to hook the EnterCriticalSection() and AcquireSRWLockExclusive() API calls. It's a dangerous operation. However, there are a bunch of libraries that can do it for you (for example, Microsoft Detours - just $9999.95). It isn't hard to write the hooking code yourself, but you will need a simple disassembler to detect branches and instruction sizes. That is beyond the scope of this article. Hooks to catch critical sections:

static DWORD volatile g_OsTid = 0;

static LPCRITICAL_SECTION g_pCS = NULL;
static bool g_FailedCS = false;

static PSRWLOCK g_pSRWL = NULL;
static bool g_FailedSRWL = false;

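// Records the first critical section entered by the capturing thread and
// flags a failure if a different one shows up during the same capture.
static void Hook_EnterCriticalSection(LPCRITICAL_SECTION const pCS)
{
 if (g_OsTid == GetCurrentThreadId())
 {
  if (g_pCS == NULL)
   g_pCS = pCS;
  else if (g_pCS != pCS)
   g_FailedCS = true;
 }
}

// The same logic for the slim reader/writer lock used on Windows 8.
static void Hook_AcquireSRWLockExclusive(PSRWLOCK const pSRWL)
{
 if (g_OsTid == GetCurrentThreadId())
 {
  if (g_pSRWL == NULL)
   g_pSRWL = pSRWL;
  else if (g_pSRWL != pSRWL)
   g_FailedSRWL = true;
 }
}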

static void StartUsingHooks()
{
 g_pCS = NULL;
 g_FailedCS = false;

 g_pSRWL = NULL;
 g_FailedSRWL = false;

 g_OsTid = GetCurrentThreadId();
}

static void StopUsingHooks()
{
 g_OsTid = 0;
}

Call StartUsingHooks() immediately before you start capturing and StopUsingHooks() right after you finish.

Second, just call RtlAddFunctionTable() with fake arguments:

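// A fake one-entry table; it is registered only to make the OS take the lock.
RUNTIME_FUNCTION rf;

memset(&rf, 0, sizeof(rf));
rf.BeginAddress = 0x100;
rf.EndAddress = 0x101;

// Only lock acquisitions made by this thread inside this window are recorded.
StartUsingHooks();
BOOL const res = RtlAddFunctionTable(&rf, 1, 0x100);
StopUsingHooks();

// std::runtime_error (from <stdexcept>) is used because constructing
// std::exception from a string is an MSVC-only extension.
if (!res)
 throw std::runtime_error("Can't add to the DFT");

if (!RtlDeleteFunctionTable(&rf))
 throw std::runtime_error("Can't remove from the DFT");

if (g_FailedCS)
 throw std::runtime_error("EnterCriticalSection hook doesn't work");

if (g_FailedSRWL)
 throw std::runtime_error("AcquireSRWLockExclusive hook doesn't work");

if (g_pCS == NULL && g_pSRWL == NULL)
 throw std::runtime_error("No DFT pointers were excavated");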

As a result of the code above, either g_pCS or g_pSRWL contains a pointer to the lock that protects the dynamic function table (a critical section, or a slim reader/writer lock on Windows 8). Congratulations!

11 comments:

cadude said...
This comment has been removed by the author.

Unknown said...

I tried to use StackWalk64(), but I really had some trouble with it. It behaved strangely when I didn't have a PDB.

The base approach is described here, but I have made some improvements:
0. suspend the thread
1. check whether the dynamic function table critical section can be entered
2. get the CONTEXT of the suspended thread
3. call GetFunctionFromIP(); go to #7 if it succeeds and returns a non-zero FunctionID
4. call RtlLookupFunctionEntry()
5. call RtlVirtualUnwind() if #4 returned non-NULL, or pop RIP from the stack if it returned NULL
6. check and update RSP and RIP in the CONTEXT, go to #3
7. call DoStackSnapshot()
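
The fallback in step 5 can be sketched in isolation. When RtlLookupFunctionEntry() returns NULL, the function is treated as a leaf that never adjusted RSP, so the return address sits at the top of its stack and the caller's frame starts 8 bytes above it. The sketch below models this with a memory-reading callback instead of a real CONTEXT; MiniContext, ReadQword and PopLeafFrame are hypothetical names used only for this illustration.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-in for the two CONTEXT fields used here.
struct MiniContext
{
    std::uint64_t Rip;
    std::uint64_t Rsp;
};

// Reads 8 bytes of stack memory at the given address.
// Returns false if the address cannot be read.
using ReadQword = std::function<bool(std::uint64_t address, std::uint64_t& value)>;

// Unwinds one leaf frame: RIP = [RSP], then RSP += 8.
// Leaves ctx untouched and returns false if the read fails.
bool PopLeafFrame(MiniContext& ctx, const ReadQword& read)
{
    std::uint64_t returnAddress = 0;
    if (!read(ctx.Rsp, returnAddress))
        return false;
    ctx.Rip = returnAddress;
    ctx.Rsp += sizeof(std::uint64_t);
    return true;
}
```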

There are some critical bugs in CLR v2.0 x64 and CLR v4.0 x64, described here. However, the article is available only in Russian for now. I'm going to translate it when I have time. Until then, you can use Google Translate...

cadude said...
This comment has been removed by the author.

Unknown said...

This is a known issue in CLR v4.0 x64. It calls clr!EEGetThreadContext() with CONTEXT_EXCEPTION_REQUEST itself and checks whether CONTEXT_SERVICE_ACTIVE and/or CONTEXT_EXCEPTION_ACTIVE is set in the ContextFlags of the CONTEXT.

There is only one way to fix it: the patch.

Unknown said...

Remove RtlZeroMemory() and NvContext. You should use context in RtlVirtualUnwind(). Please add a strict RSP and RIP check after unwinding.

You should stop walking after DoStackSnapshot() call anyway.

P.S. You need to fix the CLR first, or use CLR v2.0 x64 for debugging.

cadude said...

regarding checking the critical section

i found this, is this doable, what are you using ?

http://www.tech-archive.net/Archive/Development/microsoft.public.win32.programmer.kernel/2009-11/msg00020.html

Unknown said...

I don't use the loader critical section. I use the dynamic function table critical section.

cadude said...
This comment has been removed by the author.

Unknown said...

I mean that the new RIP/RSP values must point to valid memory, and that RSP must be correctly aligned and within the valid range.
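
A minimal sketch of such a check, assuming the walker already knows the bounds of the target thread's stack. IsPlausibleFrame and its parameters are hypothetical names; the 8-byte alignment test is a conservative assumption, and verifying that RIP really points into readable, executable memory (for instance with VirtualQuery()) is left out.

```cpp
#include <cstdint>

// Hypothetical sanity check applied after each unwind step.
// stackLow/stackHigh: bounds of the target thread's stack, [stackLow, stackHigh).
// prevRsp: RSP before the unwind step. The stack grows down, so a successful
// unwind must move RSP strictly up.
bool IsPlausibleFrame(std::uint64_t rip, std::uint64_t rsp, std::uint64_t prevRsp,
                      std::uint64_t stackLow, std::uint64_t stackHigh)
{
    if (rip == 0)
        return false;   // nothing to return to
    if (rsp % 8 != 0)
        return false;   // x64 stack slots are 8 bytes wide
    if (rsp < stackLow || rsp >= stackHigh)
        return false;   // left the thread's stack
    if (rsp <= prevRsp)
        return false;   // no progress: the walk would never terminate
    return true;
}
```

The walk stops at the first frame that fails the check instead of feeding a garbage context into the next unwind step.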

cadude said...
This comment has been removed by the author.

cadude said...
This comment has been removed by the author.