Recently I was asked by a colleague to explain (and code) process injection and function hooking from scratch.
I thought it was a nice opportunity to share with an audience - not only how those things work, but how I conceptually approach such a task.
We will also touch in various interesting security topics, such as:
- ASLR on Windows
- Process mitigations against RWX pages
- Known DLLs
- Multithreaded environments
- Secure C coding styles
The task is simple - code a .dll
file that could be injected into an arbitrary process. Upon injection, it will hook a function in a certain way.
Well, what does it mean? Let's break this down:
- A
dll
is a dynamic library - it contains code or data that might be used by anyone loading it. To load a DLL, use the LoadLibrary API (there areLoadLibraryA
andLoadLibraryW
for ANSI and Unicode paths, respectively). There are also other ways to load DLLs, but we will focus today on the simpleLoadLibraryA
API. Code injection
is the act of running arbitrary code in a different address space under a chosen process. There are numerous reasons to perform code injection - some of them are malicious (like hiding from security tools or elevating privileges) and some are legitimate (for example, this is how debuggers work). Code injection for the purpose of hooking is a bit of gray area - it can be used for both malicious purposes or for good (like monitoring tools such as API monitor).Hooking
is the act of making some code behave differently than expected (commonly API calls). On Windows, there are numerous methods of achieving that - IAT hooking is a common practice, but in our case we will be performing inline hooking. Breaking down this task, let's decide on the sub-tasks:
- We will need to have some code that finds the right process ID from the process name.
- We need to inject our DLL into a target process ID.
- Once injected, we need to find the right function to hook and install a hook.
- As a bonus point, our hook might want to invoke the original API call. Since I'm lazy, I've decided to create one DLL for everything - if it's loaded to the right process it will perform the hooking (steps 3 and 4), otherwise it will attempt to inject (steps 1 and 2). With that in mind, let's get going!
If you come from a Linux background, mapping a process name to a process ID usually requires you to traverse a directory (/proc
) and read files. On Windows, there are APIs to perform that, and specifically, the Process32First and Process32Next APIs.
Before diving into the code, I feel that I have to justify my coding style:
- I prefer
C
toC++
. I feel quite strongly about this, almost as strongly as Linus Torvalds feels about that. - On Windows, there are usually two sets of APIs - one ending with an
A
for ANSI strings, and one withW
for Unicode. I prefer to go for the Unicode ones (the ANSI ones usually simply translate the ANSI string into a wide string and call theW
version anyway). There are also macros that expand to either theA
orW
version based on project preferences - I prefer to avoid those. - My C coding style contains many remarks (even obvious ones) and a proper cleanup label. I do not use
goto
besidesgoto lblCleanup
, and I make sure I only have onereturn
in a function. This makes for very clean quality code, in my opinion. - On Windows I will use the Windows common variable naming (with a prefix that hints the variable type).
Now that those concepts are agreed (?), let us start with some basic dll
skeleton code. Every dll
starts with a DllMain
function, which gets 3 parameters:
hInstance
- the DLLHINSTANCE
(which is a complicated way to say it's a pointer to the DLL in memory)dwReason
- the reason for callingDllMain
. There are 4 reasons: when first loading the DLL to a process, when unloading from a process, and when a new thread is attached or detached. We will care only about the first load, and also make sure thread calls aren't made with theDisableThreadLibraryCalls
API.pvReserved
- ignored.
So, here's the DllMain
code:
BOOL
WINAPI
DllMain(
HINSTANCE hInstance,
DWORD dwReason,
LPVOID pvReserved
)
{
// Unreferenced parameters
UNREFERENCED_PARAMETER(pvReserved);
// Only care about loading
if (DLL_PROCESS_ATTACH == dwReason)
{
DisableThreadLibraryCalls(hInstance);
MainRoutine(hInstance);
}
// Succeed always
return TRUE;
}
The code simply calls MainRoutine
with the hInstance
(that I will need later), and it does so only upon the first load (DLL_PROCESS_ATTACH
).
The MainRoutine
function is just as easy - it will resolve the name of the executable that we are loaded into, and act accordingly:
- If it's loaded to the right process - we hook.
- Otherwise, we inject.
#define PROCESS_OF_INTEREST (L"notepad.exe")
static
VOID
MainRoutine(
HINSTANCE hInstance
)
{
WCHAR wszExePath[MAX_PATH] = { 0 };
PWSTR pwszSep = NULL;
// Get the EXE path that we are loaded into
if (0 == GetModuleFileNameW(NULL, wszExePath, ARRAYSIZE(wszExePath)))
{
goto lblCleanup;
}
// If we are not running in the right process - inject
pwszSep = wcsrchr(wszExePath, L'\\');
if ((NULL == pwszSep) || (0 != wcscmp(pwszSep + 1, PROCESS_OF_INTEREST)))
{
// INJECTION CODE GOES HERE
}
else
{
// HOOKING CODE GOES HERE
}
lblCleanup:
return;
}
The idea is to invole the GetModuleFileNameW
to get the path of the executable we are loaded into (with NULL
to get that for the executable) and then using the wcsrchr
to get the last directory separator and then wcscmp
to ensure we are loaded to the right process (PROCESS_OF_INTEREST
).
As part of the injection code, we will need to find a process (the PROCESS_OF_INTEREST
that was #define
d earlier) by name and get its process ID.
This is trivially done with Windows API with the CreateToolhelp32Snapshot
, Process32FirstW
and Process32NextW
APIs. There is nothing special about that functionality, but it's a nice opportunity to show how my coding style looks like:
#define CLOSE_TO_VAL(obj, val, pfn) do \
{ \
if ((val) != (obj)) \
{ \
(VOID)(pfn)(obj); \
(obj) = (val); \
} \
} while (FALSE)
#define CLOSE_HANDLE(hHandle) CLOSE_TO_VAL(hHandle, NULL, CloseHandle)
#define CLOSE_SNAPSHOT(hHandle) do \
{ \
__pragma(warning(push)) \
__pragma(warning(disable:6387)) \
CLOSE_TO_VAL(hHandle, INVALID_HANDLE_VALUE, CloseHandle); \
__pragma(warning(pop)) \
} while (FALSE)
static
DWORD
GetTargetPid(VOID)
{
DWORD dwPid = 0;
HANDLE hSnap = INVALID_HANDLE_VALUE;
PROCESSENTRY32 tCurrProcess = { 0 };
// Find the right process
hSnap = CreateToolhelp32Snapshot(TH32CS_SNAPALL, 0);
if (INVALID_HANDLE_VALUE == hSnap)
{
goto lblCleanup;
}
// Fetch first processs
tCurrProcess.dwSize = sizeof(tCurrProcess);
if (!Process32FirstW(hSnap, &tCurrProcess))
{
goto lblCleanup;
}
// Iterate all processes
do
{
if (0 == wcscmp(tCurrProcess.szExeFile, PROCESS_OF_INTEREST))
{
dwPid = tCurrProcess.th32ProcessID;
break;
}
} while (Process32NextW(hSnap, &tCurrProcess));
lblCleanup:
// Cleanup
CLOSE_SNAPSHOT(hSnap);
// Return result
return dwPid;
}
The function is trivial - it returns a process ID of 0
in case of failure and the right process ID in case of success.
There are some interesting macros that I've defined that, in my opinion, make my code quite elegant:
CLOSE_TO_VALUE
closes an object (most commonlyHANDLE
on Windows) and assigns it to a value (e.g.NULL
) - unless it's equal to that value.CLOSE_HANDLE
simply closes aNULL
-defaultedHANDLE
with theCloseHandle
API.CLOSE_SNAPSHOT
does the same but withINVALID_HANDLE_VALUE
, since that's the error result forCreateToolhelp32Snapshot
.
Convince yourself why this code is leak-free and easy to read, and most importantly, works.
Assuming we got the right process ID for the desired process, how do we inject code into it?
The number of code injection articles is too high to count, and there are many different methods of injecting code into a process.
However, I will specify the simplest one (in my opinion) - DLL injection
with CreateRemoteThread
. The idea is quite simple:
- Use
VirtualAllocEx
to allocate a RW region in the desired process address space. - Use
WriteProcessMemory
to write the DLL path there. - Run
LoadLibrary
in the other process address space with the help ofCreateRemoteThread
API. Since the thread routine has the exact prototype asLoadLibrary
, we could useLoadLibrary
as the thread routine and the newly allocated region (which contains the DLL path) there.
Since we want to inject ourselves, we need to know our own DLL path - but that's easy - we call GetModuleFileName
but with the hInstance
we got back in DllMain
.
Here's the injection code:
static
BOOL
InjectDll(
DWORD dwPid,
PWSTR pwszDllPath
)
{
BOOL bResult = FALSE;
HANDLE hProc = NULL;
HANDLE hThread = NULL;
SIZE_T cbDllPath = 0;
PVOID pvRemoteBuffer = NULL;
SIZE_T cbWritten = 0;
// Open the process
hProc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, dwPid);
if (NULL == hProc)
{
goto lblCleanup;
}
// Allocate RW memory for the DLL path
cbDllPath = (wcslen(pwszDllPath) + 1) * sizeof(*pwszDllPath);
pvRemoteBuffer = VirtualAllocEx(hProc, NULL, cbDllPath, MEM_COMMIT, PAGE_READWRITE);
if (NULL == pvRemoteBuffer)
{
goto lblCleanup;
}
// Copy the DLL path memory
if ((!WriteProcessMemory(hProc, pvRemoteBuffer, pwszDllPath, cbDllPath, &cbWritten)) || (cbWritten != cbDllPath))
{
goto lblCleanup;
}
// Inject the DLL
hThread = CreateRemoteThread(hProc, NULL, 0, (LPTHREAD_START_ROUTINE)LoadLibraryW, pvRemoteBuffer, 0, NULL);
if (NULL == hThread)
{
goto lblCleanup;
}
// Success
bResult = TRUE;
lblCleanup:
// Cleanup
CLOSE_HANDLE(hThread);
if (NULL != pvRemoteBuffer)
{
(VOID)VirtualFreeEx(hProc, pvRemoteBuffer, 0, MEM_RELEASE);
pvRemoteBuffer = NULL;
}
CLOSE_HANDLE(hProc);
// Return result
return bResult;
}
There are a few interesting takes here:
- This function gets a process
HANDLE
using theOpenProcess
API. It usesPROCESS_ACCESS_ALL
, but we could improve it with more fine-grained flags if we ever feel it's necessary. - When calling
CreateRemoteThread
, we useLoadLibraryW
as the start routine and supplypvRemoteBuffer
as its argument.
There is quite a subtle issue with using LoadLibraryW
as the thread routine. You see, Windows (and all other modern operating systems) has a security feature called ASLR. Its purpose is to randomize the base address at which modules are loaded, s.t. attackers that exploit memory corruption vulnerabilities will have a hard time determining where execution should proceed.
The subtle point here is that due to ASLR, allegedly there is no guarantee that LoadLibraryW
has the same address in both of the processes - note we treat LoadLibraryW
here locally in our process address space, but the thread routine should be in the foreign address space!
The solution to that is a mechanism in Windows called Known DLLs. Essentially, Known DLLs
is a performance mechanism - common DLLs are mapped as memory sections when the operating system boots (smss.exe
does that), and then, when a program tries to load them - they're loaded straight from that shared memory section, which is cheaper than reading from disk.
This performance improvement comes with a cost - all Known DLLs are going to be loaded to the same address on the local machine (but not between reboots). Luckily for us, kernel32.dll
(which implements LoadLibraryW
) is a Known DLL, so we can always refer to its functions locally and know that they are mapped to the same address in all other processes in the system!
As I mentioned in the intro, there are multiple ways to hook a function. I would like to mention that hooking is a dangerous business - for example, if there are multiple threads of execution, we have to make sure no thread executes semi-hooked functions. I've seen that happen in the past - not to me, but to the MITRE Red Team during an official evaluation. I've blogged about it in the past - you are more than welcome to read here.
In our case, we are going to perform some inline hooking. I am assuming I run in a 64-bit process, but adjusting will be quite easy, if necessary (left as an exercise to the reader). Let's break it down to smaller tasks:
- We will need to change the page protections of the hooked function to
RWX
(normally they will beRX
only). This is necessary to be able to write our hook trampoline. It's important to know there are several mitigations on modern Windows against this - read the SetProcessMitigationPolicy API for more info. - We will need to backup the original bytes from the function we're going to override - in case we want to invoke the original function.
- Our trampoline should jump to our own function.
For the trampoline itself, the idea is simple:
48 B8 XX XX XX XX XX XX XX XX mov rax, <imm>
50 push rax
C3 ret
Overall - the trampoline takes 12
bytes, which is reasonable. I assume the function we are going to hook is longer than that.
For the hooking, I am going to backup the original bytes in a global variable - usually I dislike them, but it's kind of necessary when hooking.
Here's my code (with an example to hook MessageBoxA
):
#define TRAMPOLINE_LEN (12)
static
BYTE
g_acTrampolineBytes[TRAMPOLINE_LEN] = { 0 };
static
PBYTE
g_pcOriginalFunctionPtr = NULL;
static
VOID
SetTranpoline(
PVOID pvHook
)
{
// Hook instruction: MOV RAX, <IMM>
g_pcOriginalFunctionPtr[0] = 0x48;
g_pcOriginalFunctionPtr[1] = 0xB8;
*((PULONGLONG)(g_pcOriginalFunctionPtr + 2)) = (ULONGLONG)pvHook;
// Hook instruction: PUSH RAX
g_pcOriginalFunctionPtr[10] = 0x50;
// Hook instruction: RET
g_pcOriginalFunctionPtr[11] = 0xC3;
}
static
VOID
UnsetTranpoline(VOID)
{
CopyMemory(g_pcOriginalFunctionPtr, g_acTrampolineBytes, sizeof(g_acTrampolineBytes));
}
static
INT
WINAPI
MyFakeMessageBoxA(
HWND hWnd,
LPCSTR pszText,
LPCSTR pszCaption,
UINT uType)
{
INT iResult = 0;
// We are overriding the text
UNREFERENCED_PARAMETERO(pszText);
// Call the original function
UnsetTranpoline();
iResult = MessageBoxA(hWnd, "HOOKED MSGBOX", pszCaption, uType);
SetTranpoline(MyFakeMessageBoxA);
// Return result
return iResult;
}
static
BOOL
InstallHook(VOID)
{
HMODULE hModule = NULL;
DWORD dwOldProt = 0;
BOOL bResult = FALSE;
// Get the module
hModule = GetModuleHandleW(L"User32.dll");
if (NULL == hModule)
{
goto lblCleanup;
}
// Get the function to hook
g_pcOriginalFunctionPtr = (PBYTE)GetProcAddress(hModule, "MessageBoxA");
if (NULL == g_pcOriginalFunctionPtr)
{
goto lblCleanup;
}
// Change the page protection for hooking purposes
if (!VirtualProtect(g_pcOriginalFunctionPtr, sizeof(g_acTrampolineBytes), PAGE_EXECUTE_READWRITE, &dwOldProt))
{
goto lblCleanup;
}
// Save the original trampoline bytes for later unhooking
CopyMemory(g_acTrampolineBytes, g_pcOriginalFunctionPtr, sizeof(g_acTrampolineBytes));
// Install the hook
SetTranpoline(MyFakeMessageBoxA);
// Success
bResult = TRUE;
lblCleanup:
// Return result
return bResult;
}
Let's examine some critical points:
- The
SetTrampoline
function sets the trampoline as we discussed - it assigned therax
register to the hook and then uses apush-ret
trick to jump to it. - The
UnsetTrampoline
function restores the original bytes into the function usingCopyMemory
. - The
MyFakeMessageBoxA
function is the hook forMessageBoxA
, which will call the original function but replace the text with a constant. To call the originalMessageBoxA
function - it must unset the trampoline - otherwise, it will call itself again. - The
InstallHook
function installs the hook onMessageBoxA
by vallingVirtualProtect
withPAGE_EXECUTE_READWRITE
.
Even this short, trivial exercise is interesting in many ways, at least in my opinion.
When interviewing, I used to ask candidates how to inject and hook, what their considerations are, how things work and how to deal with edge-cases. I hope this blogpost is useful for those exact topics!
Thanks,
Jonathan Bar Or