Enforcing a write-xor-execute memory policy from usermode

Published 02/02/2018 | By GCS

If BuzzFeed ran an article titled “26 Security Features You Probably Shouldn’t Enforce From Usermode”, this one would almost certainly make the list. But, for whatever reason, I thought it would be a fun learning experience to try to enforce a W^X memory policy from usermode. Some of you are probably asking what the heck a W^X policy is in the first place, and I’m terrible at thinking of ways to start blog posts (case in point: this paragraph), so I guess we’ll start out there.

What’s a W^X policy, anyway?

W^X is an exploit mitigation in which memory pages that are, or have ever been, marked as writable can never be marked as executable during the process lifetime. The old exploit tactic of putting your payload on the stack (or heap) and calling it directly was killed off by no-execute support (NX, also known as hardware DEP on Windows), which made ret2libc/ROP approaches much more popular. ROP involves finding small pieces of existing executable code in the application and its libraries and chaining them together using the stack, with the goal of calling an API or two to allocate some executable memory for the payload to be copied into. On Windows this is usually done with a ROP chain to the VirtualAlloc() API, passing PAGE_EXECUTE_READWRITE so that the payload can be both written in and executed afterwards.

Enforcing a W^X policy breaks this approach, as an exploit cannot allocate memory as RWX, or as RW and then later make it executable. Applications on Windows 8.1 and later can opt into a W^X policy, enforced by the kernel, by calling the SetProcessMitigationPolicy() API with the ProcessDynamicCodePolicy policy type. Of course, this is also the boring way (at least for this article).

The small print

I’m not going to make you read a sixty-eight page EULA and sign your life away on the dotted line, but there are things you should know before you gallivant away with some source code and a dream of securing your applications:

I am a terrible C++ programmer. You should absolutely not use my code in production
This is a proof-of-concept, so you still absolutely should not use it in production. And probably not any other context than “I want to learn how this works” or “I want to torture my eyes by reading wonky code”
While some effort has been made to make the PoC thread-safe, some race conditions (probably security-critical ones) remain, and I haven’t fixed them in order to keep the code fairly simple
Only VirtualAlloc(), VirtualProtect(), and VirtualFree() are hooked. There are ways to get around this (e.g. calling `ntdll` functions directly, or using `Ex` suffix variants) so, again, don’t expect any concrete security from this
In case you didn’t already get the memo, implementing this kind of security feature from usermode is colossally silly, particularly when the OS offers a proper version that is enforced in the kernel. An attacker who expects this usermode “protection” can tailor their exploit to bypass it in most cases
Just like the kernelmode version, this breaks any application that uses JIT compilation. So that means all browsers, anything that uses Java, .NET, or a modern JavaScript engine. It also means things that embed a web frame

Caveat emptor, and all that.

How does this thing work?

The actual approach is fairly simple:

Hook APIs
Reject calls that would result in a page being writable and executable at the same time
Track calls that result in a page being writable, and deny future calls that would make those pages executable
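
The last two steps can be modelled without touching any Windows APIs at all. What follows is a minimal sketch of the decision logic, not the PoC’s actual code: the WxPolicy class, its request() method, and the use of bare page numbers instead of real addresses are all names and simplifications I have made up for illustration.

```cpp
#include <cstdint>
#include <unordered_set>

// Hypothetical model of the policy. Page numbers stand in for real addresses.
class WxPolicy {
public:
    // Returns true if the requested protection should be allowed.
    bool request(std::uint64_t page, bool writable, bool executable) {
        if (writable && executable) {
            return false;  // never writable and executable at the same time
        }
        if (executable && tainted_.count(page) != 0) {
            return false;  // the page has been writable at some point
        }
        if (writable) {
            tainted_.insert(page);  // remember it for every future check
        }
        return true;
    }

private:
    std::unordered_set<std::uint64_t> tainted_;  // pages that were ever writable
};
```

Note that the taint is never cleared by a later read-only request, which is what gives the policy its “or have ever been” behaviour.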

The grunt work involved with hooking APIs is fairly boring, so I enlisted the help of the mhook library by Marton Anka. This library provides a really intuitive way of hooking APIs:

Mhook_SetHook((PVOID*)&OriginalVirtualAlloc,   HookedVirtualAlloc);
Mhook_SetHook((PVOID*)&OriginalVirtualProtect, HookedVirtualProtect);
Mhook_SetHook((PVOID*)&OriginalVirtualFree,    HookedVirtualFree);

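Each call to Mhook_SetHook() takes the address of a function pointer holding the original API as the first parameter, and a pointer to the hooked version you want to replace it with as the second. Once the hook is set, that function pointer can still be used to call the original, unhooked behaviour.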

VirtualAlloc hook

The VirtualAlloc hook checks if flProtect is either PAGE_EXECUTE_READWRITE or PAGE_EXECUTE_WRITECOPY. The former is the general-case RWX protection, and the latter is used when the segment of memory is a memory-mapped file. If either of these protection options is detected, the call fails with an access denied error.

Next, we perform the requested VirtualAlloc() call via OriginalVirtualAlloc. If this succeeds, we check to see if the requested allocation contained a writable flag (e.g. PAGE_READWRITE or PAGE_WRITECOPY) and, if so, add that allocation’s page address and allocation size to a tracking list. This allows us to later reject requests to make these pages executable, as they have been tainted with the writable mark.
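
The two checks described above boil down to classifying the flProtect value. Here is a rough sketch of how that classification might look. The constant values are copied from winnt.h so that the snippet compiles without windows.h; the helper names are hypothetical, and masking off modifier bits such as PAGE_GUARD is my own assumption rather than something the PoC is known to do.

```cpp
#include <cstdint>

// Values copied from winnt.h; the k-prefixed names are local to this sketch.
constexpr std::uint32_t kPageReadWrite        = 0x04;  // PAGE_READWRITE
constexpr std::uint32_t kPageWriteCopy        = 0x08;  // PAGE_WRITECOPY
constexpr std::uint32_t kPageExecute          = 0x10;  // PAGE_EXECUTE
constexpr std::uint32_t kPageExecuteRead      = 0x20;  // PAGE_EXECUTE_READ
constexpr std::uint32_t kPageExecuteReadWrite = 0x40;  // PAGE_EXECUTE_READWRITE
constexpr std::uint32_t kPageExecuteWriteCopy = 0x80;  // PAGE_EXECUTE_WRITECOPY

// Assumption: the low byte holds the base protection, and modifiers such as
// PAGE_GUARD (0x100) or PAGE_NOCACHE (0x200) sit above it.
constexpr std::uint32_t baseProtection(std::uint32_t flProtect) {
    return flProtect & 0xFF;
}

// The request that gets rejected outright.
constexpr bool isWritableAndExecutable(std::uint32_t flProtect) {
    return baseProtection(flProtect) == kPageExecuteReadWrite ||
           baseProtection(flProtect) == kPageExecuteWriteCopy;
}

// The request that causes an allocation to be tracked as tainted.
constexpr bool isWritable(std::uint32_t flProtect) {
    return baseProtection(flProtect) == kPageReadWrite ||
           baseProtection(flProtect) == kPageWriteCopy ||
           isWritableAndExecutable(flProtect);
}

// The request that must be checked against the tracked list.
constexpr bool isExecutable(std::uint32_t flProtect) {
    return baseProtection(flProtect) == kPageExecute ||
           baseProtection(flProtect) == kPageExecuteRead ||
           isWritableAndExecutable(flProtect);
}
```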

VirtualProtect hook

The VirtualProtect hook is the most involved. As with VirtualAlloc, it first rejects RWX protections outright. It then checks to see if the requested protection is executable and, if so, checks whether the target range overlaps a tracked writable allocation, i.e. if it starts within one, ends within one, or starts before and ends after one. This prevents tricks like allocating a small chunk of writable memory inside a larger read-only block, then calling VirtualProtect() over the whole block to make it all executable.
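
The three cases (starts within, ends within, starts before and ends after) collapse into one standard interval-overlap comparison. The sketch below shows that comparison under some assumptions: TrackedRange and both function names are hypothetical, ranges are half-open, and address arithmetic is assumed not to wrap around.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tracking record: base address and size in bytes.
struct TrackedRange {
    std::uintptr_t base;
    std::size_t size;
};

// True when [base, base + size) shares at least one byte with `tracked`.
// One comparison covers starting inside the tracked range, ending inside it,
// and starting before it while ending after it.
bool overlaps(const TrackedRange& tracked, std::uintptr_t base, std::size_t size) {
    assert(size > 0 && tracked.size > 0);  // empty ranges are not meaningful here
    return base < tracked.base + tracked.size && tracked.base < base + size;
}

// True when the requested range touches any tracked writable allocation.
bool touchesTrackedWritable(const std::vector<TrackedRange>& tracked,
                            std::uintptr_t base, std::size_t size) {
    for (const TrackedRange& range : tracked) {
        if (overlaps(range, base, size)) {
            return true;
        }
    }
    return false;
}
```

A range that merely sits next to a tracked allocation, sharing no bytes with it, does not count as overlapping.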

In order to protect against abuse of writable memory that was pre-allocated by the loader (e.g. the .data section), the code also calls VirtualQuery() to test the existing protection status of the memory, just in case we aren’t tracking it.

Another case we need to handle is similar to the VirtualAlloc() call. If the call is making memory writable, we need to track it. First we check if the exact allocation is already present in our tracked list, then if it isn’t we add it. It doesn’t matter if we have overlapping tracking metadata for writable allocations – we handle this case in our hooked VirtualFree(). Speaking of which…

VirtualFree hook

This hook is fairly simple. We just iterate over every item in the tracked allocations and remove them if they cover the address being freed.
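
As a rough illustration of that cleanup, the sketch below removes every tracked entry that covers a given address, which is also why overlapping entries in the list are harmless. TrackedRange and untrack() are hypothetical names, and a std::vector is assumed as the tracking list.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tracking record: base address and size in bytes.
struct TrackedRange {
    std::uintptr_t base;
    std::size_t size;
};

// Removes every tracked range that contains `address` and returns how many
// entries were dropped. Overlapping entries are all removed in one pass.
std::size_t untrack(std::vector<TrackedRange>& tracked, std::uintptr_t address) {
    const auto covers = [address](const TrackedRange& range) {
        return address >= range.base && address - range.base < range.size;
    };
    const auto firstRemoved = std::remove_if(tracked.begin(), tracked.end(), covers);
    const auto removed = static_cast<std::size_t>(tracked.end() - firstRemoved);
    tracked.erase(firstRemoved, tracked.end());
    return removed;
}
```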

Testing

The initial driver for me writing this code, before I decided to implement a full W^X policy with it, was to test for cases where an application under test would attempt to allocate RWX buffers, and to check whether it actually needed those buffers to be executable (i.e. swap RWX for RW and see if you get a crash). For fun, I injected this DLL into a bunch of different programs. Many (e.g. notepad, calc) just work without problems, as they don’t rely on RWX memory at all. A number of others (e.g. Chrome, Spotify) crash due to JIT code that runs inside the process. It was quite fun to watch these allocations occur in realtime via debug messages.

Bypasses

There are a number of ways to bypass the PoC as it stands. I thought about eliminating them, but I think it’s more fun to go through the code and identify the problems.

The first and most obvious way is to ROP to GetModuleHandle() and find the original APIs that way, totally bypassing the checks. It is possible to fix this to some extent by hooking GetModuleHandle() and similar APIs, but this mostly ends up as a cat-and-mouse game. This is why you should implement this stuff in kernelmode.

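The second way is a race condition. In both the VirtualAlloc and VirtualProtect hooks we call the original function, and only then lock the tracking list and add the new allocation to it. Because the call and the bookkeeping are not atomic, a second call to either function, made from another thread, can land in the gap before the first one has been recorded. This can be fixed with a global allocation mutex that is held across both steps.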
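
To show what the global allocation mutex fix might look like, here is a minimal sketch that holds one lock across both the original call and the tracking update. None of this is taken from the PoC: g_allocationMutex, g_tracked, and allocateWritableTracked() are hypothetical names, and a plain function pointer stands in for OriginalVirtualAlloc so that the snippet runs anywhere.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical tracking record: base address and size in bytes.
struct TrackedRange {
    std::uintptr_t base;
    std::size_t size;
};

std::mutex g_allocationMutex;         // hypothetical lock shared by every hook
std::vector<TrackedRange> g_tracked;  // hypothetical list of writable allocations

// Stand-in for the original API: returns a base address, or 0 on failure.
using AllocFn = std::uintptr_t (*)(std::size_t size);

// Performs the allocation and records it while holding the same lock, so no
// other hooked call can observe the memory before it has been tracked.
std::uintptr_t allocateWritableTracked(AllocFn original, std::size_t size) {
    std::lock_guard<std::mutex> lock(g_allocationMutex);
    const std::uintptr_t base = original(size);
    if (base != 0) {
        g_tracked.push_back(TrackedRange{base, size});
    }
    return base;
}
```

For this to close the gap, the hooked VirtualProtect and VirtualFree would need to take the same mutex for the whole of their check-and-call sequence. The cost is that every hooked call in the process is serialised.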

There’s also a potential TOCTOU race condition in our VirtualProtect hook, where we check the page protection using VirtualQuery() and later potentially call the original VirtualProtect() based on the result. However, the attacker would have to get the application to call the unhooked VirtualProtect in order to exploit this particular issue.

Finally there’s a really interesting case – marking a page as writable, filling it with data, freeing it, then re-allocating as read-execute and hoping that you get the same page back before it gets reset to zeroes by the OS. In fact, when I thought of this issue, I wondered whether I might have stumbled across a potential mitigation bypass in the real W^X implementation for Windows, and my eyes turned into dollar signs. Thankfully (or sadly) the clever folks at Microsoft thought of this already, and forced every released page to be reset.

Closing words

I hope that this ham-fisted approach to implementing W^X has been of some educational use, at least in terms of thinking about how the protection can be implemented in practice. If you’d like to follow along at home, the code can be found in the WXPolicyEnforcer project on the Portcullis Labs GitHub. It is released under MIT license.