World of kernel in pwn

Posted on Sat, Jul 31, 2021 Kernel Land PWN Binary Exploitation Linux Kernel kernel


Hello or good evening to all! In this article we will see together how to exploit a buffer overflow in a Linux kernel (x86-64, x86).

I would like to introduce myself for those who do not know me. My name is default, I am a young cyber security student, and I intend to become a pentester.

I'm going to share with you some of the questions and answers I came up with during my first kernel exploitation,

and help you discover even more of the vast world that is low-level computing!

First of all, I would like to thank bryton and nuts. It was with him that I exploited my first kernel, and you can find his blog right here →

What better way to start than by going over some basics?

The first question we will ask ourselves is... *suspense*

What is a kernel? How does the kernel work?

The kernel is a set of programs that implements the basic concepts of the operating system. In simple words, the kernel is a fundamental part of a modern computer's operating system. It initializes and manages critical resources such as the CPU, memory, I/O devices and clocks, and provides a platform on which other programs can run and use all these resources in a better way. Without this core part, no OS would work at all.

The critical code of the kernel is usually loaded into a protected area of memory, which prevents it from being overwritten by other, less frequently used parts of the operating system or by applications. This protected area is called kernel space, and the programs that execute in this area are known as kernel threads. Kernel designs differ in how they manage system calls and resources.

Other programs (tasks) execute in a restricted area known as user space. User space does not have enough privileges to manage critical resources directly; programs there request those resources from the kernel using system calls.
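As a small illustration of that request path, the sketch below asks the kernel for the current process ID through the generic syscall() wrapper instead of the usual getpid() library call. The helper name raw_getpid is made up for this example.

```c
#define _GNU_SOURCE /* makes sure unistd.h declares syscall() */
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel for our PID with a raw system call.
 * syscall() switches to kernel mode, the kernel runs its getpid handler,
 * and the result comes back to user space as the return value. */
long raw_getpid(void)
{
	long pid = syscall(SYS_getpid);

	/* getpid cannot fail, so a non-positive value would be a bug here. */
	assert(pid > 0);
	return pid;
}
```

Calling raw_getpid() should give the same value as getpid(), since both end up in the same kernel handler.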

Kernel land vs User Land

In this part we will see the ring protection model, the difference between user land and kernel land, and the protections.

Ring protection model, meaning of the rings

This diagram is supposed to represent the different levels of privilege in a system.

(The most privileged level is ring 0 on x86 processors, system mode on ARM, kernel mode on MIPS, supervisor mode on 68xxx, etc.)

It is divided into several "parts" called rings.

As you may have guessed, each ring has its own meaning.

Kernel land

As said before, the kernel has a memory space of its own called kernel space.

This memory is set up to be out of reach of users, so that sensitive data cannot be damaged.

On one conceptual level, the kernel is everything that runs at a "more privileged" level of hardware protection.

In the ring protection model, kernel space is ring 0.

The kernel is usually interrupt-driven, either software interrupts (system calls) or hardware interrupts (disk drives, network cards, hardware timers).

Later in the article we will learn how to interact with kernel land via a kernel-related vulnerability.

User land

Going back to what I said earlier, there is also a user space with lower privileges.

It is a little bit of space for you: you can do what you want there without damaging the kernel space.

User land is what runs in the least privileged mode (ring 3 on x86 CPUs, user mode on ARM or MIPS, etc.). User land takes advantage of the way that the kernel smooths over minor hardware differences, presenting the same API to all programs. For instance, some wireless cards might have extra control registers with respect to others, or contain more or less on-board buffer for incoming packets. The driver code accounts for these differences (sometimes by ignoring advanced or unusual features), and presents the same socket API to all programs.

Some processors (e.g. x86, VAX, Alpha AXP) have more than two modes, but the generic Unix architecture doesn't use the intermediate modes.
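To make the ring number a bit more concrete, here is a minimal sketch that reads it from user space on x86. It relies on the fact that the two low bits of the CS segment selector hold the current privilege level. The function name current_ring is invented for this example, and the non-x86 branch only returns a placeholder value.

```c
#include <stdint.h>

/* Hypothetical helper: return the ring the calling code runs in on
 * x86/x86-64, or -1 on other architectures. */
int current_ring(void)
{
#if defined(__x86_64__) || defined(__i386__)
	uint16_t cs;

	/* Copy the CS segment selector into a general purpose register. */
	__asm__ volatile("mov %%cs, %0" : "=r"(cs));

	/* Bits 0-1 of the selector are the current privilege level. */
	return cs & 3;
#else
	return -1; /* not an x86 CPU: this sketch does not apply */
#endif
}
```

Called from an ordinary program on x86 this returns 3. The same instruction sequence executed inside the kernel would yield 0.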

Linux device drivers

In this part we are going to focus on devices, because they are fundamental to understand for exploitation purposes.

From an exploitation point of view, hackers need an entry point through which their data is received in kernel land.

This is where we will understand how devices work, how to create a device, and why it can be useful in the case of a vulnerability.

Loadable Kernel Modules (LKM)

One of the good features of Linux is the ability to extend at runtime the set of features offered by the kernel. This means that you can add functionality to the kernel while it is running!

Each piece of code that can be added to the kernel is called a module. But what exactly is a module?

In an operating system, a module is a part of the kernel that can be integrated during operation. The term generally used for them is Loadable Kernel Module (LKM).

A loadable module can be dynamically linked into the kernel with the insmod command

and unlinked with rmmod; lsmod lists the loaded LKMs.

There are different module and device classes.

We can mention char modules, block modules and network modules.

The interesting part for us will be the char modules.

A character device is one that can be accessed as a stream of bytes (like a file); a char driver is in charge of implementing this behavior. Such a driver usually implements at least the open, close, read, and write system calls. Example: /dev/console, /dev/ttyS0
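Seen from user space, talking to a char driver looks like ordinary file I/O. The sketch below is one way to read from a device node. The helper name read_char_device is hypothetical, and /dev/zero is used in the usage note only because it exists on practically every Linux system.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from a character device.
 * open(), read() and close() are the same calls used for regular files;
 * the kernel routes them to the handlers of the driver behind the path.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_char_device(const char *path, void *buf, size_t len)
{
	assert(path != NULL && buf != NULL);

	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	ssize_t n = read(fd, buf, len);
	close(fd);
	return n;
}
```

For example, read_char_device("/dev/zero", buf, 16) fills buf with 16 zero bytes produced by the driver behind /dev/zero.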

Build your own LKM

When we learn a new language we often start with the classic "Hello World", and this is what I invite you to do, but this time as a kernel module.

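#include <linux/init.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");

/* Called when the module is loaded (insmod). */
static int __init hello_init(void)
{
	printk(KERN_ALERT "Hello, world\n");
	return 0;
}

/* Called when the module is removed (rmmod). */
static void __exit hello_exit(void)
{
	printk(KERN_ALERT "Goodbye, world\n");
}

module_init(hello_init);
module_exit(hello_exit);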

This is the code for a Hello World module.

There are two functions, hello_init and hello_exit.

module_init() and module_exit() register them as the function to run when the module is loaded and the function to run when it is removed.

You can also see that we use "printk()" rather than the usual "printf()". Don't forget that our module runs in kernel land, where the C library is not available, so different functions are used.

Kernel modules are compiled with a Makefile, which outputs a .ko file.

Here is a Makefile example

obj-m += hello.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

Useful commands after compilation

$ lsmod # list LKM
$ insmod hello.ko # load kernel module
$ rmmod hello # remove LKM

You can then see your Hello World in the kernel log ($ dmesg) :D

This means that your module has been executed in kernel space!
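If you would rather check from a program than from the shell, the list that lsmod prints comes from /proc/modules. The following sketch parses that file; both function names are invented for this example, and only the first two fields of each line (name and size) are used.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one line of /proc/modules. Each line starts
 * with the module name followed by its size in bytes.
 * Returns 1 on success, 0 if the line does not match. */
int parse_module_line(const char *line, char name[64], unsigned long *size)
{
	assert(line != NULL && name != NULL && size != NULL);
	return sscanf(line, "%63s %lu", name, size) == 2;
}

/* Returns 1 if a module called wanted is currently loaded, 0 if it is
 * not, and -1 if /proc/modules cannot be opened. */
int module_is_loaded(const char *wanted)
{
	char line[512], name[64];
	unsigned long size;
	int found = 0;
	FILE *f = fopen("/proc/modules", "r");

	if (f == NULL)
		return -1;
	while (!found && fgets(line, sizeof line, f) != NULL)
		if (parse_module_line(line, name, &size))
			found = strcmp(name, wanted) == 0;
	fclose(f);
	return found;
}
```

After insmod hello.ko, module_is_loaded("hello") should return 1, and 0 again after rmmod hello.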

If you are interested in kernel development and want to satisfy your curiosity, I invite you to read this book on device drivers:

Linux Device Drivers, Third Edition

This is the web site for the Third Edition of Linux Device Drivers, by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman. For the moment, only the finished PDF files are available; we do intend to make an HTML version and the DocBook source available as well.

Why are devices an interesting entry point for hackers?

Let's go back to our goal of exploiting a kernel: in what way could devices be interesting?

As said earlier, device drivers run in kernel land. This means that if the developer of a driver makes a mistake that leaves a vulnerability, you can attack this entry point in order to reach kernel space!

If you find a way to send data to the driver, your data will be handled in kernel land, and that's interesting.

Note: devices are often used for this, but not always. A syscall can also serve as the entry point, but in this article our entry point will be a device.
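To picture what such a developer mistake looks like, here is a user-space model of a driver write handler. Everything in it is hypothetical (the buffer size, the names), and memcpy() stands in for the copy_from_user() call a real module would use. The point is the length check: when it is missing and count is controlled by the caller, the copy runs past the end of the kernel buffer.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define KBUF_SIZE 64 /* hypothetical size of the driver's internal buffer */

/* User-space model of a driver write handler.
 * kbuf plays the role of a buffer living in kernel memory, ubuf the data
 * supplied by the user, count the length requested by the user.
 * Returns the number of bytes copied, or -EINVAL if count is too large. */
ssize_t model_device_write(char kbuf[KBUF_SIZE], const char *ubuf, size_t count)
{
	assert(kbuf != NULL && ubuf != NULL);

	/* This is the check a vulnerable driver forgets. */
	if (count > KBUF_SIZE)
		return -EINVAL;

	memcpy(kbuf, ubuf, count); /* stand-in for copy_from_user() */
	return (ssize_t)count;
}
```

With the check in place, a request of KBUF_SIZE + 8 bytes is rejected. Remove the check and the same request would overwrite whatever sits after kbuf, which is the situation a kernel buffer overflow exploit starts from.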

Memory protections in kernel and user land

To introduce this part, I'll tell you that there are several memory protections in kernel land, but also in user land.

And yes, you little rascal, it's not that easy, aha!

These protections were implemented in part to prevent malicious attackers from gaining privileges by corrupting the memory of a target binary.

Let's start with the most popular protections.

User land memory protections

The NX bit ("no-execute") is a protection that prevents attackers from executing something placed on the stack, for example a shellcode.

More precisely, it is a CPU feature that separates the memory areas used for storing processor instructions (code) from those used for storing data, and makes the latter non-executable.

Address space layout randomization (ASLR), also found in kernel land under the name "KASLR", is a protection that randomizes the memory addresses of the stack, the heap and the libraries so that they can no longer be relied on when exploiting a vulnerability. For example, a buffer overflow may let us control the execution flow of a program, but if the target address changes at each execution, the hijacked flow will most likely end in a crash. Notice: ASLR is not a protection applied to one precise binary; it applies to the whole user land memory.
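On Linux the system-wide ASLR setting can be read from /proc/sys/kernel/randomize_va_space. The sketch below reads it; the function name read_aslr_mode is made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: read the system-wide ASLR setting on Linux.
 * 0 = randomization disabled,
 * 1 = conservative randomization (stack, mmap base, shared libraries),
 * 2 = full randomization (also the heap).
 * Returns -1 if the setting cannot be read. */
int read_aslr_mode(void)
{
	int mode = -1;
	FILE *f = fopen("/proc/sys/kernel/randomize_va_space", "r");

	if (f == NULL)
		return -1;
	if (fscanf(f, "%d", &mode) != 1)
		mode = -1;
	fclose(f);
	return mode;
}
```

CTF setups sometimes ship with this value set to 0, which is worth checking before spending time on a leak.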

Data Execution Prevention (DEP) is a security feature built into many modern operating systems such as Linux, Mac OS X, iOS, Microsoft Windows and Android. It is designed to prevent code execution from memory blocks that are supposed to contain data in order to weaken the probability of a successful buffer overflow attack.

Notice: in order to activate it, the NX bit must be set to 1.

PIE (Position Independent Executable) is a protection that also randomizes memory addresses, but this time those of the binary's own code.

Relocation Read-Only (or RELRO) is a security measure which makes some binary sections read-only.

Partial RELRO: this is the default setting in GCC. Partial RELRO makes almost no difference, other than forcing the GOT to come before the BSS in memory, which eliminates the risk of a buffer overflow on a global variable overwriting GOT entries.

Full RELRO: resolves all dynamically linked functions at the beginning of the execution and then makes the entire GOT read-only. This removes the ability to perform a "GOT overwrite" attack, where the GOT address of a function is overwritten with the location of another function or a ROP gadget an attacker wants to run.

The gcc extension "stack smashing protector" (SSP) is a protection against overwriting the saved return address. With SSP, a stack canary is placed between the local variables and the saved RBP and RIP: locals, CANARY, RBP, RIP (in 32 bits: locals, CANARY, EBP, EIP). The canary holds a value that guards RIP: an overflow that reaches RIP must first overwrite the canary, and before the function returns the program checks whether the canary still holds its original value. If it does not, execution is aborted.
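The canary check can be modelled in a few lines of user-space C. This is only a simulation under stated assumptions: the struct fake_frame layout mimics a stack frame, the canary constant is made up (real canaries are random per process), and the copy is bounded to the struct so the example itself stays well defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Made-up constant for the model; a real canary is chosen at random. */
#define CANARY_VALUE UINT64_C(0x00c0ffee12345678)

/* Simulated stack frame, from low to high addresses. */
struct fake_frame {
	char     buf[16];    /* local buffer                 */
	uint64_t canary;     /* stack canary                 */
	uint64_t saved_rbp;  /* saved frame pointer          */
	uint64_t saved_rip;  /* saved return address         */
};

/* Copy len bytes to the start of the frame without checking the size of
 * buf (the "vulnerable" part), then verify the canary the way a function
 * epilogue would. Returns 1 if the canary is intact, 0 if it was hit. */
int copy_and_check(struct fake_frame *frame, const void *data, size_t len)
{
	assert(frame != NULL && data != NULL);
	assert(len <= sizeof *frame); /* keep the simulation inside the struct */

	frame->canary = CANARY_VALUE;
	memcpy(frame, data, len); /* may run past buf into canary, rbp, rip */
	return frame->canary == CANARY_VALUE;
}
```

Copying 16 bytes leaves the canary untouched, while copying 17 bytes or more changes it, so an overwrite of saved_rip cannot go unnoticed unless the attacker already knows the canary value.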

Fortify is a protection applied at compile time. It detects some calls to risky functions, for example "strcpy", and when the size of the destination buffer is known it replaces them with a checked variant (for example "__strcpy_chk") that aborts the program instead of overflowing.

Kernel land memory protections

Kernel address space layout randomization (KASLR) enables address space randomization for the Linux kernel image by randomizing where the kernel code is placed at boot time.

Supervisor Mode Execution Prevention (SMEP) can be used to prevent supervisor mode from unintentionally executing user space code. For example, a hijacked kernel execution flow can no longer simply jump to a payload placed in user space memory without a bypass.

Supervisor Mode Access Prevention (SMAP) is a feature that allows supervisor mode programs to optionally set user-space memory mappings so that access to those mappings from supervisor mode will cause a trap. This makes it harder for malicious programs to "trick" the kernel into using instructions or data from a user-space program.

Kernel pointer restriction (kptr_restrict) limits the exposure of kernel symbols (/proc/kallsyms) by showing their addresses as null to unprivileged users.
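A quick way to see this restriction in practice is to parse /proc/kallsyms lines, which have the form "address type name". The parser below is a sketch; the function name and buffer sizes are choices made for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse one /proc/kallsyms line of the form
 * "<hex address> <type letter> <symbol name>".
 * Returns 1 on success, 0 if the line does not match. */
int parse_kallsyms_line(const char *line, unsigned long long *addr,
			char *type, char name[128])
{
	assert(line != NULL && addr != NULL && type != NULL && name != NULL);
	return sscanf(line, "%llx %c %127s", addr, type, name) == 3;
}
```

When the restriction applies to the reader, every line still parses but addr comes back as 0, which tells you that a separate leak is needed to locate kernel symbols.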

Kernel page-table isolation (KPTI or PTI) is a Linux kernel feature that mitigates the Meltdown vulnerability and improves kernel hardening against attempts to bypass kernel address space layout randomization (KASLR). It works by better isolating user space and kernel space memory.

Kernel stack canaries work the same way as stack canaries in user land. They are enabled in the kernel at compile time and cannot be disabled.

Control register manual

This part is not really mine; it is just a reference manual of the different control registers.

What is a control register ?

A control register is a processor register which changes or controls the general behavior of a CPU or other digital device. Common tasks performed by control registers include interrupt control, switching the addressing mode, paging control, and coprocessor control.

CR0: The CR0 register is 32 bits long on the 386 and higher processors. On x64 processors in long mode, it (and the other control registers) is 64 bits long. CR0 has various control flags that modify the basic operation of the processor.

CR1: Reserved, the CPU will throw a #UD exception when trying to access it.

CR2: Contains a value called Page Fault Linear Address (PFLA). When a page fault occurs, the address the program attempted to access is stored in the CR2 register.

CR3: Used when virtual addressing is enabled, hence when the PG bit is set in CR0. CR3 enables the processor to translate linear addresses into physical addresses by locating the page directory and page tables for the current task. Typically, the upper 20 bits of CR3 become the page directory base register (PDBR), which stores the physical address of the first page directory entry. If the PCIDE bit in CR4 is set, the lowest 12 bits are used for the process-context identifier (PCID).

CR4: Used in protected mode to control operations such as virtual-8086 support, enabling I/O breakpoints, page size extension and machine-check exceptions.

CR5-7: Reserved, same case as CR1.
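For exploitation, the control register bits that matter most are the ones tied to the protections seen earlier: SMEP and SMAP are switched on through bits 20 and 21 of CR4, and bit 16 of CR0 (WP) decides whether ring 0 honors read-only pages. The sketch below decodes a CR4 value that you already have, for instance from a crash dump; the macro and function names are chosen for this example.

```c
#include <stdint.h>

#define CR0_WP   (UINT64_C(1) << 16) /* write protect                      */
#define CR0_PG   (UINT64_C(1) << 31) /* paging enabled                     */
#define CR4_SMEP (UINT64_C(1) << 20) /* supervisor mode execution prevention */
#define CR4_SMAP (UINT64_C(1) << 21) /* supervisor mode access prevention  */

/* Return 1 if the given CR4 value has SMEP enabled, 0 otherwise. */
int cr4_smep_enabled(uint64_t cr4)
{
	return (cr4 & CR4_SMEP) != 0;
}

/* Return 1 if the given CR4 value has SMAP enabled, 0 otherwise. */
int cr4_smap_enabled(uint64_t cr4)
{
	return (cr4 & CR4_SMAP) != 0;
}
```

For instance, a CR4 value of 0x3006f0 has both bits set, while 0x6f0 has neither.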

Additional Control registers in x86-64 series

Extended Feature Enable Register (EFER) is a model-specific register added in the AMD K6 processor, to allow enabling the SYSCALL/SYSRET instruction, and later for entering and exiting long mode. This register becomes architectural in AMD64 and has been adopted by Intel as IA32_EFER. Its MSR number is 0xC0000080.

CR8 is a new register accessible in 64-bit mode using the REX prefix. CR8 is used to prioritize external interrupts and is referred to as the task-priority register (TPR).

The AMD64 architecture allows software to define up to 15 external interrupt-priority classes. Priority classes are numbered from 1 to 15, with priority-class 1 being the lowest and priority-class 15 the highest. CR8 uses the four low-order bits for specifying a task priority and the remaining 60 bits are reserved and must be written with zeros.

System software can use the TPR register to temporarily block low-priority interrupts from interrupting a high-priority task. This is accomplished by loading TPR with a value corresponding to the highest-priority interrupt that is to be blocked. For example, loading TPR with a value of 9 (1001b) blocks all interrupts with a priority class of 9 or less, while allowing all interrupts with a priority class of 10 or more to be recognized. Loading TPR with 0 enables all external interrupts. Loading TPR with 15 (1111b) disables all external interrupts.

The TPR is cleared to 0 on reset.
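The blocking rule described above fits in a one-line predicate. This is a plain model of the rule as stated in the text, not code that touches CR8; the function name tpr_blocks is invented for this example.

```c
#include <assert.h>

/* Model of the TPR rule: an external interrupt is blocked when its
 * priority class is less than or equal to the value loaded in TPR.
 * tpr is 0..15, priority_class is 1..15.
 * Returns 1 if the interrupt is blocked, 0 if it is recognized. */
int tpr_blocks(unsigned tpr, unsigned priority_class)
{
	assert(tpr <= 15);
	assert(priority_class >= 1 && priority_class <= 15);
	return priority_class <= tpr;
}
```

With tpr set to 9, class 9 is blocked and class 10 is recognized; with 0 nothing is blocked, and with 15 everything is.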

XCR0, or Extended Control Register 0, is a control register which is used to toggle the storing or loading of registers related to specific CPU features using the XSAVE/XRSTOR instructions. It is also used with some features to enable or disable the processor's ability to execute their corresponding instructions. It can be accessed using the privileged XSETBV and nonprivileged XGETBV instructions.

There is also the IA32_XSS MSR, which is located at address 0DA0h. The IA32_XSS MSR controls bits of XCR0 which are considered to be "supervisor" state, and should be invisible to regular programs. It operates with the privileged XSAVES and XRSTORS instructions by adding supervisor state to the data they operate with. Put simply, if the X87 state was enabled in XCR0 and PT state was enabled in IA32_XSS, the XSAVE instruction would only store X87 state, while the privileged XSAVES would store both X87 and PT states. Because it is an MSR, it can be accessed using the RDMSR and WRMSR instructions.

Kernel stack layout

On Linux, every thread on your system has a corresponding kernel stack allocated in kernel memory. Linux kernel stacks on x86 are either 4096 or 8192 bytes in size, depending on your distribution. While this size may seem small to contain a full call chain and associated local stack variables, in reality the kernel call chains are relatively shallow and kernel functions are discouraged from abusing the precious space with large local stack variables when efficient allocators such as the SLUB are available.

The stack shares the 4k/8k total size with the thread_info structure, which contains some metadata about the current thread, as seen in include/linux/sched.h:

union thread_union {
 struct thread_info thread_info;
 unsigned long stack[THREAD_SIZE/sizeof(long)];
};
The thread_info structure has the following definition on x86 from arch/x86/include/asm/thread_info.h:

struct thread_info {
 struct task_struct *task;
 struct exec_domain *exec_domain;
 __u32 flags;
 __u32 status;
 __u32 cpu;
 int preempt_count;
 mm_segment_t addr_limit;
 struct restart_block restart_block;
 void __user *sysenter_return;
#ifdef CONFIG_X86_32
 unsigned long previous_esp;
 __u8 supervisor_stack[0];
#endif
 int uaccess_err;
};

Visualising the kernel stack

If the attacker provides a sufficiently large count, the stack may extend down past the boundary of thread_info, allowing the attacker to subsequently write arbitrary values into the thread_info structure. In that situation the stack pointer ends up inside the thread_info structure itself.
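Because thread_info sits at the lowest address of the THREAD_SIZE-aligned stack region in this layout, its address can be recovered by masking the stack pointer. The sketch below models that arithmetic in user space. It assumes 8 KiB stacks, and the function names and the thread_info size passed in are illustrative, not taken from kernel headers.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE 8192u /* assumption: 8 KiB kernel stacks */

/* Base of the stack region, which is also where thread_info lives:
 * clear the low bits of the stack pointer. */
uint64_t thread_info_base(uint64_t sp)
{
	return sp & ~(uint64_t)(THREAD_SIZE - 1);
}

/* Return 1 if sp has moved down so far that it points inside the first
 * thread_info_size bytes of the region, that is inside thread_info. */
int sp_inside_thread_info(uint64_t sp, uint64_t thread_info_size)
{
	assert(thread_info_size < THREAD_SIZE);
	return sp - thread_info_base(sp) < thread_info_size;
}
```

A stack pointer of 0xffff880012345678 gives a base of 0xffff880012344000. Once the pointer gets within thread_info_size bytes of that base, the next pushes land on fields such as addr_limit.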

File system environment explained

A shell script is used to run the QEMU emulation.

The compressed kernel image is simply the kernel itself, compressed into a single file. It can be extracted into an ELF executable file, "vmlinux", which is useful for finding gadgets when building a ROP chain.

Tool for extraction:

linux/extract-vmlinux at master · torvalds/linux


The Linux file system, compressed with cpio and gzip. Directories such as /bin, /etc, … are stored in this file, and the vulnerable kernel module is likely to be included in the file system as well. For other challenges, this file might use some other compression scheme.

We will be interested in this file in the article on ROP because we will interact with it.

You can also find all of these explanations in the basics unit.

Kernel exploitation part 1:

Linux Kernel Exploitation - BOF (part1)