Cybersecurity Handbook

Input Validation & Output Encoding

Code With Boundaries, Production With Confidence

Web risk is rarely mysterious. It usually lies in predictable mistakes that persist under time pressure.

With Input Validation & Output Encoding, the biggest gains come from secure defaults that are automatically enforced in every release.

That makes security less of a separate check after the fact and more of a standard quality of your product.

Immediate measures (15 minutes)

First implement input validation, secure output encoding, and parameterized queries.
Harden authentication, session management, and security headers as a baseline.
Embed measures in CI/CD with tests, linting, and release gates.
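
One way to turn a validation rule into a release gate is a unit test that runs in the pipeline. The sketch below assumes a username rule of 3 to 30 ASCII letters, digits, or underscores; the test class name and the sample payloads are illustrative choices, not part of any particular framework.

```python
import re
import unittest


def validate_username(value: str) -> str:
    """Allowlist check: 3-30 ASCII letters, digits, or underscores."""
    if not re.fullmatch(r"[a-zA-Z0-9_]{3,30}", value):
        raise ValueError("Invalid username")
    return value


class ValidateUsernameTest(unittest.TestCase):
    def test_accepts_plain_username(self):
        self.assertEqual(validate_username("alice_01"), "alice_01")

    def test_rejects_markup_and_sql_fragments(self):
        # Each payload contains at least one character outside the allowlist.
        for payload in ["<script>", "a'; DROP TABLE users; --", "../etc"]:
            with self.subTest(payload=payload):
                with self.assertRaises(ValueError):
                    validate_username(payload)
```

Running `python -m unittest` as a pipeline step makes a failing validation test block the release.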

Why this matters

Input validation and output encoding exist to reduce risk in practice. Knowing the technical context helps you choose the right measures, but what counts is that they are implemented and verified in every release.

Core Principles

Validate input, encode output

┌──────────┐     ┌────────────┐     ┌──────────┐     ┌──────────────┐
│  Input   │────▶│ Validation │────▶│ Business │────▶│ Output       │────▶ Output
│ (untrust)│     │ (allowlist)│     │  Logic   │     │ encoding     │
└──────────┘     └────────────┘     └──────────┘     └──────────────┘

Input validation is generic: "Is this a valid email address? A number between 1 and 100?"
Output encoding is context-specific: "Am I placing this value in HTML, JavaScript, SQL, or a URL?"

Never trust

All input is untrusted. Not just form fields, but also:

HTTP headers (Host, Referer, User-Agent, X-Forwarded-For)
Cookies
URL parameters and path segments
Filenames in uploads
API responses from external services
Database content (may have been injected earlier)
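
As an illustration of treating a header as untrusted, the following sketch parses an `X-Forwarded-For` value with the standard-library `ipaddress` module. The function name is hypothetical, and taking the left-most entry is an assumption about the proxy setup. A value that parses as an IP address is well-formed, not trustworthy: a client can still send a forged address.

```python
import ipaddress


def client_ip_from_header(header_value: str) -> str:
    """Return the left-most X-Forwarded-For entry if it parses as an IP address.

    Raises ValueError for anything else, so a header carrying markup or
    other payloads is rejected instead of being logged or displayed.
    """
    first = header_value.split(",")[0].strip()
    return str(ipaddress.ip_address(first))
```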

Input Validation

Allowlist over blocklist

# WRONG — blocklist: try to block known bad patterns
def sanitize_input(value):
    blacklist = ['<script>', 'DROP TABLE', '../', ';']
    for bad in blacklist:
        value = value.replace(bad, '')
    return value  # Endlessly bypassable

# RIGHT — allowlist: define what IS allowed
import re

def validate_username(value):
    if not re.fullmatch(r'[a-zA-Z0-9_]{3,30}', value):
        raise ValueError("Invalid username")
    return value

A blocklist is a race you always lose. There are infinitely many ways to encode malicious input. An allowlist defines the finite set of valid values.
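
To make the bypass concrete, here is the blocklist filter again with two inputs that get past it. The payloads are illustrative; many other variants work the same way.

```python
def sanitize_input(value: str) -> str:
    """The blocklist filter from above, reproduced to show how it fails."""
    for bad in ["<script>", "DROP TABLE", "../", ";"]:
        value = value.replace(bad, "")
    return value


# Nested payload: removing the inner "<script>" reassembles the outer one.
nested = sanitize_input("<scr<script>ipt>alert(1)</script>")

# Case variation: the filter only knows the lowercase spelling.
mixed_case = sanitize_input("<SCRIPT>alert(1)</SCRIPT>")

print(nested)      # <script>alert(1)</script>
print(mixed_case)  # <SCRIPT>alert(1)</SCRIPT>
```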

Type, range, and format

# Type validation
def validate_age(value):
    age = int(value)          # ValueError if not a valid integer string
    if not 0 <= age <= 150:   # Range check
        raise ValueError("Age out of range")
    return age

# Format validation with regex
import re

def validate_dutch_postcode(value):
    # [0-9] and a literal space: \d and \s would also accept non-ASCII digits, tabs, and newlines
    if not re.fullmatch(r'[0-9]{4} ?[A-Z]{2}', value):
        raise ValueError("Invalid postal code")
    return value.replace(' ', '')  # Normalize to '1234AB'

# Email: use a library, don't write your own regex
from email_validator import validate_email

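def validate_email_address(value):
    # Raises EmailNotValidError on invalid input.
    # check_deliverability=False skips the DNS lookup; the syntax is still validated.
    result = validate_email(value, check_deliverability=False)
    return result.normalized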
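
Dates follow the same pattern of type first, range second. A minimal sketch using only the standard library; the function name and the lower bound of 1900 are assumptions chosen for illustration.

```python
from datetime import date


def validate_birth_date(value: str) -> date:
    """Parse an ISO 8601 calendar date and check that it is plausible."""
    parsed = date.fromisoformat(value)  # ValueError on malformed or impossible dates
    if not date(1900, 1, 1) <= parsed <= date.today():
        raise ValueError("Birth date out of range")
    return parsed
```

Parsing into a `date` object, rather than matching a regex, also rejects impossible dates such as 30 February.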

Unicode normalization

Unicode offers multiple representations for the same character. Without normalization, identical-looking strings can be different:

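import re
import unicodedata

# 'café' can be encoded in two ways: 'é' as one code point, or 'e' + combining accent
user_input = 'cafe\u0301'                        # decomposed form
nfc = unicodedata.normalize('NFC', user_input)   # Composed: é becomes one code point
nfkc = unicodedata.normalize('NFKC', user_input) # Also folds compatibility forms: ﬁ → fi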

# Normalize BEFORE validation
def validate_name(value):
    value = unicodedata.normalize('NFC', value)
    if not re.fullmatch(r'[\w\s\-]{1,100}', value, re.UNICODE):
        raise ValueError("Invalid name")
    return value

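Rule: Normalize Unicode before you validate, and validate before you store. This prevents bypasses via alternative encodings of the same character, such as decomposed accents and (with NFKC) fullwidth or ligature variants. Normalization does not map homoglyphs onto each other: Cyrillic а and Latin a remain different characters, so identifiers that must resist look-alike spoofing need an additional restriction, such as an ASCII-only allowlist.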
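
The following sketch shows what normalization does and does not do, using only the standard library. The sample strings are illustrative.

```python
import unicodedata

composed = "caf\u00e9"      # 'é' as one code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

# The strings render identically but compare unequal until normalized.
same_before = composed == decomposed
same_after = unicodedata.normalize("NFC", decomposed) == composed

# NFKC additionally folds compatibility forms such as fullwidth letters.
folded = unicodedata.normalize("NFKC", "\uff41\uff44\uff4d\uff49\uff4e")  # fullwidth "admin"

# A homoglyph from another script is left untouched by normalization.
cyrillic_a = "\u0430"  # CYRILLIC SMALL LETTER A
folds_to_latin = unicodedata.normalize("NFKC", cyrillic_a) == "a"

print(same_before, same_after, folded, folds_to_latin)  # False True admin False
```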

Length limitation

Always limit the length of input. This prevents:

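Buffer overflows in native code further down the stack
ReDoS (Regular Expression Denial of Service)
Truncation or errors when a value exceeds the database column size
Resource exhaustion (memory, CPU, storage)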

MAX_COMMENT_LENGTH = 5000

def validate_comment(value):
    if len(value) > MAX_COMMENT_LENGTH:
        raise ValueError(f"Comment too long (max {MAX_COMMENT_LENGTH} characters)")
    return value.strip()
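
Note that `len()` on a Python string counts code points, while many storage limits count bytes. A minimal sketch that enforces both; the byte limit of 255 is a hypothetical column size, not a recommendation.

```python
MAX_NAME_CHARS = 100
MAX_NAME_BYTES = 255  # hypothetical column limit measured in bytes


def validate_display_name(value: str) -> str:
    """Enforce both a character limit and a UTF-8 byte limit."""
    if len(value) > MAX_NAME_CHARS:
        raise ValueError("Name too long")
    if len(value.encode("utf-8")) > MAX_NAME_BYTES:
        raise ValueError("Name too long when encoded")
    return value
```

One hundred euro signs pass the character check but occupy 300 bytes in UTF-8, so only the second check catches them.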

Output encoding per context

The correct encoding depends on where you place the data. This is the most critical lesson: there is no universal sanitize function.
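
A short illustration of why one function cannot serve every sink: the same value needs three different encodings for three contexts. The sample value is arbitrary; the functions are from the Python standard library.

```python
import html
import json
from urllib.parse import quote

value = 'Tom & "Jerry" <3'

in_html = html.escape(value)     # for an HTML body or a quoted attribute
in_json = json.dumps(value)      # for a JSON document
in_url = quote(value, safe="")   # for a single URL query value

print(in_html)  # Tom &amp; &quot;Jerry&quot; &lt;3
print(in_json)  # "Tom & \"Jerry\" <3"
print(in_url)   # Tom%20%26%20%22Jerry%22%20%3C3
```

None of the three outputs is safe in either of the other two contexts.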

Context matrix

Output context	Encoding method	Example
HTML body	HTML entity encoding	`<` → `&lt;`
HTML attribute	HTML entity encoding + quoted attribute value	`"` → `&quot;`
JavaScript string	JavaScript string escaping	`'` → `\'`, newline → `\n`
URL parameter	Percent-encoding	space → `%20`, `&` → `%26`
CSS value	CSS escaping	`\` → `\\`, `(` → `\28`
SQL query	Parameterized queries	No encoding — use placeholders
JSON	JSON serialization	Use `json.dumps()`, never string concatenation
Command line	No encoding — use arrays	No shell, pass args as list

HTML entity encoding

# Python — standard library
import html

user_input = '<script>alert("XSS")</script>'
safe = html.escape(user_input)
# &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;

// Java — OWASP Java Encoder
import org.owasp.encoder.Encode;

String safe = Encode.forHtml(userInput);
String safeAttr = Encode.forHtmlAttribute(userInput);
String safeJs = Encode.forJavaScript(userInput);

// JavaScript (server-side Node.js)
const he = require('he');

const safe = he.encode(userInput);

// PHP
$safe = htmlspecialchars($userInput, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// C# — System.Text.Encodings.Web
using System.Text.Encodings.Web;

string safe = HtmlEncoder.Default.Encode(userInput);
string safeJs = JavaScriptEncoder.Default.Encode(userInput);
string safeUrl = UrlEncoder.Default.Encode(userInput);

JavaScript string escaping

# Never this:
f"var name = '{user_input}';"  # XSS via '; alert(1); //

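# Do this instead:
import json

# json.dumps escapes quotes, backslashes, and newlines, but not "<", so
# "</script>" in the data would still end an inline script block
literal = json.dumps(user_input).replace('<', '\\u003c').replace('>', '\\u003e')
f"var name = {literal};"  # Safely escaped for use inside <script>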

URL encoding

from urllib.parse import quote, urlencode

# Single parameter value (safe='' also encodes '/', which quote() leaves alone by default)
safe_param = quote(user_input, safe='')

# Multiple parameters
params = urlencode({'search': user_input, 'page': '1'})
url = f"https://example.com/search?{params}"
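
The difference between the default `quote()` and `quote(..., safe="")` is easy to overlook. A small sketch with an illustrative value that contains a slash and an attempt to smuggle in a second parameter:

```python
from urllib.parse import parse_qs, quote, urlencode

value = "a/b&admin=true"

default_quoted = quote(value)           # '/' is left as-is by default
strict_quoted = quote(value, safe="")   # every reserved character is encoded

# urlencode encodes each value, so the '&' cannot start a new parameter.
query = urlencode({"next": value})
round_trip = parse_qs(query)

print(default_quoted)  # a/b%26admin%3Dtrue
print(strict_quoted)   # a%2Fb%26admin%3Dtrue
print(query)           # next=a%2Fb%26admin%3Dtrue
```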

SQL — always parameterized queries

# WRONG — string concatenation
cursor.execute(f"SELECT * FROM users WHERE name = '{name}'")

# RIGHT — parameterized
cursor.execute("SELECT * FROM users WHERE name = %s", (name,))

// RIGHT — PreparedStatement
PreparedStatement stmt = conn.prepareStatement(
    "SELECT * FROM users WHERE name = ?");
stmt.setString(1, name);

// RIGHT — SqlParameter
using var cmd = new SqlCommand(
    "SELECT * FROM users WHERE name = @name", conn);
cmd.Parameters.AddWithValue("@name", name);
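
The effect can be reproduced with an in-memory SQLite database. This is a sketch for demonstration only: the table and payload are made up, and `sqlite3` uses `?` placeholders where the driver in the examples above uses `%s`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "admin"), ("bob", "user")])

payload = "nobody' OR '1'='1"

# Concatenated: the quote in the payload ends the string literal,
# and the OR clause makes the condition true for every row.
leaked = conn.execute(
    f"SELECT name FROM users WHERE name = '{payload}'").fetchall()

# Parameterized: the payload is compared as a literal value and matches nothing.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (payload,)).fetchall()

print(len(leaked), len(safe))  # 2 0
```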

JSON serialization

import json

# WRONG — manual construction
response = '{"name": "' + user_input + '"}'

# RIGHT — json.dumps escapes automatically
response = json.dumps({"name": user_input})
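
What goes wrong with manual construction becomes visible when the result is parsed again. The payload below is illustrative.

```python
import json

user_input = 'x", "is_admin": true, "y": "'

# Manual construction: the input closes the string and adds its own key.
manual = json.loads('{"name": "' + user_input + '"}')

# json.dumps: the quotes are escaped, so the input stays a single string value.
serialized = json.loads(json.dumps({"name": user_input}))

print(manual)              # {'name': 'x', 'is_admin': True, 'y': ''}
print(list(serialized))    # ['name']
```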

Command line — never shell=True

import subprocess

# WRONG — command injection via shell
subprocess.run(f"convert {filename} output.png", shell=True)

# RIGHT — arguments as list, no shell
subprocess.run(["convert", filename, "output.png"])
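
The sketch below shows that an argument passed in a list reaches the child process as one literal string, shell metacharacters included. The child here is the Python interpreter itself, chosen only so the example runs anywhere; the hostile filename is made up.

```python
import subprocess
import sys

hostile = "photo.png; echo INJECTED"

# The child process just prints the single argument it received.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile],
    capture_output=True, text=True, check=True,
)
received = result.stdout.strip()

print(received)  # photo.png; echo INJECTED
```

Passing a list removes shell injection, but the called program still interprets its arguments: a value that starts with `-` may be read as an option, so validate arguments as well.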

Libraries per language

Language	Library	Functionality
JavaScript	DOMPurify	HTML sanitization (client-side)
JavaScript	he	HTML entity encode/decode
Python	bleach	HTML sanitization (server-side)
Python	html.escape	Basic HTML escaping
Python	markupsafe	Jinja2 auto-escaping
Java	OWASP Java Encoder	Context-specific encoding
Java	jsoup	HTML sanitization + parsing
Go	html/template	Auto-escaping templates
Go	bluemonday	HTML sanitization
C#	HtmlSanitizer	HTML sanitization
C#	System.Text.Encodings.Web	HTML/JS/URL encoding
PHP	htmlspecialchars	HTML escaping (built-in)
PHP	HTMLPurifier	HTML sanitization

DOMPurify (JavaScript, client-side)

// HTML sanitization with DOMPurify
const clean = DOMPurify.sanitize(userInput);

// With configuration — only allow certain tags
const clean = DOMPurify.sanitize(userInput, {
  ALLOWED_TAGS: ['b', 'i', 'em', 'strong', 'a', 'p', 'br'],
  ALLOWED_ATTR: ['href'],
});

bleach (Python, server-side)

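# Note: bleach has been deprecated upstream since 2023 and only receives
# maintenance releases; check its status before adopting it in new code
import bleach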

# Basic sanitization
clean = bleach.clean(user_input)

# With allowlist
clean = bleach.clean(
    user_input,
    tags=['b', 'i', 'em', 'strong', 'a', 'p', 'br', 'ul', 'ol', 'li'],
    attributes={'a': ['href', 'title']},
    protocols=['https'],
)

Pitfalls

Double encoding

import html

user_input = "&lt;script&gt;"  # already encoded once upstream
html.escape(user_input)         # encoding it a second time
# Result: "&amp;lt;script&amp;gt;", which the browser displays as &lt;script&gt;

Solution: encode at one place, as late as possible (at the output).
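
A small demonstration of the effect, where `html.unescape` stands in for what the browser displays. The sample text is arbitrary.

```python
import html

raw = "Fish & Chips <b>"

once = html.escape(raw)     # encoded at the output, as intended
twice = html.escape(once)   # encoded again somewhere along the way

# html.unescape models what the browser shows for each version.
shown_once = html.unescape(once)
shown_twice = html.unescape(twice)

print(shown_once)   # Fish & Chips <b>
print(shown_twice)  # Fish &amp; Chips &lt;b&gt;
```

Storing the raw value and encoding only when rendering keeps the data reusable for other contexts, such as JSON or CSV export.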

Template engines and auto-escaping

Most modern template engines escape automatically:

Template engine	Auto-escape by default?	Bypass syntax
Jinja2 (Flask)	Yes (Flask enables it for HTML templates; plain Jinja2 does not by default)	`{{ value|safe }}` or `{% autoescape false %}`
Django templates	Yes	`{{ value|safe }}` or `{% autoescape off %}`
Go html/template	Yes	`template.HTML(value)`
Thymeleaf (Java)	Yes	`th:utext` (unescaped)
Razor (C#)	Yes	`@Html.Raw(value)`
ERB (Ruby)	No in plain ERB; yes in Rails views	Plain ERB: escape manually with `h()`; Rails: `raw(value)` bypasses escaping
PHP	No	Manual `htmlspecialchars()`

Rule: Use |safe, Raw(), utext and similar bypass mechanisms only on values that you generated yourself or have already sanitized. Never on user input.

Mixed contexts

<!-- DANGEROUS — JavaScript in an HTML attribute -->
<a href="#" onclick="doSomething('{{ user_input }}')">Click</a>

Here you are in two contexts simultaneously: HTML attribute and JavaScript. You must first JavaScript-escape, then HTML-attribute-escape. This is error-prone and should be avoided. Use this instead:

<a href="#" id="action-link" data-value="{{ user_input }}">Click</a>
<script>
  document.getElementById('action-link').addEventListener('click', function() {
    doSomething(this.dataset.value);
  });
</script>
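
If the markup is rendered without an auto-escaping template engine, the attribute value has to be escaped by hand. A minimal sketch with the standard library; the function name is hypothetical.

```python
import html


def render_action_link(user_input: str) -> str:
    """Render the link with the value in a quoted data attribute."""
    # quote=True (the default) also escapes " and ', which a quoted attribute needs.
    value = html.escape(user_input, quote=True)
    return f'<a href="#" id="action-link" data-value="{value}">Click</a>'
```

The value is now in a single context (an HTML attribute), and the script reads it back through `dataset` as plain data rather than as code.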

Validation at system boundaries

API endpoints

from email_validator import EmailNotValidError, validate_email
from pydantic import BaseModel, Field, field_validator  # Pydantic v2 API

class CreateUserRequest(BaseModel):
    username: str = Field(min_length=3, max_length=30, pattern=r'^[a-zA-Z0-9_]+$')
    email: str = Field(max_length=254)
    age: int = Field(ge=18, le=150)

    @field_validator('email')
    @classmethod
    def check_email(cls, v: str) -> str:
        # Delegate to the email_validator library instead of a hand-written check
        try:
            return validate_email(v, check_deliverability=False).normalized
        except EmailNotValidError as exc:
            raise ValueError('Invalid email address') from exc

@app.post('/api/users')
def create_user(data: CreateUserRequest):
    # data is validated by Pydantic
    ...

Database layer

# Limit query lengths
MAX_SEARCH_LENGTH = 200

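def search_products(query: str):
    query = query[:MAX_SEARCH_LENGTH].strip()
    # Escape LIKE wildcards so '%' and '_' in the input match literally.
    # '!' is the escape character and must be escaped first.
    for ch in ('!', '%', '_'):
        query = query.replace(ch, '!' + ch)
    return db.execute(
        "SELECT * FROM products WHERE name LIKE %s ESCAPE '!' LIMIT 50",
        (f"%{query}%",)
    )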

File system

import os
import uuid

UPLOAD_DIR = '/var/www/uploads'

def safe_save(filename: str, content: bytes):
    # Remove path components
    filename = os.path.basename(filename)

    # Allowlist file extensions
    allowed_ext = {'.pdf', '.png', '.jpg', '.docx'}
    _, ext = os.path.splitext(filename)
    if ext.lower() not in allowed_ext:
        raise ValueError(f"File type {ext} not allowed")

    # Generate a safe filename
    safe_name = f"{uuid.uuid4().hex}{ext.lower()}"

    # Verify that the path stays within UPLOAD_DIR. commonpath compares whole
    # path components; a plain startswith() would also accept a sibling
    # directory such as '/var/www/uploads_old'
    base = os.path.realpath(UPLOAD_DIR)
    full_path = os.path.realpath(os.path.join(base, safe_name))
    if os.path.commonpath([base, full_path]) != base:
        raise ValueError("Path traversal detected")

    with open(full_path, 'wb') as f:
        f.write(content)
    return safe_name
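
When checking that a path stays inside a directory, a string prefix comparison is not enough. The sketch below contrasts it with a component-wise comparison; the helper name `is_inside` and the sibling directory are made up for illustration.

```python
import os

UPLOAD_DIR = "/var/www/uploads"


def is_inside(base: str, candidate: str) -> bool:
    """True if candidate is base itself or located below it."""
    base = os.path.realpath(base)
    candidate = os.path.realpath(candidate)
    return os.path.commonpath([base, candidate]) == base


sibling = "/var/www/uploads_private/secret.pdf"
prefix_check = sibling.startswith(UPLOAD_DIR)   # True: the string prefix matches
path_check = is_inside(UPLOAD_DIR, sibling)     # False: it is a different directory
```

`os.path.commonpath` compares whole path components, so `uploads` and `uploads_private` are never confused.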

CLI parameters

import subprocess

# WRONG — shell injection
def run_tool(target):
    subprocess.run(f"nmap {target}", shell=True)

# RIGHT — arguments as list
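def run_tool(target):
    # Validate first. The (?!-) lookahead rejects a leading '-', so the
    # target cannot be parsed by nmap as an option; re.ASCII keeps \w to
    # ASCII letters, digits, and underscore
    import re
    if not re.fullmatch(r'(?!-)[\w.\-:]+', target, re.ASCII):
        raise ValueError("Invalid target")
    subprocess.run(["nmap", target])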
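
A stricter alternative to a character-class check is to accept only values that parse as an IP address or look like a hostname. This is a sketch under that assumption; the function name is hypothetical, and the hostname pattern is a simplified form of the usual label rules.

```python
import ipaddress
import re

# Hostname labels: letters, digits, hyphens; no leading or trailing hyphen.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
_HOSTNAME = re.compile(rf"{_LABEL}(?:\.{_LABEL})*")


def validate_target(target: str) -> str:
    """Accept an IP address or a hostname; reject everything else."""
    try:
        return str(ipaddress.ip_address(target))
    except ValueError:
        pass  # not an IP address, try the hostname rule next
    if len(target) <= 253 and _HOSTNAME.fullmatch(target):
        return target
    raise ValueError("Invalid target")
```

Because every label must start with a letter or digit, option-like values such as `-oN/tmp/out` are rejected before they reach the command line.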

Checklist

Measure	Description	Priority
Allowlist validation	Define what is allowed, block the rest	Critical
Type and range checks	Number is number, date is date	Critical
Length limitation	Maximum length on all input fields	High
Unicode normalization	NFC/NFKC before validation	High
Parameterized queries	Never string concatenation in SQL	Critical
Template auto-escaping	Make sure it's enabled and don't bypass unnecessarily	Critical
Context-specific encoding	Use the correct encoding per sink	Critical
Filename sanitization	`os.path.basename()` + allowlist extensions	High
Command arguments as list	`subprocess.run(["cmd", arg])`, never `shell=True`	Critical
API schema validation	Pydantic, JSON Schema, or equivalent	High

It's actually quite simple. You need two rules. Two.

Rule one: trust nothing that comes from outside. Not the form field, not the URL, not the header, not the cookie, not the file, not the API response from the "trusted partner" whose system you pentested last year and that had three critical SQL injections at the time.

Rule two: when data leaves your system — to the browser, the database, the file system, the command line — encode it for that specific context. HTML in HTML, JavaScript in JavaScript, SQL via parameters.

Two rules. That's it. And yet SQL injection and XSS have existed for more than twenty-five years. We haven't solved them. We haven't even reduced them. They're still in the OWASP Top 10. They were in the first OWASP Top 10, in 2003. Twenty-three years ago.

The solution is known. The tools exist. The libraries are free. The documentation is excellent. But somewhere between "we know how to do it" and "we actually do it" there's a gap so wide that you could fit a data center in it. And in that gap there's a Post-it note with "TODO: input sanitization" that's been there since the first sprint.

Summary

Input validation and output encoding are the fundamental defenses against injection attacks. Validate with allowlists, restrict types and lengths, and normalize Unicode before processing. Encode every output for the specific context: HTML entities for HTML, parameterized queries for SQL, lists for command-line arguments.

In the next chapter, we cover the transport layer: how do you configure TLS so that all that carefully validated and encoded data also travels securely over the network?
