XXE Prevention
Web Security Without Firefighting
Web risk is rarely mysterious. It usually comes down to predictable mistakes that persist under time pressure.
For XXE prevention, the biggest gains come from safe defaults that are enforced automatically in every release.
This makes security less of a separate afterthought and more of a standard quality of your product.
Immediate measures (15 minutes)
Why this matters
At its core, XXE prevention is about reducing risk in practice. Understanding the technical background helps you choose the right measures, but what matters most is implementing them and verifying that they stay in place.
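One way to make safe defaults enforced automatically in every release is a build-pipeline check that fails when code imports a parser the team has ruled out for untrusted input. The sketch below assumes a hypothetical policy: the standard library's `xml.etree`, `xml.dom` and `xml.sax` modules must not be used directly, and code should go through defusedxml or a hardened lxml parser instead. It relies on Python's `ast` module, so it only sees static imports, not dynamic ones; the names `FORBIDDEN_PREFIXES` and `find_forbidden_xml_imports` are illustrative.

```python
import ast

# Hypothetical team policy: these stdlib XML modules may not be imported directly.
FORBIDDEN_PREFIXES = ("xml.etree", "xml.dom", "xml.sax")


def _is_forbidden(module: str) -> bool:
    return any(module == p or module.startswith(p + ".") for p in FORBIDDEN_PREFIXES)


def find_forbidden_xml_imports(source: str, filename: str = "<string>") -> list[tuple[int, str]]:
    """Return sorted (line, module) pairs for imports that violate the policy."""
    findings = []
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, ast.Import):
            # import xml.dom.minidom
            for alias in node.names:
                if _is_forbidden(alias.name):
                    findings.append((node.lineno, alias.name))
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Catches both "from xml.etree import X" and "from xml import etree"
            for alias in node.names:
                full = f"{node.module}.{alias.name}"
                if _is_forbidden(node.module) or _is_forbidden(full):
                    findings.append((node.lineno, full))
    return sorted(findings)
```

In CI you would run this over every `.py` file and fail the build when any list is non-empty. Existing tools such as Bandit ship comparable import checks, so a hand-written scanner mainly makes sense for project-specific rules.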
Defense: how to pull the teeth of XML parsers
After all this mayhem, it's time to talk about how to protect yourself. And the good news is: the defense against XXE is relatively simple. The bad news is that "simple" is not the same as "gets done."
Step 1: Disable DTD processing
The most effective defense is completely disabling DTD processing. No DTDs, no entities, no problem:
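Python (defusedxml):
```python
import defusedxml.ElementTree as ET

# Entity declarations and external references raise an exception by default;
# forbid_dtd=True additionally rejects any document that contains a DTD.
tree = ET.parse('input.xml', forbid_dtd=True)
```
The defusedxml library is a drop-in replacement for Python's standard XML libraries that blocks entity declarations and external references by default.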
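Python (lxml):
```python
from lxml import etree

parser = etree.XMLParser(
    resolve_entities=False,  # leave entity references unresolved
    no_network=True,         # never fetch external documents over the network
    dtd_validation=False,    # do not validate against a DTD
    load_dtd=False,          # do not load a DTD at all
)
tree = etree.parse('input.xml', parser)
```
Java: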
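```java
import javax.xml.parsers.DocumentBuilderFactory;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// Refuse any document that contains a DOCTYPE declaration
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Fallbacks that still apply if DOCTYPEs ever have to be allowed
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setXIncludeAware(false);
```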
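When a third-party library is not an option, the "no DTDs" rule can be approximated with Python's standard library alone. The sketch below is an illustration under that assumption, not a vetted substitute for defusedxml: it runs a first pass with `xml.parsers.expat` that aborts as soon as it meets a DOCTYPE declaration, and only then hands the bytes to `xml.etree.ElementTree`. The names `UnsafeXMLError` and `parse_without_dtd` are hypothetical.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class UnsafeXMLError(ValueError):
    """Raised for constructs we refuse to process (hypothetical name)."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # All entity declarations (internal, SYSTEM or PUBLIC) live inside a
    # DOCTYPE, so refusing the DOCTYPE also refuses every entity.
    raise UnsafeXMLError(f"DOCTYPE declaration not allowed: {name!r}")


def parse_without_dtd(data: bytes) -> ET.Element:
    """Parse untrusted XML only if it contains no DOCTYPE declaration."""
    # Pass 1: expat calls the handler at the start of any DOCTYPE; the
    # exception it raises propagates out of Parse() and aborts the scan.
    scanner = xml.parsers.expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = _reject_doctype
    scanner.Parse(data, True)
    # Pass 2: the document is DTD-free, so there are no entities to expand.
    return ET.fromstring(data)
```

Scanning twice costs some CPU; with defusedxml installed, `forbid_dtd=True` gives the same rejection in a single pass.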
Step 2: Don't use XML if you don't have to
This sounds obvious, but it's astonishing how often XML is used when JSON would be perfectly fine. JSON has no entities, no DTDs, and no external references. It is boring, predictable, and safe. Exactly what you want from a data format.
If you can choose between XML and JSON for a new API, choose JSON. If you must support XML for backward compatibility, make sure the parser is configured as if XML is a potentially dangerous animal -- because it is.
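Choosing JSON can be as blunt as having no XML code path at all. A minimal sketch, assuming a framework-neutral handler that receives the Content-Type header and the raw body; `parse_request_body` and `UnsupportedMediaType` are hypothetical names, and mapping the error to HTTP 415 is left to whatever web framework you use. It also shows that an entity-like string in JSON is just text.

```python
import json
from typing import Any


class UnsupportedMediaType(Exception):
    """Hypothetical error; the web framework would map it to HTTP 415."""


def parse_request_body(content_type: str, body: bytes) -> Any:
    # Ignore parameters such as "; charset=utf-8"
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        # JSON has no entities, DTDs or external references to resolve
        return json.loads(body)
    # No XML branch exists, so there is no XML parser to misconfigure
    raise UnsupportedMediaType(media_type)


# "&xxe;" arrives as five literal characters, not as a reference to a file
print(parse_request_body("application/json", b'{"name": "&xxe;"}'))  # {'name': '&xxe;'}
```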
Step 3: Validate and sanitize
If you must accept XML:
- Block any document that contains a DOCTYPE declaration
- Reject documents that declare SYSTEM or PUBLIC entities
- Limit entity expansion depth
- Cap the total size of expanded content (to stop Billion Laughs)
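The last two items are easiest to see as arithmetic. In the classic Billion Laughs payload, each entity references the previous one ten times, nine levels deep, so a document of a few hundred bytes expands into 10^9 copies of "lol". The sketch below models entity definitions as a hypothetical dictionary (not a real parser API), computes nesting depth and expanded length without materializing the text, and aborts as soon as either crosses a limit; the default limits are illustrative assumptions. Real parsers enforce such limits internally, so this only shows where the checks belong.

```python
class ExpansionLimitError(ValueError):
    """Raised when an entity tree is too deep, too large or recursive (hypothetical)."""


def expansion_stats(entities: dict[str, list[str]], root: str,
                    max_depth: int = 5, max_chars: int = 1_000_000) -> tuple[int, int]:
    """Return (nesting depth, expanded length) of `root` without building the text.

    Each entity is a list of parts: "&name;" is a reference, anything else is literal text.
    """
    memo: dict[str, tuple[int, int]] = {}
    visiting: set[str] = set()

    def visit(name: str) -> tuple[int, int]:
        if name in memo:
            return memo[name]
        if name in visiting:
            raise ExpansionLimitError(f"recursive entity: {name}")
        visiting.add(name)
        depth, length = 1, 0
        for part in entities[name]:
            if part.startswith("&") and part.endswith(";"):
                child_depth, child_length = visit(part[1:-1])
                depth = max(depth, child_depth + 1)
                length += child_length
            else:
                length += len(part)
            # Check after every part so we stop as soon as a limit is crossed
            if depth > max_depth:
                raise ExpansionLimitError(f"{name}: nesting depth {depth} > {max_depth}")
            if length > max_chars:
                raise ExpansionLimitError(f"{name}: expands to more than {max_chars} chars")
        visiting.remove(name)
        memo[name] = (depth, length)
        return memo[name]

    return visit(root)


# The classic payload: lol9 -> 10 x lol8 -> ... -> 10 x lol1 -> 10 x "lol"
laughs = {"lol": ["lol"]}
for i in range(1, 10):
    previous = "lol" if i == 1 else f"lol{i - 1}"
    laughs[f"lol{i}"] = [f"&{previous};"] * 10

try:
    expansion_stats(laughs, "lol9")
except ExpansionLimitError as exc:
    print("rejected:", exc)  # rejected: lol5: nesting depth 6 > 5
```

With the depth limit raised, the size cap trips instead; with both lifted, the function reports 3,000,000,000 characters for `lol9`, which is the number a naive parser would try to allocate.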
Step 4: Network segmentation
Even if a parser is vulnerable, network segmentation limits the impact. If the web server cannot initiate outbound traffic (except to specific backend services), OOB exfiltration cannot take place. The parser may read the file, but the data goes nowhere.
This is defense in depth: multiple layers of security that are each insufficient on their own, but together form a solid defense.
The uncomfortable conversation about XML parsers
Let's be honest. The fact that XML external entities are enabled by default in many widely used parsers is one of the most absurd design decisions in the history of software. It's like selling a car whose doors don't lock by default, and then writing in the manual: "Don't forget to lock the doors."
The XML specification is from 1998. External entities were a feature. They were intended for document management in large organizations, for reusing text in technical documentation, for modularly building complex XML structures. Noble goals, all of them. But the people who wrote the specification worked in a world where XML documents came from trusted sources. Internal systems. Colleagues.
They had not foreseen that in 2026 any random website visitor could submit XML to a server. They had not foreseen that this XML would be parsed by a component that trusts everything by default. They had not foreseen that a feature intended for document management would be abused to steal password files.
And yet, although this has been known for more than fifteen years, external entities are still enabled by default in countless parsers. The documentation says "disable this in production." Developers don't read the documentation. The parser works fine without configuration. The XML document is correctly parsed. The unit tests pass. And somewhere on a server, files are being read by anyone who knows how to write a DOCTYPE.
It is the triumph of backward compatibility over common sense. It is the reason we need penetration testers. And it is the reason this chapter exists.
Summary
XXE is a vulnerability that stems from trust -- trust in the input, trust in the parser, trust in the default configuration. It is a reminder that features and vulnerabilities are sometimes the same thing, depending on who writes the XML.
The defense is simple: disable what you don't need. Don't trust input. Configure your parsers as if every XML document that arrives were written by someone who wants to harm you. Because one day, it will be.
References
| Source | URL |
|---|---|
| OWASP XXE Prevention Cheat Sheet | https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html |
| PortSwigger Web Security Academy -- XXE | https://portswigger.net/web-security/xxe |
| PayloadsAllTheThings -- XXE Injection | https://github.com/swisskyrepo/PayloadsAllTheThings/tree/master/XXE%20Injection |
| defusedxml (Python) | https://github.com/tiran/defusedxml |
| W3C XML 1.0 Specification | https://www.w3.org/TR/xml/ |
| Billion Laughs Attack | https://en.wikipedia.org/wiki/Billion_laughs_attack |
Further reading in the knowledge base
These articles in the portal provide more background and practical context:
- APIs — the invisible glue of the internet
- SSL/TLS — why that padlock in your browser matters
- Encryption — the art of making things unreadable
- Password hashing — how websites store your password
- Penetration tests vs. vulnerability scans