Backup & Disaster Recovery
Recovery Without Heroics
In network security, structure beats improvisation: clear paths, fewer privileges, and explicit trust boundaries.
For backup and disaster recovery, a plan is only credible once restore tests demonstrate that the time objectives can actually be met.
This way you limit not only the likelihood of incidents, but especially the scope and duration when something goes wrong.
Immediate Measures (15 minutes)
Why This Matters
The core of Backup & Disaster Recovery is risk reduction in practice. Technical context supports the choice of measures, but implementation and assurance are central.
The 3-2-1-1-0 Rule
The classic 3-2-1 rule originates from photography — Peter Krogh formulated it in 2005 for managing digital photos. Three copies of your data, on two different media types, with one offsite. It was good advice. It is no longer enough.
In the era of ransomware, the rule has been extended to 3-2-1-1-0:
| Component | Meaning | Why |
|---|---|---|
| 3 copies | Production + 2 backups | Redundancy against hardware failure |
| 2 media types | Disk + tape, or disk + cloud | Protection against media-specific problems |
| 1 offsite | Geographically separated location | Protection against fire, flooding, theft |
| 1 offline/immutable | Not reachable via the network | Protection against ransomware and attackers with network access |
| 0 errors | Every restore is tested and verified | Certainty that the backup actually works |
That last one — zero errors on restore — is where most organizations fail. They make backups. They never test them. And then, on the day it matters, they discover the backup is corrupt, incomplete, or missing a configuration that is crucial for startup.
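The rule can be checked mechanically against a backup inventory. The following is a minimal sketch, assuming a hypothetical inventory format: the `Copy` record and the `check_32110` function are illustrative names, not part of any backup product. Production counts as one of the three copies, as in the table above, and a copy that was never restore-tested fails the zero-errors component.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Copy:
    """One copy of the data (hypothetical inventory record)."""
    name: str
    media: str                             # e.g. "disk", "tape", "cloud"
    offsite: bool = False                  # geographically separated location
    offline_or_immutable: bool = False     # not reachable or not deletable via the network
    is_production: bool = False
    restore_errors: Optional[int] = None   # errors in the last restore test, None = never tested


def check_32110(copies):
    """Return pass/fail per component of the 3-2-1-1-0 rule."""
    backups = [c for c in copies if not c.is_production]
    return {
        "3 copies": len(copies) >= 3,
        "2 media types": len({c.media for c in copies}) >= 2,
        "1 offsite": any(c.offsite for c in backups),
        "1 offline/immutable": any(c.offline_or_immutable for c in backups),
        # An untested backup (None) does not count as error-free.
        "0 errors": bool(backups) and all(c.restore_errors == 0 for c in backups),
    }


inventory = [
    Copy("production", "disk", is_production=True),
    Copy("local-nas", "disk", restore_errors=0),
    Copy("tape-vault", "tape", offsite=True, offline_or_immutable=True),  # never tested
]
result = check_32110(inventory)
for component, ok in result.items():
    print(f"{component}: {'OK' if ok else 'FAIL'}")
```

In this example inventory every component passes except the last one, because the tape copy has never been through a restore test.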
Backup Strategies
| Strategy | How it works | Backup speed | Restore speed | Storage space | RPO |
|---|---|---|---|---|---|
| Full | Copies everything, every time | Slow | Fast (1 backup needed) | Large | Per run |
| Incremental | Only changed data since previous backup | Fast | Slow (chain needed) | Small | Per run |
| Differential | Changed data since last full | Medium | Medium (full + 1 diff) | Medium | Per run |
| Continuous (CDP) | Every write operation is logged | Real-time | Variable | Large | Seconds |
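The restore-speed column follows from how many backups have to be read back. A minimal sketch, assuming a backup history is given as a simple list of backup types ordered oldest to newest (the `restore_chain` function is a hypothetical helper, not a feature of any backup tool):

```python
def restore_chain(history):
    """Return the indices of the backups needed to restore the newest state.

    history is ordered oldest to newest; each entry is "full",
    "incremental" or "differential".
    """
    needed = []
    i = len(history) - 1
    while i >= 0:
        kind = history[i]
        if kind not in ("full", "incremental", "differential"):
            raise ValueError(f"unknown backup type: {kind!r}")
        needed.append(i)
        if kind == "full":
            needed.reverse()
            return needed
        if kind == "differential":
            # A differential only depends on the most recent full before it.
            i -= 1
            while i >= 0 and history[i] != "full":
                i -= 1
        else:
            # An incremental depends on the backup immediately before it.
            i -= 1
    raise ValueError("no full backup found; the chain cannot be restored")


incremental_week = ["full", "incremental", "incremental", "incremental"]
differential_week = ["full", "differential", "differential", "differential"]
print(restore_chain(incremental_week))    # every link in the chain
print(restore_chain(differential_week))   # the full plus the latest differential
```

The incremental week needs all four backups, and a single damaged link breaks the restore; the differential week needs only two.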
RPO and RTO
Two numbers that define your disaster recovery plan:
- RPO (Recovery Point Objective): how much data loss is acceptable? An RPO of 4 hours means you lose a maximum of 4 hours of data. This determines how often you back up.
- RTO (Recovery Time Objective): how quickly must you be operational again? An RTO of 8 hours means recovery must be completed within 8 hours. This determines how you back up.
The mistake everyone makes: RPO and RTO are determined by management based on what they want, not on what is technically feasible. "We want an RPO of 15 minutes and an RTO of 1 hour" sounds wonderful in a meeting. The reality is that restoring a domain controller from a system state backup alone can take six hours.
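Whether a requested RPO and RTO are feasible is largely arithmetic. The sketch below is a rough estimate under stated assumptions: the function names are hypothetical, the numbers are invented for illustration, and the restore estimate only models sustained transfer speed plus a fixed overhead for steps such as boot and validation.

```python
def worst_case_data_loss_h(backup_interval_h, backup_duration_h=0.0):
    """Oldest possible restore point: a failure just before the next backup finishes."""
    return backup_interval_h + backup_duration_h


def estimated_restore_h(data_gb, throughput_mb_s, fixed_overhead_h=0.0):
    """Transfer time at a sustained throughput plus fixed steps (boot, validation)."""
    transfer_s = data_gb * 1000 / throughput_mb_s   # 1 GB taken as 1000 MB
    return transfer_s / 3600 + fixed_overhead_h


def check_objectives(rpo_h, rto_h, *, backup_interval_h, data_gb,
                     throughput_mb_s, fixed_overhead_h=0.0):
    """Compare the requested objectives with what the setup can deliver."""
    loss = worst_case_data_loss_h(backup_interval_h)
    restore = estimated_restore_h(data_gb, throughput_mb_s, fixed_overhead_h)
    return {
        "worst_case_loss_h": loss,
        "rpo_met": loss <= rpo_h,
        "estimated_restore_h": restore,
        "rto_met": restore <= rto_h,
    }


# Hypothetical numbers: the wish from the meeting (RPO 15 minutes, RTO 1 hour)
# versus a nightly backup of 1800 GB restored at a sustained 100 MB/s
# with one hour of fixed overhead.
report = check_objectives(rpo_h=0.25, rto_h=1.0, backup_interval_h=24,
                          data_gb=1800, throughput_mb_s=100, fixed_overhead_h=1.0)
print(report)
```

With these assumed figures the worst-case loss is 24 hours and the estimated restore takes 6 hours, so neither objective is met. The estimate is only a starting point; the measured time from a restore test is the number that counts.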
Immutable Backups
A backup that the attacker can delete is not a backup. It is a deferred disappointment.
S3 Object Lock (AWS)
{
"ObjectLockEnabled": "Enabled",
"Rule": {
"DefaultRetention": {
"Mode": "COMPLIANCE",
"Days": 90
}
}
}
# Enable Object Lock on the bucket (must be set at creation)
aws s3api create-bucket \
--bucket backup-immutable-2026 \
--object-lock-enabled-for-bucket
# Apply Object Lock configuration
aws s3api put-object-lock-configuration \
--bucket backup-immutable-2026 \
--object-lock-configuration '{
"ObjectLockEnabled": "Enabled",
"Rule": {
"DefaultRetention": {
"Mode": "COMPLIANCE",
"Days": 90
}
}
}'
COMPLIANCE mode means that nobody, not even the root account, can delete or overwrite a locked object version until the retention period expires. GOVERNANCE mode allows deletion by principals that hold the s3:BypassGovernanceRetention permission. Use COMPLIANCE.
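The difference between the two modes can be expressed as a small decision function. This is a simplified model for illustration only, not the AWS API: `retain_until` and `may_delete_version` are hypothetical names, and the `bypass_governance` flag stands in for a caller that both holds the bypass permission and explicitly requests the bypass.

```python
from datetime import datetime, timedelta, timezone


def retain_until(created, days):
    """Retention end for an object version written at `created`."""
    return created + timedelta(days=days)


def may_delete_version(mode, retain_until_ts, now, bypass_governance=False):
    """Simplified model: may this locked object version be deleted right now?"""
    if mode not in ("COMPLIANCE", "GOVERNANCE"):
        raise ValueError(f"unknown retention mode: {mode!r}")
    if now >= retain_until_ts:
        return True                   # retention has expired
    if mode == "GOVERNANCE":
        return bypass_governance      # only with the special bypass permission
    return False                      # COMPLIANCE: nobody, including root


created = datetime(2026, 1, 1, tzinfo=timezone.utc)
until = retain_until(created, days=90)
day_30 = created + timedelta(days=30)

print(until.date())
print(may_delete_version("COMPLIANCE", until, day_30, bypass_governance=True))
print(may_delete_version("GOVERNANCE", until, day_30, bypass_governance=True))
```

In the model, an attacker who obtains the bypass permission defeats GOVERNANCE mode 30 days into the retention period, while COMPLIANCE mode holds until the period ends.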
restic with append-only repository
# On the backup server: pin the client's SSH key to an append-only endpoint
# /home/backup/.ssh/authorized_keys:
command="rclone serve restic --stdio --append-only /restic-repo",restrict ssh-ed25519 AAAA...
# On the client: reach that endpoint over SSH. The forced command above
# decides what runs on the server, so the text after "rclone:" is only a label.
export RESTIC_REPOSITORY=rclone:restic-repo
# Initialize the repository
restic -o rclone.program="ssh backup@vault" init
# Create a backup
restic -o rclone.program="ssh backup@vault" backup /etc /var/lib
# The client can create backups but not delete them:
# restic forget and restic prune fail against the append-only endpoint
Linux hardened backup repository
# Dedicated backup user without shell
useradd -r -s /usr/sbin/nologin -d /backup backupuser
# Immutable filesystem with btrfs snapshots
btrfs subvolume snapshot -r /backup/current /backup/snapshots/$(date +%Y%m%d)
# The -r flag makes the snapshot read-only
# Or with ZFS
zfs snapshot backup/data@$(date +%Y%m%d)
zfs hold keep backup/data@$(date +%Y%m%d)
# hold blocks "zfs destroy" on the snapshot until the hold is released
Active Directory Backup
Maersk nearly lost everything because they had no offline AD backup. Active Directory is the heart of a Windows environment — without AD there is no authentication, no authorization, no GPOs, no DNS (if you run AD-integrated DNS, which almost everyone does).
System state backup
# Install Windows Server Backup
Install-WindowsFeature Windows-Server-Backup
# System state backup to dedicated disk
wbadmin start systemstatebackup -backuptarget:E: -quiet
# Schedule: daily, to a disk that is NOT domain-joined
# Use a local account, not a domain account
What is included in a system state backup?
- Active Directory database (NTDS.dit)
- SYSVOL (Group Policy Objects, scripts)
- Registry
- Boot files
- Certificate Services database (if the DC is also a CA — which it shouldn't be, but often is)
DSRM (Directory Services Restore Mode)
# Set the DSRM password (do this NOW, not during an incident)
ntdsutil
set dsrm password
reset password on server null
<password>
q
q
# Document and store the DSRM password in a physical safe
# Not in a password manager that depends on AD
The DSRM password is your emergency exit. If AD is corrupt, you boot the DC into DSRM and restore from the system state backup. Without the DSRM password, you cannot access the recovery procedure. Store it offline. On paper. In a safe.
Ransomware-Resilient Backup Architecture
Modern ransomware groups are not stupid. They know that backups undermine their business model. That is why one of the first actions after obtaining domain admin is: destroy the backups.
What you need to cover
| Action | Command/method | Impact |
|---|---|---|
| Delete shadow copies | vssadmin delete shadows /all /quiet | Windows restore points gone |
| Steal Veeam credentials | Credential dump from Veeam database (see cred_dpapi) | Access to backup infrastructure |
| Stop backup agent | Stop-Service VeeamBackupSvc | No new backups |
| Encrypt backup repository | Ransomware on the backup server | All backups unusable |
| Wipe NAS | SMB access to backup NAS with stolen credentials | Offsite copy destroyed |
Architecture that survives this
PRODUCTION NETWORK                        BACKUP NETWORK (separate VLAN)
+----------------+                        +----------------------+
| DC01, DC02     |                        | BACKUP-SRV           |
| File servers   +---[FW: only     ]----->| - Own credentials    |
| Applications   |    backup ports        | - Not domain-joined  |
+----------------+                        | - Append-only repo   |
                                          +----------+-----------+
                                                     |
                                          +----------+-----------+
                                          | OFFLINE / AIR-GAPPED |
                                          | - Tape / USB         |
                                          | - Weekly rotation    |
                                          | - Physical safe      |
                                          +----------------------+
Design principles:
- Backup server not domain-joined — an attacker with domain admin does not have automatic access
- Separate credentials — the backup service runs with a local account, not a domain account
- Network segmentation — only backup ports are open, nothing else (see chapter 15)
- Append-only — the backup agent can write but not delete
- Air-gapped copy — at minimum weekly a copy to offline media
Recovery Testing
A backup you have not tested is a hypothesis. And hypotheses are uncomfortable when the building is on fire.
| Test type | Frequency | What to validate | Who is involved |
|---|---|---|---|
| Backup verification | Daily (automated) | Hash check, no errors in the backup job | Backup administrator |
| Single file restore | Monthly | Restore a random file, verify contents | Backup administrator |
| Full server restore | Quarterly | Fully restore a server on isolated hardware | System administration |
| AD restore | Semi-annually | System state restore on isolated DC, DSRM login, AD validation | AD team + management |
| Full DR drill | Annually | Restore the complete environment as if everything is gone | Entire IT department |
| Tabletop exercise | Semi-annually | Walk through a scenario without actual recovery | IT + management + communications |
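The hash check and the single file restore in the table can be automated along these lines. A minimal sketch, assuming a manifest of SHA-256 hashes was recorded at backup time; the manifest format and the `verify_restore` function are hypothetical, and the demo uses throwaway files in place of a real restore target.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large restores do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(manifest, restore_root):
    """Compare restored files against a manifest of {relative_path: sha256}.

    Returns a list of problems; an empty list means the sample restored cleanly.
    """
    problems = []
    for rel_path, expected in sorted(manifest.items()):
        restored = Path(restore_root) / rel_path
        if not restored.is_file():
            problems.append(f"{rel_path}: missing")
        elif sha256_of(restored) != expected:
            problems.append(f"{rel_path}: hash mismatch")
    return problems


# Demo with throwaway files standing in for a real restore target.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.conf").write_bytes(b"listen 443\n")
    (root / "b.db").write_bytes(b"\x00" * 4096)
    manifest = {name: sha256_of(root / name) for name in ("a.conf", "b.db")}
    manifest["c.key"] = "0" * 64                  # in the manifest, never restored
    (root / "b.db").write_bytes(b"\x01" * 4096)   # simulate silent corruption
    problems = verify_restore(manifest, root)

print(problems)
```

A successful backup job would report none of these problems; they only surface when the restored data is compared with what was originally backed up.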
What a full DR drill looks like
Day 0 (Friday evening):
"Scenario: ransomware has encrypted everything. All domain controllers,
file servers, and application servers are unreachable. Email is down.
Phone lists are on the file server (which is also encrypted).
Begin recovery."
Day 1:
- Who calls whom? (Without email, without phone lists on the server)
- Where are the backup media physically?
- Who has the DSRM password?
- In what order are systems restored?
- DC first, then DNS, then file server, then applications
Day 2-3:
- AD restored and operational?
- DNS works?
- Authentication works?
- Critical applications running?
Evaluation:
- What went well?
- What went wrong?
- What was missing?
- How long did it actually take? (compare with the promised RTO)
The most common discovery during a first DR drill: nobody knows where the tapes are, the DSRM password is unknown, and the recovery order was never documented. Better to discover that now than during an actual incident.
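The recovery order is easier to keep current when it is derived from documented dependencies instead of memory. A minimal sketch, assuming a hypothetical dependency map that mirrors the order in the drill above (DC, then DNS, then file server, then applications); the system names are placeholders.

```python
from graphlib import CycleError, TopologicalSorter

# system -> systems that must be running before it can be restored
depends_on = {
    "domain-controller": set(),
    "dns": {"domain-controller"},
    "file-server": {"dns"},
    "applications": {"file-server"},
}


def restore_order(dependencies):
    """Return a restore order in which every system comes after its prerequisites."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # A cycle means the documented dependencies cannot all be satisfied.
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc


order = restore_order(depends_on)
for step, system in enumerate(order, start=1):
    print(f"{step}. {system}")
```

Printing this list and storing it with the offline media means the order is available even when the file server that held the documentation is encrypted.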
Cloud Backup
Cloud backup is not a replacement for local backups — it is the offsite component of the 3-2-1-1-0 rule.
| Platform | Service | Immutability | Cross-region | Cost model |
|---|---|---|---|---|
| AWS | AWS Backup + S3 Object Lock | Yes (COMPLIANCE mode) | Yes (cross-region replication) | Per GB stored + transfer |
| Azure | Azure Backup + immutable vault | Yes (locked immutability policy) | Yes (GRS / RA-GRS) | Per protected instance + storage |
| GCP | Cloud Storage + retention policy + bucket lock | Yes (locked retention) | Yes (dual/multi-region) | Per GB stored + operations |
Azure Backup immutable vault
# Create a Recovery Services Vault with immutability
$vault = New-AzRecoveryServicesVault `
-Name "backup-vault-immutable" `
-ResourceGroupName "rg-backup" `
-Location "westeurope"
# Enable immutability (LOCKED = irreversible)
Update-AzRecoveryServicesVault `
-ResourceGroupName "rg-backup" `
-Name "backup-vault-immutable" `
-ImmutabilityState Locked
# Configure soft delete (14 days extra protection)
Set-AzRecoveryServicesVaultProperty `
-VaultId $vault.ID `
-SoftDeleteFeatureState Enable
Warning about cloud-only backup
Cloud backup protects against local disasters. It does not automatically protect against an attacker who has stolen your cloud credentials. If the attacker compromises your Azure AD Global Administrator account, they can also delete your Azure Backup, unless you have enabled immutability. The incident chain is not hypothetical: domain admin on-premises leads to an Azure AD Connect credential dump (see ad_entra_connect), which leads to Global Administrator in Azure, which leads to access to the backup vault.
Defense: immutability locks, separate break-glass accounts for backup management with hardware MFA tokens, and conditional access policies that restrict backup management to specific workstations.
Summary
Backup and disaster recovery are the difference between "we were hit by ransomware and are now recovering" and "we were hit by ransomware and are now paying." The 3-2-1-1-0 rule is the minimum: three copies, two media types, one offsite, one offline or immutable, and zero errors on restore tests. Immutable backups (via S3 Object Lock, append-only repositories, or locked vault policies) are the only guarantee that an attacker with full network access cannot destroy your backups. Active Directory deserves special attention: a system state backup to offline media, with a known and documented DSRM password, is the difference between ten days of recovery and a total loss. The backup server should not be domain-joined, should run on its own credentials, and should reside in a separate network segment (see chapter 15). And above all: test your restores. Regularly. Completely. Including AD recovery. A backup you have never tested is a promise you have never kept. Maersk survived NotPetya thanks to a power outage in Ghana. That is not a strategy. That is luck. And luck is finite.
Further reading in the knowledge base
These articles in the portal provide more background and practical context:
- Firewalls — the bouncer that doesn't stop everything
- Network segmentation — why you shouldn't connect everything to everything
- DNS — the phone book that holds the internet together
- Logging and monitoring — the security cameras of your IT environment
- Zero Trust — trust no one, not even yourself