jan-karel.com

Backup & Disaster Recovery

Recovery Without Heroics

In network security, structure beats improvisation: clear paths, fewer privileges, and explicit trust boundaries.

With Backup & Disaster Recovery, recovery is only credible when restore tests demonstrate that time objectives are achievable.

This way you limit not only the likelihood of incidents, but especially the scope and duration when something goes wrong.

Immediate Measures (15 minutes)

Why This Matters

Backup and disaster recovery exist to reduce risk in practice: they bound how much data you can lose and how long you stay down. Technical context informs the choice of measures, but what counts is that the measures are implemented and verified.

The 3-2-1-1-0 Rule

The classic 3-2-1 rule originates from photography — Peter Krogh formulated it in 2005 for managing digital photos. Three copies of your data, on two different media types, with one offsite. It was good advice. It is no longer enough.

In the era of ransomware, the rule has been extended to 3-2-1-1-0:

| Component | Meaning | Why |
| --- | --- | --- |
| 3 copies | Production + 2 backups | Redundancy against hardware failure |
| 2 media types | Disk + tape, or disk + cloud | Protection against media-specific problems |
| 1 offsite | Geographically separated location | Protection against fire, flooding, theft |
| 1 offline/immutable | Not reachable via the network | Protection against ransomware and attackers with network access |
| 0 errors | Every restore is tested and verified | Certainty that the backup actually works |

That last one — zero errors on restore — is where most organizations fail. They make backups. They never test them. And then, on the day it matters, they discover the backup is corrupt, incomplete, or missing a configuration that is crucial for startup.
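
The rule can be checked mechanically against a backup inventory. The sketch below is a minimal illustration, assuming a hypothetical `Copy` record whose fields are invented for this example and are not taken from any backup product. It treats a backup that was never restore-tested as a failure of the "0 errors" component.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Copy:
    """One copy of the data; every field is a hypothetical inventory attribute."""
    name: str
    media: str                     # e.g. "disk", "tape", "cloud"
    offsite: bool                  # stored at a geographically separate location
    offline_or_immutable: bool     # not reachable or not deletable via the network
    restore_errors: Optional[int]  # None means the copy was never restore-tested
    is_production: bool = False


def check_3_2_1_1_0(copies: list) -> dict:
    """Return a pass/fail flag for each component of the 3-2-1-1-0 rule."""
    backups = [c for c in copies if not c.is_production]
    return {
        "3 copies": len(copies) >= 3,
        "2 media types": len({c.media for c in copies}) >= 2,
        "1 offsite": any(c.offsite for c in backups),
        "1 offline/immutable": any(c.offline_or_immutable for c in backups),
        # An untested backup (None) counts as a failure, not as zero errors.
        "0 errors": bool(backups) and all(c.restore_errors == 0 for c in backups),
    }


inventory = [
    Copy("prod", "disk", False, False, None, is_production=True),
    Copy("nas", "disk", False, False, 0),
    Copy("s3-locked", "cloud", True, True, None),  # never restore-tested
]
result = check_3_2_1_1_0(inventory)
```

In this example inventory every component passes except "0 errors", because the cloud copy has never been restored.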

Backup Strategies

| Strategy | How it works | Backup speed | Restore speed | Storage space | RPO |
| --- | --- | --- | --- | --- | --- |
| Full | Copies everything, every time | Slow | Fast (1 backup needed) | Large | Per run |
| Incremental | Only changed data since previous backup | Fast | Slow (chain needed) | Small | Per run |
| Differential | Changed data since last full | Medium | Medium (full + 1 diff) | Medium | Per run |
| Continuous (CDP) | Every write operation is logged | Real-time | Variable | Large | Seconds |
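
The difference in restore speed comes from how many backups a restore has to read. The sketch below computes that chain for full, incremental, and differential backups. It is a simplified model: the `history` format and the backup IDs are hypothetical, and it assumes a schedule does not mix incrementals and differentials between two fulls.

```python
def restore_chain(history, target):
    """Return the backup IDs needed to restore `target`, oldest first.

    `history` is a chronological list of (backup_id, kind) pairs, where kind
    is "full", "incremental", or "differential".
    """
    ids = [backup_id for backup_id, _ in history]
    pos = ids.index(target)  # raises ValueError if target is unknown
    fulls = [i for i in range(pos + 1) if history[i][1] == "full"]
    if not fulls:
        raise ValueError(f"no full backup at or before {target!r}")
    start = fulls[-1]
    kind = history[pos][1]
    if kind == "full":
        return [target]
    if kind == "differential":
        # A differential holds all changes since the last full.
        return [ids[start], target]
    if kind == "incremental":
        # An incremental needs the full plus every backup made since.
        return ids[start:pos + 1]
    raise ValueError(f"unknown backup kind: {kind!r}")


incremental_week = [("sun", "full"), ("mon", "incremental"),
                    ("tue", "incremental"), ("wed", "incremental")]
differential_week = [("sun", "full"), ("mon", "differential"),
                     ("tue", "differential"), ("wed", "differential")]

chain_incremental = restore_chain(incremental_week, "wed")
chain_differential = restore_chain(differential_week, "wed")
```

Restoring Wednesday takes four backups in the incremental schedule and two in the differential one. A single damaged link in the incremental chain makes every later restore point unusable.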

RPO and RTO

Two numbers that define your disaster recovery plan:

  • RPO (Recovery Point Objective): how much data loss is acceptable? An RPO of 4 hours means you lose a maximum of 4 hours of data. This determines how often you back up.
  • RTO (Recovery Time Objective): how quickly must you be operational again? An RTO of 8 hours means recovery must be completed within 8 hours. This determines how you back up.

The mistake everyone makes: RPO and RTO are determined by management based on what they want, not on what is technically feasible. "We want an RPO of 15 minutes and an RTO of 1 hour" sounds wonderful in a meeting. The reality is that restoring a domain controller from a system state backup alone can take six hours.
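
A back-of-the-envelope calculation makes the gap between wish and feasibility visible before the meeting ends. The sketch below is an illustration only: the data volume, the throughput, and the fixed overhead for boot and validation are assumed example figures, not measurements.

```python
def estimated_restore_hours(data_gb, throughput_mb_s, fixed_overhead_h=0.0):
    """Estimate restore time: transfer time plus fixed overhead (boot, checks)."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    transfer_seconds = data_gb * 1000 / throughput_mb_s
    return transfer_seconds / 3600 + fixed_overhead_h


def objectives_met(backup_interval_h, rpo_h, restore_h, rto_h):
    """RPO holds if backups run at least that often; RTO if the restore fits."""
    return {"rpo": backup_interval_h <= rpo_h, "rto": restore_h <= rto_h}


# Assumed figures: 2 TB to restore at 100 MB/s, plus 1 hour of overhead.
restore_h = estimated_restore_hours(data_gb=2000, throughput_mb_s=100,
                                    fixed_overhead_h=1.0)

# The meeting-room wish: RPO 15 minutes, RTO 1 hour, with backups every 4 hours.
verdict = objectives_met(backup_interval_h=4, rpo_h=0.25,
                         restore_h=restore_h, rto_h=1)
```

With these assumptions the restore takes roughly six and a half hours, so neither objective is met. Replace the figures with throughput measured during an actual restore test.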

Immutable Backups

A backup that the attacker can delete is not a backup. It is a deferred disappointment.

S3 Object Lock (AWS)

Default retention applies to the whole bucket. An Object Lock configuration has no rule IDs or prefix filters, so dedicate a bucket to backups:

{
    "ObjectLockEnabled": "Enabled",
    "Rule": {
        "DefaultRetention": {
            "Mode": "COMPLIANCE",
            "Days": 90
        }
    }
}

# Create the bucket with Object Lock enabled (this also turns on versioning)
# Outside us-east-1, add: --create-bucket-configuration LocationConstraint=<region>
aws s3api create-bucket \
    --bucket backup-immutable-2026 \
    --object-lock-enabled-for-bucket

# Apply Object Lock configuration
aws s3api put-object-lock-configuration \
    --bucket backup-immutable-2026 \
    --object-lock-configuration '{
        "ObjectLockEnabled": "Enabled",
        "Rule": {
            "DefaultRetention": {
                "Mode": "COMPLIANCE",
                "Days": 90
            }
        }
    }'

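COMPLIANCE mode means that nobody, not even the root account, can delete or overwrite a protected object version during the retention period, and the retention period cannot be shortened. GOVERNANCE mode allows deletion by principals that hold the s3:BypassGovernanceRetention permission. Use COMPLIANCE.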
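
The difference between the two modes can be written down as a small decision function. This is a simplified model of the retention semantics for illustration, not a call into the AWS API: the function names are invented here, and legal holds are ignored.

```python
from datetime import datetime, timedelta, timezone


def retain_until(created, days):
    """Retention end date for an object version written at `created`."""
    return created + timedelta(days=days)


def version_delete_allowed(mode, retain_until_date, now,
                           has_bypass_permission=False):
    """Model whether a locked object version may be deleted at time `now`."""
    if now >= retain_until_date:
        return True   # retention has expired, normal permissions apply
    if mode == "COMPLIANCE":
        return False  # no principal can delete, root included
    if mode == "GOVERNANCE":
        return has_bypass_permission
    raise ValueError(f"unknown mode: {mode!r}")


created = datetime(2026, 1, 1, tzinfo=timezone.utc)
expires = retain_until(created, days=90)
during = created + timedelta(days=30)

compliance_result = version_delete_allowed("COMPLIANCE", expires, during,
                                           has_bypass_permission=True)
governance_result = version_delete_allowed("GOVERNANCE", expires, during,
                                           has_bypass_permission=True)
```

An attacker who steals an administrator's credentials inherits the bypass permission, which is why GOVERNANCE mode offers little protection in the ransomware scenario.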

restic with append-only repository

# On the backup server: pin the client's SSH key to an append-only endpoint.
# Plain SFTP cannot enforce append-only; this assumes rclone is installed there.
# /home/backup/.ssh/authorized_keys:
command="rclone serve restic --stdio --append-only /restic-repo",restrict ssh-ed25519 AAAA...

# On the client: initialize the repository through the forced command
restic -o rclone.program="ssh backup@vault" --repo rclone:/restic-repo init

# Create a backup
restic -o rclone.program="ssh backup@vault" --repo rclone:/restic-repo backup /etc /var/lib

# The client can create backups but not delete them:
# restic forget and restic prune fail against the append-only endpoint

Linux hardened backup repository

# Dedicated backup user without shell
useradd -r -s /usr/sbin/nologin -d /backup backupuser

# Read-only btrfs snapshot (-r); root on this host can still delete it
SNAP=$(date +%Y%m%d)
btrfs subvolume snapshot -r /backup/current "/backup/snapshots/$SNAP"

# Or with ZFS
zfs snapshot "backup/data@$SNAP"
zfs hold keep "backup/data@$SNAP"
# hold blocks zfs destroy on the snapshot until the hold is released

Active Directory Backup

Maersk nearly lost everything because they had no offline AD backup. Active Directory is the heart of a Windows environment — without AD there is no authentication, no authorization, no GPOs, no DNS (if you run AD-integrated DNS, which almost everyone does).

System state backup

# Install Windows Server Backup
Install-WindowsFeature Windows-Server-Backup

# System state backup to dedicated disk
wbadmin start systemstatebackup -backuptarget:E: -quiet

# Schedule: daily, to storage on a system that is NOT domain-joined
# Use a local account, not a domain account

What is included in a system state backup?

  • Active Directory database (NTDS.dit)
  • SYSVOL (Group Policy Objects, scripts)
  • Registry
  • Boot files
  • Certificate Services database (if the DC is also a CA — which it shouldn't be, but often is)

DSRM (Directory Services Restore Mode)

# Set the DSRM password (do this NOW, not during an incident)
ntdsutil
  set dsrm password
  reset password on server null
  <password>
  q
  q

# Document and store the DSRM password in a physical safe
# Not in a password manager that depends on AD

The DSRM password is your emergency exit. If AD is corrupt, you boot the DC into DSRM and restore from the system state backup. Without the DSRM password, you cannot access the recovery procedure. Store it offline. On paper. In a safe.

Ransomware-Resilient Backup Architecture

Modern ransomware groups are not stupid. They know that backups undermine their business model. That is why one of the first actions after obtaining domain admin is: destroy the backups.

What you need to cover

| Action | Command/method | Impact |
| --- | --- | --- |
| Delete shadow copies | vssadmin delete shadows /all /quiet | Windows restore points gone |
| Steal Veeam credentials | Credential dump from Veeam database (see cred_dpapi) | Access to backup infrastructure |
| Stop backup agent | Stop-Service VeeamBackupSvc | No new backups |
| Encrypt backup repository | Ransomware on the backup server | All backups unusable |
| Wipe NAS | SMB access to backup NAS with stolen credentials | Offsite copy destroyed |

Architecture that survives this

PRODUCTION NETWORK                   BACKUP NETWORK (separate VLAN)
+----------------+                   +----------------------+
| DC01, DC02     |                   | BACKUP-SRV           |
| File servers   +---[FW: only  ]--->| - Own credentials    |
| Applications   |   backup ports    | - Not domain-joined  |
+----------------+                   | - Append-only repo   |
                                     +----------+-----------+
                                                |
                                     +----------+-----------+
                                     | OFFLINE / AIR-GAPPED |
                                     | - Tape / USB         |
                                     | - Weekly rotation    |
                                     | - Physical safe      |
                                     +----------------------+

Design principles:

  1. Backup server not domain-joined: an attacker with domain admin does not get automatic access
  2. Separate credentials: the backup service runs with a local account, not a domain account
  3. Network segmentation: only backup ports are open, nothing else (see chapter 15)
  4. Append-only: the backup agent can write but not delete
  5. Air-gapped copy: a copy goes to offline media at least weekly
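
The five principles lend themselves to an automated audit. The sketch below assumes a hypothetical configuration dictionary; the key names and the port number 8000 are made up for illustration and would have to be mapped onto whatever your inventory or CMDB actually records.

```python
def audit_backup_server(cfg):
    """Return a list of findings; an empty list means all five principles hold."""
    findings = []
    # Missing keys are treated as the unsafe value.
    if cfg.get("domain_joined", True):
        findings.append("backup server is domain-joined")
    if cfg.get("service_account") != "local":
        findings.append("backup service does not run under a local account")
    unexpected = sorted(set(cfg.get("open_ports", []))
                        - set(cfg.get("backup_ports", [])))
    if unexpected:
        findings.append(f"non-backup ports open: {unexpected}")
    if not cfg.get("append_only", False):
        findings.append("repository is not append-only")
    if cfg.get("days_since_offline_copy", float("inf")) > 7:
        findings.append("offline copy is older than one week")
    return findings


hardened = {
    "domain_joined": False,
    "service_account": "local",
    "open_ports": [8000],      # hypothetical backup port
    "backup_ports": [8000],
    "append_only": True,
    "days_since_offline_copy": 3,
}
weak = {
    "domain_joined": True,
    "service_account": "domain",
    "open_ports": [22, 445, 8000],
    "backup_ports": [8000],
    "append_only": False,
    "days_since_offline_copy": 30,
}

hardened_findings = audit_backup_server(hardened)
weak_findings = audit_backup_server(weak)
```

Treating missing keys as unsafe is deliberate: an inventory that does not record whether the repository is append-only should fail the audit rather than pass silently.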

Recovery Testing

A backup you have not tested is a hypothesis. And hypotheses are uncomfortable when the building is on fire.

| Test type | Frequency | What to validate | Who is involved |
| --- | --- | --- | --- |
| Backup verification | Daily (automated) | Hash check, no errors in the backup job | Backup administrator |
| Single file restore | Monthly | Restore a random file, verify contents | Backup administrator |
| Full server restore | Quarterly | Fully restore a server on isolated hardware | System administration |
| AD restore | Semi-annually | System state restore on isolated DC, DSRM login, AD validation | AD team + management |
| Full DR drill | Annually | Restore the complete environment as if everything is gone | Entire IT department |
| Tabletop exercise | Semi-annually | Walk through a scenario without actual recovery | IT + management + communications |
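
The hash check and the file restore test in the table above can share one routine: record a SHA-256 digest per file at backup time, then compare after the restore. The sketch below is one possible implementation, assuming a hypothetical manifest that maps relative paths to hex digests. It is not the verification mechanism of any particular backup tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large backups do not have to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(manifest, restore_root):
    """Return the relative paths that are missing or whose hash differs."""
    failures = []
    for rel_path, expected in sorted(manifest.items()):
        restored = Path(restore_root) / rel_path
        if not restored.is_file() or sha256_of(restored) != expected:
            failures.append(rel_path)
    return failures
```

Call `verify_restore(manifest, "/mnt/restore-test")` after a test restore, where the path is an example. An empty list means every file in the manifest came back intact. Keep the manifest away from the backup server, or an attacker can rewrite both.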

What a full DR drill looks like

Day 0 (Friday evening):
  "Scenario: ransomware has encrypted everything. All domain controllers,
   file servers, and application servers are unreachable. Email is down.
   Phone lists are on the file server (which is also encrypted).
   Begin recovery."

Day 1:
  - Who calls whom? (Without email, without phone lists on the server)
  - Where are the backup media physically?
  - Who has the DSRM password?
  - In what order are systems restored?
  - DC first, then DNS, then file server, then applications

Day 2-3:
  - AD restored and operational?
  - DNS works?
  - Authentication works?
  - Critical applications running?

Evaluation:
  - What went well?
  - What went wrong?
  - What was missing?
  - How long did it actually take? (compare with the promised RTO)

The most common discovery during a first DR drill: nobody knows where the tapes are, the DSRM password is unknown, and the recovery order was never documented. Better to discover that now than during an actual incident.
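
The recovery order is easier to keep current when it is derived from documented dependencies instead of memory. The sketch below uses a topological sort for that and also computes how far a drill overran the promised RTO. The dependency map is an assumption that mirrors the order in the drill script above, and the function names are invented for this example.

```python
from graphlib import TopologicalSorter


def restore_order(depends_on):
    """Order systems so that each one is restored after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def rto_overrun_hours(actual_h, promised_rto_h):
    """Hours by which the drill exceeded the promised RTO (0 if it fit)."""
    return max(0.0, actual_h - promised_rto_h)


# Each system maps to the set of systems it needs before it can start.
depends_on = {
    "DC": set(),
    "DNS": {"DC"},
    "file server": {"DNS"},
    "applications": {"file server"},
}
order = restore_order(depends_on)
```

`TopologicalSorter` raises `CycleError` when two systems depend on each other, which is itself a useful finding: a circular dependency means no documented order can work.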

Cloud Backup

Cloud backup is not a replacement for local backups — it is the offsite component of the 3-2-1-1-0 rule.

| Platform | Service | Immutability | Cross-region | Cost model |
| --- | --- | --- | --- | --- |
| AWS | AWS Backup + S3 Object Lock | Yes (COMPLIANCE mode) | Yes (cross-region replication) | Per GB stored + transfer |
| Azure | Azure Backup + immutable vault | Yes (locked immutability policy) | Yes (GRS / RA-GRS) | Per protected instance + storage |
| GCP | Cloud Storage + retention policy + bucket lock | Yes (locked retention) | Yes (dual/multi-region) | Per GB stored + operations |

Azure Backup immutable vault

# Create a Recovery Services Vault with immutability
$vault = New-AzRecoveryServicesVault `
    -Name "backup-vault-immutable" `
    -ResourceGroupName "rg-backup" `
    -Location "westeurope"

# Enable immutability (Locked is irreversible; test with Unlocked first)
Update-AzRecoveryServicesVault `
    -ResourceGroupName "rg-backup" `
    -Name "backup-vault-immutable" `
    -ImmutabilityState Locked

# Configure soft delete (14 days extra protection)
Set-AzRecoveryServicesVaultProperty `
    -VaultId $vault.ID `
    -SoftDeleteFeatureState Enable

Warning about cloud-only backup

Cloud backup protects against local disasters. It does not automatically protect against an attacker who has stolen your cloud credentials. If the attacker compromises your Global Administrator account in Microsoft Entra ID (formerly Azure AD), they can also delete your Azure Backup data, unless immutability is enabled and locked. The incident chain is not hypothetical: domain admin on-premises leads to a credential dump from Entra Connect (formerly Azure AD Connect; see ad_entra_connect), which leads to Global Administrator in Azure, which leads to access to the backup vault.

Defense: immutability locks, separate break-glass accounts for backup management with hardware MFA tokens, and conditional access policies that restrict backup management to specific workstations.

Summary

Backup and disaster recovery are the difference between "we were hit by ransomware and are now recovering" and "we were hit by ransomware and are now paying." The 3-2-1-1-0 rule is the minimum: three copies, two media types, one offsite, one offline or immutable, and zero errors on restore tests. Immutable backups (via S3 Object Lock, append-only repositories, or locked vault policies) and offline copies are what keep an attacker with full network access from destroying your backups. Active Directory deserves special attention: a system state backup to offline media, with a known and documented DSRM password, is the difference between ten days of recovery and a total loss. The backup server should not be domain-joined, should run on its own credentials, and should reside in a separate network segment (see chapter 15). And above all: test your restores. Regularly. Completely. Including AD recovery. A backup you have never tested is a promise you have never kept. Maersk survived NotPetya thanks to a power outage in Ghana. That is not a strategy. That is luck. And luck is finite.
