Secret Management and API Key Rotation
API Rules That Don't Leak
This topic works best as a practical framework: clear enough for decision-making and concrete enough for execution.
With Secret Management and API Key Rotation, security only truly works when authorization is explicitly enforced per object and action.
This keeps the chapter from being pure theory, making it a usable compass for consistent execution.
Immediate measures (15 minutes)
Why this matters
The core of Secret Management and API Key Rotation is risk reduction in practice. Technical context supports the choice of measures, but implementation and enforcement are central.
Core principles
A secret is an identity
An API key represents an actor with rights. Whoever has the key, is that actor.Exposure = incident
A leaked key means immediate incident classification, even without visible signs of abuse.Revoke before forensics
First limit damage (revoke/disable), then investigate what exactly happened.Short TTL, automatic rotation
The shorter the validity, the smaller your blast radius.Never distribute manually
Secrets come from secret managers or CI injection, not via chat, email, or ticket text.
What exactly went wrong here?
This pattern occurs frequently:
- Team member shares a key "real quick" to make something work.
- Key appears in a system with logging/retention.
- Anyone with read access to those logs can reuse the key.
- Abuse only becomes visible when the budget runs out or rate limits are hit.
The lesson: speed without secret hygiene is deferred downtime.
Technical Rotation Playbook (0-60 minutes)
Phase 0 - Trigger (0-5 min)
- Mark as SEV incident.
- Designate incident commander and key owner.
- Freeze non-essential deploys.
Phase 1 - Containment (5-15 min)
- Revoke the suspected key.
- Create new key with minimal scopes.
- Link new key to secret store (Vault / AWS Secrets Manager / Azure Key Vault / GCP Secret Manager).
Phase 2 - Redistribution (15-35 min)
- Update CI/CD secrets.
- Update runtime secrets (Kubernetes secret reference, app config, serverless env vars).
- Restart only components that are key-dependent.
Phase 3 - Verification (35-50 min)
- Run smoke tests on critical paths.
- Check auth errors and 401/403 ratio.
- Validate billing and usage patterns.
Phase 4 - Forensics + enforcement (50-60 min)
- Record timeline (discovery, revoke, recovery).
- Determine root cause: process error, tooling gap, training gap.
- Plan structural measure with owner and deadline.
Zero-Downtime Rotation Pattern (dual-key)
Use dual-key rollover where possible:
- activate
key_new; - let application accept both
key_oldandkey_new(brief overlap); - gradually shift traffic;
- monitor errors and latency;
- revoke
key_old.
This prevents the classic "replace everything at once and production breaks" mistake.
Detection and Prevention
Prevention
- Secret scanning in CI (
gitleaks,trufflehog, platform-native scanning). - Server-side redaction in logs (mask tokens).
- Pre-commit hooks for local blocking.
- Prohibit plaintext secrets in chat/tickets via policy + awareness.
Detection
- Alert on key usage from new geo/IP/ASN.
- Alert on spike in token consumption or costs.
- Alert on failed auth bursts after rotation (indicator of lagging clients).
Example CI gate:
# Blokkeer commit/build bij gedetecteerde secrets
gitleaks detect --source . --no-git --redact --exit-code 1Governance enforcement (KPI/KRI)
Report monthly at a minimum:
| Metric | Target | Signal value |
|---|---|---|
| MTTR for key compromise | < 60 min | > 120 min = escalation |
| % keys with rotation policy | 100% | < 95% = action plan |
| % keys with minimal scope | > 98% | < 95% = risk acceptance required |
| # plaintext secret incidents | 0 | any >0 = RCA mandatory |
| % workloads with secret manager instead of env-file | > 95% | < 90% = backlog priority |
Security without measurability is an opinion.
Practical checklist
For technical staff
For executives
For consumers/employees
Relation to other chapters
- Cloud 13 – Secrets Management: storage, distribution, cryptographic guarantees.
- Cloud 06 – CI/CD Pipeline Hardening: policy-as-code and gates.
- Reference 03 – Incident Response: escalation and reporting process.
- Executives 10 – Security Metrics: governance yardstick and prioritization.
Summary
A leaked API key is not a small operational mistake but a full-fledged security incident. By revoking immediately, rotating in a controlled manner, and measuring at governance level, you turn panic work into a reproducible process. Maturity is not in "we never had a leak", but in "we recover quickly, demonstrably, and structurally".
Further reading in the knowledge base
These articles in the portal give you more background and practical context:
- Incident Response — when things go wrong
- Compliance — following rules without losing your mind
- Least Privilege — give people only what they need
- Patch management — the most boring thing that can save your life
- Backups — the most boring topic that can save your life
You need an account to access the knowledge base. Log in or register.
Related security measures
These articles offer additional context and depth: