jan-karel.com
Secret Management and API Key Rotation

API Rules That Don't Leak

This topic works best as a practical framework: clear enough for decision-making and concrete enough for execution.

Secret management and API key rotation only work when every key is scoped explicitly: authorization is enforced per object and per action, so a leaked key exposes as little as possible.

This keeps the chapter from being pure theory, making it a usable compass for consistent execution.

Immediate measures (15 minutes)

Why this matters

The core of secret management and API key rotation is practical risk reduction. Technical context informs which measures you choose, but what counts is that they are implemented and enforced.

Core principles

  1. A secret is an identity
    An API key represents an actor with rights. Whoever holds the key is that actor.

  2. Exposure = incident
    A leaked key means immediate incident classification, even without visible signs of abuse.

  3. Revoke before forensics
    First limit damage (revoke/disable), then investigate what exactly happened.

  4. Short TTL, automatic rotation
    The shorter the validity, the smaller your blast radius.

  5. Never distribute manually
    Secrets come from secret managers or CI injection, not via chat, email, or ticket text.
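
Principle 4 can be checked mechanically. The following is a minimal sketch, assuming a 30-day maximum key age; `MAX_KEY_AGE` and `needs_rotation` are hypothetical names chosen for illustration, not part of any specific secret manager.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy value; pick a TTL that matches your own risk appetite.
MAX_KEY_AGE = timedelta(days=30)


def needs_rotation(created_at: datetime, now: datetime,
                   max_age: timedelta = MAX_KEY_AGE) -> bool:
    """Return True when a key has reached or exceeded its allowed lifetime."""
    return now - created_at >= max_age


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
fresh = datetime(2024, 5, 20, tzinfo=timezone.utc)
stale = datetime(2024, 4, 1, tzinfo=timezone.utc)

print(needs_rotation(fresh, now))  # False: 12 days old
print(needs_rotation(stale, now))  # True: 61 days old
```

A check like this could run on a schedule against the key inventory, so that rotation is triggered by policy rather than by memory.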

What exactly went wrong here?

This pattern occurs frequently:

  1. Team member shares a key "real quick" to make something work.
  2. Key appears in a system with logging/retention.
  3. Anyone with read access to those logs can reuse the key.
  4. Abuse only becomes visible when the budget runs out or rate limits are hit.

The lesson: speed without secret hygiene is deferred downtime.

Technical Rotation Playbook (0-60 minutes)

Phase 0 - Trigger (0-5 min)

  • Mark as SEV incident.
  • Designate incident commander and key owner.
  • Freeze non-essential deploys.

Phase 1 - Containment (5-15 min)

  • Revoke the suspected key.
  • Create new key with minimal scopes.
  • Store the new key in the secret store (Vault / AWS Secrets Manager / Azure Key Vault / GCP Secret Manager).

Phase 2 - Redistribution (15-35 min)

  • Update CI/CD secrets.
  • Update runtime secrets (Kubernetes secret reference, app config, serverless env vars).
  • Restart only components that are key-dependent.

Phase 3 - Verification (35-50 min)

  • Run smoke tests on critical paths.
  • Check auth errors and 401/403 ratio.
  • Validate billing and usage patterns.
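
The 401/403 check in Phase 3 can be reduced to a single ratio. This is a sketch under assumptions: the status codes would come from your own access logs or metrics, and the 2% threshold is a hypothetical value that should be replaced by your pre-rotation baseline.

```python
from collections import Counter
from typing import Iterable


def auth_error_ratio(status_codes: Iterable[int]) -> float:
    """Share of responses that were 401 or 403; 0.0 for an empty sample."""
    counts = Counter(status_codes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts[401] + counts[403]) / total


# Hypothetical threshold; base the real one on your pre-rotation baseline.
MAX_AUTH_ERROR_RATIO = 0.02

sample = [200] * 95 + [401] * 3 + [403] * 2
ratio = auth_error_ratio(sample)

print(f"{ratio:.2%}")                # 5.00%
print(ratio > MAX_AUTH_ERROR_RATIO)  # True: some client still uses the old key
```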

Phase 4 - Forensics + enforcement (50-60 min)

  • Record timeline (discovery, revoke, recovery).
  • Determine root cause: process error, tooling gap, training gap.
  • Plan structural measure with owner and deadline.
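
Recording the timeline as structured data makes the recovery time measurable afterwards. A minimal sketch, assuming three timestamps per incident; the `IncidentTimeline` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class IncidentTimeline:
    discovered: datetime
    revoked: datetime
    recovered: datetime

    @property
    def time_to_revoke(self) -> timedelta:
        """How long the leaked key stayed usable after discovery."""
        return self.revoked - self.discovered

    @property
    def time_to_recover(self) -> timedelta:
        """Discovery until services run on the new key."""
        return self.recovered - self.discovered


timeline = IncidentTimeline(
    discovered=datetime(2024, 6, 1, 9, 0),
    revoked=datetime(2024, 6, 1, 9, 12),
    recovered=datetime(2024, 6, 1, 9, 48),
)

print(timeline.time_to_revoke)   # 0:12:00
print(timeline.time_to_recover)  # 0:48:00
```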

Zero-Downtime Rotation Pattern (dual-key)

Use dual-key rollover where possible:

  1. activate key_new;
  2. let application accept both key_old and key_new (brief overlap);
  3. gradually shift traffic;
  4. monitor errors and latency;
  5. revoke key_old.

This prevents the classic "replace everything at once and production breaks" mistake.
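
The overlap window from steps 2 to 5 can be sketched as a validator that accepts a list of keys. This assumes your service validates keys itself; with a third-party provider, the overlap is configured on the provider side instead. The key values are placeholders, and `is_valid_key` is a hypothetical helper.

```python
import hmac
from typing import Iterable


def is_valid_key(presented: str, accepted_keys: Iterable[str]) -> bool:
    """Check a presented key against every currently accepted key."""
    presented_bytes = presented.encode()
    valid = False
    for key in accepted_keys:
        # compare_digest avoids leaking how many leading characters matched.
        if hmac.compare_digest(presented_bytes, key.encode()):
            valid = True
    return valid


# Placeholder values only; real keys come from the secret manager.
key_old, key_new = "old-example-key", "new-example-key"

accepted = [key_old, key_new]           # overlap window: both keys work
print(is_valid_key(key_old, accepted))  # True

accepted = [key_new]                    # step 5: key_old revoked
print(is_valid_key(key_old, accepted))  # False
print(is_valid_key(key_new, accepted))  # True
```

Keeping the accepted keys in a list makes revoking key_old a configuration change rather than a code change.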

Detection and Prevention

Prevention

  • Secret scanning in CI (gitleaks, trufflehog, platform-native scanning).
  • Server-side redaction in logs (mask tokens).
  • Pre-commit hooks for local blocking.
  • Prohibit plaintext secrets in chat/tickets via policy + awareness.
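
Server-side redaction can be sketched with a filter from Python's standard `logging` module. The token pattern is an assumption for illustration (keys prefixed with `sk-` and Bearer tokens); adapt it to the key formats your providers actually issue.

```python
import io
import logging
import re

# Assumed key shapes for illustration: "sk-" prefixed keys and Bearer tokens.
TOKEN_PATTERN = re.compile(
    r"(sk-[A-Za-z0-9]{8,}|(?<=Bearer )[A-Za-z0-9._\-]{8,})"
)


def redact(text: str) -> str:
    """Replace anything that looks like a token with a fixed marker."""
    return TOKEN_PATTERN.sub("[REDACTED]", text)


class RedactingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Format first, then mask, so secrets passed as arguments are covered.
        record.msg = redact(record.getMessage())
        record.args = None
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactingFilter())  # on the handler, so every record passes it

logger = logging.getLogger("redaction-demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info("calling API with key %s", "sk-abc123def456ghi789")
print(stream.getvalue().strip())  # calling API with key [REDACTED]
```

Pattern-based masking is a safety net, not a guarantee: a key in an unexpected format will still pass through, which is why the policy measures in this list remain necessary.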

Detection

  • Alert on key usage from new geo/IP/ASN.
  • Alert on spike in token consumption or costs.
  • Alert on failed auth bursts after rotation (indicator of lagging clients).
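
Two of the alerts above can be sketched as simple predicates. The factor of 3 and the use of a plain mean as baseline are assumptions for illustration; the ASN values come from the range reserved for documentation.

```python
from statistics import mean


def is_usage_spike(history: list[float], current: float,
                   factor: float = 3.0) -> bool:
    """Flag current usage that exceeds `factor` times the historical mean."""
    if not history:
        return False  # no baseline yet, nothing to compare against
    return current > factor * mean(history)


def is_new_source(known_sources: set[str], source: str) -> bool:
    """Flag a geo/IP/ASN value that has not been seen for this key before."""
    return source not in known_sources


hourly_tokens = [1000, 1200, 900, 1100]  # mean = 1050, threshold = 3150

print(is_usage_spike(hourly_tokens, 1500))    # False
print(is_usage_spike(hourly_tokens, 5000))    # True
print(is_new_source({"AS64500"}, "AS64511"))  # True
```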

Example CI gate:

# Block the commit/build when secrets are detected
gitleaks detect --source . --no-git --redact --exit-code 1

Governance enforcement (KPI/KRI)

Report monthly at a minimum:

| Metric | Target | Signal value |
|---|---|---|
| MTTR for key compromise | < 60 min | > 120 min = escalation |
| % keys with rotation policy | 100% | < 95% = action plan |
| % keys with minimal scope | > 98% | < 95% = risk acceptance required |
| # plaintext secret incidents | 0 | > 0 = RCA mandatory |
| % workloads with secret manager instead of env-file | > 95% | < 90% = backlog priority |
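
The signal values in the table above can be evaluated automatically in the monthly report. A minimal sketch: the thresholds are copied from the table, while the metric keys such as `mttr_minutes` are hypothetical names.

```python
def evaluate(metrics: dict[str, float]) -> list[str]:
    """Return the follow-up actions triggered by this month's figures."""
    findings = []
    if metrics["mttr_minutes"] > 120:
        findings.append("MTTR: escalation")
    if metrics["pct_rotation_policy"] < 95:
        findings.append("rotation policy: action plan")
    if metrics["pct_minimal_scope"] < 95:
        findings.append("minimal scope: risk acceptance required")
    if metrics["plaintext_incidents"] > 0:
        findings.append("plaintext secrets: RCA mandatory")
    if metrics["pct_secret_manager"] < 90:
        findings.append("secret manager coverage: backlog priority")
    return findings


# Made-up figures, purely to show the mechanics.
report = evaluate({
    "mttr_minutes": 45,
    "pct_rotation_policy": 97,
    "pct_minimal_scope": 93,
    "plaintext_incidents": 1,
    "pct_secret_manager": 96,
})
for finding in report:
    print(finding)
# minimal scope: risk acceptance required
# plaintext secrets: RCA mandatory
```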

Security without measurability is an opinion.

Practical checklist

For technical staff

For executives

For consumers/employees

Relation to other chapters

  • Cloud 13 – Secrets Management: storage, distribution, cryptographic guarantees.
  • Cloud 06 – CI/CD Pipeline Hardening: policy-as-code and gates.
  • Reference 03 – Incident Response: escalation and reporting process.
  • Executives 10 – Security Metrics: governance yardstick and prioritization.

Summary

A leaked API key is not a small operational mistake but a full-fledged security incident. By revoking immediately, rotating in a controlled manner, and measuring at governance level, you turn panic work into a reproducible process. Maturity is not in "we never had a leak", but in "we recover quickly, demonstrably, and structurally".
