GitOps & CI/CD L3 · ADVANCED ~60 min

GitOps Break-Glass Emergency Change Procedure

Design and implement a GitOps break-glass procedure for critical hotfixes that bypasses normal PR review without losing auditability. Demonstrate a full trace from hotfix commit to running pod and implement the post-incident cleanup process.

Objective

Pure GitOps requires that all changes go through Git, but rigidly enforcing PR review during a P0 incident can make the outage worse. The break-glass procedure defines a controlled, auditable exception: a special branch that bypasses review requirements, with compensating controls to ensure accountability. This exercise implements the mechanism and produces a full audit trail from incident trigger to pod update to post-incident normalisation.

The break-glass procedure should be rare (target: zero uses per quarter). Every use must be followed by a post-incident review and a follow-up PR that puts the emergency change through the normal review process.

Prerequisites

Steps

01

Design the break-glass policy

Document the policy before implementing the mechanism. The policy must define: who can invoke it, under what conditions, and what audit trail is required.

## Break-Glass Policy (document this in your wiki)

## Trigger conditions
- Active P0/P1 incident (SEV1/SEV2 in PagerDuty)
- Normal PR review cannot complete within required response time
- Hotfix is understood, scoped, and reversible

## Authorisation
- Must be approved verbally by on-call lead AND engineering manager
- Slack notification to #incidents channel is mandatory
- Incident ticket number must be in the commit message

## Mechanism
- Create branch: hotfix/INC-{ticket-number}-{description}
- Merge directly to main using GitHub "Bypass branch protection" (admin only)
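- Flux detects the commit and applies it at the next sync (60-second interval in this exercise)
- Label the PR break-glass and record the incident (INC-{number}) in the commit message; Git commits themselves cannot carry labels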

## Post-incident requirements (within 24 hours)
- Create follow-up PR with the same change through normal review
- Update runbook if break-glass revealed a gap
- File post-incident review with root cause and timeline
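
The naming and metadata rules in the policy can also be checked mechanically, for example from a CI job or a local hook. The following is a minimal sketch, not part of the original exercise: the function names `valid_hotfix_branch` and `check_commit_message` are hypothetical, and the patterns assume ticket numbers shaped like `INC-2024-0042` and the commit message fields used later in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and patterns are assumptions based on the policy above.

# Succeeds when the branch name matches hotfix/INC-{ticket-number}-{description}.
valid_hotfix_branch() {
  local re='^hotfix/INC-[0-9]+(-[0-9]+)*-[a-z][a-z0-9-]*$'
  [[ "$1" =~ $re ]]
}

# Prints one line per missing mandatory field.
# Succeeds only when the message carries all of them.
check_commit_message() {
  local msg="$1" missing=0
  grep -Eq '^Incident: INC-[0-9]+' <<<"$msg" || { echo "missing: Incident"; missing=1; }
  grep -Eq '^Approved by: .+' <<<"$msg" || { echo "missing: Approved by"; missing=1; }
  grep -q 'BREAK-GLASS' <<<"$msg" || { echo "missing: BREAK-GLASS marker"; missing=1; }
  return "$missing"
}
```

One possible use is `check_commit_message "$(git log -1 --format=%B)"` on the hotfix branch before pushing, so that a commit without the incident number is caught before it reaches main.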
02

Set up GitHub branch protection with bypass rules

Configure branch protection so normal PRs require review, but repository admins can bypass in emergencies. This creates the controlled exception mechanism.

# Configure via GitHub CLI (gh)

# Enable branch protection on main with review requirement.
# The endpoint expects nested JSON objects, so send the body with --input
# (--field would pass each JSON object as a plain string).
gh api "repos/{owner}/{repo}/branches/main/protection" \
  --method PUT \
  --input - << 'EOF'
{
  "required_pull_request_reviews": {"required_approving_review_count": 2},
  "enforce_admins": false,
  "restrictions": null,
  "required_status_checks": {"strict": true, "contexts": ["ci/secure-pipeline"]}
}
EOF

# With enforce_admins=false, repository admins can merge without the
# required reviews when needed (used in step 04).
# If you use repository rulesets instead of classic branch protection,
# add the admins to the bypass list in Settings → Rules → Rulesets.

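# Add CODEOWNERS for change notifications.
# Teams must be written as @org/team-name; replace the placeholder handles below.
# Owners become required reviewers only when "Require review from Code Owners" is enabled.
mkdir -p .github
cat > .github/CODEOWNERS << 'EOF'
# Global code owners - review is requested from them on all PRs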
*  @platform-team-leads

# Cluster configs require additional security review
/clusters/production/  @security-team @platform-team-leads
/apps/overlays/production/  @security-team
EOF

git add .github/CODEOWNERS
git commit -m "policy: add CODEOWNERS for change control"
git push
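
Before relying on the bypass during an incident, it is worth confirming that the protection settings are what the procedure assumes. The sketch below is an illustration rather than part of the original exercise: `protection_ready` is a hypothetical helper that takes the two relevant values as arguments, and the threshold of 2 approvals mirrors the configuration in this step.

```shell
#!/usr/bin/env bash
# Hypothetical helper; expected values mirror the settings applied in this step.
# $1 = enforce_admins.enabled, $2 = required_approving_review_count
protection_ready() {
  local enforce_admins="$1" reviews="$2"
  if [[ "$enforce_admins" != "false" ]]; then
    echo "enforce_admins is '$enforce_admins': admins cannot bypass review"
    return 1
  fi
  if ! [[ "$reviews" =~ ^[0-9]+$ ]] || (( 10#$reviews < 2 )); then
    echo "expected at least 2 required approvals, got '$reviews'"
    return 1
  fi
  echo "ok: $reviews approvals required, admin bypass available"
}

# Usage against the live repository (requires gh and read access):
# protection_ready \
#   "$(gh api 'repos/{owner}/{repo}/branches/main/protection' --jq '.enforce_admins.enabled')" \
#   "$(gh api 'repos/{owner}/{repo}/branches/main/protection' --jq '.required_pull_request_reviews.required_approving_review_count')"
```

Keeping the check as a pure function makes it easy to run on a schedule, so a silent change to the protection settings is noticed before the next incident rather than during it.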
03

Simulate a P0 incident requiring a break-glass change

The scenario: a critical security vulnerability requires an immediate image update. The on-call engineer cannot wait for 2 reviewers to be available. Walk through the complete procedure.

# Step 1: Notify the team (simulate Slack notification)
echo "
[BREAK-GLASS INITIATED]
Incident: INC-2024-0042
Severity: P0
Reason: CVE-2024-XXXXX in nginx:1.25.3 — active exploitation
Approver: Jane Smith (on-call lead)
Change: nginx image update 1.25.3 → 1.25.4-patched
Time: $(date -u)
" | tee break-glass-audit.txt

# Step 2: Create the hotfix branch
git checkout -b hotfix/INC-2024-0042-nginx-cve-patch

# Step 3: Make the emergency change
# (dots escaped so they match literally; GNU sed syntax, on macOS use: sed -i '')
sed -i 's/nginx:1\.25\.3/nginx:1.25.4-patched/' \
  apps/base/nginx/deployment.yaml

# Step 4: Commit with mandatory metadata
git add apps/base/nginx/deployment.yaml
git commit -m "fix(INC-2024-0042): patch nginx CVE-2024-XXXXX

BREAK-GLASS change - bypassed normal review process.
Approved by: jane.smith@company.com (on-call lead)
Incident: INC-2024-0042
CVE: CVE-2024-XXXXX
Change: nginx 1.25.3 → 1.25.4-patched (security patch)

Post-incident PR required within 24 hours.
Co-reviewed by: mark.jones@company.com (eng manager)"

git push origin hotfix/INC-2024-0042-nginx-cve-patch
04

Merge to main using admin bypass

# Open a PR for the hotfix so the bypass is recorded against a PR
# (the labels must already exist in the repository)
gh pr create \
  --title "fix(INC-2024-0042): BREAK-GLASS nginx CVE patch" \
  --body "$(cat break-glass-audit.txt)" \
  --base main \
  --head hotfix/INC-2024-0042-nginx-cve-patch \
  --label "break-glass,security,incident"

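# Admin bypass merge (requires admin role on the repo).
# Run from the hotfix branch so gh resolves the PR for the current branch.
gh pr merge --admin --squash \
  --subject "fix(INC-2024-0042): BREAK-GLASS nginx CVE patch"

# Record the commit SHA for audit trail.
# Fetch first: the local origin/main ref is stale until it is updated.
git fetch origin main
HOTFIX_SHA=$(git log origin/main -1 --format="%H")
echo "Hotfix commit: $HOTFIX_SHA" | tee -a break-glass-audit.txt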
05

Trace Flux reconciliation to running pod

Watch the complete chain: Git commit → Flux detects → applies → pod restarts.

# Force an immediate Flux sync instead of waiting for the 60s interval
flux reconcile source git flux-system
flux reconcile kustomization nginx-app

# Watch the pod update
kubectl rollout status deployment/nginx -n default -w

# Verify the new image is running
kubectl get pods -l app=nginx \
  -o jsonpath='{.items[*].spec.containers[0].image}'
# Expected: nginx:1.25.4-patched

# Build the complete audit trace
echo "=== AUDIT TRAIL ===" > audit-report.txt
echo "Git commit: $HOTFIX_SHA" >> audit-report.txt
echo "Flux reconcile: $(flux get kustomization nginx-app | tail -1)" >> audit-report.txt
echo "Pod image: $(kubectl get pods -l app=nginx -o jsonpath='{.items[0].spec.containers[0].image}')" >> audit-report.txt
echo "Timestamp: $(date -u)" >> audit-report.txt
cat audit-report.txt
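
The image check above prints every pod image but leaves the comparison to the reader. A minimal sketch of an explicit check follows; `all_pods_on_image` is a hypothetical helper, and it assumes the space-separated output produced by the `jsonpath` query with `.items[*]` shown earlier in this step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only when every image in the
# space-separated list ($2) equals the expected image ($1).
all_pods_on_image() {
  local expected="$1" images="$2" image count=0
  for image in $images; do   # unquoted on purpose: split on whitespace
    if [[ "$image" != "$expected" ]]; then
      echo "stale pod image: $image"
      return 1
    fi
    count=$((count + 1))
  done
  (( count > 0 )) || { echo "no pods found"; return 1; }
  echo "all $count pod(s) on $expected"
}

# Usage against the cluster:
# all_pods_on_image "nginx:1.25.4-patched" \
#   "$(kubectl get pods -l app=nginx -o jsonpath='{.items[*].spec.containers[0].image}')"
```

Appending its output to audit-report.txt would give the audit trail a pass/fail line rather than a raw image string, and it catches the case where an old pod is still terminating.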
06

Post-incident cleanup procedure

Within 24 hours, the break-glass change must be normalised through the standard process.

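# 1. Create a proper follow-up PR through normal process.
# Branch from the updated main, not from the already-merged hotfix branch.
git fetch origin main
git checkout -b follow-up/INC-2024-0042-nginx-cve-formal-review --no-track origin/main
# (change is already in main, this PR documents the review)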
git commit --allow-empty \
  -m "docs(INC-2024-0042): formal review of break-glass change

This commit documents the post-incident review of the emergency
change applied during INC-2024-0042. The change has already been
deployed. This PR provides the retroactive review trail.

Reviewed by: jane.smith@company.com, mark.jones@company.com
Incident review: [link to PIR document]"
git push origin follow-up/INC-2024-0042-nginx-cve-formal-review

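# 2. Create the PR for formal review
# (HOTFIX_SHA was recorded in step 04; referencing it ties the review to the exact commit)
gh pr create \
  --title "follow-up(INC-2024-0042): post-incident formal review" \
  --body "Post-incident formal review of break-glass commit ${HOTFIX_SHA}" \
  --base main \
  --head follow-up/INC-2024-0042-nginx-cve-formal-review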

# 3. Check git log shows the full timeline
git log origin/main --oneline --graph -10

# 4. Verify Flux audit events
flux events --for Kustomization/nginx-app | head -20
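
The 24-hour follow-up window is easy to lose track of once the incident is over. The sketch below shows one way to compute it; `followup_status` is a hypothetical helper that works on Unix epoch seconds, and the 24-hour window is taken from the policy in step 01.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports how much of the 24-hour follow-up window remains.
# $1 = break-glass merge time, $2 = current time (both Unix epoch seconds)
followup_status() {
  local merged_at="$1" now="$2"
  local deadline=$(( merged_at + 24 * 3600 ))
  local remaining=$(( deadline - now ))
  if (( remaining < 0 )); then
    echo "OVERDUE by $(( -remaining / 3600 ))h"
    return 1
  fi
  echo "due in $(( remaining / 3600 ))h"
}

# Usage with the hotfix commit recorded in step 04:
# followup_status "$(git log -1 --format=%ct "$HOTFIX_SHA")" "$(date +%s)"
```

Because the function returns a non-zero status once the window has passed, a scheduled job could use it to fail loudly or post a reminder until the follow-up PR is merged.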

Success Criteria

Further Reading