Objective
Pure GitOps requires that all changes go through Git, but rigid enforcement during a P0 incident can prolong the outage. The break-glass procedure defines a controlled, auditable exception: a hotfix branch that a repository admin merges without the usual reviews, with compensating controls to ensure accountability. This exercise implements the mechanism and produces a full audit trail, from incident trigger through pod update to post-incident normalisation.
Prerequisites
- Flux v2 bootstrapped with a GitHub repository
- GitHub repository with branch protection on main (requiring PR reviews)
- Slack or PagerDuty webhook for notifications (optional)
- CODEOWNERS file configured for the repository
Steps
Design the break-glass policy
Document the policy before implementing the mechanism. The policy must define: who can invoke it, under what conditions, and what audit trail is required.
    ## Break-Glass Policy (document this in your wiki)

    ## Trigger conditions
    - Active P0/P1 incident (SEV1/SEV2 in PagerDuty)
    - Normal PR review cannot complete within required response time
    - Hotfix is understood, scoped, and reversible

    ## Authorisation
    - Must be approved verbally by on-call lead AND engineering manager
    - Slack notification to #incidents channel is mandatory
    - Incident ticket number must be in the commit message

    ## Mechanism
    - Create branch: hotfix/INC-{ticket-number}-{description}
    - Merge directly to main using GitHub "Bypass branch protection" (admin only)
    - Flux detects the commit and applies within 60 seconds
    - Label the commit: type=break-glass, incident=INC-{number}

    ## Post-incident requirements (within 24 hours)
    - Create follow-up PR with the same change through normal review
    - Update runbook if break-glass revealed a gap
    - File post-incident review with root cause and timeline
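The branch-naming and commit-message rules in the policy can be checked mechanically before anything is pushed. The following is a minimal sketch, assuming ticket IDs look like `INC-2024-0042` (the form used later in this exercise) and lowercase, hyphenated descriptions. The function name `check_break_glass_ref` is hypothetical and not part of Git, GitHub, or Flux.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate a break-glass branch name and commit subject
# against the policy. Assumes ticket IDs of the form INC-YYYY-NNNN.

check_break_glass_ref() {
  local branch="$1" subject="$2"
  local branch_re='^hotfix/(INC-[0-9]{4}-[0-9]+)-[a-z0-9][a-z0-9-]*$'

  # The branch must follow hotfix/INC-{ticket-number}-{description}.
  if [[ ! "$branch" =~ $branch_re ]]; then
    echo "branch '$branch' does not match hotfix/INC-{ticket-number}-{description}" >&2
    return 1
  fi
  local ticket="${BASH_REMATCH[1]}"

  # The commit message must mention the same incident ticket.
  if [[ "$subject" != *"$ticket"* ]]; then
    echo "commit subject does not mention $ticket" >&2
    return 1
  fi

  # Print the ticket so callers can reuse it (for example in notifications).
  echo "$ticket"
}
```

One way to use it is from a local pre-push check: `check_break_glass_ref "$(git rev-parse --abbrev-ref HEAD)" "$(git log -1 --format=%s)"`. A non-zero exit status means the change does not meet the policy's naming rules.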
Set up GitHub branch protection with bypass rules
Configure branch protection so normal PRs require review, but repository admins can bypass in emergencies. This creates the controlled exception mechanism.
    # Configure via GitHub CLI (gh).
    # Enable branch protection on main with a two-reviewer requirement.
    # The JSON body is sent with --input because --field would pass the
    # nested objects as plain strings, which the API rejects.
    gh api repos/{owner}/{repo}/branches/main/protection \
      --method PUT \
      --input - << 'EOF'
    {
      "required_pull_request_reviews": {"required_approving_review_count": 2},
      "enforce_admins": false,
      "restrictions": null,
      "required_status_checks": {"strict": true, "contexts": ["ci/secure-pipeline"]}
    }
    EOF

    # enforce_admins=false is what lets repository admins merge without
    # reviews in an emergency. If you use rulesets instead of classic branch
    # protection, add the admins to the bypass list under
    # Settings → Rules → Rulesets.

    # Add CODEOWNERS for change notifications
    mkdir -p .github
    cat > .github/CODEOWNERS << 'EOF'
    # Global code owners, requested as reviewers on all PRs
    # (teams must be written as @org/team-name)
    * @platform-team-leads
    # Cluster configs require additional security review
    /clusters/production/ @security-team @platform-team-leads
    /apps/overlays/production/ @security-team
    EOF
    git add .github/CODEOWNERS
    git commit -m "policy: add CODEOWNERS for change control"
    git push
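Before relying on the bypass during an incident, it helps to confirm that the protection settings are what the procedure expects. The sketch below assumes the two values are read from the branch protection endpoint of the GitHub REST API. The function name `protection_allows_break_glass` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: confirm the settings that make break-glass possible.
#   $1 = required approving review count
#   $2 = enforce_admins.enabled ("true" or "false")

protection_allows_break_glass() {
  local review_count="$1" enforce_admins="$2"

  # Normal changes must still need at least one approving review.
  if ! (( review_count >= 1 )); then
    echo "no review requirement on main" >&2
    return 1
  fi

  # Admins must be exempt, otherwise the admin bypass merge is rejected.
  if [[ "$enforce_admins" != "false" ]]; then
    echo "enforce_admins is on; admin bypass will fail" >&2
    return 1
  fi
}

# Usage (reads live settings; requires admin access to the repository):
#   endpoint='repos/{owner}/{repo}/branches/main/protection'
#   protection_allows_break_glass \
#     "$(gh api "$endpoint" --jq '.required_pull_request_reviews.required_approving_review_count')" \
#     "$(gh api "$endpoint" --jq '.enforce_admins.enabled')"
```

Running this as part of a periodic drill, rather than during the incident itself, surfaces a misconfigured repository while there is still time to fix it.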
Simulate a P0 incident requiring a break-glass change
The scenario: a critical security vulnerability requires an immediate image update, and the on-call engineer cannot wait for two reviewers to become available. Walk through the complete procedure.
    # Step 1: Notify the team (simulate Slack notification)
    echo "
    [BREAK-GLASS INITIATED]
    Incident: INC-2024-0042
    Severity: P0
    Reason: CVE-2024-XXXXX in nginx:1.25.3, active exploitation
    Approver: Jane Smith (on-call lead)
    Change: nginx image update 1.25.3 → 1.25.4-patched
    Time: $(date -u)
    " | tee break-glass-audit.txt

    # Step 2: Create the hotfix branch
    git checkout -b hotfix/INC-2024-0042-nginx-cve-patch

    # Step 3: Make the emergency change
    sed -i 's/nginx:1.25.3/nginx:1.25.4-patched/' \
      apps/base/nginx/deployment.yaml

    # Step 4: Commit with mandatory metadata
    git add apps/base/nginx/deployment.yaml
    git commit -m "fix(INC-2024-0042): patch nginx CVE-2024-XXXXX

    BREAK-GLASS change - bypassed normal review process.
    Approved by: jane.smith@company.com (on-call lead)
    Incident: INC-2024-0042
    CVE: CVE-2024-XXXXX
    Change: nginx 1.25.3 → 1.25.4-patched (security patch)
    Post-incident PR required within 24 hours.
    Co-reviewed by: mark.jones@company.com (eng manager)"
    git push origin hotfix/INC-2024-0042-nginx-cve-patch
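Because the commit message is the primary audit record, it is worth checking that the mandatory metadata is present before pushing. This is a minimal sketch. The field names mirror the commit message used in this exercise, and the function name `missing_audit_fields` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: list the audit fields missing from a commit message.
# The field names are taken from the break-glass commit in this exercise.

missing_audit_fields() {
  local message="$1" field
  local missing=()

  for field in "Approved by:" "Incident:" "Change:"; do
    # grep -F matches the field name literally; -q suppresses output.
    if ! grep -qF -- "$field" <<< "$message"; then
      missing+=("$field")
    fi
  done

  # Print each missing field on its own line and fail if any are missing.
  if (( ${#missing[@]} > 0 )); then
    printf '%s\n' "${missing[@]}"
    return 1
  fi
}
```

For the commit just created, `missing_audit_fields "$(git log -1 --format=%B)"` should print nothing and exit with status 0.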
Merge to main using admin bypass
    # As admin: open a PR from the hotfix branch.
    # The break-glass, security, and incident labels must already exist
    # in the repository, otherwise gh pr create fails.
    gh pr create \
      --title "fix(INC-2024-0042): BREAK-GLASS nginx CVE patch" \
      --body "$(cat break-glass-audit.txt)" \
      --base main \
      --head hotfix/INC-2024-0042-nginx-cve-patch \
      --label "break-glass,security,incident"

    # Admin bypass merge (requires admin role on the repo)
    gh pr merge --admin --squash \
      --subject "fix(INC-2024-0042): BREAK-GLASS nginx CVE patch"

    # Refresh origin/main, then record the commit SHA for the audit trail
    git fetch origin main
    HOTFIX_SHA=$(git log origin/main -1 --format="%H")
    echo "Hotfix commit: $HOTFIX_SHA" | tee -a break-glass-audit.txt
Trace Flux reconciliation to running pod
Watch the complete chain: Git commit → Flux detects → applies → pod restarts.
    # Force immediate Flux sync (don't wait for the 60s interval)
    flux reconcile source git flux-system
    flux reconcile kustomization nginx-app

    # Watch the pod update
    kubectl rollout status deployment/nginx -n default -w

    # Verify the new image is running
    kubectl get pods -l app=nginx \
      -o jsonpath='{.items[*].spec.containers[0].image}'
    # Expected: nginx:1.25.4-patched

    # Build the complete audit trace
    echo "=== AUDIT TRAIL ===" > audit-report.txt
    echo "Git commit: $HOTFIX_SHA" >> audit-report.txt
    echo "Flux reconcile: $(flux get kustomization nginx-app | tail -1)" >> audit-report.txt
    echo "Pod image: $(kubectl get pods -l app=nginx -o jsonpath='{.items[0].spec.containers[0].image}')" >> audit-report.txt
    echo "Timestamp: $(date -u)" >> audit-report.txt
    cat audit-report.txt
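The audit report above records only the first pod's image, while a rollout can briefly leave old and new pods running side by side. A stricter check compares every image the jsonpath query returns against the expected one. The sketch below assumes that query's space-separated output. The function name `all_images_match` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every image in a space-separated list
# equals the expected image, and the list contains at least one entry.

all_images_match() {
  local expected="$1" images="$2" image
  local count=0

  # Word splitting on $images is intentional: one word per pod image.
  for image in $images; do
    [[ "$image" == "$expected" ]] || return 1
    count=$((count + 1))
  done

  # An empty list means no pods matched the selector, which is also a failure.
  (( count > 0 ))
}

# Usage against the cluster (assumes the app=nginx label and the default
# namespace used in this exercise):
#   all_images_match "nginx:1.25.4-patched" \
#     "$(kubectl get pods -n default -l app=nginx \
#        -o jsonpath='{.items[*].spec.containers[0].image}')"
```

A zero exit status here is stronger evidence for the audit trail than a single pod's image, because it shows that no pod is still running the vulnerable version.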
Post-incident cleanup procedure
Within 24 hours, the break-glass change must be normalised through the standard process.
    # 1. Create a proper follow-up PR through normal process.
    #    Branch from the current main so the PR carries only the review commit.
    git fetch origin main
    git checkout -b follow-up/INC-2024-0042-nginx-cve-formal-review origin/main
    # (change is already in main, this PR documents the review)
    git commit --allow-empty \
      -m "docs(INC-2024-0042): formal review of break-glass change

    This commit documents the post-incident review of the emergency
    change applied during INC-2024-0042. The change has already been
    deployed. This PR provides the retroactive review trail.

    Reviewed by: jane.smith@company.com, mark.jones@company.com
    Incident review: [link to PIR document]"
    git push origin follow-up/INC-2024-0042-nginx-cve-formal-review

    # 2. Create the PR for formal review
    gh pr create \
      --title "follow-up(INC-2024-0042): post-incident formal review" \
      --body "Post-incident formal review of the break-glass change made during INC-2024-0042" \
      --base main

    # 3. Check git log shows the full timeline
    git log origin/main --oneline --graph -10

    # 4. Verify Flux audit events
    flux events --for Kustomization/nginx-app | head -20
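The 24-hour requirement is easy to lose track of once the incident is closed, so a small check can report how much of the window is left. This is a sketch under the assumption that the window starts at the hotfix commit's committer timestamp. The function name `follow_up_status` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the 24-hour follow-up window has passed.
# Both arguments are Unix timestamps in seconds.

follow_up_status() {
  local merged_at="$1" now="$2"
  local deadline=$(( merged_at + 24 * 60 * 60 ))

  if (( now > deadline )); then
    echo "overdue by $(( (now - deadline) / 60 )) minutes"
    return 1
  fi
  echo "$(( (deadline - now) / 60 )) minutes remaining"
}

# Usage (HOTFIX_SHA is the commit recorded during the bypass merge;
# %ct is the committer date as a Unix timestamp):
#   follow_up_status "$(git log -1 --format=%ct "$HOTFIX_SHA")" "$(date +%s)"
```

The non-zero exit status on an overdue follow-up makes the function usable in a scheduled job that posts a reminder to the incident channel.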
Success Criteria
Further Reading
- GitOps break-glass patterns — fluxcd.io/flux/guides/emergency-procedures
- GitHub branch protection rules — docs.github.com/repositories/configuring-branches-and-merges/managing-rulesets
- Incident management — atlassian.com/incident-management
- Post-incident review templates — postmortems.io