Objective
Stateful workloads require special handling during node drains — a database pod cannot be recreated instantly because it must wait for the PVC to detach from the old node and reattach to the new one. This exercise designs a complete upgrade runbook, executes it against a cluster with a real StatefulSet database, measures RTO, and documents gaps between expected and actual behaviour.
Prerequisites
- Kubernetes cluster with at least 3 worker nodes across 2+ zones
- kubectl with cluster-admin access and SSH to nodes (for timing validation)
- A storage class that supports topology-aware binding
- Velero or cloud snapshots configured for PVC backup
Steps
01
Deploy stateful and stateless test workloads
# Deploy PostgreSQL StatefulSet (stateful workload)
cat << 'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: default
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels: {app: postgres}
  template:
    metadata:
      labels: {app: postgres}
    spec:
      containers:
      - name: postgres
        image: postgres:15
        env:
        - name: POSTGRES_PASSWORD
          value: testpassword
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
        resources:
          requests: {cpu: 100m, memory: 256Mi}
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 5Gi
EOF

# Deploy stateless app with a strict PDB
cat << 'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateless-app
  namespace: default
spec:
  replicas: 4
  selector:
    matchLabels: {app: stateless-app}
  template:
    metadata:
      labels: {app: stateless-app}
    spec:
      containers:
      - name: app
        image: nginx:alpine
        resources:
          requests: {cpu: 50m, memory: 64Mi}
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: stateless-pdb
spec:
  minAvailable: 3
  selector:
    matchLabels: {app: stateless-app}
EOF

# Write test data to PostgreSQL
kubectl wait pod/postgres-0 --for=condition=Ready --timeout=120s
kubectl exec postgres-0 -- psql -U postgres -c \
  "CREATE TABLE upgrade_test (id serial, note text); INSERT INTO upgrade_test (note) VALUES ('pre-upgrade-data');"

# Verify the test data
kubectl exec postgres-0 -- psql -U postgres -c \
  "SELECT * FROM upgrade_test;"
02
Pre-drain checks (runbook Phase 1)
# Record baseline state
echo "=== UPGRADE RUNBOOK EXECUTION LOG ===" | tee runbook.log
echo "Start: $(date -u)" | tee -a runbook.log

# P1.1: All workloads healthy (this listing should be empty)
kubectl get pods --all-namespaces | grep -v Running | grep -v Completed | \
  tee -a runbook.log

# P1.2: PDB status (every PDB must show ALLOWED DISRUPTIONS > 0)
kubectl get pdb --all-namespaces | tee -a runbook.log

# P1.3: PVC status (all must be Bound; this listing should be empty)
kubectl get pvc --all-namespaces | grep -v Bound | tee -a runbook.log

# P1.4: Identify the node that hosts postgres-0
POSTGRES_NODE=$(kubectl get pod postgres-0 -o jsonpath='{.spec.nodeName}')
POSTGRES_ZONE=$(kubectl get node $POSTGRES_NODE \
  --output=jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}')
echo "postgres-0 is on $POSTGRES_NODE in zone $POSTGRES_ZONE" | tee -a runbook.log

# P1.5: Create a backup before touching the node
velero backup create pre-drain-$(date +%Y%m%d%H%M) \
  --include-namespaces default --wait | tee -a runbook.log
03
Drain the StatefulSet node (runbook Phase 2)
# Phase 2: drain the stateful node
echo "=== PHASE 2: DRAIN START $(date -u) ===" | tee -a runbook.log
kubectl cordon $POSTGRES_NODE
CORDON_TIME=$(date -u +%s)

# Monitor PDB blocking (run in a separate terminal)
# watch -n 2 'kubectl get pdb --all-namespaces'

DRAIN_START=$(date -u +%s)
kubectl drain $POSTGRES_NODE \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --timeout=300s 2>&1 | tee -a runbook.log
DRAIN_END=$(date -u +%s)
echo "Drain took: $((DRAIN_END - DRAIN_START))s" | tee -a runbook.log
04
Monitor PVC detachment and pod rescheduling
# Watch the postgres-0 reschedule sequence
kubectl get pod postgres-0 -w 2>&1 | head -20

# Expected sequence and timings:
#   postgres-0 Running → Terminating   (drain initiated)
#   postgres-0 Terminating             (PVC detach: 5-30s)
#   postgres-0 Pending                 (scheduling on new node)
#   postgres-0 ContainerCreating       (PVC attach: 10-60s on cloud)
#   postgres-0 Running                 (ready)

# Record timing until the pod is Running again
POD_READY_START=$(date -u +%s)
kubectl wait pod/postgres-0 --for=condition=Ready --timeout=300s
POD_READY_END=$(date -u +%s)
STATEFUL_RTO=$((POD_READY_END - DRAIN_START))
echo "StatefulSet RTO: ${STATEFUL_RTO}s" | tee -a runbook.log

# Verify the PVC reattached on the new node
NEW_POSTGRES_NODE=$(kubectl get pod postgres-0 -o jsonpath='{.spec.nodeName}')
echo "postgres-0 rescheduled to: $NEW_POSTGRES_NODE" | tee -a runbook.log

# Verify data integrity
kubectl exec postgres-0 -- psql -U postgres -c \
  "SELECT * FROM upgrade_test;" | tee -a runbook.log
# Must show: pre-upgrade-data still present
05
Verify stateless PDB enforcement
# Confirm the PDB kept the stateless workload at or above minAvailable
# (during the drain, stateless-app should never have dropped below 3 pods)
kubectl get deployment stateless-app | tee -a runbook.log

# Review events for PDB activity during the drain
kubectl get events --field-selector reason=DisruptionAllowed \
  --sort-by='.lastTimestamp' | tee -a runbook.log

# Uncordon the node (simulates upgrade completion)
kubectl uncordon $POSTGRES_NODE
echo "Node uncordoned: $(date -u)" | tee -a runbook.log
06
Document findings and runbook gaps
# Generate the execution report
cat << 'EOF' >> runbook.log
=== EXECUTION SUMMARY ===
Drain duration: ___s
StatefulSet RTO: ___s (target: <120s)
Stateless workload impact: min available maintained? YES/NO
Data integrity: Intact? YES/NO
=== GAPS IDENTIFIED ===
[ ] PVC detach took longer than expected (expected <=30s, actual ___s)
[ ] PDB blocked drain longer than expected (record which PDB)
[ ] No alerting triggered for pod unavailability during drain
[ ] postgres-0 not in a separate zone from stateless workload
(recommendation: use pod anti-affinity or dedicated node pool)
=== RUNBOOK IMPROVEMENTS ===
- Add PVC snapshot step before drain (currently manual)
- Add zone check: never drain all nodes in the same AZ simultaneously
- Set terminationGracePeriodSeconds on postgres pod to allow clean shutdown
- Consider separate node pool for stateful workloads to isolate drain blast radius
EOF
cat runbook.log
Success Criteria
- Drain completes within the 300s timeout without manual intervention
- StatefulSet RTO under 120s
- The 'pre-upgrade-data' row is still present in upgrade_test after rescheduling
- stateless-app never drops below 3 available replicas during the drain
Key Concepts
- ReadWriteOnce PVC constraint — RWO block volumes (e.g. AWS EBS, Azure Disk) can be mounted on only one node at a time; the detach/attach cycle adds 30-60s to StatefulSet RTO
- StatefulSet ordering — with the default OrderedReady pod management policy, Kubernetes honours pod ordering guarantees (pod N-1 must be ready before pod N starts), which also governs recovery after eviction
- terminationGracePeriodSeconds — set long enough for your database to flush writes; default 30s may not be enough for large PostgreSQL checkpoints
- Dedicated stateful node pool — isolating stateful workloads to a separate node pool lets you control drain order precisely
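The last two concepts can be expressed directly in the StatefulSet pod template. A minimal sketch, assuming 120s is enough for your checkpoint size and that nodes in a dedicated stateful pool carry a workload-pool: stateful label (both values are illustrative, not part of the exercise manifests):

```yaml
# Illustrative fragment to merge into the postgres StatefulSet pod spec.
spec:
  template:
    spec:
      # Assumed value: long enough for PostgreSQL to flush its final
      # checkpoint on shutdown; the 30s default may cut large flushes short.
      terminationGracePeriodSeconds: 120
      # Hypothetical label identifying a dedicated stateful node pool.
      nodeSelector:
        workload-pool: stateful
```

Pinning stateful pods to their own pool means a drain of the general pool never touches the database, and the stateful pool can be drained last, one node at a time.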
Further Reading
- StatefulSet documentation — kubernetes.io/docs/concepts/workloads/controllers/statefulset
- PVC volume binding — kubernetes.io/docs/concepts/storage/storage-classes/#volume-binding-mode
- Kubernetes graceful shutdown — kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination
- Draining nodes — kubernetes.io/docs/tasks/administer-cluster/safely-drain-node