GitOps & CI/CD L3 · ADVANCED ~90 min

Automated Canary Release with Flagger and Prometheus

Implement a Flagger Canary resource backed by Prometheus metrics. Define a MetricTemplate that checks the 5xx error rate, inject errors into the canary deployment, and verify that Flagger automatically rolls back and records the failure as Kubernetes events.

Objective

Flagger automates progressive delivery by gradually shifting traffic to a canary deployment while monitoring real-time metrics. If the canary's error rate exceeds the threshold, Flagger rolls back automatically. This exercise wires Flagger to Prometheus, defines a custom metric template for 5xx rates, and demonstrates both a successful promotion and an automated rollback scenario.

Prerequisites

- A Kubernetes cluster with kubectl and Helm v3 configured
- kube-prometheus-stack installed in the monitoring namespace (the Prometheus address used below assumes its default service name)
- NGINX ingress controller (the Flagger install below uses meshProvider=nginx)

Steps

01

Install Flagger with Prometheus support

# Add Flagger Helm repo
helm repo add flagger https://flagger.app
helm repo update

# Install Flagger with NGINX mesh provider
helm install flagger flagger/flagger \
  --namespace flagger-system \
  --create-namespace \
  --set meshProvider=nginx \
  --set metricsServer=http://prometheus-kube-prometheus-prometheus.monitoring:9090 \
  --wait

# Install Flagger's load tester for traffic generation
helm install flagger-loadtester flagger/loadtester \
  --namespace flagger-system

kubectl get pods -n flagger-system
02

Deploy the primary workload that Flagger will manage

# Create a namespace for the demo workload
kubectl create namespace canary-demo

# Deploy the primary deployment (Flagger will duplicate this as canary)
cat << 'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
  namespace: canary-demo
spec:
  replicas: 2
  selector:
    matchLabels: {app: podinfo}
  template:
    metadata:
      labels: {app: podinfo}
    spec:
      containers:
      - name: podinfo
        image: stefanprodan/podinfo:6.5.0
        ports:
        - containerPort: 9898
        readinessProbe:
          httpGet: {path: /readyz, port: 9898}
        resources:
          requests: {cpu: 50m, memory: 64Mi}
---
apiVersion: v1
kind: Service
metadata:
  name: podinfo
  namespace: canary-demo
spec:
  selector: {app: podinfo}
  ports:
  - port: 9898
    targetPort: 9898
EOF
03

Create a MetricTemplate for 5xx error rate

The query computes 100 minus the percentage of non-5xx requests over the analysis interval, i.e. the share of requests that returned a 5xx status. The template keeps the name not-found-percentage (referenced by the Canary's templateRef in the next step) even though it measures 5xx errors rather than 404s.

# metric-template-5xx.yaml
cat << 'EOF' | kubectl apply -f -
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: not-found-percentage
  namespace: flagger-system
spec:
  provider:
    type: prometheus
    address: http://prometheus-kube-prometheus-prometheus.monitoring:9090
  query: |
    100 - sum(
      rate(
        http_requests_total{
          namespace="{{ namespace }}",
          pod=~"{{ target }}-[0-9a-zA-Z]+(-[0-9a-zA-Z]+)",
          status!~"5.."
        }[{{ interval }}]
      )
    ) / sum(
      rate(
        http_requests_total{
          namespace="{{ namespace }}",
          pod=~"{{ target }}-[0-9a-zA-Z]+(-[0-9a-zA-Z]+)"
        }[{{ interval }}]
      )
    ) * 100
EOF
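To see what the query returns, here is a quick local sanity check of the same arithmetic with made-up traffic figures (the numbers are illustrative, not from a real cluster):

```shell
# Reproduce the MetricTemplate arithmetic locally with sample numbers.
# 200 req/s total, 188 req/s with non-5xx status (both figures hypothetical).
total_rps=200   # sum(rate(http_requests_total{...}))
ok_rps=188      # same sum, but with status!~"5.."
error_pct=$(awk -v ok="$ok_rps" -v total="$total_rps" \
  'BEGIN { printf "%.1f", 100 - ok / total * 100 }')
echo "5xx error rate: ${error_pct}%"
# → 5xx error rate: 6.0% — above the max of 5, so this sample would fail a check
```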
04

Create the Canary resource

The Canary resource tells Flagger how to shift traffic, which metrics to gate on, and when to roll back. Once the Canary is applied, Flagger takes over the workload: it creates a podinfo-primary deployment that serves live traffic, plus podinfo-primary and podinfo-canary services, while the original podinfo deployment becomes the canary template. Note that with meshProvider=nginx, Flagger expects spec.ingressRef to reference an existing Ingress for the app; in this exercise the load-test webhook sends traffic straight to the canary service instead.

# canary.yaml
cat << 'EOF' | kubectl apply -f -
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: canary-demo
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  service:
    port: 9898
    targetPort: 9898
  analysis:
    interval: 30s          # Check metrics every 30 seconds
    threshold: 5           # Max 5 failed checks before rollback
    maxWeight: 50          # Max 50% traffic to canary
    stepWeight: 10         # Increment by 10% each interval
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99            # 99% success rate required
      interval: 1m
    - name: not-found-percentage
      templateRef:
        name: not-found-percentage
        namespace: flagger-system
      thresholdRange:
        max: 5             # Max 5% 5xx error rate
      interval: 30s
    webhooks:
    - name: load-test
      url: http://flagger-loadtester.flagger-system/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://podinfo.canary-demo:9898/"
EOF
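With stepWeight: 10, maxWeight: 50, and interval: 30s, the analysis advances on a fixed schedule. A quick local sketch of that schedule (pure shell, no cluster needed):

```shell
# Compute the traffic-shift schedule implied by the analysis settings above.
step=10; max=50; interval=30   # stepWeight, maxWeight, interval (seconds)
t=0
for w in $(seq "$step" "$step" "$max"); do
  t=$((t + interval))
  echo "t=${t}s  canary weight ${w}%"
done
echo "earliest promotion: after ${t}s of passing checks"
```

In practice promotion also waits for each metric's own interval to report healthy values, so the real wall-clock time is somewhat longer.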
05

Trigger a canary release

Update the deployment image to trigger Flagger's canary analysis. Flagger detects the change and starts incrementally routing traffic.

# Trigger canary by updating the image
kubectl set image deployment/podinfo \
  podinfo=stefanprodan/podinfo:6.5.1 \
  -n canary-demo

# Watch Flagger progress the canary
watch -n 5 'kubectl describe canary podinfo -n canary-demo | \
  grep -A20 "Status:"'

# Watch traffic weights shift
kubectl get canary podinfo -n canary-demo -w
# NAME      STATUS        WEIGHT   LASTTRANSITIONTIME
# podinfo   Progressing   10       ...
# podinfo   Progressing   20       ...
# podinfo   Progressing   30       ...
# ... → Succeeded  0  (canary promoted to primary)
06

Simulate a bad release and trigger automatic rollback

The podinfo binary supports a --random-error flag that makes a share of requests return 5xx responses. Push a new image tag to start a canary run, then patch the container command to inject errors and watch Flagger roll back.

# Update to a "bad" version to start a new canary run
kubectl set image deployment/podinfo \
  podinfo=stefanprodan/podinfo:6.5.2 \
  -n canary-demo

# Inject errors by overriding the container command
kubectl patch deployment podinfo -n canary-demo \
  --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/command",
        "value":["./podinfo","--random-error=true"]}]'

# Watch Flagger detect failures and roll back
watch -n 3 'kubectl get canary podinfo -n canary-demo && \
  kubectl describe canary podinfo -n canary-demo | \
  grep -E "Failed|rollback|Canary weight"'

# Expected progression (weight and failed checks from `kubectl describe`):
# Progressing   weight 10   failed checks 1
# Progressing   weight 10   failed checks 2
# Progressing   weight 10   failed checks 5  ← threshold reached
# Failed        weight 0                     ← rolled back!
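Rollback timing follows directly from the Canary settings: threshold: 5 failed checks, one per 30s analysis interval. A quick check of the worst case:

```shell
# Worst-case delay before Flagger rolls back a fully failing canary:
# threshold failed checks, one per analysis interval.
threshold=5; interval=30        # values from the Canary spec
rollback_secs=$((threshold * interval))
echo "rollback after at most ${rollback_secs}s of consecutive failures"
```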
07

Verify rollback and check events

# Verify rollback — the primary should still run the last good image
kubectl get deployment podinfo-primary -n canary-demo \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
# Expected: stefanprodan/podinfo:6.5.1 (the previously promoted version)
# The podinfo deployment itself keeps the bad spec: it is only the canary
# template, which Flagger has scaled back down to zero replicas.

# View Flagger events for the full timeline
kubectl get events -n canary-demo \
  --field-selector involvedObject.name=podinfo \
  --sort-by='.lastTimestamp'

# Check Flagger logs for analysis details
kubectl logs -n flagger-system \
  -l app.kubernetes.io/name=flagger \
  --tail=50 | grep podinfo

Success Criteria

- The 6.5.1 canary is promoted to primary with zero failed checks
- The error-injected release accumulates 5 failed checks and is rolled back automatically
- Events in canary-demo and the Flagger logs show the full analysis and rollback timeline

Further Reading

- Flagger documentation: https://flagger.app
- Prometheus querying basics: https://prometheus.io/docs/prometheus/latest/querying/basics/
- podinfo: https://github.com/stefanprodan/podinfo