Objective
Platform engineering frequently involves scanning the cluster for risk patterns that policies alone can't prevent — like teams deploying with a single replica during off-hours. This exercise builds a standalone Python script using the kubernetes client library that identifies availability risk, groups results by namespace, and enriches the output with the owning team label. The script is the foundation for a recurring CronJob-based report or a Slack notification workflow.
Prerequisites
- Python 3.9+ installed
- kubectl configured with a valid kubeconfig (~/.kube/config)
- The target cluster has Deployments across multiple namespaces
- pip available for installing the kubernetes client
Steps
Install the Kubernetes Python client
```bash
# Install into a virtual environment (recommended)
python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install kubernetes

# Verify
python3 -c "import kubernetes; print(kubernetes.__version__)"
## 28.1.0
```
Create seed workloads for testing
```bash
# Create two test namespaces with single-replica Deployments
kubectl create namespace team-alpha 2>/dev/null || true
kubectl create namespace team-beta 2>/dev/null || true

kubectl -n team-alpha create deployment api-server --image=nginx:1.25 --replicas=1
kubectl -n team-alpha label deployment api-server \
  app.kubernetes.io/team=alpha-engineers

kubectl -n team-beta create deployment worker --image=nginx:1.25 --replicas=1
kubectl -n team-beta label deployment worker \
  app.kubernetes.io/team=beta-platform

# Multi-replica deployment should NOT appear in the report
kubectl -n team-alpha create deployment frontend --image=nginx:1.25 --replicas=3
kubectl -n team-alpha label deployment frontend \
  app.kubernetes.io/team=alpha-engineers
```
Write the detection script
Save the following as find_single_replicas.py:

```python
#!/usr/bin/env python3
"""
Find all Deployments with replicas == 1 (single points of failure).
Groups output by namespace and annotates each with the owning team label.
"""
import sys
from collections import defaultdict

from kubernetes import client, config

TEAM_LABEL = "app.kubernetes.io/team"
SPOF_THRESHOLD = 2  # flag deployments with fewer than this many replicas


def load_config():
    try:
        config.load_incluster_config()  # running inside a pod
    except config.ConfigException:
        config.load_kube_config()       # local ~/.kube/config


def get_single_replica_deployments():
    v1 = client.AppsV1Api()
    # list_deployment_for_all_namespaces fetches across every namespace
    deployments = v1.list_deployment_for_all_namespaces(watch=False)

    findings = defaultdict(list)
    for deploy in deployments.items:
        ns = deploy.metadata.namespace
        name = deploy.metadata.name
        replicas = deploy.spec.replicas or 0
        labels = deploy.metadata.labels or {}
        team = labels.get(TEAM_LABEL, "<unlabelled>")

        # Skip system namespaces
        if ns.startswith(("kube-", "kyverno", "cert-manager")):
            continue

        if replicas < SPOF_THRESHOLD:
            findings[ns].append({
                "name": name,
                "replicas": replicas,
                "team": team,
            })
    return findings


def print_report(findings):
    if not findings:
        print("✓ No single-replica Deployments found.")
        return

    total = sum(len(v) for v in findings.values())
    print(f"\n⚠ Single-Replica Deployment Report ({total} found)\n")
    print(f"{'NAMESPACE':<20} {'DEPLOYMENT':<30} {'REPLICAS':<10} {'TEAM'}")
    print("-" * 80)
    for ns in sorted(findings):
        for item in sorted(findings[ns], key=lambda x: x["name"]):
            print(
                f"{ns:<20} {item['name']:<30} {item['replicas']:<10} {item['team']}"
            )
        print()  # blank line between namespaces


if __name__ == "__main__":
    load_config()
    findings = get_single_replica_deployments()
    print_report(findings)
    sys.exit(1 if findings else 0)  # non-zero exit for CI integration
```
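The namespace-skip and threshold checks are easiest to verify when factored into a pure function that needs no cluster access. A minimal sketch of that refactor — the helper name `is_spof_candidate` is ours, not part of the script above or the kubernetes library:

```python
# Hypothetical refactor: pull the filter logic out of
# get_single_replica_deployments() into a pure helper so it can be
# unit-tested without an API server.
SKIP_PREFIXES = ("kube-", "kyverno", "cert-manager")
SPOF_THRESHOLD = 2


def is_spof_candidate(namespace: str, replicas: int) -> bool:
    """Return True when a Deployment should appear in the report."""
    if namespace.startswith(SKIP_PREFIXES):
        return False  # system namespaces are exempt
    return replicas < SPOF_THRESHOLD


# Quick checks, no API calls needed
assert is_spof_candidate("team-alpha", 1) is True
assert is_spof_candidate("team-alpha", 3) is False
assert is_spof_candidate("kube-system", 1) is False
```

The main loop would then call `is_spof_candidate(ns, replicas)` instead of inlining both conditions.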
Run the script and review output
```bash
python3 find_single_replicas.py
## ⚠ Single-Replica Deployment Report (2 found)
##
## NAMESPACE            DEPLOYMENT                     REPLICAS   TEAM
## --------------------------------------------------------------------------------
## team-alpha           api-server                     1          alpha-engineers
##
## team-beta            worker                         1          beta-platform
##
# frontend (3 replicas) should NOT appear above

# Exit code is 1 when violations found — useful in CI
echo "Exit code: $?"
## Exit code: 1
```
Export as JSON for downstream consumption
Add a JSON output mode to the script by replacing the `print_report(findings)` call at the bottom of `__main__` with this block (`sys` is already imported at the top of the script):

```python
import json

if "--json" in sys.argv:
    output = []
    for ns, items in findings.items():
        for item in items:
            output.append({"namespace": ns, **item})
    print(json.dumps(output, indent=2))
else:
    print_report(findings)
```

```bash
# Run with JSON output
python3 find_single_replicas.py --json
## [
##   { "namespace": "team-alpha", "name": "api-server", "replicas": 1, "team": "alpha-engineers" },
##   { "namespace": "team-beta", "name": "worker", "replicas": 1, "team": "beta-platform" }
## ]

# Pipe to jq for further filtering (e.g. only unlabelled deployments)
python3 find_single_replicas.py --json | jq '[.[] | select(.team == "<unlabelled>")]'
```
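The Objective mentions a Slack notification workflow; one way to feed the JSON findings into it is Slack's incoming-webhook API, which accepts a simple `{"text": ...}` payload. A sketch, assuming you have a webhook URL (the function names here are illustrative, not part of the scanner script):

```python
# Sketch: convert scanner findings into a Slack incoming-webhook payload.
# The webhook URL is a placeholder you must supply yourself.
import json
import urllib.request


def build_slack_payload(findings: list[dict]) -> dict:
    """Render findings as a single Slack message body."""
    lines = [f"*Single-Replica Deployment Report* ({len(findings)} found)"]
    for f in findings:
        lines.append(
            f"• `{f['namespace']}/{f['name']}`: {f['replicas']} replica(s), team {f['team']}"
        )
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url: str, payload: dict) -> None:
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # raises on non-2xx responses


# Example with the findings shown in the run above (no network call made):
findings = [
    {"namespace": "team-alpha", "name": "api-server", "replicas": 1, "team": "alpha-engineers"},
    {"namespace": "team-beta", "name": "worker", "replicas": 1, "team": "beta-platform"},
]
payload = build_slack_payload(findings)
```

Calling `post_to_slack("https://hooks.slack.com/services/…", payload)` with your real webhook URL would deliver the message.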
Package as a Kubernetes CronJob
To run this check automatically, build a container image containing the script and schedule it as a CronJob. The ServiceAccount needs read access to Deployments cluster-wide.
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spof-scanner
  namespace: platform-ops
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: deployment-reader
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spof-scanner
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: deployment-reader
subjects:
  - kind: ServiceAccount
    name: spof-scanner
    namespace: platform-ops
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: spof-scanner
  namespace: platform-ops
spec:
  schedule: "0 8 * * 1-5"  # weekdays at 08:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: spof-scanner
          restartPolicy: OnFailure
          containers:
            - name: scanner
              image: your-registry/spof-scanner:latest
              command: ["python3", "/app/find_single_replicas.py", "--json"]
              resources:
                requests: { cpu: "50m", memory: "64Mi" }
                limits: { cpu: "100m", memory: "128Mi" }
```
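The CronJob needs a container image that contains the script. A minimal Dockerfile sketch, assuming the script sits next to the Dockerfile; the base image tag and registry name are illustrative:

```dockerfile
FROM python:3.12-slim
WORKDIR /app
RUN pip install --no-cache-dir kubernetes
COPY find_single_replicas.py /app/
ENTRYPOINT ["python3", "/app/find_single_replicas.py"]
```

Build and push with `docker build -t your-registry/spof-scanner:latest .` followed by `docker push your-registry/spof-scanner:latest`, matching the `image:` field in the CronJob.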
Success Criteria
- The report lists api-server and worker, each with 1 replica and the correct team label
- The 3-replica frontend Deployment does not appear in the output
- The script exits 1 when findings exist and 0 when none are found
- --json emits valid JSON that jq can filter
Further Reading
- Kubernetes Python client: github.com/kubernetes-client/python
- In-cluster config: kubernetes.io/docs/tasks/run-application/access-api-from-pod
- Kubernetes CronJob: kubernetes.io/docs/concepts/workloads/controllers/cron-jobs
- kubectl-who-can: quickly audit RBAC for a service account