Objective
Platform engineering frequently involves scanning the cluster for risk patterns that policies alone can't prevent — like teams deploying with a single replica during off-hours. This exercise builds a standalone Python script using the kubernetes client library that identifies availability risk, groups results by namespace, and enriches the output with the owning team label. The script is the foundation for a recurring CronJob-based report or a Slack notification workflow.
Prerequisites
- Python 3.9+ installed
- kubectl configured with a valid kubeconfig (~/.kube/config)
- The target cluster has Deployments across multiple namespaces
- pip available for installing the kubernetes client
Steps
Install the Kubernetes Python client
```bash
# Install into a virtual environment (recommended)
python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install kubernetes

# Verify
python3 -c "import kubernetes; print(kubernetes.__version__)"
## 28.1.0
```
Create seed workloads for testing
```bash
# Create two test namespaces with single-replica Deployments
kubectl create namespace team-alpha 2>/dev/null || true
kubectl create namespace team-beta 2>/dev/null || true

kubectl -n team-alpha create deployment api-server --image=nginx:1.25 --replicas=1
kubectl -n team-alpha label deployment api-server \
  app.kubernetes.io/team=alpha-engineers

kubectl -n team-beta create deployment worker --image=nginx:1.25 --replicas=1
kubectl -n team-beta label deployment worker \
  app.kubernetes.io/team=beta-platform

# Multi-replica deployment should NOT appear in the report
kubectl -n team-alpha create deployment frontend --image=nginx:1.25 --replicas=3
kubectl -n team-alpha label deployment frontend \
  app.kubernetes.io/team=alpha-engineers
```
Write the detection script
Save the following as find_single_replicas.py:

```python
#!/usr/bin/env python3
"""
Find all Deployments with replicas == 1 (single points of failure).
Groups output by namespace and annotates each with the owning team label.
"""
import sys
from collections import defaultdict

from kubernetes import client, config

TEAM_LABEL = "app.kubernetes.io/team"
SPOF_THRESHOLD = 2  # flag deployments with fewer than this many replicas


def load_config():
    try:
        config.load_incluster_config()  # running inside a pod
    except config.ConfigException:
        config.load_kube_config()       # local ~/.kube/config


def get_single_replica_deployments():
    v1 = client.AppsV1Api()
    # list_deployment_for_all_namespaces fetches across every namespace
    deployments = v1.list_deployment_for_all_namespaces(watch=False)

    findings = defaultdict(list)
    for deploy in deployments.items:
        ns = deploy.metadata.namespace
        name = deploy.metadata.name
        replicas = deploy.spec.replicas or 0
        labels = deploy.metadata.labels or {}
        team = labels.get(TEAM_LABEL, "<unlabelled>")

        # Skip system namespaces
        if ns.startswith(("kube-", "kyverno", "cert-manager")):
            continue

        if replicas < SPOF_THRESHOLD:
            findings[ns].append({
                "name": name,
                "replicas": replicas,
                "team": team,
            })
    return findings


def print_report(findings):
    if not findings:
        print("✓ No single-replica Deployments found.")
        return

    total = sum(len(v) for v in findings.values())
    print(f"\n⚠ Single-Replica Deployment Report ({total} found)\n")
    print(f"{'NAMESPACE':<20} {'DEPLOYMENT':<30} {'REPLICAS':<10} {'TEAM'}")
    print("-" * 80)
    for ns in sorted(findings):
        for item in sorted(findings[ns], key=lambda x: x["name"]):
            print(
                f"{ns:<20} {item['name']:<30} {item['replicas']:<10} {item['team']}"
            )
        print()  # blank line between namespaces


if __name__ == "__main__":
    load_config()
    findings = get_single_replica_deployments()
    print_report(findings)
    sys.exit(1 if findings else 0)  # non-zero exit for CI integration
```
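The namespace-skip and threshold checks are easiest to verify when factored into a pure function that needs no cluster access. A minimal sketch of that refactor — the helper name `is_spof_candidate` is ours, not part of the script above or the kubernetes library:

```python
# Hypothetical refactor: pull the filter logic out of
# get_single_replica_deployments() into a pure helper so it can be
# unit-tested without an API server.
SKIP_PREFIXES = ("kube-", "kyverno", "cert-manager")
SPOF_THRESHOLD = 2


def is_spof_candidate(namespace: str, replicas: int) -> bool:
    """Return True when a Deployment should appear in the report."""
    if namespace.startswith(SKIP_PREFIXES):
        return False  # system namespaces are exempt
    return replicas < SPOF_THRESHOLD


# Quick checks, no API calls needed
assert is_spof_candidate("team-alpha", 1) is True
assert is_spof_candidate("team-alpha", 3) is False
assert is_spof_candidate("kube-system", 1) is False
```

The main loop would then call `is_spof_candidate(ns, replicas)` instead of inlining both conditions.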
Run the script and review output
```bash
python3 find_single_replicas.py
## ⚠ Single-Replica Deployment Report (2 found)
##
## NAMESPACE            DEPLOYMENT                     REPLICAS   TEAM
## --------------------------------------------------------------------------------
## team-alpha           api-server                     1          alpha-engineers
##
## team-beta            worker                         1          beta-platform
##
# frontend (3 replicas) should NOT appear above

# Exit code is 1 when violations found — useful in CI
echo "Exit code: $?"
## Exit code: 1
```
Export as JSON for downstream consumption
Add a JSON output mode to the script by replacing the `print_report(findings)` call at the bottom of `__main__` with this block (`sys` is already imported at the top of the script):

```python
import json

if "--json" in sys.argv:
    output = []
    for ns, items in findings.items():
        for item in items:
            output.append({"namespace": ns, **item})
    print(json.dumps(output, indent=2))
else:
    print_report(findings)
```

```bash
# Run with JSON output
python3 find_single_replicas.py --json
## [
##   { "namespace": "team-alpha", "name": "api-server", "replicas": 1, "team": "alpha-engineers" },
##   { "namespace": "team-beta", "name": "worker", "replicas": 1, "team": "beta-platform" }
## ]

# Pipe to jq for further filtering (e.g. only unlabelled deployments)
python3 find_single_replicas.py --json | jq '[.[] | select(.team == "<unlabelled>")]'
```
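The Objective mentions a Slack notification workflow; one way to feed the JSON findings into it is Slack's incoming-webhook API, which accepts a simple `{"text": ...}` payload. A sketch, assuming you have a webhook URL (the function names here are illustrative, not part of the scanner script):

```python
# Sketch: convert scanner findings into a Slack incoming-webhook payload.
# The webhook URL is a placeholder you must supply yourself.
import json
import urllib.request


def build_slack_payload(findings: list[dict]) -> dict:
    """Render findings as a single Slack message body."""
    lines = [f"*Single-Replica Deployment Report* ({len(findings)} found)"]
    for f in findings:
        lines.append(
            f"• `{f['namespace']}/{f['name']}`: {f['replicas']} replica(s), team {f['team']}"
        )
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url: str, payload: dict) -> None:
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # raises on non-2xx responses


# Example with the findings shown in the run above (no network call made):
findings = [
    {"namespace": "team-alpha", "name": "api-server", "replicas": 1, "team": "alpha-engineers"},
    {"namespace": "team-beta", "name": "worker", "replicas": 1, "team": "beta-platform"},
]
payload = build_slack_payload(findings)
```

Calling `post_to_slack("https://hooks.slack.com/services/…", payload)` with your real webhook URL would deliver the message.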
Package as a Kubernetes CronJob
To run this check automatically, build a container image containing the script and schedule it as a CronJob. The ServiceAccount needs read access to Deployments cluster-wide.
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spof-scanner
  namespace: platform-ops
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: deployment-reader
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spof-scanner
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: deployment-reader
subjects:
  - kind: ServiceAccount
    name: spof-scanner
    namespace: platform-ops
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: spof-scanner
  namespace: platform-ops
spec:
  schedule: "0 8 * * 1-5"  # weekdays at 08:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: spof-scanner
          restartPolicy: OnFailure
          containers:
            - name: scanner
              image: your-registry/spof-scanner:latest
              command: ["python3", "/app/find_single_replicas.py", "--json"]
              resources:
                requests: { cpu: "50m", memory: "64Mi" }
                limits: { cpu: "100m", memory: "128Mi" }
```
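The CronJob needs a container image that contains the script. A minimal Dockerfile sketch, assuming the script sits next to the Dockerfile; the base image tag and registry name are illustrative:

```dockerfile
FROM python:3.12-slim
WORKDIR /app
RUN pip install --no-cache-dir kubernetes
COPY find_single_replicas.py /app/
ENTRYPOINT ["python3", "/app/find_single_replicas.py"]
```

Build and push with `docker build -t your-registry/spof-scanner:latest .` followed by `docker push your-registry/spof-scanner:latest`, matching the `image:` field in the CronJob.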
Success Criteria
- The report lists api-server and worker, each with 1 replica and the correct team label
- The 3-replica frontend Deployment does not appear in the output
- The script exits 1 when findings exist and 0 when none are found
- --json emits valid JSON that jq can filter
Further Reading
- Kubernetes Python client: github.com/kubernetes-client/python
- In-cluster config: kubernetes.io/docs/tasks/run-application/access-api-from-pod
- Kubernetes CronJob: kubernetes.io/docs/concepts/workloads/controllers/cron-jobs
- kubectl-who-can: quickly audit RBAC for a service account