Objective
Platform engineers are often asked to review application team manifests before they reach production. This exercise trains you to read a Deployment YAML and spot reliability, security, and operability gaps — then communicate the findings in an advisory format that helps developers understand the why, not just the what.
Prerequisites
- kubectl configured against a cluster
- Familiarity with Deployment spec fields (probes, resources, strategy, security context)
- Optional: kube-score or kubeconform for automated scoring
Steps
Review the flawed "before" manifest
This Deployment represents a real pattern submitted by application teams that have not yet gone through platform onboarding. Count the issues before reading the advisory table.
```yaml
# ── BEFORE: app-team submission (do not deploy to production) ──
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
  namespace: default          # ISSUE: deploying to default namespace
spec:
  replicas: 1                 # ISSUE: single replica = no HA
  selector:
    matchLabels:
      app: payment-api
  template:
    metadata:
      labels:
        app: payment-api
    spec:
      containers:
      - name: payment-api
        image: myregistry.io/payment-api:latest   # ISSUE: mutable :latest tag
        ports:
        - containerPort: 8080
        env:
        - name: DB_PASSWORD
          value: "superSecretPassword123"         # ISSUE: plaintext secret in manifest
        - name: API_KEY
          value: "sk_live_abc123xyz"              # ISSUE: plaintext secret in manifest
# ISSUE: no resource requests or limits
# ISSUE: no readiness probe — traffic sent before app is ready
# ISSUE: no liveness probe — stuck pods never restarted
# ISSUE: no securityContext — runs as root
# ISSUE: no PodDisruptionBudget
# ISSUE: no update strategy configured (defaults to 25% maxUnavailable)
```
Score the manifest with kube-score (optional automated check)
```shell
# Install kube-score
curl -L https://github.com/zegl/kube-score/releases/download/v1.18.0/kube-score_1.18.0_linux_amd64.tar.gz \
  | tar xz kube-score
chmod +x kube-score && mv kube-score /usr/local/bin/

# Save the before manifest and score it
kubectl get deployment payment-api -o yaml > payment-api-before.yaml   # if already applied
# or save the before YAML above as payment-api-before.yaml
kube-score score payment-api-before.yaml

## Expected output (excerpt):
## [CRITICAL] Container Security Context
##     · payment-api → container securityContext.runAsNonRoot is not set
## [CRITICAL] Container Resources
##     · payment-api → CPU limit is not set
##     · payment-api → Memory limit is not set
## [CRITICAL] Pod Probes
##     · payment-api → no readiness probe is configured
## [WARNING] Deployment Strategy
##     · payment-api → maxSurge is not set
```
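When scripting around the score output, it helps to tally the severity markers rather than eyeball them; a minimal sketch against the excerpt above (the helper name is my own — a simple `grep -c` would do the same job):

```python
import re
from collections import Counter

def severity_counts(output: str) -> Counter:
    """Tally [CRITICAL]/[WARNING] markers in kube-score's human-readable output."""
    return Counter(re.findall(r"\[(CRITICAL|WARNING)\]", output))

excerpt = """\
[CRITICAL] Container Security Context
[CRITICAL] Container Resources
[CRITICAL] Pod Probes
[WARNING] Deployment Strategy
"""
print(severity_counts(excerpt))   # Counter({'CRITICAL': 3, 'WARNING': 1})
```

A CI wrapper could fail the build whenever the `CRITICAL` count is non-zero, which mirrors what `--output-format ci` with a non-zero exit code gives you natively.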
Advisory findings table
Document each finding with severity, impact, and recommendation before writing the fix. This is the format to use when communicating with application teams.
| Finding | Severity | Operational Impact | Recommendation |
|---|---|---|---|
| default namespace | MEDIUM | No RBAC isolation, no resource quotas, collides with other teams | Deploy to dedicated namespace with ResourceQuota and LimitRange |
| replicas: 1 | HIGH | Single point of failure. Node drain or pod restart = downtime | Set replicas: 3 with topologySpreadConstraints across zones |
| image: :latest | HIGH | Rollouts are not reproducible. Rollback impossible without registry tag history | Pin to immutable SHA digest or semver tag (e.g. v1.4.2) |
| Plaintext secrets in env | HIGH | Secrets visible in etcd, kubectl get pod -o json, kubectl describe, CI logs | Use secretKeyRef pointing to a Secret, or use External Secrets Operator |
| No resource requests | HIGH | BestEffort QoS — first to be evicted under node pressure. Scheduler cannot place optimally | Set requests and limits based on p95 observed usage (Goldilocks recommended) |
| No readiness probe | HIGH | Traffic is sent to the pod before the app finishes starting. 502s during rollouts | Add httpGet readiness probe on /healthz or /ready with initialDelaySeconds |
| No liveness probe | MEDIUM | Deadlocked pods are not restarted. Pod shows Running but stops serving requests | Add httpGet liveness probe with failureThreshold: 3 and periodSeconds: 15 |
| No securityContext | HIGH | Container runs as root (UID 0). Container escape grants root on host | Set runAsNonRoot: true, readOnlyRootFilesystem: true, drop ALL capabilities |
| No PodDisruptionBudget | MEDIUM | Node drain can evict all replicas simultaneously causing full outage | Create PDB with minAvailable: 1 or maxUnavailable: 1 |
| Default update strategy | LOW | Default maxUnavailable:25% can take 1 replica offline during rolling update | Set maxUnavailable: 0, maxSurge: 1 for zero-downtime rollout |
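The findings in the table above lend themselves to a quick pre-review script. A minimal sketch, assuming the manifest has already been parsed into a Python dict (e.g. via a YAML loader) and checking only the first container; the finding labels are illustrative:

```python
def lint_deployment(manifest: dict) -> list[str]:
    """Flag the advisory-table findings detectable from a single Deployment."""
    findings = []
    spec = manifest["spec"]
    pod = spec["template"]["spec"]
    container = pod["containers"][0]

    if manifest["metadata"].get("namespace", "default") == "default":
        findings.append("default namespace")
    if spec.get("replicas", 1) < 2:
        findings.append("single replica")
    if container["image"].endswith(":latest") or ":" not in container["image"]:
        findings.append("mutable image tag")
    if any("value" in e for e in container.get("env", [])):
        findings.append("inline env values (possible plaintext secrets)")
    if "resources" not in container:
        findings.append("no resource requests/limits")
    if "readinessProbe" not in container:
        findings.append("no readiness probe")
    if "livenessProbe" not in container:
        findings.append("no liveness probe")
    if "securityContext" not in container and "securityContext" not in pod:
        findings.append("no securityContext")
    return findings

# The "before" manifest, reduced to the fields the linter inspects
before = {
    "metadata": {"name": "payment-api", "namespace": "default"},
    "spec": {
        "replicas": 1,
        "template": {"spec": {"containers": [{
            "name": "payment-api",
            "image": "myregistry.io/payment-api:latest",
            "env": [{"name": "DB_PASSWORD", "value": "superSecretPassword123"}],
        }]}},
    },
}
print(len(lint_deployment(before)))   # 8 of the table's findings trip on the "before" manifest
```

The PDB and update-strategy rows are deliberately missing here: they require looking beyond the Deployment object itself, which is exactly why they tend to slip through manual review.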
Apply the corrected "after" manifest
```shell
# ── AFTER: production-ready corrected manifest ──
# First: create the namespace and secret
kubectl create namespace payments
kubectl create secret generic payment-api-secrets \
  -n payments \
  --from-literal=DB_PASSWORD='superSecretPassword123' \
  --from-literal=API_KEY='sk_live_abc123xyz'

# Apply the corrected Deployment
cat << 'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
  namespace: payments               # dedicated namespace
  labels:
    app: payment-api
    version: "1.4.2"
spec:
  replicas: 3                       # HA: 3 replicas across zones
  selector:
    matchLabels:
      app: payment-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0             # zero-downtime rollout
      maxSurge: 1
  template:
    metadata:
      labels:
        app: payment-api
        version: "1.4.2"
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: payment-api
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 2000
      containers:
      - name: payment-api
        image: myregistry.io/payment-api:v1.4.2   # pinned tag
        ports:
        - containerPort: 8080
        env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: payment-api-secrets           # reference a Secret, not an inline value
              key: DB_PASSWORD
        - name: API_KEY
          valueFrom:
            secretKeyRef:
              name: payment-api-secrets
              key: API_KEY
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "256Mi"
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 15
          failureThreshold: 3
        securityContext:
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop: [ALL]
        volumeMounts:
        - name: tmp
          mountPath: /tmp                         # writable tmp if the app needs it
      volumes:
      - name: tmp
        emptyDir: {}
EOF

# Create the PodDisruptionBudget
cat << 'EOF' | kubectl apply -f -
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-api-pdb
  namespace: payments
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: payment-api
EOF
```
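The PDB arithmetic is worth making explicit when advising teams: voluntary evictions are allowed only while healthy pods exceed `minAvailable`. A small sketch of that rule (the function name is my own):

```python
def allowed_disruptions(healthy: int, min_available: int) -> int:
    """Voluntary evictions the PDB permits right now (never negative)."""
    return max(0, healthy - min_available)

# replicas: 3, minAvailable: 2 → one pod may be drained at a time
print(allowed_disruptions(3, 2))   # 1
# If one pod is already unhealthy, drains are blocked entirely
print(allowed_disruptions(2, 2))   # 0
```

This is why `minAvailable: 2` with 3 replicas is a reasonable default, while `minAvailable: 3` would block node drains outright and frustrate cluster operations.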
Validate the corrected manifest
```shell
# Check pods are spread across zones
kubectl get pods -n payments -o wide -l app=payment-api
## NAME                           READY   STATUS    NODE
## payment-api-7d9f4b8c6-4xkzp    1/1     Running   node-az1
## payment-api-7d9f4b8c6-9mnrq    1/1     Running   node-az2
## payment-api-7d9f4b8c6-vxpqr    1/1     Running   node-az3

# Confirm QoS class is Burstable (requests != limits)
kubectl get pod -n payments -l app=payment-api \
  -o jsonpath='{.items[0].status.qosClass}'
## Burstable

# Confirm the pod is running as non-root
kubectl exec -n payments \
  $(kubectl get pod -n payments -l app=payment-api -o name | head -1) \
  -- id
## uid=1000 gid=0(root) groups=2000

# Re-score with kube-score to verify the findings are resolved
kubectl get deployment payment-api -n payments -o yaml \
  | kube-score score -
## All checks passed (or only informational warnings remain)

# Verify the PDB is protecting the Deployment
kubectl get pdb -n payments
## NAME              MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS
## payment-api-pdb   2               N/A               1

# Simulate a rolling update (change the image tag)
kubectl set image deployment/payment-api \
  payment-api=myregistry.io/payment-api:v1.4.3 \
  -n payments

# Watch the zero-downtime rollout (maxUnavailable: 0 keeps all 3 replicas ready)
kubectl rollout status deployment/payment-api -n payments
## Waiting for deployment "payment-api" rollout to finish: 1 out of 3 new replicas have been updated...
## Waiting for deployment "payment-api" rollout to finish: 2 out of 3 new replicas have been updated...
## deployment "payment-api" successfully rolled out
```
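The zero-downtime behavior observed above follows from the strategy arithmetic: during a rolling update the controller keeps at least `replicas − maxUnavailable` pods ready and runs at most `replicas + maxSurge` pods in total. A sketch of those bounds (the helper is illustrative):

```python
def rollout_bounds(replicas: int, max_unavailable: int, max_surge: int) -> tuple[int, int]:
    """(minimum ready, maximum total) pod counts during a rolling update."""
    return replicas - max_unavailable, replicas + max_surge

# maxUnavailable: 0, maxSurge: 1 → never fewer than 3 ready, at most 4 pods total
print(rollout_bounds(3, 0, 1))   # (3, 4)
# Allowing one unavailable pod instead would dip to 2 ready mid-rollout
print(rollout_bounds(3, 1, 1))   # (2, 4)
```

The trade-off is capacity headroom: `maxSurge: 1` requires room on the cluster for one extra pod during every rollout, which the namespace's ResourceQuota must accommodate.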
Automate advisory checks as a CI gate
Prevent future regressions by running automated policy checks in the CI pipeline. This converts advisory findings into enforced policy.
```yaml
# Option 1: kube-score in CI (GitHub Actions)
# .github/workflows/manifest-review.yaml
- name: Score Kubernetes manifests
  run: |
    kube-score score deploy/*.yaml \
      --ignore-test container-image-tag \
      --output-format ci
  continue-on-error: false
```

```shell
# Option 2: Kyverno policy to block at admission time
# (combines several findings into cluster-enforced rules)
cat << 'EOF' | kubectl apply -f -
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: deployment-best-practices
spec:
  validationFailureAction: Enforce
  rules:
  - name: require-non-root
    match:
      any:
      - resources:
          kinds: [Pod]
    validate:
      message: "Containers must not run as root"
      pattern:
        spec:
          containers:
          - securityContext:
              runAsNonRoot: "true"
  - name: require-resource-requests
    match:
      any:
      - resources:
          kinds: [Pod]
    validate:
      message: "Resource requests must be set"
      pattern:
        spec:
          containers:
          - resources:
              requests:
                memory: "?*"
                cpu: "?*"
EOF

# Test: submitting the original bad manifest should now be rejected
kubectl apply -f payment-api-before.yaml
## Error from server: admission webhook "validate.kyverno.svc" denied the request:
## require-non-root: Containers must not run as root
## require-resource-requests: Resource requests must be set
```
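The `?*` wildcard in the Kyverno patterns above means "at least one character": the field must exist and be non-empty. Python's `fnmatch` uses the same shell-style semantics (`?` matches exactly one character, `*` matches any run, including none), so it can illustrate the rule:

```python
import fnmatch  # shell-style wildcards: ? = one char, * = zero or more

def pattern_requires_value(value: str) -> bool:
    """Mimic Kyverno's "?*" pattern: field set and non-empty passes."""
    return fnmatch.fnmatch(value, "?*")

print(pattern_requires_value("100m"))   # True  — request is set, policy passes
print(pattern_requires_value(""))       # False — empty value fails the policy
```

This is why `?*` rather than `*` is the idiom for "must be set": a bare `*` would also match an empty string.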
Success Criteria
- kube-score reports no CRITICAL findings on the corrected manifest
- Three replicas are Running in the payments namespace, spread across zones
- Secrets are referenced via secretKeyRef rather than inlined in the manifest
- kubectl get pdb shows 1 allowed disruption with minAvailable: 2
- A rolling image update completes with all replicas remaining ready (maxUnavailable: 0)
Further Reading
- kube-score — github.com/zegl/kube-score
- Production best practices — learnk8s.io/production-best-practices
- Pod Security Standards — kubernetes.io/docs/concepts/security/pod-security-standards
- PodDisruptionBudget — kubernetes.io/docs/tasks/run-application/configure-pdb
- Secrets management — kubernetes.io/docs/concepts/configuration/secret