Objective
Platform engineers are often asked to review application team manifests before they reach production. This exercise trains you to read a Deployment YAML and spot reliability, security, and operability gaps — then communicate the findings in an advisory format that helps developers understand the why, not just the what.
Prerequisites
- kubectl configured against a cluster
- Familiarity with Deployment spec fields (probes, resources, strategy, security context)
- Optional: kube-score or kubeconform for automated scoring
Steps
Review the flawed "before" manifest
This Deployment represents a real pattern submitted by application teams that have not yet gone through platform onboarding. Count the issues before reading the advisory table.
```yaml
# ── BEFORE: app-team submission (do not deploy to production) ──
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
  namespace: default          # ISSUE: deploying to default namespace
spec:
  replicas: 1                 # ISSUE: single replica = no HA
  selector:
    matchLabels:
      app: payment-api
  template:
    metadata:
      labels:
        app: payment-api
    spec:
      containers:
      - name: payment-api
        image: myregistry.io/payment-api:latest   # ISSUE: mutable :latest tag
        ports:
        - containerPort: 8080
        env:
        - name: DB_PASSWORD
          value: "superSecretPassword123"         # ISSUE: plaintext secret in manifest
        - name: API_KEY
          value: "sk_live_abc123xyz"              # ISSUE: plaintext secret in manifest
# ISSUE: no resource requests or limits
# ISSUE: no readiness probe — traffic sent before app is ready
# ISSUE: no liveness probe — stuck pods never restarted
# ISSUE: no securityContext — runs as root
# ISSUE: no PodDisruptionBudget
# ISSUE: no update strategy configured (defaults to 25% maxUnavailable)
```
Score the manifest with kube-score (optional automated check)
```shell
# Install kube-score
curl -L https://github.com/zegl/kube-score/releases/download/v1.18.0/kube-score_1.18.0_linux_amd64.tar.gz \
  | tar xz kube-score
chmod +x kube-score && mv kube-score /usr/local/bin/

# Save the before manifest and score it
kubectl get deployment payment-api -o yaml > payment-api-before.yaml   # if already applied
# or save the before YAML above as payment-api-before.yaml
kube-score score payment-api-before.yaml

## Expected output (excerpt):
## [CRITICAL] Container Security Context
##     · payment-api → container securityContext.runAsNonRoot is not set
## [CRITICAL] Container Resources
##     · payment-api → CPU limit is not set
##     · payment-api → Memory limit is not set
## [CRITICAL] Pod Probes
##     · payment-api → no readiness probe is configured
## [WARNING] Deployment Strategy
##     · payment-api → maxSurge is not set
```
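When scripting around the score output, it helps to tally the severity markers rather than eyeball them; a minimal sketch against the excerpt above (the helper name is my own — a simple `grep -c` would do the same job):

```python
import re
from collections import Counter

def severity_counts(output: str) -> Counter:
    """Tally [CRITICAL]/[WARNING] markers in kube-score's human-readable output."""
    return Counter(re.findall(r"\[(CRITICAL|WARNING)\]", output))

excerpt = """\
[CRITICAL] Container Security Context
[CRITICAL] Container Resources
[CRITICAL] Pod Probes
[WARNING] Deployment Strategy
"""
print(severity_counts(excerpt))   # Counter({'CRITICAL': 3, 'WARNING': 1})
```

A CI wrapper could fail the build whenever the `CRITICAL` count is non-zero, which mirrors what `--output-format ci` with a non-zero exit code gives you natively.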
Advisory findings table
Document each finding with severity, impact, and recommendation before writing the fix. This is the format to use when communicating with application teams.
| Finding | Severity | Operational Impact | Recommendation |
|---|---|---|---|
| default namespace | MEDIUM | No RBAC isolation, no resource quotas, collides with other teams | Deploy to dedicated namespace with ResourceQuota and LimitRange |
| replicas: 1 | HIGH | Single point of failure. Node drain or pod restart = downtime | Set replicas: 3 with topologySpreadConstraints across zones |
| image: :latest | HIGH | Rollouts are not reproducible. Rollback impossible without registry tag history | Pin to immutable SHA digest or semver tag (e.g. v1.4.2) |
| Plaintext secrets in env | HIGH | Secrets visible in etcd, kubectl get pod -o json, kubectl describe, CI logs | Use secretKeyRef pointing to a Secret, or use External Secrets Operator |
| No resource requests | HIGH | BestEffort QoS — first to be evicted under node pressure. Scheduler cannot place optimally | Set requests and limits based on p95 observed usage (Goldilocks recommended) |
| No readiness probe | HIGH | Traffic is sent to the pod before the app finishes starting. 502s during rollouts | Add httpGet readiness probe on /healthz or /ready with initialDelaySeconds |
| No liveness probe | MEDIUM | Deadlocked pods are not restarted. Pod shows Running but stops serving requests | Add httpGet liveness probe with failureThreshold: 3 and periodSeconds: 15 |
| No securityContext | HIGH | Container runs as root (UID 0). Container escape grants root on host | Set runAsNonRoot: true, readOnlyRootFilesystem: true, drop ALL capabilities |
| No PodDisruptionBudget | MEDIUM | Node drain can evict all replicas simultaneously causing full outage | Create PDB with minAvailable: 1 or maxUnavailable: 1 |
| Default update strategy | LOW | Default maxUnavailable:25% can take 1 replica offline during rolling update | Set maxUnavailable: 0, maxSurge: 1 for zero-downtime rollout |
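The findings in the table above lend themselves to a quick pre-review script. A minimal sketch, assuming the manifest has already been parsed into a Python dict (e.g. via a YAML loader) and checking only the first container; the finding labels are illustrative:

```python
def lint_deployment(manifest: dict) -> list[str]:
    """Flag the advisory-table findings detectable from a single Deployment."""
    findings = []
    spec = manifest["spec"]
    pod = spec["template"]["spec"]
    container = pod["containers"][0]

    if manifest["metadata"].get("namespace", "default") == "default":
        findings.append("default namespace")
    if spec.get("replicas", 1) < 2:
        findings.append("single replica")
    if container["image"].endswith(":latest") or ":" not in container["image"]:
        findings.append("mutable image tag")
    if any("value" in e for e in container.get("env", [])):
        findings.append("inline env values (possible plaintext secrets)")
    if "resources" not in container:
        findings.append("no resource requests/limits")
    if "readinessProbe" not in container:
        findings.append("no readiness probe")
    if "livenessProbe" not in container:
        findings.append("no liveness probe")
    if "securityContext" not in container and "securityContext" not in pod:
        findings.append("no securityContext")
    return findings

# The "before" manifest, reduced to the fields the linter inspects
before = {
    "metadata": {"name": "payment-api", "namespace": "default"},
    "spec": {
        "replicas": 1,
        "template": {"spec": {"containers": [{
            "name": "payment-api",
            "image": "myregistry.io/payment-api:latest",
            "env": [{"name": "DB_PASSWORD", "value": "superSecretPassword123"}],
        }]}},
    },
}
print(len(lint_deployment(before)))   # 8 of the table's findings trip on the "before" manifest
```

The PDB and update-strategy rows are deliberately missing here: they require looking beyond the Deployment object itself, which is exactly why they tend to slip through manual review.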
Apply the corrected "after" manifest
```shell
# ── AFTER: production-ready corrected manifest ──
# First: create the namespace and secret
kubectl create namespace payments
kubectl create secret generic payment-api-secrets \
  -n payments \
  --from-literal=DB_PASSWORD='superSecretPassword123' \
  --from-literal=API_KEY='sk_live_abc123xyz'

# Apply the corrected Deployment
cat << 'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
  namespace: payments               # dedicated namespace
  labels:
    app: payment-api
    version: "1.4.2"
spec:
  replicas: 3                       # HA: 3 replicas across zones
  selector:
    matchLabels:
      app: payment-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0             # zero-downtime rollout
      maxSurge: 1
  template:
    metadata:
      labels:
        app: payment-api
        version: "1.4.2"
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: payment-api
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 2000
      containers:
      - name: payment-api
        image: myregistry.io/payment-api:v1.4.2   # pinned tag
        ports:
        - containerPort: 8080
        env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: payment-api-secrets           # reference a Secret, not an inline value
              key: DB_PASSWORD
        - name: API_KEY
          valueFrom:
            secretKeyRef:
              name: payment-api-secrets
              key: API_KEY
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "256Mi"
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 15
          failureThreshold: 3
        securityContext:
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop: [ALL]
        volumeMounts:
        - name: tmp
          mountPath: /tmp                         # writable tmp if the app needs it
      volumes:
      - name: tmp
        emptyDir: {}
EOF

# Create the PodDisruptionBudget
cat << 'EOF' | kubectl apply -f -
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-api-pdb
  namespace: payments
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: payment-api
EOF
```
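The PDB arithmetic is worth making explicit when advising teams: voluntary evictions are allowed only while healthy pods exceed `minAvailable`. A small sketch of that rule (the function name is my own):

```python
def allowed_disruptions(healthy: int, min_available: int) -> int:
    """Voluntary evictions the PDB permits right now (never negative)."""
    return max(0, healthy - min_available)

# replicas: 3, minAvailable: 2 → one pod may be drained at a time
print(allowed_disruptions(3, 2))   # 1
# If one pod is already unhealthy, drains are blocked entirely
print(allowed_disruptions(2, 2))   # 0
```

This is why `minAvailable: 2` with 3 replicas is a reasonable default, while `minAvailable: 3` would block node drains outright and frustrate cluster operations.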
Validate the corrected manifest
```shell
# Check pods are spread across zones
kubectl get pods -n payments -o wide -l app=payment-api
## NAME                           READY   STATUS    NODE
## payment-api-7d9f4b8c6-4xkzp    1/1     Running   node-az1
## payment-api-7d9f4b8c6-9mnrq    1/1     Running   node-az2
## payment-api-7d9f4b8c6-vxpqr    1/1     Running   node-az3

# Confirm QoS class is Burstable (requests != limits)
kubectl get pod -n payments -l app=payment-api \
  -o jsonpath='{.items[0].status.qosClass}'
## Burstable

# Confirm the pod is running as non-root
kubectl exec -n payments \
  $(kubectl get pod -n payments -l app=payment-api -o name | head -1) \
  -- id
## uid=1000 gid=0(root) groups=2000

# Re-score with kube-score to verify the findings are resolved
kubectl get deployment payment-api -n payments -o yaml \
  | kube-score score -
## All checks passed (or only informational warnings remain)

# Verify the PDB is protecting the Deployment
kubectl get pdb -n payments
## NAME              MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS
## payment-api-pdb   2               N/A               1

# Simulate a rolling update (change the image tag)
kubectl set image deployment/payment-api \
  payment-api=myregistry.io/payment-api:v1.4.3 \
  -n payments

# Watch the zero-downtime rollout (maxUnavailable: 0 keeps all 3 replicas ready)
kubectl rollout status deployment/payment-api -n payments
## Waiting for deployment "payment-api" rollout to finish: 1 out of 3 new replicas have been updated...
## Waiting for deployment "payment-api" rollout to finish: 2 out of 3 new replicas have been updated...
## deployment "payment-api" successfully rolled out
```
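The zero-downtime behavior observed above follows from the strategy arithmetic: during a rolling update the controller keeps at least `replicas − maxUnavailable` pods ready and runs at most `replicas + maxSurge` pods in total. A sketch of those bounds (the helper is illustrative):

```python
def rollout_bounds(replicas: int, max_unavailable: int, max_surge: int) -> tuple[int, int]:
    """(minimum ready, maximum total) pod counts during a rolling update."""
    return replicas - max_unavailable, replicas + max_surge

# maxUnavailable: 0, maxSurge: 1 → never fewer than 3 ready, at most 4 pods total
print(rollout_bounds(3, 0, 1))   # (3, 4)
# Allowing one unavailable pod instead would dip to 2 ready mid-rollout
print(rollout_bounds(3, 1, 1))   # (2, 4)
```

The trade-off is capacity headroom: `maxSurge: 1` requires room on the cluster for one extra pod during every rollout, which the namespace's ResourceQuota must accommodate.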
Automate advisory checks as a CI gate
Prevent future regressions by running automated policy checks in the CI pipeline. This converts advisory findings into enforced policy.
```yaml
# Option 1: kube-score in CI (GitHub Actions)
# .github/workflows/manifest-review.yaml
- name: Score Kubernetes manifests
  run: |
    kube-score score deploy/*.yaml \
      --ignore-test container-image-tag \
      --output-format ci
  continue-on-error: false
```

```shell
# Option 2: Kyverno policy to block at admission time
# (combines several findings into cluster-enforced rules)
cat << 'EOF' | kubectl apply -f -
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: deployment-best-practices
spec:
  validationFailureAction: Enforce
  rules:
  - name: require-non-root
    match:
      any:
      - resources:
          kinds: [Pod]
    validate:
      message: "Containers must not run as root"
      pattern:
        spec:
          containers:
          - securityContext:
              runAsNonRoot: "true"
  - name: require-resource-requests
    match:
      any:
      - resources:
          kinds: [Pod]
    validate:
      message: "Resource requests must be set"
      pattern:
        spec:
          containers:
          - resources:
              requests:
                memory: "?*"
                cpu: "?*"
EOF

# Test: submitting the original bad manifest should now be rejected
kubectl apply -f payment-api-before.yaml
## Error from server: admission webhook "validate.kyverno.svc" denied the request:
## require-non-root: Containers must not run as root
## require-resource-requests: Resource requests must be set
```
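The `?*` wildcard in the Kyverno patterns above means "at least one character": the field must exist and be non-empty. Python's `fnmatch` uses the same shell-style semantics (`?` matches exactly one character, `*` matches any run, including none), so it can illustrate the rule:

```python
import fnmatch  # shell-style wildcards: ? = one char, * = zero or more

def pattern_requires_value(value: str) -> bool:
    """Mimic Kyverno's "?*" pattern: field set and non-empty passes."""
    return fnmatch.fnmatch(value, "?*")

print(pattern_requires_value("100m"))   # True  — request is set, policy passes
print(pattern_requires_value(""))       # False — empty value fails the policy
```

This is why `?*` rather than `*` is the idiom for "must be set": a bare `*` would also match an empty string.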
Success Criteria
- kube-score reports no CRITICAL findings on the corrected manifest
- Three replicas are Running in the payments namespace, spread across zones
- Secrets are referenced via secretKeyRef rather than inlined in the manifest
- kubectl get pdb shows 1 allowed disruption with minAvailable: 2
- A rolling image update completes with all replicas remaining ready (maxUnavailable: 0)
Further Reading
- kube-score — github.com/zegl/kube-score
- Production best practices — learnk8s.io/production-best-practices
- Pod Security Standards — kubernetes.io/docs/concepts/security/pod-security-standards
- PodDisruptionBudget — kubernetes.io/docs/tasks/run-application/configure-pdb
- Secrets management — kubernetes.io/docs/concepts/configuration/secret