Objective
Over-provisioning resource requests wastes cluster capacity (and money). Under-provisioning causes OOMKills and CPU throttling. Goldilocks wraps VPA in recommendation mode — it never changes pods automatically, only suggests optimal requests and limits based on observed usage. This exercise installs both tools, runs several workloads, and applies the recommendations to reduce wasted capacity.
Prerequisites
- Kubernetes cluster with metrics-server installed
- Helm installed
- kubectl with port-forward capability
Steps
Install VPA (Vertical Pod Autoscaler)
# Install VPA CRDs and admission controller
helm repo add cowboysysop https://cowboysysop.github.io/charts/
helm repo update
helm install vpa cowboysysop/vertical-pod-autoscaler \
  --namespace kube-system \
  --set "admissionController.enabled=true" \
  --wait

# Verify VPA is running
kubectl get pods -n kube-system | grep vpa
Install Goldilocks
# Install Goldilocks
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm repo update
helm install goldilocks fairwinds-stable/goldilocks \
--namespace goldilocks \
--create-namespace \
--set dashboard.enabled=true \
--wait
kubectl get pods -n goldilocks

Enable Goldilocks on namespaces
Goldilocks watches namespaces with the goldilocks.fairwinds.com/enabled=true label and creates VPA objects in recommendation mode for every Deployment.
# Enable Goldilocks on the default namespace
kubectl label namespace default \
  goldilocks.fairwinds.com/enabled=true

# Enable on staging namespace too
kubectl create namespace staging --dry-run=client -o yaml | \
  kubectl apply -f -
kubectl label namespace staging \
  goldilocks.fairwinds.com/enabled=true

# Verify VPA objects are created for each Deployment
kubectl get vpa --all-namespaces
# Should see a VPA for each Deployment in labeled namespaces
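As a reference, here is a sketch of the kind of VPA object Goldilocks creates for a Deployment such as api-service. The key detail is `updateMode: "Off"`, which keeps the VPA in recommendation-only mode so it never evicts or mutates pods. (Field names follow the `autoscaling.k8s.io/v1` API; the `goldilocks-` name prefix is Goldilocks' naming convention.)

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: goldilocks-api-service
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Off"   # recommendation only -- never evicts or rewrites pods
```

You never apply this manifest yourself; Goldilocks reconciles one per Deployment in each labeled namespace.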
Deploy intentionally over-provisioned workloads
Create workloads with inflated resource requests to simulate the typical state of a cluster that hasn't been right-sized.
# Deploy 2 over-provisioned workloads
cat << 'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels: {app: api-service}
  template:
    metadata:
      labels: {app: api-service}
    spec:
      containers:
      - name: api
        image: nginx:alpine
        resources:
          requests: {cpu: "2", memory: "2Gi"}  # WAY over-provisioned
          limits: {cpu: "4", memory: "4Gi"}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
  namespace: default
spec:
  replicas: 5
  selector:
    matchLabels: {app: worker}
  template:
    metadata:
      labels: {app: worker}
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "while true; do echo working; sleep 5; done"]
        resources:
          requests: {cpu: "500m", memory: "1Gi"}  # Over-provisioned
          limits: {cpu: "1", memory: "2Gi"}
EOF

Generate load to accelerate recommendations
# Generate CPU load on one api-service pod
kubectl exec -n default \
  $(kubectl get pod -l app=api-service -o name | head -1) \
  -- sh -c "
    for i in \$(seq 1 10); do
      dd if=/dev/zero of=/dev/null bs=1M count=100 &
    done
    wait
  " &

# Repeat for 2-3 minutes to build VPA history
# In production: wait at least 24 hours for accurate recommendations
sleep 120

# Check whether VPA has generated recommendations yet
kubectl describe vpa -n default goldilocks-api-service | grep -A20 "Recommendation"
Access the Goldilocks dashboard
# Port-forward to the Goldilocks dashboard
kubectl port-forward svc/goldilocks-dashboard \
  -n goldilocks 8080:80 &

# Open http://localhost:8080 in your browser
# The dashboard shows:
# - Current requests vs recommended requests per container
# - Two recommendation profiles: Guaranteed and Burstable QoS
# - Ready-to-apply YAML snippets for each container

# Get VPA recommendations via kubectl (stdlib json; no PyYAML needed)
kubectl get vpa goldilocks-api-service -n default -o json | \
python3 -c "
import sys, json
data = json.load(sys.stdin)
recs = data.get('status', {}).get('recommendation', {}).get('containerRecommendations', [])
for c in recs:
    print(f'Container: {c[\"containerName\"]}')
    print(f'  Target (recommended requests): {c[\"target\"]}')
    print(f'  Lower bound (minimum safe): {c[\"lowerBound\"]}')
    print(f'  Upper bound (maximum recommended): {c[\"upperBound\"]}')
    print()
"
Apply recommendations and measure improvement
# Record current resource consumption
kubectl top nodes
kubectl top pods --all-namespaces | sort -k4 -rn | head -20

# Get recommended values from VPA (example output)
# api-service: target cpu=25m, memory=32Mi (was 2000m, 2048Mi)
# worker: target cpu=10m, memory=16Mi (was 500m, 1024Mi)

# Apply the recommendations
kubectl patch deployment api-service -n default --type=json -p='[
  {"op":"replace","path":"/spec/template/spec/containers/0/resources/requests/cpu","value":"25m"},
  {"op":"replace","path":"/spec/template/spec/containers/0/resources/requests/memory","value":"32Mi"},
  {"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/cpu","value":"500m"},
  {"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"128Mi"}
]'

# Wait for rollout
kubectl rollout status deployment/api-service -n default

# Measure headroom improvement
kubectl describe nodes | grep -A10 "Allocated resources"
# Compare CPU/memory allocated before and after
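To put a number on the reclaimed capacity, the example targets above can be plugged into shell arithmetic. These before/after millicore values are the illustrative figures from the comments, not live cluster data:

```shell
# Illustrative values from the example output above (millicores):
# api-service: 3 replicas, 2000m -> 25m; worker: 5 replicas, 500m -> 10m
OLD_M_CPU=$((3 * 2000 + 5 * 500))
NEW_M_CPU=$((3 * 25 + 5 * 10))
SAVED=$((OLD_M_CPU - NEW_M_CPU))
echo "CPU requests: ${OLD_M_CPU}m -> ${NEW_M_CPU}m (freed ${SAVED}m, $((SAVED * 100 / OLD_M_CPU))%)"
```

On a real cluster, substitute the totals from `kubectl describe nodes` before and after the patch; the same arithmetic applies to memory.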
Success Criteria
- A VPA object exists for every Deployment in the labeled namespaces
- The Goldilocks dashboard shows target, lower-bound, and upper-bound recommendations per container
- api-service and worker roll out successfully with the reduced requests and limits
- Node "Allocated resources" show substantially more unallocated CPU and memory than before the patch
Further Reading
- Goldilocks documentation — goldilocks.docs.fairwinds.com
- VPA documentation — github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler
- Resource requests best practices — kubernetes.io/docs/concepts/configuration/manage-resources-containers
- Right-sizing with VPA — learnk8s.io/setting-cpu-memory-limits-requests